Seamless and secure access to data has become one of the biggest challenges facing organizations. Nowhere is this more evident than in technology-led external audits, where analyzing 100% of transactional data is fast becoming the gold standard. These audits involve reviewing tens of billions of lines of financial and operational billing data.
To deliver meaningful insights at scale, analysis must be not only robust but also efficient, balancing cost, time, and quality to achieve the best outcomes in tight timeframes.
Recently, in collaboration with a major UK energy supplier, KPMG leveraged Delta Sharing in Databricks to overcome performance bottlenecks, improve efficiency, and enhance audit quality. This blog discusses our experience, the key benefits, and the measurable impact that Delta Sharing had on our audit process.
The Business Challenge
To meet public financial reporting deadlines, we needed to access and analyze tens of billions of lines of the audited entity’s billing data within a short audit window.
Historically, we relied on the audited entity’s analytics environment hosted in AWS PostgreSQL. As data volumes grew, the setup showed its limits:
- Data Volume: Our approach required looking beyond the audit period to analyze historical data that was essential for the routine. As this dataset grew significantly year on year, it eventually exceeded AWS PostgreSQL limits. This forced us to split the data across two separate databases, introducing additional operational overhead and cost.
- Data Transfer: Moving and copying data from a production environment to a ‘ring-fenced’ analytics PostgreSQL database caused a delayed start and a lack of freshness and agility.
- Query Performance Degradation: While PostgreSQL does support parallelism, our individual queries in this environment did not make effective use of multiple CPU cores, leading to suboptimal performance.
- Resourcing: Because access to the entity’s analytics environment was restricted to their assets, we faced challenges in making the best use of our people and quickly onboarding new team members.
Given these constraints, we needed a scalable, high-performance solution that would allow efficient access to and processing of data without compromising security or governance, reducing ‘machine time’ and delivering quicker outcomes.
Why Delta Sharing?
Delta Sharing, an open data-sharing protocol, provided the ideal solution by enabling secure and efficient cross-platform data exchange between KPMG and the audited entity without duplication.
Compared to extending PostgreSQL, Databricks offered several distinct advantages:
- Handles Large Datasets: Delta Sharing is designed to handle petabyte-scale data, eliminating PostgreSQL’s performance limitations.
- Lower Costs: Delta Sharing lowered storage and compute costs by reducing the need for large-scale data replication and transfers.
- Flexibility: Shared data could be accessed in Databricks using PySpark, SQL, and BI tools like Power BI, facilitating seamless integration into our audit deliverables.
- Delta Tables: We could “time travel” to past states of data. This was valuable for checking historical points that had previously been lost in the client’s data model.
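To make the “time travel” point concrete, here is a minimal sketch of how such a query could be phrased. It only builds the SQL text, using Delta Lake's `VERSION AS OF` and `TIMESTAMP AS OF` clauses. The table name is hypothetical, and running the query would additionally assume a Spark session and that the provider has shared the table together with its history. It is an illustration, not the queries used in the audit.

```python
from typing import Optional


def time_travel_query(table: str,
                      version: Optional[int] = None,
                      timestamp: Optional[str] = None) -> str:
    """Build a Delta time-travel SELECT for exactly one of version/timestamp.

    Inputs are treated as trusted in this sketch; no SQL escaping is done.
    """
    # Exactly one selector must be supplied, otherwise the intent is ambiguous.
    if (version is None) == (timestamp is None):
        raise ValueError("Provide exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name, for illustration only.
TABLE = "audit_share.billing.meter_readings"
by_version = time_travel_query(TABLE, version=3)
by_timestamp = time_travel_query(TABLE, timestamp="2024-03-31")
```

In a Databricks notebook, a string like `by_version` could then be passed to `spark.sql(...)` to read the table as it stood at that version.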
Implementation Approach
We introduced Delta Sharing in a way that didn’t disrupt ongoing audit work:
- Data Sharing: We gave the entity a list (in JSON format) of the tables and views we needed. They used Lakeflow Jobs and Delta Sharing to make these available to us directly in our Databricks environment. The audited entity provided access by sharing a key, granting us permission to securely access these pre-agreed datasets with minimal effort between AWS and Azure. Delta Sharing handled this cross-cloud exchange securely, without copying or moving the data between platforms.
- Integration with Unity Catalog: Unity Catalog gave us a single place to manage permissions, apply governance policies, and maintain full visibility of who accessed what data.
- Scheduled Data Refreshes: During key audit cycles, data was refreshed to align with financial reporting timelines.
- Performance Optimization: Once inside Databricks, we reworked queries from PostgreSQL to Spark SQL and PySpark. With Delta Sharing providing governed, ready-to-use data, we focused on optimizing performance rather than managing data movement.
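As an illustration of the JSON request list described above, the sketch below parses a small request and checks it against what has actually been shared. The JSON layout, the share name, and every table and view name are assumptions made for this example; the format actually exchanged with the entity is not described here.

```python
import json
from typing import List

# Hypothetical request document; the field names are assumptions.
REQUEST_JSON = """
{
  "share": "audit_share",
  "objects": [
    {"schema": "billing", "name": "meter_readings", "type": "table"},
    {"schema": "billing", "name": "customer_accounts", "type": "table"},
    {"schema": "billing", "name": "invoice_summary", "type": "view"}
  ]
}
"""


def requested_objects(raw: str) -> List[str]:
    """Return fully qualified names (share.schema.name) in request order."""
    request = json.loads(raw)
    share = request["share"]
    return [f"{share}.{obj['schema']}.{obj['name']}"
            for obj in request["objects"]]


def missing_objects(requested: List[str], received: List[str]) -> List[str]:
    """Return requested names that are absent from the shared objects."""
    received_set = set(received)
    return [name for name in requested if name not in received_set]


requested = requested_objects(REQUEST_JSON)
# Pretend the provider has shared only two of the three objects so far.
outstanding = missing_objects(
    requested,
    ["audit_share.billing.meter_readings",
     "audit_share.billing.customer_accounts"],
)
```

A reconciliation like this gives both sides a simple, reviewable record of which pre-agreed datasets are still outstanding before analysis begins.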

Measurable Impact
We used Delta Sharing to access and analyze billions of meter readings across millions of the entity’s customer accounts. We saw significant improvements across several KPIs:
- Faster Queries: Delta Sharing allowed us to use more computing power for big data tasks. Some of our most complex queries finished over 80% faster (for example, going from 14.5 hours to 2.5 hours) compared to our previous PostgreSQL process.
- Improved Audit Quality: By spending less time waiting for machines, we had more time to focus on exceptions, unusual patterns, and complex edge cases. This improved our data analytics outcomes by 15 percentage points in some instances and reduced the burden of any residual sampling.
- Cost Savings: By using Delta Sharing, we avoided making extra copies of the data. This meant we only stored and processed what was needed, which brought down both storage and compute costs.
- Quicker Access: Because the data was provisioned through Delta Sharing, there was less time wasted waiting for it to be ready, allowing us to start work sooner.
- Easier Team Onboarding: New team members could be onboarded seamlessly, and we could draw on a broader mix of coding skills across SQL and PySpark.
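The "over 80% faster" figure can be checked directly from the two runtimes quoted above. The short sketch below does that arithmetic; the function names are ours, introduced only for this illustration.

```python
def runtime_reduction(before_hours: float, after_hours: float) -> float:
    """Fraction of the original runtime that was saved."""
    return (before_hours - after_hours) / before_hours


def speedup_factor(before_hours: float, after_hours: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_hours / after_hours


# Runtimes quoted in the text: 14.5 hours before, 2.5 hours after.
reduction = runtime_reduction(14.5, 2.5)   # about 0.828, i.e. roughly 82.8% less time
speedup = speedup_factor(14.5, 2.5)        # about 5.8 times faster
```

A saving of 12 of the original 14.5 hours is a reduction of roughly 82.8%, consistent with the "over 80%" claim.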
“Using Delta Sharing has made a noticeable difference to our audit process. We can securely access data across cloud platforms, without delays or manual data movement, so our teams always work from the latest, single source of truth. This cross-cloud capability means faster audits, more reliable results for the audited clients we work with, and tight control over data access at every step.” (Anna Barrell, Audit Partner, KPMG UK)
Technical Considerations
A couple of technical considerations to bear in mind when working with Databricks:
- Delta Sharing: As early adopters, we found that some features were not yet available (for example, sharing materialized views). We’re excited that these are now refined with the GA release, and we’ll be enhancing our Delta Sharing solutions with this functionality.
- Lakeflow Jobs: Currently, there is no mechanism to confirm whether an upstream job for a Delta Shared table has completed. One script was executed before completion and led to an incomplete output, though this was quickly identified through our completeness and accuracy procedures.
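One way to guard against reading a shared table before its upstream job has finished is a completeness check against control totals. The sketch below is purely illustrative: it assumes the provider publishes an expected row count and amount total with each refresh, which is not something described above, and the class and function names are hypothetical rather than part of our actual procedures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ControlTotals:
    """Summary figures for one refresh of a shared table (hypothetical)."""
    row_count: int
    amount_total_pence: int  # integer minor units, to avoid float rounding


def describe_gaps(expected: ControlTotals, observed: ControlTotals) -> List[str]:
    """List every control total where the observed data differs."""
    gaps = []
    if observed.row_count != expected.row_count:
        gaps.append(f"row_count: expected {expected.row_count}, "
                    f"observed {observed.row_count}")
    if observed.amount_total_pence != expected.amount_total_pence:
        gaps.append(f"amount_total_pence: expected {expected.amount_total_pence}, "
                    f"observed {observed.amount_total_pence}")
    return gaps


def is_complete(expected: ControlTotals, observed: ControlTotals) -> bool:
    """True only when all control totals agree."""
    return not describe_gaps(expected, observed)


# Example: the table was read while the upstream load was still running.
expected = ControlTotals(row_count=1000, amount_total_pence=250000)
partial = ControlTotals(row_count=900, amount_total_pence=225000)
gaps = describe_gaps(expected, partial)
```

Downstream scripts could refuse to run while `is_complete` returns `False`, turning an after-the-fact detection into an up-front gate.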
Looking to the Future
Delta Sharing has proven to be a game-changer for audit data analytics, enabling efficient, scalable, and secure collaboration. Our successful implementation with the energy supplier demonstrates the value of Delta Sharing for clients with diverse data sources across clouds and platforms.
We recognize that many organizations store a significant portion of their financial data in SAP. This presents an additional opportunity to apply the same principles of efficiency and quality at an even greater scale.
Through Databricks’ strategic partnership with SAP, announced in February of this year, we can now access SAP data via Delta Sharing. This joint solution, which has become one of SAP’s fastest-selling products in a decade, allows us to tap into this data while preserving its context and syntax. By doing so, we can ensure the data remains fully governed under Unity Catalog and its total cost of ownership is optimized. As the entities we audit progress on their transformation journey, we at KPMG are looking to build on this momentum, anticipating the additional benefits it will bring to a streamlined audit process.
