BearingPoint’s Shruti Goyal talks about zero-copy architecture and why it’s ultimately a game-changer for data teams.
The world of data architecture, according to Shruti Goyal, has been defined by one process for the last decade: extract, transform and load (ETL).
ETL is a three-phase process in which data is extracted from transactional or real-time source systems, transformed (cleaned, enriched and standardised) into an analytical format, and loaded into a data hub or warehouse for reporting and analytics.
“In practice, this meant building complex pipelines using tools like SQL Server Integration Services (SSIS), Azure Data Factory (ADF) and Microsoft Data Pipelines,” explains Goyal, who is manager of data analytics and AI at BearingPoint.
“ETL ensures data is reliable, consistent, and ready for analysis and decision-making.”
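To make the three phases concrete, here is a minimal Python sketch of a batch ETL job. The databases, table names, columns and cleaning rules are illustrative placeholders, not details from any system Goyal describes.

```python
# A minimal, illustrative ETL batch job. Source, warehouse, tables and
# transformation rules are hypothetical examples.
import sqlite3
import pandas as pd

def extract(conn: sqlite3.Connection) -> pd.DataFrame:
    # Extract: pull raw rows out of a transactional source system.
    return pd.read_sql_query("SELECT id, amount, country FROM raw_orders", conn)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: clean, enrich and standardise into an analytical format.
    df = df.dropna(subset=["amount"])          # clean: drop incomplete rows
    df["country"] = df["country"].str.upper()  # standardise country codes
    df["amount_eur"] = df["amount"] * 0.92     # enrich (illustrative FX rate)
    return df

def load(df: pd.DataFrame, conn: sqlite3.Connection) -> None:
    # Load: write the analytical copy into a warehouse table.
    df.to_sql("fact_orders", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    source = sqlite3.connect("transactional.db")  # hypothetical source system
    warehouse = sqlite3.connect("warehouse.db")   # hypothetical warehouse
    load(transform(extract(source)), warehouse)
```

Note that the load step creates a second physical copy of the data, and the whole job must be scheduled, monitored and repaired when it breaks: this is the duplication and fragility zero-copy sets out to remove.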
However, Goyal believes that after a decade of data dominance, ETL may be on its way out due to the rise of zero-copy architecture – an approach “where data is used where it already lives, without physically copying it into downstream systems”.
“Data is no longer physically moved – instead, access to it is,” she says.
What is zero-copy?
As Goyal explains to SiliconRepublic.com, zero-copy architecture allows users to query, share and access data directly at the source, in contrast to ETL's copy-and-move approach.
Zero-copy enables this by using metadata, permissions and query pushdown “without duplicating the underlying data”.
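Goyal's examples centre on Microsoft Fabric, but the query-in-place idea can be illustrated with any engine that reads data where it lives. The sketch below uses DuckDB to aggregate a Parquet file at the source; the file path and column names are assumptions.

```python
# Illustrative query-in-place: the engine reads the file where it lives,
# pushing the filter and projection down to the scan rather than loading
# the dataset into a warehouse first. Path and columns are hypothetical.
import duckdb

result = duckdb.sql(
    """
    SELECT country, SUM(amount) AS total
    FROM 'lake/orders.parquet'              -- queried at the source, no load step
    WHERE order_date >= DATE '2024-01-01'   -- predicate pushed down to the scan
    GROUP BY country
    """
).df()

print(result)
```

The point is the absence of a load phase: the query runs against the single authoritative copy, and no duplicate of the underlying data is ever created.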
Goyal says the catalyst for this change is the analytics platform Microsoft Fabric, specifically its OneLake storage layer.
“Fabric introduces a unified logical data core that renders traditional data duplication obsolete,” she explains. “The two critical mechanisms are Mirroring, which keeps source systems reflected in near real-time, and Shortcuts, which allow entire multi-terabyte databases to be surfaced into an analytics environment in seconds without any physical copying.
“While ADF remains relevant for complex orchestration scenarios, it is no longer the backbone of data movement – OneLake is.”
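For a sense of what surfacing data without copying looks like in practice, the sketch below creates a OneLake shortcut over Fabric's REST API. The endpoint shape, payload fields and identifiers are assumptions for illustration; consult the official Fabric documentation before relying on them.

```python
# Hypothetical sketch of creating a OneLake shortcut over REST, so external
# storage is surfaced into a Fabric lakehouse without copying it. The endpoint
# shape, payload fields and IDs are assumptions, not a verified API contract.
import requests

WORKSPACE_ID = "<workspace-guid>"   # placeholder
LAKEHOUSE_ID = "<lakehouse-guid>"   # placeholder
TOKEN = "<aad-access-token>"        # placeholder

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{LAKEHOUSE_ID}/shortcuts",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "Tables",    # where the shortcut appears in the lakehouse
        "name": "orders",    # shortcut name
        "target": {          # the data stays here; only metadata is created
            "adlsGen2": {
                "location": "https://<account>.dfs.core.windows.net",
                "subpath": "/container/orders",
                "connectionId": "<connection-guid>",
            }
        },
    },
    timeout=30,
)
resp.raise_for_status()
```

Only a metadata entry is written; the multi-terabyte source stays in its original storage account, which is why a shortcut can appear in seconds regardless of data volume.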
‘Long-overdue liberation’
Significant changes in any industry can be met with either joy or disdain depending on the circumstances, but Goyal says that for data teams, the so-called ‘death of ETL’ has been described as nothing short of “a long-overdue liberation”.
“Years spent tuning SSIS packages and mapping ADF data flows are giving way to managing metadata and governance policies instead,” she says. “The burden shifts from responding to pipeline failures to maintaining stable, governed shortcuts.
“The skillset evolves accordingly – the focus moves from pipeline engineering toward data governance, metadata management and strategic architecture, representing a significant elevation of the data management role.”
But why specifically is zero-copy being embraced over ETL?
For starters, Goyal says zero-copy is replacing ETL because it is faster, cheaper and “fundamentally more reliable”.
“Zero-copy architectures replace ETL by letting analytics and AI access live data at its source – eliminating duplication, latency and governance complexity while reducing cost.
“In short, ETL is costly, slow and brittle; zero-copy is lean, live and self-governing.”
Why it’s significant
Goyal believes the transition from ETL is significant because it “represents a fundamental architectural shift”, allowing teams to manage metadata and governance instead of fragmented data copies and “fragile pipelines”.
“The move is from a reactive, maintenance-heavy model – characterised by late-night pipeline failure alerts – to a live feed of the business.
“Over time, this means organisations can make decisions on current data rather than yesterday’s batch, reduce infrastructure overhead significantly and redirect skilled data teams away from operational firefighting toward strategic work.”
Goyal adds that from a data strategy standpoint, zero-copy “changes what is fundamentally possible”.
“When the analytics layer reflects the business in near real-time rather than hours after the fact, decisions can be made on current ground truth,” she says. “The elimination of redundant storage means strategies can scale without proportional cost increases.
“Built-in governance and metadata persistence also mean organisations can trust their data more deeply – enabling AI workloads, reporting and operational systems to coexist confidently on a single, well-governed data estate.”