Nahla Davies examines what constitutes an appropriate data integrity framework, and how inadequate frameworks damage data quality.
If you asked most companies whether they have a data integrity framework, they’d say yes without hesitation. They’d point you to a shared drive, maybe a Confluence page, possibly a colour-coded spreadsheet with tabs labelled ‘Validation Rules’ and ‘Ownership Matrix’. It looks official. It’s got a logo on it. Someone even added conditional formatting.
But here’s the thing: looking like a framework and actually functioning as one are two wildly different realities. Across industries, organisations are confusing documentation with governance, and the gap between those two things is where data quality quietly falls apart. The problem isn’t that teams don’t care. It’s that they’ve convinced themselves the spreadsheet is enough.
The spreadsheet trap is more common than anyone admits
There’s a pattern that plays out in nearly every mid-size org that’s undergone some kind of digital transformation push in the last five years. Someone in data engineering or analytics gets tasked with ‘building a data integrity framework’. They do their research, pull together some best practices, and create a document. Maybe it lives in Google Sheets, maybe it’s a Notion database, maybe it’s an actual PDF that got emailed around once and then forgotten about. Whatever form it takes, it checks a box. Leadership sees it and feels reassured.
The trouble starts when that document has to survive contact with reality. Data pipelines change. New sources get added. Team members rotate. And that spreadsheet? It doesn’t update itself. It doesn’t send alerts when a schema shifts or when a critical field starts returning nulls at twice the usual rate. It just sits there, frozen in the moment it was created, slowly becoming a historical artefact rather than an operational tool.
What’s worse is that people keep referencing it as though it’s still accurate. Decisions get made based on validation rules that haven’t been reviewed in months. Ownership columns list people who’ve left the company. It’s the organisational equivalent of navigating with a map from 2019 and wondering why you keep hitting dead ends.
And it’s not a niche problem. A 2023 Gartner survey found that poor data quality costs organisations an average of $12.9m per year. That number doesn’t come from dramatic, headline-grabbing breaches. It comes from the slow, invisible accumulation of bad records, missed anomalies, and unchecked assumptions that a static document simply can’t catch.
What a real framework actually looks like
So what separates a functioning data integrity framework from a well-formatted spreadsheet? It comes down to whether the thing can operate without someone manually babysitting it. A real framework is embedded in your infrastructure. It’s automated, observable and responsive.
That means validation checks run as part of your data pipelines, not as a quarterly audit someone remembers to do in the last week of the quarter. It means the data is correctly annotated, and that there is monitoring in place to flag anomalies in real time, whether that’s a sudden spike in null values or a mismatch between source and destination row counts. Tools like Great Expectations, Monte Carlo and dbt tests exist specifically to bring this kind of rigour into the workflow.
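To make that concrete, here is a minimal sketch in plain Python of the kind of check that runs inside a pipeline rather than in a quarterly audit. It is deliberately generic rather than tied to any one tool (Great Expectations and dbt express the same idea declaratively), and the names here – check_batch, customer_id, the thresholds – are hypothetical.

```python
# A minimal, generic sketch of in-pipeline validation. The field name
# customer_id and the thresholds are illustrative, not prescriptive.

def null_rate(rows: list, field: str) -> float:
    """Return the fraction of records where `field` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)

def check_batch(rows: list, baseline: float, source_count: int) -> list:
    """Run integrity checks on a batch before loading it downstream."""
    problems = []

    # Flag a null-rate spike: more than twice the historical baseline.
    rate = null_rate(rows, "customer_id")
    if rate > 2 * baseline:
        problems.append(f"customer_id null rate spiked to {rate:.1%}")

    # Reconcile row counts between source and destination.
    if len(rows) != source_count:
        problems.append(
            f"row count mismatch: source={source_count}, loaded={len(rows)}"
        )
    return problems

batch = [{"customer_id": 101}, {"customer_id": None}, {"customer_id": 102}]
for issue in check_batch(batch, baseline=0.05, source_count=3):
    print("ALERT:", issue)  # in production, this would page the owner
```

The point is that the check runs every time data moves, so a spike surfaces within one pipeline run instead of one quarter.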
It also means ownership is enforced through tooling, not just documented in a tab. When a data asset has a registered owner in a data catalogue, and that catalogue integrates with your alerting system, accountability becomes structural. It stops being something you have to chase people about in Slack.
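Here is a hedged sketch of what structural accountability can look like, assuming a catalogue that exposes a registered owner per asset. The OWNERS mapping, asset names and notify function below are hypothetical stand-ins, not any vendor’s API.

```python
# Illustrative only: ownership lives in a machine-readable registry
# (here a dict standing in for a data catalogue), so alerts can be
# routed automatically rather than chased in Slack.

OWNERS = {
    "warehouse.orders": "data-eng-oncall@example.com",
    "warehouse.customers": "crm-team@example.com",
}

def notify(address: str, message: str) -> None:
    # In practice this would go to Slack, PagerDuty or email.
    print(f"-> {address}: {message}")

def route_alert(asset: str, message: str) -> None:
    """Send an alert about a data asset to its registered owner."""
    owner = OWNERS.get(asset)
    if owner is None:
        # An unowned asset is itself a governance failure worth flagging.
        notify("data-governance@example.com", f"unowned asset {asset}: {message}")
    else:
        notify(owner, message)

route_alert("warehouse.orders", "schema change detected: order_total dropped")
```

Once ownership is machine-readable, every alert arrives with a name attached, and a stale ‘Ownership Matrix’ tab has nothing left to do.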
There’s a cultural component here, too. Organisations with mature data integrity practices treat data quality as a product concern and are better prepared to establish proper AI governance. Product managers care about it. Analysts flag issues proactively instead of working around them. Engineers write tests for data the same way they write tests for code. That kind of culture doesn’t emerge from a spreadsheet. It emerges from leadership making it clear that data integrity is a priority, not a side project someone handles when things are slow.
The companies getting this right tend to share a few traits. They’ve invested in observability across their data stack. They treat schema changes as events that require review, not things that just happen silently. And they’ve moved past the idea that documentation alone equals governance.
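As an illustration of treating schema changes as reviewable events, here is a hypothetical drift check. It assumes the expected schema is kept under version control and the live schema is read from the warehouse, both simplified to plain dictionaries here.

```python
# Hypothetical schema drift check. In a real pipeline, expected_schema
# would come from version control and live_schema from the warehouse.

expected_schema = {"order_id": "bigint", "order_total": "numeric",
                   "created_at": "timestamp"}
live_schema = {"order_id": "bigint", "created_at": "timestamp",
               "discount": "numeric"}

dropped = expected_schema.keys() - live_schema.keys()
added = live_schema.keys() - expected_schema.keys()
retyped = {col for col in expected_schema.keys() & live_schema.keys()
           if expected_schema[col] != live_schema[col]}

if dropped or added or retyped:
    # Drift is an event that requires review, not a silent change.
    print(f"Schema drift: dropped={sorted(dropped)}, "
          f"added={sorted(added)}, retyped={sorted(retyped)}")
```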
Why it matters more now than it did five years ago
The stakes around data integrity have shifted significantly. Five years ago, a bad record in a reporting dashboard was annoying but manageable. Today, that same bad record might be feeding a machine learning model that’s making automated decisions about credit, hiring or patient care. The blast radius of poor data quality has expanded because the systems consuming that data have become more autonomous and more consequential.
Regulatory pressure is also mounting. Frameworks like the EU’s AI Act and evolving data privacy regulations are putting more scrutiny on how organisations manage the data that powers their products. It’s getting harder to shrug off data quality issues as ‘technical debt we’ll get to eventually’. Regulators want to see evidence of governance, and a spreadsheet with last year’s date on it won’t cut it.
There’s also the competitive angle. Companies that can trust their data move faster. They make decisions with more confidence. They spend less time reconciling conflicting reports and more time actually acting on insights. Data integrity isn’t glamorous, but it’s one of those foundational things that quietly determines whether an organisation can execute on its strategy or just talk about it.
Final thoughts
The uncomfortable truth is that most data integrity frameworks weren’t built to be frameworks at all. They were built to satisfy a request, to check a compliance box, or to give someone something to present in a meeting.
And that’s fine as a starting point. Every mature system started somewhere. But if your ‘framework’ is still a spreadsheet that no one’s touched in six months, it’s time to be honest about what you actually have.
Real integrity requires automation, observability and cultural buy-in. The spreadsheet was never the destination. Treat it as the rough draft it always was, and start building something that can actually keep up with your data.
By Nahla Davies
Nahla Davies is a software developer and tech writer. Before devoting her work full time to technical writing, she managed – among other intriguing things – to serve as a lead programmer at an Inc. 5000 experiential branding organisation whose clients include Samsung, Time Warner, Netflix and Sony.