Data Foundation · February 5, 2026 · 7 min read

The Modern Data Stack Is Not a Stack. It Is a Foundation.

By Wesley Nitikromo

Gartner predicts that by 2027, companies that do not prioritize AI-ready data will suffer a 15% productivity loss compared to those that do. IDC projects that 70% of G2000 CEOs will redirect AI ROI toward growth in 2026. And yet, when I walk into boardrooms across Amsterdam, London, and Frankfurt, I find the same scene playing out: the executive team has approved a seven-figure AI budget, the data team is running Snowflake and dbt, and nobody can tell me what "revenue" means without three people disagreeing.

The modern data stack was supposed to solve this. Cloud-native tools. Modular architecture. Best-of-breed components that snap together like LEGO. And for many organizations, the promise delivered on the infrastructure side. Data moves faster. Storage scales on demand. Transformation pipelines run in minutes instead of hours.

But infrastructure is not a foundation. A foundation is infrastructure that your entire organization trusts. That distinction is where most modern data stack investments fall short, and it is the single largest obstacle between your AI ambitions and measurable business outcomes.

What a Modern Data Stack Actually Looks Like in 2026

The core components are well-established by now. Data ingestion through tools like Fivetran or Airbyte. A cloud data warehouse or lakehouse, typically Snowflake, Databricks, or BigQuery. A transformation layer, almost always dbt. An orchestration layer using Airflow, Dagster, or Prefect. A BI layer for visualization. And increasingly, a semantic layer and governance framework that ties everything together.

What changed between 2022 and 2026 is not the components. It is what those components need to serve. In 2022, the modern data stack served analysts building dashboards. In 2026, it needs to serve AI agents making autonomous decisions. The bar for data quality, consistency, and governance went from "good enough for a quarterly review" to "trusted enough for a machine to act on without human oversight."

That shift in the consumer of your data is the most consequential change in enterprise data architecture this decade. When the consumer was a human analyst, they could spot a suspicious number, apply context from tribal knowledge, and flag an issue before it reached a decision maker. When the consumer is an AI agent, it will execute on whatever data it receives with absolute confidence, regardless of whether that data is correct.

The Three Tests Every Data Foundation Must Pass

I use three tests when I assess whether an organization has a data foundation or merely a data stack. Every C-level leader should be able to answer these before approving any AI initiative.

Test one: can three different people run the same query and get the same answer? This sounds trivial. It is not. In most organizations, the finance team, the marketing team, and the product team each have their own definition of core business metrics. Revenue, active customers, churn, cost of acquisition. If your CFO and your VP of Marketing produce different revenue numbers from the same data warehouse, you do not have a foundation. You have a collection of pipes that deliver inconsistent water.
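
To make the divergence concrete, here is a hypothetical sketch: one warehouse, two teams, two defensible and incompatible revenue numbers. Every table and column name below is illustrative.

    -- Finance's "revenue": invoiced amounts, net of refunds (hypothetical schema)
    select sum(amount) - sum(refund_amount) as revenue
    from invoices
    where status = 'paid';

    -- Marketing's "revenue": gross bookings at order time
    select sum(order_total) as revenue
    from orders;

Both queries are "right." Neither is governed. That is the gap a foundation closes.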

Test two: can a new hire query the data on day one without asking someone where things live? This tests whether your data infrastructure depends on tribal knowledge. If one senior engineer knows which pipeline breaks every Tuesday, which field in Salesforce was mislabeled three years ago, and why the CRM and the revenue dashboard never match, your foundation has a single point of failure with a name and a LinkedIn profile. When that person leaves, and they will, every system built on top of their knowledge becomes unreliable.

Test three: can you trace any number on any dashboard back to its raw source in under sixty seconds? This tests lineage and governance. If you cannot explain how a number was calculated, which data sources contributed to it, and when it was last refreshed, you cannot trust it. More importantly, neither can an AI agent. And neither can your auditor, your regulator, or your board.
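
If your transformations run through dbt, this test has a concrete form. Assuming the dashboard number comes from a model named fct_revenue (a hypothetical name), dbt's graph selectors can enumerate everything upstream of it, raw sources included, in one command:

    # list every model and source upstream of the revenue model
    dbt ls --select +fct_revenue

Whatever lineage tooling you use, the sixty-second test stands: if nobody can run the equivalent of that command, the number is unverifiable.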

Why Most AI Projects Fail at Layer One

The pattern is always the same. The executive team reads about AI agents. The board asks about automation. A consulting firm delivers a strategy deck. The AI team starts building on top of whatever data infrastructure exists. Six months later, the agents are returning confidently wrong answers, the data team is firefighting quality issues, and the project stalls.

This is not an AI problem. It is a data foundation problem. The AI did not hallucinate because the model was bad. It hallucinated because the data underneath it was ungoverned, inconsistent, or incomplete.

Stanford research published in early 2026 shows that AI has already cut entry-level developer hiring by 20% and call center jobs by 15%. Companies are shrinking teams in the name of AI efficiency. But the people being cut are often the ones who knew where the data lived. They knew which pipeline was fragile. They knew why two systems never reconciled. That knowledge lived in their heads and never made it into documentation.

Smaller teams mean less institutional data knowledge. Less knowledge means worse governance. Worse governance means AI operating on data nobody can verify. This is a compounding risk that most AI strategies ignore entirely, and it starts at the data foundation.

Building the Foundation: A Practical Framework for Executives

The Intelligence Allocation Stack is a four-layer framework that I developed over a decade of building data infrastructure across fintech, e-commerce, sustainability, and SaaS. The principle is simple: for every dollar companies spend on AI, they should be spending six on the data architecture underneath it. Almost none of them do.

Layer one is the data foundation. This is where data enters the organization, gets stored, and becomes governable. Ingestion pipelines, warehousing, data quality checks, and the basic infrastructure that ensures data is clean, consistent, and available.

Layer two is the semantic layer. This is where business logic gets translated into definitions that every downstream tool and AI agent can rely on. Revenue means one thing, everywhere, for everyone.

Layer three is the orchestration layer. This is where data gets connected, transformed, and routed. CRM syncs, reverse ETL, workflow automation, API integrations, and the coordination of data movement across systems.

Layer four is the AI layer. Models, agents, conversational AI. The most visible layer, the one investors get excited about, and the one that is entirely dependent on the three below it.

The order is not negotiable. Every company that fails with AI starts at layer four and works down. Every company that succeeds builds from the bottom up.

What to Invest in Before You Invest in AI

If I could sit in your next board meeting, here is what I would recommend before a single euro goes to AI tooling.

First, consolidate your data into one governed platform. Pick Snowflake, Databricks, or BigQuery. The choice matters less than having one source of truth. If your data is scattered across SaaS tools, spreadsheets, and departmental databases, that fragmentation is the first thing to fix. ELT architecture using dbt for transformation is the standard approach, and it works because it keeps all logic version-controlled and auditable.
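
As a minimal sketch of what that looks like in practice, here is a hypothetical dbt staging model. The source and column names are assumptions for illustration; what matters is that the logic lives in a versioned file in a repository, not in someone's saved query.

    -- models/staging/stg_orders.sql (hypothetical dbt model)
    -- One transformation per file, reviewed in pull requests,
    -- auditable through git history.
    select
        order_id,
        customer_id,
        order_total as gross_revenue,
        order_date
    from {{ source('shop', 'raw_orders') }}
    where order_id is not null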

Second, implement automated data quality checks. Every pipeline should validate data on ingestion. Schema changes, null values, volume anomalies, freshness checks. Tools like dbt tests, Great Expectations, or Monte Carlo handle this. The goal is to catch data issues before they propagate downstream, not after a stakeholder flags a suspicious number.
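
In dbt, those checks are a few lines of YAML next to the models they protect. A minimal sketch, using the same hypothetical names as above:

    # models/staging/schema.yml (hypothetical)
    version: 2

    sources:
      - name: shop
        loaded_at_field: _loaded_at   # assumed ingestion timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        tables:
          - name: raw_orders

    models:
      - name: stg_orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null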

Third, define your metrics once. Deploy a semantic layer that establishes a single governed vocabulary for your business metrics. Whether you choose the dbt Semantic Layer, Cube, AtScale, or your warehouse's native semantic capabilities, this is the investment with the highest leverage. It eliminates the "whose numbers are right" meetings. It gives AI agents a business dictionary instead of raw tables. And it scales your data governance without scaling your data team.
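
To make this concrete, here is roughly what a governed metric looks like in the dbt Semantic Layer's YAML format, building on the hypothetical staging model above. The names are illustrative; the leverage is that every downstream tool and agent queries this one definition instead of writing its own SQL.

    # models/semantic/revenue.yml (hypothetical)
    semantic_models:
      - name: orders
        model: ref('stg_orders')
        defaults:
          agg_time_dimension: order_date
        entities:
          - name: order_id
            type: primary
        dimensions:
          - name: order_date
            type: time
            type_params:
              time_granularity: day
        measures:
          - name: gross_revenue
            agg: sum

    metrics:
      - name: revenue
        label: Revenue
        description: "Gross order revenue. The one governed definition."
        type: simple
        type_params:
          measure: gross_revenue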

Fourth, document everything that currently lives in people's heads. Run a knowledge audit. Identify every process, workaround, and business rule that depends on one person's memory. Encode it into your data infrastructure. Automated governance. Semantic definitions. Pipeline documentation. The companies that thrive with smaller teams will be the ones where tacit knowledge becomes irrelevant because the system documents itself.
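
A sketch of what encoding that knowledge can look like, again as hypothetical dbt documentation: the mislabeled Salesforce field from test two, written down where every catalog, BI tool, and AI agent can surface it.

    # models/staging/accounts.yml (hypothetical)
    version: 2

    models:
      - name: stg_accounts
        description: >
          Accounts from Salesforce. Note: the upstream segment field
          was mislabeled for years; values are remapped in this model.
          Query segment here, never from the raw table.
        columns:
          - name: segment
            description: "Corrected segment labels (see model note)."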

The ROI of Getting the Foundation Right

Companies with mature data governance see 24% higher revenue from AI according to IDC. Organizations that implement semantic layers report that LLM accuracy on business questions improves from roughly 40% to over 83%. Major retailers deploying governed semantic models report 80% of queries completing in under one second. These are not theoretical projections. These are measured outcomes from organizations that invested in the foundation before they invested in the intelligence.

The modern data stack is not dead. It is necessary but insufficient. The tooling problem is solved. What is not solved is the organizational discipline to turn a collection of cloud-native tools into a foundation that your entire enterprise, every analyst, every executive, and every AI agent, can trust.

Systems beat individuals at scale. The right architecture beats the smartest model. And the companies that understand that AI is an allocation problem, not a technology problem, will be the ones still standing when the hype cycle ends and the real work begins.

Wesley Nitikromo

Founder of Unwind Data. Previously co-founded DataBright (acquired 2023). Data architect, analytics engineering specialist, and builder of AI-ready data infrastructure. Based in Amsterdam.