Your AI tools are only as smart as your data foundation

Conversational BI is here, but every AI initiative we’ve seen succeed had one thing in common: a clean, governed semantic layer underneath.

Every quarter, a new AI-on-your-data product launches with a viral demo. Every quarter, we get the same call from a mid-market data leader: leadership saw the demo and wants to know why the company’s own version is producing wrong answers. The pattern is so consistent it’s almost a law.

The story is rarely about the model. It is about what the model is reading. The companies getting durable value from AI-on-their-data did the unglamorous semantic-layer work first. The ones in damage control skipped that step, plugged a chat interface into a chaotic warehouse, and then asked us to make the answers correct.

The demo is honest. The deployment isn’t.

Vendor demos run on clean, modeled, governed data — usually a single curated dataset with well-defined metrics. In the real org, the same question can resolve against three tables, two metric definitions, and a stale fact load from last Tuesday. The model isn’t hallucinating; the data really does support multiple contradictory answers.
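To make the conflict concrete, here is a minimal sketch in Python. The order rows, both revenue definitions, and every name in it are hypothetical, invented for illustration. It shows how one question ("what was revenue?") can legitimately resolve to two different numbers when two definitions coexist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    amount: float
    refunded: float
    status: str  # "booked" or "invoiced"


# Hypothetical rows; the values are made up for illustration.
ORDERS = [
    Order(amount=1000.0, refunded=0.0, status="invoiced"),
    Order(amount=500.0, refunded=200.0, status="invoiced"),
    Order(amount=750.0, refunded=0.0, status="booked"),
]


def revenue_gross_booked(orders):
    """Definition A (assumed): every order counts, refunds are ignored."""
    return sum(o.amount for o in orders)


def revenue_net_invoiced(orders):
    """Definition B (assumed): invoiced orders only, net of refunds."""
    return sum(o.amount - o.refunded for o in orders if o.status == "invoiced")


answers = {
    "gross_booked": revenue_gross_booked(ORDERS),
    "net_invoiced": revenue_net_invoiced(ORDERS),
}
print(answers)  # {'gross_booked': 2250.0, 'net_invoiced': 1300.0}
```

Neither number is a hallucination. Without a governed definition, an AI tool has no principled way to choose between them.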

Worse, the failure mode is silent. The AI returns a plausible answer with confident phrasing. A junior analyst can spot a mismatched chart axis; nobody can spot a wrong number wrapped in a paragraph. The failures move from the dashboard layer (where they were visible) to the language layer (where they look like authority).

The semantic layer is the AI substrate

Every AI deployment that has actually worked for our clients had one thing in common: a clean, governed semantic layer underneath. In Power BI / Fabric this means a well-built tabular model with curated measures, RLS, and named entities. In Snowflake / dbt land it means modeled marts with documented metric definitions. The exact tools matter less than the discipline.

What 'governed semantic layer' actually means

Named, version-controlled measures (not column math scattered across reports)
Documented business definitions stored next to the code, not in a deck
Row-Level Security and sensitivity labels enforced at the model layer
Lineage from source system to measure — visible to both humans and the AI tool
A measure library that the AI tool can actually reference by name
Refresh SLAs and freshness signals the AI tool can quote in its answers
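As a rough illustration of the list above, the sketch below models a measure library as plain Python objects. It is not the API of Power BI, Fabric, dbt, or any other tool. The class names, fields, and the sample measure are all assumptions. The point is only that each measure carries a name, a definition, an owner, its lineage, and a refresh SLA, so that a freshness note can be generated alongside an answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Measure:
    name: str               # the name an AI tool would reference
    definition: str         # business definition, stored next to the code
    owner: str              # accountable owner
    source_tables: tuple    # lineage back to source systems
    refresh_sla: timedelta  # maximum acceptable data age


class MeasureLibrary:
    """Hypothetical in-memory registry of named measures."""

    def __init__(self):
        self._measures = {}

    def register(self, measure):
        # Refuse duplicates so each name maps to exactly one definition.
        if measure.name in self._measures:
            raise ValueError(f"duplicate measure: {measure.name}")
        self._measures[measure.name] = measure

    def lookup(self, name):
        return self._measures[name]

    def freshness_note(self, name, last_refresh, now):
        """Build a sentence an AI tool could quote next to its answer."""
        m = self.lookup(name)
        age = now - last_refresh
        status = "within SLA" if age <= m.refresh_sla else "STALE"
        hours = age.total_seconds() / 3600
        return f"{m.name} (owner: {m.owner}) last refreshed {hours:.1f}h ago, {status}"


library = MeasureLibrary()
library.register(Measure(
    name="net_revenue",
    definition="Invoiced amount minus refunds",
    owner="finance-data",
    source_tables=("erp.invoices", "erp.refunds"),
    refresh_sla=timedelta(hours=24),
))

# Fixed, made-up timestamps so the example is reproducible.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
note = library.freshness_note("net_revenue", now - timedelta(hours=30), now)
print(note)
```

In a real stack this metadata would live in the semantic model itself. The sketch only shows what "referenceable by name, with a freshness signal" could look like.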

The pre-AI readiness audit

Before any conversational-BI rollout we run a half-day audit against five domains: data foundation, governance, measure library, security/lineage, and operational readiness. Each domain gets a red/yellow/green rating against specific criteria. Every AI program we have started with more than one red rating got off to a slower start than the client expected.
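The rating scheme can be sketched in a few lines of Python. The five domain names and the "more than one red" threshold come from the paragraph above. The function name, the result shape, and the sample ratings are assumptions made for illustration.

```python
from enum import Enum


class Rating(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


# The five audit domains named in the text.
DOMAINS = (
    "data foundation",
    "governance",
    "measure library",
    "security/lineage",
    "operational readiness",
)


def summarize_audit(ratings):
    """Summarize an audit given a dict of domain name -> Rating."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"unrated domains: {missing}")
    reds = [d for d in DOMAINS if ratings[d] is Rating.RED]
    # More than one red has, in the authors' experience, meant a slow start.
    return {"reds": reds, "expect_slow_start": len(reds) > 1}


# Hypothetical audit result.
example = summarize_audit({
    "data foundation": Rating.GREEN,
    "governance": Rating.RED,
    "measure library": Rating.YELLOW,
    "security/lineage": Rating.RED,
    "operational readiness": Rating.GREEN,
})
print(example)
```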

Audit checklist

A single source-of-truth model for each business domain
Documented metric definitions with owners
Row-Level Security and sensitivity labels in place
Predictable refresh cadence and SLA, with freshness visible in the model
A measure library — not column math scattered across reports
Audit logging on the AI surface so wrong answers can be reconstructed
A governance owner for AI outputs, separate from the data team

When to start with AI anyway

Bottom-up adoption (a Copilot license for the FP&A team, GPT-4-class summarization on a single dashboard) is fine to start early — it surfaces where your model breaks. Org-wide rollouts should wait until at least the top one or two domains are modeled cleanly. The cost of a bad answer in front of the CFO is much higher than the cost of waiting a quarter.

We frame this as a portfolio decision. Small, contained pilots can run on any domain — they generate evidence about where the model breaks and create internal urgency to fix the foundation. Broad, executive-facing deployments must wait until the underlying domain is in good shape. Treating both as the same project is the failure mode that produces the worst headlines.

A phased rollout we use on most engagements

Months 1-3: foundation

Pick the highest-priority revenue domain. Build or harden the semantic model: named measures, RLS, lineage, refresh SLA, freshness signal. Document metric definitions next to the code. Run the readiness audit and resolve every red. Do not ship any AI surface in this phase — the goal is to be ready, not visible.

Months 4-6: contained pilot

Ship a Copilot-enabled report (or equivalent on your stack) to a small, sophisticated user group — usually FP&A, RevOps, or a specific business-line analyst team. Log every question they ask. The questions become the next backlog: ambiguities, missing measures, unmodeled domains, definitional conflicts. Iterate weekly.
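One way to turn the question log into a backlog is sketched below, assuming each logged question is reviewed and tagged with at most one issue type. The four issue types mirror the ones listed above. The record fields, function name, and sample questions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Issue types taken from the text above.
ISSUE_TYPES = (
    "ambiguity",
    "missing_measure",
    "unmodeled_domain",
    "definition_conflict",
)


@dataclass(frozen=True)
class LoggedQuestion:
    user: str
    text: str
    issue: Optional[str] = None  # None: the reviewer accepted the answer


def build_backlog(log):
    """Return (issue_type, count) pairs, most frequent first."""
    counts = Counter(q.issue for q in log if q.issue is not None)
    unknown = set(counts) - set(ISSUE_TYPES)
    if unknown:
        raise ValueError(f"unknown issue types: {sorted(unknown)}")
    return counts.most_common()


# Made-up pilot questions for illustration.
pilot_log = [
    LoggedQuestion("fpa-1", "What was revenue last quarter?", "ambiguity"),
    LoggedQuestion("fpa-2", "Show churn by segment", "missing_measure"),
    LoggedQuestion("revops-1", "Pipeline coverage by region?", "missing_measure"),
    LoggedQuestion("fpa-1", "Net revenue for the last fiscal year?"),
]
backlog = build_backlog(pilot_log)
print(backlog)  # [('missing_measure', 2), ('ambiguity', 1)]
```

Ranking by frequency is only one possible prioritization. A team might instead weight issues by who asked or by the business impact of the wrong answer.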

Months 7-12: broaden carefully

Expand to the next one or two domains, applying the same foundation discipline before exposing AI on each. Introduce executive-facing surfaces only after the model has answered ~500 real user questions with a reviewable audit log and an issue rate that leadership can stomach.
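The gate for executive-facing surfaces can be written down as an explicit check, which keeps the decision from drifting. In the sketch below, the 500-question minimum comes from the text. The `max_issue_rate` default of 5% is purely a placeholder assumption, since the acceptable rate is something each leadership team has to set for itself.

```python
def ready_for_executives(total_questions, flagged_questions,
                         min_questions=500, max_issue_rate=0.05):
    """Return True when the pilot evidence clears both thresholds.

    max_issue_rate is a placeholder, not a recommended value.
    """
    if total_questions < 0 or flagged_questions < 0:
        raise ValueError("counts must be non-negative")
    if flagged_questions > total_questions:
        raise ValueError("flagged questions cannot exceed total questions")
    # Not enough evidence yet (also guards against division by zero).
    if total_questions == 0 or total_questions < min_questions:
        return False
    return flagged_questions / total_questions <= max_issue_rate


print(ready_for_executives(620, 18))  # True: 18/620 is about 2.9%
print(ready_for_executives(300, 0))   # False: fewer than 500 questions
print(ready_for_executives(500, 50))  # False: 10% exceeds the placeholder rate
```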

Where this goes wrong

Plugging the AI into a raw warehouse and hoping the model figures out the joins
Letting product teams ship AI features without a data team review of the underlying model
Skipping audit logging — making failure modes invisible
Allowing the AI to coerce wrong answers into confident prose with no uncertainty signal
Treating the rollout as an IT project, not as a governance program with named accountability

What we’d build first

If we were standing up an AI program from scratch today, we’d model the top revenue domain end-to-end, ship a Copilot-enabled report on it, and instrument every question users ask. The questions become the next backlog — and within two quarters you have a governed AI surface across the metrics that actually matter.

“Garbage in, fluent garbage out. AI made the data problem more expensive, not less.”

The one-line takeaway

AI-on-your-data is a foundation problem dressed up as a product problem. The companies that win this cycle are the ones who spent the previous one cleaning their semantic layer. The ones that didn’t will spend this cycle apologizing for the answers.

Published January 16, 2026 · 12 min read