Fractional Data Engineer

May 11, 2026

How to Leverage AI for Data Analytics (You Need a Data Infrastructure First)

AI · analytics · data infrastructure · dbt

Everyone wants to use AI for analytics right now. The demos are impressive. You ask a question in plain English, the AI queries your data, and you get an answer. No SQL required. No analyst needed.

It works when the foundations are right. When they're not, you get confident-sounding answers built on garbage data. And garbage data at AI speed is still garbage, just faster.

What "AI for Analytics" Actually Means

There are a few distinct things people mean when they say AI for analytics.

Natural language querying. You ask a question in plain English, and the tool generates and runs a SQL query for you. Metabase's AI features and ThoughtSpot do this.

AI-assisted dashboards. BI tools surface insights automatically, flag anomalies, or suggest visualizations based on your data.

Conversational analytics. Chat interfaces connect to your data warehouse so you can ask questions and get answers.

Predictive analytics. ML models trained on your historical data forecast outcomes like churn, revenue, or demand.

All of these share one requirement. The data underneath has to be clean, modelled, and centralized. The AI doesn't fix bad data. It amplifies it.

What Happens When You Skip the Foundation

When teams try to use AI analytics tools on top of raw, unmodelled data, a few things break.

Definitions don't match. The AI queries what it finds. If "revenue" means three different things across three different tables — recognized revenue, bookings, and cash collected — the AI will use one of them and present the answer as if it's definitive. It isn't. It's one interpretation.
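To make that concrete, here is a minimal sketch of how the same set of deals yields three different "revenue" numbers. The `Deal` fields and the sample figures are made up for illustration; your own tables will look different.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    booked: float      # total contract value signed
    recognized: float  # portion recognized as revenue so far
    collected: float   # cash actually received


# Hypothetical sample data, not real figures.
deals = [
    Deal(booked=12000.0, recognized=3000.0, collected=6000.0),
    Deal(booked=24000.0, recognized=8000.0, collected=2000.0),
]


def revenue_by_definition(deals):
    """Sum the same deals under three common meanings of 'revenue'."""
    return {
        "bookings": sum(d.booked for d in deals),
        "recognized_revenue": sum(d.recognized for d in deals),
        "cash_collected": sum(d.collected for d in deals),
    }


totals = revenue_by_definition(deals)
for name, value in totals.items():
    print(f"{name}: {value:,.0f}")
```

All three totals are defensible answers to "what was revenue?". An AI tool pointed at raw tables has no way to know which one you meant unless the definition is written down somewhere it can see.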

Joins are wrong. Raw data needs context to join correctly. Without a modelled data layer, the AI may join tables in ways that produce counts or sums that look plausible but are wrong, for example by double-counting an order once for every line item it contains.
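A small sketch of one common way this goes wrong: a one-to-many join that repeats the parent row. It uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the `orders` and `order_items` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 2, 'A');
""")

# Naive query: joining to line items repeats each order once per item,
# so order 1 (three items) is counted three times.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
""").fetchone()[0]

# Correct query: sum at the grain of the orders table.
correct_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print("naive:", naive_total)
print("correct:", correct_total)
```

Both queries run without errors, and both return a number. Only one of them is the actual order total, which is why the grain of each table needs to be documented in a modelled layer.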

The data is stale. If your warehouse isn't being updated reliably because pipelines are failing or syncs are delayed, your AI answers are based on last week's data, and you won't know it.
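One way to catch this is a freshness check that runs before anyone trusts an answer. The sketch below assumes you can get the timestamp of the most recent load for a table (for instance from a `MAX(loaded_at)` query, where `loaded_at` is a hypothetical column); the 24-hour threshold is also just an example.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at, max_age, now=None):
    """Return True when the newest load is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return (now - last_loaded_at) > max_age


# Hypothetical values for illustration only.
last_loaded_at = datetime(2026, 5, 4, 6, 0, tzinfo=timezone.utc)
checked_at = datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc)

if is_stale(last_loaded_at, max_age=timedelta(hours=24), now=checked_at):
    print("Warning: this table has not been refreshed in over 24 hours")
```

dbt users can get similar behaviour from source freshness checks rather than hand-rolled code; the point is that staleness should raise a flag instead of staying invisible.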

The business context is missing. AI doesn't know that one of your customers should be excluded from cohort analysis because they're an internal test account. It doesn't know that the "cancelled" status in your CRM means something different for enterprise versus SMB customers. That context lives in your data models, not in the raw tables.

Clean foundations aren't optional. They're what the AI analytics layer is sitting on.

What "Good Foundations" Looks Like

Centralized data warehouse. All your data sources — your product database, CRM, billing platform, marketing tools — need to land in one place. A data warehouse like BigQuery, Snowflake, or Redshift. Not in separate databases, not in Airtable, not in spreadsheets that get manually updated.

If your data lives in multiple places with no unified layer, most AI tools can only query one source at a time. Cross-source questions, like "which marketing channel produces customers with the highest LTV?", require data from multiple systems joined correctly. That join has to be defined somewhere.
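Here is a rough sketch of what that cross-source question looks like once both sources sit in one place. It uses `sqlite3` as a stand-in for a warehouse. The table names (`crm_customers`, `billing_payments`) are hypothetical, and "LTV" is simplified to total payments to date, which is an assumption rather than a standard definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Landed from the CRM (hypothetical schema)
CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, acquisition_channel TEXT);
-- Landed from the billing platform (hypothetical schema)
CREATE TABLE billing_payments (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO crm_customers VALUES (1, 'paid_search'), (2, 'paid_search'), (3, 'organic');
INSERT INTO billing_payments VALUES (1, 1, 100.0), (2, 1, 100.0), (3, 2, 50.0), (4, 3, 400.0);
""")

# Aggregate payments to one row per customer first, then join to the CRM,
# so the join cannot inflate the totals.
ltv_by_channel = conn.execute("""
    WITH customer_ltv AS (
        SELECT customer_id, SUM(amount) AS ltv
        FROM billing_payments
        GROUP BY customer_id
    )
    SELECT c.acquisition_channel, AVG(COALESCE(l.ltv, 0)) AS avg_ltv
    FROM crm_customers c
    LEFT JOIN customer_ltv l ON l.customer_id = c.customer_id
    GROUP BY c.acquisition_channel
    ORDER BY avg_ltv DESC
""").fetchall()

for channel, avg_ltv in ltv_by_channel:
    print(channel, avg_ltv)
```

None of this is possible while the CRM and the billing platform live in separate systems with no shared customer key.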

Modelled, documented data layer. Raw tables from your source systems are hard for humans or AI to query correctly without context. A modelled data layer, built with dbt or similar, transforms raw data into clean, tested, well-named models that represent your business entities: customers, orders, events, sessions.

This is where business logic lives. What counts as an "active customer"? What's the definition of "churn"? How do you attribute a conversion when someone interacted with five touchpoints? These questions have answers. They should be encoded in your data models, not left for the AI to guess.
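As an illustration, here is a minimal sketch of encoding one definition, "active customer", as a named view. In a real project this would more likely be a dbt model; `sqlite3` stands in for the warehouse here. The rule itself (a non-internal customer with an order in the last 30 days), the table names, and the fixed reference date are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_customers (customer_id INTEGER PRIMARY KEY, email TEXT, is_internal INTEGER);
CREATE TABLE raw_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, ordered_at TEXT);

INSERT INTO raw_customers VALUES
    (1, 'a@example.com', 0),
    (2, 'b@example.com', 0),
    (3, 'qa@ourcompany.example', 1);  -- internal test account
INSERT INTO raw_orders VALUES
    (10, 1, '2026-05-01'),
    (11, 2, '2026-01-15'),
    (12, 3, '2026-05-05');
""")

# Fixed reference date so the example is reproducible; a real model
# would typically use the current date instead.
AS_OF = "2026-05-11"

# The business definition lives in one named, reusable place.
conn.execute(f"""
    CREATE VIEW active_customers AS
    SELECT DISTINCT c.customer_id
    FROM raw_customers c
    JOIN raw_orders o ON o.customer_id = c.customer_id
    WHERE c.is_internal = 0
      AND o.ordered_at >= date('{AS_OF}', '-30 days')
""")

active = [row[0] for row in conn.execute(
    "SELECT customer_id FROM active_customers ORDER BY customer_id")]
raw_guess = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM raw_orders").fetchone()[0]

print("active customers per the model:", active)
print("customers with any order in raw data:", raw_guess)
```

A tool querying `active_customers` inherits the exclusion of the test account and the 30-day window. A tool querying the raw tables has to guess both.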

Reliable pipeline orchestration. Data needs to flow consistently. If your pipelines fail silently, your warehouse has gaps. AI analytics tools will query those gaps and return results that look complete but aren't. Orchestration with monitoring and alerting, using a tool such as Airflow or Prefect, ensures you know when something breaks before it affects a business decision.
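A simple version of that monitoring is a check for missing load dates. The sketch below is not tied to Airflow or Prefect; it assumes a daily load and that you can list the dates that actually arrived. The function name and sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_load_dates(loaded_dates, start, end):
    """Return every date in [start, end] that has no load recorded."""
    loaded = set(loaded_dates)
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in expected if day not in loaded]


# Hypothetical load history: May 3 and May 5 never arrived.
loaded = [date(2026, 5, 1), date(2026, 5, 2), date(2026, 5, 4)]
gaps = missing_load_dates(loaded, start=date(2026, 5, 1), end=date(2026, 5, 5))

if gaps:
    print("Missing loads for:", ", ".join(d.isoformat() for d in gaps))
```

In an orchestrator, a check like this would run as its own task and trigger an alert, so a gap is noticed before someone asks the AI about last week's numbers.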

Access controls. AI analytics tools are often connected to your full warehouse, which means anyone using the tool can ask questions about any data, including sensitive tables like salary data, PII, or unpublished financials. Before connecting an AI tool to your warehouse, set up proper access controls: read-only service accounts, column-level security where needed, and row-level security if you're serving multiple business units.
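To show the idea of read-only access plus column-level masking, here is a small sketch using SQLite's authorizer hook from Python's `sqlite3` module. This is only a stand-in: in BigQuery, Snowflake, or Redshift you would use the warehouse's own roles, grants, and masking policies instead. The `employees` table and the list of sensitive columns are hypothetical.

```python
import sqlite3

# Hypothetical list of (table, column) pairs the AI tool must never see.
SENSITIVE_COLUMNS = {("employees", "salary")}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow plain SELECTs, mask sensitive columns, deny everything else."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        # For reads, arg1 is the table name and arg2 is the column name.
        if (arg1, arg2) in SENSITIVE_COLUMNS:
            return sqlite3.SQLITE_IGNORE  # column is returned as NULL
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY  # writes, schema changes, functions, etc.


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.execute("INSERT INTO employees VALUES ('Ada', 150000.0)")
conn.commit()

# From here on, the connection behaves like a restricted service account.
conn.set_authorizer(read_only_authorizer)

rows = conn.execute("SELECT name, salary FROM employees").fetchall()
print(rows)  # salary comes back masked

try:
    conn.execute("DELETE FROM employees")
    write_blocked = False
except sqlite3.DatabaseError:
    write_blocked = True
print("write blocked:", write_blocked)
```

This authorizer is deliberately strict and also blocks SQL functions such as `COUNT`, so it is a demonstration of the principle, not something to deploy. The takeaway is that restrictions belong on the connection the AI tool uses, not in the prompt.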

A Note on Which Tools to Trust with Your Data

A few tools worth knowing: Metabase (with AI features) for teams already in that ecosystem, ThoughtSpot for natural language search across a warehouse, Atlan and Select Star for data cataloging with AI search, and Snowflake Cortex or BigQuery ML if you want native AI capabilities inside your warehouse.

One thing to be deliberate about: which tools can access your business data. If you're using a consumer AI assistant to query your data, read the privacy policy carefully. Your business data — customer records, revenue figures, product usage — should not be used to train external models. Look for providers that offer business data agreements, or use tools that keep queries within your own infrastructure. When in doubt, ask your legal or compliance team before connecting anything.

The Sequence That Actually Works

The companies getting real value from AI analytics right now are not the ones who bought the most impressive AI tool. They're the ones who built solid data infrastructure first and then layered AI tools on top of it.

The sequence is:

1. Centralize your data in a warehouse.
2. Model and document your data with dbt.
3. Orchestrate and monitor your pipelines.
4. Connect BI tools to your modelled layer.
5. Add AI analytics on top of clean, trusted data.

The first four steps are the foundation. The fifth is where the AI pays off.

If you're at step one or two, starting with an AI analytics tool is premature. Build the foundation. The AI will still be there when you're ready, and it'll actually work.

Want to Build the Foundation for AI Analytics?

If you're not sure where your data infrastructure stands, or if you've tried AI analytics tools and the results aren't trustworthy, the issue is almost certainly in the layers underneath.
