Fractional Data Engineer
← All Posts

May 19, 2026

How to Set Up dbt for the First Time (For Small Teams)

dbt · data transformation · data engineering · BigQuery · Snowflake

dbt (data build tool) has become the standard for data transformation in modern data stacks. If you have a data warehouse and you're writing SQL to clean and model your data, dbt is the layer that makes that SQL testable, documented, version-controlled, and reusable.

This is a practical guide for small teams setting it up for the first time.

What dbt Actually Does

Before touching any setup, it's worth being clear on what dbt is and isn't.

dbt is a transformation tool. It takes raw data that already exists in your warehouse and transforms it into clean, modeled tables your analysts and dashboards can query. It does not move data into your warehouse. That's the job of a connector tool like Fivetran or Airbyte. dbt starts after the data lands.

What dbt adds on top of plain SQL: it runs your models in the right order based on their dependencies, it lets you write tests to check that your data meets certain conditions (no nulls in a primary key, revenue is never negative), and it generates documentation automatically from your code. These things sound small. They're not. They're the difference between a SQL folder that nobody maintains and a data layer that actually holds up.

dbt Core vs. dbt Cloud

Before you set anything up, you need to decide which version you're using.

dbt Core is the open-source command-line version. Free, runs locally or on your own infrastructure, requires more setup. Good choice if you have engineering resources and want full control.

dbt Cloud is the hosted version with a web-based IDE, managed scheduling, and a documentation portal. It has a free tier that covers most small team needs. Good choice if you want to get started quickly without infrastructure overhead.

For most small teams starting out, dbt Cloud's free tier is the right call. You can always migrate to Core later if you need more control.

Connecting to Your Warehouse

dbt connects to your data warehouse. The setup differs slightly depending on which one you're using, but the pattern is the same: you give dbt the credentials to connect, and it runs your SQL against your warehouse.

For BigQuery, you'll authenticate via a service account with the right permissions. For Snowflake, you'll use a username and password or key pair authentication. For Redshift, standard PostgreSQL credentials. dbt's documentation for each warehouse is thorough and worth following step by step rather than improvising.

One thing to get right early: use a separate service account or role for dbt with read access to your raw data and write access to the schemas where dbt will create its models. Don't connect dbt as a superuser. It doesn't need those permissions and it's good practice to limit access.

Project Structure That Works for Small Teams

When you initialize a dbt project, you get a default folder structure. The key folders are:

models/
  staging/
  marts/
tests/
macros/
dbt_project.yml

Staging models are the first layer. One staging model per source table. Their job is to rename columns to consistent naming conventions, cast data types correctly, and do light cleaning. Nothing complex. These are the clean raw inputs that everything else builds on.

Mart models are the business layer. These are the tables your dashboards and analysts actually query: customers, orders, revenue, sessions. They join and aggregate staging models into something that reflects how your business actually works.

Keep this separation clean from the start. Staging models should not be queried directly by dashboards. Mart models should not read directly from raw source tables. The layering matters because it makes things maintainable.

Writing Your First Model

A dbt model is just a SQL file. Create a file in your models folder, write a SELECT statement, and dbt handles creating the table or view in your warehouse.

-- models/staging/stg_stripe__payments.sql

select
    id as payment_id,
    customer_id,
    amount / 100.0 as amount_usd,  -- Stripe stores amounts in cents
    currency,
    status,
    created as created_at
from {{ source('stripe', 'payment_intents') }}

The {{ source() }} reference is how dbt knows where to read from. You define sources separately in a YAML file, which also lets dbt track lineage and test source data.

Run dbt run and dbt builds the model in your warehouse. Run dbt test and it checks any tests you've defined. Run dbt docs generate && dbt docs serve and you get a documentation site automatically built from your code.

Tests Worth Setting Up from Day One

dbt has built-in generic tests you can apply to any column without writing any code. The ones that matter most to start:

not_null: checks that a column has no null values. Apply this to primary keys and any column that should always be populated.

unique: checks that every value in a column is distinct. Apply this to primary keys.

accepted_values: checks that a column only contains values from a defined list. Useful for status columns where you want to catch unexpected values from your source.

relationships: checks that a foreign key in one model exists in another. Useful for catching data integrity issues before they reach dashboards.

These four tests catch the most common data quality issues. Set them up on your mart models before you connect dashboards to them.

The Most Common Mistakes

Skipping the staging layer. It's tempting to go straight to building mart models. Don't. Without a clean staging layer, your mart models become messy and hard to maintain as your source data changes.

Building business logic into BI tools instead of dbt. If a metric calculation exists in Metabase or Tableau but not in a dbt model, it's invisible, untestable, and fragile. Anything that defines how a metric is calculated belongs in dbt.

Not writing tests. Tests are not optional overhead. They're the thing that tells you something broke before a decision gets made on wrong data.

Inconsistent naming. Pick a convention early: snake_case, consistent prefixes for IDs, consistent suffixes for timestamps. Enforce it. A model called customer_data_v2_FINAL is a sign that naming wasn't treated as important. That compounds over time.

When You Need Engineering Help

dbt is designed to be approachable for analysts. The basic setup is genuinely manageable without deep engineering expertise. But there are points where engineering knowledge matters: designing the right data model for your business, setting up orchestration so dbt runs on a schedule and alerts you on failure, handling complex incremental models for large tables, and connecting dbt's output to downstream tools correctly.

If you want to get started yourself, the foundation above will take you a long way. If you want it done right from day one, with the architecture designed for how your business will look in 12 months rather than today, that's where we come in.

Not sure where to start with your data stack?

Get your personalized data roadmap in under 5 minutes.

Get Your Free Data Roadmap →