# Data Quality in dbt: Built-in Tests, Custom Tests, and Elementary
A practical guide to data quality testing in dbt — from built-in tests and custom macros to dbt-expectations and Elementary for data observability.
Your dbt models can be perfectly written SQL and still produce garbage results. A source table silently drops a column. A vendor changes their API response format. A batch job double-loads yesterday's data. These aren't transformation bugs — they're data quality failures, and they're the most common reason dashboards show wrong numbers in production.
dbt has a testing framework built in. Most teams use it lightly — a few not_null and unique tests sprinkled across their project. But dbt's testing capabilities go much deeper than that, and when combined with packages like dbt-expectations and tools like Elementary, you get a data quality system that catches problems before they reach your stakeholders.
This guide covers the full spectrum: dbt's built-in tests, custom generic tests, singular tests, the dbt-expectations package, and Elementary for ongoing data observability. We'll also cover how to structure your testing strategy across the medallion architecture (Bronze, Silver, Gold) and integrate tests into CI/CD.
## Why Data Quality Testing Matters
Here's the uncomfortable truth about data pipelines: they fail silently. A transformation error throws an exception and you know about it. A data quality issue — duplicate records, null IDs, stale data, values outside expected ranges — doesn't. Your pipeline runs successfully, your models build, and your dashboards update with wrong numbers.
By the time someone notices, it's usually a business user asking "why does this report look off?" That's the worst way to discover a data issue. It erodes trust, triggers fire drills, and makes stakeholders question every number your team produces.
Automated data quality tests flip this dynamic. Instead of waiting for humans to spot anomalies, your pipeline checks for known failure modes on every run. When a test fails, the pipeline stops (or alerts), and you fix the issue before it reaches downstream consumers.
The ROI of testing is asymmetric: a few hours spent writing tests saves days of debugging and weeks of rebuilding stakeholder trust.
## dbt's Built-in Tests
dbt ships with four generic tests out of the box. They're simple, fast to add, and cover the most common data quality checks.
### unique
Asserts that every value in a column is unique. This is the first line of defense against duplicate records.
```yaml
# models/staging/_stg__models.yml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
```
When this test runs, dbt executes a query like `SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1`. If any rows come back, the test fails.
### not_null
Asserts that a column contains no NULL values. Critical for primary keys, foreign keys, and any column that downstream models depend on.
```yaml
- name: order_id
  tests:
    - unique
    - not_null
- name: customer_id
  tests:
    - not_null
- name: order_date
  tests:
    - not_null
```
### accepted_values
Asserts that every value in a column belongs to a predefined list. Use this for status columns, category fields, and enums.
```yaml
- name: order_status
  tests:
    - accepted_values:
        values: ['pending', 'shipped', 'delivered', 'cancelled', 'returned']
```
If a new status value appears in the source data (maybe someone added "on_hold" without telling you), this test catches it.
### relationships
Asserts referential integrity between two models. The test checks that every value in a column exists in another model's column — essentially a foreign key constraint.
```yaml
- name: customer_id
  tests:
    - relationships:
        to: ref('stg_customers')
        field: customer_id
```
This catches orphaned records — orders that reference customers who don't exist in the customers table. Common when source systems have eventual consistency or when data lands out of order.
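Under the hood, the compiled query is essentially a left join that looks for child values with no matching parent. Simplified, it looks something like:

```sql
select child.customer_id
from {{ ref('stg_orders') }} as child
left join {{ ref('stg_customers') }} as parent
    on child.customer_id = parent.customer_id
where child.customer_id is not null
  and parent.customer_id is null
```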
### Running Tests

Tests run as part of `dbt build` (which runs models and tests together) or independently with `dbt test`:

```bash
# Run all tests
dbt test

# Run tests for a specific model
dbt test --select stg_orders

# Run only tests tagged as "critical"
dbt test --select tag:critical
```
## Custom Generic Tests
The four built-in tests cover the basics, but production data quality requires more. dbt lets you write custom generic tests — reusable test macros that you can apply to any model and column via YAML.
### Writing a Custom Generic Test

Custom generic tests are Jinja macros that live in `tests/generic/` (or `macros/`). They receive a `model` and a `column_name` and return a SQL query. If the query returns rows, the test fails.
Here's a test that asserts a numeric column is always positive:
```sql
-- tests/generic/test_is_positive.sql
{% test is_positive(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < 0

{% endtest %}
```
Use it in YAML:
```yaml
models:
  - name: fct_orders
    columns:
      - name: order_total
        tests:
          - is_positive
```
### A More Complex Example: Percentage Range

A parameterized test that checks that a column's values fall within a configurable range:

```sql
-- tests/generic/test_in_range.sql
{% test in_range(model, column_name, min_value=0, max_value=100) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}

{% endtest %}
```
```yaml
- name: discount_percentage
  tests:
    - in_range:
        min_value: 0
        max_value: 100
```
### Freshness Test

A custom test that checks whether a timestamp column has recent data:

```sql
-- tests/generic/test_is_fresh.sql
{% test is_fresh(model, column_name, max_hours=24) %}

select 1
where (
    select max({{ column_name }})
    from {{ model }}
) < {{ dbt.dateadd('hour', -max_hours, dbt.current_timestamp()) }}

{% endtest %}
```
```yaml
models:
  - name: stg_orders
    columns:
      - name: ordered_at
        tests:
          - is_fresh:
              max_hours: 12
```
Custom generic tests are the backbone of a mature testing strategy. Build a library of 5-10 tests that cover your domain's common failure modes and reuse them across your project.
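As one more example of the kind of test that belongs in such a library, here is a check that timestamps never land in the future (a hypothetical test for illustration, not one that ships with dbt):

```sql
-- tests/generic/test_not_in_future.sql
{% test not_in_future(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} > {{ dbt.current_timestamp() }}

{% endtest %}
```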
## Custom Singular Tests

Singular tests are one-off SQL queries that test a specific business rule. They live in `tests/` as standalone `.sql` files. If the query returns any rows, the test fails.
### Example: Revenue Sanity Check

```sql
-- tests/assert_daily_revenue_reasonable.sql
-- Flag any day where revenue is more than 3x the trailing 7-day average.
-- This catches double-loads, currency conversion errors, and other anomalies.

with daily_revenue as (
    select
        order_date,
        sum(order_total) as revenue
    from {{ ref('fct_orders') }}
    where order_date >= current_date - interval '30 days'
    group by order_date
),

with_avg as (
    select
        order_date,
        revenue,
        avg(revenue) over (
            order by order_date
            rows between 7 preceding and 1 preceding
        ) as trailing_7d_avg
    from daily_revenue
)

select *
from with_avg
where revenue > trailing_7d_avg * 3
  and trailing_7d_avg > 0
  and order_date = current_date - interval '1 day'
```
### Example: Cross-Model Consistency

```sql
-- tests/assert_order_count_matches.sql
-- Verify that the count of orders in the mart matches
-- the count in the staging model (no records lost in transformation).

with staging_count as (
    select count(*) as cnt from {{ ref('stg_orders') }}
),

mart_count as (
    select count(*) as cnt from {{ ref('fct_orders') }}
)

select *
from staging_count
cross join mart_count
where staging_count.cnt != mart_count.cnt
```
Singular tests are perfect for complex business logic that doesn't generalize into a reusable pattern. If you find yourself writing the same singular test for multiple models, refactor it into a custom generic test.
## dbt-expectations: Great Expectations for dbt
The dbt-expectations package brings Great Expectations-style tests to dbt. It adds 50+ generic tests that cover statistical distributions, string patterns, aggregate comparisons, and more.
### Installation

Add to your `packages.yml`:

```yaml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

Then run `dbt deps`.
### Key Tests
Here are the most useful tests from the package:
Row count validation:
```yaml
models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 10000000
```
Column value distributions:
```yaml
columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
          mostly: 0.99  # Allow 1% of rows to be outside range
```
String pattern matching:
```yaml
- name: email
  tests:
    - dbt_expectations.expect_column_values_to_match_regex:
        regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
```
Aggregate comparisons:
```yaml
tests:
  - dbt_expectations.expect_table_row_count_to_equal_other_table:
      compare_model: ref("stg_orders")
```
Column completeness:
```yaml
- name: phone_number
  tests:
    - dbt_expectations.expect_column_values_to_not_be_null:
        mostly: 0.95  # At least 95% of rows should have a phone number
```
The `mostly` parameter is incredibly useful. Real data is messy — you rarely want a test that fails on a single NULL or outlier. Setting `mostly: 0.99` means the test passes if 99% of rows meet the condition, which is usually the right threshold for production data.
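Conceptually, a `mostly` threshold compares the passing-row fraction against the cutoff. A hand-written equivalent of the `order_total` range check above (a sketch of the idea, not the package's actual compiled SQL) looks like:

```sql
-- Fails (returns a row) only when fewer than 99% of rows pass the condition
select
    count(*) as total_rows,
    sum(case when order_total between 0 and 100000 then 1 else 0 end) as passing_rows
from {{ ref('fct_orders') }}
having sum(case when order_total between 0 and 100000 then 1 else 0 end)
       < count(*) * 0.99
```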
## Elementary: Data Observability for dbt
Elementary takes data quality a step further. While dbt tests check rules you define, Elementary automatically monitors your data for anomalies — schema changes, volume shifts, freshness degradation, and statistical outliers — without you writing explicit tests for every possible failure mode.
### What Elementary Does
Elementary adds three capabilities to your dbt project:
- Anomaly detection tests: Automatically detect unusual patterns in row counts, column statistics, and freshness based on historical baselines
- Schema change monitoring: Alert when columns are added, removed, or change type
- Data observability dashboard: A self-hosted UI that shows test results, model run history, and data quality trends over time
### Installation

Add Elementary to your `packages.yml`:

```yaml
packages:
  - package: elementary-data/elementary
    version: [">=0.16.0", "<0.17.0"]
```

Run `dbt deps`, then `dbt run --select elementary` to create Elementary's internal tables.
Also install the Elementary CLI for the dashboard and alerting:

```bash
pip install elementary-data
```
### Anomaly Detection Tests
Elementary's anomaly detection tests compare current metrics against historical baselines. If today's value deviates significantly from the pattern of the last N days, the test fails.
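The core mechanism resembles a z-score check against a trailing window. As a rough sketch of the idea (Elementary's real implementation is more sophisticated and stores its baselines in dedicated tables), assuming a `fct_orders` model with an `ordered_at` column:

```sql
with daily_counts as (
    select date_trunc('day', ordered_at) as day, count(*) as row_count
    from {{ ref('fct_orders') }}
    group by 1
),

baseline as (
    -- training window: the 14 days before today
    select avg(row_count) as avg_count, stddev(row_count) as std_count
    from daily_counts
    where day >= current_date - interval '14 days'
      and day < current_date
)

-- flag today if it deviates more than 3 standard deviations from the baseline
select d.day, d.row_count
from daily_counts as d
cross join baseline as b
where d.day = current_date
  and abs(d.row_count - b.avg_count) > 3 * b.std_count
```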
Volume anomaly:
```yaml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
          training_period:
            period: day
            count: 14
          detection_period:
            period: day
            count: 1
```
This test builds a baseline from the last 14 days of row counts and flags if today's volume is anomalous. If your pipeline usually processes 50,000 orders per day and today you see 500, Elementary catches it — even though you never wrote a test that says "orders should be more than 10,000."
Column-level anomalies:
```yaml
- name: order_total
  tests:
    - elementary.column_anomalies:
        timestamp_column: ordered_at
        column_anomalies:
          - zero_count
          - null_count
          - average
```
This monitors the null count, zero count, and average of order_total over time. If the average suddenly drops by 80% (maybe a currency conversion broke), Elementary flags it.
Freshness anomaly:
```yaml
tests:
  - elementary.freshness_anomalies:
      timestamp_column: ordered_at
```
Monitors when the most recent record was loaded. If data usually arrives by 6 AM and it's 10 AM with no new records, the test fails.
### Schema Change Monitoring

```yaml
models:
  - name: stg_orders
    tests:
      - elementary.schema_changes
```
This test tracks the schema of your model over time. If a column is added, removed, or changes type between runs, the test fails. Essential for catching upstream schema drift — especially from third-party data sources that change without notice.
### The Elementary Dashboard

After your dbt runs, generate the observability report:

```bash
edr report
```
This creates a self-hosted HTML dashboard showing:
- Test results across all models
- Historical trends for anomaly detection metrics
- Model run durations and statuses
- Schema change history
For alerting, Elementary integrates with Slack:

```bash
edr monitor --slack-webhook https://hooks.slack.com/services/...
```
## Testing Strategy Per Medallion Layer
Not all tests belong everywhere. Here's how to distribute testing effort across your Bronze, Silver, and Gold layers:
### Bronze (Staging): Freshness and Schema
Bronze models are the first line of defense. Test that data is arriving and that the schema hasn't changed.
| Test Type | What It Catches | Example |
|---|---|---|
| Source freshness | Data stopped arriving | `freshness: warn_after: 12 hours` |
| Schema changes | Upstream column added/removed | `elementary.schema_changes` |
| `not_null` on PKs | Source dropping primary keys | `not_null` on `order_id` |
| `unique` on PKs | Source sending duplicates | `unique` on `order_id` |
| Volume anomaly | Partial or double loads | `elementary.volume_anomalies` |
```yaml
# models/staging/_stg__models.yml
models:
  - name: stg_orders
    tests:
      - elementary.schema_changes
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```
### Silver (Intermediate): Uniqueness and Data Quality
Silver models handle cleaning and deduplication. Test that the cleaning actually worked.
| Test Type | What It Catches | Example |
|---|---|---|
| `unique` after dedup | Deduplication logic broken | `unique` on business key |
| `not_null` after joins | Broken joins producing NULLs | `not_null` on enriched columns |
| Referential integrity | Orphaned records after joins | `relationships` test |
| Value ranges | Invalid transformations | `is_positive` on amounts |
```yaml
# models/intermediate/_int__models.yml
models:
  - name: int_orders__enriched
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_total
        tests:
          - not_null
          - is_positive
      - name: customer_id
        tests:
          - relationships:
              to: ref('int_customers__deduplicated')
              field: customer_id
```
### Gold (Marts): Business Logic and Aggregates
Gold models serve stakeholders. Test that business rules hold and aggregates make sense.
| Test Type | What It Catches | Example |
|---|---|---|
| `accepted_values` | Invalid business states | Status columns |
| Aggregate anomalies | Revenue spikes/drops | `elementary.column_anomalies` on sums |
| Row count comparison | Records lost in transformation | Singular test comparing counts |
| Business rule tests | Logic errors | Custom singular tests |
```yaml
# models/marts/_marts__models.yml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_state
        tests:
          - accepted_values:
              values: ['paid', 'refunded', 'cancelled', 'pending']
      - name: order_total
        tests:
          - elementary.column_anomalies:
              timestamp_column: ordered_at
              column_anomalies:
                - average
                - zero_count
```
## CI/CD Integration: Testing on Pull Requests

Running tests in CI catches data quality regressions before they hit production. The most effective pattern: run `dbt build` (which includes tests) on every PR that modifies dbt models.
### GitHub Actions Example

```yaml
# .github/workflows/dbt-ci.yml
name: dbt CI

on:
  pull_request:
    paths:
      - 'models/**'
      - 'tests/**'
      - 'macros/**'
      - 'dbt_project.yml'

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dbt
        run: pip install dbt-core dbt-bigquery

      - name: Install packages
        run: dbt deps

      - name: Run dbt build
        run: dbt build --target ci --profiles-dir ./ci_profiles
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
```
This runs all models and tests against a CI-specific schema (so you don't pollute production). If any test fails, the PR is blocked.
### What to Test in CI vs. Production
CI (on every PR):
- Schema tests (`unique`, `not_null`, `accepted_values`, `relationships`)
- Custom generic tests
- Singular tests that don't depend on production data volumes
Production (on every scheduled run):
- All of the above, plus:
- Elementary anomaly detection tests (these need historical baselines from production data)
- Source freshness checks
- Volume and aggregate anomaly tests
Anomaly detection tests don't make sense in CI because CI runs against a small dataset without historical context. Run those only in production.
## Wrapping Up
Data quality testing in dbt isn't optional — it's the difference between a pipeline your team trusts and one that generates fire drills. Start with the basics (`unique` and `not_null` on every primary key), add domain-specific tests as you discover failure modes, and graduate to Elementary when you want automated anomaly detection.
The testing pyramid for dbt looks like this:
- Base: Built-in tests on every model (`unique`, `not_null`, `accepted_values`)
- Middle: Custom generic tests for your domain's common patterns (freshness, ranges, percentages)
- Top: Elementary anomaly detection for catching the unknown unknowns
The cost of adding a test is minutes. The cost of a data quality incident reaching stakeholders is days. Invest in testing early and compound those returns over the lifetime of your pipeline.
ModelDock runs your dbt tests on every scheduled build. Connect your Git repo, set a schedule, and ModelDock handles the orchestration — models, tests, and artifacts all in one place. If a test fails, you know about it before your stakeholders do. Try it free.