dbt-core · production · tutorial

Data Quality in dbt: Built-in Tests, Custom Tests, and Elementary

A practical guide to data quality testing in dbt — from built-in tests and custom macros to dbt-expectations and Elementary for data observability.

ModelDock Team · February 17, 2026 · 13 min read

Your dbt models can be perfectly written SQL and still produce garbage results. A source table silently drops a column. A vendor changes their API response format. A batch job double-loads yesterday's data. These aren't transformation bugs — they're data quality failures, and they're the most common reason dashboards show wrong numbers in production.

dbt has a testing framework built in. Most teams use it lightly — a few not_null and unique tests sprinkled across their project. But dbt's testing capabilities go much deeper than that, and when combined with packages like dbt-expectations and tools like Elementary, you get a data quality system that catches problems before they reach your stakeholders.

This guide covers the full spectrum: dbt's built-in tests, custom generic tests, singular tests, the dbt-expectations package, and Elementary for ongoing data observability. We'll also cover how to structure your testing strategy across the medallion architecture (Bronze, Silver, Gold) and integrate tests into CI/CD.

Why Data Quality Testing Matters

Here's the uncomfortable truth about data pipelines: they fail silently. A transformation error throws an exception and you know about it. A data quality issue — duplicate records, null IDs, stale data, values outside expected ranges — doesn't. Your pipeline runs successfully, your models build, and your dashboards update with wrong numbers.

By the time someone notices, it's usually a business user asking "why does this report look off?" That's the worst way to discover a data issue. It erodes trust, triggers fire drills, and makes stakeholders question every number your team produces.

Automated data quality tests flip this dynamic. Instead of waiting for humans to spot anomalies, your pipeline checks for known failure modes on every run. When a test fails, the pipeline stops (or alerts), and you fix the issue before it reaches downstream consumers.

The ROI of testing is asymmetric: a few hours spent writing tests saves days of debugging and weeks of rebuilding stakeholder trust.

dbt's Built-in Tests

dbt ships with four generic tests out of the box. They're simple, fast to add, and cover the most common data quality checks.

unique

Asserts that every value in a column is unique. This is the first line of defense against duplicate records.

# models/staging/_stg__models.yml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique

When this test runs, dbt executes a query like SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1. If any rows come back, the test fails.
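To make the convention concrete — a dbt test is just a query whose returned rows are failures, and zero rows means pass — here's an illustrative Python sketch of the same uniqueness logic (not dbt's actual implementation):

```python
from collections import Counter

def unique_test(values):
    """Return values that violate uniqueness, mirroring dbt's `unique` test
    (GROUP BY ... HAVING COUNT(*) > 1). An empty result means the test passes."""
    counts = Counter(values)
    return [v for v, n in counts.items() if n > 1]

order_ids = [1001, 1002, 1003, 1002]  # 1002 is duplicated
failures = unique_test(order_ids)
print(failures)       # -> [1002]
print(len(failures))  # dbt reports this as the number of failing records
```

The same convention holds for every test in this guide: the test body describes bad rows, and any bad row found is a failure.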

not_null

Asserts that a column contains no NULL values. Critical for primary keys, foreign keys, and any column that downstream models depend on.

- name: order_id
  tests:
    - unique
    - not_null
- name: customer_id
  tests:
    - not_null
- name: order_date
  tests:
    - not_null

accepted_values

Asserts that every value in a column belongs to a predefined list. Use this for status columns, category fields, and enums.

- name: order_status
  tests:
    - accepted_values:
        values: ['pending', 'shipped', 'delivered', 'cancelled', 'returned']

If a new status value appears in the source data (maybe someone added "on_hold" without telling you), this test catches it.

relationships

Asserts referential integrity between two models. The test checks that every value in a column exists in another model's column — essentially a foreign key constraint.

- name: customer_id
  tests:
    - relationships:
        to: ref('stg_customers')
        field: customer_id

This catches orphaned records — orders that reference customers who don't exist in the customers table. Common when source systems have eventual consistency or when data lands out of order.

Running Tests

Tests run as part of dbt build (which runs models + tests) or independently with dbt test:

# Run all tests
dbt test
# Run tests for a specific model
dbt test --select stg_orders
# Run only tests tagged as "critical"
dbt test --select tag:critical

Custom Generic Tests

The four built-in tests cover the basics, but production data quality requires more. dbt lets you write custom generic tests — reusable test macros that you can apply to any model and column via YAML.

Writing a Custom Generic Test

Custom generic tests are Jinja macros that live in tests/generic/ (or macros/). They receive a model and a column_name and return a SQL query. If the query returns rows, the test fails.

Here's a test that asserts a numeric column is always positive:

-- tests/generic/test_is_positive.sql
{% test is_positive(model, column_name) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} < 0
{% endtest %}

Use it in YAML:

models:
  - name: fct_orders
    columns:
      - name: order_total
        tests:
          - is_positive

A More Complex Example: Percentage Range

A test that checks a column falls within a percentage range:

-- tests/generic/test_in_range.sql
{% test in_range(model, column_name, min_value=0, max_value=100) %}
select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}
{% endtest %}

Applied in YAML:

- name: discount_percentage
  tests:
    - in_range:
        min_value: 0
        max_value: 100

Freshness Test

A custom test that checks if a timestamp column has recent data:

-- tests/generic/test_is_fresh.sql
{% test is_fresh(model, column_name, max_hours=24) %}
select 1
where (
    select max({{ column_name }})
    from {{ model }}
) < {{ dbt.dateadd('hour', -max_hours, dbt.current_timestamp()) }}
{% endtest %}

Applied in YAML:

models:
  - name: stg_orders
    columns:
      - name: ordered_at
        tests:
          - is_fresh:
              max_hours: 12

Custom generic tests are the backbone of a mature testing strategy. Build a library of 5-10 tests that cover your domain's common failure modes and reuse them across your project.

Custom Singular Tests

Singular tests are one-off SQL queries that test a specific business rule. They live in tests/ as standalone .sql files. If the query returns any rows, the test fails.

Example: Revenue Sanity Check

-- tests/assert_daily_revenue_reasonable.sql
-- Flag any day where revenue is more than 3x the trailing 7-day average.
-- This catches double-loads, currency conversion errors, and other anomalies.
with daily_revenue as (
    select
        order_date,
        sum(order_total) as revenue
    from {{ ref('fct_orders') }}
    where order_date >= current_date - interval '30 days'
    group by order_date
),

with_avg as (
    select
        order_date,
        revenue,
        avg(revenue) over (
            order by order_date
            rows between 7 preceding and 1 preceding
        ) as trailing_7d_avg
    from daily_revenue
)

select *
from with_avg
where revenue > trailing_7d_avg * 3
  and trailing_7d_avg > 0
  and order_date = current_date - interval '1 day'

Example: Cross-Model Consistency

-- tests/assert_order_count_matches.sql
-- Verify that the count of orders in the mart matches
-- the count in the staging model (no records lost in transformation).
with staging_count as (
    select count(*) as cnt from {{ ref('stg_orders') }}
),

mart_count as (
    select count(*) as cnt from {{ ref('fct_orders') }}
)

select *
from staging_count
cross join mart_count
where staging_count.cnt != mart_count.cnt

Singular tests are perfect for complex business logic that doesn't generalize into a reusable pattern. If you find yourself writing the same singular test for multiple models, refactor it into a custom generic test.

dbt-expectations: Great Expectations for dbt

The dbt-expectations package brings Great Expectations-style tests to dbt. It adds 50+ generic tests that cover statistical distributions, string patterns, aggregate comparisons, and more.

Installation

Add to your packages.yml:

packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]

Then run dbt deps.

Key Tests

Here are the most useful tests from the package:

Row count validation:

models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 10000000

Column value distributions:

columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
          mostly: 0.99  # Allow 1% of rows to be outside the range

String pattern matching:

- name: email
  tests:
    - dbt_expectations.expect_column_values_to_match_regex:
        regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"

Aggregate comparisons:

tests:
  - dbt_expectations.expect_table_row_count_to_equal_other_table:
      compare_model: ref("stg_orders")

Column completeness:

- name: phone_number
  tests:
    - dbt_expectations.expect_column_values_to_not_be_null:
        mostly: 0.95  # At least 95% of rows should have a phone number

The mostly parameter is incredibly useful. Real data is messy — you rarely want a test that fails on a single NULL or outlier. Setting mostly: 0.99 means the test passes if 99% of rows meet the condition, which is usually the right threshold for production data.
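The semantics boil down to a ratio check. Here's a simplified Python sketch of the idea (illustrative only, not dbt-expectations' actual code):

```python
def passes_mostly(values, predicate, mostly=0.99):
    """Pass if at least `mostly` fraction of rows satisfy the predicate,
    mirroring the `mostly` semantics in dbt-expectations (simplified)."""
    if not values:
        return True
    ok = sum(1 for v in values if predicate(v))
    return ok / len(values) >= mostly

# 990 in-range rows plus 10 outliers: passes at mostly=0.99, fails at 1.0
totals = [50.0] * 990 + [250000.0] * 10
in_range = lambda v: 0 <= v <= 100000
print(passes_mostly(totals, in_range, mostly=0.99))  # True
print(passes_mostly(totals, in_range, mostly=1.0))   # False
```

The same ten outliers that would be noise at `mostly: 0.99` become a hard failure at `mostly: 1.0` — choose the threshold based on how dirty your source data legitimately is.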

Elementary: Data Observability for dbt

Elementary takes data quality a step further. While dbt tests check rules you define, Elementary automatically monitors your data for anomalies — schema changes, volume shifts, freshness degradation, and statistical outliers — without you writing explicit tests for every possible failure mode.

What Elementary Does

Elementary adds three capabilities to your dbt project:

  1. Anomaly detection tests: Automatically detect unusual patterns in row counts, column statistics, and freshness based on historical baselines
  2. Schema change monitoring: Alert when columns are added, removed, or change type
  3. Data observability dashboard: A self-hosted UI that shows test results, model run history, and data quality trends over time

Installation

Add Elementary to your packages.yml:

packages:
  - package: elementary-data/elementary
    version: [">=0.16.0", "<0.17.0"]

Run dbt deps, then dbt run --select elementary to create Elementary's internal tables.

Also install the Elementary CLI for the dashboard and alerting:

pip install elementary-data

Anomaly Detection Tests

Elementary's anomaly detection tests compare current metrics against historical baselines. If today's value deviates significantly from the pattern of the last N days, the test fails.

Volume anomaly:

models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
          training_period:
            period: day
            count: 14
          detection_period:
            period: day
            count: 1

This test builds a baseline from the last 14 days of row counts and flags if today's volume is anomalous. If your pipeline usually processes 50,000 orders per day and today you see 500, Elementary catches it — even though you never wrote a test that says "orders should be more than 10,000."
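Under the hood this is baseline-plus-deviation logic. A deliberately simplified Python sketch of the idea — Elementary's real implementation uses its own statistical models, so treat this as conceptual only:

```python
import statistics

def volume_is_anomalous(history, today, z_threshold=3.0):
    """Flag today's row count if it deviates from the training window
    by more than `z_threshold` standard deviations (a toy z-score check,
    not Elementary's actual algorithm)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

daily_counts = [50_200, 49_800, 50_500, 49_900, 50_100, 50_300, 49_700]
print(volume_is_anomalous(daily_counts, 50_000))  # False: within baseline
print(volume_is_anomalous(daily_counts, 500))     # True: ~100x drop
```

The key point is that no one hard-coded "orders should exceed 10,000" — the threshold is derived from the data's own history.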

Column-level anomalies:

- name: order_total
  tests:
    - elementary.column_anomalies:
        timestamp_column: ordered_at
        column_anomalies:
          - zero_count
          - null_count
          - average

This monitors the null count, zero count, and average of order_total over time. If the average suddenly drops by 80% (maybe a currency conversion broke), Elementary flags it.

Freshness anomaly:

tests:
  - elementary.freshness_anomalies:
      timestamp_column: ordered_at

Monitors when the most recent record was loaded. If data usually arrives by 6 AM and it's 10 AM with no new records, the test fails.
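The underlying check is simple: compare the newest record's timestamp to an expected arrival lag. A minimal Python sketch — here the threshold is hard-coded, whereas Elementary learns the expected lag from historical arrival patterns:

```python
from datetime import datetime, timedelta, timezone

def is_stale(latest_record_at, max_lag_hours=4):
    """Fail if the newest record is older than the expected arrival lag.
    Illustrative only; the threshold would normally come from history."""
    lag = datetime.now(timezone.utc) - latest_record_at
    return lag > timedelta(hours=max_lag_hours)

now = datetime.now(timezone.utc)
print(is_stale(now - timedelta(hours=1)))   # False: data arrived recently
print(is_stale(now - timedelta(hours=10)))  # True: well past expected lag
```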

Schema Change Monitoring

models:
  - name: stg_orders
    tests:
      - elementary.schema_changes

This test tracks the schema of your model over time. If a column is added, removed, or changes type between runs, the test fails. Essential for catching upstream schema drift — especially from third-party data sources that change without notice.
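Conceptually, the test diffs two schema snapshots taken on consecutive runs. An illustrative Python sketch (not Elementary's actual code):

```python
def schema_changes(previous, current):
    """Compare two {column: type} snapshots and report drift — the kind
    of check a schema-change test performs between runs (illustrative)."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    retyped = {c for c in set(previous) & set(current)
               if previous[c] != current[c]}
    return {"added": added, "removed": removed, "type_changed": retyped}

before = {"order_id": "int", "status": "varchar", "total": "numeric"}
after = {"order_id": "int", "status": "int", "discount": "numeric"}
print(schema_changes(before, after))
```

Running this on the example snapshots reports `discount` as added, `total` as removed, and `status` as type-changed — exactly the three kinds of drift the test alerts on.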

The Elementary Dashboard

After your dbt runs, generate the observability report:

edr report

This creates a self-hosted HTML dashboard showing:

  • Test results across all models
  • Historical trends for anomaly detection metrics
  • Model run durations and statuses
  • Schema change history

For alerting, Elementary integrates with Slack:

edr monitor --slack-webhook https://hooks.slack.com/services/...

Testing Strategy Per Medallion Layer

Not all tests belong everywhere. Here's how to distribute testing effort across your Bronze, Silver, and Gold layers:

Bronze (Staging): Freshness and Schema

Bronze models are the first line of defense. Test that data is arriving and that the schema hasn't changed.

| Test Type | What It Catches | Example |
| --- | --- | --- |
| Source freshness | Data stopped arriving | freshness: warn_after: 12 hours |
| Schema changes | Upstream column added/removed | elementary.schema_changes |
| not_null on PKs | Source dropping primary keys | not_null on order_id |
| unique on PKs | Source sending duplicates | unique on order_id |
| Volume anomaly | Partial or double loads | elementary.volume_anomalies |

# models/staging/_stg__models.yml
models:
  - name: stg_orders
    tests:
      - elementary.schema_changes
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

Silver (Intermediate): Uniqueness and Data Quality

Silver models handle cleaning and deduplication. Test that the cleaning actually worked.

| Test Type | What It Catches | Example |
| --- | --- | --- |
| unique after dedup | Deduplication logic broken | unique on business key |
| not_null after joins | Broken joins producing NULLs | not_null on enriched columns |
| Referential integrity | Orphaned records after joins | relationships test |
| Value ranges | Invalid transformations | is_positive on amounts |

# models/intermediate/_int__models.yml
models:
  - name: int_orders__enriched
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_total
        tests:
          - not_null
          - is_positive
      - name: customer_id
        tests:
          - relationships:
              to: ref('int_customers__deduplicated')
              field: customer_id

Gold (Marts): Business Logic and Aggregates

Gold models serve stakeholders. Test that business rules hold and aggregates make sense.

| Test Type | What It Catches | Example |
| --- | --- | --- |
| accepted_values | Invalid business states | Status columns |
| Aggregate anomalies | Revenue spikes/drops | elementary.column_anomalies on sum |
| Row count comparison | Records lost in transformation | Singular test comparing counts |
| Business rule tests | Logic errors | Custom singular tests |

# models/marts/_marts__models.yml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_state
        tests:
          - accepted_values:
              values: ['paid', 'refunded', 'cancelled', 'pending']
      - name: order_total
        tests:
          - elementary.column_anomalies:
              timestamp_column: ordered_at
              column_anomalies:
                - average
                - zero_count

CI/CD Integration: Testing on Pull Requests

Running tests in CI catches data quality regressions before they hit production. The most effective pattern: run dbt build (which includes tests) on every PR that modifies dbt models.

GitHub Actions Example

# .github/workflows/dbt-ci.yml
name: dbt CI

on:
  pull_request:
    paths:
      - 'models/**'
      - 'tests/**'
      - 'macros/**'
      - 'dbt_project.yml'

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dbt
        run: pip install dbt-core dbt-bigquery
      - name: Install packages
        run: dbt deps
      - name: Run dbt build
        run: dbt build --target ci --profiles-dir ./ci_profiles
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}

This runs all models and tests against a CI-specific schema (so you don't pollute production). If any test fails, the PR is blocked.
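For completeness, here's a minimal sketch of what ci_profiles/profiles.yml might contain for BigQuery. This is an assumption to match the secrets above, not a drop-in file: the profile name must match your dbt_project.yml, the dbt_ci dataset name is arbitrary, and service-account JSON auth typically needs additional key fields beyond those shown.

```yaml
# ci_profiles/profiles.yml — illustrative sketch only
default:          # must match the `profile:` in dbt_project.yml
  target: ci
  outputs:
    ci:
      type: bigquery
      method: service-account-json
      project: "{{ env_var('GCP_PROJECT_ID') }}"
      dataset: dbt_ci        # throwaway schema so CI never touches prod
      threads: 4
      keyfile_json:
        type: service_account
        project_id: "{{ env_var('GCP_PROJECT_ID') }}"
        private_key: "{{ env_var('GCP_PRIVATE_KEY') }}"
        client_email: "{{ env_var('GCP_CLIENT_EMAIL') }}"
        token_uri: https://oauth2.googleapis.com/token
```

Pair this with a scheduled cleanup that drops stale dbt_ci schemas so PR builds don't accumulate.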

What to Test in CI vs. Production

CI (on every PR):

  • Schema tests (unique, not_null, accepted_values, relationships)
  • Custom generic tests
  • Singular tests that don't depend on production data volumes

Production (on every scheduled run):

  • All of the above, plus:
  • Elementary anomaly detection tests (these need historical baselines from production data)
  • Source freshness checks
  • Volume and aggregate anomaly tests

Anomaly detection tests don't make sense in CI because CI runs against a small dataset without historical context. Run those only in production.

Wrapping Up

Data quality testing in dbt isn't optional — it's the difference between a pipeline your team trusts and one that generates fire drills. Start with the basics (unique, not_null on every primary key), add domain-specific tests as you discover failure modes, and graduate to Elementary when you want automated anomaly detection.

The testing pyramid for dbt looks like this:

  • Base: Built-in tests on every model (unique, not_null, accepted_values)
  • Middle: Custom generic tests for your domain's common patterns (freshness, ranges, percentages)
  • Top: Elementary anomaly detection for catching the unknown unknowns

The cost of adding a test is minutes. The cost of a data quality incident reaching stakeholders is days. Invest in testing early and compound those returns over the lifetime of your pipeline.


ModelDock runs your dbt tests on every scheduled build. Connect your Git repo, set a schedule, and ModelDock handles the orchestration — models, tests, and artifacts all in one place. If a test fails, you know about it before your stakeholders do. Try it free.
