# Data Quality in dbt: Built-in Tests, Custom Tests, and Elementary
A practical guide to data quality testing in dbt — from built-in tests and custom macros to dbt-expectations and Elementary for data observability.
Your dbt models can be perfectly written SQL and still produce garbage results. A source table silently drops a column. A vendor changes their API response format. A batch job double-loads yesterday's data. These aren't transformation bugs — they're data quality failures, and they're the most common reason dashboards show wrong numbers in production.
dbt has a testing framework built in. Most teams use it lightly — a few not_null and unique tests sprinkled across their project. But dbt's testing capabilities go much deeper than that, and when combined with packages like dbt-expectations and tools like Elementary, you get a data quality system that catches problems before they reach your stakeholders.
This guide covers the full spectrum: dbt's built-in tests, custom generic tests, singular tests, the dbt-expectations package, and Elementary for ongoing data observability. We'll also cover how to structure your testing strategy across the medallion architecture (Bronze, Silver, Gold) and integrate tests into CI/CD.
## Why Data Quality Testing Matters
Here's the uncomfortable truth about data pipelines: they fail silently. A transformation error throws an exception and you know about it. A data quality issue — duplicate records, null IDs, stale data, values outside expected ranges — doesn't. Your pipeline runs successfully, your models build, and your dashboards update with wrong numbers.
By the time someone notices, it's usually a business user asking "why does this report look off?" That's the worst way to discover a data issue. It erodes trust, triggers fire drills, and makes stakeholders question every number your team produces.
Automated data quality tests flip this dynamic. Instead of waiting for humans to spot anomalies, your pipeline checks for known failure modes on every run. When a test fails, the pipeline stops (or alerts), and you fix the issue before it reaches downstream consumers.
The ROI of testing is asymmetric: a few hours spent writing tests saves days of debugging and weeks of rebuilding stakeholder trust.
## dbt's Built-in Tests
dbt ships with four generic tests out of the box. They're simple, fast to add, and cover the most common data quality checks.
### unique
Asserts that every value in a column is unique. This is the first line of defense against duplicate records.
```yaml
# models/staging/_stg__models.yml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
```
When this test runs, dbt executes a query like `SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1`. If any rows come back, the test fails.
### not_null
Asserts that a column contains no NULL values. Critical for primary keys, foreign keys, and any column that downstream models depend on.
```yaml
- name: order_id
  tests:
    - unique
    - not_null
- name: customer_id
  tests:
    - not_null
- name: order_date
  tests:
    - not_null
```
### accepted_values
Asserts that every value in a column belongs to a predefined list. Use this for status columns, category fields, and enums.
```yaml
- name: order_status
  tests:
    - accepted_values:
        values: ['pending', 'shipped', 'delivered', 'cancelled', 'returned']
```
If a new status value appears in the source data (maybe someone added "on_hold" without telling you), this test catches it.
### relationships
Asserts referential integrity between two models. The test checks that every value in a column exists in another model's column — essentially a foreign key constraint.
```yaml
- name: customer_id
  tests:
    - relationships:
        to: ref('stg_customers')
        field: customer_id
```
This catches orphaned records — orders that reference customers who don't exist in the customers table. Common when source systems have eventual consistency or when data lands out of order.
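Under the hood, the compiled query is essentially a left join that looks for child values with no matching parent. Simplified, it looks something like:

```sql
select child.customer_id
from {{ ref('stg_orders') }} as child
left join {{ ref('stg_customers') }} as parent
    on child.customer_id = parent.customer_id
where child.customer_id is not null
  and parent.customer_id is null
```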
### Running Tests

Tests run as part of `dbt build` (which runs models and tests together) or independently with `dbt test`:

```bash
# Run all tests
dbt test

# Run tests for a specific model
dbt test --select stg_orders

# Run only tests tagged as "critical"
dbt test --select tag:critical
```
## Custom Generic Tests
The four built-in tests cover the basics, but production data quality requires more. dbt lets you write custom generic tests — reusable test macros that you can apply to any model and column via YAML.
### Writing a Custom Generic Test

Custom generic tests are Jinja macros that live in `tests/generic/` (or `macros/`). They receive a `model` and a `column_name` and return a SQL query. If the query returns rows, the test fails.
Here's a test that asserts a numeric column is always positive:
```sql
-- tests/generic/test_is_positive.sql
{% test is_positive(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < 0

{% endtest %}
```
Use it in YAML:
```yaml
models:
  - name: fct_orders
    columns:
      - name: order_total
        tests:
          - is_positive
```
### A More Complex Example: Percentage Range

A parameterized test that checks that a column's values fall within a configurable range:

```sql
-- tests/generic/test_in_range.sql
{% test in_range(model, column_name, min_value=0, max_value=100) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < {{ min_value }}
   or {{ column_name }} > {{ max_value }}

{% endtest %}
```
```yaml
- name: discount_percentage
  tests:
    - in_range:
        min_value: 0
        max_value: 100
```
### Freshness Test

A custom test that checks whether a timestamp column has recent data:

```sql
-- tests/generic/test_is_fresh.sql
{% test is_fresh(model, column_name, max_hours=24) %}

select 1
where (
    select max({{ column_name }})
    from {{ model }}
) < {{ dbt.dateadd('hour', -max_hours, dbt.current_timestamp()) }}

{% endtest %}
```
```yaml
models:
  - name: stg_orders
    columns:
      - name: ordered_at
        tests:
          - is_fresh:
              max_hours: 12
```
Custom generic tests are the backbone of a mature testing strategy. Build a library of 5-10 tests that cover your domain's common failure modes and reuse them across your project.
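As one more example of the kind of test that belongs in such a library, here is a check that timestamps never land in the future (a hypothetical test for illustration, not one that ships with dbt):

```sql
-- tests/generic/test_not_in_future.sql
{% test not_in_future(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} > {{ dbt.current_timestamp() }}

{% endtest %}
```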
## Custom Singular Tests

Singular tests are one-off SQL queries that test a specific business rule. They live in `tests/` as standalone `.sql` files. If the query returns any rows, the test fails.
### Example: Revenue Sanity Check

```sql
-- tests/assert_daily_revenue_reasonable.sql
-- Flag any day where revenue is more than 3x the trailing 7-day average.
-- This catches double-loads, currency conversion errors, and other anomalies.

with daily_revenue as (
    select
        order_date,
        sum(order_total) as revenue
    from {{ ref('fct_orders') }}
    where order_date >= current_date - interval '30 days'
    group by order_date
),

with_avg as (
    select
        order_date,
        revenue,
        avg(revenue) over (
            order by order_date
            rows between 7 preceding and 1 preceding
        ) as trailing_7d_avg
    from daily_revenue
)

select *
from with_avg
where revenue > trailing_7d_avg * 3
  and trailing_7d_avg > 0
  and order_date = current_date - interval '1 day'
```
### Example: Cross-Model Consistency

```sql
-- tests/assert_order_count_matches.sql
-- Verify that the count of orders in the mart matches
-- the count in the staging model (no records lost in transformation).

with staging_count as (
    select count(*) as cnt from {{ ref('stg_orders') }}
),

mart_count as (
    select count(*) as cnt from {{ ref('fct_orders') }}
)

select *
from staging_count
cross join mart_count
where staging_count.cnt != mart_count.cnt
```
Singular tests are perfect for complex business logic that doesn't generalize into a reusable pattern. If you find yourself writing the same singular test for multiple models, refactor it into a custom generic test.
## dbt-expectations: Great Expectations for dbt
The dbt-expectations package brings Great Expectations-style tests to dbt. It adds 50+ generic tests that cover statistical distributions, string patterns, aggregate comparisons, and more.
### Installation

Add to your `packages.yml`:

```yaml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

Then run `dbt deps`.
### Key Tests
Here are the most useful tests from the package:
Row count validation:
```yaml
models:
  - name: fct_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 10000000
```
Column value distributions:
```yaml
columns:
  - name: order_total
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 100000
          mostly: 0.99  # Allow 1% of rows to be outside range
```
String pattern matching:
```yaml
- name: email
  tests:
    - dbt_expectations.expect_column_values_to_match_regex:
        regex: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
```
Aggregate comparisons:
```yaml
tests:
  - dbt_expectations.expect_table_row_count_to_equal_other_table:
      compare_model: ref("stg_orders")
```
Column completeness:
```yaml
- name: phone_number
  tests:
    - dbt_expectations.expect_column_values_to_not_be_null:
        mostly: 0.95  # At least 95% of rows should have a phone number
```
The `mostly` parameter is incredibly useful. Real data is messy — you rarely want a test that fails on a single NULL or outlier. Setting `mostly: 0.99` means the test passes if 99% of rows meet the condition, which is usually the right threshold for production data.
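Conceptually, a `mostly` threshold compares the passing-row fraction against the cutoff. A hand-written equivalent of the `order_total` range check above (a sketch of the idea, not the package's actual compiled SQL) looks like:

```sql
-- Fails (returns a row) only when fewer than 99% of rows pass the condition
select
    count(*) as total_rows,
    sum(case when order_total between 0 and 100000 then 1 else 0 end) as passing_rows
from {{ ref('fct_orders') }}
having sum(case when order_total between 0 and 100000 then 1 else 0 end)
       < count(*) * 0.99
```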
## Elementary: Data Observability for dbt
Elementary takes data quality a step further. While dbt tests check rules you define, Elementary automatically monitors your data for anomalies — schema changes, volume shifts, freshness degradation, and statistical outliers — without you writing explicit tests for every possible failure mode.
### What Elementary Does
Elementary adds three capabilities to your dbt project:
- Anomaly detection tests: Automatically detect unusual patterns in row counts, column statistics, and freshness based on historical baselines
- Schema change monitoring: Alert when columns are added, removed, or change type
- Data observability dashboard: A self-hosted UI that shows test results, model run history, and data quality trends over time
### Installation

Add Elementary to your `packages.yml`:

```yaml
packages:
  - package: elementary-data/elementary
    version: [">=0.16.0", "<0.17.0"]
```

Run `dbt deps`, then `dbt run --select elementary` to create Elementary's internal tables.
Also install the Elementary CLI for the dashboard and alerting:

```bash
pip install elementary-data
```
### Anomaly Detection Tests
Elementary's anomaly detection tests compare current metrics against historical baselines. If today's value deviates significantly from the pattern of the last N days, the test fails.
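The core mechanism resembles a z-score check against a trailing window. As a rough sketch of the idea (Elementary's real implementation is more sophisticated and stores its baselines in dedicated tables), assuming a `fct_orders` model with an `ordered_at` column:

```sql
with daily_counts as (
    select date_trunc('day', ordered_at) as day, count(*) as row_count
    from {{ ref('fct_orders') }}
    group by 1
),

baseline as (
    -- training window: the 14 days before today
    select avg(row_count) as avg_count, stddev(row_count) as std_count
    from daily_counts
    where day >= current_date - interval '14 days'
      and day < current_date
)

-- flag today if it deviates more than 3 standard deviations from the baseline
select d.day, d.row_count
from daily_counts as d
cross join baseline as b
where d.day = current_date
  and abs(d.row_count - b.avg_count) > 3 * b.std_count
```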
Volume anomaly:
```yaml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
          training_period:
            period: day
            count: 14
          detection_period:
            period: day
            count: 1
```
This test builds a baseline from the last 14 days of row counts and flags if today's volume is anomalous. If your pipeline usually processes 50,000 orders per day and today you see 500, Elementary catches it — even though you never wrote a test that says "orders should be more than 10,000."
Column-level anomalies:
```yaml
- name: order_total
  tests:
    - elementary.column_anomalies:
        timestamp_column: ordered_at
        column_anomalies:
          - zero_count
          - null_count
          - average
```
This monitors the null count, zero count, and average of order_total over time. If the average suddenly drops by 80% (maybe a currency conversion broke), Elementary flags it.
Freshness anomaly:
```yaml
tests:
  - elementary.freshness_anomalies:
      timestamp_column: ordered_at
```
Monitors when the most recent record was loaded. If data usually arrives by 6 AM and it's 10 AM with no new records, the test fails.
### Schema Change Monitoring

```yaml
models:
  - name: stg_orders
    tests:
      - elementary.schema_changes
```
This test tracks the schema of your model over time. If a column is added, removed, or changes type between runs, the test fails. Essential for catching upstream schema drift — especially from third-party data sources that change without notice.
### The Elementary Dashboard

After your dbt runs, generate the observability report:

```bash
edr report
```
This creates a self-hosted HTML dashboard showing:
- Test results across all models
- Historical trends for anomaly detection metrics
- Model run durations and statuses
- Schema change history
For alerting, Elementary integrates with Slack:

```bash
edr monitor --slack-webhook https://hooks.slack.com/services/...
```
## Testing Strategy Per Medallion Layer
Not all tests belong everywhere. Here's how to distribute testing effort across your Bronze, Silver, and Gold layers:
### Bronze (Staging): Freshness and Schema
Bronze models are the first line of defense. Test that data is arriving and that the schema hasn't changed.
| Test Type | What It Catches | Example |
|---|---|---|
| Source freshness | Data stopped arriving | `freshness: warn_after: 12 hours` |
| Schema changes | Upstream column added/removed | `elementary.schema_changes` |
| `not_null` on PKs | Source dropping primary keys | `not_null` on `order_id` |
| `unique` on PKs | Source sending duplicates | `unique` on `order_id` |
| Volume anomaly | Partial or double loads | `elementary.volume_anomalies` |
```yaml
# models/staging/_stg__models.yml
models:
  - name: stg_orders
    tests:
      - elementary.schema_changes
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```
### Silver (Intermediate): Uniqueness and Data Quality
Silver models handle cleaning and deduplication. Test that the cleaning actually worked.
| Test Type | What It Catches | Example |
|---|---|---|
| `unique` after dedup | Deduplication logic broken | `unique` on business key |
| `not_null` after joins | Broken joins producing NULLs | `not_null` on enriched columns |
| Referential integrity | Orphaned records after joins | `relationships` test |
| Value ranges | Invalid transformations | `is_positive` on amounts |
```yaml
# models/intermediate/_int__models.yml
models:
  - name: int_orders__enriched
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_total
        tests:
          - not_null
          - is_positive
      - name: customer_id
        tests:
          - relationships:
              to: ref('int_customers__deduplicated')
              field: customer_id
```
### Gold (Marts): Business Logic and Aggregates
Gold models serve stakeholders. Test that business rules hold and aggregates make sense.
| Test Type | What It Catches | Example |
|---|---|---|
| `accepted_values` | Invalid business states | Status columns |
| Aggregate anomalies | Revenue spikes/drops | `elementary.column_anomalies` on sums |
| Row count comparison | Records lost in transformation | Singular test comparing counts |
| Business rule tests | Logic errors | Custom singular tests |
```yaml
# models/marts/_marts__models.yml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: ordered_at
    columns:
      - name: order_state
        tests:
          - accepted_values:
              values: ['paid', 'refunded', 'cancelled', 'pending']
      - name: order_total
        tests:
          - elementary.column_anomalies:
              timestamp_column: ordered_at
              column_anomalies:
                - average
                - zero_count
```
## CI/CD Integration: Testing on Pull Requests

Running tests in CI catches data quality regressions before they hit production. The most effective pattern: run `dbt build` (which includes tests) on every PR that modifies dbt models.
### GitHub Actions Example

```yaml
# .github/workflows/dbt-ci.yml
name: dbt CI

on:
  pull_request:
    paths:
      - 'models/**'
      - 'tests/**'
      - 'macros/**'
      - 'dbt_project.yml'

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dbt
        run: pip install dbt-core dbt-bigquery

      - name: Install packages
        run: dbt deps

      - name: Run dbt build
        run: dbt build --target ci --profiles-dir ./ci_profiles
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
```
This runs all models and tests against a CI-specific schema (so you don't pollute production). If any test fails, the PR is blocked.
### What to Test in CI vs. Production
CI (on every PR):
- Schema tests (`unique`, `not_null`, `accepted_values`, `relationships`)
- Custom generic tests
- Singular tests that don't depend on production data volumes
Production (on every scheduled run):
- All of the above, plus:
- Elementary anomaly detection tests (these need historical baselines from production data)
- Source freshness checks
- Volume and aggregate anomaly tests
Anomaly detection tests don't make sense in CI because CI runs against a small dataset without historical context. Run those only in production.
## Wrapping Up
Data quality testing in dbt isn't optional — it's the difference between a pipeline your team trusts and one that generates fire drills. Start with the basics (`unique` and `not_null` on every primary key), add domain-specific tests as you discover failure modes, and graduate to Elementary when you want automated anomaly detection.
The testing pyramid for dbt looks like this:
- Base: Built-in tests on every model (`unique`, `not_null`, `accepted_values`)
- Middle: Custom generic tests for your domain's common patterns (freshness, ranges, percentages)
- Top: Elementary anomaly detection for catching the unknown unknowns
The cost of adding a test is minutes. The cost of a data quality incident reaching stakeholders is days. Invest in testing early and compound those returns over the lifetime of your pipeline.
ModelDock runs your dbt tests on every scheduled build. Connect your Git repo, set a schedule, and ModelDock handles the orchestration — models, tests, and artifacts all in one place. If a test fails, you know about it before your stakeholders do. Try it free.