The Medallion Architecture Explained: Bronze, Silver, Gold with dbt
A practical guide to implementing the medallion architecture (bronze, silver, gold) with dbt — real model examples, naming conventions, and best practices.
If you've spent any time around modern data teams, you've heard the terms bronze, silver, and gold thrown around. Maybe in a Databricks talk. Maybe in a system design doc. Maybe from a colleague who insists every warehouse needs exactly three layers.
The medallion architecture is one of those ideas that sounds like marketing but is actually useful. It gives your data pipeline a clear structure — raw data comes in at the bottom, clean data comes out at the top, and every step in between has a well-defined purpose.
The problem is that most guides on medallion architecture are written through a Databricks lens, tied to Delta Lake and Spark-specific tooling. But the concept works everywhere. If you're using dbt with Snowflake, BigQuery, PostgreSQL, Redshift, or anything else, you can implement this pattern today with nothing more than a good folder structure and some naming conventions.
This guide walks through the medallion architecture layer by layer, with real dbt model examples using an e-commerce dataset. No hand-waving — actual SQL you can adapt to your own project.
What Is the Medallion Architecture?
The medallion architecture is a data design pattern that organizes your pipeline into three layers:
- Bronze — raw data, loaded as-is from source systems
- Silver — cleaned, deduplicated, and properly typed data
- Gold — aggregated, business-ready tables for reporting and analytics
The term was popularized by Databricks as part of their lakehouse architecture, where it describes how data flows through a Delta Lake. But the pattern itself isn't new and isn't exclusive to any platform. Data teams have been building staging-intermediate-mart pipelines for years. The medallion architecture just gives that pattern a memorable name and a clear set of rules.
The core principle is simple: data quality and business value increase as you move from bronze to silver to gold. Each layer has a specific responsibility, and models at one layer should only reference models from the same layer or the one below it. Bronze models read from sources. Silver models read from bronze. Gold models read from silver (and occasionally from other gold models).
This layered approach makes your pipeline easier to debug, easier to test, and easier to explain to stakeholders who want to know where a number came from.
The Three Layers, With Real dbt Examples
Let's use a realistic scenario. You're building an analytics pipeline for an e-commerce company. Your source systems include a PostgreSQL application database with tables for orders, customers, and products, plus a Stripe webhook table for payment events.
Bronze Layer: Raw Data (Staging)
The bronze layer is your landing zone. Data arrives here in its original form — same column names, same data types, same messiness. The goal is to create a thin, reliable interface between your source systems and the rest of your pipeline.
In dbt, this maps to models/staging/. Bronze models use the source() function to reference raw tables, and they do only the bare minimum: renaming columns for consistency, casting timestamps, and maybe filtering out obviously invalid records like test rows.
First, define your sources in a YAML file:
```yaml
# models/staging/ecommerce/_ecommerce__sources.yml
version: 2

sources:
  - name: ecommerce
    database: raw
    schema: public
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    loaded_at_field: _loaded_at
    tables:
      - name: orders
      - name: customers
      - name: products
      - name: payment_events
        description: "Stripe webhook events for payments"
```
Then a staging model for orders:
```sql
-- models/staging/ecommerce/stg_ecommerce__orders.sql
with source as (
    select * from {{ source('ecommerce', 'orders') }}
),

renamed as (
    select
        id as order_id,
        user_id as customer_id,
        status as order_status,
        total_amount_cents,
        currency,
        shipping_address_id,
        cast(created_at as timestamp) as ordered_at,
        cast(updated_at as timestamp) as updated_at
    from source
)

select * from renamed
```
And one for customers:
```sql
-- models/staging/ecommerce/stg_ecommerce__customers.sql
with source as (
    select * from {{ source('ecommerce', 'customers') }}
),

renamed as (
    select
        id as customer_id,
        lower(trim(email)) as email,
        first_name,
        last_name,
        country_code,
        cast(created_at as timestamp) as created_at
    from source
)

select * from renamed
```
Notice what's happening here: no joins, no aggregations, no business logic. Just renaming columns to match your project's conventions, normalizing emails, and casting types. If someone asks "what does the raw data look like?", the bronze layer is the answer.
Materialization: Views work well for staging models. They add no storage cost and always reflect the latest source data. In your dbt_project.yml:
```yaml
models:
  your_project:
    staging:
      +materialized: view
```
Silver Layer: Cleaned Data (Intermediate)
The silver layer is where the real work begins. Models here clean up data quality issues, deduplicate records, apply business rules, and join related entities together. This is the backbone of your pipeline — not yet shaped for any specific report, but trustworthy enough that downstream consumers can rely on it.
In dbt, this maps to models/intermediate/. Silver models reference bronze (staging) models and never touch raw sources directly.
Here's an intermediate model that builds a clean order history with payment status:
```sql
-- models/intermediate/int_orders__enriched.sql
with orders as (
    select * from {{ ref('stg_ecommerce__orders') }}
),

payments as (
    select * from {{ ref('stg_ecommerce__payment_events') }}
),

latest_payment as (
    select
        order_id,
        payment_status,
        payment_method,
        row_number() over (
            partition by order_id
            order by event_timestamp desc
        ) as row_num
    from payments
),

joined as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_status,
        orders.total_amount_cents / 100.0 as order_total,
        orders.currency,
        latest_payment.payment_status,
        latest_payment.payment_method,
        orders.ordered_at,
        orders.updated_at
    from orders
    left join latest_payment
        on orders.order_id = latest_payment.order_id
        and latest_payment.row_num = 1
)

select * from joined
```
And an intermediate customer model that handles deduplication:
```sql
-- models/intermediate/int_customers__deduplicated.sql
with customers as (
    select * from {{ ref('stg_ecommerce__customers') }}
),

deduplicated as (
    select
        customer_id,
        email,
        first_name,
        last_name,
        country_code,
        created_at,
        row_number() over (
            partition by email
            order by created_at asc
        ) as row_num
    from customers
),

final as (
    select
        customer_id,
        email,
        first_name,
        last_name,
        country_code,
        created_at
    from deduplicated
    where row_num = 1
)

select * from final
```
This is where you handle the messy reality of production data. Duplicate customer accounts with the same email? Resolved here. Payment events that need to be collapsed to the latest status? Done here. Currency conversion from cents to dollars? Here.
The silver layer is also the natural place for data that multiple gold models will need. If three different dashboards need enriched order data, you build it once in silver and reference it from gold — not three times in three separate mart models.
Materialization: Intermediate models benefit from being materialized as tables or incremental models, especially if they involve expensive joins or window functions. For large datasets, incremental is the right call:
```yaml
models:
  your_project:
    intermediate:
      +materialized: table
```
Gold Layer: Business-Ready Data (Marts)
The gold layer is what your stakeholders actually see. These are the tables that power dashboards, feed into reverse ETL tools, and get queried by analysts writing ad-hoc reports. They're shaped for specific business questions and optimized for consumption.
In dbt, this maps to models/marts/. Gold models use the fct_ (fact) and dim_ (dimension) prefixes to signal what kind of table they are.
A fact table for order metrics:
```sql
-- models/marts/fct_orders.sql
with orders as (
    select * from {{ ref('int_orders__enriched') }}
),

final as (
    select
        order_id,
        customer_id,
        order_status,
        payment_status,
        payment_method,
        order_total,
        currency,
        ordered_at,
        date_trunc('month', ordered_at) as order_month,
        case
            when payment_status = 'succeeded' then 'paid'
            when payment_status = 'refunded' then 'refunded'
            when order_status = 'cancelled' then 'cancelled'
            else 'pending'
        end as order_state
    from orders
)

select * from final
```
A dimension table for customers:
```sql
-- models/marts/dim_customers.sql
with customers as (
    select * from {{ ref('int_customers__deduplicated') }}
),

orders as (
    select * from {{ ref('int_orders__enriched') }}
),

customer_orders as (
    select
        customer_id,
        count(*) as total_orders,
        sum(order_total) as lifetime_value,
        min(ordered_at) as first_order_at,
        max(ordered_at) as most_recent_order_at
    from orders
    group by customer_id
),

final as (
    select
        customers.customer_id,
        customers.email,
        customers.first_name,
        customers.last_name,
        customers.country_code,
        customers.created_at as customer_since,
        coalesce(customer_orders.total_orders, 0) as total_orders,
        coalesce(customer_orders.lifetime_value, 0) as lifetime_value,
        customer_orders.first_order_at,
        customer_orders.most_recent_order_at
    from customers
    left join customer_orders
        on customers.customer_id = customer_orders.customer_id
)

select * from final
```
Materialization: Gold models should almost always be tables. They're queried frequently and need to be fast. If your fact tables are large and append-heavy, incremental materialization is the way to go:
```yaml
models:
  your_project:
    marts:
      +materialized: table
```
dbt Project Structure for Medallion Architecture
Here's the full folder layout that maps the medallion layers to dbt conventions:
```
models/
  staging/
    ecommerce/
      _ecommerce__sources.yml
      _ecommerce__models.yml
      stg_ecommerce__orders.sql
      stg_ecommerce__customers.sql
      stg_ecommerce__products.sql
      stg_ecommerce__payment_events.sql
    stripe/
      _stripe__sources.yml
      stg_stripe__charges.sql
  intermediate/
    _int__models.yml
    int_orders__enriched.sql
    int_customers__deduplicated.sql
    int_products__categorized.sql
  marts/
    _marts__models.yml
    fct_orders.sql
    fct_daily_revenue.sql
    dim_customers.sql
    dim_products.sql
```
A few things to notice:
- Staging models are grouped by source system, not by business domain. This keeps it clear which raw tables each model depends on.
- Intermediate and mart models are grouped by business domain (or kept flat if your project is small enough). As your project grows, you might add subfolders like marts/finance/ and marts/marketing/.
- YAML schema files live next to the models they describe, prefixed with an underscore so they sort to the top of the directory.
Naming Conventions
Consistent naming is what makes a large dbt project navigable. Here are the prefixes most dbt projects follow; they map cleanly onto the medallion layers:
| Layer | Prefix | Example | Purpose |
|---|---|---|---|
| Bronze | stg_ | stg_ecommerce__orders | 1:1 with source tables |
| Silver | int_ | int_orders__enriched | Cleaned, joined, intermediate |
| Gold | fct_ | fct_orders | Fact tables (events, transactions) |
| Gold | dim_ | dim_customers | Dimension tables (entities, attributes) |
The double underscore (__) separates the source or entity name from the descriptor. stg_ecommerce__orders tells you it's a staging model from the ecommerce source for the orders table. int_orders__enriched tells you it's an intermediate model that enriches order data.
This isn't just cosmetics. When you have 200 models, being able to glance at a name and immediately know which layer it belongs to saves real time.
How Layers Map to dbt Materializations
Choosing the right materialization for each layer has a meaningful impact on cost and performance:
Bronze (staging) -- Views. Staging models are simple renames and casts. Materializing them as views means they take up no warehouse storage and always reflect the freshest source data. The compute cost is negligible since the transformations are trivial.
Silver (intermediate) -- Tables or incremental. Intermediate models often involve joins, window functions, and deduplication logic. Materializing them as tables avoids recomputing expensive transformations every time a downstream model runs. For very large tables (hundreds of millions of rows), switch to incremental to avoid full rebuilds on every run.
Gold (marts) -- Tables. Mart models are what analysts and BI tools query directly. They need to be fast. Tables are the right default. For append-heavy fact tables, incremental models keep run times manageable as data grows.
Here's the full configuration in dbt_project.yml:
```yaml
models:
  your_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
    marts:
      +materialized: table
```
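Folder-level defaults cover most cases, but the incremental option mentioned above is opted into per model. A hedged sketch of what an incremental variant of the orders fact table might look like — the unique_key and the updated_at filter column are assumptions based on the models shown earlier, not a prescribed setup:

```sql
-- models/marts/fct_orders.sql (incremental variant, illustrative)
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

select * from {{ ref('int_orders__enriched') }}

{% if is_incremental() %}
-- On incremental runs, only process rows newer than what's already in the table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

The first run builds the full table; subsequent runs only scan and merge new rows, which is what keeps run times flat as an append-heavy fact table grows.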
Testing Strategy Per Layer
Each layer has different testing priorities. Spreading your tests across layers keeps the overall test suite fast while still catching the issues that matter.
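One practical consequence of the layered layout is that you can build and test each layer independently with dbt's node selection syntax. A sketch, assuming the folder structure shown earlier:

```shell
# Build and test only the bronze (staging) layer
dbt build --select path:models/staging

# Build the silver layer and everything downstream of it
dbt build --select path:models/intermediate+

# Check source freshness before the main run
dbt source freshness
```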
Bronze: Source Freshness
The most important thing to test at the bronze layer is whether data is actually arriving. Use dbt's source freshness checks:
```shell
dbt source freshness
```
This checks the freshness configuration in your source YAML (which we defined earlier). If orders haven't been loaded in 24 hours, something is wrong upstream and you want to know about it before your pipeline runs.
You should also add basic schema tests on staging models:
```yaml
# models/staging/ecommerce/_ecommerce__models.yml
version: 2

models:
  - name: stg_ecommerce__orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
```
Silver: Data Quality
The silver layer is where you enforce data quality. After deduplication and cleaning, your tests should verify the results:
```yaml
# models/intermediate/_int__models.yml
version: 2

models:
  - name: int_customers__deduplicated
    columns:
      - name: email
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - unique
          - not_null
  - name: int_orders__enriched
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_total
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: true
```
Gold: Business Logic
Gold layer tests validate business rules. These are the tests that catch logic errors that would show up in dashboards:
```yaml
# models/marts/_marts__models.yml
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_state
        tests:
          - accepted_values:
              values: ['paid', 'refunded', 'cancelled', 'pending']
  - name: dim_customers
    columns:
      - name: total_orders
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: true
      - name: lifetime_value
        tests:
          - dbt_utils.accepted_range:
              min_value: 0
              inclusive: true
```
You can also add custom singular tests for more nuanced business logic -- for example, verifying that daily revenue in fct_daily_revenue doesn't deviate more than 50% from the trailing 7-day average, which might indicate a data loading issue.
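A hedged sketch of such a singular test — the revenue_date and total_revenue column names are illustrative assumptions, since fct_daily_revenue isn't defined above. A singular test fails if its query returns any rows:

```sql
-- tests/assert_daily_revenue_within_range.sql
-- Fails when a day's revenue deviates more than 50% from the
-- trailing 7-day average (column names are illustrative)
with daily as (
    select
        revenue_date,
        total_revenue,
        avg(total_revenue) over (
            order by revenue_date
            rows between 7 preceding and 1 preceding
        ) as trailing_7d_avg
    from {{ ref('fct_daily_revenue') }}
)

select *
from daily
where trailing_7d_avg is not null
  and abs(total_revenue - trailing_7d_avg) > 0.5 * trailing_7d_avg
```

Any rows returned represent days that violate the rule, and dbt reports them as test failures.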
When NOT to Use Medallion Architecture
The medallion architecture is not always the right answer. Here's when you should skip it:
Your project has fewer than 15-20 models. If your entire pipeline is five staging models and three marts, adding an intermediate layer creates overhead with little benefit. Go straight from staging to marts. You can always add the silver layer later when complexity demands it.
Your data sources are already clean. If you're pulling from a well-maintained SaaS API that delivers consistent, typed, deduplicated data, the silver layer might be doing nothing useful. Not every source needs three layers of processing.
You're building a one-off analysis. If you're answering a specific business question and the pipeline won't be maintained long-term, skip the ceremony. A single model that reads from sources and produces a result is fine.
Your team is small and moves fast. Architecture patterns have a coordination cost. If it's just you and one other analyst, the overhead of maintaining strict layer boundaries might slow you down more than it helps. Use the conventions that feel natural and formalize them when the team grows.
The medallion pattern shines when you have multiple source systems, a growing model count, and several people working in the same dbt project. It gives everyone a shared mental model for where things live and how data flows. But like any architecture decision, it should be driven by actual need rather than theoretical purity.
Wrapping Up
The medallion architecture is a straightforward way to organize your dbt project as it scales. Bronze gives you a clean interface to your source systems. Silver handles the messy transformation work. Gold delivers business-ready tables that analysts and dashboards can trust.
The beauty of implementing this in dbt is that you don't need any special tooling. No Delta Lake, no Spark, no proprietary platform features. Just folders, naming conventions, and the ref function doing what it does best -- building a clean dependency graph from raw data to business insight.
Start with staging and marts. Add intermediate when you find yourself duplicating transformation logic across multiple mart models. Let the architecture grow with your project rather than trying to build the perfect structure on day one.
ModelDock handles dbt scheduling so you can focus on your data architecture. Connect your Git repo, configure your warehouse credentials, and let ModelDock run your dbt builds on Airflow -- no infrastructure to manage. Try it free.