dbt-core · production · tutorial · architecture

dbt-core + Airflow: Setup Guide (And Why You Might Not Need It)

A step-by-step guide to orchestrating dbt-core with Apache Airflow — the Cosmos operator, BashOperator, Docker approach, credential handling, and a simpler alternative.

ModelDock Team · February 16, 2026 · 8 min read

Apache Airflow is the most popular orchestrator for running dbt-core in production. It's battle-tested, widely adopted, and endlessly flexible.

It's also a lot of work to set up and maintain.

This guide walks through how to actually get dbt-core running on Airflow — from installation to DAG writing to credential management. We'll cover three approaches (Cosmos, BashOperator, Docker), explain the trade-offs, and then show you why you might not need any of it.

Prerequisites

Before you start, you'll need:

  • Apache Airflow installed and running (2.6+ recommended)
  • Python 3.9+ on your Airflow workers
  • dbt-core installed in the Airflow environment (or in a Docker image)
  • A dbt project in a Git repository
  • Warehouse credentials (Snowflake, BigQuery, PostgreSQL, etc.)

If you don't have Airflow running yet, that's the first hurdle — and it's not a small one.

Installing Airflow

Airflow requires a metadata database (PostgreSQL or MySQL in production), a message broker (Redis or RabbitMQ if you use the CeleryExecutor), a webserver, and a scheduler. A minimal local setup looks like this:

# Create a virtual environment
python -m venv airflow-venv
source airflow-venv/bin/activate

# Install Airflow (version-pinned, with the matching constraints file)
pip install "apache-airflow==2.9.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.11.txt"

# Initialize the metadata database
# (on Airflow 2.7+, `airflow db migrate` is the non-deprecated form)
airflow db init

# Create an admin user
airflow users create \
  --username admin \
  --password admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

# Start the webserver and scheduler (separate terminals)
airflow webserver --port 8080
airflow scheduler

This gets you a local setup with SQLite and the SequentialExecutor — fine for development, but not production. For production you need PostgreSQL, a proper executor (Celery or Kubernetes), and likely Docker or Kubernetes for deployment.
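
Concretely, the production pieces get swapped in via airflow.cfg. A sketch of what that excerpt tends to look like — the hostnames and credentials here are illustrative, not defaults:

```ini
# airflow.cfg (excerpt — illustrative values)
[core]
executor = CeleryExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow

[celery]
broker_url = redis://redis:6379/0
result_backend = db+postgresql://airflow:airflow@postgres/airflow
```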

Already feeling the complexity? That's the point. We'll come back to this.

Approach 1: Cosmos (Recommended)

Astronomer Cosmos is an open-source package that renders your dbt project as native Airflow tasks. Each dbt model becomes an Airflow task, giving you per-model visibility and retry capability.

Installation

pip install "astronomer-cosmos[dbt-postgres]"  # swap the adapter extra as needed

Basic DAG

# dags/dbt_cosmos.py
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="my_project",
    target_name="prod",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="my_warehouse",  # Airflow Connection ID
        profile_args={"schema": "analytics"},
    ),
)

dbt_dag = DbtDag(
    project_config=ProjectConfig("/opt/dbt/my-project"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path="/opt/dbt/venv/bin/dbt",
    ),
    schedule_interval="0 6 * * *",
    start_date=datetime(2026, 1, 1),
    catchup=False,
    dag_id="dbt_production",
)

What Cosmos Gives You

  • Each dbt model appears as a separate Airflow task
  • Model dependencies are automatically mapped to Airflow task dependencies
  • Per-model retries and logging
  • dbt test results visible in the Airflow UI
  • Profile mapping from Airflow Connections (no profiles.yml needed)
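
If you only want a slice of the project rendered as Airflow tasks, Cosmos also accepts standard dbt node-selection syntax through its RenderConfig. A sketch — check the API against your installed Cosmos version, as it has shifted between releases:

```python
from cosmos import RenderConfig

# Only models under models/marts become Airflow tasks; the selector
# string uses ordinary dbt node-selection syntax.
render_config = RenderConfig(select=["path:models/marts"])

# Passed alongside the other configs:
# DbtDag(..., render_config=render_config)
```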

What Cosmos Doesn't Solve

  • You still need to run and maintain Airflow
  • dbt must be installed on the Airflow workers (or use Docker/venv execution)
  • Version conflicts between Airflow and dbt dependencies can be painful
  • Cosmos itself has version compatibility requirements

Approach 2: BashOperator

The simplest Airflow approach — just shell out to dbt.

# dags/dbt_bash.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
    'email': ['data-team@company.com'],
}

with DAG(
    'dbt_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_deps = BashOperator(
        task_id='dbt_deps',
        bash_command='cd /opt/dbt/my-project && dbt deps',
    )
    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='cd /opt/dbt/my-project && dbt seed --target prod',
    )
    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='cd /opt/dbt/my-project && dbt run --target prod',
    )
    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='cd /opt/dbt/my-project && dbt test --target prod',
    )

    dbt_deps >> dbt_seed >> dbt_run >> dbt_test

Pros

  • Easy to understand
  • No extra dependencies beyond Airflow
  • Full control over dbt commands

Cons

  • No per-model visibility (the entire dbt run is one task)
  • If dbt run fails on model 47 of 50, you re-run all 50
  • Credentials need to be handled via environment variables or profiles.yml on the worker
  • No automatic dependency mapping from your dbt DAG
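
The second con has a partial mitigation: dbt 1.6 added `dbt retry`, which replays only the failed and skipped nodes recorded in target/run_results.json. A follow-up or manually-triggered task can shell out to it; a minimal sketch of composing that command (the helper name is ours, and it assumes dbt >= 1.6):

```python
def dbt_retry_command(project_dir: str, target: str) -> str:
    """Build the shell command for a retry task. `dbt retry` (dbt >= 1.6)
    re-runs only the nodes that failed or were skipped in the previous
    invocation, based on target/run_results.json."""
    return f"cd {project_dir} && dbt retry --target {target}"

# e.g. as the bash_command of a manually-triggered BashOperator:
retry_cmd = dbt_retry_command("/opt/dbt/my-project", "prod")
```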

Approach 3: DockerOperator

Run dbt inside a Docker container for complete environment isolation.

Dockerfile

FROM python:3.11-slim
RUN pip install --no-cache-dir dbt-core dbt-postgres
WORKDIR /dbt
COPY . /dbt/
ENTRYPOINT ["dbt"]

DAG

# dags/dbt_docker.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'dbt_docker_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_build = DockerOperator(
        task_id='dbt_build',
        image='my-dbt-project:latest',
        command='build --target prod',
        environment={
            'DBT_HOST': '{{ var.value.db_host }}',
            'DBT_USER': '{{ var.value.db_user }}',
            'DBT_PASSWORD': '{{ var.value.db_password }}',
            'DBT_DBNAME': '{{ var.value.db_name }}',
        },
        docker_url='unix://var/run/docker.sock',
        network_mode='bridge',
        auto_remove=True,
    )

Pros

  • Full environment isolation — no dependency conflicts
  • Reproducible builds (pinned Docker image)
  • Clean separation between Airflow and dbt

Cons

  • Docker-in-Docker or socket mounting adds complexity
  • Need to build and push Docker images (another CI pipeline)
  • Credential injection through environment variables (visible in Docker inspect)
  • Slower startup (pulling/starting containers)
  • More moving parts to debug

Managing Credentials

This is where things get messy regardless of which approach you use.

Option A: Airflow Connections

Airflow has a built-in Connections system. You can store warehouse credentials in the Airflow metadata database and reference them by connection ID.

# In the Airflow UI or via CLI:
airflow connections add 'my_warehouse' \
  --conn-type 'postgres' \
  --conn-host 'warehouse.example.com' \
  --conn-port 5432 \
  --conn-login 'dbt_user' \
  --conn-password 'secret' \
  --conn-schema 'analytics'

Cosmos can use these connections directly via profile mappings. The BashOperator approach requires extracting credentials from connections into environment variables.
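
Airflow also lets you define connections as URI-valued AIRFLOW_CONN_* environment variables. For the BashOperator route, the extraction step amounts to mapping such a URI onto the DBT_* variables that profiles.yml reads. A minimal pure-Python sketch (the helper name is ours):

```python
from urllib.parse import unquote, urlparse

def conn_uri_to_dbt_env(uri: str) -> dict:
    """Map an Airflow-style connection URI (the format used by
    AIRFLOW_CONN_* environment variables) onto the DBT_* variables
    that a profiles.yml reads via env_var()."""
    parsed = urlparse(uri)
    return {
        "DBT_HOST": parsed.hostname or "",
        "DBT_USER": unquote(parsed.username or ""),
        "DBT_PASSWORD": unquote(parsed.password or ""),
        "DBT_DBNAME": parsed.path.lstrip("/"),
    }

env = conn_uri_to_dbt_env("postgres://dbt_user:secret@warehouse.example.com:5432/analytics")
```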

Option B: Environment Variables

The dbt-native approach. Set environment variables and reference them in profiles.yml:

# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: analytics
      threads: 4

Where those environment variables come from is up to you — Airflow Variables, a secrets manager, Kubernetes Secrets, or plain environment config.
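
One practical wrinkle: a missing variable only surfaces when dbt parses the profile. A preflight check in whatever wrapper launches dbt makes the failure explicit and names every missing variable at once. A minimal sketch (the function and constant names are ours):

```python
import os

# The variables referenced by the profiles.yml above
REQUIRED_DBT_VARS = ("DBT_HOST", "DBT_USER", "DBT_PASSWORD", "DBT_DBNAME")

def check_dbt_env(environ=os.environ) -> None:
    """Raise before invoking dbt if any credential variable is unset or empty."""
    missing = [name for name in REQUIRED_DBT_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing dbt environment variables: {', '.join(missing)}")
```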

Option C: External Secret Managers

For production security, integrate with a secrets manager:

  • HashiCorp Vault: Airflow has a native Vault backend
  • AWS Secrets Manager: Use the apache-airflow-providers-amazon package
  • GCP Secret Manager: Use apache-airflow-providers-google

# airflow.cfg
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "url": "http://vault:8200"}
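
For the AWS route, the equivalent configuration points at the Amazon provider's Secrets Manager backend — the prefix and region below are illustrative:

```ini
# airflow.cfg
[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "region_name": "us-east-1"}
```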

This is the most secure approach, but it adds another service to manage.

Monitoring & Alerting

Airflow gives you several monitoring options:

Email Alerts

default_args = {
    'email_on_failure': True,
    'email_on_retry': True,
    'email': ['data-team@company.com'],
}

Slack Notifications

from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

def notify_failure(context):
    slack_alert = SlackWebhookOperator(
        task_id='slack_alert',
        slack_webhook_conn_id='slack_webhook',
        message=f"dbt run failed: {context['task_instance'].task_id}",
    )
    slack_alert.execute(context=context)

default_args = {
    'on_failure_callback': notify_failure,
}

SLA Monitoring

with DAG(
    'dbt_production',
    sla_miss_callback=sla_alert,  # your alerting callable
) as dag:
    dbt_build = BashOperator(
        task_id='dbt_build',
        bash_command='dbt build --target prod',
        sla=timedelta(hours=1),  # alert if the run takes more than 1 hour
    )

The Full Picture

Here's what a production-grade Airflow + dbt setup actually requires:

  • Airflow infrastructure: PostgreSQL, Redis, webserver, scheduler, workers
  • Deployment: Docker/Kubernetes, CI/CD for DAG deployment
  • dbt runtime: dbt + adapter installed on workers or in Docker images
  • Credentials: Airflow Connections, Vault, or secret manager integration
  • Code sync: Git sync or CI/CD pipeline to update the dbt project
  • Monitoring: Email/Slack alerts, log aggregation, SLA tracking
  • Maintenance: Airflow upgrades, dependency updates, security patches

That's a lot of infrastructure for running dbt build on a schedule.

Do You Actually Need Airflow?

Airflow makes sense if:

  • You already run Airflow for other data pipelines
  • You need complex DAG dependencies beyond dbt (e.g., dbt runs after a Spark job that runs after an ingestion pipeline)
  • You have a platform team that manages Airflow
  • You need fine-grained per-model orchestration

Airflow is probably overkill if:

  • dbt is your only (or primary) workload
  • You don't have a team to maintain the infrastructure
  • You just need a reliable cron schedule with monitoring
  • You want to focus on writing models, not managing orchestration

The Easier Way

We built ModelDock because we were tired of maintaining Airflow just to run dbt.

ModelDock runs Airflow under the hood — so you get the same battle-tested orchestration — but you never have to touch it. No DAGs to write, no infrastructure to manage, no credentials sitting in config files.

Here's what the setup looks like:

  1. Connect your Git repository
  2. Pick your dbt adapter (PostgreSQL, BigQuery, Snowflake, Databricks, Fabric, etc.)
  3. Enter your warehouse credentials (encrypted with AES-256-GCM)
  4. Set a cron schedule
  5. Done

Your dbt project runs in an isolated container, on schedule, with full run logs and artifact storage. If it fails, you know about it.

All the reliability of Airflow. None of the operational overhead.

Free during open beta. No credit card required.

Ready to run dbt-core in production?

ModelDock handles scheduling, infrastructure, and credential management so you don't have to.

Start For Free