dbt-core + Airflow: Setup Guide (And Why You Might Not Need It)
A step-by-step guide to orchestrating dbt-core with Apache Airflow — the Cosmos operator, BashOperator, Docker approach, credential handling, and a simpler alternative.
Apache Airflow is the most popular orchestrator for running dbt-core in production. It's battle-tested, widely adopted, and endlessly flexible.
It's also a lot of work to set up and maintain.
This guide walks through how to actually get dbt-core running on Airflow — from installation to DAG writing to credential management. We'll cover three approaches (Cosmos, BashOperator, Docker), explain the trade-offs, and then show you why you might not need any of it.
Prerequisites
Before you start, you'll need:
- Apache Airflow installed and running (2.6+ recommended)
- Python 3.9+ on your Airflow workers
- dbt-core installed in the Airflow environment (or in a Docker image)
- A dbt project in a Git repository
- Warehouse credentials (Snowflake, BigQuery, PostgreSQL, etc.)
If you don't have Airflow running yet, that's the first hurdle — and it's not a small one.
Installing Airflow
Airflow requires a metadata database (PostgreSQL or MySQL), a message broker (Redis or RabbitMQ for CeleryExecutor), a webserver, and a scheduler. The minimum viable setup:
```bash
# Create a virtual environment
python -m venv airflow-venv
source airflow-venv/bin/activate

# Install Airflow (version-pinned)
pip install "apache-airflow==2.9.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.11.txt"

# Initialize the database
airflow db init

# Create an admin user
airflow users create \
  --username admin \
  --password admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

# Start the webserver and scheduler (separate terminals)
airflow webserver --port 8080
airflow scheduler
```
This gets you a local setup with SQLite and the SequentialExecutor — fine for development, but not production. For production you need PostgreSQL, a proper executor (Celery or Kubernetes), and likely Docker or Kubernetes for deployment.
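For reference, here is roughly what those production overrides look like in `airflow.cfg` (each setting can also be supplied as an `AIRFLOW__SECTION__KEY` environment variable). The hostnames and credentials below are placeholders, not recommendations:

```ini
# airflow.cfg -- illustrative production settings; hosts/passwords are placeholders
[core]
executor = CeleryExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

[celery]
broker_url = redis://redis:6379/0
result_backend = db+postgresql://airflow:airflow@postgres:5432/airflow
```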
Already feeling the complexity? That's the point. We'll come back to this.
Approach 1: Cosmos (Recommended)
Astronomer Cosmos is an open-source package that renders your dbt project as native Airflow tasks. Each dbt model becomes an Airflow task, giving you per-model visibility and retry capability.
Installation
```bash
pip install "astronomer-cosmos[dbt-postgres]"  # swap the adapter extra as needed
```
Basic DAG
```python
# dags/dbt_cosmos.py
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="my_project",
    target_name="prod",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="my_warehouse",  # Airflow Connection ID
        profile_args={"schema": "analytics"},
    ),
)

dbt_dag = DbtDag(
    project_config=ProjectConfig("/opt/dbt/my-project"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path="/opt/dbt/venv/bin/dbt",
    ),
    schedule_interval="0 6 * * *",
    start_date=datetime(2026, 1, 1),
    catchup=False,
    dag_id="dbt_production",
)
```
What Cosmos Gives You
- Each dbt model appears as a separate Airflow task
- Model dependencies are automatically mapped to Airflow task dependencies
- Per-model retries and logging
- dbt test results visible in the Airflow UI
- Profile mapping from Airflow Connections (no `profiles.yml` needed)
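The automatic dependency mapping comes from dbt's `manifest.json`, which records each node's parents. Here is a rough, framework-free sketch of the idea; the toy manifest is invented for illustration and this is not Cosmos's actual internals:

```python
# Sketch: derive orchestrator task dependencies from a dbt manifest's parent_map.
def model_dependencies(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs between dbt models."""
    edges = []
    for node, parents in manifest["parent_map"].items():
        if not node.startswith("model."):
            continue
        for parent in parents:
            if parent.startswith("model."):  # skip sources, seeds, etc.
                edges.append((parent, node))
    return edges

# Toy manifest, invented for illustration
toy_manifest = {
    "parent_map": {
        "model.my_project.stg_orders": ["source.my_project.raw_orders"],
        "model.my_project.fct_orders": ["model.my_project.stg_orders"],
    }
}

print(model_dependencies(toy_manifest))
```

An orchestrator can then wire each edge as `upstream_task >> downstream_task`, which is essentially what you see in the Airflow graph view.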
What Cosmos Doesn't Solve
- You still need to run and maintain Airflow
- dbt must be installed on the Airflow workers (or use Docker/venv execution)
- Version conflicts between Airflow and dbt dependencies can be painful
- Cosmos itself has version compatibility requirements
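A common way around the dependency conflicts is to give dbt its own virtual environment on each worker and point Cosmos's `dbt_executable_path` at it, which is what the `/opt/dbt/venv/bin/dbt` path in the DAG example assumes. A sketch; paths and versions are illustrative:

```shell
# Isolate dbt in its own venv so its pins never clash with Airflow's
python -m venv /opt/dbt/venv
/opt/dbt/venv/bin/pip install "dbt-core==1.8.0" "dbt-postgres==1.8.0"

# Verify, then reference this path in Cosmos's ExecutionConfig
/opt/dbt/venv/bin/dbt --version
```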
Approach 2: BashOperator
The simplest Airflow approach — just shell out to dbt.
```python
# dags/dbt_bash.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
    'email': ['data-team@company.com'],
}

with DAG(
    'dbt_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_deps = BashOperator(
        task_id='dbt_deps',
        bash_command='cd /opt/dbt/my-project && dbt deps',
    )
    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='cd /opt/dbt/my-project && dbt seed --target prod',
    )
    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='cd /opt/dbt/my-project && dbt run --target prod',
    )
    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='cd /opt/dbt/my-project && dbt test --target prod',
    )

    dbt_deps >> dbt_seed >> dbt_run >> dbt_test
```
Pros
- Easy to understand
- No extra dependencies beyond Airflow
- Full control over dbt commands
Cons
- No per-model visibility (the entire `dbt run` is one task)
- If `dbt run` fails on model 47 of 50, you re-run all 50
- Credentials need to be handled via environment variables or `profiles.yml` on the worker
- No automatic dependency mapping from your dbt DAG
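One partial mitigation for the re-run problem: since dbt 1.6, `dbt retry` reruns only the nodes that failed or were skipped in the previous invocation, using the saved `run_results.json`. You could wire it up as a manually triggered follow-up task; a sketch:

```shell
# After a failed run, rerun just the failed/skipped models instead of all 50
cd /opt/dbt/my-project && dbt retry --target prod
```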
Approach 3: DockerOperator
Run dbt inside a Docker container for complete environment isolation.
Dockerfile
```dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir dbt-core dbt-postgres
WORKDIR /dbt
COPY . /dbt/
ENTRYPOINT ["dbt"]
```
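The image then has to be built and published somewhere your Airflow workers can pull it from, typically in CI on every change to the dbt project. A sketch; the registry URL is a placeholder:

```shell
# Build and push the dbt image (registry URL is a placeholder)
docker build -t registry.example.com/my-dbt-project:latest .
docker push registry.example.com/my-dbt-project:latest
```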
DAG
```python
# dags/dbt_docker.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'dbt_docker_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_build = DockerOperator(
        task_id='dbt_build',
        image='my-dbt-project:latest',
        command='build --target prod',
        environment={
            'DBT_HOST': '{{ var.value.db_host }}',
            'DBT_USER': '{{ var.value.db_user }}',
            'DBT_PASSWORD': '{{ var.value.db_password }}',
            'DBT_DBNAME': '{{ var.value.db_name }}',
        },
        docker_url='unix://var/run/docker.sock',
        network_mode='bridge',
        auto_remove=True,
    )
```
Pros
- Full environment isolation — no dependency conflicts
- Reproducible builds (pinned Docker image)
- Clean separation between Airflow and dbt
Cons
- Docker-in-Docker or socket mounting adds complexity
- Need to build and push Docker images (another CI pipeline)
- Credential injection through environment variables (visible in `docker inspect`)
- Slower startup (pulling/starting containers)
- More moving parts to debug
Managing Credentials
This is where things get messy regardless of which approach you use.
Option A: Airflow Connections
Airflow has a built-in Connections system. You can store warehouse credentials in the Airflow metadata database and reference them by connection ID.
```bash
# In the Airflow UI or via CLI:
airflow connections add 'my_warehouse' \
  --conn-type 'postgres' \
  --conn-host 'warehouse.example.com' \
  --conn-port 5432 \
  --conn-login 'dbt_user' \
  --conn-password 'secret' \
  --conn-schema 'analytics'
```
Cosmos can use these connections directly via profile mappings. The BashOperator approach requires extracting credentials from connections into environment variables.
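One way to do that extraction without hardcoding secrets in the DAG is Airflow's Jinja `conn` accessor, which the BashOperator's templated `env` parameter can read at runtime. A sketch: the connection ID matches the one created above, and the environment variable names are whatever your `profiles.yml` expects:

```python
# Sketch: pass Airflow Connection fields to dbt as environment variables.
# Assumes a connection with ID 'my_warehouse' already exists.
from airflow.operators.bash import BashOperator

dbt_run = BashOperator(
    task_id='dbt_run',
    bash_command='cd /opt/dbt/my-project && dbt run --target prod',
    env={
        'DBT_HOST': '{{ conn.my_warehouse.host }}',
        'DBT_USER': '{{ conn.my_warehouse.login }}',
        'DBT_PASSWORD': '{{ conn.my_warehouse.password }}',
        # Airflow's 'schema' field is commonly used to hold the database name
        'DBT_DBNAME': '{{ conn.my_warehouse.schema }}',
    },
    append_env=True,  # keep the worker's existing environment too
)
```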
Option B: Environment Variables
The dbt-native approach. Set environment variables and reference them in profiles.yml:
```yaml
# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: analytics
      threads: 4
```
Where those environment variables come from is up to you — Airflow Variables, a secrets manager, Kubernetes Secrets, or plain environment config.
Option C: External Secret Managers
For production security, integrate with a secrets manager:
- HashiCorp Vault: Airflow has a native Vault backend
- AWS Secrets Manager: Use the `apache-airflow-providers-amazon` package
- GCP Secret Manager: Use `apache-airflow-providers-google`
```ini
# airflow.cfg
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "url": "http://vault:8200"}
```
This is the most secure approach, but it adds another service to manage.
Monitoring & Alerting
Airflow gives you several monitoring options:
Email Alerts
```python
default_args = {
    'email_on_failure': True,
    'email_on_retry': True,
    'email': ['data-team@company.com'],
}
```
Slack Notifications
```python
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

def notify_failure(context):
    slack_alert = SlackWebhookOperator(
        task_id='slack_alert',
        slack_webhook_conn_id='slack_webhook',
        message=f"dbt run failed: {context['task_instance'].task_id}",
    )
    slack_alert.execute(context=context)

default_args = {
    'on_failure_callback': notify_failure,
}
```
SLA Monitoring
```python
with DAG(
    'dbt_production',
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    sla_miss_callback=sla_alert,  # your alerting callback, defined elsewhere
) as dag:
    dbt_build = BashOperator(
        task_id='dbt_build',
        bash_command='dbt build --target prod',
        sla=timedelta(hours=1),  # alert if the run takes > 1 hour
    )
```
The Full Picture
Here's what a production-grade Airflow + dbt setup actually requires:
| Component | What You Need |
|---|---|
| Airflow infrastructure | PostgreSQL, Redis, webserver, scheduler, workers |
| Deployment | Docker/Kubernetes, CI/CD for DAG deployment |
| dbt runtime | dbt + adapter installed on workers or in Docker images |
| Credentials | Airflow Connections, Vault, or secret manager integration |
| Code sync | Git sync or CI/CD pipeline to update dbt project |
| Monitoring | Email/Slack alerts, log aggregation, SLA tracking |
| Maintenance | Airflow upgrades, dependency updates, security patches |
That's a lot of infrastructure for running `dbt build` on a schedule.
Do You Actually Need Airflow?
Airflow makes sense if:
- You already run Airflow for other data pipelines
- You need complex DAG dependencies beyond dbt (e.g., dbt runs after a Spark job that runs after an ingestion pipeline)
- You have a platform team that manages Airflow
- You need fine-grained per-model orchestration
Airflow is probably overkill if:
- dbt is your only (or primary) workload
- You don't have a team to maintain the infrastructure
- You just need a reliable cron schedule with monitoring
- You want to focus on writing models, not managing orchestration
The Easier Way
We built ModelDock because we were tired of maintaining Airflow just to run dbt.
ModelDock runs Airflow under the hood — so you get the same battle-tested orchestration — but you never have to touch it. No DAGs to write, no infrastructure to manage, no credentials sitting in config files.
Here's what the setup looks like:
- Connect your Git repository
- Pick your dbt adapter (PostgreSQL, BigQuery, Snowflake, Databricks, Fabric, etc.)
- Enter your warehouse credentials (encrypted with AES-256-GCM)
- Set a cron schedule
- Done
Your dbt project runs in an isolated container, on schedule, with full run logs and artifact storage. If it fails, you know about it.
All the reliability of Airflow. None of the operational overhead.
Free during open beta. No credit card required.