dbt-core · production · tutorial · architecture

dbt-core + Airflow: Setup Guide (And Why You Might Not Need It)

A step-by-step guide to orchestrating dbt-core with Apache Airflow — the Cosmos operator, BashOperator, Docker approach, credential handling, and a simpler alternative.

ModelDock Team · February 16, 2026 · 8 min read

Apache Airflow is the most popular orchestrator for running dbt-core in production. It's battle-tested, widely adopted, and endlessly flexible.

It's also a lot of work to set up and maintain.

This guide walks through how to actually get dbt-core running on Airflow — from installation to DAG writing to credential management. We'll cover three approaches (Cosmos, BashOperator, Docker), explain the trade-offs, and then show you why you might not need any of it.

Prerequisites

Before you start, you'll need:

  • Apache Airflow installed and running (2.6+ recommended)
  • Python 3.9+ on your Airflow workers
  • dbt-core installed in the Airflow environment (or in a Docker image)
  • A dbt project in a Git repository
  • Warehouse credentials (Snowflake, BigQuery, PostgreSQL, etc.)

If you don't have Airflow running yet, that's the first hurdle — and it's not a small one.

Installing Airflow

Airflow requires a metadata database (PostgreSQL or MySQL in production), a message broker (Redis or RabbitMQ if you use the CeleryExecutor), a webserver, and a scheduler. A minimal local setup looks like this:

# Create a virtual environment
python -m venv airflow-venv
source airflow-venv/bin/activate

# Install Airflow (version-pinned, with the matching constraints file)
pip install "apache-airflow==2.9.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.11.txt"

# Initialize the metadata database
# (on Airflow 2.7+, `airflow db migrate` is the non-deprecated form)
airflow db init

# Create an admin user
airflow users create \
  --username admin \
  --password admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

# Start the webserver and scheduler (separate terminals)
airflow webserver --port 8080
airflow scheduler

This gets you a local setup with SQLite and the SequentialExecutor — fine for development, but not production. For production you need PostgreSQL, a proper executor (Celery or Kubernetes), and likely Docker or Kubernetes for deployment.
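
Concretely, the production pieces get swapped in via airflow.cfg. A sketch of what that excerpt tends to look like — the hostnames and credentials here are illustrative, not defaults:

```ini
# airflow.cfg (excerpt — illustrative values)
[core]
executor = CeleryExecutor

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow

[celery]
broker_url = redis://redis:6379/0
result_backend = db+postgresql://airflow:airflow@postgres/airflow
```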

Already feeling the complexity? That's the point. We'll come back to this.

Approach 1: Cosmos (Recommended)

Astronomer Cosmos is an open-source package that renders your dbt project as native Airflow tasks. Each dbt model becomes an Airflow task, giving you per-model visibility and retry capability.

Installation

pip install "astronomer-cosmos[dbt-postgres]"  # swap the adapter extra as needed

Basic DAG

# dags/dbt_cosmos.py
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="my_project",
    target_name="prod",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="my_warehouse",  # Airflow Connection ID
        profile_args={"schema": "analytics"},
    ),
)

dbt_dag = DbtDag(
    project_config=ProjectConfig("/opt/dbt/my-project"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path="/opt/dbt/venv/bin/dbt",
    ),
    schedule_interval="0 6 * * *",
    start_date=datetime(2026, 1, 1),
    catchup=False,
    dag_id="dbt_production",
)

What Cosmos Gives You

  • Each dbt model appears as a separate Airflow task
  • Model dependencies are automatically mapped to Airflow task dependencies
  • Per-model retries and logging
  • dbt test results visible in the Airflow UI
  • Profile mapping from Airflow Connections (no profiles.yml needed)
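
If you only want a slice of the project rendered as Airflow tasks, Cosmos also accepts standard dbt node-selection syntax through its RenderConfig. A sketch — check the API against your installed Cosmos version, as it has shifted between releases:

```python
from cosmos import RenderConfig

# Only models under models/marts become Airflow tasks; the selector
# string uses ordinary dbt node-selection syntax.
render_config = RenderConfig(select=["path:models/marts"])

# Passed alongside the other configs:
# DbtDag(..., render_config=render_config)
```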

What Cosmos Doesn't Solve

  • You still need to run and maintain Airflow
  • dbt must be installed on the Airflow workers (or use Docker/venv execution)
  • Version conflicts between Airflow and dbt dependencies can be painful
  • Cosmos itself has version compatibility requirements

Approach 2: BashOperator

The simplest Airflow approach — just shell out to dbt.

# dags/dbt_bash.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
    'email': ['data-team@company.com'],
}

with DAG(
    'dbt_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_deps = BashOperator(
        task_id='dbt_deps',
        bash_command='cd /opt/dbt/my-project && dbt deps',
    )
    dbt_seed = BashOperator(
        task_id='dbt_seed',
        bash_command='cd /opt/dbt/my-project && dbt seed --target prod',
    )
    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='cd /opt/dbt/my-project && dbt run --target prod',
    )
    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='cd /opt/dbt/my-project && dbt test --target prod',
    )

    dbt_deps >> dbt_seed >> dbt_run >> dbt_test

Pros

  • Easy to understand
  • No extra dependencies beyond Airflow
  • Full control over dbt commands

Cons

  • No per-model visibility (the entire dbt run is one task)
  • If dbt run fails on model 47 of 50, you re-run all 50
  • Credentials need to be handled via environment variables or profiles.yml on the worker
  • No automatic dependency mapping from your dbt DAG
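
The second con has a partial mitigation: dbt 1.6 added `dbt retry`, which replays only the failed and skipped nodes recorded in target/run_results.json. A follow-up or manually-triggered task can shell out to it; a minimal sketch of composing that command (the helper name is ours, and it assumes dbt >= 1.6):

```python
def dbt_retry_command(project_dir: str, target: str) -> str:
    """Build the shell command for a retry task. `dbt retry` (dbt >= 1.6)
    re-runs only the nodes that failed or were skipped in the previous
    invocation, based on target/run_results.json."""
    return f"cd {project_dir} && dbt retry --target {target}"

# e.g. as the bash_command of a manually-triggered BashOperator:
retry_cmd = dbt_retry_command("/opt/dbt/my-project", "prod")
```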

Approach 3: DockerOperator

Run dbt inside a Docker container for complete environment isolation.

Dockerfile

FROM python:3.11-slim
RUN pip install --no-cache-dir dbt-core dbt-postgres
WORKDIR /dbt
COPY . /dbt/
ENTRYPOINT ["dbt"]

DAG

# dags/dbt_docker.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

default_args = {
    'owner': 'data-team',
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'dbt_docker_production',
    default_args=default_args,
    schedule_interval='0 6 * * *',
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    dbt_build = DockerOperator(
        task_id='dbt_build',
        image='my-dbt-project:latest',
        command='build --target prod',
        environment={
            'DBT_HOST': '{{ var.value.db_host }}',
            'DBT_USER': '{{ var.value.db_user }}',
            'DBT_PASSWORD': '{{ var.value.db_password }}',
            'DBT_DBNAME': '{{ var.value.db_name }}',
        },
        docker_url='unix://var/run/docker.sock',
        network_mode='bridge',
        auto_remove=True,
    )

Pros

  • Full environment isolation — no dependency conflicts
  • Reproducible builds (pinned Docker image)
  • Clean separation between Airflow and dbt

Cons

  • Docker-in-Docker or socket mounting adds complexity
  • Need to build and push Docker images (another CI pipeline)
  • Credential injection through environment variables (visible in Docker inspect)
  • Slower startup (pulling/starting containers)
  • More moving parts to debug

Managing Credentials

This is where things get messy regardless of which approach you use.

Option A: Airflow Connections

Airflow has a built-in Connections system. You can store warehouse credentials in the Airflow metadata database and reference them by connection ID.

# In the Airflow UI or via CLI:
airflow connections add 'my_warehouse' \
  --conn-type 'postgres' \
  --conn-host 'warehouse.example.com' \
  --conn-port 5432 \
  --conn-login 'dbt_user' \
  --conn-password 'secret' \
  --conn-schema 'analytics'

Cosmos can use these connections directly via profile mappings. The BashOperator approach requires extracting credentials from connections into environment variables.
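
Airflow also lets you define connections as URI-valued AIRFLOW_CONN_* environment variables. For the BashOperator route, the extraction step amounts to mapping such a URI onto the DBT_* variables that profiles.yml reads. A minimal pure-Python sketch (the helper name is ours):

```python
from urllib.parse import unquote, urlparse

def conn_uri_to_dbt_env(uri: str) -> dict:
    """Map an Airflow-style connection URI (the format used by
    AIRFLOW_CONN_* environment variables) onto the DBT_* variables
    that a profiles.yml reads via env_var()."""
    parsed = urlparse(uri)
    return {
        "DBT_HOST": parsed.hostname or "",
        "DBT_USER": unquote(parsed.username or ""),
        "DBT_PASSWORD": unquote(parsed.password or ""),
        "DBT_DBNAME": parsed.path.lstrip("/"),
    }

env = conn_uri_to_dbt_env("postgres://dbt_user:secret@warehouse.example.com:5432/analytics")
```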

Option B: Environment Variables

The dbt-native approach. Set environment variables and reference them in profiles.yml:

# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: analytics
      threads: 4

Where those environment variables come from is up to you — Airflow Variables, a secrets manager, Kubernetes Secrets, or plain environment config.
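
One practical wrinkle: a missing variable only surfaces when dbt parses the profile. A preflight check in whatever wrapper launches dbt makes the failure explicit and names every missing variable at once. A minimal sketch (the function and constant names are ours):

```python
import os

# The variables referenced by the profiles.yml above
REQUIRED_DBT_VARS = ("DBT_HOST", "DBT_USER", "DBT_PASSWORD", "DBT_DBNAME")

def check_dbt_env(environ=os.environ) -> None:
    """Raise before invoking dbt if any credential variable is unset or empty."""
    missing = [name for name in REQUIRED_DBT_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing dbt environment variables: {', '.join(missing)}")
```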

Option C: External Secret Managers

For production security, integrate with a secrets manager:

  • HashiCorp Vault: Airflow has a native Vault backend
  • AWS Secrets Manager: Use the apache-airflow-providers-amazon package
  • GCP Secret Manager: Use apache-airflow-providers-google

# airflow.cfg
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "url": "http://vault:8200"}
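
For the AWS route, the equivalent configuration points at the Amazon provider's Secrets Manager backend — the prefix and region below are illustrative:

```ini
# airflow.cfg
[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "region_name": "us-east-1"}
```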

This is the most secure approach, but it adds another service to manage.

Monitoring & Alerting

Airflow gives you several monitoring options:

Email Alerts

default_args = {
    'email_on_failure': True,
    'email_on_retry': True,
    'email': ['data-team@company.com'],
}

Slack Notifications

from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

def notify_failure(context):
    slack_alert = SlackWebhookOperator(
        task_id='slack_alert',
        slack_webhook_conn_id='slack_webhook',
        message=f"dbt run failed: {context['task_instance'].task_id}",
    )
    slack_alert.execute(context=context)

default_args = {
    'on_failure_callback': notify_failure,
}

SLA Monitoring

with DAG(
    'dbt_production',
    sla_miss_callback=sla_alert,  # your alerting callable
) as dag:
    dbt_build = BashOperator(
        task_id='dbt_build',
        bash_command='dbt build --target prod',
        sla=timedelta(hours=1),  # alert if the run takes more than 1 hour
    )

The Full Picture

Here's what a production-grade Airflow + dbt setup actually requires:

  • Airflow infrastructure: PostgreSQL, Redis, webserver, scheduler, workers
  • Deployment: Docker/Kubernetes, CI/CD for DAG deployment
  • dbt runtime: dbt + adapter installed on workers or in Docker images
  • Credentials: Airflow Connections, Vault, or secret manager integration
  • Code sync: Git sync or CI/CD pipeline to update the dbt project
  • Monitoring: Email/Slack alerts, log aggregation, SLA tracking
  • Maintenance: Airflow upgrades, dependency updates, security patches

That's a lot of infrastructure for running dbt build on a schedule.

Do You Actually Need Airflow?

Airflow makes sense if:

  • You already run Airflow for other data pipelines
  • You need complex DAG dependencies beyond dbt (e.g., dbt runs after a Spark job that runs after an ingestion pipeline)
  • You have a platform team that manages Airflow
  • You need fine-grained per-model orchestration

Airflow is probably overkill if:

  • dbt is your only (or primary) workload
  • You don't have a team to maintain the infrastructure
  • You just need a reliable cron schedule with monitoring
  • You want to focus on writing models, not managing orchestration

The Easier Way

We built ModelDock because we were tired of maintaining Airflow just to run dbt.

ModelDock runs Airflow under the hood — so you get the same battle-tested orchestration — but you never have to touch it. No DAGs to write, no infrastructure to manage, no credentials sitting in config files.

Here's what the setup looks like:

  1. Connect your Git repository
  2. Pick your dbt adapter (PostgreSQL, BigQuery, Snowflake, Databricks, Fabric, etc.)
  3. Enter your warehouse credentials (encrypted with AES-256-GCM)
  4. Set a cron schedule
  5. Done

Your dbt project runs in an isolated container, on schedule, with full run logs and artifact storage. If it fails, you know about it.

All the reliability of Airflow. None of the operational overhead.

Free during open beta. No credit card required.

Ready to run dbt-core in production?

ModelDock handles scheduling, infrastructure, and credential management so you don't have to.

Start For Free