dbt-core Scheduling Options Compared: Cron, Airflow, GitHub Actions, and Managed Services
A practical comparison of every way to schedule dbt-core runs in production — cron, Airflow, Prefect, GitHub Actions, Dagster, and managed platforms.
dbt-core doesn't come with a scheduler. That's by design — it's a command-line tool that transforms data, not an orchestration platform. But sooner or later, every team that adopts dbt-core needs to answer the same question: how do we schedule dbt runs automatically?
The answer depends on your team size, your existing infrastructure, how much you care about observability, and honestly, how much time you want to spend on plumbing instead of data modeling.
This article compares every major option for scheduling dbt-core in production. Real code, honest trade-offs, no fluff.
Why Scheduling Matters
Running dbt build manually is fine during development. In production, you need your models to refresh on a predictable cadence — after source data lands, before dashboards get checked, and without anyone remembering to press a button.
A good dbt scheduler should handle:
- Triggering runs on a time-based or event-based schedule
- Managing credentials securely (not hardcoded in scripts)
- Retrying on failure without human intervention
- Logging and alerting so you know when things break
- Environment isolation so production runs don't conflict with development
Some tools do all of this. Some do almost none of it. Let's walk through them.
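As a concrete example of the logging-and-alerting point: whichever tool ends up invoking dbt, richer alerts usually come from dbt's own `run_results.json` artifact rather than from the exit code alone. A minimal stdlib sketch; the artifact path is dbt's default output location, and the actual notification wiring is left out:

```python
"""Sketch: pull failed nodes out of dbt's run_results.json artifact.

The results/status fields follow dbt's documented artifact schema;
models report "error" on failure, tests report "fail".
"""
import json
from pathlib import Path


def failed_nodes(run_results: dict) -> list[str]:
    """Return the unique_ids of nodes whose status is 'error' or 'fail'."""
    return [
        r["unique_id"]
        for r in run_results.get("results", [])
        if r.get("status") in ("error", "fail")
    ]


if __name__ == "__main__":
    artifact = Path("target/run_results.json")  # written by `dbt build`
    if artifact.exists():
        failures = failed_nodes(json.loads(artifact.read_text()))
        if failures:
            print("dbt failures:", ", ".join(failures))
```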
Option 1: Cron
The oldest trick in the book. Install dbt on a Linux server, write a shell script, and add a crontab entry.
Setup
```bash
#!/bin/bash
# /opt/dbt/scheduled_run.sh
# Note: -e is deliberately omitted; with `set -e` plus pipefail, a failed
# dbt run would abort the script before the Slack notification below fires.
set -uo pipefail

cd /opt/dbt/my-project
source /opt/dbt/venv/bin/activate
export DBT_PROFILES_DIR=/opt/dbt/profiles

LOG_FILE="/var/log/dbt/run-$(date +%Y%m%d-%H%M%S).log"

dbt build --target prod 2>&1 | tee "$LOG_FILE"
EXIT_CODE=${PIPESTATUS[0]}

if [ "$EXIT_CODE" -ne 0 ]; then
  curl -X POST "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
    -H 'Content-type: application/json' \
    -d "{\"text\": \"dbt run failed. Check $LOG_FILE\"}"
fi

exit "$EXIT_CODE"
```
```bash
# Run dbt every day at 5:30 AM UTC
30 5 * * * /opt/dbt/scheduled_run.sh
```
Pros
- Zero dependencies beyond a Linux server
- Easy to understand — anyone who's touched Unix can read a crontab
- No additional infrastructure to maintain
- Fast to set up (15 minutes if you already have a server)
Cons
- No built-in retry logic — if a run fails at 5:30 AM, you won't know until you check
- Credential management is on you (environment variables, files on disk)
- No UI for run history or logs
- Scaling to multiple dbt projects means managing multiple cron entries
- No dependency awareness — cron doesn't know if your source data has landed yet
- Server goes down, runs stop silently
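The missing retry logic is the easiest of these gaps to paper over: have cron invoke a small wrapper that retries dbt on a nonzero exit code. A minimal sketch, with hypothetical paths and retry settings:

```python
#!/usr/bin/env python3
"""Minimal retry wrapper for cron-scheduled dbt runs (illustrative sketch)."""
import subprocess
import time


def run_with_retries(cmd, retries=2, delay_seconds=300, runner=subprocess.call):
    """Run cmd, retrying up to `retries` extra times on a nonzero exit code.

    `runner` is injectable for testing; it defaults to subprocess.call.
    Returns the final exit code.
    """
    code = 1
    for attempt in range(retries + 1):
        code = runner(cmd)
        if code == 0:
            return 0
        if attempt < retries:
            time.sleep(delay_seconds)
    return code


def main() -> int:
    # Paths are hypothetical -- adjust for your environment.
    return run_with_retries(
        ["/opt/dbt/venv/bin/dbt", "build", "--target", "prod"],
        retries=2,
        delay_seconds=300,
    )

# Invoked from cron instead of calling dbt directly, e.g.:
#   30 5 * * * /opt/dbt/venv/bin/python /opt/dbt/scheduled_run.py
```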
When it works
Solo data engineers or very small teams with a single dbt project and low complexity. If you're just getting started and want something running today, cron is fine as a stopgap.
Option 2: GitHub Actions
GitHub Actions has become a surprisingly popular way to schedule dbt runs, especially for teams that already live in GitHub.
Setup
```yaml
# .github/workflows/dbt-scheduled.yml
name: Scheduled dbt Build

on:
  schedule:
    - cron: '30 5 * * *'   # 5:30 AM UTC daily
  workflow_dispatch:        # Allow manual triggers

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dbt
        run: |
          pip install dbt-core dbt-snowflake

      - name: Create profiles.yml
        run: |
          mkdir -p ~/.dbt
          cat > ~/.dbt/profiles.yml << 'EOF'
          my_project:
            target: prod
            outputs:
              prod:
                type: snowflake
                account: ${{ secrets.SNOWFLAKE_ACCOUNT }}
                user: ${{ secrets.SNOWFLAKE_USER }}
                password: ${{ secrets.SNOWFLAKE_PASSWORD }}
                role: TRANSFORMER
                database: ANALYTICS
                warehouse: TRANSFORMING
                schema: prod
                threads: 4
          EOF

      - name: Run dbt build
        run: dbt build --target prod

      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: dbt-artifacts-${{ github.run_id }}
          path: target/
          retention-days: 30
```
You can also add Slack notifications on failure:
```yaml
      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1.25.0
        with:
          payload: |
            {
              "text": "dbt build failed in GitHub Actions. Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```
Pros
- No infrastructure to manage — GitHub runs the compute
- Secrets management is built in (repository secrets)
- Run history and logs are in the Actions tab
- Easy to trigger manually via workflow_dispatch
- Free for public repos, generous free tier for private repos
- Your dbt project and its scheduler live in the same repository
Cons
- GitHub Actions cron is not precise — runs can be delayed by 5-15 minutes during peak times
- No dependency awareness (can't wait for upstream data)
- Limited execution time (6 hours max per job on GitHub-hosted runners)
- Debugging failures means reading through Actions logs, which isn't the best experience
- No built-in retry for specific dbt models — it's all or nothing
- Costs can creep up if you run frequently or have long-running projects
- Not designed for orchestration — it's a CI/CD tool being used as a scheduler
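One partial mitigation for the all-or-nothing point: dbt-core 1.6+ ships a `dbt retry` command that re-runs only what failed, using the artifacts in `target/` from the previous invocation. A hedged sketch of a follow-up step for the workflow above (same job, so the workspace and profiles are still in place; the step name is illustrative):

```yaml
      # Re-run only the nodes that failed, using artifacts from the first attempt.
      # Assumes dbt-core >= 1.6, where `dbt retry` was introduced.
      - name: Retry failed models
        if: failure()
        run: dbt retry --target prod
```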
When it works
Small to mid-size teams with straightforward scheduling needs. If your dbt project runs in under 30 minutes and you don't need event-based triggers or complex dependency chains, GitHub Actions is a solid, low-maintenance choice.
Option 3: Apache Airflow
Airflow is the heavyweight option. It's the most widely used orchestrator in data engineering, and it has first-class support for dbt through Astronomer's Cosmos package.
Setup with Cosmos
```python
# dags/dbt_scheduled.py
from datetime import datetime, timedelta

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

dbt_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/opt/airflow/dbt/my_project",
    ),
    profile_config=ProfileConfig(
        profile_name="my_project",
        target_name="prod",
        profile_mapping=SnowflakeUserPasswordProfileMapping(
            conn_id="snowflake_default",
            profile_args={
                "database": "ANALYTICS",
                "schema": "prod",
            },
        ),
    ),
    execution_config=ExecutionConfig(
        dbt_executable_path="/opt/airflow/dbt-venv/bin/dbt",
    ),
    schedule_interval="30 5 * * *",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    dag_id="dbt_scheduled_build",
    default_args={
        "retries": 2,
        "retry_delay": timedelta(seconds=300),
    },
)
```
Cosmos is nice because it breaks your dbt DAG into individual Airflow tasks — one per model. That means you get per-model retries, parallelism, and visibility in the Airflow UI.
Setup with BashOperator
If you want something simpler (or can't install Cosmos), the BashOperator works:
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="dbt_build",
    schedule="30 5 * * *",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args=default_args,
) as dag:
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command=(
            "cd /opt/airflow/dbt/my_project && "
            "source /opt/airflow/dbt-venv/bin/activate && "
            "dbt build --target prod"
        ),
        env={
            "DBT_PROFILES_DIR": "/opt/airflow/dbt/profiles",
        },
    )
```
Pros
- Industry standard with a massive community
- Powerful dependency management — chain dbt after data ingestion tasks
- Built-in retries, alerting, SLAs, and run history
- Cosmos gives per-model visibility and retries
- Scales to complex, multi-step data pipelines
- Extensive UI for monitoring and debugging
Cons
- Significant setup and maintenance overhead (database, scheduler, workers, webserver)
- Learning curve is steep, especially for teams without platform engineers
- Managing Airflow itself becomes a job — upgrades, scaling, debugging the scheduler
- Getting dbt and Airflow to play nicely together (virtual environments, dependencies) takes work
- Overkill if dbt is the only thing you're orchestrating
When it works
Teams with existing Airflow infrastructure, or teams that need dbt to run as part of a larger pipeline (e.g., after Fivetran ingestion, before reverse ETL). If you already have Airflow, adding dbt to it makes sense. If you don't, think carefully before introducing it just for dbt.
Option 4: Prefect
Prefect takes a Python-native approach to orchestration. If your team thinks in Python rather than YAML, Prefect feels more natural than Airflow.
Setup
```python
# flows/dbt_flow.py
import subprocess

from prefect import flow, task


@task(retries=2, retry_delay_seconds=300)
def run_dbt_build():
    result = subprocess.run(
        ["dbt", "build", "--target", "prod"],
        cwd="/opt/dbt/my_project",
        capture_output=True,
        text=True,
        env={
            "DBT_PROFILES_DIR": "/opt/dbt/profiles",
            "PATH": "/opt/dbt/venv/bin:/usr/bin:/bin",
        },
    )
    if result.returncode != 0:
        raise Exception(f"dbt build failed:\n{result.stderr}")
    return result.stdout


@flow(name="dbt-scheduled-build", log_prints=True)
def dbt_scheduled_build():
    output = run_dbt_build()
    print(output)


if __name__ == "__main__":
    dbt_scheduled_build.serve(
        name="dbt-daily-build",
        cron="30 5 * * *",
    )
```
Prefect also has a prefect-dbt integration for tighter coupling:
```python
from prefect import flow
from prefect_dbt.cli.commands import DbtCoreOperation


@flow
def dbt_build_flow():
    result = DbtCoreOperation(
        commands=["dbt build --target prod"],
        project_dir="/opt/dbt/my_project",
        profiles_dir="/opt/dbt/profiles",
    )
    result.run()


if __name__ == "__main__":
    dbt_build_flow.serve(
        name="dbt-daily-build",
        cron="30 5 * * *",
    )
```
Pros
- Pure Python — no YAML configuration files or custom DSLs
- Much easier to set up than Airflow (especially with Prefect Cloud)
- Built-in retries, caching, and concurrency controls
- Prefect Cloud offers a hosted UI, alerting, and scheduling
- Lightweight — a single Python process can serve as your scheduler
Cons
- Smaller community than Airflow (though growing)
- Self-hosted Prefect server still requires maintenance
- Prefect Cloud pricing can add up for high-frequency runs
- The dbt integration is less mature than Cosmos for Airflow
- Less battle-tested at scale compared to Airflow
When it works
Python-heavy teams that want orchestration without the Airflow overhead. Especially appealing if you're already using Prefect for other data workflows or if your team doesn't have dedicated platform engineers.
Option 5: Dagster
Dagster is interesting because it's the most "dbt-aware" orchestrator. It understands dbt assets natively and can represent dbt models as first-class objects in its own asset graph.
Setup
```python
# definitions.py
from dagster import Definitions
from dagster_dbt import DbtCliResource, dbt_assets, DbtProject

my_project = DbtProject(
    project_dir="/opt/dbt/my_project",
)


@dbt_assets(manifest=my_project.manifest_path)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[my_dbt_assets],
    resources={
        "dbt": DbtCliResource(
            project_dir=my_project,
            profiles_dir="/opt/dbt/profiles",
        ),
    },
)
```
Scheduling in Dagster uses ScheduleDefinition or AutoMaterializePolicy:
```python
from dagster import ScheduleDefinition, define_asset_job

dbt_build_job = define_asset_job(
    name="dbt_build_job",
    selection=[my_dbt_assets],
)

dbt_schedule = ScheduleDefinition(
    job=dbt_build_job,
    cron_schedule="30 5 * * *",
)

defs = Definitions(
    assets=[my_dbt_assets],
    schedules=[dbt_schedule],
    resources={
        "dbt": DbtCliResource(
            project_dir=my_project,
            profiles_dir="/opt/dbt/profiles",
        ),
    },
)
```
Pros
- First-class dbt integration — models appear as assets in Dagster's UI
- Asset-based approach aligns naturally with how dbt thinks about data
- Auto-materialization can trigger dbt models based on upstream changes
- Solid local development experience with dagster dev
- Good testing framework for pipelines
Cons
- Steeper learning curve than Prefect (the asset/op/job model takes time to internalize)
- Smaller adoption than Airflow — fewer Stack Overflow answers, fewer tutorials
- Self-hosting requires a daemon, webserver, and database (similar to Airflow)
- Tighter coupling to Dagster's abstractions can feel constraining
- Migrating away from Dagster is harder than migrating away from a simple BashOperator
When it works
Teams that want a modern, asset-oriented approach and are willing to invest in learning Dagster's model. Particularly strong if you're building a new data stack from scratch and want dbt to be deeply integrated with your orchestration layer.
Option 6: Managed Services
Instead of running your own scheduler, you can use a platform that handles it for you.
dbt Cloud
The official managed service from dbt Labs. It's the most polished option if budget isn't a constraint.
```
# dbt Cloud handles scheduling via its UI
# No code to write — configure jobs through the web interface
# Supports cron schedules, CI on PR, and API-triggered runs
```
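The API-triggered runs mentioned above go through dbt Cloud's v2 REST API. A stdlib sketch of triggering a job; the account ID, job ID, and token are placeholders, and you should check the current dbt Cloud API reference before relying on the exact endpoint:

```python
"""Sketch: trigger a dbt Cloud job over its v2 REST API (illustrative)."""
import json
import urllib.request

# Multi-tenant US host; single-tenant and EU accounts use a different host.
DBT_CLOUD_HOST = "https://cloud.getdbt.com"


def build_trigger_request(account_id: int, job_id: int, token: str,
                          cause: str = "Triggered via API"):
    """Return (url, headers, body_bytes) for a job-trigger POST."""
    url = f"{DBT_CLOUD_HOST}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"cause": cause}).encode()
    return url, headers, body


def trigger_job(account_id: int, job_id: int, token: str) -> dict:
    """POST the trigger request and return the decoded JSON response."""
    url, headers, body = build_trigger_request(account_id, job_id, token)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```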
Pros: Seamless integration, IDE included, built-in docs hosting, managed infrastructure.
Cons: Can get expensive at scale ($100+/seat/month on Team plan), less flexibility for custom orchestration, vendor lock-in on proprietary features.
ModelDock
ModelDock takes a different approach — it runs Airflow under the hood but abstracts away the infrastructure. You connect your Git repo and warehouse credentials, and it generates and manages the Airflow DAGs for you.
Pros: Full Airflow power without managing Airflow, per-model visibility, bring your own dbt project.
Cons: Newer platform, smaller community than dbt Cloud.
Comparison Table
| Feature | Cron | GitHub Actions | Airflow | Prefect | Dagster | dbt Cloud | ModelDock |
|---|---|---|---|---|---|---|---|
| Setup time | Minutes | ~1 hour | Hours to days | ~1 hour | ~2 hours | Minutes | Minutes |
| Scheduling | Cron only | Cron only | Cron + event | Cron + event | Cron + asset | Cron + CI | Cron + event |
| Retries | Manual | Manual | Built-in | Built-in | Built-in | Built-in | Built-in |
| Per-model visibility | No | No | With Cosmos | No | Yes | Yes | Yes |
| Credential management | DIY | Repo secrets | Connections | Blocks/Secrets | Resources | Built-in | Built-in |
| Run history/UI | Log files | Actions tab | Airflow UI | Prefect UI | Dagster UI | dbt Cloud UI | Dashboard |
| Alerting | DIY | DIY | Built-in | Built-in | Built-in | Built-in | Built-in |
| Infra maintenance | Server | None | Significant | Moderate | Moderate | None | None |
| Cost | Server cost | Free tier + usage | Self-hosted | Free tier + Cloud | Free tier + Cloud | $100+/seat/month | Free tier available |
| Best for | Solo/prototype | Small teams | Complex pipelines | Python teams | Asset-oriented | Enterprise | Teams wanting Airflow without the ops |
Decision Framework
Choosing a dbt scheduler comes down to three factors: team size, pipeline complexity, and how much infrastructure you want to manage.
Choose cron if you're a solo data engineer, your project is simple, and you just need something running today. Plan to migrate off it as soon as your needs grow.
Choose GitHub Actions if your team is small (1-5 people), your dbt project runs in under 30 minutes, you don't need event-based triggers, and you want zero infrastructure to maintain. It's the best "good enough" option for many teams.
Choose Airflow if you already run Airflow for other workloads, or you need dbt to execute as part of a larger data pipeline with complex dependencies. Don't adopt Airflow just for dbt unless you're prepared for the operational commitment.
Choose Prefect if your team is Python-native, you want something lighter than Airflow, and you're comfortable with a smaller ecosystem. Prefect Cloud removes most of the operational burden.
Choose Dagster if you're building a new data platform from scratch and want the tightest possible integration between your orchestrator and dbt. The asset-based model is powerful, but it's a commitment.
Choose a managed service if you'd rather spend your time on data modeling than infrastructure. dbt Cloud is the safe enterprise choice. ModelDock gives you Airflow's power without Airflow's operational burden — it generates and manages the DAGs so you can focus on your dbt project, not your scheduler.
The Real Question
Most teams spend way too long agonizing over scheduler choice. Here's the honest truth: for a straightforward dbt project that runs once or twice a day, nearly any of these options will work. The differences only start to matter when you need event-based triggers, per-model retries, complex multi-step pipelines, or integration with other tools in your stack.
Pick the simplest option that meets your requirements today. You can always migrate later — your dbt project is just SQL and YAML, and it'll run the same way regardless of what triggers it.
If you don't want to manage any orchestration infrastructure but still want the power of Airflow under the hood, give ModelDock a try. Connect your repo, set your schedule, and let it handle the rest.