The Complete ELT Stack: Fivetran + Snowflake + dbt + Airflow + Tableau
How to build a modern ELT data platform with Fivetran for ingestion, Snowflake for storage, dbt for transformation, Airflow for orchestration, and Tableau for BI.
If you're building a data platform in 2026, you've probably seen some version of this stack: Fivetran pulls data in, Snowflake stores it, dbt transforms it, Airflow orchestrates the whole thing, and Tableau puts dashboards in front of your stakeholders.
This is the ELT stack that most mid-to-large data teams converge on. Not because it's the only option, but because each piece solves its part of the problem well, and they compose cleanly together.
This article walks through the full architecture — what each component does, how data flows through the system, how to wire the pieces together, and where the sharp edges are.
What is ELT and why it won
For decades, data pipelines followed the ETL pattern: Extract data from source systems, Transform it in a staging area (often a separate server), and then Load the cleaned result into a warehouse. The transformation happened before the data landed in the warehouse because storage and compute were expensive. You only wanted to store what you actually needed.
That constraint disappeared. Cloud warehouses like Snowflake, BigQuery, and Redshift made storage cheap and compute elastic. You can now load terabytes of raw data and run transformations directly in the warehouse — no intermediate staging server required.
ELT flips the order: Extract, Load raw data into the warehouse, then Transform it in place using SQL. This approach has a few clear advantages:
- Raw data is always available. If a transformation is wrong, you fix the SQL and re-run it. You don't need to re-extract from source.
- Transformations are just SQL. Your data analysts and analytics engineers can own the logic without learning Spark, Python ETL frameworks, or custom tooling.
- The warehouse handles scale. Snowflake can spin up a bigger warehouse for a heavy transformation and shut it down when it's done. You don't manage infrastructure for the compute layer.
ELT won because it's simpler, more flexible, and better suited to how modern teams actually work. The stack described in this article is the most common implementation of this pattern.
The stack overview
Here's what each piece does and why it's there.
Fivetran — Ingestion
Fivetran is a managed data ingestion service. It connects to your SaaS tools (Salesforce, Stripe, HubSpot, Google Analytics), databases (PostgreSQL, MySQL, MongoDB), and file stores, then replicates that data into your warehouse on a schedule.
The key value is that Fivetran handles the connector maintenance. APIs change, rate limits shift, pagination logic gets complicated. Fivetran's engineering team deals with all of that. You configure a connector, point it at a destination schema in Snowflake, and it keeps the data flowing.
Fivetran loads data into your warehouse in a raw, mostly-unmodified form. It creates schemas and tables that mirror the source, adds metadata columns for sync tracking, and handles incremental updates. This raw data becomes the input for your dbt transformations.
Snowflake — Storage and compute
Snowflake is the cloud data warehouse at the center of this stack. It stores all your data — raw, intermediate, and transformed — and provides the compute engine to run transformations and queries.
What makes Snowflake particularly well-suited for ELT is the separation of storage and compute. Your data lives in cheap cloud storage (S3/Azure Blob/GCS under the hood), and you spin up compute warehouses only when you need them. A small warehouse handles Fivetran syncs and light queries. A larger warehouse runs your dbt transformations. Tableau queries hit yet another warehouse so they don't compete with transformation workloads.
This means you can scale each workload independently and only pay for compute when it's running.
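The per-workload split above can be sketched as warehouse DDL, generated here from Python for illustration. The warehouse names, sizes, and the 60-second suspend timeout are starting-point assumptions, not a prescription — tune them to your own workloads.

```python
# Sketch: one Snowflake warehouse per workload so loads, transforms, and BI
# queries never contend for compute. Names and sizes are assumptions.

def warehouse_ddl(name: str, size: str, auto_suspend_seconds: int = 60) -> str:
    """Build a CREATE WAREHOUSE statement for one workload."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"AUTO_RESUME = TRUE "
        f"INITIALLY_SUSPENDED = TRUE;"
    )

workloads = {
    "LOAD_WH": "XSMALL",       # Fivetran syncs
    "TRANSFORM_WH": "MEDIUM",  # dbt runs
    "REPORTING_WH": "XSMALL",  # Tableau queries
}

statements = [warehouse_ddl(name, size) for name, size in workloads.items()]
for stmt in statements:
    print(stmt)
```

Because each warehouse auto-suspends after a minute of inactivity and auto-resumes on the next query, you only pay while a workload is actually running.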
dbt — Transformation
dbt (data build tool) handles the "T" in ELT. It lets you write transformations as SQL SELECT statements, organized into models that build on each other. dbt handles the DDL — it figures out whether to create a table, a view, or an incremental merge based on your configuration.
A typical dbt project for this stack follows the medallion architecture with three layers:
- Bronze (staging) models that clean and rename raw Fivetran tables
- Silver (intermediate) models that join and aggregate across sources
- Gold (marts) models that serve specific business domains or use cases (these are what Tableau connects to)
dbt also gives you testing, documentation, and lineage tracking. You can write assertions like "this column should never be null" or "this should be a unique key," and dbt will flag failures before bad data reaches your dashboards.
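Those assertions live in a YAML file next to the models. A minimal sketch for a staging model — the file name and the accepted status values are illustrative, not prescriptive:

```yaml
# models/staging/stripe/_stripe__models.yml (illustrative sketch)
version: 2

models:
  - name: stg_stripe__charges
    columns:
      - name: charge_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['succeeded', 'pending', 'failed']
```

Running `dbt build` executes these tests alongside the models, so a failed uniqueness check stops downstream models before bad data reaches the marts.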
Airflow — Orchestration
Airflow is the conductor. It doesn't move data or transform it — it tells the other tools when to run and in what order.
A typical daily pipeline looks like this:
- Trigger Fivetran syncs for all relevant connectors
- Wait for syncs to complete
- Run dbt build (models + tests)
- Send a Slack notification on success or failure
Airflow manages the dependencies between these steps, handles retries, and gives you a UI to monitor runs. Without orchestration, you'd be running dbt on a cron job and hoping Fivetran finished syncing before the transformation starts.
Tableau — BI and visualization
Tableau connects to Snowflake's mart layer and provides the dashboards, reports, and self-service analytics that business users interact with. It's the consumer at the end of the pipeline.
Tableau connects directly to Snowflake using a dedicated read-only warehouse. It queries the mart models that dbt produces — clean, well-documented, pre-aggregated tables designed for reporting. Your Tableau developers don't need to write complex SQL or understand the raw data; they work with curated datasets.
How data flows through the stack
Let's trace a concrete example. Say you're tracking revenue from Stripe.
Step 1: Fivetran syncs Stripe data into Snowflake. Fivetran's Stripe connector pulls charges, customers, subscriptions, invoices, and other tables. These land in a raw_stripe schema in Snowflake, with columns matching the Stripe API response.
Step 2: dbt Bronze models clean the raw data. A Bronze (staging) model like stg_stripe__charges renames columns, casts types, and filters out test data:
```sql
-- models/staging/stripe/stg_stripe__charges.sql
with source as (
    select * from {{ source('stripe', 'charges') }}
),

renamed as (
    select
        id as charge_id,
        customer as customer_id,
        amount::number / 100 as amount_dollars,
        currency,
        status,
        created::timestamp_ntz as created_at,
        _fivetran_synced as synced_at
    from source
    where not _fivetran_deleted
)

select * from renamed
```
Step 3: Silver models join across sources. You might join Stripe charges with your application database to enrich payment data with internal user IDs and account metadata.
Step 4: Gold models serve business-ready data. A fct_revenue model aggregates daily revenue by product, region, or customer segment. This is what Tableau connects to.
```sql
-- models/marts/finance/fct_daily_revenue.sql
with charges as (
    select * from {{ ref('int_charges_enriched') }}
),

daily_revenue as (
    select
        date_trunc('day', created_at) as revenue_date,
        product_category,
        count(*) as transaction_count,
        sum(amount_dollars) as total_revenue,
        avg(amount_dollars) as avg_transaction_value
    from charges
    where status = 'succeeded'
    group by 1, 2
)

select * from daily_revenue
```
Step 5: Tableau reads from the mart layer. A Tableau workbook connects to the fct_daily_revenue table in Snowflake and builds revenue trend charts, product breakdowns, and executive dashboards.
Setting up the pipeline
Fivetran to Snowflake
The Fivetran setup is straightforward. In Fivetran's UI, you create a destination pointing to your Snowflake account, configure a service user with write access to raw schemas, and then add connectors for each data source. Fivetran handles schema creation and ongoing replication.
The key decision here is sync frequency. Fivetran charges based on monthly active rows (MAR), so more frequent syncs cost more. Most teams settle on 6-hour or daily syncs for the bulk of their sources, with 15-minute syncs reserved for critical operational data.
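The tiering above can be codified so connectors get a schedule by criticality rather than ad hoc. This sketch assumes Fivetran's REST API accepts a PATCH to `/v1/connectors/{id}` with a `sync_frequency` field in minutes and Basic auth — the tier names are mine, so verify the endpoint shape against Fivetran's API reference before relying on it. The request is built but deliberately not sent.

```python
# Sketch: map criticality tiers to Fivetran sync frequencies, and build the
# REST call that would apply one. Endpoint and field names are assumptions.
import json
import urllib.request

# Fivetran bills on monthly active rows (MAR), so slower syncs cost less.
SYNC_FREQUENCY_MINUTES = {
    "critical": 15,    # operational data that dashboards depend on
    "standard": 360,   # 6-hour syncs for most sources
    "low": 1440,       # daily is fine for slow-moving sources
}

def build_schedule_request(connector_id: str, basic_auth: str, tier: str) -> urllib.request.Request:
    """Build (but don't send) a PATCH that sets a connector's sync schedule."""
    body = json.dumps({"sync_frequency": SYNC_FREQUENCY_MINUTES[tier]}).encode()
    return urllib.request.Request(
        url=f"https://api.fivetran.com/v1/connectors/{connector_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Basic {basic_auth}",  # base64(api_key:api_secret)
            "Content-Type": "application/json",
        },
    )

req = build_schedule_request("stripe_connector_id", "BASE64_KEY_SECRET", "standard")
print(req.get_method(), req.full_url)
```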
dbt project structure
Your dbt project points at Snowflake via profiles.yml:
```yaml
# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORM_WH
      schema: dbt_prod
      threads: 8
```
The project structure follows the Bronze/Silver/Gold (staging/intermediate/marts) convention:
```
models/
  staging/
    stripe/
      _stripe__sources.yml
      stg_stripe__charges.sql
      stg_stripe__customers.sql
    salesforce/
      _salesforce__sources.yml
      stg_salesforce__opportunities.sql
  intermediate/
    int_charges_enriched.sql
  marts/
    finance/
      fct_daily_revenue.sql
      dim_customers.sql
    marketing/
      fct_campaign_performance.sql
```
The Airflow DAG
Here's a simplified Airflow DAG that coordinates the full pipeline:
```python
# dags/elt_daily_pipeline.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.fivetran.operators.fivetran import FivetranOperator
from airflow.providers.fivetran.sensors.fivetran import FivetranSensor
from airflow.operators.bash import BashOperator
from airflow.providers.slack.notifications.slack import send_slack_notification
from datetime import datetime, timedelta

default_args = {
    "owner": "data-team",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": send_slack_notification(
        text="ELT pipeline failed: {{ task.task_id }}"
    ),
}

with DAG(
    dag_id="elt_daily_pipeline",
    default_args=default_args,
    schedule="0 6 * * *",  # 6 AM UTC daily
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=["elt", "production"],
) as dag:
    # Step 1: Trigger Fivetran sync
    trigger_fivetran_stripe = FivetranOperator(
        task_id="trigger_fivetran_stripe",
        fivetran_conn_id="fivetran_default",
        connector_id="stripe_connector_id",
    )

    # Step 2: Wait for sync to complete
    wait_for_stripe = FivetranSensor(
        task_id="wait_for_stripe_sync",
        fivetran_conn_id="fivetran_default",
        connector_id="stripe_connector_id",
        poke_interval=60,
        timeout=3600,
    )

    # Step 3: Run dbt
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command=(
            "cd /opt/dbt/my_project && "
            "dbt build --profiles-dir /opt/dbt/profiles --target prod"
        ),
    )

    # Step 4: Notify on success
    notify_success = PythonOperator(
        task_id="notify_success",
        python_callable=lambda: print("Pipeline completed successfully"),
    )

    # Define dependencies
    trigger_fivetran_stripe >> wait_for_stripe >> dbt_build >> notify_success
```
This is a simplified version. A real DAG would trigger multiple Fivetran connectors in parallel, use Cosmos for per-model dbt task visibility, and have more sophisticated error handling. But the core pattern is always the same: sync, wait, transform, notify.
Why these components (and not alternatives)
Every component in this stack has alternatives. Here's why teams often land on these choices — and when you might choose differently.
Fivetran over Airbyte or Stitch. Fivetran's connector quality and reliability are the main draw. Airbyte is open-source and cheaper, but you host it yourself, and connector stability varies. Stitch (now part of Talend) is simpler but has fewer connectors. If budget is tight and you have engineering capacity, Airbyte is a legitimate choice. If you want connectors that just work and you'd rather not think about ingestion, Fivetran earns its price tag.
Snowflake over Redshift, BigQuery, or Databricks. Snowflake's separation of storage and compute makes it particularly clean for ELT workloads — you can run Fivetran loads, dbt transformations, and Tableau queries on separate warehouses without contention. BigQuery is excellent and arguably simpler (no warehouse sizing decisions), but Tableau's Snowflake integration is more mature. Redshift has improved a lot with Serverless, but the Snowflake ecosystem for ELT (especially dbt + Fivetran) has deeper tooling and community support. Databricks is great if you also need ML/data science workloads alongside your analytics, but it's more complex for pure BI use cases.
dbt over custom SQL scripts or Dataform. dbt has won the transformation layer. The community, package ecosystem, testing framework, and documentation capabilities are unmatched. Dataform (now Google-owned) is solid but tightly coupled to BigQuery. Custom SQL scripts work but give you none of the dependency management, testing, or documentation.
Airflow over Dagster, Prefect, or cron. Airflow is the most widely deployed orchestrator, with the largest community, most integrations (including first-party Fivetran operators), and the biggest hiring pool. Dagster and Prefect offer better developer experience and modern abstractions, but they have smaller ecosystems. Cron works until it doesn't — no dependency management, no retries, no visibility.
Tableau over Looker, Power BI, or Metabase. Tableau's strength is in ad-hoc exploration and polished dashboards. Looker's modeling layer (LookML) is powerful if you want to centralize metrics definitions, but it adds another modeling layer on top of dbt. Power BI is cheaper and deeply integrated with Microsoft shops. Metabase is great for startups that want something free and simple. Tableau wins in organizations that value visual exploration and have analysts who invest in dashboard craft.
Common pitfalls
This stack works well, but there are failure modes you should know about.
Fivetran schema drift breaking dbt models
Fivetran replicates source schemas automatically. When someone adds a column in Salesforce or a Stripe API version changes, Fivetran picks it up and adds it to your raw tables. Usually this is harmless — your dbt staging models select specific columns, so new columns are ignored.
The problem is when columns get renamed or removed at the source. Your dbt models will fail with a "column not found" error. The fix is to add dbt source freshness tests and schema tests that catch these changes early, and to configure Fivetran's column blocking if you want to control what gets synced.
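Source freshness checks go in the source definition file the staging layer already uses. A sketch, assuming Fivetran lands Stripe data in a raw_stripe schema and stamps rows with _fivetran_synced (both consistent with the setup above); the thresholds are illustrative:

```yaml
# models/staging/stripe/_stripe__sources.yml (freshness sketch)
version: 2

sources:
  - name: stripe
    schema: raw_stripe
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: charges
      - name: customers
```

With this in place, `dbt source freshness` fails loudly when a connector has silently stopped syncing, before stale data flows into transformations.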
Snowflake warehouse sizing
It's easy to over-provision (expensive) or under-provision (slow). Common mistakes include running all workloads on a single warehouse, never suspending idle warehouses, and using XSMALL for heavy dbt full-refreshes.
The general approach: start with XSMALL for Fivetran loads and Tableau queries, SMALL or MEDIUM for dbt runs, and set auto-suspend to 60 seconds. Monitor query times and scale up only where you see bottlenecks. Snowflake's resource monitors help keep costs predictable.
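The cost stakes behind auto-suspend are easy to quantify. Snowflake credits per hour double with each warehouse size step (XSMALL is 1 credit/hour); the $3-per-credit price below is an assumption — actual per-credit pricing depends on your edition and cloud provider.

```python
# Sketch: back-of-envelope Snowflake compute cost for one warehouse.
# Credit rates follow Snowflake's doubling-per-size scale; the per-credit
# price is an assumption.

CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "XXLARGE": 32,
}

def monthly_compute_cost(size: str, active_hours_per_day: float,
                         price_per_credit: float = 3.0) -> float:
    """Estimate monthly cost, assuming auto-suspend means you only pay
    for hours the warehouse is actually active."""
    return CREDITS_PER_HOUR[size] * active_hours_per_day * 30 * price_per_credit

# A MEDIUM warehouse running dbt ~2 hours a day vs. never suspending:
print(monthly_compute_cost("MEDIUM", 2))   # → 720.0
print(monthly_compute_cost("MEDIUM", 24))  # → 8640.0
```

The 12x gap between those two numbers is why a 60-second auto-suspend is usually the single highest-leverage cost setting.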
Airflow complexity
Airflow is powerful, but it's also a distributed system with a web server, scheduler, workers, a metadata database, and a message broker. It needs monitoring, maintenance, and upgrades. DAG authoring requires Python knowledge, and debugging failed tasks often means reading Airflow logs across multiple components.
Many teams spend more time maintaining Airflow than writing dbt models. This isn't a criticism of Airflow — it's a reflection of the fact that orchestration is genuinely hard infrastructure.
Fivetran and dbt timing mismatches
If your Airflow DAG triggers a dbt run before Fivetran finishes syncing, you'll transform stale data. The Airflow DAG above handles this with the FivetranSensor, but teams that run dbt on a simple cron schedule (without orchestration) hit this problem constantly. Always use a sensor or webhook-based trigger rather than assuming the sync is done because "it usually takes 20 minutes."
Tableau extract refresh conflicts
Tableau extracts that refresh while dbt is rebuilding tables can fail or return partial data. Schedule Tableau extract refreshes to start after your dbt pipeline completes. Some teams use Tableau's REST API to trigger refreshes as the final step in their Airflow DAG, ensuring the pipeline is truly end-to-end.
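The REST-API approach can be sketched as a final pipeline step. This assumes Tableau Server's "refresh datasource" endpoint (`POST /api/<version>/sites/<site-id>/datasources/<datasource-id>/refresh`, authenticated with a token from the sign-in endpoint); the server URL, API version, and IDs are placeholders, and the request is built but deliberately not sent.

```python
# Sketch: trigger a Tableau extract refresh after dbt completes, so the
# extract never rebuilds against half-finished tables. Endpoint shape per
# Tableau's REST API; URL/version/IDs here are placeholders.
import urllib.request

def build_refresh_request(server: str, api_version: str, site_id: str,
                          datasource_id: str, auth_token: str) -> urllib.request.Request:
    """Build (but don't send) the refresh call; auth_token comes from the
    REST API sign-in endpoint."""
    return urllib.request.Request(
        url=f"{server}/api/{api_version}/sites/{site_id}/datasources/{datasource_id}/refresh",
        data=b"<tsRequest/>",  # empty request body, per the XML API convention
        method="POST",
        headers={"X-Tableau-Auth": auth_token, "Content-Type": "application/xml"},
    )

req = build_refresh_request(
    "https://tableau.example.com", "3.22", "site-uuid", "datasource-uuid", "token"
)
print(req.get_method(), req.full_url)
```

In an Airflow DAG, a task wrapping a call like this would sit downstream of dbt_build, making the pipeline end-to-end: sync, transform, then refresh the extracts.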
The orchestration challenge
Of the five components in this stack, four are managed services or relatively self-contained tools. Fivetran is fully managed. Snowflake is fully managed. dbt Core is a CLI tool — you install it, point it at a warehouse, and run it. Tableau is a desktop app or a managed server.
Airflow is the outlier. It's the piece that connects everything, and it's also the piece that requires the most operational overhead to run.
Self-managing Airflow means:
- Infrastructure: Running the webserver, scheduler, and workers on VMs or Kubernetes
- Database: Maintaining the Airflow metadata database (PostgreSQL), handling migrations during upgrades
- Scaling: Adding workers as your DAG count grows, tuning the scheduler for performance
- Upgrades: Airflow major version upgrades are notoriously painful, often requiring DAG rewrites
- Security: Managing connections and variables that hold warehouse credentials, API keys, and secrets
- Monitoring: Watching for scheduler hangs, worker OOM kills, and task queue backlogs
There are managed Airflow services (Astronomer, Amazon MWAA, Google Cloud Composer), but they add cost and still require you to author and maintain DAGs.
The irony is that for most dbt-centric teams, the orchestration need is simple: run dbt on a schedule, maybe trigger some syncs first, and notify on failure. You don't need Airflow's full power — but there aren't many good alternatives in between "cron" and "full Airflow deployment."
A simpler path for the orchestration layer
If your primary workload is running dbt against Snowflake on a schedule, you may not need to operate Airflow yourself.
ModelDock runs Airflow under the hood and exposes a simple interface for scheduling and monitoring dbt runs. You connect your Git repository and your Snowflake credentials, configure a schedule, and ModelDock handles the orchestration infrastructure — the Airflow deployment, DAG generation, log management, and run monitoring.
Your Fivetran syncs keep loading raw data into Snowflake. Your dbt models keep transforming it. Your Tableau dashboards keep reading from marts. The only thing that changes is you stop managing Airflow.
If you're building this stack and dreading the orchestration layer, give ModelDock a try — it's free to start, and you can focus on your dbt models and Tableau dashboards instead of Airflow infrastructure.