dbt-core · architecture · production

Building a Data Platform on Microsoft Fabric: Data Factory + Fabric DWH + dbt + Airflow + Power BI

How to build an end-to-end data platform on Microsoft Fabric with Data Factory for ingestion, Fabric Data Warehouse for storage, dbt for transformation, Airflow for orchestration, and Power BI for reporting.

ModelDock Team · February 17, 2026 · 14 min read

Microsoft Fabric is Microsoft's bet on a unified analytics platform, and plenty of organizations are building their entire data stack on top of it. The pitch is compelling: one platform that handles ingestion, storage, transformation, and reporting. But in practice, you still need to make architectural decisions about how these pieces fit together — especially when dbt is part of the picture.

This guide walks through a complete Microsoft Fabric architecture: Data Factory for ingestion, Fabric Data Warehouse for storage and compute, dbt for transformation, Airflow for orchestration, and Power BI for reporting. We'll cover how data flows through the system, what the configuration looks like, and where the rough edges are.

What is Microsoft Fabric?

If you've worked with Azure Synapse Analytics, think of Fabric as its successor — but bigger in scope. Fabric is a SaaS analytics platform that bundles data engineering, data warehousing, real-time analytics, data science, and Power BI into a single product. Everything sits on top of OneLake, Microsoft's unified storage layer.

Fabric reached general availability in November 2023, which makes it relatively young by data platform standards. It's evolving fast — features ship monthly, pricing models get adjusted, and adapters like dbt-fabric are still maturing. That's not a reason to avoid it, but it's worth keeping in mind. Things described here may look slightly different six months from now.

The core advantage of Fabric is consolidation. Instead of stitching together Azure Data Factory, Azure Synapse, Azure Data Lake Storage, and Power BI Premium as separate services with separate billing, Fabric puts them all under one roof with a unified capacity model. For organizations already deep in the Microsoft ecosystem, this simplifies both architecture and procurement.

The Stack Overview

Here's the full architecture and what each component is responsible for:

Data Factory (Ingestion) — Fabric's built-in data movement tool. You build pipelines and dataflows to pull data from source systems — databases, SaaS APIs, flat files, cloud storage — and land it in your Fabric workspace. If you've used Azure Data Factory, the experience is nearly identical. Data Factory in Fabric supports 100+ connectors out of the box.

Fabric Data Warehouse (Storage and Compute) — This is where your data lives and where queries run. Fabric DWH is T-SQL compatible, which means your existing SQL Server and Synapse knowledge transfers directly. Storage is handled automatically through OneLake (you don't manage files or partitions), and compute scales based on your Fabric capacity units.

dbt (Transformation) — SQL-based transformations using the dbt-fabric adapter. You write models in T-SQL, dbt compiles and runs them against the Fabric DWH. Version control, testing, documentation, and lineage come along for free. This is the layer where raw data becomes analytics-ready.

Airflow (Orchestration) — The glue that ties everything together on a schedule. Airflow triggers Data Factory pipelines, waits for them to complete, runs dbt, and optionally refreshes Power BI datasets. Fabric has its own pipeline orchestration, but it can't run dbt CLI natively — you need external orchestration for that.

Power BI (BI and Reporting) — Deeply integrated with Fabric. Power BI can read directly from Fabric Data Warehouse tables using Direct Lake mode, which gives you near-real-time dashboards without the overhead of data imports. Semantic models, row-level security, and enterprise distribution are all built in.

How Data Flows Through the System

The end-to-end data flow looks like this:

  1. Ingestion: Data Factory pipelines extract data from source systems (databases, APIs, files) and load it into raw tables in the Fabric Data Warehouse. This is your EL step — extract and load, no transformation yet.
  2. Bronze (Staging): dbt picks up raw data and applies light transformations — renaming columns, casting types, deduplication. These are your Bronze staging models.
  3. Silver & Gold (Transformation): dbt builds Silver (intermediate) and Gold (mart) models. Joins, aggregations, business logic, metrics — all in T-SQL, all version controlled and tested.
  4. Reporting: Power BI connects to the Gold tables in Fabric DWH. With Direct Lake mode, Power BI reads directly from the underlying Parquet files in OneLake — no data import step needed. For more complex semantic models, you can use import mode from the same Gold tables.

The orchestration layer (Airflow) coordinates the sequence: trigger Data Factory pipeline, wait for completion, run dbt build, optionally refresh Power BI datasets via the Power BI REST API.

Setting Up the Pipeline

Let's walk through the key configuration pieces.

Fabric Workspace and Capacity

Before anything else, you need a Fabric workspace with an active capacity assigned. Fabric capacity is measured in Capacity Units (CUs), and you need at least an F2 SKU for development work. A few things to note:

  • If you're using a trial capacity, it expires after 60 days
  • Pay-as-you-go capacities can be paused — when paused, nothing works (Data Factory, DWH, Power BI)
  • Create your Data Warehouse and any Lakehouses within the same workspace

Create a Data Warehouse through the Fabric portal: go to your workspace, click + New, and select Warehouse. Note the SQL connection string — you'll need it for the dbt profile.
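Before wiring up dbt, it can be useful to sanity-check connectivity to that T-SQL endpoint directly. The sketch below builds an ODBC connection string for service principal auth — the helper function is our own, not part of any Fabric SDK, and the exact connection keywords are worth verifying against the Microsoft ODBC driver docs:

```python
def fabric_odbc_connection_string(server: str, database: str,
                                  client_id: str, client_secret: str) -> str:
    """Build an ODBC connection string for the Fabric DWH T-SQL endpoint
    using a service principal. This approximates what the dbt-fabric
    adapter constructs under the hood."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server},1433;"
        f"Database={database};"
        f"UID={client_id};"
        f"PWD={client_secret};"
        "Authentication=ActiveDirectoryServicePrincipal;"
        "Encrypt=yes;"
    )

conn_str = fabric_odbc_connection_string(
    "your-workspace.datawarehouse.fabric.microsoft.com",
    "your_fabric_warehouse",
    "client-id",
    "client-secret",
)

# With a live warehouse you would then probe the endpoint:
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     conn.cursor().execute("SELECT 1").fetchone()
```

If `SELECT 1` succeeds here, the dbt profile below should authenticate with the same credentials.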

dbt Project for Fabric DWH

The dbt-fabric adapter connects to Fabric Data Warehouse through its T-SQL endpoint. Install it alongside dbt-core:

```bash
pip install dbt-fabric
```

Your profiles.yml for service principal authentication (recommended for production):

```yaml
fabric_project:
  target: prod
  outputs:
    prod:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"
      server: "your-workspace.datawarehouse.fabric.microsoft.com"
      database: "your_fabric_warehouse"
      schema: "analytics"
      authentication: "ServicePrincipal"
      tenant_id: "{{ env_var('FABRIC_TENANT_ID') }}"
      client_id: "{{ env_var('FABRIC_CLIENT_ID') }}"
      client_secret: "{{ env_var('FABRIC_CLIENT_SECRET') }}"
      threads: 4
    dev:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"
      server: "your-workspace.datawarehouse.fabric.microsoft.com"
      database: "your_fabric_warehouse"
      schema: "dbt_dev"
      authentication: "CLI"
      threads: 4
```

A few notes on this configuration:

  • Service principal auth is the only viable option for automated production runs. Azure CLI auth works for local development but requires an interactive login session.
  • The server value comes from the SQL connection string in your Fabric workspace. It follows the pattern your-workspace.datawarehouse.fabric.microsoft.com.
  • The database is the name of your Fabric Data Warehouse item.
  • You need the Microsoft ODBC Driver 18 installed on whatever machine runs dbt (your CI server, Airflow worker, Docker container).

To register a service principal with access to your Fabric workspace, you'll need to create an App Registration in Microsoft Entra ID (formerly Azure AD), grant it a role in your Fabric workspace (at least Contributor), and enable service principal access in Fabric admin settings.
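Once the app registration exists, it's worth confirming the credentials can actually get a token before pointing dbt at them. This sketch only builds the client-credentials request — the function name is ours, and the SQL resource scope shown is the one commonly used for T-SQL endpoints (verify it against current Fabric docs for your tenant):

```python
import os

def fabric_sql_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the OAuth client-credentials request used to verify that the
    service principal can authenticate against the SQL resource."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed scope for Azure SQL / Fabric T-SQL endpoints
        "scope": "https://database.windows.net/.default",
    }
    return url, payload

url, payload = fabric_sql_token_request(
    os.environ.get("FABRIC_TENANT_ID", "tenant-id"),
    os.environ.get("FABRIC_CLIENT_ID", "client-id"),
    os.environ.get("FABRIC_CLIENT_SECRET", "client-secret"),
)

# With real credentials, POST the request and expect an access_token back:
# import requests
# resp = requests.post(url, data=payload, timeout=30)
# resp.raise_for_status()
# assert "access_token" in resp.json()
```

A 200 response with an `access_token` confirms the tenant admin settings and secret are correct; a 401 here means dbt will fail the same way.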

Airflow DAG Example

Here's a simplified Airflow DAG that orchestrates the full pipeline — trigger Data Factory, run dbt, and optionally refresh a Power BI dataset:

```python
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.microsoft.azure.operators.data_factory import (
    AzureDataFactoryRunPipelineOperator,
)

default_args = {
    "owner": "data-team",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "fabric_data_pipeline",
    default_args=default_args,
    schedule="0 6 * * *",  # Daily at 6 AM UTC
    start_date=datetime(2026, 1, 1),
    catchup=False,
)

# Step 1: Trigger Data Factory pipeline for ingestion
run_adf_pipeline = AzureDataFactoryRunPipelineOperator(
    task_id="run_ingestion_pipeline",
    pipeline_name="ingest_source_data",
    azure_data_factory_conn_id="azure_data_factory",
    wait_for_termination=True,  # block until the pipeline finishes
    dag=dag,
)

# Step 2: Run dbt build
run_dbt = BashOperator(
    task_id="dbt_build",
    bash_command=(
        "cd /opt/dbt/fabric-project && "
        "dbt build --target prod --profiles-dir /opt/dbt/profiles"
    ),
    dag=dag,
)

# Step 3: Refresh Power BI dataset (optional)
def refresh_power_bi_dataset():
    """Trigger a Power BI dataset refresh via REST API."""
    tenant_id = "your-tenant-id"
    client_id = "your-client-id"
    client_secret = "your-client-secret"
    dataset_id = "your-dataset-id"
    group_id = "your-workspace-id"

    # Get an access token for the Power BI API
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    token_response = requests.post(
        token_url,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://analysis.windows.net/powerbi/api/.default",
        },
        timeout=60,
    )
    token_response.raise_for_status()
    access_token = token_response.json()["access_token"]

    # Trigger the dataset refresh
    refresh_url = (
        f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
        f"/datasets/{dataset_id}/refreshes"
    )
    response = requests.post(
        refresh_url,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=60,
    )
    response.raise_for_status()

refresh_pbi = PythonOperator(
    task_id="refresh_power_bi",
    python_callable=refresh_power_bi_dataset,
    dag=dag,
)

run_adf_pipeline >> run_dbt >> refresh_pbi
```

This is a simplified version. In production, you'd pull credentials from Airflow connections or a secrets backend, add alerting on failure, and possibly split the dbt run into separate dbt run and dbt test tasks for better observability.
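As one piece of that hardening, the hardcoded placeholders in `refresh_power_bi_dataset` can be replaced with a fail-fast credential loader. The sketch below reads from environment variables — the variable names are our own convention, not an Airflow or Power BI requirement, and with Airflow you could equally use a Connection or a secrets backend such as Azure Key Vault:

```python
import os

REQUIRED_VARS = ("PBI_TENANT_ID", "PBI_CLIENT_ID", "PBI_CLIENT_SECRET")

def load_power_bi_credentials(env=None):
    """Read Power BI service principal credentials from the environment,
    raising a clear error before any API call is attempted if one is
    missing. Pass a dict for testing; defaults to os.environ."""
    env = dict(os.environ) if env is None else dict(env)
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# In the DAG, refresh_power_bi_dataset() would call this instead of
# using inline placeholder strings.
```

Failing fast here turns a cryptic 401 from the Power BI API into an actionable error at the top of the task log.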

Note that the DAG above uses the Airflow Azure provider (apache-airflow-providers-microsoft-azure), whose Data Factory operators target classic Azure Data Factory. If your ingestion pipelines live in classic ADF, they work as-is; if they are Fabric-native Data Factory pipelines, check whether your provider version can trigger them, or call the Fabric REST API from a PythonOperator instead.

Fabric-Specific Considerations

Capacity Units and Pricing

Fabric pricing is fundamentally different from per-query platforms like BigQuery or Snowflake. You pay for a capacity reservation measured in CUs (Capacity Units), not for individual queries or storage.

This has real implications for how you think about cost:

  • No per-query billing: A badly written dbt model doesn't directly cost more money. It just consumes more of your shared capacity, which may slow down other workloads.
  • Capacity can throttle: If your dbt run + Data Factory pipeline + Power BI refreshes all fire at the same time, they're all competing for the same CU pool. Fabric will throttle or queue workloads when demand exceeds capacity.
  • Right-sizing matters: Too small a capacity and everything runs slowly or gets throttled. Too large and you're paying for idle CUs. There's no autoscaling in the traditional sense — you need to manually scale or use Azure autoscale rules on the Fabric capacity resource.

For a dbt workload, you'll want at least an F4 capacity for anything beyond toy projects. An F8 or F16 is more realistic for production workloads with concurrent Data Factory pipelines and Power BI refreshes.

Fabric Data Warehouse vs. Fabric Lakehouse

This is a common source of confusion. Both are storage and compute options within Fabric, but they serve different purposes:

Fabric Data Warehouse is the right choice for dbt. It speaks T-SQL, supports the dbt-fabric adapter (which is more mature), allows service principal authentication, and behaves like a traditional SQL warehouse. If your analytics engineers write SQL and you want a familiar workflow, this is the path.

Fabric Lakehouse is better suited for Spark-based workloads — data scientists running notebooks, data engineers doing heavy ETL in PySpark. There's a dbt-fabricspark adapter, but it connects through the Livy endpoint (Spark), has fewer authentication options (no service principal support at the time of writing), and the Spark SQL dialect is different from T-SQL.

You can use both in the same workspace. A common pattern is to land raw data in a Lakehouse (for Spark processing), then use shortcuts or cross-database queries to expose that data to the Data Warehouse where dbt transforms it.
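That cross-database pattern can be sketched concretely. Assuming a Lakehouse named `raw_lakehouse` in the same workspace (all names below are placeholders), a Warehouse-side CTAS can read its tables via three-part naming — the helper just builds the T-SQL statement, which you would run through dbt or any T-SQL client:

```python
def lakehouse_to_warehouse_ctas(lakehouse: str, source_table: str,
                                target_table: str, schema: str = "dbo") -> str:
    """Build a CREATE TABLE AS SELECT that copies a Lakehouse table into
    the Warehouse using a cross-database three-part name."""
    return (
        f"CREATE TABLE {target_table} AS\n"
        "SELECT *\n"
        f"FROM [{lakehouse}].[{schema}].[{source_table}];"
    )

sql = lakehouse_to_warehouse_ctas("raw_lakehouse", "events", "staging.events_raw")
print(sql)
```

In a dbt project, the same three-part reference usually lives in a source definition rather than a hand-built string, but the underlying SQL is the same.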

Service Principal vs. CLI Authentication

For local development, Azure CLI authentication (az login) works fine. But for production — Airflow, CI/CD, any automated process — you need service principal authentication.

Setting up a service principal for Fabric:

  1. Create an App Registration in Microsoft Entra ID
  2. Generate a client secret
  3. In the Fabric admin portal, enable "Service principals can use Fabric APIs" under Tenant settings
  4. Add the service principal to your Fabric workspace with at least Contributor role
  5. Use the tenant ID, client ID, and client secret in your dbt profile

This is more setup than most data platforms require, and it's the step that most frequently trips up teams migrating to Fabric.

OneLake and Data Sharing

OneLake is the storage layer underlying all Fabric items. Every Data Warehouse table, every Lakehouse file, every Data Factory output — it all lives in OneLake. This means:

  • Power BI can use Direct Lake mode to read directly from OneLake without importing data
  • You can create shortcuts between Fabric items to share data without copying it
  • External tools can access OneLake data through ADLS Gen2-compatible APIs

For the dbt + Power BI flow, this is a significant advantage. Once dbt writes Gold (mart) tables to the Fabric Data Warehouse, Power BI can read them through Direct Lake with zero additional data movement. No scheduled imports, no stale data.
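For external tools, those tables are addressable through ABFS-style OneLake paths. The layout below (workspace, then item name with an item-type suffix, then `Tables`) reflects the commonly documented pattern, but treat it as an assumption to verify against the OneLake docs — the path conventions have shifted before:

```python
def onelake_table_path(workspace: str, warehouse: str,
                       schema: str, table: str) -> str:
    """Build an ABFS-style OneLake path to a Fabric Warehouse table's
    underlying files, for use with ADLS Gen2-compatible clients."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{warehouse}.Datawarehouse/Tables/{schema}/{table}"
    )

path = onelake_table_path(
    "analytics-ws", "your_fabric_warehouse", "analytics", "fct_orders"
)
# External engines (Spark, or azure-storage-file-datalake with an Entra
# credential) can read this location via the ADLS Gen2-compatible API.
```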

Why Add External Airflow?

This is the most common question teams ask. Fabric has its own pipeline orchestration — Data Factory pipelines can schedule tasks, chain activities, handle retries, and trigger on events. So why add Airflow?

The short answer: Fabric Pipelines cannot run dbt CLI natively.

Fabric Pipelines can trigger:

  • Data Factory copy activities
  • Spark notebooks
  • Stored procedures
  • Fabric Dataflows

What they cannot do is run an arbitrary command like dbt build. There's no "run a shell command" activity in Fabric Pipelines, and dbt isn't available as a built-in Fabric activity.

You could work around this by wrapping dbt commands in a Spark notebook (installing dbt-fabric with %pip and shelling out to dbt build), but that's fragile, slow (Spark session startup), and hard to maintain. You lose dbt's native logging, artifact generation, and the ability to use dbt's built-in features properly.

External Airflow gives you:

  • Full control over the dbt runtime: Install exact dbt and adapter versions, manage Python dependencies, use dbt commands as designed
  • Unified orchestration: One place to see and manage the entire pipeline — ingestion, transformation, BI refresh
  • Better error handling: Airflow's retry logic, alerting, and task-level logging are more mature than Fabric Pipelines
  • Portability: If you ever move off Fabric (or add non-Fabric data sources), your orchestration layer doesn't need to change

The trade-off is that you're running another service outside of Fabric. You need somewhere to host Airflow — a VM, a Kubernetes cluster, a managed Airflow service, or a platform that handles it for you.

Common Pitfalls

Capacity throttling during dbt runs. If you schedule your dbt run at the same time as a large Data Factory ingestion, both workloads compete for the same CU pool. Either stagger them (ingestion first, then dbt) or ensure your capacity is large enough to handle the combined load. The Airflow DAG approach above naturally handles this by running tasks sequentially.

The dbt-fabric adapter is still maturing. As of early 2026, dbt-fabric works well for core dbt functionality — models, tests, snapshots, sources. But some advanced features may behave differently than on more established adapters. Check the dbt-fabric GitHub repo for known issues before relying on edge-case features.

T-SQL is not full ANSI SQL. If you're porting dbt models from Snowflake or BigQuery, expect syntax differences. Common pain points include date functions, string functions, CTEs with certain operations, and window function behavior. The dbt-fabric adapter handles some of these through macros, but not all.

Fabric capacity pausing breaks everything. If you're on pay-as-you-go and your Fabric capacity gets paused (manually or by policy), all Fabric items stop working — Data Factory, Data Warehouse, Power BI. Your Airflow DAG will fail at whatever step tries to hit Fabric. Make sure your capacity stays active during scheduled runs, and set up Airflow alerts to catch this failure mode early.
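A lightweight way to catch that early is an `on_failure_callback` on the DAG. The sketch below keeps alert formatting as a plain function so it can be tested without a running Airflow; the function names are ours, the context keys match what Airflow passes to failure callbacks, and the delivery channel (Slack, email, PagerDuty) is left as an assumption:

```python
def format_failure_alert(context: dict) -> str:
    """Build a human-readable alert from an Airflow task-failure context."""
    dag = context.get("dag")
    ti = context.get("task_instance")
    dag_id = dag.dag_id if dag else "unknown-dag"
    task_id = getattr(ti, "task_id", "unknown-task")
    return (
        f"Task '{task_id}' in DAG '{dag_id}' failed. "
        "Check whether the Fabric capacity is paused or throttled before retrying."
    )

def alert_on_failure(context: dict) -> None:
    """Airflow on_failure_callback: format and deliver the alert."""
    message = format_failure_alert(context)
    print(message)  # delivery is an assumption -- swap in Slack webhook, email, etc.

# Wire it into the DAG via default_args:
# default_args = {"on_failure_callback": alert_on_failure, ...}
```

Because every Fabric-touching task fails the same way when the capacity is paused, a single DAG-level callback covers Data Factory, dbt, and Power BI steps at once.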

Power BI Direct Lake has limitations. Direct Lake mode is powerful, but it has guardrails — row limits, column limits, and fallback to DirectQuery if limits are exceeded. For large datasets, you may need to stay with import mode or carefully design your Gold tables to stay within Direct Lake's boundaries.

Fabric is changing fast. Features ship monthly. Authentication methods get updated. Admin settings move around. What worked last quarter might have a better approach now. Keep an eye on the Fabric blog and plan for periodic updates to your pipeline configuration.

Wrapping Up

The Fabric + dbt + Airflow + Power BI stack makes sense for organizations already invested in the Microsoft ecosystem. Fabric simplifies the infrastructure layer — you're not stitching together five separate Azure services — while dbt and Airflow give you the control and portability that Fabric's built-in tools don't fully provide.

The architecture isn't complicated: Data Factory loads raw data, dbt transforms it, Airflow keeps everything running on schedule, and Power BI reads the results directly from OneLake. The complexity is in the details — capacity sizing, authentication setup, adapter quirks, and keeping up with a platform that's still finding its footing.

If you're evaluating this stack, start small. Set up a single Data Factory pipeline, connect dbt to the Fabric Data Warehouse, build a few models, and wire up a basic Airflow DAG. You'll quickly discover whether the Fabric model works for your team.

Skip the Orchestration Headaches

[ModelDock](https://modeldock.run) handles Airflow orchestration for your dbt + Fabric pipeline, with both dbt-fabric and dbt-fabricspark adapters supported. Connect your Git repo, configure your Fabric credentials, set a schedule, and your dbt project runs in an isolated environment with full logging and artifact storage.

Free during open beta. No credit card required.
