dbt-core · tutorial · warehouse

Running dbt-core with Databricks: Complete Setup Guide

Step-by-step guide to setting up dbt-core with Databricks — workspace configuration, personal access tokens, Unity Catalog, profiles.yml, and common issues.

ModelDock Team · February 17, 2026 · 11 min read

Databricks is one of the most popular data platforms for analytics engineering, and dbt-core has first-class support for it through the dbt-databricks adapter. But getting the two connected for the first time involves a few moving parts that aren't immediately obvious — personal access tokens, HTTP paths, Unity Catalog configuration, and a profiles.yml that looks different depending on whether you're using a SQL warehouse or an all-purpose cluster.

This guide walks through the entire setup from scratch. By the end, you'll have dbt-core running queries against your Databricks workspace.

Prerequisites

Before you start, make sure you have:

  • A Databricks workspace (AWS, Azure, or GCP — the setup is nearly identical across clouds)
  • A SQL warehouse or all-purpose cluster running in that workspace
  • Python 3.9+ installed locally
  • pip (comes with Python)
  • Basic familiarity with dbt (you've at least run dbt init before)

If you're starting from zero with dbt, the official dbt docs are a good place to begin. This guide assumes you know what a model is and have a dbt project ready (or are about to create one).

Installing dbt-databricks

The dbt-databricks adapter is maintained by Databricks themselves, which means it stays current with platform changes and gets features like Unity Catalog support early.

# Create a virtual environment (recommended)
python -m venv dbt-env
source dbt-env/bin/activate # On Windows: dbt-env\Scripts\activate
# Install dbt-databricks (this also installs dbt-core)
pip install dbt-databricks

Verify the installation:

dbt --version

You should see both dbt-core and dbt-databricks listed in the output. The adapter version should be compatible with your dbt-core version — pip handles this automatically.

One thing to note: dbt-databricks is different from the older dbt-spark adapter. The Databricks-specific adapter supports Unity Catalog, has better performance optimizations, and is the recommended choice. If you're migrating from dbt-spark, check the migration guide for any breaking changes.

Workspace Setup

You need two things from your Databricks workspace: a personal access token and an HTTP path.

Creating a Personal Access Token

Personal access tokens (PATs) are the most common way to authenticate dbt with Databricks. Here's how to create one:

  1. Log into your Databricks workspace
  2. Click your username in the top-right corner
  3. Select Settings
  4. Go to Developer > Access tokens
  5. Click Manage > Generate new token
  6. Give it a descriptive name (e.g., dbt-core-local) and set an expiration
  7. Copy the token immediately — you won't see it again

# The token looks something like this:
dapi1234567890abcdef1234567890abcdef

Store this somewhere secure. You'll need it for your profiles.yml in the next step. Avoid committing it to version control.

For production setups, consider using a service principal instead of a personal access token. Service principals aren't tied to individual user accounts and can be managed through your identity provider. But for getting started, a PAT is fine.

Finding the HTTP Path

The HTTP path tells dbt where to send queries. It's different depending on whether you're using a SQL warehouse or an all-purpose cluster.

For a SQL warehouse:

  1. Go to SQL Warehouses in the sidebar
  2. Click on your warehouse
  3. Go to the Connection details tab
  4. Copy the HTTP path — it looks like /sql/1.0/warehouses/abc123def456

For an all-purpose cluster:

  1. Go to Compute in the sidebar
  2. Click on your cluster
  3. Expand Advanced options > JDBC/ODBC
  4. Copy the HTTP path — it looks like /sql/protocolv1/o/1234567890/0123-456789-abcdef

You'll also need your workspace hostname. This is the URL of your Databricks workspace without the https:// prefix — something like adb-1234567890.12.azuredatabricks.net (Azure) or dbc-abc123.cloud.databricks.com (AWS).

profiles.yml Configuration

Now let's wire everything together. Your profiles.yml goes in ~/.dbt/profiles.yml by default.

Using a SQL Warehouse (Recommended)

my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "adb-1234567890.12.azuredatabricks.net"
      http_path: "/sql/1.0/warehouses/abc123def456"
      token: "dapi1234567890abcdef1234567890abcdef"
      schema: "analytics"
      threads: 4

Using an All-Purpose Cluster

my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "adb-1234567890.12.azuredatabricks.net"
      http_path: "/sql/protocolv1/o/1234567890/0123-456789-abcdef"
      token: "dapi1234567890abcdef1234567890abcdef"
      schema: "analytics"
      threads: 4

The YAML looks almost identical. The only difference is the HTTP path format. However, the behavior behind the scenes is quite different — more on that in the performance section.

Using Environment Variables

For anything beyond local development, don't hardcode credentials:

my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "{{ env_var('DATABRICKS_HTTP_PATH') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      schema: "analytics"
      threads: 4

Then set the environment variables before running dbt:

export DATABRICKS_HOST="adb-1234567890.12.azuredatabricks.net"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/abc123def456"
export DATABRICKS_TOKEN="dapi1234567890abcdef1234567890abcdef"

Unity Catalog Configuration

If your Databricks workspace uses Unity Catalog (and it should — Databricks is pushing all new workspaces in this direction), you need to specify a catalog in your profiles.yml.

Unity Catalog introduces a three-level namespace: catalog > schema > table. Without Unity Catalog, Databricks uses the legacy Hive metastore with just schema and table.

my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "adb-1234567890.12.azuredatabricks.net"
      http_path: "/sql/1.0/warehouses/abc123def456"
      token: "dapi1234567890abcdef1234567890abcdef"
      catalog: "analytics_catalog"
      schema: "dbt_dev"
      threads: 4

The key addition is the catalog field. When this is set, dbt-databricks automatically uses Unity Catalog for all operations.

Multiple Environments with Unity Catalog

A common pattern is to use different catalogs (or schemas) for development and production:

my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "{{ env_var('DATABRICKS_HTTP_PATH') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: "dev_catalog"
      schema: "dbt_{{ env_var('DBT_USER', 'dev') }}"
      threads: 4
    prod:
      type: databricks
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: "{{ env_var('DATABRICKS_HTTP_PATH') }}"
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      catalog: "prod_catalog"
      schema: "analytics"
      threads: 8

This keeps development work isolated from production data. Each developer gets their own schema within the dev catalog, and production writes to a shared schema in the prod catalog.
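To see how the per-developer schema resolves, here's a rough Python equivalent of dbt's env_var() Jinja function — a sketch for illustration, not dbt's actual implementation:

```python
import os
from typing import Optional

def env_var(name: str, default: Optional[str] = None) -> str:
    # Rough equivalent of dbt's env_var() Jinja function: return the
    # variable if set, fall back to the default, otherwise fail loudly.
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"env var '{name}' is required but not set")
    return value

os.environ.pop("DBT_USER", None)            # unset -> default applies
print("dbt_" + env_var("DBT_USER", "dev"))  # prints: dbt_dev

os.environ["DBT_USER"] = "alice"
print("dbt_" + env_var("DBT_USER", "dev"))  # prints: dbt_alice
```

With DBT_USER set per developer (e.g., in a shell profile), each person writes to their own schema such as dbt_alice, while CI and production leave it unset or pin it explicitly.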

Granting Permissions in Unity Catalog

Your token's user (or service principal) needs the right permissions. At minimum:

-- Grant usage on the catalog
GRANT USE CATALOG ON CATALOG analytics_catalog TO `dbt-service-principal`;
-- Grant usage and create on the schema
GRANT USE SCHEMA ON SCHEMA analytics_catalog.dbt_dev TO `dbt-service-principal`;
GRANT CREATE TABLE ON SCHEMA analytics_catalog.dbt_dev TO `dbt-service-principal`;
GRANT CREATE VIEW ON SCHEMA analytics_catalog.dbt_dev TO `dbt-service-principal`;
-- If dbt needs to create schemas (e.g., for custom schema names)
GRANT CREATE SCHEMA ON CATALOG analytics_catalog TO `dbt-service-principal`;

Run these in a Databricks SQL editor or notebook with admin privileges.

Creating Your First Model and Running It

Let's verify everything works. If you don't have a dbt project yet:

dbt init my_databricks_project

Follow the prompts to select the databricks adapter. This creates a project skeleton with example models.

If you already have a project, create a simple test model:

-- models/staging/stg_test.sql
select
    1 as id,
    'hello from databricks' as message,
    current_timestamp() as created_at

Now test the connection and run:

# Test the connection
dbt debug
# Install any packages
dbt deps
# Run the model
dbt run --select stg_test

If dbt debug shows all green checkmarks, your connection is working. The dbt run command should create a view (or table, depending on your materialization config) in your Databricks workspace.

You can verify in Databricks by running:

SELECT * FROM analytics_catalog.dbt_dev.stg_test;

Common Errors and Fixes

Here are the issues you're most likely to hit, and how to fix them.

"Invalid access token"

Database Error: Invalid access token

Your token is wrong, expired, or revoked. Generate a new one from Settings > Developer > Access tokens. Make sure you're copying the full token string, including the dapi prefix.

"Could not connect to HTTP path"

Database Error: Could not connect to the specified HTTP path

The HTTP path is incorrect. Double-check that you copied the full path from the Connection details tab. Common mistakes include copying the JDBC URL instead of just the HTTP path, or mixing up the SQL warehouse path with a cluster path.

"Cluster is terminated"

Database Error: Cluster ... is in TERMINATED state

If you're using an all-purpose cluster, it may have auto-terminated due to inactivity. Go to Compute in the Databricks UI and start the cluster manually. For SQL warehouses, make sure the warehouse is set to auto-resume (it usually is by default).

"Catalog not found" or "Unity Catalog is not enabled"

Database Error: Catalog 'my_catalog' not found

Either the catalog name is wrong, your user doesn't have USE CATALOG permission, or Unity Catalog isn't enabled on your workspace. Check your catalog name in the Databricks Data Explorer, and verify permissions with a workspace admin.

If Unity Catalog genuinely isn't enabled, remove the catalog field from your profiles.yml and use the legacy Hive metastore instead.

"Permission denied" on schema or table

Database Error: User does not have CREATE TABLE permission on schema

Your user or service principal lacks the required grants. See the permissions section above. A workspace admin needs to run the GRANT statements.

Connection timeout

Database Error: Connection timed out

This usually means the hostname is wrong or there's a network issue. Verify you can reach the hostname from your machine. If you're behind a corporate firewall or VPN, make sure Databricks workspace URLs are allowed.
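A quick way to check reachability from your machine is a plain TCP connection attempt to the workspace hostname on port 443. This helper is a minimal sketch (the function name is ours, not part of dbt or Databricks):

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused connection, or timeout
        return False

# Example (substitute your own workspace hostname):
# can_reach("adb-1234567890.12.azuredatabricks.net")
```

If this returns False for your workspace hostname, the problem is network-level (DNS, firewall, VPN) rather than anything in your profiles.yml.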

Performance Tips

SQL Warehouse vs All-Purpose Cluster

For dbt workloads, SQL warehouses are almost always the better choice. Here's why:

  • Auto-start and auto-stop: SQL warehouses spin up on demand and shut down after a period of inactivity. No wasted compute costs.
  • Optimized for SQL: SQL warehouses are tuned for the kind of queries dbt runs. All-purpose clusters are general-purpose and split resources across notebooks, jobs, and SQL.
  • Scaling: SQL warehouses can scale horizontally (add more clusters) to handle concurrent queries from dbt's threading.
  • Cost: SQL warehouses, especially serverless ones, are typically cheaper for SQL-only workloads.

All-purpose clusters make sense if you're also running Python models in dbt that need PySpark, or if you need a shared cluster for both notebook exploration and dbt development.

Serverless SQL Warehouses

If your cloud provider and Databricks plan support it, serverless SQL warehouses are the best option for dbt. They start faster (seconds instead of minutes), scale more granularly, and you only pay for what you use — no idle compute costs.

The profiles.yml configuration is identical to a regular SQL warehouse. Just use the HTTP path from your serverless warehouse.

Thread Count

The threads setting in profiles.yml controls how many models dbt runs in parallel. For SQL warehouses, you can safely set this higher (8-16) since the warehouse handles concurrency well. For all-purpose clusters, be more conservative (2-4) to avoid overwhelming the cluster.

# SQL warehouse — can handle more concurrency
threads: 12
# All-purpose cluster — be conservative
threads: 4
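Conceptually, the threads setting behaves like a worker pool over your model DAG. This toy sketch (the model names are made up) shows the idea, ignoring dependency ordering:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(name: str) -> str:
    # Stand-in for dbt compiling and executing one model on the warehouse.
    return f"{name}: done"

models = ["stg_orders", "stg_customers", "fct_revenue", "dim_dates"]

# threads: 4 in profiles.yml is roughly max_workers=4 here — up to four
# models whose dependencies are already satisfied execute concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_model, models))

print(results)
```

The warehouse has to service all of those queries at once, which is why the right thread count depends on the compute behind it.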

Production Considerations

Getting dbt running locally against Databricks is step one. Running it reliably in production requires a few more things:

Use a service principal, not a personal token. Personal tokens expire and are tied to individual accounts. Create a service principal in your Databricks workspace, generate a token for it, and use that for production runs.
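As a sketch of what that can look like with dbt-databricks's OAuth machine-to-machine authentication (the host, path, and env var names below are placeholders — check the adapter docs for the exact fields your version supports):

```yaml
my_databricks_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: "adb-1234567890.12.azuredatabricks.net"
      http_path: "/sql/1.0/warehouses/abc123def456"
      auth_type: oauth
      client_id: "{{ env_var('DATABRICKS_CLIENT_ID') }}"
      client_secret: "{{ env_var('DATABRICKS_CLIENT_SECRET') }}"
      schema: "analytics"
      threads: 8
```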

Pin your dbt and adapter versions. Don't let pip install dbt-databricks grab whatever the latest version is. Use a requirements.txt:

dbt-core==1.9.1
dbt-databricks==1.9.1

Set up proper scheduling. Running dbt build manually doesn't count as production. You need a scheduler — Airflow, GitHub Actions, a managed service — that runs dbt on a predictable cadence and alerts you on failures.

Store artifacts. dbt generates manifest.json, run_results.json, and catalog.json on every run. These are valuable for debugging, documentation, and lineage tracking. Make sure they're stored somewhere persistent, not just on a CI runner that gets torn down.
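A minimal local version of artifact storage can be as simple as copying the files into a timestamped folder after each run — in practice you'd upload to object storage instead. This helper (archive_artifacts is our name, not a dbt API) is a sketch of the idea:

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

ARTIFACTS = ("manifest.json", "run_results.json", "catalog.json")

def archive_artifacts(target_dir: str = "target",
                      archive_root: str = "dbt_artifacts") -> Path:
    """Copy dbt's run artifacts into a timestamped folder under archive_root."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(archive_root) / stamp
    dest.mkdir(parents=True, exist_ok=True)
    for name in ARTIFACTS:
        src = Path(target_dir) / name
        if src.exists():  # catalog.json only exists after `dbt docs generate`
            shutil.copy2(src, dest / name)
    return dest
```

Call it right after dbt build in your scheduler so every run leaves a durable record.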

Monitor warehouse costs. Databricks bills by DBU (Databricks Unit), and a poorly optimized dbt project can burn through credits quickly. Use dbt's model timing data and Databricks query history to identify expensive models.
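The timing data lives in run_results.json: each entry in its results array carries a unique_id and an execution_time in seconds. A small script like this (slowest_models is a hypothetical helper, not part of dbt) can surface the models worth optimizing first:

```python
import json
from pathlib import Path

def slowest_models(run_results_path: str = "target/run_results.json",
                   top: int = 5):
    """Rank models by execution time using dbt's run_results.json artifact."""
    payload = json.loads(Path(run_results_path).read_text())
    timings = [(r["unique_id"], r.get("execution_time", 0.0))
               for r in payload["results"]]
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top]
```

Cross-reference the worst offenders with Databricks query history to see where the DBUs are actually going.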

Skip the Infrastructure Work

Setting up dbt with Databricks is straightforward once you know the pieces. Keeping it running in production — managing tokens, scheduling runs, storing artifacts, handling failures — is where the ongoing work lives.

ModelDock handles Databricks credentials, scheduling, and deployment so you can focus on writing models instead of maintaining infrastructure. Connect your repo, enter your Databricks connection details, set a cron schedule, and your dbt project runs in an isolated container with full logging and artifact storage.

Free during open beta. No credit card required.

Ready to run dbt-core in production?

ModelDock handles scheduling, infrastructure, and credential management so you don't have to.

Start For Free