Running dbt-core with Redshift: Complete Setup Guide
Step-by-step guide to setting up dbt-core with Amazon Redshift — cluster provisioning, user creation, schema permissions, profiles.yml configuration, and common errors.
Amazon Redshift is one of the most widely used cloud data warehouses, and it pairs well with dbt-core. Whether you're running a Redshift Provisioned cluster or Redshift Serverless, the dbt setup follows the same general pattern: install the adapter, create a user with the right permissions, configure profiles.yml, and run your models.
That said, Redshift has more moving parts than a local PostgreSQL instance. You'll deal with VPC networking, security groups, IAM roles, and a permissions model that has a few Redshift-specific quirks. This guide covers all of it — from cluster provisioning to your first dbt run, plus the errors you'll almost certainly encounter along the way.
Prerequisites
Before starting, you'll need:
- An AWS account with permissions to create and manage Redshift resources.
- A Redshift cluster (Provisioned) or a Redshift Serverless workgroup — we cover setup below if you don't have one yet.
- Python 3.9+ and `pip` installed on your machine.
- AWS CLI (optional but recommended) — useful for cluster management and IAM authentication.
Creating a Redshift Cluster
If you don't already have a Redshift instance, you have two options.
Option 1: Redshift Provisioned (via AWS Console)
- Go to the Amazon Redshift console.
- Click Create cluster.
- Choose a cluster identifier (e.g., `dbt-warehouse`), a node type (`dc2.large` is fine for testing), and a number of nodes (1 for dev).
- Set an admin username and password.
- Under Network and security, choose a VPC and subnet. For local development, make sure Publicly accessible is enabled so you can connect from your machine.
- Click Create cluster and wait a few minutes for it to become available.
Option 1b: Redshift Provisioned (via CLI)
```shell
aws redshift create-cluster \
  --cluster-identifier dbt-warehouse \
  --node-type dc2.large \
  --number-of-nodes 1 \
  --master-username admin \
  --master-user-password 'AStr0ngP@ssword!' \
  --publicly-accessible \
  --vpc-security-group-ids sg-xxxxxxxx
```
Replace sg-xxxxxxxx with a security group that allows inbound traffic on port 5439 from your IP address.
Option 2: Redshift Serverless
- Go to the Amazon Redshift Serverless console.
- Click Create workgroup.
- Choose a workgroup name (e.g., `dbt-workgroup`) and a namespace (e.g., `dbt-namespace`).
- Set admin credentials for the namespace.
- Under Network and security, ensure the workgroup is in a VPC with subnets that allow connectivity from your machine. Enable Publicly accessible if needed.
- Click Create and wait for the workgroup to become active.
The Serverless endpoint looks slightly different from Provisioned — it follows the pattern <workgroup-name>.<account-id>.<region>.redshift-serverless.amazonaws.com.
Security Group Configuration
Regardless of Provisioned or Serverless, you need a security group that allows inbound TCP on port 5439 from wherever you'll run dbt. For local development, that's your public IP. For production, it's whatever VPC or IP range your dbt runner lives in.
```shell
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 5439 \
  --cidr $(curl -s https://checkip.amazonaws.com)/32
```
Installing dbt-redshift
The Redshift adapter for dbt-core is a separate package. Install it with pip:
```shell
pip install dbt-redshift
```
This installs both dbt-core and the Redshift adapter. You don't need to install dbt-core separately.
Verify the installation:
```shell
dbt --version
```
You should see something like:
```
Core:
  - installed: 1.9.x
  - latest: 1.9.x

Plugins:
  - redshift: 1.9.x
```
If you want to pin to a specific version:
```shell
pip install dbt-redshift==1.9.0
```
For isolated environments (recommended):
```shell
python -m venv dbt-env
source dbt-env/bin/activate
pip install dbt-redshift
```
Redshift User and Permissions
Don't run dbt as your Redshift admin user. Create a dedicated user with only the permissions dbt needs.
Connect to your Redshift cluster as the admin user. You can use psql (Redshift is PostgreSQL-compatible), the Redshift Query Editor in the AWS console, or any SQL client:
```shell
psql -h my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d dev
```
Then run the following SQL:
```sql
-- Create a dedicated dbt user
CREATE USER dbt_user PASSWORD 'a_strong_password_here';

-- Create schemas for dbt targets
CREATE SCHEMA IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS staging;

-- Grant usage and create on target schemas
GRANT USAGE ON SCHEMA analytics TO dbt_user;
GRANT CREATE ON SCHEMA analytics TO dbt_user;
GRANT USAGE ON SCHEMA staging TO dbt_user;
GRANT CREATE ON SCHEMA staging TO dbt_user;

-- Grant read access to source schemas
GRANT USAGE ON SCHEMA public TO dbt_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbt_user;

-- Grant access to future tables in the source schema
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO dbt_user;
```
A few notes:
- **`analytics`** and **`staging`** are the schemas where dbt will create models. Name them whatever makes sense for your project.
- The `public` schema grants assume your raw/source data lives there. Adjust if your sources are in different schemas.
- `ALTER DEFAULT PRIVILEGES` ensures that new tables added to the source schema in the future will automatically be readable by the dbt user.
Permissions for Schema Creation
If dbt needs to create schemas dynamically (for custom schema names, or per-developer schemas in dev), grant the CREATE privilege on the database:
```sql
GRANT CREATE ON DATABASE dev TO dbt_user;
```
Granting Access to Specific External Schemas (Spectrum)
If you're using Redshift Spectrum with external schemas backed by an AWS Glue catalog or Athena, grant usage on those as well:
```sql
GRANT USAGE ON SCHEMA spectrum_schema TO dbt_user;
```
Configuring profiles.yml
dbt uses a profiles.yml file to connect to your warehouse. By default, it lives at ~/.dbt/profiles.yml.
Password Authentication
The simplest approach — username and password:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
      port: 5439
      user: dbt_user
      password: "a_strong_password_here"
      dbname: dev
      schema: analytics
      threads: 4
```
Key fields:
- **`host`** — The Redshift endpoint. For Provisioned clusters, it looks like `<cluster-id>.<random>.us-east-1.redshift.amazonaws.com`. For Serverless, it's `<workgroup-name>.<account-id>.<region>.redshift-serverless.amazonaws.com`. Find this in the Redshift console under Cluster details or Workgroup details.
- **`port`** — Redshift defaults to **5439**, not 5432 like standard PostgreSQL.
- **`dbname`** — The database name. Redshift Provisioned defaults to `dev`. You can create additional databases, but most teams use the default.
- **`schema`** — The default schema dbt writes models to.
- **`threads`** — The number of models dbt builds in parallel. 4 is a reasonable starting point. Redshift handles concurrency well, but watch for WLM queue limits.
IAM Authentication
If you prefer to skip long-lived passwords and use IAM instead, dbt-redshift supports it. This uses temporary credentials generated from your AWS IAM identity:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      method: iam
      cluster_id: my-cluster
      host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
      port: 5439
      user: dbt_user
      dbname: dev
      schema: analytics
      threads: 4
      profile: default # AWS CLI profile to use
      region: us-east-1
```
For IAM auth to work, the IAM user or role needs the redshift:GetClusterCredentials permission, and the Redshift user must exist. The adapter calls GetClusterCredentials behind the scenes to get a temporary password.
Required IAM policy (minimum):
{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Action": ["redshift:GetClusterCredentials","redshift:DescribeClusters"],"Resource": ["arn:aws:redshift:us-east-1:123456789012:cluster:my-cluster","arn:aws:redshift:us-east-1:123456789012:dbuser:my-cluster/dbt_user","arn:aws:redshift:us-east-1:123456789012:dbname:my-cluster/dev"]}]}
For Redshift Serverless with IAM, the configuration is slightly different:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      method: iam
      host: dbt-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com
      port: 5439
      user: dbt_user
      dbname: dev
      schema: analytics
      threads: 4
      region: us-east-1
```
Note: For Serverless, you don't need cluster_id. The adapter detects Serverless from the host endpoint and uses redshift-serverless:GetCredentials instead.
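A rough sketch of the corresponding IAM policy for the Serverless path. Note that the workgroup ARN uses the workgroup's UUID (shown in the Serverless console), not its name, so treat the resource below as a placeholder to fill in from your own account:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["redshift-serverless:GetCredentials"],
      "Resource": "arn:aws:redshift-serverless:us-east-1:123456789012:workgroup/<workgroup-uuid>"
    }
  ]
}
```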
Using Environment Variables
Avoid hardcoding credentials in YAML. Use environment variables:
```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: redshift
      host: "{{ env_var('DBT_REDSHIFT_HOST') }}"
      port: 5439
      user: "{{ env_var('DBT_REDSHIFT_USER') }}"
      password: "{{ env_var('DBT_REDSHIFT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_REDSHIFT_DBNAME') }}"
      schema: analytics
      threads: 4
```
Then set them in your shell or CI/CD environment:
```shell
export DBT_REDSHIFT_HOST=my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
export DBT_REDSHIFT_USER=dbt_user
export DBT_REDSHIFT_PASSWORD=a_strong_password_here
export DBT_REDSHIFT_DBNAME=dev
```
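When one of these variables is missing, dbt's `env_var` error can be easy to miss in CI logs. As an optional sketch, a small POSIX shell check (using the same variable names as the exports above) reports any that are unset before you invoke dbt:

```shell
# Report which of the dbt connection env vars are missing, if any.
missing=""
for v in DBT_REDSHIFT_HOST DBT_REDSHIFT_USER DBT_REDSHIFT_PASSWORD DBT_REDSHIFT_DBNAME; do
  eval "val=\${$v:-}"
  [ -z "$val" ] && missing="$missing $v"
done

if [ -n "$missing" ]; then
  echo "missing env vars:$missing"
else
  echo "all dbt connection env vars are set"
fi
```

Run it at the top of your CI job (or make it `exit 1` on missing variables) so a misconfigured environment fails before `dbt run` does.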
Testing the Connection
Before running models, verify your connection with dbt debug:
```shell
dbt debug
```
If everything is configured correctly, you'll see:
```
Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
  - git [OK found]

Connection:
  host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
  port: 5439
  user: dbt_user
  database: dev
  schema: analytics
  connection_test: [OK connection ok]

All checks passed!
```
If the connection fails, the error message will tell you what's wrong. The most common issues are security groups blocking port 5439 and incorrect host endpoints — see the Common Errors section below.
Running Your First Model
If you don't have a dbt project yet, create one:
```shell
dbt init my_project
cd my_project
```
Create a simple model at models/staging/stg_orders.sql:
```sql
with source as (
    select * from {{ source('raw', 'orders') }}
)

select
    order_id,
    customer_id,
    order_date,
    amount,
    status
from source
where status != 'cancelled'
```
Define the source in models/staging/sources.yml:
```yaml
version: 2

sources:
  - name: raw
    schema: public
    tables:
      - name: orders
```
Run it:
```shell
dbt run
```
Expected output:
```
Running with dbt=1.9.x
Found 1 model, 1 source
...
Concurrency: 4 threads (target='dev')

1 of 1 START sql view model analytics.stg_orders ............ [RUN]
1 of 1 OK created sql view model analytics.stg_orders ....... [CREATE VIEW in 0.45s]

Finished running 1 view model in 1.12s.
Completed successfully.
```
You can verify the model in Redshift using the Query Editor or psql:
```sql
SELECT * FROM analytics.stg_orders LIMIT 10;
```
Common Errors and Fixes
Connection refused
```
could not connect to server: Connection refused
Is the server running on host "my-cluster.xxxxxx..." and accepting connections on port 5439?
```
This is almost always a networking issue:
- Security group: Make sure inbound port 5439 is open from your IP address.
- Publicly accessible: If you're connecting from outside the VPC, the cluster must have Publicly accessible enabled.
- VPC routing: If you're inside the VPC (e.g., running from an EC2 instance), make sure the subnet route tables and NACLs allow traffic.
- Cluster is paused: Provisioned clusters can be paused. Check the console — a paused cluster won't accept connections.
Permission denied for schema
```
permission denied for schema analytics
```
The dbt user doesn't have CREATE privileges on the target schema. Run:
```sql
GRANT USAGE ON SCHEMA analytics TO dbt_user;
GRANT CREATE ON SCHEMA analytics TO dbt_user;
```
SSL connection errors
```
SSL SYSCALL error: Connection reset by peer
```
Redshift uses SSL by default. If you're seeing SSL-related errors, try adding sslmode to your profile:
```yaml
dev:
  type: redshift
  # ... other fields
  sslmode: prefer
```
Valid values are prefer, require, verify-ca, and verify-full. For most setups, the default works fine. If you're behind a corporate proxy or VPN that terminates SSL, you may need to adjust this.
Relation does not exist
relation "analytics.stg_orders" does not exist
This usually means one of:
- The schema name in your `profiles.yml` doesn't match where the model was actually created.
- Redshift's `search_path` doesn't include the schema. Like standard PostgreSQL, Redshift defaults to `"$user", public` in the search path.
- The model hasn't been run yet — you're querying something that `dbt run` hasn't created.
Check where the model actually lives:
```sql
SELECT schemaname, tablename
FROM pg_tables
WHERE tablename = 'stg_orders';
```
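If the model landed in the right schema but ad-hoc queries still can't see it, one option is to add the dbt schemas to the user's default search path (a sketch — adjust the user and schema names to your project):

```sql
ALTER USER dbt_user SET search_path TO analytics, staging, public;
```

This only affects unqualified table names; fully qualified references like `analytics.stg_orders` work regardless.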
Timeout errors
```
Connection timed out
```
Several things can cause timeouts:
- Serverless scaling: Redshift Serverless may need to spin up compute if the workgroup has been idle. The first connection can take 30-60 seconds. Increase `connect_timeout` in your profile.
- Cluster paused: Provisioned clusters that are paused don't accept connections. Resume the cluster first.
- Network: Your request isn't reaching Redshift at all. Check security groups, VPC peering, VPN connections, etc.
Add a connect timeout to your profile to get faster feedback:
```yaml
dev:
  type: redshift
  # ... other fields
  connect_timeout: 30
```
WLM Queue Full
```
ERROR: Query queue is full
```
Redshift uses Workload Management (WLM) to manage query concurrency. If your dbt run has many threads and the WLM queue is small, queries will be rejected. Either reduce threads in your profile or increase the WLM concurrency in Redshift settings.
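On the dbt side, the fix is a one-line profile change — keep `threads` at or below the concurrency slots of the WLM queue your dbt user lands in:

```yaml
dev:
  type: redshift
  # ... other fields
  threads: 2  # keep at or below your WLM queue's concurrency slots
```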
Performance Tips
Redshift has several features that dbt can take advantage of for better query performance.
Distribution Keys and Sort Keys
Use the dist and sort configurations in your models to control how Redshift physically stores data:
```sql
-- models/marts/fct_orders.sql
{{ config(
    materialized='table',
    dist='customer_id',
    sort='order_date'
) }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ ref('stg_orders') }}
```
- **`dist`** controls how rows are distributed across nodes. Choose a column you frequently join on.
- **`sort`** controls the physical sort order on disk. Choose a column you frequently filter or range-scan on (dates are common).
Compound and Interleaved Sort Keys
For tables with multiple common filter patterns, you can use compound or interleaved sort keys:
```sql
{{ config(
    materialized='table',
    sort_type='compound',
    sort=['order_date', 'customer_id']
) }}
```
Compound sort keys work best when queries filter on leading columns. Interleaved sort keys work when queries filter on any column in the key, but have higher maintenance overhead.
Late Binding Views
Redshift supports late binding views, which don't lock the underlying tables. This is useful for dbt views that reference tables that get dropped and recreated:
```sql
{{ config(
    materialized='view',
    bind=False
) }}
```
Setting bind=False creates the view with WITH NO SCHEMA BINDING, meaning the view won't break if the underlying table is dropped and recreated (which happens during dbt run --full-refresh).
Incremental Models
For large fact tables, use incremental materializations to avoid full table rebuilds:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert',
    dist='customer_id',
    sort='order_date'
) }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ source('raw', 'orders') }}

{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```
The delete+insert strategy works well with Redshift since it avoids merge/upsert overhead.
ANALYZE and VACUUM
Redshift doesn't automatically update table statistics or reclaim disk space after deletes. After a full refresh or large incremental load, consider running:
```sql
ANALYZE analytics.fct_orders;
VACUUM analytics.fct_orders;
```
You can automate this with dbt post-hooks:
```sql
{{ config(
    materialized='table',
    post_hook='ANALYZE {{ this }}'
) }}
```
Skip the Setup Hassle
Getting dbt-core connected to Redshift involves a fair amount of AWS plumbing — VPC configuration, security groups, IAM policies, user permissions, and WLM tuning. It works, but it's a lot of pieces to keep in sync, especially across multiple environments.
[ModelDock](https://modeldock.run) handles all of that for you. Connect your Redshift credentials, point it at your dbt project, and it takes care of scheduling, execution, and deployment. No Airflow, no VPC debugging, no profiles.yml to maintain.
Try ModelDock free and get your dbt models running against Redshift in minutes.