Running dbt-core with Redshift: Complete Setup Guide
Step-by-step guide to setting up dbt-core with Amazon Redshift — cluster provisioning, user creation, schema permissions, profiles.yml configuration, and common errors.
Amazon Redshift is one of the most widely used cloud data warehouses, and it pairs well with dbt-core. Whether you're running a Redshift Provisioned cluster or Redshift Serverless, the dbt setup follows the same general pattern: install the adapter, create a user with the right permissions, configure profiles.yml, and run your models.
That said, Redshift has more moving parts than a local PostgreSQL instance. You'll deal with VPC networking, security groups, IAM roles, and a permissions model that has a few Redshift-specific quirks. This guide covers all of it — from cluster provisioning to your first dbt run, plus the errors you'll almost certainly encounter along the way.
Prerequisites
Before starting, you'll need:
- An AWS account with permissions to create and manage Redshift resources.
- A Redshift cluster (Provisioned) or a Redshift Serverless workgroup — we cover setup below if you don't have one yet.
- Python 3.9+ and `pip` installed on your machine.
- AWS CLI (optional but recommended) — useful for cluster management and IAM authentication.
Creating a Redshift Cluster
If you don't already have a Redshift instance, you have two options.
Option 1: Redshift Provisioned (via AWS Console)
- Go to the Amazon Redshift console.
- Click Create cluster.
- Choose a cluster identifier (e.g., `dbt-warehouse`), a node type (`dc2.large` is fine for testing), and a number of nodes (1 for dev).
- Set an admin username and password.
- Under Network and security, choose a VPC and subnet. For local development, make sure Publicly accessible is enabled so you can connect from your machine.
- Click Create cluster and wait a few minutes for it to become available.
Option 1b: Redshift Provisioned (via CLI)
```shell
aws redshift create-cluster \
  --cluster-identifier dbt-warehouse \
  --node-type dc2.large \
  --number-of-nodes 1 \
  --master-username admin \
  --master-user-password 'AStr0ngP@ssword!' \
  --publicly-accessible \
  --vpc-security-group-ids sg-xxxxxxxx
```
Replace sg-xxxxxxxx with a security group that allows inbound traffic on port 5439 from your IP address.
Option 2: Redshift Serverless
- Go to the Amazon Redshift Serverless console.
- Click Create workgroup.
- Choose a workgroup name (e.g., `dbt-workgroup`) and a namespace (e.g., `dbt-namespace`).
- Set admin credentials for the namespace.
- Under Network and security, ensure the workgroup is in a VPC with subnets that allow connectivity from your machine. Enable Publicly accessible if needed.
- Click Create and wait for the workgroup to become active.
The Serverless endpoint looks slightly different from Provisioned — it follows the pattern <workgroup-name>.<account-id>.<region>.redshift-serverless.amazonaws.com.
Security Group Configuration
Regardless of Provisioned or Serverless, you need a security group that allows inbound TCP on port 5439 from wherever you'll run dbt. For local development, that's your public IP. For production, it's whatever VPC or IP range your dbt runner lives in.
```shell
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 5439 \
  --cidr $(curl -s https://checkip.amazonaws.com)/32
```
Installing dbt-redshift
The Redshift adapter for dbt-core is a separate package. Install it with pip:
```shell
pip install dbt-redshift
```
This installs both dbt-core and the Redshift adapter. You don't need to install dbt-core separately.
Verify the installation:
```shell
dbt --version
```
You should see something like:
```
Core:
  - installed: 1.9.x
  - latest: 1.9.x

Plugins:
  - redshift: 1.9.x
```
If you want to pin to a specific version:
```shell
pip install dbt-redshift==1.9.0
```
For isolated environments (recommended):
```shell
python -m venv dbt-env
source dbt-env/bin/activate
pip install dbt-redshift
```
Redshift User and Permissions
Don't run dbt as your Redshift admin user. Create a dedicated user with only the permissions dbt needs.
Connect to your Redshift cluster as the admin user. You can use psql (Redshift is PostgreSQL-compatible), the Redshift Query Editor in the AWS console, or any SQL client:
```shell
psql -h my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d dev
```
Then run the following SQL:
```sql
-- Create a dedicated dbt user
CREATE USER dbt_user PASSWORD 'a_strong_password_here';

-- Create schemas for dbt targets
CREATE SCHEMA IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS staging;

-- Grant usage and create on target schemas
GRANT USAGE ON SCHEMA analytics TO dbt_user;
GRANT CREATE ON SCHEMA analytics TO dbt_user;
GRANT USAGE ON SCHEMA staging TO dbt_user;
GRANT CREATE ON SCHEMA staging TO dbt_user;

-- Grant read access to source schemas
GRANT USAGE ON SCHEMA public TO dbt_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dbt_user;

-- Grant access to future tables in the source schema
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO dbt_user;
```
A few notes:
- **`analytics`** and **`staging`** are the schemas where dbt will create models. Name them whatever makes sense for your project.
- The `public` schema grants assume your raw/source data lives there. Adjust if your sources are in different schemas.
- `ALTER DEFAULT PRIVILEGES` ensures that new tables added to the source schema in the future will automatically be readable by the dbt user.
Permissions for Schema Creation
If dbt needs to create schemas dynamically (for custom schema names, or per-developer schemas in dev), grant the CREATE privilege on the database:
```sql
GRANT CREATE ON DATABASE dev TO dbt_user;
```
Granting Access to Specific External Schemas (Spectrum)
If you're using Redshift Spectrum with external schemas backed by an AWS Glue catalog or Athena, grant usage on those as well:
```sql
GRANT USAGE ON SCHEMA spectrum_schema TO dbt_user;
```
Configuring profiles.yml
dbt uses a profiles.yml file to connect to your warehouse. By default, it lives at ~/.dbt/profiles.yml.
Password Authentication
The simplest approach — username and password:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
      port: 5439
      user: dbt_user
      password: "a_strong_password_here"
      dbname: dev
      schema: analytics
      threads: 4
```
Key fields:
- **`host`** — The Redshift endpoint. For Provisioned clusters, it looks like `<cluster-id>.<random>.us-east-1.redshift.amazonaws.com`. For Serverless, it's `<workgroup-name>.<account-id>.<region>.redshift-serverless.amazonaws.com`. Find this in the Redshift console under Cluster details or Workgroup details.
- **`port`** — Redshift defaults to **5439**, not 5432 like standard PostgreSQL.
- **`dbname`** — The database name. Redshift Provisioned defaults to `dev`. You can create additional databases, but most teams use the default.
- **`schema`** — The default schema dbt writes models to.
- **`threads`** — The number of models dbt builds in parallel. 4 is a reasonable starting point. Redshift handles concurrency well, but watch for WLM queue limits.
IAM Authentication
If you prefer to skip long-lived passwords and use IAM instead, dbt-redshift supports it. This uses temporary credentials generated from your AWS IAM identity:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      method: iam
      cluster_id: my-cluster
      host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
      port: 5439
      user: dbt_user
      dbname: dev
      schema: analytics
      threads: 4
      profile: default # AWS CLI profile to use
      region: us-east-1
```
For IAM auth to work, the IAM user or role needs the redshift:GetClusterCredentials permission, and the Redshift user must exist. The adapter calls GetClusterCredentials behind the scenes to get a temporary password.
Required IAM policy (minimum):
{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Action": ["redshift:GetClusterCredentials","redshift:DescribeClusters"],"Resource": ["arn:aws:redshift:us-east-1:123456789012:cluster:my-cluster","arn:aws:redshift:us-east-1:123456789012:dbuser:my-cluster/dbt_user","arn:aws:redshift:us-east-1:123456789012:dbname:my-cluster/dev"]}]}
For Redshift Serverless with IAM, the configuration is slightly different:
```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      method: iam
      host: dbt-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com
      port: 5439
      user: dbt_user
      dbname: dev
      schema: analytics
      threads: 4
      region: us-east-1
```
Note: For Serverless, you don't need cluster_id. The adapter detects Serverless from the host endpoint and uses redshift-serverless:GetCredentials instead.
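A rough sketch of the corresponding IAM policy for the Serverless path. Note that the workgroup ARN uses the workgroup's UUID (shown in the Serverless console), not its name, so treat the resource below as a placeholder to fill in from your own account:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["redshift-serverless:GetCredentials"],
      "Resource": "arn:aws:redshift-serverless:us-east-1:123456789012:workgroup/<workgroup-uuid>"
    }
  ]
}
```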
Using Environment Variables
Avoid hardcoding credentials in YAML. Use environment variables:
```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: redshift
      host: "{{ env_var('DBT_REDSHIFT_HOST') }}"
      port: 5439
      user: "{{ env_var('DBT_REDSHIFT_USER') }}"
      password: "{{ env_var('DBT_REDSHIFT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_REDSHIFT_DBNAME') }}"
      schema: analytics
      threads: 4
```
Then set them in your shell or CI/CD environment:
```shell
export DBT_REDSHIFT_HOST=my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
export DBT_REDSHIFT_USER=dbt_user
export DBT_REDSHIFT_PASSWORD=a_strong_password_here
export DBT_REDSHIFT_DBNAME=dev
```
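When one of these variables is missing, dbt's `env_var` error can be easy to miss in CI logs. As an optional sketch, a small POSIX shell check (using the same variable names as the exports above) reports any that are unset before you invoke dbt:

```shell
# Report which of the dbt connection env vars are missing, if any.
missing=""
for v in DBT_REDSHIFT_HOST DBT_REDSHIFT_USER DBT_REDSHIFT_PASSWORD DBT_REDSHIFT_DBNAME; do
  eval "val=\${$v:-}"
  [ -z "$val" ] && missing="$missing $v"
done

if [ -n "$missing" ]; then
  echo "missing env vars:$missing"
else
  echo "all dbt connection env vars are set"
fi
```

Run it at the top of your CI job (or make it `exit 1` on missing variables) so a misconfigured environment fails before `dbt run` does.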
Testing the Connection
Before running models, verify your connection with dbt debug:
```shell
dbt debug
```
If everything is configured correctly, you'll see:
```
Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
  - git [OK found]

Connection:
  host: my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com
  port: 5439
  user: dbt_user
  database: dev
  schema: analytics
  connection_test: [OK connection ok]

All checks passed!
```
If the connection fails, the error message will tell you what's wrong. The most common issues are security groups blocking port 5439 and incorrect host endpoints — see the Common Errors section below.
Running Your First Model
If you don't have a dbt project yet, create one:
```shell
dbt init my_project
cd my_project
```
Create a simple model at models/staging/stg_orders.sql:
```sql
with source as (
    select * from {{ source('raw', 'orders') }}
)

select
    order_id,
    customer_id,
    order_date,
    amount,
    status
from source
where status != 'cancelled'
```
Define the source in models/staging/sources.yml:
```yaml
version: 2

sources:
  - name: raw
    schema: public
    tables:
      - name: orders
```
Run it:
```shell
dbt run
```
Expected output:
```
Running with dbt=1.9.x
Found 1 model, 1 source
...
Concurrency: 4 threads (target='dev')

1 of 1 START sql view model analytics.stg_orders ............ [RUN]
1 of 1 OK created sql view model analytics.stg_orders ....... [CREATE VIEW in 0.45s]

Finished running 1 view model in 1.12s.
Completed successfully.
```
You can verify the model in Redshift using the Query Editor or psql:
```sql
SELECT * FROM analytics.stg_orders LIMIT 10;
```
Common Errors and Fixes
Connection refused
```
could not connect to server: Connection refused
Is the server running on host "my-cluster.xxxxxx..." and accepting connections on port 5439?
```
This is almost always a networking issue:
- Security group: Make sure inbound port 5439 is open from your IP address.
- Publicly accessible: If you're connecting from outside the VPC, the cluster must have Publicly accessible enabled.
- VPC routing: If you're inside the VPC (e.g., running from an EC2 instance), make sure the subnet route tables and NACLs allow traffic.
- Cluster is paused: Provisioned clusters can be paused. Check the console — a paused cluster won't accept connections.
Permission denied for schema
```
permission denied for schema analytics
```
The dbt user doesn't have CREATE privileges on the target schema. Run:
```sql
GRANT USAGE ON SCHEMA analytics TO dbt_user;
GRANT CREATE ON SCHEMA analytics TO dbt_user;
```
SSL connection errors
```
SSL SYSCALL error: Connection reset by peer
```
Redshift uses SSL by default. If you're seeing SSL-related errors, try adding sslmode to your profile:
```yaml
dev:
  type: redshift
  # ... other fields
  sslmode: prefer
```
Valid values are prefer, require, verify-ca, and verify-full. For most setups, the default works fine. If you're behind a corporate proxy or VPN that terminates SSL, you may need to adjust this.
Relation does not exist
relation "analytics.stg_orders" does not exist
This usually means one of:
- The schema name in your `profiles.yml` doesn't match where the model was actually created.
- Redshift's `search_path` doesn't include the schema. Like standard PostgreSQL, Redshift defaults to `"$user", public` in the search path.
- The model hasn't been run yet — you're querying something that `dbt run` hasn't created.
Check where the model actually lives:
```sql
SELECT schemaname, tablename
FROM pg_tables
WHERE tablename = 'stg_orders';
```
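If the model landed in the right schema but ad-hoc queries still can't see it, one option is to add the dbt schemas to the user's default search path (a sketch — adjust the user and schema names to your project):

```sql
ALTER USER dbt_user SET search_path TO analytics, staging, public;
```

This only affects unqualified table names; fully qualified references like `analytics.stg_orders` work regardless.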
Timeout errors
```
Connection timed out
```
Several things can cause timeouts:
- Serverless scaling: Redshift Serverless may need to spin up compute if the workgroup has been idle. The first connection can take 30-60 seconds. Increase `connect_timeout` in your profile.
- Cluster paused: Provisioned clusters that are paused don't accept connections. Resume the cluster first.
- Network: Your request isn't reaching Redshift at all. Check security groups, VPC peering, VPN connections, etc.
Add a connect timeout to your profile to get faster feedback:
```yaml
dev:
  type: redshift
  # ... other fields
  connect_timeout: 30
```
WLM Queue Full
```
ERROR: Query queue is full
```
Redshift uses Workload Management (WLM) to manage query concurrency. If your dbt run has many threads and the WLM queue is small, queries will be rejected. Either reduce threads in your profile or increase the WLM concurrency in Redshift settings.
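On the dbt side, the fix is a one-line profile change — keep `threads` at or below the concurrency slots of the WLM queue your dbt user lands in:

```yaml
dev:
  type: redshift
  # ... other fields
  threads: 2  # keep at or below your WLM queue's concurrency slots
```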
Performance Tips
Redshift has several features that dbt can take advantage of for better query performance.
Distribution Keys and Sort Keys
Use the dist and sort configurations in your models to control how Redshift physically stores data:
```sql
-- models/marts/fct_orders.sql
{{ config(
    materialized='table',
    dist='customer_id',
    sort='order_date'
) }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ ref('stg_orders') }}
```
- **`dist`** controls how rows are distributed across nodes. Choose a column you frequently join on.
- **`sort`** controls the physical sort order on disk. Choose a column you frequently filter or range-scan on (dates are common).
Compound and Interleaved Sort Keys
For tables with multiple common filter patterns, you can use compound or interleaved sort keys:
```sql
{{ config(
    materialized='table',
    sort_type='compound',
    sort=['order_date', 'customer_id']
) }}
```
Compound sort keys work best when queries filter on leading columns. Interleaved sort keys work when queries filter on any column in the key, but have higher maintenance overhead.
Late Binding Views
Redshift supports late binding views, which don't lock the underlying tables. This is useful for dbt views that reference tables that get dropped and recreated:
```sql
{{ config(
    materialized='view',
    bind=False
) }}
```
Setting bind=False creates the view with WITH NO SCHEMA BINDING, meaning the view won't break if the underlying table is dropped and recreated (which happens during dbt run --full-refresh).
Incremental Models
For large fact tables, use incremental materializations to avoid full table rebuilds:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert',
    dist='customer_id',
    sort='order_date'
) }}

select
    order_id,
    customer_id,
    order_date,
    amount
from {{ source('raw', 'orders') }}

{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```
The delete+insert strategy works well with Redshift since it avoids merge/upsert overhead.
ANALYZE and VACUUM
Redshift doesn't automatically update table statistics or reclaim disk space after deletes. After a full refresh or large incremental load, consider running:
```sql
ANALYZE analytics.fct_orders;
VACUUM analytics.fct_orders;
```
You can automate this with dbt post-hooks:
```sql
{{ config(
    materialized='table',
    post_hook='ANALYZE {{ this }}'
) }}
```
Skip the Setup Hassle
Getting dbt-core connected to Redshift involves a fair amount of AWS plumbing — VPC configuration, security groups, IAM policies, user permissions, and WLM tuning. It works, but it's a lot of pieces to keep in sync, especially across multiple environments.
[ModelDock](https://modeldock.run) handles all of that for you. Connect your Redshift credentials, point it at your dbt project, and it takes care of scheduling, execution, and deployment. No Airflow, no VPC debugging, no profiles.yml to maintain.
Try ModelDock free and get your dbt models running against Redshift in minutes.