Tags: dbt-core, tutorial, warehouse

Running dbt-core with BigQuery: Complete Setup Guide

Step-by-step guide to setting up dbt-core with Google BigQuery — service accounts, dataset permissions, profiles.yml configuration, and common errors.

ModelDock Team · February 17, 2026 · 10 min read

Google BigQuery is one of the most popular data warehouses for dbt-core projects. It's serverless, scales automatically, and the pricing model is straightforward. But the initial setup — especially around service accounts, IAM roles, and authentication — trips people up more than it should.

This guide walks through everything you need to get dbt-core running with BigQuery, from creating a service account to running your first model. We'll cover both the GCP Console and gcloud CLI approaches, plus the common errors you'll hit along the way.

Prerequisites

Before you start, make sure you have:

  • A Google Cloud Platform (GCP) project with billing enabled. BigQuery has a generous free tier (1 TB of queries and 10 GB of storage per month), but billing still needs to be active.
  • A BigQuery dataset in that project. If you don't have one yet, we'll create one below.
  • Python 3.9+ and pip installed locally.
  • The gcloud CLI (optional but recommended). Install it from cloud.google.com/sdk.

If you already have a GCP project and dataset, skip ahead to the dbt-bigquery installation.

Creating a BigQuery Dataset

If you need a dataset to work with:

# Via gcloud CLI
gcloud config set project your-gcp-project-id
bq mk --dataset --location=US your-gcp-project-id:analytics
# Or specify EU location
bq mk --dataset --location=EU your-gcp-project-id:analytics

You can also create one through the BigQuery console at console.cloud.google.com/bigquery — click your project, then "Create Dataset."

Pick the dataset location carefully. BigQuery datasets are region-locked, and you can't change the location after creation.

Installing dbt-bigquery

The dbt-bigquery adapter includes dbt-core as a dependency, so you only need one install:

# Create a virtual environment (recommended)
python -m venv dbt-venv
source dbt-venv/bin/activate
# Install dbt with the BigQuery adapter
pip install dbt-bigquery

Verify the installation:

dbt --version

You should see both dbt-core and dbt-bigquery in the output. If you need a specific dbt version, pin it:

pip install dbt-bigquery==1.9.0

Service Account Setup

dbt needs credentials to authenticate with BigQuery. The recommended approach for anything beyond local development is a service account with a JSON key file.

Option A: GCP Console

  1. Go to console.cloud.google.com/iam-admin/serviceaccounts
  2. Select your project
  3. Click Create Service Account
  4. Name it something descriptive, like dbt-runner
  5. Click Create and Continue
  6. Grant the necessary roles (see the permissions section below)
  7. Click Done
  8. Click the new service account, go to Keys > Add Key > Create new key
  9. Choose JSON and click Create
  10. Save the downloaded JSON file somewhere secure — you'll reference it in profiles.yml

Option B: gcloud CLI

# Create the service account
gcloud iam service-accounts create dbt-runner \
  --display-name="dbt Runner" \
  --project=your-gcp-project-id

# Grant BigQuery roles
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download the key file
gcloud iam service-accounts keys create ~/dbt-bigquery-key.json \
  --iam-account=dbt-runner@your-gcp-project-id.iam.gserviceaccount.com

Keep the JSON key file out of version control. Add *.json to your .gitignore or store the file outside your project directory.

Required BigQuery Permissions

This is where most people get stuck. BigQuery's IAM model splits permissions across several roles, and dbt needs a specific combination to work properly.

Minimum Required Roles

| Role | Why dbt Needs It |
| --- | --- |
| BigQuery Data Editor (roles/bigquery.dataEditor) | Create, update, and delete tables and views in datasets |
| BigQuery Job User (roles/bigquery.jobUser) | Run queries (BigQuery jobs) in the project |

These two roles cover most use cases. If your dbt project only reads from certain datasets and writes to others, you can get more granular:

Fine-Grained Permissions (Optional)

| Role | Scope | Purpose |
| --- | --- | --- |
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Source datasets | Read-only access to source tables |
| BigQuery Data Editor (roles/bigquery.dataEditor) | Target dataset | Write access where dbt creates models |
| BigQuery Job User (roles/bigquery.jobUser) | Project level | Execute queries |

You can assign roles at the dataset level instead of the project level for tighter security. Do this through the BigQuery console by opening the dataset, clicking Sharing, and adding the service account with the appropriate role.
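
If you prefer the command line, the documented bq CLI flow for dataset-level access is: export the dataset's metadata, edit its access list, and write it back. The sketch below uses jq for the edit and the example names from this guide — adjust project, dataset, and service account to yours.

```shell
# 1. Export the dataset's current metadata, including its access list
bq show --format=prettyjson your-gcp-project-id:analytics > dataset.json

# 2. Append a WRITER entry for the service account to the access array
jq '.access += [{"role": "WRITER", "userByEmail": "dbt-runner@your-gcp-project-id.iam.gserviceaccount.com"}]' \
  dataset.json > dataset_updated.json

# 3. Apply the updated access list back to the dataset
bq update --source dataset_updated.json your-gcp-project-id:analytics
```

WRITER on the dataset corresponds to the Data Editor role scoped to that dataset; the service account still needs Job User at the project level to run queries.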

Configuring profiles.yml

dbt uses profiles.yml to know how to connect to your warehouse. By default, it looks for this file at ~/.dbt/profiles.yml.

Method 1: Service Account JSON Key (Recommended)

# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      keyfile: /path/to/dbt-bigquery-key.json
      location: US  # Must match your dataset location
      timeout_seconds: 300
      priority: interactive
      retries: 1

Method 2: Service Account JSON Inline (Environment Variable)

If you don't want to deal with key files — useful in CI/CD or Docker environments — you can pass the JSON content directly:

# ~/.dbt/profiles.yml
my_bigquery_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account-json
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      location: US
      timeout_seconds: 300
      keyfile_json:
        type: service_account
        project_id: "{{ env_var('GCP_PROJECT_ID') }}"
        private_key_id: "{{ env_var('GCP_PRIVATE_KEY_ID') }}"
        private_key: "{{ env_var('GCP_PRIVATE_KEY') }}"
        client_email: "{{ env_var('GCP_CLIENT_EMAIL') }}"
        client_id: "{{ env_var('GCP_CLIENT_ID') }}"
        auth_uri: https://accounts.google.com/o/oauth2/auth
        token_uri: https://oauth2.googleapis.com/token
        auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
        client_x509_cert_url: "{{ env_var('GCP_CERT_URL') }}"

This approach lets you inject credentials from environment variables, a secrets manager, or CI/CD secrets without ever writing a key file to disk.
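
For local testing of this method, one way to populate those variables from a key file you already have is to extract each field with jq. This is a sketch — KEYFILE is a placeholder path, and the variable names match the env_var() calls above:

```shell
# Placeholder: point this at your downloaded key file
KEYFILE=/path/to/dbt-bigquery-key.json

# Extract each credential field dbt's profiles.yml references.
# jq -r converts the JSON \n escapes in private_key into real newlines,
# which is the form the Google auth library expects.
export GCP_PROJECT_ID=$(jq -r '.project_id' "$KEYFILE")
export GCP_PRIVATE_KEY_ID=$(jq -r '.private_key_id' "$KEYFILE")
export GCP_PRIVATE_KEY=$(jq -r '.private_key' "$KEYFILE")
export GCP_CLIENT_EMAIL=$(jq -r '.client_email' "$KEYFILE")
export GCP_CLIENT_ID=$(jq -r '.client_id' "$KEYFILE")
export GCP_CERT_URL=$(jq -r '.client_x509_cert_url' "$KEYFILE")
```

In CI you'd skip this entirely and set the same variables from the provider's secrets store.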

Method 3: OAuth (Local Development)

For local development, OAuth is the easiest — no service account needed:

# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: your-gcp-project-id
      dataset: analytics_dev
      threads: 4
      location: US
      timeout_seconds: 300

Then authenticate with:

gcloud auth application-default login

This opens a browser window for you to log in with your Google account. It works great for development, but don't use it in production or CI — there's no way to automate the browser login step.

Creating Your First Model and Running It

With credentials configured, let's verify everything works.

Initialize a dbt Project

If you don't already have a dbt project:

dbt init my_bigquery_project

When prompted, select bigquery as your adapter. dbt will create a project directory with the standard folder structure.

Create a Simple Model

Create a file at models/staging/stg_example.sql:

-- models/staging/stg_example.sql
with source as (
    select 1 as id, 'Alice' as name, current_timestamp() as created_at
    union all
    select 2, 'Bob', current_timestamp()
    union all
    select 3, 'Charlie', current_timestamp()
)

select
    id,
    name,
    created_at
from source

This is a self-contained model that doesn't depend on any existing tables — useful for testing your connection.

Run It

# Test the connection first
dbt debug
# If debug passes, run the model
dbt run

dbt debug checks your profiles.yml, credentials, and warehouse connectivity. If it passes, your setup is correct. If it fails, the error messages are usually specific enough to point you in the right direction.

After a successful dbt run, you should see a new view (or table, depending on your materialization) in your BigQuery dataset.
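
dbt builds models as views by default. If you'd rather have this model materialized as a table, the standard override is a config block at the top of the model file:

```sql
-- models/staging/stg_example.sql — add above the SQL
{{ config(materialized='table') }}
```

Re-running dbt run will then replace the view with a table in the same dataset.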

Verify in BigQuery

bq query --use_legacy_sql=false \
  'SELECT * FROM `your-gcp-project-id.analytics.stg_example`'

Or check the BigQuery console — your model should appear under the dataset you specified.

Common Errors and Fixes

Here are the errors you're most likely to hit, and how to fix them.

403: Access Denied

Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.

Cause: The service account doesn't have the right roles.

Fix: Make sure the service account has both roles/bigquery.dataEditor and roles/bigquery.jobUser. Double-check you're granting roles on the correct project.

# Verify current roles
gcloud projects get-iam-policy your-gcp-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --format="table(bindings.role)"

Dataset Not Found

Not found: Dataset your-gcp-project-id:analytics was not found in location US

Cause: Either the dataset doesn't exist, or the location in profiles.yml doesn't match the dataset's actual location.

Fix: Check the dataset location in the BigQuery console and update profiles.yml to match. If your dataset is in europe-west1, set location: europe-west1. This is case-insensitive but must be the correct region.
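
If you have the bq CLI installed, you can read the dataset's actual location straight from its metadata instead of hunting through the console:

```shell
# Print the dataset's region — mirror this value in profiles.yml
bq show --format=prettyjson your-gcp-project-id:analytics | jq -r '.location'
```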

Authentication Errors

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials.

Cause: dbt can't find your credentials. Either the keyfile path is wrong, the JSON file is malformed, or (for OAuth) you haven't run gcloud auth application-default login.

Fix: For service account auth, verify the keyfile path is absolute and the file exists. For OAuth, re-run gcloud auth application-default login. For the service-account-json method, make sure all environment variables are set and the private key includes the \n characters (not literal backslash-n).
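
Two quick sanity checks on the key file itself can rule out the most common causes before you dig deeper. This is a sketch — KEYFILE is a placeholder path:

```shell
KEYFILE=/path/to/dbt-bigquery-key.json   # placeholder: your key location

# 1. The file exists and parses as JSON
test -f "$KEYFILE" && jq -e . "$KEYFILE" > /dev/null && echo "valid JSON"

# 2. The private key carries a proper PEM header (catches mangled copies)
jq -e '.private_key | startswith("-----BEGIN PRIVATE KEY-----")' "$KEYFILE" > /dev/null \
  && echo "private key looks intact"
```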

Quota Exceeded

Exceeded rate limits: too many concurrent queries for this project_and_region

Cause: BigQuery has per-project concurrency limits (default is 100 concurrent queries).

Fix: Reduce the threads value in profiles.yml. Start with 4, increase only if builds are slow and you're not hitting quota limits. For large dbt projects, threads: 8 is usually the sweet spot.

Invalid Private Key

ValueError: Could not deserialize key data

Cause: The private key in your JSON key file or environment variable is corrupted, often from copy-paste issues stripping newline characters.

Fix: If using environment variables, make sure the private key preserves its \n characters. In bash:

export GCP_PRIVATE_KEY=$(jq -r '.private_key' dbt-bigquery-key.json)

Production Considerations

Once you've got dbt running locally with BigQuery, here's what to think about for production.

CI/CD

Run dbt build in your CI pipeline on every pull request against a development dataset. This catches schema errors before they hit production. Use the service-account-json method and inject credentials from your CI provider's secrets store (GitHub Actions secrets, GitLab CI variables, etc.).

# .github/workflows/dbt.yml
name: dbt CI
on:
  pull_request:
    branches: [main]
jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-bigquery
      - run: dbt deps
      - run: dbt build --target ci
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          # ... other credential fields

Docker

For production runs, containerize your dbt project:

FROM python:3.11-slim
RUN pip install --no-cache-dir dbt-bigquery
WORKDIR /dbt
COPY . /dbt/
ENTRYPOINT ["dbt"]
CMD ["build", "--target", "prod"]

Credential Management

Never commit service account JSON keys to Git. In production, prefer one of these approaches:

  • Workload Identity Federation (GKE / Cloud Run): No key files at all. The runtime environment authenticates automatically.
  • Secret managers: GCP Secret Manager, HashiCorp Vault, or your cloud provider's equivalent.
  • Environment variables: Injected at runtime from CI secrets or orchestrator configuration.

Workload Identity Federation is the gold standard if you're running on GCP infrastructure — it eliminates long-lived credentials entirely.

Scheduling

You need something to run dbt build on a schedule. The options range from a simple cron job to a full Airflow deployment. The right choice depends on how much operational overhead you're willing to take on. (We wrote a whole guide on dbt + Airflow if you want to go that route.)
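
At the simple end of that range, a crontab entry is enough. The paths below are hypothetical — substitute your project directory and virtualenv:

```shell
# Open the crontab with `crontab -e`, then add a nightly run at 02:00:
0 2 * * * cd /opt/dbt/my_bigquery_project && /opt/dbt/dbt-venv/bin/dbt build --target prod >> /var/log/dbt-build.log 2>&1
```

Note that cron gives you no retries, alerting, or dependency ordering — fine for a single nightly build, limiting beyond that.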

A Simpler Path to Production

Getting dbt-core connected to BigQuery isn't hard once you know the steps. Getting it running reliably in production — with scheduling, credential rotation, monitoring, and CI/CD — is a different story.

ModelDock handles all of that for you. Connect your Git repo, enter your BigQuery service account credentials (encrypted with AES-256-GCM), set a schedule, and your dbt project runs in an isolated container with full logs and artifact storage. No infrastructure to manage, no DAGs to write.

It's free during the open beta. Give it a try at modeldock.run.

Ready to run dbt-core in production?

ModelDock handles scheduling, infrastructure, and credential management so you don't have to.

Start For Free