Running dbt-core with BigQuery: Complete Setup Guide
Step-by-step guide to setting up dbt-core with Google BigQuery — service accounts, dataset permissions, profiles.yml configuration, and common errors.
Google BigQuery is one of the most popular data warehouses for dbt-core projects. It's serverless, scales automatically, and the pricing model is straightforward. But the initial setup — especially around service accounts, IAM roles, and authentication — trips people up more than it should.
This guide walks through everything you need to get dbt-core running with BigQuery, from creating a service account to running your first model. We'll cover both the GCP Console and gcloud CLI approaches, plus the common errors you'll hit along the way.
Prerequisites
Before you start, make sure you have:
- A Google Cloud Platform (GCP) project with billing enabled. BigQuery has a generous free tier (1 TB of queries and 10 GB of storage per month), but billing still needs to be active.
- A BigQuery dataset in that project. If you don't have one yet, we'll create one below.
- Python 3.9+ and pip installed locally.
- The gcloud CLI (optional but recommended). Install it from cloud.google.com/sdk.
If you already have a GCP project and dataset, skip ahead to the dbt-bigquery installation.
Creating a BigQuery Dataset
If you need a dataset to work with:
```shell
# Via gcloud CLI
gcloud config set project your-gcp-project-id
bq mk --dataset --location=US your-gcp-project-id:analytics

# Or specify EU location
bq mk --dataset --location=EU your-gcp-project-id:analytics
```
You can also create one through the BigQuery console at console.cloud.google.com/bigquery — click your project, then "Create Dataset."
Pick the dataset location carefully. BigQuery datasets are region-locked, and you can't change the location after creation.
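If you're not sure where an existing dataset lives, you can check before wiring it into dbt. One way, using the bq CLI that ships with the gcloud SDK (dataset name here matches the example above):

```shell
# Show dataset metadata and pull out its location
bq show --format=prettyjson your-gcp-project-id:analytics | grep '"location"'
```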
Installing dbt-bigquery
The dbt-bigquery adapter includes dbt-core as a dependency, so you only need one install:
```shell
# Create a virtual environment (recommended)
python -m venv dbt-venv
source dbt-venv/bin/activate

# Install dbt with the BigQuery adapter
pip install dbt-bigquery
```
Verify the installation:
```shell
dbt --version
```
You should see both dbt-core and dbt-bigquery in the output. If you're using a specific dbt version, pin it:
```shell
pip install dbt-bigquery==1.9.0
```
Service Account Setup
dbt needs credentials to authenticate with BigQuery. The recommended approach for anything beyond local development is a service account with a JSON key file.
Option A: GCP Console
- Go to console.cloud.google.com/iam-admin/serviceaccounts
- Select your project
- Click Create Service Account
- Name it something descriptive, like dbt-runner
- Click Create and Continue
- Grant the necessary roles (see the permissions section below)
- Click Done
- Click the new service account, go to Keys > Add Key > Create new key
- Choose JSON and click Create
- Save the downloaded JSON file somewhere secure — you'll reference it in profiles.yml
Option B: gcloud CLI
```shell
# Create the service account
gcloud iam service-accounts create dbt-runner \
  --display-name="dbt Runner" \
  --project=your-gcp-project-id

# Grant BigQuery roles
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download the key file
gcloud iam service-accounts keys create ~/dbt-bigquery-key.json \
  --iam-account=dbt-runner@your-gcp-project-id.iam.gserviceaccount.com
```
Keep the JSON key file out of version control. Add *.json to your .gitignore or store the file outside your project directory.
Required BigQuery Permissions
This is where most people get stuck. BigQuery's IAM model splits permissions across several roles, and dbt needs a specific combination to work properly.
Minimum Required Roles
| Role | Why dbt Needs It |
|---|---|
| BigQuery Data Editor (roles/bigquery.dataEditor) | Create, update, and delete tables and views in datasets |
| BigQuery Job User (roles/bigquery.jobUser) | Run queries (BigQuery jobs) in the project |
These two roles cover most use cases. If your dbt project only reads from certain datasets and writes to others, you can get more granular:
Fine-Grained Permissions (Optional)
| Role | Scope | Purpose |
|---|---|---|
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Source datasets | Read-only access to source tables |
| BigQuery Data Editor (roles/bigquery.dataEditor) | Target dataset | Write access where dbt creates models |
| BigQuery Job User (roles/bigquery.jobUser) | Project level | Execute queries |
You can assign roles at the dataset level instead of the project level for tighter security. Do this through the BigQuery console by opening the dataset, clicking Sharing, and adding the service account with the appropriate role.
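If you prefer the command line for dataset-level grants, one approach (a sketch, not the only way) is to edit the dataset's access list with bq. Note that dataset access entries use BigQuery's legacy role names (READER/WRITER/OWNER), where WRITER roughly corresponds to Data Editor:

```shell
# Dump the dataset's current metadata, including its "access" array
bq show --format=prettyjson your-gcp-project-id:analytics > dataset.json

# Edit dataset.json and add an entry to the "access" array, e.g.
#   {"role": "WRITER", "userByEmail": "dbt-runner@your-gcp-project-id.iam.gserviceaccount.com"}

# Apply the updated access list
bq update --source dataset.json your-gcp-project-id:analytics
```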
Configuring profiles.yml
dbt uses profiles.yml to know how to connect to your warehouse. By default, it looks for this file at ~/.dbt/profiles.yml.
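If you keep profiles.yml somewhere else (for example, inside the repo for CI), you can point dbt at the directory explicitly. Both of these are supported by dbt-core:

```shell
# Option 1: environment variable
export DBT_PROFILES_DIR=/path/to/profiles-dir

# Option 2: per-invocation flag
dbt run --profiles-dir /path/to/profiles-dir
```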
Method 1: Service Account JSON Key (Recommended)
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      keyfile: /path/to/dbt-bigquery-key.json
      location: US  # Must match your dataset location
      timeout_seconds: 300
      priority: interactive
      retries: 1
```
Method 2: Service Account JSON Inline (Environment Variable)
If you don't want to deal with key files — useful in CI/CD or Docker environments — you can pass the JSON content directly:
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account-json
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      location: US
      timeout_seconds: 300
      keyfile_json:
        type: service_account
        project_id: "{{ env_var('GCP_PROJECT_ID') }}"
        private_key_id: "{{ env_var('GCP_PRIVATE_KEY_ID') }}"
        private_key: "{{ env_var('GCP_PRIVATE_KEY') }}"
        client_email: "{{ env_var('GCP_CLIENT_EMAIL') }}"
        client_id: "{{ env_var('GCP_CLIENT_ID') }}"
        auth_uri: https://accounts.google.com/o/oauth2/auth
        token_uri: https://oauth2.googleapis.com/token
        auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
        client_x509_cert_url: "{{ env_var('GCP_CERT_URL') }}"
```
This approach lets you inject credentials from environment variables, a secrets manager, or CI/CD secrets without ever writing a key file to disk.
Method 3: OAuth (Local Development)
For local development, OAuth is the easiest — no service account needed:
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: your-gcp-project-id
      dataset: analytics_dev
      threads: 4
      location: US
      timeout_seconds: 300
```
Then authenticate with:
```shell
gcloud auth application-default login
```
This opens a browser window for you to log in with your Google account. It works great for development, but don't use it in production or CI — there's no way to automate the browser login step.
Creating Your First Model and Running It
With credentials configured, let's verify everything works.
Initialize a dbt Project
If you don't already have a dbt project:
```shell
dbt init my_bigquery_project
```
When prompted, select bigquery as your adapter. dbt will create a project directory with the standard folder structure.
Create a Simple Model
Create a file at models/staging/stg_example.sql:
```sql
-- models/staging/stg_example.sql
with source as (
    select 1 as id, 'Alice' as name, current_timestamp() as created_at
    union all
    select 2, 'Bob', current_timestamp()
    union all
    select 3, 'Charlie', current_timestamp()
)

select
    id,
    name,
    created_at
from source
```
This is a self-contained model that doesn't depend on any existing tables — useful for testing your connection.
Run It
```shell
# Test the connection first
dbt debug

# If debug passes, run the model
dbt run
```
dbt debug checks your profiles.yml, credentials, and warehouse connectivity. If it passes, your setup is correct. If it fails, the error messages are usually specific enough to point you in the right direction.
After a successful dbt run, you should see a new view (or table, depending on your materialization) in your BigQuery dataset.
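dbt builds models as views by default. If you'd rather have the example materialized as a table, a config block at the top of the model file (standard dbt syntax) does it:

```sql
-- models/staging/stg_example.sql
{{ config(materialized='table') }}

-- ...rest of the model as above
```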
Verify in BigQuery
```shell
bq query --use_legacy_sql=false \
  'SELECT * FROM `your-gcp-project-id.analytics.stg_example`'
```
Or check the BigQuery console — your model should appear under the dataset you specified.
Common Errors and Fixes
Here are the errors you're most likely to hit, and how to fix them.
403: Access Denied
```
Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.
```
Cause: The service account doesn't have the right roles.
Fix: Make sure the service account has both roles/bigquery.dataEditor and roles/bigquery.jobUser. Double-check you're granting roles on the correct project.
```shell
# Verify current roles
gcloud projects get-iam-policy your-gcp-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```
Dataset Not Found
```
Not found: Dataset your-gcp-project-id:analytics was not found in location US
```
Cause: Either the dataset doesn't exist, or the location in profiles.yml doesn't match the dataset's actual location.
Fix: Check the dataset location in the BigQuery console and update profiles.yml to match. If your dataset is in europe-west1, set location: europe-west1. This is case-insensitive but must be the correct region.
Authentication Errors
```
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials.
```
Cause: dbt can't find your credentials. Either the keyfile path is wrong, the JSON file is malformed, or (for OAuth) you haven't run gcloud auth application-default login.
Fix: For service account auth, verify the keyfile path is absolute and the file exists. For OAuth, re-run gcloud auth application-default login. For the service-account-json method, make sure all environment variables are set and the private key includes the \n characters (not literal backslash-n).
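The literal-backslash-n problem is easy to detect and repair in a few lines of Python. This is a standalone sketch; the key string here is a shortened stand-in for a real private key:

```python
# A key that was pasted with literal "\n" sequences instead of real newlines
broken_key = "-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADANBg...\\n-----END PRIVATE KEY-----\\n"

# Replace the two-character sequence backslash-n with an actual newline
fixed_key = broken_key.replace("\\n", "\n")

assert "\n" in fixed_key        # real newlines are now present
assert "\\n" not in fixed_key   # no literal backslash-n sequences remain
print(fixed_key.splitlines()[0])  # -> -----BEGIN PRIVATE KEY-----
```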
Quota Exceeded
```
Exceeded rate limits: too many concurrent queries for this project_and_region
```
Cause: BigQuery has per-project concurrency limits (default is 100 concurrent queries).
Fix: Reduce the threads value in profiles.yml. Start with 4, increase only if builds are slow and you're not hitting quota limits. For large dbt projects, threads: 8 is usually the sweet spot.
Invalid Private Key
```
ValueError: Could not deserialize key data
```
Cause: The private key in your JSON key file or environment variable is corrupted, often from copy-paste issues stripping newline characters.
Fix: If using environment variables, make sure the private key preserves its \n characters. In bash:
```shell
export GCP_PRIVATE_KEY=$(jq -r '.private_key' dbt-bigquery-key.json)
```
Production Considerations
Once you've got dbt running locally with BigQuery, here's what to think about for production.
CI/CD
Run dbt build in your CI pipeline on every pull request against a development dataset. This catches schema errors before they hit production. Use the service-account-json method and inject credentials from your CI provider's secrets store (GitHub Actions secrets, GitLab CI variables, etc.).
```yaml
# .github/workflows/dbt.yml
name: dbt CI

on:
  pull_request:
    branches: [main]

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-bigquery
      - run: dbt deps
      - run: dbt build --target ci
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          # ... other credential fields
```
Docker
For production runs, containerize your dbt project:
```dockerfile
FROM python:3.11-slim

RUN pip install --no-cache-dir dbt-bigquery

WORKDIR /dbt
COPY . /dbt/

ENTRYPOINT ["dbt"]
CMD ["build", "--target", "prod"]
```
Credential Management
Never commit service account JSON keys to Git. In production, prefer one of these approaches:
- Workload Identity Federation (GKE / Cloud Run): No key files at all. The runtime environment authenticates automatically.
- Secret managers: GCP Secret Manager, HashiCorp Vault, or your cloud provider's equivalent.
- Environment variables: Injected at runtime from CI secrets or orchestrator configuration.
Workload Identity Federation is the gold standard if you're running on GCP infrastructure — it eliminates long-lived credentials entirely.
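In a workload-identity environment (GKE, Cloud Run, or anywhere Application Default Credentials are provided by the runtime), the profile gets simpler: method: oauth picks up ADC automatically, with no key material in the config. A sketch:

```yaml
# profiles.yml for a workload-identity environment (no key file)
my_bigquery_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth  # uses Application Default Credentials from the runtime
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      location: US
```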
Scheduling
You need something to run dbt build on a schedule. The options range from a simple cron job to a full Airflow deployment. The right choice depends on how much operational overhead you're willing to take on. (We wrote a whole guide on dbt + Airflow if you want to go that route.)
A Simpler Path to Production
Getting dbt-core connected to BigQuery isn't hard once you know the steps. Getting it running reliably in production — with scheduling, credential rotation, monitoring, and CI/CD — is a different story.
ModelDock handles all of that for you. Connect your Git repo, enter your BigQuery service account credentials (encrypted with AES-256-GCM), set a schedule, and your dbt project runs in an isolated container with full logs and artifact storage. No infrastructure to manage, no DAGs to write.
It's free during the open beta. Give it a try at modeldock.run.