Running dbt-core with BigQuery: Complete Setup Guide
Step-by-step guide to setting up dbt-core with Google BigQuery — service accounts, dataset permissions, profiles.yml configuration, and common errors.
Google BigQuery is one of the most popular data warehouses for dbt-core projects. It's serverless, scales automatically, and the pricing model is straightforward. But the initial setup — especially around service accounts, IAM roles, and authentication — trips people up more than it should.
This guide walks through everything you need to get dbt-core running with BigQuery, from creating a service account to running your first model. We'll cover both the GCP Console and gcloud CLI approaches, plus the common errors you'll hit along the way.
Prerequisites
Before you start, make sure you have:
- A Google Cloud Platform (GCP) project with billing enabled. BigQuery has a generous free tier (1 TB of queries and 10 GB of storage per month), but billing still needs to be active.
- A BigQuery dataset in that project. If you don't have one yet, we'll create one below.
- Python 3.9+ and pip installed locally.
- The gcloud CLI (optional but recommended). Install it from cloud.google.com/sdk.
If you already have a GCP project and dataset, skip ahead to the dbt-bigquery installation.
Creating a BigQuery Dataset
If you need a dataset to work with:
```shell
# Via gcloud CLI
gcloud config set project your-gcp-project-id
bq mk --dataset --location=US your-gcp-project-id:analytics

# Or specify EU location
bq mk --dataset --location=EU your-gcp-project-id:analytics
```
You can also create one through the BigQuery console at console.cloud.google.com/bigquery — click your project, then "Create Dataset."
Pick the dataset location carefully. BigQuery datasets are region-locked, and you can't change the location after creation.
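If you're not sure where an existing dataset lives, you can check before wiring it into dbt. One way, using the bq CLI that ships with the gcloud SDK (dataset name here matches the example above):

```shell
# Show dataset metadata and pull out its location
bq show --format=prettyjson your-gcp-project-id:analytics | grep '"location"'
```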
Installing dbt-bigquery
The dbt-bigquery adapter includes dbt-core as a dependency, so you only need one install:
```shell
# Create a virtual environment (recommended)
python -m venv dbt-venv
source dbt-venv/bin/activate

# Install dbt with the BigQuery adapter
pip install dbt-bigquery
```
Verify the installation:
```shell
dbt --version
```
You should see both dbt-core and dbt-bigquery in the output. If you're using a specific dbt version, pin it:
```shell
pip install dbt-bigquery==1.9.0
```
Service Account Setup
dbt needs credentials to authenticate with BigQuery. The recommended approach for anything beyond local development is a service account with a JSON key file.
Option A: GCP Console
- Go to console.cloud.google.com/iam-admin/serviceaccounts
- Select your project
- Click Create Service Account
- Name it something descriptive, like dbt-runner
- Click Create and Continue
- Grant the necessary roles (see the permissions section below)
- Click Done
- Click the new service account, go to Keys > Add Key > Create new key
- Choose JSON and click Create
- Save the downloaded JSON file somewhere secure — you'll reference it in profiles.yml
Option B: gcloud CLI
```shell
# Create the service account
gcloud iam service-accounts create dbt-runner \
  --display-name="dbt Runner" \
  --project=your-gcp-project-id

# Grant BigQuery roles
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download the key file
gcloud iam service-accounts keys create ~/dbt-bigquery-key.json \
  --iam-account=dbt-runner@your-gcp-project-id.iam.gserviceaccount.com
```
Keep the JSON key file out of version control. Add *.json to your .gitignore or store the file outside your project directory.
Required BigQuery Permissions
This is where most people get stuck. BigQuery's IAM model splits permissions across several roles, and dbt needs a specific combination to work properly.
Minimum Required Roles
| Role | Why dbt Needs It |
|---|---|
| BigQuery Data Editor (roles/bigquery.dataEditor) | Create, update, and delete tables and views in datasets |
| BigQuery Job User (roles/bigquery.jobUser) | Run queries (BigQuery jobs) in the project |
These two roles cover most use cases. If your dbt project only reads from certain datasets and writes to others, you can get more granular:
Fine-Grained Permissions (Optional)
| Role | Scope | Purpose |
|---|---|---|
| BigQuery Data Viewer (roles/bigquery.dataViewer) | Source datasets | Read-only access to source tables |
| BigQuery Data Editor (roles/bigquery.dataEditor) | Target dataset | Write access where dbt creates models |
| BigQuery Job User (roles/bigquery.jobUser) | Project level | Execute queries |
You can assign roles at the dataset level instead of the project level for tighter security. Do this through the BigQuery console by opening the dataset, clicking Sharing, and adding the service account with the appropriate role.
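If you prefer the command line for dataset-level grants, one approach (a sketch, not the only way) is to edit the dataset's access list with bq. Note that dataset access entries use BigQuery's legacy role names (READER/WRITER/OWNER), where WRITER roughly corresponds to Data Editor:

```shell
# Dump the dataset's current metadata, including its "access" array
bq show --format=prettyjson your-gcp-project-id:analytics > dataset.json

# Edit dataset.json and add an entry to the "access" array, e.g.
#   {"role": "WRITER", "userByEmail": "dbt-runner@your-gcp-project-id.iam.gserviceaccount.com"}

# Apply the updated access list
bq update --source dataset.json your-gcp-project-id:analytics
```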
Configuring profiles.yml
dbt uses profiles.yml to know how to connect to your warehouse. By default, it looks for this file at ~/.dbt/profiles.yml.
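If you keep profiles.yml somewhere else (for example, inside the repo for CI), you can point dbt at the directory explicitly. Both of these are supported by dbt-core:

```shell
# Option 1: environment variable
export DBT_PROFILES_DIR=/path/to/profiles-dir

# Option 2: per-invocation flag
dbt run --profiles-dir /path/to/profiles-dir
```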
Method 1: Service Account JSON Key (Recommended)
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      keyfile: /path/to/dbt-bigquery-key.json
      location: US  # Must match your dataset location
      timeout_seconds: 300
      priority: interactive
      retries: 1
```
Method 2: Service Account JSON Inline (Environment Variable)
If you don't want to deal with key files — useful in CI/CD or Docker environments — you can pass the JSON content directly:
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account-json
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      location: US
      timeout_seconds: 300
      keyfile_json:
        type: service_account
        project_id: "{{ env_var('GCP_PROJECT_ID') }}"
        private_key_id: "{{ env_var('GCP_PRIVATE_KEY_ID') }}"
        private_key: "{{ env_var('GCP_PRIVATE_KEY') }}"
        client_email: "{{ env_var('GCP_CLIENT_EMAIL') }}"
        client_id: "{{ env_var('GCP_CLIENT_ID') }}"
        auth_uri: https://accounts.google.com/o/oauth2/auth
        token_uri: https://oauth2.googleapis.com/token
        auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
        client_x509_cert_url: "{{ env_var('GCP_CERT_URL') }}"
```
This approach lets you inject credentials from environment variables, a secrets manager, or CI/CD secrets without ever writing a key file to disk.
Method 3: OAuth (Local Development)
For local development, OAuth is the easiest — no service account needed:
```yaml
# ~/.dbt/profiles.yml
my_bigquery_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: your-gcp-project-id
      dataset: analytics_dev
      threads: 4
      location: US
      timeout_seconds: 300
```
Then authenticate with:
```shell
gcloud auth application-default login
```
This opens a browser window for you to log in with your Google account. It works great for development, but don't use it in production or CI — there's no way to automate the browser login step.
Creating Your First Model and Running It
With credentials configured, let's verify everything works.
Initialize a dbt Project
If you don't already have a dbt project:
```shell
dbt init my_bigquery_project
```
When prompted, select bigquery as your adapter. dbt will create a project directory with the standard folder structure.
Create a Simple Model
Create a file at models/staging/stg_example.sql:
```sql
-- models/staging/stg_example.sql
with source as (
    select 1 as id, 'Alice' as name, current_timestamp() as created_at
    union all
    select 2, 'Bob', current_timestamp()
    union all
    select 3, 'Charlie', current_timestamp()
)

select
    id,
    name,
    created_at
from source
```
This is a self-contained model that doesn't depend on any existing tables — useful for testing your connection.
Run It
```shell
# Test the connection first
dbt debug

# If debug passes, run the model
dbt run
```
dbt debug checks your profiles.yml, credentials, and warehouse connectivity. If it passes, your setup is correct. If it fails, the error messages are usually specific enough to point you in the right direction.
After a successful dbt run, you should see a new view (or table, depending on your materialization) in your BigQuery dataset.
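dbt builds models as views by default. If you'd rather have the example materialized as a table, a config block at the top of the model file (standard dbt syntax) does it:

```sql
-- models/staging/stg_example.sql
{{ config(materialized='table') }}

-- ...rest of the model as above
```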
Verify in BigQuery
```shell
bq query --use_legacy_sql=false \
  'SELECT * FROM `your-gcp-project-id.analytics.stg_example`'
```
Or check the BigQuery console — your model should appear under the dataset you specified.
Common Errors and Fixes
Here are the errors you're most likely to hit, and how to fix them.
403: Access Denied
```
Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.
```
Cause: The service account doesn't have the right roles.
Fix: Make sure the service account has both roles/bigquery.dataEditor and roles/bigquery.jobUser. Double-check you're granting roles on the correct project.
```shell
# Verify current roles
gcloud projects get-iam-policy your-gcp-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:dbt-runner@your-gcp-project-id.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```
Dataset Not Found
```
Not found: Dataset your-gcp-project-id:analytics was not found in location US
```
Cause: Either the dataset doesn't exist, or the location in profiles.yml doesn't match the dataset's actual location.
Fix: Check the dataset location in the BigQuery console and update profiles.yml to match. If your dataset is in europe-west1, set location: europe-west1. This is case-insensitive but must be the correct region.
Authentication Errors
```
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials.
```
Cause: dbt can't find your credentials. Either the keyfile path is wrong, the JSON file is malformed, or (for OAuth) you haven't run gcloud auth application-default login.
Fix: For service account auth, verify the keyfile path is absolute and the file exists. For OAuth, re-run gcloud auth application-default login. For the service-account-json method, make sure all environment variables are set and the private key includes the \n characters (not literal backslash-n).
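The literal-backslash-n problem is easy to detect and repair in a few lines of Python. This is a standalone sketch; the key string here is a shortened stand-in for a real private key:

```python
# A key that was pasted with literal "\n" sequences instead of real newlines
broken_key = "-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADANBg...\\n-----END PRIVATE KEY-----\\n"

# Replace the two-character sequence backslash-n with an actual newline
fixed_key = broken_key.replace("\\n", "\n")

assert "\n" in fixed_key        # real newlines are now present
assert "\\n" not in fixed_key   # no literal backslash-n sequences remain
print(fixed_key.splitlines()[0])  # -> -----BEGIN PRIVATE KEY-----
```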
Quota Exceeded
```
Exceeded rate limits: too many concurrent queries for this project_and_region
```
Cause: BigQuery has per-project concurrency limits (default is 100 concurrent queries).
Fix: Reduce the threads value in profiles.yml. Start with 4, increase only if builds are slow and you're not hitting quota limits. For large dbt projects, threads: 8 is usually the sweet spot.
Invalid Private Key
```
ValueError: Could not deserialize key data
```
Cause: The private key in your JSON key file or environment variable is corrupted, often from copy-paste issues stripping newline characters.
Fix: If using environment variables, make sure the private key preserves its \n characters. In bash:
```shell
export GCP_PRIVATE_KEY=$(jq -r '.private_key' dbt-bigquery-key.json)
```
Production Considerations
Once you've got dbt running locally with BigQuery, here's what to think about for production.
CI/CD
Run dbt build in your CI pipeline on every pull request against a development dataset. This catches schema errors before they hit production. Use the service-account-json method and inject credentials from your CI provider's secrets store (GitHub Actions secrets, GitLab CI variables, etc.).
```yaml
# .github/workflows/dbt.yml
name: dbt CI

on:
  pull_request:
    branches: [main]

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-bigquery
      - run: dbt deps
      - run: dbt build --target ci
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          # ... other credential fields
```
Docker
For production runs, containerize your dbt project:
```dockerfile
FROM python:3.11-slim

RUN pip install --no-cache-dir dbt-bigquery

WORKDIR /dbt
COPY . /dbt/

ENTRYPOINT ["dbt"]
CMD ["build", "--target", "prod"]
```
Credential Management
Never commit service account JSON keys to Git. In production, prefer one of these approaches:
- Workload Identity Federation (GKE / Cloud Run): No key files at all. The runtime environment authenticates automatically.
- Secret managers: GCP Secret Manager, HashiCorp Vault, or your cloud provider's equivalent.
- Environment variables: Injected at runtime from CI secrets or orchestrator configuration.
Workload Identity Federation is the gold standard if you're running on GCP infrastructure — it eliminates long-lived credentials entirely.
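In a workload-identity environment (GKE, Cloud Run, or anywhere Application Default Credentials are provided by the runtime), the profile gets simpler: method: oauth picks up ADC automatically, with no key material in the config. A sketch:

```yaml
# profiles.yml for a workload-identity environment (no key file)
my_bigquery_project:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth  # uses Application Default Credentials from the runtime
      project: your-gcp-project-id
      dataset: analytics
      threads: 4
      location: US
```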
Scheduling
You need something to run dbt build on a schedule. The options range from a simple cron job to a full Airflow deployment. The right choice depends on how much operational overhead you're willing to take on. (We wrote a whole guide on dbt + Airflow if you want to go that route.)
A Simpler Path to Production
Getting dbt-core connected to BigQuery isn't hard once you know the steps. Getting it running reliably in production — with scheduling, credential rotation, monitoring, and CI/CD — is a different story.
ModelDock handles all of that for you. Connect your Git repo, enter your BigQuery service account credentials (encrypted with AES-256-GCM), set a schedule, and your dbt project runs in an isolated container with full logs and artifact storage. No infrastructure to manage, no DAGs to write.
It's free during the open beta. Give it a try at modeldock.run.