In 2022, our data engineering team at a large retail banking organisation inherited what can only be described as a functioning disaster: 340 dbt models across two repositories, zero naming conventions, a monolithic DAG taking 6.5 hours to complete, and 11 engineers constantly stepping on each other's work. Production incidents linked to bad transformations occurred roughly twice a month.
Eighteen months later: 560+ models across a clean domain-driven structure, end-to-end pipeline runtime reduced to 87 minutes, zero naming inconsistencies, and a CI/CD gate catching data quality failures before they reached risk or compliance teams.
This article is not about dbt features. It is about the specific architectural, operational, and governance decisions that allowed a 12-person team to confidently run dbt at scale across 500+ TB of banking data — including customer PII, AML transaction feeds, internet banking event streams, and regulatory datasets.
Everything here is drawn from real production experience. The numbers are real. The failures are real. The solutions can be implemented starting this week.
01 · Architecture Overview: Data Flow at Scale
Source Systems
Our data originated from four distinct source categories. Core banking systems running on a legacy Oracle platform emitted nightly full extracts as compressed CSVs, delivered to AWS S3 via a managed file transfer service. Event streams — internet banking clickstreams, payment initiation events, and fraud signals — were ingested via Kafka topics published by application teams, at roughly 4.2 million events per day. Regulatory and compliance feeds — KYC verification results, AML screening outputs, and sanctions list match data — were pushed by third-party vendors in SFTP batches. Internal operational systems — HR, branch performance, product pricing — arrived as scheduled API exports.
Ingestion Layer: S3 → GCS
We used a hybrid ingestion approach. Core banking extracts landed in AWS S3, where a lightweight Python-based Lambda validated file integrity and triggered a cross-cloud transfer job (using Google Cloud Storage Transfer Service) to move validated files into GCS landing buckets. Kafka streams were consumed by Dataflow pipelines that landed raw JSON payloads into GCS in 15-minute micro-batch windows.
Transformation Layer: GCS → BigQuery → dbt
From GCS, data was loaded into BigQuery raw datasets using scheduled BigQuery Data Transfer jobs for bulk files and streaming inserts for near-real-time event data. dbt sat entirely within BigQuery, structured across five clearly defined data zones: raw, staging, intermediate, marts, and reporting.
dbt models only existed from staging_* onwards. Raw datasets were entirely managed by ingestion pipelines, not dbt. This boundary was sacred — and enforcing it saved us from numerous incidents where someone tried to "just quickly fix" something in the raw layer.
Data Volumes
At peak, our BigQuery environment managed 620 TB across all datasets (raw + transformed), approximately 1.1 billion transaction rows in the core transactions fact table (partitioned by transaction date, 36 months hot retention), 14 million customer records with full history, and 230 GB of daily incremental data across all sources.
02 · The Breaking Point: When dbt Doesn't Scale by Default
By the time our team recognised a scaling problem, the symptoms were impossible to ignore.
DAG Explosion
With 340 models in a single project, the dbt DAG had become visually and operationally unmanageable. Engineers would open the lineage graph and immediately close it. Nobody had a mental model of how data flowed from source to mart anymore.
The deeper problem was phantom dependencies — models referencing upstream sources through multiple unnecessary intermediate steps, creating chains of 8–10 hops where 3 would have sufficed. One mart model had 14 upstream dependencies, 6 of which were only there because someone had copied and extended an older model without pruning the lineage.
Runtime: 6 Hours and Climbing
Our full refresh run was taking 6 hours 20 minutes. For a regulated bank, any incident discovered after 6pm could not be resolved and validated before the next business day. The root cause was not BigQuery performance — it was dbt execution order and model design. We had 80+ models running as full-table scans on tables with hundreds of millions of rows. No partitioning. No clustering. Sequential execution of models that could have run in parallel.
Developer Conflicts
With 11 engineers on a single dbt project in a monorepo, merge conflicts were a weekly event. Two engineers working on adjacent domains would modify the same schema.yml file, and reconciling those conflicts often introduced subtle errors not caught until production. We had no enforced ownership model. Any engineer could modify any model. This was philosophically nice ("shared ownership!") and operationally catastrophic.
Inconsistent Naming and Modeling Patterns
A brief audit revealed: 4 different conventions for staging model names (stg_, staging_, s_, and unprefixed), 3 different approaches to handling null values in the same fact table, 2 models named customer_metrics in different folders producing different numbers, and no consistent approach to grain documentation.
Debugging and Lineage Tracking
When a regulatory report produced incorrect AML risk scores, tracing the issue through our dbt lineage took 4.5 hours. The problem was a type casting error in a staging model introduced 6 weeks earlier, buried under 9 downstream dependencies. We had no automated data quality checks between layers. By the time the error surfaced in the reporting layer, it had propagated through dozens of models.
03 · Model Organisation Strategies That Actually Work
Layered Architecture with Hard Zone Boundaries
The first structural decision was to enforce hard boundaries between data zones — not as a convention, but as a technical constraint baked into dbt_project.yml:
dbt_project.yml

```yaml
models:
  banking_platform:
    staging:
      +materialized: view
      +tags: ["staging"]
    intermediate:
      +materialized: ephemeral
      +tags: ["intermediate"]
    marts:
      +materialized: table
      +tags: ["mart"]
      payments:
        +materialized: incremental
      accounts:
        +materialized: incremental
    reporting:
      +materialized: view
      +tags: ["reporting", "regulated"]
      +meta:
        access_tier: "restricted"
```
Staging models were views — cheap to run, easy to debug, always reflecting current raw data. Intermediate models were ephemeral — they exist in the DAG but not as physical objects in BigQuery, reducing storage costs significantly. Marts were incremental tables. No mart model was permitted to reference another mart model. This single rule eliminated an entire class of dependency hell.
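The "no mart references another mart" rule can be checked mechanically in CI. As a minimal sketch (the function name and the simplified manifest shapes are illustrative, not our exact script), a step can parse dbt's manifest.json and flag any mart-tagged model that depends on another mart-tagged model:

```python
def find_mart_to_mart_refs(manifest):
    """Return sorted (model, upstream) pairs where a mart model references another mart.

    Expects a dict in the shape of dbt's manifest.json: nodes keyed by unique id,
    each with resource_type, name, tags, and depends_on.nodes.
    """
    nodes = manifest.get("nodes", {})
    violations = []
    for node in nodes.values():
        if node.get("resource_type") != "model" or "mart" not in node.get("tags", []):
            continue  # only mart models are constrained by this rule
        for dep_id in node.get("depends_on", {}).get("nodes", []):
            dep = nodes.get(dep_id, {})
            if dep.get("resource_type") == "model" and "mart" in dep.get("tags", []):
                violations.append((node["name"], dep["name"]))
    return sorted(violations)
```

A CI job can simply fail when the returned list is non-empty, turning the convention into a hard gate.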
Domain-Driven Structure
We reorganised from a flat technical structure into a domain-driven folder hierarchy, with each domain having a designated owning team. Pull requests touching /compliance/ required approval from both an engineer and a compliance data steward — enforced via GitHub CODEOWNERS.
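A CODEOWNERS fragment matching this structure might look like the following — the team handles are invented for illustration, but the dual-approver pattern on compliance paths is the one described above:

```text
# .github/CODEOWNERS (illustrative handles)
/models/staging/compliance/   @data-eng-team @compliance-stewards
/models/marts/compliance/     @data-eng-team @compliance-stewards
/models/marts/payments/       @payments-squad
/models/marts/accounts/       @accounts-squad
```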
Domain-Driven Folder Structure

```text
models/
├── staging/
│   ├── core_banking/        # stg_accounts, stg_transactions
│   ├── internet_banking/    # stg_ib_sessions, stg_ib_events
│   └── compliance/          # stg_kyc_verifications, stg_aml_alerts
├── intermediate/
│   ├── payments/            # int_payment_enriched
│   └── customers/           # int_customer_360
└── marts/
    ├── payments/            # fct_daily_payments, dim_payment_channel
    ├── accounts/            # fct_daily_balances, dim_account
    └── compliance/          # fct_aml_risk_scores, fct_regulatory_positions
```
Naming Conventions — Enforced, Not Suggested
We adopted a strict naming convention and wrote a custom dbt macro that would fail the run if any model violated it:
| Layer | Prefix | Example |
|---|---|---|
| Staging | stg_ | stg_transactions |
| Intermediate | int_ | int_customer_metrics |
| Fact tables | fct_ | fct_daily_balances |
| Dimension tables | dim_ | dim_account |
| Reporting | rpt_ | rpt_aml_monthly_summary |
A model named outside this convention would fail in CI before reaching review. The first week this was enforced, three PRs were rejected. After that, nobody forgot.
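The validator itself can be small. This is a sketch of the idea behind a prefix check like our validate_model_names.py (the real script differed; the folder-to-prefix rules here simply mirror the table above):

```python
from pathlib import Path

# Layer folder -> allowed file-name prefixes, per the naming convention table.
PREFIX_RULES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_", "rpt_"),
}

def invalid_models(models_root):
    """Return sorted paths of .sql model files violating their layer's prefix rule."""
    bad = []
    for layer, prefixes in PREFIX_RULES.items():
        layer_dir = Path(models_root, layer)
        if not layer_dir.is_dir():
            continue  # layer folder may not exist in every project
        for sql_file in layer_dir.rglob("*.sql"):
            if not sql_file.stem.startswith(prefixes):
                bad.append(str(sql_file))
    return sorted(bad)
```

Exiting non-zero when the list is non-empty is enough to make CI reject the PR before review.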
04 · DAG Performance: From 6 Hours to 87 Minutes
Incremental Models for Everything Over 10M Rows
The single highest-impact change: converting 23 full-refresh fact models to incremental. Our transactions fact table alone accounted for 2 hours of the original 6-hour runtime.
fct_daily_balances.sql — Incremental Model Config

```sql
{{ config(
    materialized='incremental',
    unique_key='balance_snapshot_id',
    incremental_strategy='merge',
    partition_by={
        "field": "snapshot_date",
        "data_type": "date",
        "granularity": "day"
    },
    cluster_by=['account_id', 'product_type'],
    on_schema_change='sync_all_columns'
) }}

SELECT
    {{ dbt_utils.generate_surrogate_key(['account_id', 'snapshot_date']) }} AS balance_snapshot_id,
    account_id,
    snapshot_date,
    closing_balance,
    available_balance,
    product_type,
    currency_code
FROM {{ ref('int_account_daily_position') }}

{% if is_incremental() %}
WHERE snapshot_date >= (SELECT MAX(snapshot_date) FROM {{ this }}) - INTERVAL 3 DAY
{% endif %}
```
The 3-day lookback window handled late-arriving corrections from the core banking system without requiring a full refresh.
Partitioning and Clustering Strategy
We applied a consistent BigQuery optimisation strategy: partition by date on any model with a time dimension; cluster by the most common filter columns (account_id, customer_id, product_type); avoid clustering on high-cardinality string columns. Before optimisation, a query scanning fct_daily_payments for a single month processed 340 GB. After partition pruning and clustering: 4.2 GB.
Parallelism via Thread Tuning
The default dbt threads setting of 4 was insufficient. BigQuery handles concurrency well. We tuned to 16 threads per environment — and with a properly pruned DAG, models previously executing sequentially ran in large parallel batches, reducing runtime by approximately 40 minutes alone.
Before vs. After
| Metric | Before | After |
|---|---|---|
| Full pipeline runtime | 6h 20min | 87 min |
| Models on full refresh | 340 (all) | 47 (dimensions + lookups only) |
| BigQuery slot utilisation peak | 94% | 61% |
| Monthly BigQuery compute cost | $8,400 | $3,100 |
| Data quality test coverage | 12% of models | 89% of models |
05 · Refactoring at Scale with dbt-osmosis
By 300 models, our schema.yml files had descended into entropy. Some models had no documentation. Others carried outdated column descriptions copied from tables that had long since been refactored away. dbt-osmosis let us automate the bulk of the remediation, which would otherwise have taken weeks.
What dbt-osmosis Did For Us
Schema propagation was the killer feature. When a staging model added a new column — say, risk_band added to stg_accounts — dbt-osmosis automatically propagated the column description down to every intermediate and mart model that referenced it through ref(). Previously this required manually editing 6–8 schema.yml files per new column.
dbt-osmosis — Schema Propagation

```shell
# Sync column descriptions from upstream to all downstream models
dbt-osmosis yaml refactor --project-dir . --profiles-dir .

# Organize schema files to match the new folder structure
dbt-osmosis yaml organize --project-dir . --profiles-dir .
```
Legacy model migration was our most painful project. We had 80 models built before the naming convention existed, sitting in a flat /models/legacy/ folder. dbt-osmosis handled the schema.yml restructuring for the migration, saving approximately 3 weeks of manual work.
06 · Enforcing Standards Across a Team of 12
CI/CD Pipeline Structure
Every dbt model change went through a four-stage CI pipeline before merging to main:
.github/workflows/dbt_ci.yml

```yaml
stages:
  - name: Lint and naming check
    run: |
      dbt debug
      python scripts/validate_model_names.py   # Custom prefix enforcement
  - name: Compile and parse
    run: dbt compile --select state:modified+
  - name: Test on modified models
    run: |
      dbt run --select state:modified+ --target ci
      dbt test --select state:modified+ --target ci
  - name: Documentation coverage check
    run: python scripts/check_doc_coverage.py --min-coverage 80
```
The state:modified+ selector was critical — it ran tests not only on the changed model but on all downstream models. A breaking change to stg_transactions would trigger tests on every mart and report depending on it.
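The documentation coverage gate can be approximated by parsing dbt's manifest.json. This sketch is illustrative of the approach, not our production check_doc_coverage.py:

```python
def doc_coverage(manifest):
    """Fraction of dbt models with a non-empty description in manifest.json."""
    models = [n for n in manifest.get("nodes", {}).values()
              if n.get("resource_type") == "model"]
    if not models:
        return 1.0  # nothing to document
    documented = sum(1 for m in models if m.get("description", "").strip())
    return documented / len(models)
```

The CI stage then fails the build whenever the returned fraction drops below the configured minimum (0.80 in our pipeline).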
Automated Testing Strategy — Three Tiers
Tier 1 — Always run (every model, no exceptions): not_null and unique on all primary keys; accepted_values on all categorical columns with known enumerations.
Tier 2 — Run on all mart and reporting models: row count assertions against previous run (±15% threshold triggers alert, not failure); referential integrity between fact and dimension tables; custom freshness checks on time-partitioned tables.
Tier 3 — Run on compliance and regulatory models only: balance reconciliation checks (total in dbt mart must match source control totals ±0.01%); record completeness checks. Tier 3 failures blocked deployment completely. No human override.
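The Tier 2 row-count assertion reduces to a simple drift calculation. A minimal sketch, with the ±15% threshold from above (function name and return values are illustrative):

```python
def row_count_status(previous, current, threshold=0.15):
    """Tier-2 style check: return "alert" (not a hard failure) when the row
    count drifts beyond +/- threshold relative to the previous run."""
    if previous == 0:
        return "alert" if current > 0 else "ok"  # avoid division by zero
    drift = abs(current - previous) / previous
    return "alert" if drift > threshold else "ok"
```

Routing "alert" to a notification channel rather than failing the run is what distinguishes this tier from the hard-blocking Tier 3 checks.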
07 · Handling PII and Compliance in a Banking Environment
In banking, you cannot treat PII as an afterthought. It has to be a first-class design constraint.
Column-Level Security in BigQuery
We used BigQuery's column-level security features, managed through dbt model configuration and Terraform-provisioned policy tags:
stg_customers.sql — PII Masking Macro

```sql
SELECT
    customer_id,                                                 -- not PII
    {{ mask_pii('first_name') }} AS first_name,                  -- masked in non-prod
    {{ mask_pii('last_name') }} AS last_name,
    {{ mask_pii('date_of_birth') }} AS date_of_birth,
    {{ mask_pii('national_id_number') }} AS national_id_number,
    {{ hash_identifier('email_address') }} AS email_hash,        -- hashed, original removed
    account_open_date,
    customer_segment
FROM {{ source('core_banking', 'raw_customers') }}

-- macros/mask_pii.sql
{% macro mask_pii(column_name) %}
    {% if target.name == 'prod' %}
        {{ column_name }}
    {% else %}
        SHA256(CAST({{ column_name }} AS STRING))
    {% endif %}
{% endmacro %}
```
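For intuition, here is a Python analogue of the hashing behaviour. Note this sketch hex-encodes the digest while BigQuery's SHA256 returns bytes; the property that matters for analytics — determinism, so hashed identifiers still join and deduplicate correctly — is the same:

```python
import hashlib

def hash_identifier(value):
    """Deterministic SHA-256 pseudonym of an identifier (hex-encoded).
    Equal inputs always produce equal hashes; the original value is discarded."""
    if value is None:
        return None  # preserve NULL semantics
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```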
Role-Based Access Control
| Tier | Datasets Accessible | Teams |
|---|---|---|
| Standard | mart_*, reporting_public_* | Analytics, Product |
| Elevated | + staging_*, intermediate_* | Data Engineering |
| Restricted | + reporting_compliance_*, reporting_regulatory_* | Compliance, Risk, Audit |
Restricted tier access required a formal access request, manager approval, and quarterly review — implemented as BigQuery IAM bindings managed in Terraform, not ad-hoc grants.
Audit Trails
Every model in the reporting_regulatory_* dataset included an automatically appended audit metadata block via a post-hook that wrote run metadata (model name, run timestamp, row count, dbt invocation ID) to a dedicated audit.model_run_log table. This gave our internal audit team the ability to reconstruct exactly which data transformation produced any specific regulatory report on any specific date.
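The rows the post-hook appended were shaped roughly as follows. The field names follow the description above; the builder function itself is illustrative, since in production the insert happened in SQL inside the hook:

```python
from datetime import datetime, timezone

def build_audit_record(model_name, row_count, invocation_id):
    """Illustrative shape of one row written to audit.model_run_log per model run."""
    return {
        "model_name": model_name,
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "row_count": row_count,
        "dbt_invocation_id": invocation_id,
    }
```

Keying the log on the dbt invocation ID is what lets an auditor tie a specific report back to one exact pipeline run.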
08 · Lessons Learned: The Hard Truths
- We waited too long to enforce naming conventions. At 100 models, a messy naming convention is an annoyance. At 300 models, it is a productivity tax on every engineer every day. Establish naming rules in week one. Automate enforcement immediately.
- Ephemeral models are underused. Most teams default to views or tables for intermediate logic. Ephemeral models eliminate intermediate physical tables entirely, reducing storage costs and simplifying the lineage graph. We converted 60% of our intermediate models to ephemeral, and the DAG became dramatically cleaner overnight.
- The monorepo was wrong for us. For the first 200 models, a single dbt project felt like unity. After 300, it felt like a bottleneck. Splitting into three domain projects — transactional, compliance, and customer — with a shared macros package reduced merge conflicts by approximately 80%.
- Synthetic data for CI is not optional in banking. Running CI tests against real customer data — even in a "dev" dataset — is a data governance risk. Building a synthetic data generator was a 3-week investment that paid back in audit confidence within the first quarter.
- dbt alone is not an observability solution. dbt tests tell you if the data that arrived passes your rules. They do not tell you if the data you expected to arrive, arrived. We added a separate pipeline health monitoring layer (Python + BigQuery metadata tables) to close this gap.
- Invest in dbt macros early. We wrote over 40 custom macros — PII masking, audit logging, standard test generators, naming validators. The upfront investment in macro quality paid compound returns as the project grew.
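The "did the data we expected actually arrive" check from the observability point above can be sketched in a few lines — feed names and cadences here are hypothetical, and the production version read load timestamps from BigQuery metadata tables:

```python
from datetime import datetime, timedelta, timezone

def missing_feeds(expected_feeds, last_loaded, now):
    """Return feeds whose latest load is older than their expected cadence.

    expected_feeds: feed name -> max allowed age (timedelta)
    last_loaded:    feed name -> timestamp of the most recent successful load
    """
    late = []
    for feed, max_age in expected_feeds.items():
        loaded = last_loaded.get(feed)
        if loaded is None or now - loaded > max_age:
            late.append(feed)  # never loaded, or stale beyond its cadence
    return sorted(late)
```

dbt tests run after this gate: there is no point validating rules against a table the upstream feed never populated.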
09 · Key Takeaways for Practitioners
- **Start structuring at 50 models.** 50 models feels early. It is exactly the right time. Every month you delay is compounding technical debt.
- **No mart-to-mart references.** If a mart model can reference another mart model, eventually it will. Block it in config, not convention.
- **Make large fact models incremental.** Full refreshes on 100M+ row tables in a daily pipeline are an architectural smell, not a badge of honour.
- **Partition and cluster by default.** The performance and cost benefits are too significant to defer. 340 GB → 4.2 GB on the same query.
- **Use state:modified+ in CI.** Testing only changed models and their downstream dependencies makes CI fast enough to actually be used.
- **Build synthetic CI data.** In regulated industries, build it once, maintain it forever. Audit confidence is worth the 3-week investment.
- **Define ownership explicitly.** Shared ownership without defined accountability is no ownership at all. GitHub CODEOWNERS enforces this automatically.
- **Run dbt-osmosis regularly, not as a rescue tool.** Keep schema.yml files aligned before entropy sets in.
— · Closing Thoughts
Managing dbt at scale in a banking environment is fundamentally a people and process problem that technology can only partially solve. The architecture described here — layered zones, domain-driven structure, incremental models, CI/CD gates, PII macros, tiered testing — is a framework that worked for our context. Your context will differ.
What will not differ: the cost of addressing these problems at 300 models versus 100 models is roughly 5x. Every month you delay establishing structure is technical debt accumulating at compound interest.
The most valuable thing our team did was not a tool or a dbt feature. It was spending two days with all 12 engineers aligning on principles — what models should be named, who owned what, what a test failure meant — before writing a single line of SQL.
Make those decisions early. Your future self — debugging a pipeline at 11pm before a regulatory submission — will be grateful.