Building a Data Contract System That Teams Actually Follow

TL;DR — Direct Answer

A data contract is a versioned, machine-enforced agreement between a data producer and its consumers covering schema, semantics, freshness, and volume. The tooling is the easy half: dbt tests enforce schema and relationships, Great Expectations enforces distributions and volumes, CI blocks merges that break either, and Slack routes violations to the producer. Contracts die not from weak tooling but from missing ownership and consequences — the three cultural mechanics in the second half of this article are the actual product.

The enforcement stack (the easy half)

Layer 1 — Schema and relational integrity: dbt

Every contracted model gets a YAML spec: column names and types (a dbt model contract with enforced: true), not_null, unique, and accepted_values tests, and relationships tests to upstream keys. This runs in CI on every pull request, so a producer cannot merge a change that breaks the declared shape. The contract lives in the same repo as the transformation, versioned in git and reviewed in the same PR.
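To make "breaks the declared shape" concrete, here is a minimal sketch of the comparison a merge gate performs. It is a conceptual illustration in plain Python, not dbt's implementation; the Column class, the breaking_changes function, and the example columns are all hypothetical names invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for one column entry in a contract spec.
    name: str
    data_type: str
    nullable: bool = True


def breaking_changes(contract: list[Column], proposed: list[Column]) -> list[str]:
    """Return the reasons the proposed shape violates the contracted shape."""
    proposed_by_name = {col.name: col for col in proposed}
    problems = []
    for col in contract:
        new = proposed_by_name.get(col.name)
        if new is None:
            problems.append(f"column removed: {col.name}")
        elif new.data_type != col.data_type:
            problems.append(
                f"type changed: {col.name} {col.data_type} -> {new.data_type}"
            )
        elif new.nullable and not col.nullable:
            problems.append(f"nullability loosened: {col.name}")
    # Columns that exist only in `proposed` are additive and not flagged here.
    return problems


contract = [
    Column("order_id", "bigint", nullable=False),
    Column("order_total", "numeric"),
]
proposed = [
    Column("order_id", "varchar", nullable=False),
    Column("created_at", "timestamp"),
]
problems = breaking_changes(contract, proposed)
```

A CI job would fail the pull request whenever the returned list is non-empty. Treating added columns as non-breaking is a design choice of this sketch; a stricter gate could flag them too.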

Layer 2 — Distributions, volumes, freshness: Great Expectations (or Soda)

Schema tests can pass while the data itself goes wrong. Layer 2 catches the rest: row-count anomalies versus trailing baselines, null-rate drift on critical columns, shifts in value distributions (order totals suddenly 100x), and freshness SLA misses (partition not landed by 06:00). These checks run post-load, not in CI, because they are about the data, not the code.
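The logic behind three of these checks can be sketched in plain Python. This is deliberately not the Great Expectations or Soda API; the function names, the 50% tolerance, and the thresholds are assumptions chosen for illustration.

```python
from __future__ import annotations

from datetime import datetime, time
from statistics import mean


def row_count_ok(today: int, trailing: list[int], tolerance: float = 0.5) -> bool:
    """True if today's row count is within +/- tolerance of the trailing mean."""
    baseline = mean(trailing)
    return abs(today - baseline) <= tolerance * baseline


def null_rate_ok(values: list, max_null_rate: float) -> bool:
    """True if the share of None values does not exceed max_null_rate."""
    if not values:
        return False  # an empty load is treated as a failure, not a pass
    nulls = sum(1 for v in values if v is None)
    return nulls / len(values) <= max_null_rate


def landed_on_time(landed_at: datetime, deadline: time) -> bool:
    """True if the partition landed at or before the daily deadline."""
    return landed_at.time() <= deadline


trailing_counts = [980, 1020, 1000]
volume_pass = row_count_ok(1000, trailing_counts)
volume_fail = row_count_ok(100, trailing_counts)
on_time = landed_on_time(datetime(2024, 1, 1, 5, 45), time(6, 0))
```

In practice the tolerance and null-rate ceiling would come from the contract itself, so that consumers and producers agree on them up front.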

Layer 3 — Routing: alerts that reach the right humans

The single most important config in the whole system: violations page the producing team, not the data platform team. Each contract declares an owner (a team Slack handle, never a person). dbt test failures and GE checkpoint failures post to the producer's channel with the contract link, the diff, and the consumers affected. The data platform team gets cc'd, not assigned.
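A minimal sketch of that routing rule follows. It only builds a message as a plain dict and sends nothing; the channel names, the example URL, and the contract fields are hypothetical, and the diff is left out for brevity.

```python
PLATFORM_CHANNEL = "#data-platform"  # hypothetical channel name


def build_alert(contract: dict, failure: str) -> dict:
    """Build an alert addressed to the producer, with the platform team cc'd."""
    consumers = ", ".join(contract["consumers"])
    text = (
        f"Contract breach: {contract['name']}\n"
        f"Failure: {failure}\n"
        f"Contract: {contract['url']}\n"
        f"Consumers affected: {consumers}"
    )
    return {
        "channel": contract["owner_channel"],  # the producing team owns the breach
        "cc": [PLATFORM_CHANNEL],  # platform team is informed, not assigned
        "text": text,
    }


orders_contract = {
    "name": "orders_daily",
    "url": "https://example.com/contracts/orders_daily",
    "owner_channel": "#team-checkout",
    "consumers": ["finance", "ml-forecasting"],
}
alert = build_alert(orders_contract, "not_null failed on order_id")
```

The point of the design is that the destination is read from the contract, so no alert can be emitted without a declared owner.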

The cultural mechanics (the real product)

1. Contracts are created at the consumer's request, not imposed

Platform-mandated contracts on every table create resentment and checkbox compliance. Instead: when a consumer (finance, ML, an exec dashboard) depends on a dataset, they request the contract, and the negotiation — what columns, what freshness, what happens on breach — is a 30-minute meeting between two teams. The contract documents a relationship that already exists; that is why it gets honored.

2. Breaches have a pre-agreed, boring consequence

Not punishment — process: a breached contract auto-creates a ticket on the producer's board with an SLA matched to the contract tier (Tier 1 = same-day, Tier 2 = this sprint). The escalation path is written into the contract itself. The first time a breach quietly ages for three weeks with no consequence, every contract in the company becomes decoration.
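The tier-to-SLA mapping is simple enough to sketch. This version assumes "same-day" means the end of the calendar day of the breach and "this sprint" means a sprint end date supplied by the caller; both interpretations, and the ticket_due name, are assumptions for illustration.

```python
from datetime import datetime


def ticket_due(tier: int, breached_at: datetime, sprint_end: datetime) -> datetime:
    """Return the due date for the auto-created breach ticket."""
    if tier == 1:
        # Tier 1 = same-day: due by the end of the day the breach occurred.
        return breached_at.replace(hour=23, minute=59, second=59, microsecond=0)
    if tier == 2:
        # Tier 2 = this sprint: due when the current sprint closes.
        return sprint_end
    raise ValueError(f"unknown contract tier: {tier}")


breach_time = datetime(2024, 3, 4, 9, 30)
sprint_close = datetime(2024, 3, 15, 17, 0)
tier1_due = ticket_due(1, breach_time, sprint_close)
tier2_due = ticket_due(2, breach_time, sprint_close)
```

Raising on an unknown tier is intentional: a contract without a valid tier should fail loudly rather than produce a ticket with no deadline.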

3. Quarterly contract review, with deletion

Contracts accumulate like feature flags. Each quarter, owners review: still consumed? thresholds still right? Any contract with no active consumer is deleted ceremonially. A pruned contract set stays credible; an unmaintained one becomes the alert channel everyone mutes — and muted alerts are how the worst incidents arrive.
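The "still consumed?" question can be pre-answered before the review meeting. The sketch below assumes you can obtain, per contract, the date a consumer last read the dataset (for example from query logs); that input, the 90-day window, and the function name are assumptions, and the output is a list of deletion candidates for owners to confirm, not an automatic delete.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def split_for_review(
    last_read: Dict[str, Optional[date]], today: date, window_days: int = 90
) -> Tuple[List[str], List[str]]:
    """Split contracts into (keep, deletion candidates) by recent consumption."""
    cutoff = today - timedelta(days=window_days)
    keep, candidates = [], []
    for name, read_on in sorted(last_read.items()):
        if read_on is not None and read_on >= cutoff:
            keep.append(name)
        else:
            candidates.append(name)  # never read, or not read within the window
    return keep, candidates


last_read = {
    "orders_daily": date(2024, 6, 28),
    "legacy_refunds": date(2023, 11, 2),
    "ab_test_2022": None,
}
keep, candidates = split_for_review(last_read, today=date(2024, 7, 1))
```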

Rollout sequence that works

Week 1–2: pick ONE high-pain dataset (the one that broke the CFO dashboard last month). Write its contract with producer + consumer in the room.
Week 3–4: wire the stack: dbt contract + tests in CI, GE checkpoint post-load, Slack routing to the producer.
Month 2: first breach happens. Run the consequence process visibly and blamelessly. This event, handled well, sells the system better than any deck.
Month 3+: accept contract requests from consumers; publish the catalog of contracted datasets; report breach MTTR monthly.

This is the rollout we use in governance engagements — the 40% reconciliation reduction in our Fortune 500 case came from contracts plus this process, not from any single tool.
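The monthly breach MTTR report from the Month 3+ step can be computed from nothing more than open and resolve timestamps. A minimal sketch, assuming each breach is recorded as an (opened, resolved) pair and is attributed to the month it was opened in; both conventions are assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta


def monthly_mttr(breaches: list[tuple[datetime, datetime]]) -> dict[str, timedelta]:
    """Mean time to resolve, keyed by the YYYY-MM in which each breach opened."""
    durations = defaultdict(list)
    for opened, resolved in breaches:
        durations[opened.strftime("%Y-%m")].append(resolved - opened)
    return {
        month: sum(spans, timedelta()) / len(spans)
        for month, spans in durations.items()
    }


breaches = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0)),   # 2 hours
    (datetime(2024, 3, 18, 8, 0), datetime(2024, 3, 18, 12, 0)),  # 4 hours
    (datetime(2024, 4, 2, 7, 0), datetime(2024, 4, 3, 7, 0)),     # 1 day
]
report = monthly_mttr(breaches)
```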

Frequently Asked Questions

What should a data contract actually contain?

Four sections: schema (columns, types, nullability, keys), semantics (what a row means, grain, accepted values), SLAs (freshness deadline, volume bounds), and operations (owner team, alert channel, breach severity tier and escalation path). Version it in git next to the producing code.
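As an illustration of those four sections, here is a contract expressed as a plain Python dict with a completeness check. The field names and values are hypothetical; in a real repo the same structure would more likely live in a YAML file next to the model.

```python
REQUIRED_SECTIONS = ("schema", "semantics", "slas", "operations")


def missing_sections(contract: dict) -> list:
    """Return the required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not contract.get(name)]


orders_contract = {
    "schema": {
        "order_id": {"type": "bigint", "nullable": False, "key": True},
        "status": {"type": "varchar", "nullable": False},
    },
    "semantics": {
        "grain": "one row per order",
        "accepted_values": {"status": ["placed", "shipped", "returned"]},
    },
    "slas": {
        "freshness_deadline": "06:00",
        "min_rows": 500,
    },
    "operations": {
        "owner_team": "@team-checkout",  # a team handle, never a person
        "alert_channel": "#team-checkout",
        "tier": 1,
        "escalation": "producer on-call, then engineering manager",
    },
}
gaps = missing_sections(orders_contract)
```

Running the same check in CI is one way to stop a contract from being merged with an empty operations section, which is the section most often skipped.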

dbt tests or Great Expectations — which do I need for contracts?

Both, for different layers: dbt model contracts and schema tests enforce structure at merge time in CI; Great Expectations (or Soda) validates the data itself after load — distributions, volumes, freshness. Schema can pass while the data is wrong, so a contract system needs both layers.

Why do most data contract initiatives fail?

Missing ownership and consequences. Tooling alerts fire into channels nobody owns, breaches carry no process, and contracts are imposed platform-wide instead of created per consumer need. The fix is cultural: producer-routed alerts, pre-agreed breach SLAs, and quarterly reviews that delete dead contracts.

Who should own a data contract — producer or consumer?

The producer owns honoring it; the consumer owns requesting it and defining what they need. Violations route to the producing team's channel with a ticket on their board. The data platform team builds the rails but should never be the default owner of every breach.
