TL;DR — Direct Answer
A data contract is a versioned, machine-enforced agreement between a data producer and its consumers covering schema, semantics, freshness, and volume. The tooling is the easy half: dbt tests enforce schema and relationships, Great Expectations enforces distributions and volumes, CI blocks merges that break either, and Slack routes violations to the producer. Contracts die not from weak tooling but from missing ownership and consequences — the three cultural mechanics in the second half of this article are the actual product.
The enforcement stack (the easy half)
Layer 1 — Schema and relational integrity: dbt
Every contracted model gets a YAML spec: column names and types (a dbt model contract, i.e. contract: enforced: true in the model config), not_null, unique, and accepted_values tests, plus relationships tests to upstream keys. The spec runs in CI on every pull request, so a producer cannot merge a change that breaks the declared shape. The contract lives in the same repo as the transformation, versioned in git and reviewed in the same PR.
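To make the declared shape concrete, here is a minimal Python sketch of the kinds of checks such a spec expresses. It is not dbt's API or implementation: ColumnSpec, check_rows, and the ORDERS_CONTRACT example are hypothetical names for illustration, and relationship checks against upstream keys are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ColumnSpec:
    """One contracted column (hypothetical structure, not dbt's schema)."""
    name: str
    dtype: type
    not_null: bool = False
    unique: bool = False
    accepted_values: tuple = ()  # empty tuple means any value is allowed


def check_rows(rows: list[dict[str, Any]], columns: list[ColumnSpec]) -> list[str]:
    """Return readable violations; an empty list means the declared shape holds."""
    violations = []
    declared = {c.name for c in columns}
    for i, row in enumerate(rows):
        if set(row) != declared:
            violations.append(f"row {i}: columns {sorted(row)} differ from contract")
    for col in columns:
        values = [row.get(col.name) for row in rows]
        present = [v for v in values if v is not None]
        if col.not_null and len(present) < len(values):
            violations.append(f"{col.name}: null values found")
        if any(not isinstance(v, col.dtype) for v in present):
            violations.append(f"{col.name}: expected {col.dtype.__name__}")
        if col.unique and len(set(present)) < len(present):
            violations.append(f"{col.name}: duplicate values found")
        if col.accepted_values and any(v not in col.accepted_values for v in present):
            violations.append(f"{col.name}: value outside accepted set")
    return violations


# Hypothetical contract for an orders model.
ORDERS_CONTRACT = [
    ColumnSpec("order_id", int, not_null=True, unique=True),
    ColumnSpec("status", str, not_null=True, accepted_values=("placed", "shipped")),
]
```

In a CI job, a non-empty result from check_rows would fail the build, which is the same effect the text describes for a failing dbt contract or test.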
Layer 2 — Distributions, volumes, freshness: Great Expectations (or Soda)
Schema tests can pass while the data itself goes wrong. Layer 2 catches the rest: row-count anomalies versus trailing baselines, null-rate drift on critical columns, shifts in value distributions (order totals suddenly 100x their usual size), and freshness SLAs (partition landed by 06:00). These checks run post-load, not in CI, because they are about the data, not the code.
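The three Layer 2 checks named above can be sketched in plain Python. This is an illustration of the logic only, not Great Expectations or Soda code; the function names, the z-score rule, and the default thresholds are assumptions chosen for the example.

```python
from datetime import datetime
from statistics import mean, stdev
from typing import Optional


def row_count_is_anomalous(today: int, trailing: list[int], z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it is more than z_threshold sigmas from the trailing mean."""
    if len(trailing) < 2:
        return False  # not enough history to form a baseline
    mu, sigma = mean(trailing), stdev(trailing)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is an anomaly
    return abs(today - mu) / sigma > z_threshold


def null_rate_drifted(values: list, baseline_rate: float, tolerance: float = 0.02) -> bool:
    """Flag a column whose null rate moved more than `tolerance` from its baseline."""
    if not values:
        return True  # an empty load is treated as a violation in this sketch
    rate = sum(v is None for v in values) / len(values)
    return abs(rate - baseline_rate) > tolerance


def freshness_breached(landed_at: Optional[datetime], deadline: datetime) -> bool:
    """Flag a partition that has not landed (None) or landed after its deadline."""
    return landed_at is None or landed_at > deadline
```

For example, with trailing counts of roughly 1,000 rows per day, a load of 400 rows is flagged while a load of 1,005 is not. A z-score is only one possible baseline rule; a percentage band around the trailing median would fit the same slot.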
Layer 3 — Routing: alerts that reach the right humans
The single most important config in the whole system: violations go to the producing team, not the data platform team. Each contract declares an owner (a team Slack handle, never a person). dbt test failures and GE checkpoint failures post to the producer's channel with the contract link, the diff, and the consumers affected. The data platform team gets cc'd, not assigned.
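A minimal sketch of that routing rule, assuming each contract record carries its owner channel and consumer list. The Contract and Alert types, the "#data-platform" channel name, and the example URL are hypothetical; actual delivery (a Slack client or webhook) is deliberately left out.

```python
from dataclasses import dataclass

PLATFORM_CHANNEL = "#data-platform"  # hypothetical channel that is cc'd, never assigned


@dataclass(frozen=True)
class Contract:
    name: str
    owner_channel: str  # a team channel or handle, never an individual
    consumers: tuple
    url: str


@dataclass(frozen=True)
class Alert:
    channel: str  # where the alert is posted (the producer)
    cc: tuple     # who is informed without being assigned
    text: str


def route_violation(contract: Contract, check: str, diff: str) -> Alert:
    """Build the alert for a failed check, addressed to the producing team."""
    text = (
        f"Contract breach: {contract.name}\n"
        f"Failed check: {check}\n"
        f"Diff: {diff}\n"
        f"Consumers affected: {', '.join(contract.consumers)}\n"
        f"Contract: {contract.url}"
    )
    return Alert(channel=contract.owner_channel, cc=(PLATFORM_CHANNEL,), text=text)


# Hypothetical contract owned by a payments team.
orders = Contract(
    name="orders",
    owner_channel="#team-payments",
    consumers=("finance", "ml-forecasting"),
    url="https://example.com/contracts/orders",
)
alert = route_violation(orders, "row_count", "expected ~1000 rows, got 400")
```

The point of the design is that the destination is read from the contract, so a failing check cannot land on the platform team by default.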
The cultural mechanics (the real product)
1. Contracts are created at the consumer's request, not imposed
Platform-mandated contracts on every table create resentment and checkbox compliance. Instead: when a consumer (finance, ML, an exec dashboard) depends on a dataset, they request the contract, and the negotiation — what columns, what freshness, what happens on breach — is a 30-minute meeting between two teams. The contract documents a relationship that already exists; that is why it gets honored.
2. Breaches have a pre-agreed, boring consequence
Not punishment — process: a breached contract auto-creates a ticket on the producer's board with an SLA matched to the contract tier (Tier 1 = same-day, Tier 2 = this sprint). The escalation path is written into the contract itself. The first time a breach quietly ages for three weeks with no consequence, every contract in the company becomes decoration.
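The tier-to-SLA rule can be written down in a few lines. The sketch below assumes Tier 1 is due on the day of the breach and Tier 2 is due by the end of the current sprint; Ticket, sla_due_date, and open_breach_ticket are hypothetical names, and creating the ticket in a real tracker is out of scope.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ticket:
    board: str         # the producer's board, not the platform team's
    title: str
    due: date
    escalation: tuple  # escalation path copied from the contract itself


def sla_due_date(tier: int, breached_on: date, sprint_end: date) -> date:
    """Tier 1 is due the same day; Tier 2 is due by the end of the current sprint."""
    if tier == 1:
        return breached_on
    if tier == 2:
        return sprint_end
    raise ValueError(f"unknown contract tier: {tier}")


def open_breach_ticket(contract_name: str, producer_board: str, tier: int,
                       breached_on: date, sprint_end: date, escalation: tuple) -> Ticket:
    """Build the ticket a breach would auto-create on the producer's board."""
    return Ticket(
        board=producer_board,
        title=f"[Tier {tier}] Contract breach: {contract_name}",
        due=sla_due_date(tier, breached_on, sprint_end),
        escalation=escalation,
    )


def is_overdue(ticket: Ticket, today: date) -> bool:
    """A ticket past its due date is the signal to follow the escalation path."""
    return today > ticket.due
```

Checking is_overdue on a schedule is what keeps a breach from quietly aging for three weeks.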
3. Quarterly contract review, with deletion
Contracts accumulate like feature flags. Each quarter, owners review: still consumed? thresholds still right? Any contract with no active consumer is deleted ceremonially. A pruned contract set stays credible; an unmaintained one becomes the alert channel everyone mutes — and muted alerts are how the worst incidents arrive.
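One way to prepare the deletion list for the review is to treat "no active consumer" as "no consumer has read the dataset within the last quarter". That definition, the 90-day window, and the last_consumed mapping (for example, derived from warehouse query logs) are assumptions for this sketch.

```python
from datetime import date, timedelta
from typing import Optional


def contracts_to_delete(last_consumed: dict[str, Optional[date]], today: date,
                        window_days: int = 90) -> list[str]:
    """Return contracts with no consumer read inside the review window, sorted by name.

    `last_consumed` maps contract name to the most recent consumer read,
    or None if no read was ever recorded.
    """
    cutoff = today - timedelta(days=window_days)
    return sorted(
        name for name, last_read in last_consumed.items()
        if last_read is None or last_read < cutoff
    )


# Hypothetical review input.
stale = contracts_to_delete(
    {
        "orders": date(2024, 6, 1),
        "legacy_export": date(2023, 11, 1),
        "never_used": None,
    },
    today=date(2024, 6, 30),
)
```

The output is a candidate list for the owners to confirm in the review, not an automatic delete.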
Rollout sequence that works
- Week 1–2: pick ONE high-pain dataset (the one that broke the CFO dashboard last month). Write its contract with producer + consumer in the room.
- Week 3–4: wire the stack: dbt contract + tests in CI, GE checkpoint post-load, Slack routing to the producer.
- Month 2: first breach happens. Run the consequence process visibly and blamelessly. This event, handled well, sells the system better than any deck.
- Month 3+: accept contract requests from consumers; publish the catalog of contracted datasets; report breach MTTR monthly.
This is the rollout we use in governance engagements — the 40% reconciliation reduction in our Fortune 500 case came from contracts plus this process, not from any single tool.
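For the monthly breach MTTR report in the Month 3+ step, a small sketch follows. It assumes each breach is recorded as a (detected, resolved) pair of timestamps and that a breach counts toward the month in which it was detected; both are choices made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_mttr(breaches: list[tuple[datetime, datetime]]) -> dict[str, timedelta]:
    """Mean time to resolve, grouped by the month ("YYYY-MM") the breach was detected."""
    durations = defaultdict(list)
    for detected, resolved in breaches:
        durations[detected.strftime("%Y-%m")].append(resolved - detected)
    return {
        month: sum(times, timedelta()) / len(times)
        for month, times in sorted(durations.items())
    }


# Hypothetical breach log.
report = monthly_mttr([
    (datetime(2024, 5, 3, 9, 0), datetime(2024, 5, 3, 11, 0)),   # resolved in 2 hours
    (datetime(2024, 5, 20, 9, 0), datetime(2024, 5, 20, 13, 0)), # resolved in 4 hours
    (datetime(2024, 6, 1, 9, 0), datetime(2024, 6, 2, 9, 0)),    # resolved in 1 day
])
```

Splitting the same report by contract tier would show whether the same-day and this-sprint SLAs are actually being met.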