Phantom stock — inventory your systems believe exists but shelves don't hold — doesn't come from one bug. It comes from a dozen small disagreements compounding: returns counted by one system and rejected by another, double-posted movements on retry, unit-of-measure drift, stale availability caches, timezone-split business dates. Each error is small; unsurfaced across SKUs and months, the reference scenario's $4M annual write-off is unremarkable.
The playbook: data contracts freeze every source interface; dbt models express cross-system identities that should always be empty; Great Expectations catches distributional weirdness weeks before a cycle count would; alerts tier by money; and dbt lineage makes root cause a click. 99.7% inventory record accuracy is the realistic end-state target.
Reference scenario figures ($4M write-off, 12 ERP sources) are labelled as such. The reconciliation discipline is documented Vipra production work — our Fortune 500 governance engagement cut reconciliation effort 40% with exactly this mechanism.
01 · Where Phantom Stock Actually Comes From
Run the postmortems across enough retailers and the same dozen culprits appear, none of them dramatic:
| Disagreement | Mechanism | Typical magnitude |
|---|---|---|
| Returns limbo | ERP counts the refund; WMS rejected the restock; nobody owns the difference | The single biggest source — often 40%+ of phantom |
| Retry double-posts | Integration retries replay movements without idempotency keys | Spiky; clusters around outage days |
| UoM drift | Cases vs eaches vs pallets disagreeing across systems | Rare but huge when it lands (×12, ×24 errors) |
| Stale availability cache | Storefront promises from a snapshot taken hours ago | Oversell exposure rather than count error |
| Timezone splits | Midnight UTC vs local splits one day's movements across two business dates | Small daily; corrupts every reconciliation that ignores it |
You cannot fix these one ticket at a time, because each one re-occurs under a new costume. The fix is structural: surface every disagreement, automatically, daily, with an owner attached.
02 · The Quality Architecture, End to End
03 · Contracts First: Freeze the Interfaces
Every source feeding inventory gets a contract — and the contract answers the questions that actually cause phantom stock, not just column types:
inventory source contract — the questions that matter (YAML)source: wms_eu_movements owner: warehouse-systems@company.com # a name, not a team alias that 404s semantics: quantity_field: on_hand # on-hand? available-to-promise? in-transit? quantity_uom: eaches # the ×24 bug lives here movement_grain: per-transaction # not per-day-summary timezone: Europe/Berlin # business date derivation depends on it idempotency_key: [movement_id, source_system] delivery: freshness_sla: 15m completeness: "movements >= 99.9% vs WMS daily control total" on_violation: quarantine: true notify: [producer, inventory-data-oncall]
Contracts live in the repo; producer changes validate in CI; violations block merges instead of corrupting the ledger. The semantics block is the payload — most phantom stock is two systems disagreeing about what a number means, and no schema validator catches that. The cultural mechanics that make contracts survive contact with twelve producer teams are covered in our data contracts playbook; inventory is where they pay fastest because the disagreements are dollar-denominated.
04 · Reconciliation: Tests That Should Return Zero Rows
Standard dbt tests (not-null, unique, accepted ranges) catch malformed data; phantom stock lives in well-formed data that's wrong. The tests that matter are cross-system identities, written as models whose correct cardinality is zero:
rec_movement_conservation.sql — the law of inventory physics-- yesterday + receipts - shipments - adjustments = today, per SKU/location WITH expected AS ( SELECT sku, location_id, snapshot_date, LAG(on_hand) OVER (PARTITION BY sku, location_id ORDER BY snapshot_date) + receipts - shipments - adjustments AS expected_on_hand FROM {{ ref('int_daily_positions') }} ) SELECT e.sku, e.location_id, e.snapshot_date, e.expected_on_hand, p.on_hand, p.on_hand - e.expected_on_hand AS break_qty, (p.on_hand - e.expected_on_hand) * c.unit_cost AS break_value -- triage by money FROM expected e JOIN {{ ref('int_daily_positions') }} p USING (sku, location_id, snapshot_date) JOIN {{ ref('dim_sku_cost') }} c USING (sku) WHERE ABS(p.on_hand - e.expected_on_hand) > {{ var('conservation_tolerance', 0) }}
The full suite, each one a model + a dbt test asserting emptiness: movement conservation (above — catches lost movements and double-posts); ERP↔WMS parity (on-hand agreement within tolerance — catches sync failures); storefront honesty (availability shown ⊆ available-to-promise — catches oversell exposure before the customer does); and returns closure (every refund reaches a disposition: restock, scrap, or lost — the single biggest phantom source, closed). Note the break_value column: every break is born with a dollar amount, which is what makes Section 06's triage possible.
05 · Great Expectations: Catching Weird Before It's Wrong
Some failures are distributional, not relational — well-formed, conservation-clean, and lying. A feed that suddenly reports zero adjustments (broken extract upstream), a warehouse whose shrinkage doubles (process problem walking), a SKU whose count flatlines for 30 days while sales continue (sync death):
GE checkpoint — calibrated from history, not vibes (Python)validator.expect_table_row_count_to_be_between( min_value=baseline.p05_daily_movements, # rolling 90-day per-source baseline max_value=baseline.p99_daily_movements) validator.expect_column_mean_to_be_between( "adjustment_qty", min_value=baseline.adj_mean - 3*baseline.adj_std, max_value=baseline.adj_mean + 3*baseline.adj_std) # the flatline detector — sync death looks like perfect stability validator.expect_compound_columns_to_be_unique( column_list=["sku","on_hand"], row_condition='days_unchanged > 30 and trailing_30d_sales > 0', meta={"alert_tier": "ticket", "owner": "inventory-data-oncall"})
The statistics are humble — baselines and deviation bands per source, recalibrated monthly. They don't need to be clever to be early: every one of these patterns surfaces in the morning run, weeks before a quarterly cycle count would have found it and months before finance would have written it off.
06 · Alerting That Respects the On-Call
A quality system that pages for everything trains people to ignore it — alert fatigue is how phantom stock survives instrumentation. Tier by money and blast radius:
| Tier | Trigger | Action | Example |
|---|---|---|---|
| Page | Active customer-facing risk | Immediate, on-call | Storefront honesty violation — overselling now |
| Ticket | Break above value threshold | Same-day, owning team | Conservation break > $5K at one location |
| Digest | Distributional warnings, small breaks | Daily summary, trend-reviewed | Shrinkage drift; sub-threshold breaks accumulating |
The storefront-honesty page is the one that earns the program its budget — it is the failure our real-time inventory platform eliminated at the streaming layer (500ms updates, oversells gone). Batch reconciliation and streaming truth are complements: streaming prevents the customer-facing lie, reconciliation finds the slow institutional drift.
07 · Lineage: Root Cause in Minutes, Not Days
Every alert carries its dbt lineage link, which means the path from failed test to upstream source is a click: rec_movement_conservation → int_daily_positions → stg_wms_eu_movements → the contract, the owner, and the quarantine bucket with the offending rows. Root-cause time is the metric to optimise — when finding the source takes minutes, teams fix causes; when it takes days, they re-run cycle counts forever and call it culture.
Close the loop with a breaks ledger: every reconciliation break lands in a table with its detection date, value, root cause, and resolution. That ledger is the program's report card — breaks-created trends down as contracts bite, time-to-resolution trends down as lineage pays, and the write-off line follows both. This is the same make-it-visible mechanism from our Fortune 500 governance engagement, where it cut reconciliation effort 40%.
(Reference End-State)
Driven Toward Zero
Reconciliation Cut
Every Reconciliation Model
08 · Lessons Learned: The Hard Truths
- Returns are the boss fight. Every estate's biggest phantom source is the refund→disposition gap, and every estate is surprised by it. Build the returns-closure test first; it pays for the program alone.
- Semantics beat schemas. Twelve sources can pass schema validation while three of them mean available-to-promise where you read on-hand. The contract's semantics block caught more money than every type check combined.
- Tolerance zero is a lie you tell yourself once. Real estates have legitimate sub-unit noise (kit builds, repack). Set tolerances explicitly per location class and alert on tolerance drift — pretending zero just teaches people to ignore red.
- Money-denominated breaks changed the meetings. "417 breaks" produced shrugs; "$83K of breaks, $61K at two locations" produced a fixed integration by Friday. The unit_cost join is the program's persuasion engine.
- The flatline is the deadliest pattern. Perfect stability while sales continue looks healthy on every dashboard. Our flatline expectation has caught more dead syncs than any other check — silence is a signal.
- Quarantine needs an SLA or it becomes a landfill. Quarantined rows aging past 7 days escalate automatically. The day we added that, quarantine went from write-only storage to a working queue.
09 · Key Takeaways for Practitioners
Units, timezone, grain, quantity meaning, idempotency — per source, in the repo, validated in CI.
Conservation, parity, honesty, returns-closure — dbt models whose correct answer is zero rows, valued in dollars.
GE baselines per source: volumes, drift, flatlines. Weird surfaces in the morning run, not the quarterly count.
Page for oversell risk, ticket above value thresholds, digest the rest. Fatigue is the failure mode.
Alert → model → source → owner in one click. MTTR is the KPI that predicts the write-off line.
Detection, value, cause, resolution — the program's report card, and finance's favourite table.
Companion reading: Data Contracts That Stick for the producer-side culture, Why Your dbt Tests Are Giving You False Confidence for the testing philosophy, and the inventory intelligence case study for the streaming half of the answer.