Inventory Ghosts: Eliminating Phantom Stock with dbt + Great Expectations

Executive Summary

Phantom stock — inventory your systems believe exists but shelves don't hold — doesn't come from one bug. It comes from a dozen small disagreements compounding: returns counted by one system and rejected by another, double-posted movements on retry, unit-of-measure drift, stale availability caches, timezone-split business dates. Each error is small; unsurfaced across SKUs and months, the reference scenario's $4M annual write-off is unremarkable.

The playbook: data contracts freeze every source interface; dbt models express cross-system identities that should always be empty; Great Expectations catches distributional weirdness weeks before a cycle count would; alerts tier by money; and dbt lineage makes root cause a click. 99.7% inventory record accuracy is the realistic end-state target.

Reference scenario figures ($4M write-off, 12 ERP sources) are labelled as such. The reconciliation discipline is documented Vipra production work — our Fortune 500 governance engagement cut reconciliation effort 40% with exactly this mechanism.

01 · Where Phantom Stock Actually Comes From

Run the postmortems across enough retailers and the same dozen culprits appear, none of them dramatic:

Disagreement	Mechanism	Typical magnitude
Returns limbo	ERP counts the refund; WMS rejected the restock; nobody owns the difference	The single biggest source — often 40%+ of phantom
Retry double-posts	Integration retries replay movements without idempotency keys	Spiky; clusters around outage days
UoM drift	Cases vs eaches vs pallets disagreeing across systems	Rare but huge when it lands (×12, ×24 errors)
Stale availability cache	Storefront promises from a snapshot taken hours ago	Oversell exposure rather than count error
Timezone splits	Midnight UTC vs local splits one day's movements across two business dates	Small daily; corrupts every reconciliation that ignores it

You cannot fix these one ticket at a time, because each one re-occurs under a new costume. The fix is structural: surface every disagreement, automatically, daily, with an owner attached.

02 · The Quality Architecture, End to End

contract

→

Source gates. Every ERP/WMS/storefront/3PL feed validated against its contract at ingestion — schema, semantics, units, timezone, freshness. Violations quarantine, producers get notified.

model

→

dbt staging. Normalised movements and positions: one UoM, one timezone, idempotency keys derived, business-date logic centralised in one macro.

reconcile

→

Identity tests. Cross-system dbt models that should always be empty — conservation, parity, honesty, returns-closure. Non-empty = a break, with rows as evidence.

watch

→

Great Expectations. Distributional checkpoints per source: volumes, shrinkage rates, flatline detection — calibrated from history, not vibes.

act

→

Tiered alerting + lineage. Page / ticket / digest by money at risk; every alert carries its dbt lineage link. MTTR is the KPI.

03 · Contracts First: Freeze the Interfaces

Every source feeding inventory gets a contract — and the contract answers the questions that actually cause phantom stock, not just column types:

inventory source contract — the questions that matter (YAML)
source: wms_eu_movements
owner: warehouse-systems@company.com        # a name, not a team alias that 404s
semantics:
  quantity_field: on_hand                   # on-hand? available-to-promise? in-transit?
  quantity_uom: eaches                      # the ×24 bug lives here
  movement_grain: per-transaction           # not per-day-summary
  timezone: Europe/Berlin                   # business date derivation depends on it
  idempotency_key: [movement_id, source_system]
delivery:
  freshness_sla: 15m
  completeness: "movements >= 99.9% vs WMS daily control total"
on_violation:
  quarantine: true
  notify: [producer, inventory-data-oncall]

Contracts live in the repo; producer changes validate in CI; violations block merges instead of corrupting the ledger. The semantics block is the payload — most phantom stock is two systems disagreeing about what a number means, and no schema validator catches that. The cultural mechanics that make contracts survive contact with twelve producer teams are covered in our data contracts playbook; inventory is where they pay fastest because the disagreements are dollar-denominated.

04 · Reconciliation: Tests That Should Return Zero Rows

Standard dbt tests (not-null, unique, accepted ranges) catch malformed data; phantom stock lives in well-formed data that's wrong. The tests that matter are cross-system identities, written as models whose correct cardinality is zero:

rec_movement_conservation.sql — the law of inventory physics
-- yesterday + receipts - shipments - adjustments = today, per SKU/location
WITH expected AS (
  SELECT sku, location_id, snapshot_date,
         LAG(on_hand) OVER (PARTITION BY sku, location_id ORDER BY snapshot_date)
           + receipts - shipments - adjustments AS expected_on_hand
  FROM {{ ref('int_daily_positions') }}
)
SELECT e.sku, e.location_id, e.snapshot_date,
       e.expected_on_hand, p.on_hand,
       p.on_hand - e.expected_on_hand AS break_qty,
       (p.on_hand - e.expected_on_hand) * c.unit_cost AS break_value   -- triage by money
FROM expected e
JOIN {{ ref('int_daily_positions') }} p USING (sku, location_id, snapshot_date)
JOIN {{ ref('dim_sku_cost') }} c USING (sku)
WHERE ABS(p.on_hand - e.expected_on_hand) > {{ var('conservation_tolerance', 0) }}

The full suite, each one a model + a dbt test asserting emptiness: movement conservation (above — catches lost movements and double-posts); ERP↔WMS parity (on-hand agreement within tolerance — catches sync failures); storefront honesty (availability shown ⊆ available-to-promise — catches oversell exposure before the customer does); and returns closure (every refund reaches a disposition: restock, scrap, or lost — the single biggest phantom source, closed). Note the break_value column: every break is born with a dollar amount, which is what makes Section 06's triage possible.

05 · Great Expectations: Catching Weird Before It's Wrong

Some failures are distributional, not relational — well-formed, conservation-clean, and lying. A feed that suddenly reports zero adjustments (broken extract upstream), a warehouse whose shrinkage doubles (process problem walking), a SKU whose count flatlines for 30 days while sales continue (sync death):

GE checkpoint — calibrated from history, not vibes (Python)
validator.expect_table_row_count_to_be_between(
    min_value=baseline.p05_daily_movements,     # rolling 90-day per-source baseline
    max_value=baseline.p99_daily_movements)
validator.expect_column_mean_to_be_between(
    "adjustment_qty",
    min_value=baseline.adj_mean - 3*baseline.adj_std,
    max_value=baseline.adj_mean + 3*baseline.adj_std)
# the flatline detector — sync death looks like perfect stability
validator.expect_compound_columns_to_be_unique(
    column_list=["sku","on_hand"],
    row_condition='days_unchanged > 30 and trailing_30d_sales > 0',
    meta={"alert_tier": "ticket", "owner": "inventory-data-oncall"})

The statistics are humble — baselines and deviation bands per source, recalibrated monthly. They don't need to be clever to be early: every one of these patterns surfaces in the morning run, weeks before a quarterly cycle count would have found it and months before finance would have written it off.

06 · Alerting That Respects the On-Call

A quality system that pages for everything trains people to ignore it — alert fatigue is how phantom stock survives instrumentation. Tier by money and blast radius:

Tier	Trigger	Action	Example
Page	Active customer-facing risk	Immediate, on-call	Storefront honesty violation — overselling now
Ticket	Break above value threshold	Same-day, owning team	Conservation break > $5K at one location
Digest	Distributional warnings, small breaks	Daily summary, trend-reviewed	Shrinkage drift; sub-threshold breaks accumulating

The storefront-honesty page is the one that earns the program its budget — it is the failure our real-time inventory platform eliminated at the streaming layer (500ms updates, oversells gone). Batch reconciliation and streaming truth are complements: streaming prevents the customer-facing lie, reconciliation finds the slow institutional drift.

07 · Lineage: Root Cause in Minutes, Not Days

Every alert carries its dbt lineage link, which means the path from failed test to upstream source is a click: rec_movement_conservation → int_daily_positions → stg_wms_eu_movements → the contract, the owner, and the quarantine bucket with the offending rows. Root-cause time is the metric to optimise — when finding the source takes minutes, teams fix causes; when it takes days, they re-run cycle counts forever and call it culture.

Close the loop with a breaks ledger: every reconciliation break lands in a table with its detection date, value, root cause, and resolution. That ledger is the program's report card — breaks-created trends down as contracts bite, time-to-resolution trends down as lineage pays, and the write-off line follows both. This is the same make-it-visible mechanism from our Fortune 500 governance engagement, where it cut reconciliation effort 40%.

99.7%

Inventory Accuracy
(Reference End-State)

$4M→

Reference Write-Off,
Driven Toward Zero

40%

Vipra Documented
Reconciliation Cut

Correct Cardinality of
Every Reconciliation Model

08 · Lessons Learned: The Hard Truths

Returns are the boss fight. Every estate's biggest phantom source is the refund→disposition gap, and every estate is surprised by it. Build the returns-closure test first; it pays for the program alone.
Semantics beat schemas. Twelve sources can pass schema validation while three of them mean available-to-promise where you read on-hand. The contract's semantics block caught more money than every type check combined.
Tolerance zero is a lie you tell yourself once. Real estates have legitimate sub-unit noise (kit builds, repack). Set tolerances explicitly per location class and alert on tolerance drift — pretending zero just teaches people to ignore red.
Money-denominated breaks changed the meetings. "417 breaks" produced shrugs; "$83K of breaks, $61K at two locations" produced a fixed integration by Friday. The unit_cost join is the program's persuasion engine.
The flatline is the deadliest pattern. Perfect stability while sales continue looks healthy on every dashboard. Our flatline expectation has caught more dead syncs than any other check — silence is a signal.
Quarantine needs an SLA or it becomes a landfill. Quarantined rows aging past 7 days escalate automatically. The day we added that, quarantine went from write-only storage to a working queue.

09 · Key Takeaways for Practitioners

📜

Contracts freeze semantics

Units, timezone, grain, quantity meaning, idempotency — per source, in the repo, validated in CI.

⚖️

Identities, not just types

Conservation, parity, honesty, returns-closure — dbt models whose correct answer is zero rows, valued in dollars.

📈

Humble stats, early warnings

GE baselines per source: volumes, drift, flatlines. Weird surfaces in the morning run, not the quarterly count.

🚨

Tier by money

Page for oversell risk, ticket above value thresholds, digest the rest. Fatigue is the failure mode.

🔗

Lineage is the speed

Alert → model → source → owner in one click. MTTR is the KPI that predicts the write-off line.

📒

Keep a breaks ledger

Detection, value, cause, resolution — the program's report card, and finance's favourite table.

Companion reading: Data Contracts That Stick for the producer-side culture, Why Your dbt Tests Are Giving You False Confidence for the testing philosophy, and the inventory intelligence case study for the streaming half of the answer.

FAQ · Frequently Asked Questions

What causes phantom inventory from a data perspective?

Compounding small disagreements between systems: returns counted by one system and rejected by another, double-posted movements on retry, unit-of-measure drift, stale availability caches, and timezone-split business dates. Each is minor; unsurfaced, they compound into material write-offs.

Why both dbt and Great Expectations?

They catch different failure classes: dbt expresses relational identities (movement conservation, cross-system parity) as version-controlled models, while Great Expectations catches distributional anomalies (volume collapses, shrinkage shifts, flatlined counts) that are well-formed but wrong. Together they cover correctness and plausibility.

What's a realistic inventory accuracy target?

99.5–99.8% record accuracy (cycle-count agreement) is achievable for a multi-source estate with contracts, daily reconciliation, and lineage-driven root cause — the reference scenario here targets 99.7%. Past that, returns physics and shrinkage dominate over data quality.

How fast does this pay back?

Quickly, because the losses are dollar-denominated: oversell prevention and write-off reduction are measurable within one or two quarters. Vipra's documented governance work cut reconciliation effort 40% — the same mechanism, measured in hours instead of stock.

Inventory Ghosts:Eliminating Phantom Stock with dbt + Great Expectations