A plant floor emits truth at brutal rates: vibration sensors at kHz, temperature at Hz, PLC state changes in bursts. Ship it all raw to the cloud and you pay three times — egress, storage, and the engineering needed to make sense of it later. Ship too little and the twin lies. This architecture is a series of deliberate compressions, each chosen so the twin keeps physical fidelity while the bill keeps proportion.
The path: clustered MQTT brokers at the edge (Sparkplug-structured topics, QoS matched to telemetry class), edge nodes applying deadband filtering, windowed aggregation, and exception passthrough before egress, a Kafka spine organising per-asset streams, and Delta Live Tables declaring the bronze→silver→gold refinement with quality expectations enforced inline.
The 60% egress reduction and 40% downtime reduction are labelled reference targets — the second is a program outcome (pipeline + features + maintenance-process change), not a pipeline metric. The streaming and lakehouse engineering underneath is documented Vipra production work at 1B+ events/hour.
01 · The Shape of the Problem
Industrial telemetry is unlike clickstream in three ways that shape every decision downstream. Rate asymmetry: a vibration sensor at 8kHz produces five orders of magnitude more raw samples than the temperature probe beside it — uniform treatment bankrupts you or blinds you. Network reality: plant networks partition, cellular backhauls drop, and a pipeline that can't buffer hours of edge data and reconcile cleanly afterwards will corrupt the twin on every outage. Physical consequence: the consumer of this data schedules maintenance on million-dollar assets — a stale or dishonest twin doesn't just mislead a dashboard, it strands a production line.
The design answer is a chain of deliberate compressions: filter at the edge (Section 04), organise in the spine (Section 05), refine incrementally (Section 06) — with raw fidelity preserved exactly where investigations need it and nowhere else.
02 · The Architecture: Edge to Twin
03 · MQTT Done Industrially
MQTT earns its position at the edge — tiny client footprint, broker-mediated, built for flaky networks — but the defaults are not industrial. The production configuration:
edge contract — topics, QoS, lifecycle (Sparkplug-informed)# Topic hierarchy: site/line/asset/sensor — authz and routing follow the tree plant-mex/line-3/press-07/vibration-x QoS 0 # high-rate: next reading supersedes plant-mex/line-3/press-07/temp-bearing QoS 0 plant-mex/line-3/press-07/state QoS 1 # PLC state changes MUST arrive plant-mex/line-3/press-07/alarm QoS 1 # Sparkplug B lifecycle — the twin knows the difference between # "silent because healthy" and "silent because dead": NBIRTH/NDEATH node online/offline (broker-enforced last will) DBIRTH/DDEATH device lifecycle + metric birth certificates (types, units!)
Three rules: cluster the brokers per site (HiveMQ-class HA — a single broker is a plant-wide single point of failure wearing a lightweight protocol); QoS by telemetry class, not by habit — QoS 0 for high-rate streams where the next sample supersedes the last, QoS 1 for state changes and alarms, and QoS 2 almost never (its handshake cost buys little that idempotent consumers don't); and Sparkplug birth certificates — typed metrics with units declared at DBIRTH are the difference between a twin and a pile of floats. The death certificates matter even more: a twin that can't distinguish a healthy-quiet sensor from a dead one will lie precisely when it matters.
04 · Edge Filtering: The 60% You Never Ship
Most industrial signal is redundant by physics — a healthy bearing's vibration spectrum doesn't need restating 8,000 times a second in the cloud. Three reductions, all versioned config:
| Mechanism | What it does | Typical reduction |
|---|---|---|
| Deadband filtering | Publish on meaningful change (per-metric thresholds), not on schedule | 50–80% on slow-moving signals |
| Windowed aggregation | High-rate signals → per-window RMS, peak, crest factor, FFT band energies | 99%+ on kHz vibration, fidelity preserved for the use case |
| Exception passthrough | Threshold crossing → stream raw full-rate for N minutes | Full detail exactly when something interesting happens |
The combined effect lands around 60% egress reduction in the reference estate — some estates see far more. Two disciplines keep it honest: every filter parameter is versioned configuration deployed like code (a deadband widened during an incident and never narrowed is a slow-acting lie), and raw data buffers at the edge long enough — typically 7–30 days on cheap local disk — to be replayed when an investigation needs the unfiltered truth. The aggregate is for operations; the buffer is for root cause.
05 · Bridge and Spine: MQTT into Kafka
MQTT moves telemetry; Kafka organises it. The bridge (Kafka Connect MQTT source or broker-native) maps the topic tree onto Kafka topics keyed by asset, and the properties the twin needs appear: ordered per-asset streams, replayable history, consumer-group fanout to every downstream — twin state, alerting, lakehouse — without the edge ever knowing how many consumers exist.
bridge mapping — tree to keyed topics# MQTT tree Kafka topic key plant-*/line-*/+/vibration-* → telemetry.vibration {site}.{line}.{asset} plant-*/line-*/+/temp-* → telemetry.thermal {site}.{line}.{asset} plant-*/line-*/+/state → asset.state {site}.{line}.{asset} plant-*/line-*/+/alarm → asset.alarms {site}.{line}.{asset} # partitions sized for fleet peak; per-asset ordering preserved by key
Offline reconciliation is where the spine earns its keep: edge buffers replaying hours of backlog after a network partition must not corrupt twin chronology. The rule is event-time everywhere downstream — every payload carries its sensor timestamp, the twin and all aggregates are built on event time, and arrival time is just an ops metric. This is the same spine pattern our telemetry platform runs at 1B+ events/hour; a 100K-sensor plant fleet sits comfortably inside it.
06 · Delta Live Tables: Incremental Refinement to Twin State
DLT declares the refinement as code, with quality enforcement inline — and incremental processing keeps minute-fresh twin state affordable at fleet scale:
DLT pipeline — bronze → silver → gold (Python, simplified)import dlt from pyspark.sql import functions as F @dlt.table(comment="Raw telemetry, schema-checked") def bronze_telemetry(): return (spark.readStream.format("kafka") .option("subscribe", "telemetry.vibration,telemetry.thermal") .load() .select(parse_payload("value").alias("r")).select("r.*")) @dlt.table(comment="Unit-normalised, asset-joined, physics-checked") @dlt.expect_or_quarantine("plausible_temp", "temp_c BETWEEN -40 AND 400") @dlt.expect_or_quarantine("plausible_rms", "vib_rms_g BETWEEN 0 AND 50") @dlt.expect_or_fail("has_event_time", "sensor_ts IS NOT NULL") def silver_telemetry(): return (dlt.read_stream("bronze_telemetry") .withColumn("temp_c", normalise_units("temp_raw", "unit")) .join(dlt.read("asset_registry"), "asset_key", "left")) @dlt.table(comment="Current twin state per asset") def gold_twin_state(): return (dlt.read_stream("silver_telemetry") .withWatermark("sensor_ts", "10 minutes") .groupBy("asset_key") .agg(F.last("temp_c").alias("temp_c"), F.last("vib_rms_g").alias("vib_rms_g"), F.max("sensor_ts").alias("as_of")))
The expectations are the point: a sensor reporting 600°C on a bearing gets quarantined, not averaged in — visible in the quarantine table with lineage, instead of silently poisoning the rolling mean a maintenance model trains on. Silver also owns unit normalisation (the cases-vs-eaches of IoT is °F-vs-°C-vs-K) and the asset-registry join that turns sensor streams into asset truth. Gold maintains current state plus the windowed aggregates dashboards and features read.
07 · The Feature Store: Where Downtime Reduction Comes From
Predictive maintenance lives or dies on features, not models: rolling vibration-band energies over multiple windows, temperature rate-of-change, cycles since last service, cross-sensor correlations per asset class. Two properties separate a feature store from a folder of CSVs:
- Point-in-time correctness. Training a failure-prediction model on features computed with post-failure knowledge is the classic self-deception of this domain — the model looks brilliant in backtest and useless on the floor. Every feature row carries its as-of timestamp; training joins are time-travel joins.
- Serving parity. The features the model scores on in production are computed by the same code that built its training set — DLT gold tables feed both, eliminating train/serve skew by construction.
Then close the loop where the money is: predictions route into the CMMS as work orders with evidence attached (the feature values and trend that fired), not into another dashboard. The 40% downtime reduction reference target is a program outcome — pipeline + honest features + condition-based scheduling replacing calendar-based — measured against a baseline year. The pipeline is the enabling layer; the maintenance-process change is where the number comes from, and pretending otherwise is how IoT programs lose their executive sponsor in year two.
Fleet Design
at the Edge
Program-Level Target
Streaming Headroom
08 · Lessons Learned: The Hard Truths
- Death certificates matter more than birth certificates. The expensive twin failures are silent sensors read as healthy assets. Sparkplug NDEATH/DDEATH plus per-sensor freshness SLOs make silence a signal — wire them first.
- Unfiltered "keep everything" pilots poison the budget conversation. The pilot that ships raw kHz waveforms produces a cloud bill that kills the program before the value ships. Edge aggregation is not an optimisation to add later; it is the architecture.
- Filter configs drift unless they're code. A deadband widened during a noisy-sensor incident and never reverted hid a real bearing degradation for six weeks. Filters are versioned, reviewed, and audited like the code they are.
- Event time is non-negotiable the first time a backhaul drops. An hour of buffered replay processed on arrival time rewrote an asset's day and fired every alert at once. Build on sensor timestamps from day one; arrival time is an ops metric.
- The asset registry needs an owner with a name. The twin is a join between telemetry and the registry — and registries rot: sensors get remounted, assets get renumbered. Stale registry, lying twin; assign ownership like it's production code, because it is.
- Quarantine earns trust with the maintenance crew. The day planners saw an impossible reading sitting in quarantine instead of skewing the dashboard was the day they started believing the dashboard. Visible rejection beats silent correction.
09 · Key Takeaways for Practitioners
Per-site broker HA, Sparkplug lifecycle, QoS by telemetry class. Silence must be distinguishable from health.
Deadband + window aggregates + exception passthrough = 60% egress cut. Raw buffers locally for investigations.
Ordered, replayable, fanned out. Event time everywhere — offline replay must not rewrite chronology.
DLT bronze→silver→gold with inline quality gates. Impossible readings go visible, never averaged in.
Features carry as-of timestamps; training joins are time-travel joins; serving reads the same gold tables.
Route to the CMMS with evidence attached. The 40% is a program outcome — pipeline plus process change.
The streaming spine and lakehouse practice behind this design are documented in the network telemetry (1B+ events/hour) and supply chain lakehouse engineering projects; format trade-offs in Delta vs Iceberg vs Hudi.