TL;DR — Direct Answer
A digital twin is only as honest as the pipeline feeding it. The production-shaped path: MQTT at the edge (clustered brokers, 100K+ sensors), filtering and downsampling at the edge before egress (a 60% reduction in cloud transfer is a realistic target), Kafka as the cloud spine, and Delta Live Tables (DLT) for incremental bronze→silver→gold processing into twin state and predictive-maintenance features. A 40% reduction in unplanned downtime is a realistic program-level target once condition-based maintenance replaces calendar-based maintenance. The scenario figures are reference targets; the streaming and lakehouse engineering underneath is Vipra production practice: 1B+ events/hour telemetry and multi-region GCP lakehouse engagements.
The shape of the problem
A plant floor emits truth at brutal rates: vibration sensors at kHz, temperature at Hz, PLC state changes in bursts. Ship it all raw to the cloud and you pay three times — egress, storage, and the engineering time to make sense of it later. Ship too little and the twin lies. The architecture below is a series of deliberate compressions, each one chosen so the twin keeps physical fidelity while the bill keeps proportion.
Edge: MQTT done industrially
MQTT earns its position — lightweight, broker-based, QoS levels matched to telemetry classes — but single-broker deployments are a plant-wide single point of failure. Cluster the brokers (HiveMQ-class or equivalent), structure topics hierarchically (site/line/asset/sensor), and use QoS pragmatically: QoS 0 for high-rate streams where the next reading supersedes the last, QoS 1 for state changes and alarms that must arrive. Sparkplug B payloads buy you birth/death certificates and typed metrics — worth the constraint in greenfield deployments.
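A minimal sketch of the topic and QoS conventions described above, in plain Python with no MQTT client attached. The names `TelemetryClass`, `build_topic`, and `qos_for` are hypothetical, as are the example site and asset identifiers; a real deployment would pass the resulting topic and QoS to whatever client library the edge nodes use.

```python
from enum import Enum


class TelemetryClass(Enum):
    """Hypothetical telemetry classes, used only to pick a QoS level."""
    HIGH_RATE = "high_rate"   # e.g. vibration or temperature samples
    STATE_CHANGE = "state"    # e.g. PLC state transitions
    ALARM = "alarm"


def build_topic(site: str, line: str, asset: str, sensor: str) -> str:
    """Join the four hierarchy levels into site/line/asset/sensor."""
    levels = [site, line, asset, sensor]
    for level in levels:
        # '/' separates levels; '+' and '#' are MQTT wildcard characters
        # and must not appear in a topic name.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(levels)


def qos_for(telemetry_class: TelemetryClass) -> int:
    """QoS 0 where the next reading supersedes the last, QoS 1 otherwise."""
    return 0 if telemetry_class is TelemetryClass.HIGH_RATE else 1
```

Keeping the QoS decision in one function means the policy can be reviewed and changed in a single place rather than at each publish call.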
Edge filtering: the 60% you never ship
Most industrial signal is redundant by physics: a healthy bearing's vibration spectrum doesn't need restating 8,000 times a second in the cloud. Edge nodes apply three reductions before egress: deadband filtering (publish on meaningful change, not on schedule), windowed aggregation (RMS, peak, and FFT band energies per window for high-rate signals), and exception passthrough (raw-fidelity streams only when a signal crosses a threshold, so full detail arrives exactly when something interesting happens). A 60% egress reduction is a realistic target; some estates see far more. The discipline: every filter is versioned config, and raw data is buffered at the edge long enough to be replayed when an investigation needs it.
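The three reductions can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a reference implementation: the function names, the `band` and `alarm_peak` parameters, and the use of peak amplitude as the exception trigger are all hypothetical, and FFT band energies are left out to keep the example dependency-free.

```python
import math
from typing import Iterable, Iterator, Sequence


def deadband(samples: Iterable[float], band: float) -> Iterator[float]:
    """Yield a sample only when it moves more than `band` from the last published value."""
    last = None
    for value in samples:
        if last is None or abs(value - last) > band:
            last = value
            yield value


def window_summary(window: Sequence[float]) -> dict:
    """Windowed aggregation: RMS and peak for one non-empty window of samples."""
    rms = math.sqrt(sum(v * v for v in window) / len(window))
    peak = max(abs(v) for v in window)
    return {"rms": rms, "peak": peak, "n": len(window)}


def reduce_window(window: Sequence[float], alarm_peak: float) -> dict:
    """Exception passthrough: attach the raw samples only when the peak reaches the threshold."""
    summary = window_summary(window)
    if summary["peak"] >= alarm_peak:
        summary["raw"] = list(window)
    return summary
```

In this sketch a quiet window leaves the edge as three numbers, while a window that reaches `alarm_peak` carries its raw samples along with the summary.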
Bridge and spine: MQTT into Kafka
MQTT moves telemetry; Kafka organizes it. The bridge (a Kafka Connect MQTT source connector or a broker-native integration) maps topic hierarchies onto Kafka topics keyed by asset, which yields ordered per-asset streams, replayable history, and consumer-group fanout to every downstream consumer: twin state, alerting, the lakehouse. This is the same spine pattern our telemetry platform runs at 1B+ events/hour; plant-scale volumes sit comfortably inside it.
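One way to picture the mapping is a pure function from an MQTT topic to a Kafka topic and record key. The sketch below assumes the site/line/asset/sensor hierarchy, a hypothetical per-site topic naming scheme (`telemetry.<site>`), and site names that are already valid in Kafka topic names; an actual bridge would express the same mapping in connector or broker configuration.

```python
from typing import NamedTuple


class KafkaRoute(NamedTuple):
    topic: str  # destination Kafka topic
    key: bytes  # record key identifying the asset


def route(mqtt_topic: str, prefix: str = "telemetry") -> KafkaRoute:
    """Map site/line/asset/sensor onto a per-site Kafka topic keyed by asset."""
    parts = mqtt_topic.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected site/line/asset/sensor, got {mqtt_topic!r}")
    site, line, asset, _sensor = parts
    # The sensor level is dropped from the key on purpose: every sensor on an
    # asset shares one key, so their records share one ordered stream.
    key = f"{site}/{line}/{asset}".encode("utf-8")
    return KafkaRoute(topic=f"{prefix}.{site}", key=key)
```

With Kafka's default partitioner, records that share a key land on the same partition, which is what gives per-asset ordering as long as the partition count stays fixed.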
Delta Live Tables: incremental refinement to twin state
DLT declares the bronze→silver→gold flow as code with quality expectations enforced inline: bronze lands raw events with schema checks; silver normalizes units, joins asset metadata, and quarantines violations (a sensor reporting impossible physics gets flagged, not averaged in); gold maintains current twin state per asset plus the windowed aggregates that feed dashboards and models. Incremental processing means each layer computes only deltas — the economics that make minute-fresh twin state affordable at fleet scale.
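The layer logic can be illustrated without a Databricks runtime. The following is a plain-Python stand-in for the bronze→silver→gold flow, not DLT code and not the DLT API: the `Event` and `TwinPipeline` names, the Celsius/Fahrenheit units, and the absolute-zero check are assumptions chosen to make the quarantine and incremental-update ideas concrete.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Event:
    asset: str
    sensor: str
    ts: float     # event time, epoch seconds
    value: float
    unit: str     # "C" or "F" in this sketch


@dataclass
class TwinPipeline:
    quarantine: list = field(default_factory=list)  # rejected events, kept for review
    gold: dict = field(default_factory=dict)        # (asset, sensor) -> latest clean Event

    def to_silver(self, e: Event) -> Optional[Event]:
        """Normalize to Celsius; quarantine physically impossible readings."""
        celsius = (e.value - 32.0) * 5.0 / 9.0 if e.unit == "F" else e.value
        if celsius < -273.15:  # below absolute zero: flag it, do not average it in
            self.quarantine.append(e)
            return None
        return Event(e.asset, e.sensor, e.ts, celsius, "C")

    def process_batch(self, bronze_batch: Iterable[Event]) -> int:
        """Incremental step: touch only the new events, never the full history."""
        updated = 0
        for raw in bronze_batch:
            clean = self.to_silver(raw)
            if clean is None:
                continue
            k = (clean.asset, clean.sensor)
            current = self.gold.get(k)
            # Compare event time so a late arrival cannot overwrite newer state.
            if current is None or clean.ts >= current.ts:
                self.gold[k] = clean
                updated += 1
        return updated
```

Each call to `process_batch` does work proportional to the batch, not to the accumulated history, which is the property the paragraph above attributes to incremental processing.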
The feature store: where downtime reduction actually comes from
Predictive maintenance lives or dies on features, not models: rolling vibration-band energies, temperature-rate-of-change, cycle counts since service, cross-sensor correlations per asset class. Maintain them in a governed feature store fed by the gold layer, with point-in-time correctness — training a failure-prediction model on features computed with post-failure knowledge is the classic self-deception in this domain. Models then consume honest features; alerts route into the CMMS as work orders, not into another dashboard. The 40% downtime reduction reference target is a program outcome: pipeline + features + maintenance-process change, measured against a baseline year.
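Point-in-time correctness is easiest to see on a single feature. The sketch below computes a temperature rate of change as of a given timestamp and refuses to look at any reading after it; the function name, the two-point slope, and the window parameter are illustrative assumptions rather than a description of any particular feature store.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def rate_of_change(
    times: Sequence[float],
    values: Sequence[float],
    as_of: float,
    window_s: float,
) -> Optional[float]:
    """Slope in units per second over [as_of - window_s, as_of].

    `times` must be sorted ascending. Readings after `as_of` are never read,
    so a training row built for time T cannot see what happened after T.
    """
    hi = bisect_right(times, as_of)            # first index strictly after as_of
    lo = bisect_left(times, as_of - window_s)  # first index inside the window
    if hi - lo < 2:
        return None                            # not enough history in the window
    dt = times[hi - 1] - times[lo]
    if dt == 0:
        return None
    return (values[hi - 1] - values[lo]) / dt
```

If a failure spike is recorded at t=240, a feature computed as of t=180 is unaffected by it, whereas a naive "last N rows" query run after the fact would include it.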
Failure modes to design for on day one
Plant networks partition — edge buffers must survive hours offline and backfill without corrupting event order (event-time processing downstream, not arrival-time). Sensors drift and die — silver-layer expectations catch impossible values, and per-sensor freshness SLOs catch silence. Asset metadata goes stale — the twin is a join between telemetry and the asset registry, and the registry needs an owner. None of these are exotic; all of them are the difference between a demo twin and one a maintenance planner trusts at 6am.
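Two of these checks are small enough to sketch directly: ordering by event time when a backfill arrives, and flagging sensors that have gone silent. The function names, the `(timestamp, payload)` tuple shape, and the single fleet-wide `slo_s` threshold are assumptions for illustration; in practice the SLO would likely vary per sensor class, and the ordering would be handled by event-time windows in the stream processor.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[float, str]  # (event time in epoch seconds, payload)


def merge_by_event_time(live: Iterable[Reading], backfill: Iterable[Reading]) -> List[Reading]:
    """Merge two streams, each already sorted by event time, into one ordered list."""
    return list(heapq.merge(live, backfill, key=lambda r: r[0]))


def stale_sensors(last_seen: Dict[str, float], now: float, slo_s: float) -> List[str]:
    """Return sensors whose most recent event is older than the freshness SLO."""
    return sorted(s for s, ts in last_seen.items() if now - ts > slo_s)
```

A silent sensor produces no bad values for an expectation to catch, which is why the freshness check has to exist alongside the value checks.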