“Real-time personalization” fails for a boring reason: nobody owns the latency budget end to end. Each team optimises its component; the sum is four seconds; the shopper is gone. This article itemises the two-second budget and assigns every millisecond — then builds the architecture that holds it at peak.
The pattern: Kafka as the event spine, Kafka Streams for live sessionization and enrichment (identity stitching included), Druid for sub-second OLAP over streaming data, and a feature path feeding the recommendation API in under 100ms. Live A/B readouts come free, because exposure and conversion flow through the same spine.
The 2.3M events/second peak is a labelled reference target (large marketplace, flash sale). The streaming engineering is documented Vipra production work: 500ms end-to-end at 50M events/day on inventory, and an 18% revenue lift from real-time personalization at 8M-customer scale.
01 · The Two-Second Budget, Itemised
Latency budgets fail when they're aspirations instead of allocations. Here is the budget with owners — the P99 column is the one that matters, because the shopper who hits your tail latency is statistically your busiest-hour shopper:
| Stage | P50 target | P99 budget | Owner |
|---|---|---|---|
| Edge collection → Kafka produce | 60ms | 150ms | Web platform |
| Streams: sessionize + enrich | 120ms | 300ms | Data platform |
| Feature store write (Redis-class) | 30ms | 100ms | Data platform |
| Recommendation API: read + score | 80ms | 200ms | ML serving |
| Experience render (web/app) | 100ms | 250ms | Frontend |
| Total | ~390ms | 1.0s | One named owner |
That leaves a full second of headroom against the 2-second product promise — headroom that exists to be spent on the bad day, not the demo. Every component below was chosen for its tail behaviour; means are for marketing.
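One way to keep the budget an allocation rather than an aspiration is to hold it as data that a build step or alerting job can check. The sketch below is a minimal illustration in plain Java: the class and constant names (`LatencyBudget`, `Stage`, `PRODUCT_PROMISE_MS`) are hypothetical, and the numbers are copied from the table above.

```java
import java.util.List;

/** A latency budget as data: each stage carries its allocation and its owner. */
final class LatencyBudget {
    record Stage(String name, int p50Ms, int p99Ms, String owner) {}

    // Allocations copied from the budget table.
    static final List<Stage> STAGES = List.of(
        new Stage("Edge collection -> Kafka produce", 60, 150, "Web platform"),
        new Stage("Streams: sessionize + enrich", 120, 300, "Data platform"),
        new Stage("Feature store write", 30, 100, "Data platform"),
        new Stage("Recommendation API: read + score", 80, 200, "ML serving"),
        new Stage("Experience render", 100, 250, "Frontend"));

    static final int PRODUCT_PROMISE_MS = 2_000;

    static int totalP50Ms() { return STAGES.stream().mapToInt(Stage::p50Ms).sum(); }

    static int totalP99Ms() { return STAGES.stream().mapToInt(Stage::p99Ms).sum(); }

    /** Headroom left between the summed P99 allocations and the product promise. */
    static int headroomMs() { return PRODUCT_PROMISE_MS - totalP99Ms(); }

    public static void main(String[] args) {
        System.out.printf("P50 total=%dms, P99 budget=%dms, headroom=%dms%n",
            totalP50Ms(), totalP99Ms(), headroomMs());
    }
}
```

Summing per-stage P99 allocations is a budgeting convention, not a measurement: the true end-to-end P99 still has to be observed in production and compared against this total.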
02 · The Architecture, End to End
03 · Sessionization in Kafka Streams, Not the Warehouse
Sessions computed nightly in SQL are history lessons. Kafka Streams session windows maintain the session while it's happening:
Kafka Streams: live sessions that emit while open

```java
KStream<String, ClickEvent> clicks = builder.stream("clickstream");

clicks
    .groupByKey()
    .windowedBy(SessionWindows.ofInactivityGapAndGrace(
        Duration.ofMinutes(30), Duration.ofMinutes(2)))
    .aggregate(SessionState::new, SessionState::add, SessionState::merge,
        Materialized.<String, SessionState, SessionStore<Bytes, byte[]>>as("sessions")
            .withRetention(Duration.ofHours(12)))
    // Emit on a cadence, NOT on window close: abandonment needs the open session.
    // maxRecords is Suppressed.BufferConfig.maxRecords, statically imported.
    .suppress(Suppressed.untilTimeLimit(
        Duration.ofSeconds(5), maxRecords(1)))
    .toStream()
    .to("session-updates");
```
The three non-obvious essentials:
- Emit on cadence, not on close. An abandonment engine that only learns about sessions after they end is a postmortem service.
- Stitch identity mid-session. When the login event arrives, re-key the stream using an anonymous→known mapping held in a GlobalKTable, so the session's history survives the boundary. This is miniature Customer-360 discipline, and it's where most clickstream platforms quietly lose half their signal.
- Size state for peak concurrent sessions. Average-day sizing plus a flash sale equals a RocksDB eviction storm at the worst moment.
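The identity-stitching step is easier to reason about outside the streaming topology. The following is a plain-Java, in-memory stand-in, not the production re-key: `IdentityStitcher` and its methods are hypothetical names, and the two maps only play the roles that the identity map and the session store play in the pipeline.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for identity stitching: anonymous history follows the visitor across login. */
final class IdentityStitcher {
    // Plays the role of the anonymous -> known identity map.
    private final Map<String, String> anonToKnown = new HashMap<>();
    // Session history held under whichever key the visitor currently resolves to.
    private final Map<String, List<String>> historyByKey = new HashMap<>();

    /** Resolve a visitor id to the known customer id if a login has been seen, else keep it. */
    String resolve(String visitorId) {
        return anonToKnown.getOrDefault(visitorId, visitorId);
    }

    /** Record a product view under the visitor's current key. */
    void onView(String visitorId, String sku) {
        historyByKey.computeIfAbsent(resolve(visitorId), k -> new ArrayList<>()).add(sku);
    }

    /** On login, record the mapping and move the pre-login history under the known key. */
    void onLogin(String visitorId, String customerId) {
        anonToKnown.put(visitorId, customerId);
        List<String> anonymousHistory = historyByKey.remove(visitorId);
        if (anonymousHistory != null) {
            // Prepend, so the anonymous prefix stays at the start of the session.
            historyByKey.computeIfAbsent(customerId, k -> new ArrayList<>())
                .addAll(0, anonymousHistory);
        }
    }

    /** History for an anonymous or known id, as an immutable copy. */
    List<String> history(String id) {
        return List.copyOf(historyByKey.getOrDefault(resolve(id), List.of()));
    }
}
```

The point of the sketch is the `onLogin` move: without it, the views recorded before login stay stranded under the anonymous key and the known customer starts with an empty session.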
04 · Enrichment: Joins at Stream Speed
Raw clicks are thin: "visitor 481 viewed SKU 9912" decides nothing. Enrichment makes events decisionable, and the only way to join at stream speed is to make every lookup a local memory read:
GlobalKTables: reference data as local state

```java
GlobalKTable<String, Product> products = builder.globalTable("products-compacted");
GlobalKTable<String, Inventory> inventory = builder.globalTable("inventory-compacted");
GlobalKTable<String, Identity> idMap = builder.globalTable("identity-map");

clicks
    .join(products, (k, e) -> e.sku(), ClickEvent::withProduct)
    .join(inventory, (k, e) -> e.sku(), ClickEvent::withStock)   // truthful offers only
    .leftJoin(idMap, (k, e) -> e.visitorId(), ClickEvent::withIdentity);
```
The inventory join deserves its own sentence: discounting an item that just sold out is a self-inflicted wound, and it happens to every team that enriches against a nightly inventory snapshot. Our production inventory platform exists precisely to make that join truthful, at 50M events/day and 500ms update latency. Compacted topics keep the GlobalKTables fresh; the recommendation never promises what the warehouse doesn't hold.
05 · Druid: OLAP That Answers in Milliseconds
Marketing asks: funnel conversion by traffic source, last 15 minutes, by minute. Warehouses answer that shape in tens of seconds at high concurrency cost; Druid answers in tens of milliseconds with second-level freshness, because it was built for exactly this: streaming ingestion from Kafka, rollup at ingestion time, and a segment architecture designed for time-sliced aggregation under concurrency.
Druid ingestion spec: rollup is the economics (excerpt)

```json
"granularitySpec": {
  "segmentGranularity": "hour",
  "queryGranularity": "minute",
  "rollup": true
},
"dimensionsSpec": {
  "dimensions": ["funnel_stage", "traffic_source", "device", "campaign", "category"]
},
"metricsSpec": [
  {"type": "count", "name": "events"},
  {"type": "thetaSketch", "name": "visitors", "fieldName": "visitor_id"},
  {"type": "doubleSum", "name": "revenue", "fieldName": "order_value"}
]
```

The minute queryGranularity pre-aggregates to the grain dashboards use, and the thetaSketch metric gives distinct visitor counts at speed.
Minute-grain rollup with sketch-based distinct counts cuts storage 10–50× while preserving every query shape a human dashboard actually issues. Keep raw events in the lakehouse for ad-hoc science; Druid serves the operational questions where someone is waiting on the answer. The division of labour is the design: Druid for the questions you ask standing up, the lakehouse for the ones you ask sitting down.
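To make the rollup idea concrete, here is a small plain-Java sketch of what minute-grain pre-aggregation does to a stream of events. It is an illustration only, not how Druid is implemented: `MinuteRollup`, `Event`, and `Row` are hypothetical names, only two of the dimensions are used, and an exact `HashSet` stands in for the theta sketch (which is approximate and far smaller).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative minute-grain rollup: many raw events collapse into one row per (minute, dimensions). */
final class MinuteRollup {
    record Event(long epochMillis, String funnelStage, String trafficSource,
                 String visitorId, double orderValue) {}

    /** What survives ingestion for one (minute, dimensions) key. */
    static final class Row {
        long events;                                   // "count" metric
        double revenue;                                // "doubleSum" metric
        final Set<String> visitors = new HashSet<>();  // exact stand-in for a theta sketch
    }

    static Map<String, Row> rollup(List<Event> events) {
        Map<String, Row> rows = new TreeMap<>();
        for (Event e : events) {
            // Truncate the timestamp to the start of its minute.
            long minute = Math.floorDiv(e.epochMillis(), 60_000L) * 60_000L;
            String key = minute + "|" + e.funnelStage() + "|" + e.trafficSource();
            Row row = rows.computeIfAbsent(key, k -> new Row());
            row.events++;
            row.revenue += e.orderValue();
            row.visitors.add(e.visitorId());
        }
        return rows;
    }
}
```

Anything not in the key or the metrics (here, the individual timestamps and per-event order values) is gone after rollup, which is why the raw events still go to the lakehouse.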
06 · A/B Results While the Test Is Running
When exposure events and conversion events flow through the same spine, experiment readout becomes a Druid query instead of a weekly report. Freshness changes what experiments are for: breakage detection in minutes (a variant that tanks conversion gets caught before lunch, not after a week of damage), and sample-ratio mismatch monitored continuously — the silent killer of experiment validity, visible as a live dashboard line.
The discipline that keeps live readouts honest: pre-register the metric definition in code — a shared library both the assignment service and the analysis query import, so "conversion" cannot drift between teams; and resist peeking-driven stopping — fresh data is for detecting breakage fast, not for declaring victory early. Sequential testing methods exist for the impatient; use them rather than re-inventing p-hacking at streaming speed.
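The sample-ratio mismatch check mentioned above is commonly done with a chi-square goodness-of-fit test on the exposure counts per arm. The sketch below assumes a two-arm test and an alert threshold of p = 0.001 (critical value 10.828 at one degree of freedom); the threshold, the class name `SrmCheck`, and its methods are assumptions for illustration, not the article's production code.

```java
/** Sample-ratio mismatch check for a two-arm experiment, using a Pearson chi-square statistic. */
final class SrmCheck {
    // Chi-square critical value for 1 degree of freedom at p = 0.001.
    // The strict threshold is an assumption: SRM alerts should be rare and serious.
    static final double CRITICAL_DF1_P001 = 10.828;

    /** Chi-square statistic of observed exposure counts against the intended split. */
    static double chiSquare(long control, long treatment, double expectedControlShare) {
        double total = control + treatment;
        double expectedControl = total * expectedControlShare;
        double expectedTreatment = total - expectedControl;
        return Math.pow(control - expectedControl, 2) / expectedControl
             + Math.pow(treatment - expectedTreatment, 2) / expectedTreatment;
    }

    /** True when the observed split deviates from the intended one beyond the threshold. */
    static boolean mismatch(long control, long treatment, double expectedControlShare) {
        return chiSquare(control, treatment, expectedControlShare) > CRITICAL_DF1_P001;
    }
}
```

For an intended 50/50 split, 50,200 versus 49,800 exposures gives a statistic of 1.6 and no alert, while 50,000 versus 48,500 gives roughly 22.8 and should stop the readout until assignment is explained.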
07 · Surviving the Flash Sale
The 2.3M events/second reference peak is 20× a normal hour and it arrives in minutes. Peak behaviour is designed, not hoped for:
- Partitions sized for peak, in advance. Repartitioning during an incident is not a plan. 256 partitions at ~9K events/sec/partition peak is comfortable; the idle cost of over-partitioning on a normal day is trivial against one melted flash sale.
- Consumers autoscale on lag, not CPU. Streams instances scale on consumer-group lag thresholds; CPU-based scaling reacts after the queue has already formed.
- Druid pre-scales on the marketing calendar. The flash sale is scheduled; ingestion tasks and historical replicas scale the hour before, not reactively.
- Degradation switches, decided in peacetime. When lag exceeds the intervention window, the abandonment path sheds load gracefully — recommendations fall back to popularity, the data path stays whole, and nothing corrupts. The worst flash-sale outcome isn't slow personalization; it's a week of poisoned analytics.
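The arithmetic and decisions in the list above can be sketched in a few lines of plain Java. Everything here is illustrative: `PeakPlan`, the per-instance throughput, the drain window, and the intervention window are hypothetical parameters, and a real deployment would read lag from consumer-group metrics rather than take it as an argument.

```java
/** Peak-planning arithmetic: partition load, lag-based scaling, and the degradation switch. */
final class PeakPlan {
    enum Mode { PERSONALIZED, POPULARITY_FALLBACK }

    /** Events per second each partition must absorb at peak. */
    static double perPartitionRate(long peakEventsPerSec, int partitions) {
        return (double) peakEventsPerSec / partitions;
    }

    /**
     * Scale on lag, not CPU: instances needed to keep up with arrivals and drain the
     * current backlog within drainSeconds. Capped at maxInstances, since consumers
     * beyond the partition count add no parallelism.
     */
    static int desiredInstances(long lagEvents, double arrivalPerSec,
                                double perInstancePerSec, double drainSeconds,
                                int maxInstances) {
        double requiredRate = arrivalPerSec + lagEvents / drainSeconds;
        int needed = (int) Math.ceil(requiredRate / perInstancePerSec);
        return Math.max(1, Math.min(needed, maxInstances));
    }

    /** Peacetime-decided switch: past the intervention window, fall back to popularity. */
    static Mode mode(long estimatedLagMs, long interventionWindowMs) {
        return estimatedLagMs > interventionWindowMs
            ? Mode.POPULARITY_FALLBACK
            : Mode.PERSONALIZED;
    }
}
```

With the reference numbers, `perPartitionRate(2_300_000, 256)` works out to 8,984.375 events/sec per partition, which is the "~9K" in the first bullet. The switch only changes what the recommendation path serves; events keep flowing to the data path either way.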
08 · Lessons Learned: The Hard Truths
- The budget needs one owner with authority across teams. Five teams each meeting their local SLO summed to a missed product promise. The fix was organisational: one named owner for the end-to-end number, with the standing to make teams trade milliseconds.
- Emit-on-close was our most expensive default. The first build only published completed sessions — a perfect dataset for a use case that needs the open ones. Re-architecting emission cadence cost a quarter; reading this paragraph is cheaper.
- Identity stitching is half your signal. Platforms that drop the anonymous prefix of a session personalise on amnesia. The GlobalKTable re-key is two days of work and it roughly doubled usable session history.
- Stale inventory enrichments are negative-value personalization. Discounting sold-out items measurably hurts trust metrics versus showing nothing. If you can't join live inventory, don't make stock-contingent offers.
- Druid rollup decisions are forever-ish. Dimensions you didn't keep at ingestion are gone from the rolled-up data. We keep a parallel raw stream to the lakehouse precisely so a rollup mistake is a backfill, not a loss.
- Degradation design is what saved the launch. The switch that dropped intervention and preserved the data path turned our worst traffic day into a non-event with a clean dataset. Decide your shed order before the day decides it for you.
09 · Key Takeaways for Practitioners
- Every stage gets a P99 allocation and an owner. The total gets one owner with cross-team authority.
- Suppression on cadence, not close. Abandonment intervention needs the session that hasn't ended.
- GlobalKTables from compacted topics: products, inventory, identity. No database on the hot path.
- Rollup at ingestion, sketches for distincts, raw events to the lakehouse. Two consumers, one truth.
- Shared metric definitions in code; SRM on a dashboard; sequential methods instead of peeking.
- Partitions for peak, scaling on lag, pre-scaled Druid, and a peacetime-decided shed order.
The production systems behind the proven numbers: real-time inventory intelligence (500ms at 50M events/day) and Customer 360 personalisation (18% revenue lift at 8M customers). Sector context on the retail industry page.