Databricks, MLflow, Power BI

Customer 360 & Personalisation Engine

How Vipra Software built a Databricks Customer 360 platform for a retail chain with 8M customers — unifying fragmented customer data across 5 channels to power ML-driven recommendations that delivered an 18% revenue lift in the first year.

Industry: Retail
Duration: 24 weeks
Customer Base: 8M customers
Stack: Databricks + MLflow
Revenue Impact: +18% revenue lift
18% revenue lift delivered
8M customer profiles built
5 channels unified
24-week delivery timeline

The Challenge

A national retail chain with 8 million loyalty programme members had a customer data problem that prevented it from competing with digitally native retailers on personalisation. Customer interactions existed across 5 distinct channels (physical stores (POS), e-commerce, mobile app, email marketing, and a third-party loyalty programme partner), each with its own customer identifier, transaction history, and behavioural data. A customer who shopped in-store, browsed online, and clicked an email promotion was recorded as three different entities across the chain's systems.

The commercial consequence was directly quantifiable. Personalisation capability was limited to generic RFM (Recency, Frequency, Monetary) segmentation applied to loyalty card transaction history — a blunt instrument that grouped 500,000 customers into 12 segments, applied the same promotion to each segment, and hoped for conversion. Email click-through rates had declined 40% over three years as customers increasingly ignored promotions that bore no relationship to their actual preferences or purchase history.

The data science team had ambition. They had already prototyped collaborative filtering recommendation models, customer lifetime value prediction, and churn propensity scoring, but every prototype was built on samples of loyalty data alone. Without a unified customer profile incorporating browsing behaviour, app interactions, and cross-channel purchase patterns, the models could not reach the accuracy required for production deployment.

Our Approach

Vipra Software's Customer 360 architecture centred on Delta Lake as the unified customer data foundation. Databricks ran both the data engineering workloads (ETL, identity resolution) and the machine learning workloads (feature engineering, model training, serving) on a single platform, eliminating the split between data engineering and data science tooling that had slowed previous initiatives.

  • Customer Identity Resolution (Weeks 1–6): Designed a probabilistic identity resolution engine in PySpark to link customer records across the 5 channels, combining email, phone, loyalty ID, and device fingerprint matching rules with ML-based entity resolution for ambiguous cases. The engine resolved 23M+ cross-channel records into 8M+ unique customer identities: a 2.9:1 identity collapse ratio (roughly 2.9 source records per real customer), confirming how fragmented customer identities were before the project.
  • Delta Lake Customer 360 Foundation (Weeks 7–11): Built the Customer 360 Delta Lake tables: unified_customer (golden record), customer_transactions (cross-channel purchase history), customer_events (browsing, app, and marketing engagement events), and customer_attributes (derived behavioural attributes). Implemented Delta Lake MERGE (upsert) operations to apply daily incremental updates from all 5 source channels.
  • Feature Engineering Platform (Weeks 12–15): Built 180 customer features using PySpark: purchase recency/frequency/value by category, brand affinity scores, price sensitivity indices, channel preference scores, seasonal purchase patterns, and product category exploration metrics. Registered all features in the Databricks Feature Store for reuse across ML models.
  • ML Model Development (Weeks 16–20): Built 4 production ML models, with every training run tracked in MLflow: (a) a two-tower collaborative filtering recommendation model, trained on 500K product interactions and deployed for real-time product recommendations; (b) Customer Lifetime Value (CLV) prediction, a gradient-boosted model predicting 12-month spend; (c) churn propensity scoring, a 90-day churn risk score for loyalty retention interventions; (d) a next-best-offer model, a classification model predicting which promotion category a customer will respond to.
  • Serving Layer & Integration (Weeks 21–23): Deployed models to Databricks Model Serving for real-time inference (recommendation API, <100ms p99 latency). Built batch scoring pipelines for CLV and churn running on a daily cadence. Integrated model outputs into the email platform, e-commerce recommendation widgets, and the store associate mobile app.
  • Power BI Analytics & Cutover (Week 24): Built customer analytics dashboards in Power BI for marketing and commercial teams: segment performance, model accuracy tracking, campaign attribution, and CLV cohort analysis. Executed phased rollout starting with email personalisation, then e-commerce widgets, then store recommendations.
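The identity resolution step can be illustrated with a much-simplified sketch. It shows only an exact-key linking stage: records that share a normalised email, phone, or loyalty ID are merged transitively with a union-find structure, and the collapse ratio is computed as records divided by resolved identities. The record layout, the `MATCH_KEYS` tuple, and the function names are hypothetical, and the probabilistic, ML-based matching the project used for ambiguous cases is not shown. On 23M+ records this would run as a distributed connected-components job rather than in a single Python process.

```python
# Hypothetical exact-key identity linking: NOT the project's engine, only an illustration.
MATCH_KEYS = ("email", "phone", "loyalty_id")  # assumed match keys


class UnionFind:
    """Disjoint-set forest used to merge records into identity clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent to keep trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def normalise(key, value):
    """Lower-case and trim values; keep only digits for phone numbers."""
    if value is None:
        return None
    value = value.strip().lower()
    if key == "phone":
        value = "".join(ch for ch in value if ch.isdigit())
    return value or None


def resolve_identities(records):
    """records: list of (record_id, attrs dict). Returns {record_id: cluster_id}."""
    uf = UnionFind()
    first_seen = {}  # (key, normalised value) -> first record_id carrying it
    for record_id, attrs in records:
        uf.find(record_id)  # register every record, even if it matches nothing
        for key in MATCH_KEYS:
            value = normalise(key, attrs.get(key))
            if value is None:
                continue
            if (key, value) in first_seen:
                uf.union(first_seen[(key, value)], record_id)
            else:
                first_seen[(key, value)] = record_id
    return {record_id: uf.find(record_id) for record_id, _ in records}


def collapse_ratio(assignment):
    """Source records per resolved identity (about 23M / 8M ≈ 2.9 in this project)."""
    return len(assignment) / len(set(assignment.values()))
```

Because linking is transitive, a single shared value (for example a household phone number) can merge distinct people, which is one reason a project like this needs a probabilistic stage for ambiguous matches rather than exact keys alone.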
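One family of the 180 features, purchase recency, frequency, and value by category, can be sketched in plain Python. The transaction tuple layout and the `category_rfm` name are hypothetical; in the project these features were built in PySpark, where the equivalent would be a group-by on customer and category with aggregations. The `as_of` cut-off is an assumed design choice that keeps later transactions out of a feature snapshot, so training features do not leak information from after the label date.

```python
from collections import defaultdict
from datetime import date


def category_rfm(transactions, as_of):
    """transactions: iterable of (customer_id, category, txn_date, amount).

    Returns {(customer_id, category): {"recency_days", "frequency", "monetary"}}
    computed only from transactions on or before `as_of`.
    """
    acc = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer_id, category, txn_date, amount in transactions:
        if txn_date > as_of:
            continue  # ignore anything after the snapshot date (avoids leakage)
        a = acc[(customer_id, category)]
        a["frequency"] += 1
        a["monetary"] += amount
        if a["last"] is None or txn_date > a["last"]:
            a["last"] = txn_date
    return {
        key: {
            "recency_days": (as_of - a["last"]).days,
            "frequency": a["frequency"],
            "monetary": round(a["monetary"], 2),
        }
        for key, a in acc.items()
    }


txns = [
    ("c1", "grocery", date(2024, 1, 10), 20.0),
    ("c1", "grocery", date(2024, 3, 1), 30.5),
    ("c1", "beauty", date(2024, 2, 1), 15.0),
    ("c1", "grocery", date(2024, 4, 5), 99.0),  # after the snapshot, excluded
]
features = category_rfm(txns, as_of=date(2024, 3, 31))
```

Keying features by (customer, category) rather than by customer alone is what lets a model distinguish, say, a frequent grocery shopper who has never bought beauty products from a lapsed beauty customer.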

Technical Architecture

The architecture is governed entirely through Databricks Unity Catalog, which provides a single governance layer for both data engineering assets (Delta tables, notebooks, pipelines) and ML assets (experiments, models, feature tables). This unified governance model ensures that the model serving layer has the same access controls and audit trail as the underlying data, a requirement for GDPR compliance when customer data is used in automated decision-making.

The recommendation model uses a two-tower neural architecture: a customer tower encodes the 180 customer features into a customer embedding, a product tower encodes product attributes and historical interaction patterns into a product embedding, and the dot product of the two embeddings produces the relevance score. The model is served in real time from Databricks Model Serving, with a Redis cache holding the top-100 recommendations per customer so that e-commerce page loads do not incur model inference latency on every request.
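The scoring and caching path described above can be sketched as follows, assuming the towers have already produced embeddings as plain lists of floats. Products are ranked by dot product against the customer embedding, and each customer's top-K list is precomputed into a cache; a Python dict stands in for Redis, and a cache miss falls back to scoring on the fly (standing in for a call to the serving endpoint). All names are hypothetical, and `k=100` mirrors the top-100 figure in the text.

```python
import heapq


def dot(u, v):
    """Relevance score between a customer embedding and a product embedding."""
    return sum(a * b for a, b in zip(u, v))


def top_k(customer_vec, product_vecs, k):
    """Brute-force ranking of all products by dot-product relevance."""
    return heapq.nlargest(k, product_vecs, key=lambda pid: dot(customer_vec, product_vecs[pid]))


class RecommendationCache:
    """Cache-aside store of precomputed top-K lists (a dict standing in for Redis)."""

    def __init__(self, product_vecs, k=100):
        self.product_vecs = product_vecs
        self.k = k
        self.store = {}
        self.misses = 0

    def precompute(self, customer_vecs):
        # Batch job: fill the cache so page loads never wait on inference.
        for cid, vec in customer_vecs.items():
            self.store[cid] = top_k(vec, self.product_vecs, self.k)

    def get(self, customer_id, customer_vec):
        # Serve from cache; on a miss, score on the fly and populate the cache.
        if customer_id not in self.store:
            self.misses += 1
            self.store[customer_id] = top_k(customer_vec, self.product_vecs, self.k)
        return self.store[customer_id]


products = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
cache = RecommendationCache(products, k=2)
cache.precompute({"c1": [0.9, 0.1]})
```

Scoring every product is fine for a sketch; with a full retail catalogue, retrieval over product embeddings is typically done with an approximate nearest-neighbour index, and the cache then absorbs the remaining per-request cost.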

MLflow provides the experimentation and model lifecycle management layer, tracking 340+ model training runs across the 4 production models. Every production model deployment is linked to its training run, feature set version, and evaluation metrics — providing complete model provenance for the regulatory audit trail required under GDPR's automated decision-making provisions.

Business Impact

The 18% revenue lift was measured in a controlled A/B test comparing the personalised recommendation experience against the previous generic RFM-segmented promotions, with 1M customers in each test cell over a full quarter to capture seasonal effects. The client's own analytics team designed and analysed the test, independently of the implementation team, to ensure unbiased measurement.
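The case study does not describe the client team's statistical method, so the following is only a generic sketch of how a relative revenue lift between two test cells is commonly computed. It takes per-customer revenue for each cell, reports relative lift in mean revenue per customer, and adds a normal-approximation (Welch) 95% confidence interval on the absolute difference in means. The function name and inputs are hypothetical.

```python
import math
from statistics import mean, stdev


def ab_lift(treatment, control, z=1.96):
    """Relative lift in mean revenue per customer, plus a normal-approximation
    (Welch standard error) confidence interval on the absolute difference in means."""
    mt, mc = mean(treatment), mean(control)
    se = math.sqrt(stdev(treatment) ** 2 / len(treatment) + stdev(control) ** 2 / len(control))
    diff = mt - mc
    return {
        "relative_lift": diff / mc,   # e.g. 0.18 would correspond to an 18% lift
        "diff": diff,                 # absolute difference in revenue per customer
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


result = ab_lift([118, 120, 122], [98, 100, 102])
```

With 1M customers per cell the normal approximation is reasonable even for skewed revenue distributions; the tiny lists here only exercise the arithmetic.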

Email click-through rates recovered from a 3-year declining trend: the first personalised email campaign sent to the unified customer profiles achieved a 340% higher CTR than the previous generic campaign sent to the equivalent cohort. The difference was attributed to next-best-offer model accuracy — customers received promotions for categories they had browsed recently rather than the chain's highest-margin products regardless of relevance.
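"340% higher" is easy to misread as 3.4 times the baseline. Reading it as a relative increase over the previous campaign's CTR gives:

```latex
\mathrm{CTR}_{\text{personalised}}
  = \mathrm{CTR}_{\text{generic}} \times (1 + 3.40)
  = 4.4 \times \mathrm{CTR}_{\text{generic}}
```

So a hypothetical generic-campaign CTR of 2% would correspond to roughly 8.8% for the personalised campaign; the case study does not state the absolute CTR values.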

Churn prevention became an operational capability for the first time. The churn propensity model identified 280,000 customers in the high-risk cohort in the first month of operation. A targeted win-back campaign — offering personalised rewards based on the CLV model's estimate of each customer's potential future value — achieved a 23% reactivation rate among churned customers who had not transacted in 90+ days. The incremental revenue from reactivated high-CLV customers in the first quarter was £2.4M — exceeding the full programme investment within a single trading period.
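The win-back selection can be sketched using the text's definition of churned customers (no transaction in 90+ days). Ranking lapsed customers by predicted CLV is an assumption about how the CLV estimate might prioritise outreach, and the tuple layout, `winback_targets` name, and `limit` parameter are hypothetical; in practice the CLV values would come from the batch-scored CLV model.

```python
from datetime import date


def winback_targets(customers, as_of, inactive_days=90, limit=None):
    """customers: iterable of (customer_id, last_txn_date, predicted_clv).

    Selects customers with no transaction for `inactive_days` or more days,
    ordered by predicted CLV (highest first), optionally truncated to `limit`.
    """
    lapsed = [
        (cid, clv)
        for cid, last_txn, clv in customers
        if (as_of - last_txn).days >= inactive_days
    ]
    lapsed.sort(key=lambda row: row[1], reverse=True)
    ids = [cid for cid, _ in lapsed]
    return ids if limit is None else ids[:limit]


customers = [
    ("c1", date(2024, 3, 1), 500.0),   # 121 days inactive
    ("c2", date(2024, 6, 1), 900.0),   # 29 days inactive: still active
    ("c3", date(2024, 1, 15), 1200.0), # long lapsed, highest CLV
    ("c4", date(2024, 4, 1), 50.0),    # exactly 90 days inactive
]
targets = winback_targets(customers, as_of=date(2024, 6, 30))
```

Using `>=` makes the cut-off match "90+ days" literally, so a customer whose last purchase was exactly 90 days ago is treated as lapsed.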

Technology Stack

Databricks, Delta Lake, MLflow, PySpark, Power BI, Unity Catalog, Feature Store, Redis, Python, Airflow

Services Delivered

Customer 360, Identity Resolution, ML Engineering, Recommendation Engine, Feature Engineering, Retail Analytics

Fragmented Customer Data?

We build Customer 360 platforms that unify cross-channel customer data and power personalisation at scale. Talk to our team about your customer data challenge.
