How Vipra Software built an end-to-end automated data lineage graph covering 100% of a European bank's data assets using Apache Atlas and OpenMetadata, delivering GDPR compliance certification and full PCI-DSS audit coverage.
A European retail bank with operations in 6 EU member states was facing a dual regulatory crisis. An imminent GDPR audit required the bank to demonstrate that it could trace the lineage of every personal data element — from point of collection through processing, storage, and any third-party sharing — and produce Subject Access Request responses within the 30-day regulatory window. Simultaneously, a PCI-DSS recertification audit required documented evidence of cardholder data flows across all systems in scope.
The bank had no automated data lineage capability. Lineage documentation existed only as Word documents and Visio diagrams, partially maintained by individual system owners, with coverage gaps, outdated information, and no connection to the actual technical data flows. The compliance team had attempted to produce a data flow map manually for the GDPR assessment and abandoned the exercise after 6 weeks, when it became clear that manual documentation at the required depth across 28 source systems was not achievable within the audit timeline.
The regulatory stakes were not abstract. The bank's legal team had assessed potential GDPR fine exposure at €45M based on the data volumes involved. PCI-DSS non-compliance would trigger a card scheme assessment that could result in increased interchange fees affecting the bank's entire card issuing business. The lineage programme needed to deliver regulatory-grade results within 18 weeks — before the audit windows opened.
Vipra Software designed a hybrid lineage architecture combining Apache Atlas for technical lineage (the actual data flows as they exist in code and pipelines) with OpenMetadata for business lineage and the regulatory reporting layer. Automated lineage capture — rather than manual documentation — was the foundational principle: no lineage that couldn't be automatically validated against the running system would be accepted.
Apache Atlas serves as the technical lineage store, with integration hooks embedded into every pipeline execution context. When a dbt model runs, the Atlas hook records input and output tables, transformation logic, and execution metadata. When an Oracle stored procedure executes, the JDBC hook captures the source and target tables involved. This automated capture approach means lineage is always current — it reflects what the system actually does, not what documentation says it should do.
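The capture pattern described above can be sketched in a few lines. This is a minimal illustration, not the hook code used in the programme: `LineageStore`, `LineageEvent`, and every table and process name are hypothetical, and an in-memory list stands in for the Atlas store. The only idea taken from the text is that each pipeline execution reports its inputs and outputs at run time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One pipeline run: which tables were read and which were written."""
    process: str
    inputs: tuple
    outputs: tuple
    recorded_at: str


@dataclass
class LineageStore:
    """Hypothetical in-memory stand-in for the technical lineage store."""
    events: list = field(default_factory=list)

    def record(self, process, inputs, outputs):
        # Called by a hook at the end of each pipeline execution.
        event = LineageEvent(
            process=process,
            inputs=tuple(sorted(inputs)),
            outputs=tuple(sorted(outputs)),
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event

    def edges(self):
        """Flatten events into (source_table, target_table, process) edges."""
        return {
            (src, dst, e.process)
            for e in self.events
            for src in e.inputs
            for dst in e.outputs
        }


store = LineageStore()
# A dbt model run reading two raw tables and writing one mart table (names assumed).
store.record("dbt:customer_summary", ["raw.customers", "raw.accounts"], ["mart.customer_summary"])
# A stored procedure run, as a JDBC-level hook might report it (names assumed).
store.record("oracle:proc_load_cards", ["mart.customer_summary"], ["cards.cardholder"])
```

Because edges are derived from recorded executions rather than written by hand, a table that no longer appears in any run simply stops producing new edges, which is what keeps the graph aligned with the running system.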
OpenMetadata sits above Atlas as the governance and search layer, providing the business-friendly interface that the compliance team and data owners use daily. The PII classification model (fine-tuned on financial services terminology) scans all datasets on ingestion and flags columns containing names, addresses, account numbers, and other personal data identifiers — ensuring that new data assets are automatically assessed for regulatory scope without manual review.
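To make the scan-on-ingestion step concrete, the sketch below flags columns by name. It is deliberately not the fine-tuned classification model described above: simple keyword rules are swapped in as a stand-in, and the keyword sets, tag names, and column names are all assumptions chosen for illustration.

```python
import re

# Hypothetical keyword rules; the programme used a fine-tuned model instead.
PII_KEYWORDS = {
    "NAME": {"name", "surname", "forename"},
    "ADDRESS": {"address", "street", "postcode", "city"},
    "ACCOUNT_NUMBER": {"iban", "account", "pan"},
}


def classify_column(column_name):
    """Return the sorted PII tags whose keywords appear in the column name."""
    tokens = set(re.split(r"[^a-z0-9]+", column_name.lower()))
    return sorted(tag for tag, keywords in PII_KEYWORDS.items() if tokens & keywords)


def classify_dataset(columns):
    """Map each flagged column to its tags; unflagged columns are omitted."""
    flagged = {}
    for column in columns:
        tags = classify_column(column)
        if tags:
            flagged[column] = tags
    return flagged


flagged = classify_dataset(
    ["customer_name", "billing_address", "iban", "created_at", "balance_eur"]
)
```

Name-based rules miss personal data stored under opaque column names, which is one reason a content-aware model, as the text describes, is the more robust choice in practice.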
The Subject Access Request (SAR) generator is the most operationally significant capability. Given a customer identifier, it traverses the Atlas lineage graph to identify every system holding data for that individual, queries each system for the number of records it holds on that individual, and produces a structured package suitable for DPO review within minutes. The previous process required manually contacting 12+ system owners and aggregating their responses over 2–3 weeks.
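The traversal behind the SAR generator can be illustrated with a breadth-first walk over a lineage graph. This is a sketch under stated assumptions: `LINEAGE`, `RECORDS`, the dataset names, and the customer identifiers are hypothetical, and dictionary lookups stand in for the per-system queries that a real generator would issue.

```python
from collections import deque

# Hypothetical lineage edges: dataset -> datasets that receive its data.
LINEAGE = {
    "crm.customers": ["dwh.customer_dim", "cards.cardholder"],
    "dwh.customer_dim": ["marketing.segments"],
    "cards.cardholder": [],
    "marketing.segments": [],
}

# Hypothetical record counts per dataset, keyed by customer identifier.
RECORDS = {
    "crm.customers": {"C-1001": 1},
    "dwh.customer_dim": {"C-1001": 3},
    "cards.cardholder": {"C-1001": 2, "C-2002": 1},
    "marketing.segments": {},
}


def downstream_datasets(graph, start):
    """Breadth-first walk returning every dataset reachable from start."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def build_sar_package(customer_id, entry_point):
    """List each dataset on the lineage path that holds records for the customer."""
    holdings = []
    for dataset in downstream_datasets(LINEAGE, entry_point):
        count = RECORDS.get(dataset, {}).get(customer_id, 0)
        if count:
            system = dataset.split(".", 1)[0]
            holdings.append({"system": system, "dataset": dataset, "records": count})
    return {"customer_id": customer_id, "holdings": holdings}


package = build_sar_package("C-1001", "crm.customers")
```

The `seen` set keeps the walk from looping if the lineage graph contains cycles, and datasets with no records for the customer are left out of the package so the DPO reviews only actual holdings.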
The bank passed the GDPR audit with no enforcement action, its first clean assessment in three audit cycles. The auditor specifically cited the automated lineage capability and the SAR response demonstration as evidence of a mature data governance posture. The €45M fine exposure estimated by the bank's legal team did not materialise.
The PCI-DSS recertification was completed with zero non-conformities against data flow documentation requirements — previously the most time-consuming audit section. The QSA noted that the bank's automated lineage capability was among the most comprehensive they had assessed across their financial services client base.
SAR response time dropped from an average of 22 days (close to the 30-day regulatory limit) to under 4 hours using the automated generator. The compliance team, which had previously spent 40% of its capacity on manual SAR production, redeployed that capacity to proactive compliance risk monitoring, a structural shift from a reactive to a preventive compliance posture.