Apache Atlas, GDPR, PCI-DSS

Regulatory Data Lineage & Compliance

How Vipra Software built an end-to-end automated data lineage graph covering 100% of a European bank's data assets using Apache Atlas and OpenMetadata, delivering GDPR compliance certification and full PCI-DSS audit coverage.

Industry: Banking
Duration: 18 Weeks
Data Assets Covered: 4,800+ Assets
Regulations: GDPR + PCI-DSS
Audit Coverage: 100%

The Challenge

A European retail bank with operations in 6 EU member states was facing a dual regulatory crisis. An imminent GDPR audit required the bank to demonstrate that it could trace the lineage of every personal data element — from point of collection through processing, storage, and any third-party sharing — and produce Subject Access Request responses within the 30-day regulatory window. Simultaneously, a PCI-DSS recertification audit required documented evidence of cardholder data flows across all systems in scope.

The bank had no automated data lineage capability. Lineage documentation existed only as partially maintained Word documents and Visio diagrams kept by individual system owners, with coverage gaps, outdated information, and no connection to the actual technical data flows. The compliance team had attempted to produce a data flow map manually for the GDPR assessment but abandoned the exercise after 6 weeks, when it became clear that manual documentation at the required depth across 28 source systems was not achievable within the audit timeline.

The regulatory stakes were not abstract. The bank's legal team had assessed potential GDPR fine exposure at €45M based on the data volumes involved. PCI-DSS non-compliance would trigger a card scheme assessment that could result in increased interchange fees affecting the bank's entire card issuing business. The lineage programme needed to deliver regulatory-grade results within 18 weeks — before the audit windows opened.

Our Approach

Vipra Software designed a hybrid lineage architecture combining Apache Atlas for technical lineage (the actual data flows as they exist in code and pipelines) with OpenMetadata for business lineage and the regulatory reporting layer. Automated lineage capture, rather than manual documentation, was the foundational principle: lineage was accepted only if it could be automatically validated against the running system.

  • Regulatory Scoping & System Inventory (Weeks 1–3): Working with the bank's compliance officer and DPO, scoped the full personal data and cardholder data inventory. Identified 28 systems in GDPR scope and 12 in PCI-DSS scope. Classified 4,800+ data assets by regulatory sensitivity category.
  • Apache Atlas Deployment & Technical Lineage (Weeks 4–8): Deployed Apache Atlas with custom hook integrations for the bank's technology stack: JDBC hooks for Oracle and SQL Server sources, Hive lineage for the data warehouse, custom Python hooks for the 4 Python-based ETL pipelines, and dbt integration for the analytical transformation layer. Achieved automated lineage capture for 94% of data flows without manual documentation.
  • OpenMetadata Integration (Weeks 9–12): Deployed OpenMetadata as the business-facing governance layer, ingesting technical lineage from Atlas and enriching with business context, data owner assignments, GDPR lawful basis classifications, and PCI-DSS scope flags. Built automated PII detection using ML classification across all catalogued datasets.
  • Regulatory Reporting Automation (Weeks 13–15): Built automated regulatory report generation: GDPR Article 30 Record of Processing Activities (ROPA) generated from Atlas lineage data; PCI-DSS Data Flow Diagram auto-generated from cardholder data scope flags; Subject Access Request response package generator querying lineage to identify all systems holding data for a named individual.
  • Audit Preparation & Validation (Weeks 16–18): Conducted a mock GDPR audit with the bank's DPO simulating regulator questioning. Ran a PCI-DSS pre-assessment with the bank's QSA (Qualified Security Assessor). Addressed 8 lineage gaps identified during the mock audit with targeted manual documentation for the 6% of flows without automatic hook capture. Achieved sign-off from both the DPO and the QSA before the live audit windows opened.
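
The reporting step in Weeks 13–15 can be pictured as a grouping over the classified catalogue. What follows is a minimal sketch under stated assumptions, not the bank's implementation: the `Asset` fields, the activity names, and the output layout are all hypothetical, and a real Article 30 record carries further fields (recipients, retention periods, security measures) that are omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One catalogued data asset with its regulatory tags (hypothetical schema)."""
    name: str
    system: str
    processing_activity: str
    lawful_basis: str
    data_categories: tuple
    pci_scope: bool = False


def build_ropa(assets):
    """Group assets into one record per processing activity."""
    grouped = defaultdict(list)
    for asset in assets:
        grouped[asset.processing_activity].append(asset)

    records = []
    for activity in sorted(grouped):
        members = grouped[activity]
        records.append({
            "processing_activity": activity,
            "lawful_bases": sorted({a.lawful_basis for a in members}),
            "data_categories": sorted({c for a in members for c in a.data_categories}),
            "systems": sorted({a.system for a in members}),
        })
    return records


def pci_scope_systems(assets):
    """Systems holding at least one asset flagged as cardholder data."""
    return sorted({a.system for a in assets if a.pci_scope})


# Illustrative, made-up catalogue entries.
ASSETS = [
    Asset("customers.email", "crm", "customer onboarding", "contract",
          ("contact details",)),
    Asset("cards.pan", "card_core", "card issuing", "contract",
          ("cardholder data",), pci_scope=True),
    Asset("dw.dim_customer", "warehouse", "customer onboarding", "contract",
          ("contact details", "identity")),
]

ROPA = build_ropa(ASSETS)
PCI_SYSTEMS = pci_scope_systems(ASSETS)
```

Deriving both outputs from the same tagged catalogue keeps the ROPA and the cardholder data scope consistent with each other, which is the property the automated reports rely on.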

Technical Architecture

Apache Atlas serves as the technical lineage store, with integration hooks embedded into every pipeline execution context. When a dbt model runs, the Atlas hook records input and output tables, transformation logic, and execution metadata. When an Oracle stored procedure executes, the JDBC hook captures the source and target tables involved. This automated capture approach means lineage is always current — it reflects what the system actually does, not what documentation says it should do.
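
To make the hook behaviour concrete, here is a hedged sketch of the kind of payload a custom Python hook might assemble after a pipeline run. It uses Atlas's generic `Process` type and the `hive_table` type from the Hive model; the job name, cluster suffix, qualified names, and the choice to carry run metadata in `description` are assumptions for illustration. Such a body would typically be sent to Atlas's v2 entity endpoint (`/api/atlas/v2/entity`); the network call is deliberately left out.

```python
import json

# Hypothetical cluster suffix used in qualifiedName values.
CLUSTER = "bank_prod"


def object_id(type_name, qualified_name):
    """Reference an existing Atlas entity by type and unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(job_name, run_id, inputs, outputs, logic=""):
    """Build an entity payload describing one pipeline run.

    inputs and outputs are lists of (type_name, qualified_name) pairs.
    """
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"{job_name}@{CLUSTER}",
                "name": job_name,
                # Assumption: run metadata is packed into the description field.
                "description": f"run_id={run_id}; logic={logic}",
                "inputs": [object_id(t, q) for t, q in inputs],
                "outputs": [object_id(t, q) for t, q in outputs],
            },
        }
    }


# Made-up example run of a dbt model.
PAYLOAD = build_process_entity(
    job_name="dbt.customer_360",
    run_id="run-001",
    inputs=[("hive_table", f"staging.customers@{CLUSTER}")],
    outputs=[("hive_table", f"marts.customer_360@{CLUSTER}")],
    logic="select ... from staging.customers",
)

print(json.dumps(PAYLOAD, indent=2))
```

Because inputs and outputs are references to entities that already exist in the catalogue, each recorded run adds an edge to the lineage graph rather than a free-text description of one.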

OpenMetadata sits above Atlas as the governance and search layer, providing the business-friendly interface that the compliance team and data owners use daily. The PII classification model (fine-tuned on financial services terminology) scans all datasets on ingestion and flags columns containing names, addresses, account numbers, and other personal data identifiers — ensuring that new data assets are automatically assessed for regulatory scope without manual review.
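
The classification step can be illustrated with a much simpler stand-in. The sketch below is rule-based (column-name hints plus value patterns), not the fine-tuned ML model described above, and every label, hint, pattern, and threshold in it is an assumption chosen for illustration.

```python
import re

# Hypothetical column-name hints per PII label.
NAME_HINTS = {
    "name": ("name", "surname"),
    "address": ("address", "street", "postcode"),
    "account_number": ("iban", "account_no", "account_number"),
    "email": ("email",),
}

# Hypothetical value patterns; the second is only IBAN-shaped, not a checksum test.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "account_number": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def classify_column(column_name, sample_values, min_match_ratio=0.8):
    """Return the sorted PII labels suggested by a column's name and sample values."""
    labels = set()

    lowered = column_name.lower()
    for label, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            labels.add(label)

    # Ignore empty samples, then strip spaces so grouped IBANs still match.
    values = [str(v).replace(" ", "") for v in sample_values if v]
    if values:
        for label, pattern in VALUE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= min_match_ratio:
                labels.add(label)

    return sorted(labels)


print(classify_column("contact", ["a@b.com", "c@d.org"]))
print(classify_column("cust_iban", ["DE89 3704 0044 0532 0130 00"]))
print(classify_column("order_total", ["12.50", "8.00"]))
```

A rule set like this over-flags columns such as `branch_name`, which is one reason a trained classifier is the more plausible choice at the scale of 4,800+ assets.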

The Subject Access Request (SAR) generator is the most operationally significant capability: given a customer identifier, it traverses the Atlas lineage graph to identify every system holding data for that individual, queries each system's record count, and produces a structured package suitable for DPO review within minutes — compared to the previous process of manually contacting 12+ system owners and aggregating responses over 2–3 weeks.
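
The traversal at the heart of the SAR generator can be sketched as a breadth-first walk over lineage edges. This is an illustrative sketch only: the in-memory `LINEAGE` dictionary stands in for the Atlas graph, the asset names are made up, and `count_records` is a hypothetical callback standing in for the per-system queries.

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets derived from it.
LINEAGE = {
    "crm.customers": ["dw.dim_customer", "marketing.contacts"],
    "dw.dim_customer": ["reporting.customer_360"],
    "marketing.contacts": [],
    "reporting.customer_360": [],
    "card_core.accounts": ["dw.fact_transactions"],
    "dw.fact_transactions": [],
}


def downstream_assets(graph, roots):
    """Return the roots plus every asset reachable from them (breadth-first)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def build_sar_package(customer_id, graph, roots, count_records):
    """Collect per-system record counts for one customer across reachable assets."""
    systems = {}
    for asset in sorted(downstream_assets(graph, roots)):
        count = count_records(asset, customer_id)
        if count:
            system = asset.split(".", 1)[0]
            systems.setdefault(system, {})[asset] = count
    return {"customer_id": customer_id, "systems": systems}


# Made-up record counts standing in for live system queries.
FAKE_COUNTS = {
    ("crm.customers", "C-1001"): 1,
    ("dw.dim_customer", "C-1001"): 1,
    ("reporting.customer_360", "C-1001"): 3,
}

PACKAGE = build_sar_package(
    "C-1001",
    LINEAGE,
    roots=["crm.customers"],
    count_records=lambda asset, cid: FAKE_COUNTS.get((asset, cid), 0),
)
print(PACKAGE)
```

Starting from the collection points and following lineage edges is what lets the package name downstream copies, such as warehouse and reporting tables, that a system-owner survey could miss.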

Business Impact

The GDPR audit was passed with no enforcement action — the first time the bank had received a clean assessment in three audit cycles. The auditor specifically cited the automated lineage capability and the SAR response demonstration as evidence of a mature data governance posture. The estimated €45M fine exposure was eliminated.

The PCI-DSS recertification was completed with zero non-conformities against data flow documentation requirements — previously the most time-consuming audit section. The QSA noted that the bank's automated lineage capability was among the most comprehensive they had assessed across their financial services client base.

SAR response time dropped from an average of 22 days (close to the 30-day regulatory limit) to under 4 hours using the automated generator. The compliance team, which had previously spent 40% of its capacity on manual SAR production, redeployed that capacity to proactive compliance risk monitoring: a structural shift from a reactive to a preventive compliance posture.

Technology Stack

Apache Atlas, OpenMetadata, dbt, Collibra, Python, Oracle, SQL Server, Hive

Services Delivered

Data Lineage, GDPR Compliance, PCI-DSS, Data Catalogue, Regulatory Reporting, PII Classification
