Where should a data governance programme start?
Start narrow and provable: one critical domain (e.g., revenue or customer), a data quality scorecard for it, lineage for its pipelines, and a glossary for its 20 most-disputed terms. Expand once that domain demonstrably reduces rework. Big-bang governance rollouts fail; thin slices compound.
Which data catalog do you recommend?
OpenMetadata for engineering-led teams wanting open source and APIs; Collibra for enterprise compliance programmes with stewardship workflows; Apache Atlas where Hadoop heritage matters. We implement all three, and the catalog matters less than the adoption programme around it.
How do you measure data quality?
Across six dimensions (completeness, validity, uniqueness, consistency, timeliness, accuracy), expressed as executable contracts (Great Expectations or dbt tests) with thresholds, trends, and named owners. Executives see a scorecard; engineers see failing checks in CI before bad data ships.
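As an illustration of what "executable contract" means here, the following is a minimal plain-Python sketch, not our production tooling and not the Great Expectations or dbt API. The `Check` class, the `completeness` and `uniqueness` helpers, the `crm-team` owner, and the sample rows are all hypothetical; only two of the six dimensions are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]
Metric = Callable[[List[Row]], float]  # returns a score between 0.0 and 1.0


@dataclass
class Check:
    """One contract clause: a metric, a minimum score, and an owner (all hypothetical names)."""
    name: str
    dimension: str
    owner: str
    threshold: float
    metric: Metric


def completeness(field: str) -> Metric:
    """Share of rows where `field` is not null."""
    def metric(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        return sum(1 for r in rows if r.get(field) is not None) / len(rows)
    return metric


def uniqueness(field: str) -> Metric:
    """Distinct values of `field` divided by row count."""
    def metric(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        return len({r.get(field) for r in rows}) / len(rows)
    return metric


def run_checks(rows: List[Row], checks: List[Check]) -> List[dict]:
    """Evaluate every check and return one scorecard line per check."""
    results = []
    for c in checks:
        score = c.metric(rows)
        results.append({
            "check": c.name,
            "dimension": c.dimension,
            "owner": c.owner,
            "score": score,
            "passed": score >= c.threshold,
        })
    return results


# Illustrative sample data: one missing email, one duplicated customer_id.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 3, "email": "c@example.com"},
]

checks = [
    Check("email_present", "completeness", "crm-team", 0.70, completeness("email")),
    Check("customer_id_unique", "uniqueness", "crm-team", 1.00, uniqueness("customer_id")),
]

scorecard = run_checks(customers, checks)
failures = [line["check"] for line in scorecard if not line["passed"]]
```

In a CI job, a non-empty `failures` list would typically fail the build, while the full `scorecard` feeds the trend view that executives see.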
Can governance coexist with self-service analytics?
That is the point of doing it well. Certified datasets, visible lineage, and clear ownership make self-service safe. Our Fortune 500 governance engagement cut manual reconciliation by 40% precisely by enabling self-service BI on governed data.
How do you handle GDPR right-to-erasure in a data lake?
We combine subject-keyed indexes across zones, deletion vectors or rewrite jobs in Delta/Iceberg tables, propagation to downstream marts, and an auditable erasure log. We design the capability in from day one rather than retrofitting it under deadline.
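The flow can be sketched in a few lines of plain Python. This is a simplified in-memory model under stated assumptions: the dictionaries stand in for lake tables, the list comprehension stands in for a Delta or Iceberg row-level delete, and the names `erase_subject`, `subject_index`, `raw.events`, and `mart.customer_360` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List

Table = List[dict]


def erase_subject(
    subject_id: str,
    tables: Dict[str, Table],
    subject_index: Dict[str, List[str]],
    erasure_log: List[dict],
) -> List[str]:
    """Remove a data subject from every table the index points to.

    Returns the list of tables that were touched, and appends one
    audit record per table to `erasure_log`.
    """
    # The subject-keyed index says where this person's rows live,
    # so no full scan of the lake is needed.
    locations = subject_index.pop(subject_id, [])
    for table_name in locations:
        before = len(tables[table_name])
        # Stand-in for a row-level delete or rewrite job on the real table.
        tables[table_name] = [
            row for row in tables[table_name] if row["subject_id"] != subject_id
        ]
        erasure_log.append({
            "subject_id": subject_id,
            "table": table_name,
            "rows_removed": before - len(tables[table_name]),
            "erased_at": datetime.now(timezone.utc).isoformat(),
        })
    return locations


# Illustrative data: a raw zone table and a downstream mart.
tables = {
    "raw.events": [
        {"subject_id": "subj-42", "event": "login"},
        {"subject_id": "subj-42", "event": "purchase"},
        {"subject_id": "subj-7", "event": "login"},
    ],
    "mart.customer_360": [
        {"subject_id": "subj-42", "lifetime_value": 120},
        {"subject_id": "subj-7", "lifetime_value": 45},
    ],
}
subject_index = {
    "subj-42": ["raw.events", "mart.customer_360"],
    "subj-7": ["raw.events", "mart.customer_360"],
}
erasure_log: List[dict] = []

touched = erase_subject("subj-42", tables, subject_index, erasure_log)
```

A real implementation would also need to handle backups, table history retained for time travel, and whether the log itself may store the raw subject identifier; none of that is modelled here.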