Ant Group’s Data Governance Practices: Overview, Data Quality, and Data Storage Governance
This article shares Ant Group's extensive experience in big data governance, detailing the overall data governance framework, data quality management, data storage governance, and future considerations, illustrated with practical cases and strategies for ensuring compliance, reliability, and cost efficiency.
The presentation outlines Ant Group's four‑part approach to data governance: an overview of governance concepts, data quality governance, data storage governance (covering both compute and storage resources), and forward‑looking thoughts on the evolution of data governance.
Data Governance Overview – Ant focuses on five critical dimensions—architecture, security, compliance, quality, and value—to meet regulatory requirements (privacy, anti‑money‑laundering) and ensure data is usable, safe, and valuable across the enterprise.
Data Quality Governance – The discussion covers the sources of data (logs, databases, messages, and unstructured data) and the challenges posed by rapid business change, high‑frequency financial data, and the many stakeholder roles involved. A three‑layer architecture (capability, system, business) is introduced, along with three risk categories (technical engine, content, application) and concrete measures such as pre‑release testing, change‑gate controls, gray (canary) releases, and post‑incident audits. Key metrics (fault count, loss volume) drive continuous improvement.
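The change‑gate and gray‑release measures can be pictured as a simple rule check: a candidate data change first runs on a small gray slice, its quality metrics are compared with a trusted baseline, and full rollout is blocked if any metric drifts too far. The following is a minimal Python sketch under stated assumptions; `QualityCheck`, `change_gate`, the metric names, and the tolerance values are hypothetical illustrations, not Ant Group's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    """Hypothetical data-quality rule comparing a gray run against a baseline."""
    name: str
    baseline: float        # metric from the last trusted (production) output
    candidate: float       # same metric from the gray (canary) run
    max_rel_change: float  # tolerated relative deviation, e.g. 0.05 = 5%

    def passes(self) -> bool:
        # A zero baseline cannot be used as a divisor; require an exact match.
        if self.baseline == 0:
            return self.candidate == 0
        drift = abs(self.candidate - self.baseline) / abs(self.baseline)
        return drift <= self.max_rel_change


def change_gate(checks: list[QualityCheck]) -> tuple[bool, list[str]]:
    """Allow full rollout only if every check passes; report the failures."""
    failed = [c.name for c in checks if not c.passes()]
    return (len(failed) == 0, failed)


if __name__ == "__main__":
    checks = [
        QualityCheck("row_count", baseline=1_000_000, candidate=1_020_000, max_rel_change=0.05),
        QualityCheck("null_rate_user_id", baseline=0.001, candidate=0.004, max_rel_change=0.5),
    ]
    allowed, failed = change_gate(checks)
    print(allowed, failed)  # False ['null_rate_user_id']
```

Here the row count moves by 2%, within its 5% tolerance, but the null rate triples, so the gate blocks the rollout and names the failing rule. Such failures would naturally feed the fault‑count metric mentioned above.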
Data Storage Governance – In 2019, Ant’s storage utilization exceeded safe thresholds, prompting a shift to mixed deployment (co‑location) of offline data‑warehouse workloads with online resources. The strategy includes open‑source‑based resource sharing, tiered storage (hot, warm, archive, cold‑backup), and techniques such as progressive computation, storage archiving, and data re‑partitioning. Reported results include a 50% increase in warehouse elasticity and a 30% reduction in storage consumption.
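Tiered storage usually comes down to a placement rule applied per table or partition. The sketch below assumes, purely for illustration, that the tier is chosen by days since last access, with thresholds of 7, 90, and 365 days; the function names, partition names, and thresholds are hypothetical, since the talk summary does not state Ant Group's actual policy.

```python
from datetime import date

# Hypothetical upper bounds on idle days for each tier, checked in order.
# Anything idle longer than the last bound falls through to cold backup.
TIER_THRESHOLDS = [(7, "hot"), (90, "warm"), (365, "archive")]
COLD_BACKUP = "cold-backup"


def assign_tier(last_access: date, today: date) -> str:
    """Pick a storage tier from the number of days since last access."""
    idle_days = (today - last_access).days
    for max_idle, tier in TIER_THRESHOLDS:
        if idle_days <= max_idle:
            return tier
    return COLD_BACKUP


def plan_tiers(partitions: dict[str, date], today: date) -> dict[str, list[str]]:
    """Group partition names by target tier, in sorted name order."""
    plan: dict[str, list[str]] = {tier: [] for _, tier in TIER_THRESHOLDS}
    plan[COLD_BACKUP] = []
    for name, last_access in sorted(partitions.items()):
        plan[assign_tier(last_access, today)].append(name)
    return plan


if __name__ == "__main__":
    today = date(2024, 1, 31)
    partitions = {
        "ds=20240130": date(2024, 1, 30),
        "ds=20231201": date(2023, 12, 15),
        "ds=20230601": date(2023, 6, 1),
        "ds=20210101": date(2021, 3, 1),
    }
    for tier, names in plan_tiers(partitions, today).items():
        print(tier, names)
```

A production policy would likely weigh more signals (partition age, query frequency, recovery SLA), but a recency threshold is the simplest rule that shows how partitions migrate from hot storage toward archive and cold backup.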
Future Directions – The speaker envisions integrated lake‑warehouse (lakehouse) governance powered by large models, turning data from an internal product into a tradable commodity, and using AI to automate risk detection and remediation.
The session concludes with acknowledgments of the speakers and organizers.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.