Taobao Data Model Governance: Challenges, Analysis, and Solutions
This article summarizes Taobao's data model governance work. It describes the background and problems of the current data architecture, analyzes their root causes, proposes a structured governance framework automated with DataWorks, and outlines future plans to improve efficiency, standardization, and product tooling.
Overview
The session, hosted by DataFunTalk, features Guo Jinshi from Alibaba discussing the past year of data model governance within the Taobao ecosystem, summarizing key findings and future directions.
1. Model Background & Issues
Taobao's data middle platform has operated for about seven years without systematic governance. Of its data, 22% is created manually and 78% by machine; 9% is active and 21% is non-standard. Across the data lifecycle, models have a 25-month lifespan, 30% annual growth, and 44% retention. Model reuse and cross-market dependencies are the main problem areas:
Low reuse of public‑layer tables
Uneven distribution of public tables across teams
Excessive temporary tables, inconsistent naming, and duplicated ADS tables
Cross‑market dependencies affecting stability
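One way to make "low reuse" measurable is to count the distinct direct consumers of each public-layer table in a lineage graph. The following is a minimal sketch under stated assumptions: the edge list, the table names, the `dwd_`/`dws_`/`dim_` prefixes for the public layer, and the threshold are all hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Assumed naming: public-layer (CDM) tables start with one of these prefixes.
PUBLIC_PREFIXES = ("dwd_", "dws_", "dim_")

def downstream_counts(edges):
    """Map each public-layer table to its number of distinct direct consumers."""
    consumers = defaultdict(set)
    for upstream, downstream in edges:
        if upstream.startswith(PUBLIC_PREFIXES):
            consumers[upstream].add(downstream)
    return {table: len(downs) for table, downs in consumers.items()}

def low_reuse(edges, public_tables, threshold=2):
    """Return public tables with fewer than `threshold` direct consumers."""
    counts = downstream_counts(edges)
    return sorted(t for t in public_tables if counts.get(t, 0) < threshold)

# Hypothetical lineage edges: (upstream table, downstream table).
edges = [
    ("dwd_trade_order", "ads_shop_gmv"),
    ("dwd_trade_order", "ads_item_rank"),
    ("dwd_trade_order", "dws_trade_buyer_1d"),
    ("dws_trade_buyer_1d", "ads_buyer_profile"),
]
public_tables = ["dwd_trade_order", "dws_trade_buyer_1d", "dim_item"]

print(low_reuse(edges, public_tables))  # ['dim_item', 'dws_trade_buyer_1d']
```

Grouping the same counts by owning team would give a rough view of how unevenly public tables are distributed.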
2. Problem Analysis
Seven major issues were identified: excessive temporary tables, inconsistent naming, an over-designed public layer, duplicated ADS construction, cross-market dependencies, common logic that has not been sunk into the public layer, and ADS tables coupled directly to ODS. The root causes fall into four categories: architecture standards, process mechanisms, product tools, and development capability.
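Several of these issues can be detected mechanically from table names and lineage. The sketch below tags a table with the issues it exhibits. It is illustrative only: the naming pattern `<layer>_<market>_<subject>`, the `tmp_` prefix, the tag names, and the reading of "cross-market dependency" as one market's ADS table reading another market's ADS table are assumptions, not rules stated in the talk.

```python
import re

# Assumed convention: <layer>_<market>_<subject>..., lower-case snake case.
NAME_PATTERN = re.compile(r"^(ods|dwd|dws|dim|ads)_[a-z0-9]+(_[a-z0-9]+)+$")

def layer_of(table):
    """First name segment, e.g. 'ads' for 'ads_live_shop_gmv'."""
    return table.split("_", 1)[0]

def market_of(table):
    """Second name segment, used here as the (hypothetical) market key."""
    parts = table.split("_")
    return parts[1] if len(parts) > 1 else ""

def tag_table(table, upstreams):
    """Return a list of issue tags for one table, given its direct upstreams."""
    tags = []
    if table.startswith("tmp_"):
        tags.append("temporary_table")
    elif not NAME_PATTERN.match(table):
        tags.append("nonstandard_name")
    if layer_of(table) == "ads":
        if any(layer_of(u) == "ods" for u in upstreams):
            tags.append("ads_ods_coupling")
        if any(layer_of(u) == "ads" and market_of(u) != market_of(table)
               for u in upstreams):
            tags.append("cross_market_dependency")
    return tags

# Hypothetical lineage: table -> direct upstream tables.
lineage = {
    "ads_live_shop_gmv": ["ods_trade_order", "dws_trade_shop_1d"],
    "ads_search_item_rank": ["ads_live_shop_gmv"],
    "tmp_user_test01": ["dwd_trade_order"],
    "AdsShopReport": ["dws_trade_shop_1d"],
}

for table, upstreams in lineage.items():
    print(table, tag_table(table, upstreams))
```

Issues such as an over-designed public layer or duplicated ADS construction need semantic comparison of table contents and are not covered by name and lineage checks like these.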
3. Governance Solutions (DataWorks Intelligent Data Modeling)
The proposed solution includes a four‑step approach: inventory of existing assets, standardization of incremental development, ongoing health checks, and data‑driven governance. Specific mechanisms involve layered architecture standards, market segmentation principles, and a co‑construction model for the public layer.
Define clear architecture layers (ODS, CDM, ADS)
Segment data markets by business scenario (MECE principle)
Open public‑layer co‑construction with post‑audit governance
Automate model migration and code generation via DataWorks
Integrate data maps for easier data discovery
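The MECE principle for market segmentation (mutually exclusive, collectively exhaustive) can be checked as a simple set problem: every business scenario should belong to exactly one data market. The sketch below assumes a hypothetical mapping from markets to the scenarios they cover; the market and scenario names are invented for illustration.

```python
def check_mece(scenarios, markets):
    """Check a market segmentation against a list of business scenarios.

    scenarios: iterable of all business scenario names.
    markets:   dict mapping market name -> set of scenarios it covers.
    Returns (overlaps, uncovered): scenarios claimed by several markets,
    and scenarios claimed by none.
    """
    owners = {}
    for market, covered in markets.items():
        for scenario in covered:
            owners.setdefault(scenario, []).append(market)
    overlaps = {s: sorted(ms) for s, ms in owners.items() if len(ms) > 1}
    uncovered = sorted(set(scenarios) - set(owners))
    return overlaps, uncovered

# Hypothetical scenarios and markets.
scenarios = ["search", "recommendation", "live", "membership"]
markets = {
    "traffic": {"search", "recommendation"},
    "content": {"live", "recommendation"},
}

overlaps, uncovered = check_mece(scenarios, markets)
print(overlaps)   # {'recommendation': ['content', 'traffic']}
print(uncovered)  # ['membership']
```

A segmentation is MECE under this check when both results are empty.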
4. Model Governance Process
Models are scored quantitatively at the team, domain, and core levels, and each detected issue is attached to the model as a tag. The workflow combines data-driven evaluation (model scores) with product-driven actions (expert judgment) to prioritize remediation.
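As a rough illustration of tag-based scoring, the sketch below deducts a penalty per issue tag from a base score of 100, averages the scores per team, and orders models so the lowest-scoring ones are remediated first. The penalty weights, tag names, and the scoring formula itself are hypothetical; the talk does not specify how scores are computed.

```python
from statistics import mean

# Hypothetical penalty per issue tag, deducted from a base score of 100.
PENALTIES = {
    "temporary_table": 15,
    "nonstandard_name": 10,
    "ads_ods_coupling": 30,
    "cross_market_dependency": 25,
}

def model_score(tags):
    """Score one model: 100 minus the penalties of its tags, floored at 0."""
    return max(0, 100 - sum(PENALTIES.get(tag, 0) for tag in tags))

def group_scores(models, key="team"):
    """Average model scores per group (for example per team or per domain)."""
    grouped = {}
    for model in models:
        grouped.setdefault(model[key], []).append(model_score(model["tags"]))
    return {group: round(mean(scores), 1) for group, scores in grouped.items()}

def remediation_queue(models):
    """Order models from lowest to highest score."""
    return sorted(models, key=lambda m: model_score(m["tags"]))

# Hypothetical models with their issue tags.
models = [
    {"name": "ads_live_shop_gmv", "team": "trade", "tags": ["ads_ods_coupling"]},
    {"name": "dws_trade_shop_1d", "team": "trade", "tags": []},
    {"name": "tmp_user_test01", "team": "search", "tags": ["temporary_table", "nonstandard_name"]},
]

print(group_scores(models))
print([m["name"] for m in remediation_queue(models)])
```

Passing `key="domain"` would roll the same scores up to the domain level, provided each model record carries that field. Expert judgment can then adjust the resulting order before work is scheduled.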
5. Future Planning
Improve application‑layer efficiency and reduce coupling
Refine architectural standards and control mechanisms
Enhance product tools: intelligent modeling, data testing, operation upgrades, real‑time governance assistants, batch deletion, and data maps
6. Q&A Highlights
Key points from the Q&A include a hybrid top‑down/bottom‑up approach to public‑layer construction, the need for unified standards across business units, criteria for sinking metrics to the public layer, and handling naming and cross‑market dependency challenges.
Overall, the presentation outlines a pragmatic, tool‑enabled roadmap to elevate data model governance, improve reuse, and sustain long‑term value for Taobao's massive data ecosystem.