
Taobao Data Model Governance and Intelligent Modeling with DataWorks

This article summarizes Guo Jinshi's presentation on Taobao's data model governance. It covers the current data landscape, the problems identified, their root causes, and the proposed governance solutions (including DataWorks intelligent modeling), then outlines future plans and closes with a Q&A session on practical implementation.

DataFunTalk

Guest: Guo Jinshi, Alibaba – Taobao/Tmall Data Warehouse Public Layer Model Lead

Editor: Zhang Chao, Shenzhen Recycle Treasure

Platform: DataFunTalk

Overview: The talk, titled “Taobao Data Model Governance,” reviews a year of data governance work in the Taobao ecosystem, covering the overall data background, problems, analysis, solutions, and future directions.

01 Model Background & Issues

1. Overall Situation – Taobao’s data middle platform has been running for about seven years. Of its data, 22% is created manually and 78% is machine-generated. Active data makes up 9% of the total, and non-standard data 21%.

2. Public Layer – There are two core problems: public-layer tables are rarely reused, and public tables are unevenly distributed across teams, which has produced many ineffective tables.

3. Application Layer – The main issues are insufficient support from the public layer, many ADS tables whose shared logic has not been sunk (pushed down) into the public layer, and severe cross-market dependencies: 30% overall, and up to 40% in some markets.
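The reuse and cross-market figures above are lineage metrics. The talk does not say how they were computed, so the following is only a minimal sketch under assumed definitions: a public table counts as low-reuse when fewer than `min_consumers` downstream tables read it, and an ADS table counts as cross-market when it reads an ADS table owned by a different market. The function names, the threshold, and the input shapes (lineage as `(upstream, downstream)` pairs plus a hypothetical `market_of` dict) are all illustrative assumptions.

```python
from collections import defaultdict


def low_reuse_tables(edges, public_tables, min_consumers=2):
    """Public-layer tables read by fewer than min_consumers downstream tables.

    edges: (upstream, downstream) table-name pairs taken from lineage.
    public_tables: set of tables that belong to the public (CDM) layer.
    """
    consumers = defaultdict(set)
    for upstream, downstream in edges:
        if upstream in public_tables:
            consumers[upstream].add(downstream)
    # Tables nobody reads have an empty consumer set and are always flagged.
    return sorted(t for t in public_tables if len(consumers[t]) < min_consumers)


def cross_market_ratios(edges, market_of):
    """Share of ADS tables that read at least one ADS table from another market.

    market_of: dict mapping each ADS table to its owning market (hypothetical).
    Edges touching tables outside market_of (e.g. CDM tables) are ignored.
    Returns (overall_ratio, {market: ratio}).
    """
    crossing = set()
    for upstream, downstream in edges:
        if upstream in market_of and downstream in market_of:
            if market_of[upstream] != market_of[downstream]:
                crossing.add(downstream)
    tables_by_market = defaultdict(list)
    for table, market in market_of.items():
        tables_by_market[market].append(table)
    per_market = {
        market: sum(t in crossing for t in tables) / len(tables)
        for market, tables in tables_by_market.items()
    }
    overall = len(crossing) / len(market_of) if market_of else 0.0
    return overall, per_market
```

Counting tables rather than edges keeps the ratio comparable across markets of different sizes; an edge-based ratio would be an equally reasonable alternative definition.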

02 Problem Analysis

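Problem Summary – Seven major issues were identified: excessive temporary tables, inconsistent naming, an over-designed public layer, duplicated ADS tables, cross-market ADS dependencies, common logic that has not been sunk into the public layer, and ADS-to-ODS penetration (ADS tables reading ODS tables directly and bypassing the CDM layer).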
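Two of these issues, temporary tables and ADS-to-ODS penetration, can be detected mechanically from lineage once each table's layer is known. The sketch below assumes, hypothetically, that the layer can be read from a name prefix such as `ods_`, `dwd_`, `dws_`, `dim_`, `ads_`, or `tmp_`; a real warehouse may record the layer as table metadata instead.

```python
def layer_of(table):
    """Infer a table's layer from its name prefix (assumed convention)."""
    prefix = table.split("_", 1)[0].lower()
    if prefix == "ods":
        return "ODS"
    if prefix in ("dwd", "dws", "dim"):
        return "CDM"
    if prefix == "ads":
        return "ADS"
    if prefix in ("tmp", "temp"):
        return "TMP"
    # Anything else does not follow the assumed naming scheme.
    return "UNKNOWN"


def penetration_edges(edges):
    """Lineage edges where an ADS table reads an ODS table directly."""
    return [(u, d) for u, d in edges if layer_of(u) == "ODS" and layer_of(d) == "ADS"]


def temporary_tables(tables):
    """Tables whose prefix marks them as temporary, sorted by name."""
    return sorted(t for t in tables if layer_of(t) == "TMP")
```

Tables classified as `UNKNOWN` are also useful output: they point at the naming inconsistencies listed among the seven issues.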

Root‑Cause Analysis – Issues stem from four categories: architectural standards, process mechanisms, product tools, and development capability.

03 Governance Solutions

Overall Solution – Based on this analysis, the team devised a governance plan built on three core strategies: taking inventory of existing assets, standardizing new (incremental) development, and maintaining long-term health through data-driven operation.

Mechanism Standards – (1) Layered architecture standards that define the responsibilities of the ODS, CDM, and ADS layers; (2) market segmentation principles, with markets divided by business scenario and kept MECE (mutually exclusive, collectively exhaustive); (3) a co-construction mechanism for the public layer that combines open development with post-hoc audit and governance.
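The MECE principle for market segmentation can be checked directly. In this hypothetical sketch, a segmentation is a dict from market name to the set of business scenarios it owns; the function reports overlaps (scenarios claimed by more than one market, violating mutual exclusivity) and gaps (scenarios no market claims, violating collective exhaustiveness).

```python
from collections import Counter


def check_mece(scenarios, markets):
    """Check a market segmentation against the MECE principle.

    scenarios: set of all business scenarios that must be covered.
    markets: dict mapping market name -> set of scenarios it owns.
    Returns (overlaps, gaps), both sorted lists of scenario names.
    """
    claims = Counter(s for owned in markets.values() for s in owned)
    overlaps = sorted(s for s, n in claims.items() if n > 1)
    # Counter returns 0 for scenarios that no market claims.
    gaps = sorted(s for s in scenarios if claims[s] == 0)
    return overlaps, gaps
```

A segmentation passes only when both lists are empty.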

DataWorks Intelligent Modeling – The tool supports governance in four ways: a structured data catalog, online model design, automated code generation, and integration with data albums in DataWorks Data Map.

04 Future Planning

1. Application‑Layer Efficiency – Improve guidelines, reduce over‑coupling, balance tool‑assisted efficiency with standards.

2. Architectural Governance – Refine the standards for design, development, operation, change management, and governance, especially table naming.

3. Product-Tool Enhancements – Continue co-building with DataWorks: smarter modeling, data testing, upgraded operations, a real-time governance assistant, batch deletion, push optimization, and data maps that make data easier to discover.

05 Q&A Session

Key questions covered the approach to building the public layer, whether unified standards are needed across business units (BUs), the criteria for sinking metrics into the public layer, naming conventions, how to handle cross-market dependencies, and the long-term impact on data operation and governance.
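On the criteria for sinking metrics, one simple rule (assumed here, not stated in the talk) is to flag a metric once ADS tables in several different markets each compute it on their own. A minimal sketch, with hypothetical names and a hypothetical `min_markets` threshold:

```python
from collections import defaultdict


def sink_candidates(metric_usage, market_of, min_markets=2):
    """Suggest metrics to sink into the public layer.

    metric_usage: dict mapping ADS table -> set of metric names it computes.
    market_of: dict mapping ADS table -> owning market.
    A metric is a candidate when ADS tables from at least min_markets
    distinct markets compute it independently (hypothetical rule).
    """
    markets_per_metric = defaultdict(set)
    for table, metrics in metric_usage.items():
        for metric in metrics:
            markets_per_metric[metric].add(market_of[table])
    return sorted(m for m, mk in markets_per_metric.items() if len(mk) >= min_markets)
```

Counting distinct markets rather than tables avoids promoting metrics that are merely duplicated inside one team's own ADS tables.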

Reference Materials: DataWorks official site (https://www.aliyun.com/product/bigdata/ide) and DataWorks Intelligent Modeling documentation (https://help.aliyun.com/document_detail/276018.html).

Thank you for attending.
