Product Practice of Data Governance Tools at NetEase: Review, Pain Points, Strategy, and Future Planning
This talk from the DataFun Summit describes NetEase's product practice for data-governance tools. It reviews past initiatives, current challenges, the overall product strategy, and the future roadmap, with the goal of improving compute and storage efficiency, quantifying cost, and making governance systematic across business lines.
In mid-October, the author presented this practice at the DataFun Summit. The talk covered a review of internal governance work, current pain points, the overall product strategy, and the future roadmap, and it drew positive feedback from online attendees.
Audience members often struggle to identify what data they already have and to assess its quality, usability, and value. As data volume grows, compute and storage resources become bottlenecks. The talk therefore focused on governing compute and storage, offering macro-level product insights and targeted suggestions for business lines.
Past governance work in internal business lines (Yanxuan, Media, Cloud Music) followed a common pattern: define “useless data”, assign owners, scan for and decommission unused storage, and analyze the cost of compute tasks. This work also produced a cost-measurement system that combines table metadata, task lineage, and internal billing to quantify storage and compute expenses.
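As a rough illustration of the cost-measurement idea described above, the sketch below joins table metadata (size, owner), task lineage (which task writes which table), and unit prices into a per-table monthly cost. It is not NetEase's implementation: the class names, field names, and both unit prices are hypothetical, and real prices would come from the internal billing system.

```python
from dataclasses import dataclass

# Hypothetical unit prices; real values would come from internal billing.
STORAGE_PRICE_PER_GB_MONTH = 0.02
COMPUTE_PRICE_PER_CU_HOUR = 0.10


@dataclass
class Table:
    name: str
    owner: str
    size_gb: float  # from table metadata


@dataclass
class Task:
    name: str
    output_table: str  # lineage: the table this task writes
    cu_hours_per_month: float  # compute units consumed per month


def monthly_cost_by_table(tables, tasks):
    """Return {table name: {"owner", "storage", "compute"}} for one month."""
    costs = {
        t.name: {
            "owner": t.owner,
            "storage": t.size_gb * STORAGE_PRICE_PER_GB_MONTH,
            "compute": 0.0,
        }
        for t in tables
    }
    # Attribute each task's compute cost to the table it produces.
    for task in tasks:
        if task.output_table in costs:
            costs[task.output_table]["compute"] += (
                task.cu_hours_per_month * COMPUTE_PRICE_PER_CU_HOUR
            )
    return costs


tables = [Table("dw.orders", "alice", 1000.0)]
tasks = [Task("etl_orders", "dw.orders", 50.0)]
report = monthly_cost_by_table(tables, tasks)
```

Attributing compute to the output table is only one possible convention; a real system might instead split a task's cost across several outputs or charge it to the task owner.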
Current pain points include a lack of standards for creating and managing data, high turnover among data engineers, governance that is reactive and driven by resource pressure, coarse-grained effectiveness metrics, and open challenges in data quality, standards, security, and value assessment.
The platform implements owner-based assignment of tables and tasks, rule-based detection of useless data with gray-space retention, table lifecycle management, cost analysis of compute tasks, and red-black leaderboards that rank owners and projects. Email and internal notifications drive participation in governance.
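The rule-based detection and leaderboard features above can be sketched in a few lines. Everything here is an assumption for illustration: the rule (no reads within `idle_days` and no downstream tasks), the reading of gray-space retention as a holding period before final deletion, the 90-day and 30-day defaults, and all names. The talk summary does not specify the actual rules.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TableStats:
    name: str
    owner: str
    last_read: Optional[date]  # None means no recorded reads
    downstream_tasks: int  # number of tasks that consume this table


def is_useless(t, today, idle_days=90):
    """Hypothetical rule: idle for more than idle_days and nothing depends on it."""
    idle = t.last_read is None or (today - t.last_read).days > idle_days
    return idle and t.downstream_tasks == 0


def purge_date(flagged_on, grace_days=30):
    """Assumed retention window: flagged tables are held before deletion."""
    return flagged_on + timedelta(days=grace_days)


def owner_leaderboard(tables, today):
    """Rank owners by useless-table count, fewest first (best at the top)."""
    counts = Counter()
    for t in tables:
        counts[t.owner] += int(is_useless(t, today))
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


today = date(2024, 1, 1)
tables = [
    TableStats("a", "x", date(2023, 12, 20), 0),  # recently read
    TableStats("b", "y", None, 0),  # never read, no consumers
    TableStats("c", "y", date(2023, 1, 1), 2),  # idle but still consumed
]
board = owner_leaderboard(tables, today)
```

Keeping the rule in one small function makes it easy to let each business line tune its own thresholds while sharing the same ranking and notification logic.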
Future planning proposes a three-stage governance model: scope definition, value quantification, and systematic governance. The model defines measurable indicators for cost, quality, security, standards, and value, and envisions an automated, end-to-end big-data evaluation tool that ties asset health scores to resource-allocation decisions.
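One simple way to picture an asset health score tied to resource allocation is a weighted average over the five indicator dimensions, followed by a threshold rule. This is a sketch under stated assumptions, not the planned tool: the weights, the 0 to 100 scale, the thresholds, and the decision labels are all hypothetical.

```python
# Hypothetical weights for the five dimensions named above; they sum to 1.
WEIGHTS = {
    "cost": 0.30,
    "quality": 0.25,
    "security": 0.15,
    "standards": 0.10,
    "value": 0.20,
}


def health_score(scores):
    """Weighted average of per-dimension scores, each assumed to be in 0..100."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def quota_decision(score):
    """Assumed policy mapping a health score to a resource-request outcome."""
    if score >= 80:
        return "approve"
    if score >= 60:
        return "review"
    return "govern-first"


project = {"cost": 70, "quality": 90, "security": 100, "standards": 60, "value": 80}
decision = quota_decision(health_score(project))
```

A linear score is easy to explain to business lines, but it lets a strong dimension mask a weak one; a real tool might add per-dimension minimums.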
Author: Cloud Shui Yao, product manager at NetEase YouShu, responsible for data services, metric systems, and data governance, with experience building data service platforms and closing governance loops from 0 to 1.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.