Tencent PCG Data Governance System: Architecture, Asset Scoring, and One‑Stop Governance Platform
The article presents Tencent PCG's comprehensive data governance solution. It describes the challenges of massive, heterogeneous data, lays out a four‑chapter framework (governance overview, meta‑warehouse construction, an open asset‑scoring system, and a one‑stop governance workbench), and explains how lineage, scoring, and rule‑engine mechanisms enable cost‑effective, continuous data governance.
The presentation introduces Tencent PCG's data governance system, which addresses the high‑volume, complex data landscape across QQ, Tencent Video, News, WeRead, and Music by moving from ad‑hoc, campaign‑style ("exercise‑style") governance to a systematic, platform‑driven approach.
Two key challenges are identified: massive, diverse data growth (trillions of records per day) and a fragmented technical architecture with varied scheduling, storage, and processing frameworks (Venus, US, Hive, PySQL, PySpark, etc.). Together they make cost control, risk assessment, and governance execution difficult for both managers and data owners.
The solution is organized into four chapters: (1) an overview of data governance problems and remedies; (2) the construction of a meta‑warehouse focusing on feature extraction and lineage; (3) an open, sustainable asset‑scoring system that evaluates data assets across five dimensions—norms, cost, security, quality, and application; and (4) a one‑stop governance workbench that provides visibility, policy definition, execution, and effect review.
Asset scoring aggregates fine‑grained metrics (e.g., naming conventions, storage/computing costs, security compliance, field usage) into a unified score, enabling managers to set KPIs and prioritize remediation. The governance engine extracts lineage at table and field levels using custom parsers for SuperSQL, Thive, Hive, and Spark, normalizing diverse SQL dialects into a unified model.
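The aggregation step can be sketched as a two-level weighted average: fine-grained metrics roll up into a dimension score, and the five dimension scores roll up into one asset score. This is only an illustration under stated assumptions. The equal weights, the 0-100 metric scale, and the metric names (`has_owner` and `null_rate` in particular) are hypothetical; the talk names the five dimensions but does not publish its formula.

```python
# Hypothetical weights: the source names the five dimensions, not how they are weighted.
DIMENSION_WEIGHTS = {
    "norms": 0.2,
    "cost": 0.2,
    "security": 0.2,
    "quality": 0.2,
    "application": 0.2,
}


def dimension_score(metrics: dict[str, float]) -> float:
    """Average fine-grained metric scores (each assumed to be 0-100)."""
    if not metrics:
        return 0.0
    return sum(metrics.values()) / len(metrics)


def asset_score(metrics_by_dimension: dict[str, dict[str, float]]) -> float:
    """Weighted sum of dimension scores; a missing dimension contributes 0."""
    total = 0.0
    for dimension, weight in DIMENSION_WEIGHTS.items():
        total += weight * dimension_score(metrics_by_dimension.get(dimension, {}))
    return round(total, 2)


# Illustrative values for a single table, not real measurements.
table_metrics = {
    "norms": {"naming_convention": 100.0, "has_owner": 50.0},
    "cost": {"storage": 60.0, "compute": 80.0},
    "security": {"compliance": 100.0},
    "quality": {"null_rate": 90.0},
    "application": {"field_usage": 40.0},
}

print(asset_score(table_metrics))
```

Keeping the weights in one mapping makes it straightforward for a manager to emphasize, say, cost over norms when setting KPIs, without touching the metric collectors.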
Feature engineering derives actionable governance items such as table/field hotness, duplicate calculations, and cross‑layer dependencies. A rule engine translates these features into concrete governance tasks, which are scored and fed back into the asset score, creating a closed loop of continuous improvement.
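A minimal sketch of how features might be turned into governance tasks follows. It assumes a conventional layered warehouse (`ods`, `dwd`, `dws`, `ads`), a 30-day read count as the hotness feature, and upstream layers already resolved from table-level lineage. The `TableFeatures` and `Rule` types, the rule names, and the suggested actions are all hypothetical and are not taken from the PCG platform.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed layer ordering; the source does not name its warehouse layers.
LAYER_ORDER = {"ods": 0, "dwd": 1, "dws": 2, "ads": 3}


@dataclass
class TableFeatures:
    table: str
    layer: str
    reads_30d: int              # hotness feature: reads in the last 30 days
    upstream_layers: list[str]  # layers of direct upstream tables, from lineage


@dataclass
class Rule:
    name: str
    applies: Callable[[TableFeatures], bool]
    action: str


def skips_layer(f: TableFeatures) -> bool:
    """True if the table reads from a layer more than one step below its own."""
    return any(LAYER_ORDER[f.layer] - LAYER_ORDER[up] > 1 for up in f.upstream_layers)


RULES = [
    Rule("cold_table", lambda f: f.reads_30d == 0, "archive or delete"),
    Rule("cross_layer_dependency", skips_layer, "re-point to the adjacent layer"),
]


def generate_tasks(features: list[TableFeatures]) -> list[tuple[str, str, str]]:
    """Return one (table, rule name, suggested action) tuple per rule hit."""
    return [
        (f.table, rule.name, rule.action)
        for f in features
        for rule in RULES
        if rule.applies(f)
    ]


# Illustrative inputs only.
features = [
    TableFeatures("dwd_play_log", "dwd", 120, ["ods"]),
    TableFeatures("ads_daily_report", "ads", 0, ["ods", "dws"]),
]
tasks = generate_tasks(features)
for task in tasks:
    print(task)
```

Expressing each rule as data (a name, a predicate, and an action) keeps the list easy to extend, which matters when the catalog of governance items grows past a hundred entries.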
The one‑stop workbench offers four core functions: (1) a panoramic view of asset health and cost; (2) manager‑driven policy creation with over 100 predefined governance items; (3) execution support for data owners, including one‑click operations; and (4) post‑governance analytics that quantify cost savings and score improvements, with regular reports delivered to teams.
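The effect review in function (4) amounts to comparing snapshots taken before and after a governance round. The sketch below assumes a monthly cost figure and an asset score per snapshot. The `Snapshot` type, the field names, and the sample numbers are hypothetical illustrations, not figures from the talk.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    monthly_cost: float  # storage plus compute cost, in any consistent unit
    asset_score: float   # unified asset score at snapshot time


def review(before: Snapshot, after: Snapshot) -> dict[str, float]:
    """Quantify the cost saving and score change between two snapshots."""
    saved = before.monthly_cost - after.monthly_cost
    pct = round(100 * saved / before.monthly_cost, 1) if before.monthly_cost else 0.0
    return {
        "cost_saved": saved,
        "cost_saved_pct": pct,
        "score_delta": round(after.asset_score - before.asset_score, 2),
    }


# Made-up numbers for illustration.
report = review(Snapshot(1000.0, 62.5), Snapshot(800.0, 75.0))
print(report)
```

A per-team report could be built by running the same comparison over each team's tables and sorting by `cost_saved`.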
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.