Tencent Video Metrics Middle Platform and Lakehouse Integration: Architecture, Governance, and Practices
This article details Tencent Video’s data business, describing the design and implementation of its metrics middle platform and lake‑warehouse integration, covering architecture, governance, consistency, timeliness, usability, cost optimization, and future plans, with insights into technology choices such as Iceberg, StarRocks, and MQL.
In the digital era, data underpins enterprise decision‑making, and Tencent Video, as a leading online video platform in China, faces significant data‑related challenges due to its massive user base and rich content.
The presentation begins with an overview of Tencent Video’s data business, outlining key user interaction metrics such as active users, poster exposures, clicks, searches, plays, and comments, which are essential for product and operational decisions.
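To make these metrics concrete, here is a minimal sketch of computing a few of them from raw interaction events for one day, plus one derived metric (click-through rate, clicks divided by exposures). The `Event` schema, the action codes, and the rule that any event counts a user as active are all hypothetical assumptions for illustration. They are not Tencent Video's actual definitions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    action: str  # hypothetical action codes: "expose", "click", "search", "play", "comment"
    day: str     # partition date such as "2024-01-01"


def daily_metrics(events, day):
    """Compute a few illustrative interaction metrics for one day."""
    todays = [e for e in events if e.day == day]
    counts = Counter(e.action for e in todays)  # missing actions count as 0
    exposures = counts["expose"]
    clicks = counts["click"]
    return {
        # Assumption: a user is "active" if they produced any event that day.
        "active_users": len({e.user_id for e in todays}),
        "exposures": exposures,
        "clicks": clicks,
        "plays": counts["play"],
        # Derived metric: click-through rate, guarded against division by zero.
        "ctr": clicks / exposures if exposures else 0.0,
    }
```

Even this toy version shows why definitions matter: change what "active" means or what counts as an exposure, and every downstream report shifts.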
It then introduces the overall architecture of the metrics middle platform and explains why a unified platform is needed to address metric consistency, timeliness, usability, and cost. The platform centralizes metric definitions, lineage, and governance, and exposes a unified query service that abstracts the underlying storage engines (MySQL, ClickHouse, StarRocks) and offers MQL, a SQL‑like query language, for simplified access.
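The idea of a central definition behind a unified query service can be sketched as follows. A structured metric request (metric name, group-by dimensions, filters) is validated against a single catalog entry and compiled to SQL for the engine that serves that metric. This does not reproduce MQL syntax, which the article does not show. The catalog entries, table names, and engine assignments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    name: str
    expression: str        # SQL aggregate, defined once in the central catalog
    table: str             # hypothetical physical table
    engine: str            # storage engine that serves this table
    dimensions: frozenset  # dimensions the metric may be grouped or filtered by


# Hypothetical catalog entries; table names and engine assignments are illustrative.
CATALOG = {
    "plays": MetricDef("plays", "SUM(play_cnt)", "dws_play_daily", "ClickHouse",
                       frozenset({"ds", "channel"})),
    "active_users": MetricDef("active_users", "COUNT(DISTINCT user_id)", "dwd_user_event",
                              "StarRocks", frozenset({"ds", "channel", "platform"})),
}


def compile_query(metric, group_by, filters):
    """Turn a metric request into (engine, SQL) using only the central definition."""
    m = CATALOG[metric]
    # Reject dimensions the metric was not defined for, so callers cannot
    # silently produce numbers that disagree with the official definition.
    unknown = (set(group_by) | set(filters)) - m.dimensions
    if unknown:
        raise ValueError(f"{metric!r} does not support dimensions {sorted(unknown)}")
    columns = [*group_by, f"{m.expression} AS {m.name}"]
    sql = f"SELECT {', '.join(columns)} FROM {m.table}"
    if filters:
        # Illustration only: a real service would bind parameters instead of interpolating values.
        conditions = [f"{k} = '{v}'" for k, v in sorted(filters.items())]
        sql += " WHERE " + " AND ".join(conditions)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return m.engine, sql
```

For example, `compile_query("plays", ["channel"], {"ds": "2024-01-01"})` returns `("ClickHouse", "SELECT channel, SUM(play_cnt) AS plays FROM dws_play_daily WHERE ds = '2024-01-01' GROUP BY channel")`. Because callers never write the aggregate expression or pick the engine themselves, the definition lives in exactly one place.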
The article then reviews industry research on metric platforms, which points to three needs: define each metric once and reuse it everywhere, manage all metrics in one place, and deliver them with little or no code. It also details the platform’s governance capabilities, including metric certification, SLA monitoring, a data map that makes metrics discoverable, and self‑service analysis tools that let users build queries by drag and drop without writing SQL.
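As one illustration of SLA monitoring, the sketch below flags certified metrics whose daily data landed after its deadline, or has not landed and the deadline has passed. The `MetricStatus` fields, the rule that only certified metrics carry an SLA, and the per-day deadline model are hypothetical assumptions, not the platform's documented behavior.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional


@dataclass
class MetricStatus:
    name: str
    certified: bool               # hypothetical flag set after a governance review
    sla_deadline: time            # hypothetical time of day by which data must be ready
    ready_at: Optional[datetime]  # when the day's partition landed; None if not yet


def sla_breaches(statuses, day: date, now: datetime):
    """Return names of certified metrics whose data for `day` missed its deadline."""
    breaches = []
    for s in statuses:
        if not s.certified:
            continue  # in this sketch only certified metrics carry an SLA
        deadline = datetime.combine(day, s.sla_deadline)
        # Late if it landed after the deadline, or has not landed and the deadline passed.
        landed_late = s.ready_at is not None and s.ready_at > deadline
        still_missing = s.ready_at is None and now > deadline
        if landed_late or still_missing:
            breaches.append(s.name)
    return breaches
```

Tying the SLA to certification is one design choice among several; it keeps alerting focused on the metrics that downstream decisions actually depend on.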
Next, the article describes the lakehouse (lake‑warehouse integration) practice. The initial Lakehouse 1.0 solution adopted Iceberg as the lake table format and combined batch and stream processing, but it suffered from consistency and latency issues at the higher aggregation layers.
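When batch and streaming results are produced by separate pipelines, as the consistency issues above suggest, a common safeguard is a reconciliation check that compares their per-key aggregates. The following is a minimal sketch of such a check. The relative-tolerance rule and the key/value layout are assumptions for illustration, not the approach described in the talk.

```python
def reconcile(batch, stream, rel_tol=0.001):
    """Compare per-key aggregates from the batch and streaming paths.

    Returns {key: (batch_value, stream_value)} for keys that are missing on
    one side or whose values differ by more than rel_tol (relative).
    """
    mismatches = {}
    for key in batch.keys() | stream.keys():
        b, s = batch.get(key), stream.get(key)
        if b is None or s is None:
            mismatches[key] = (b, s)  # present in only one pipeline
        elif abs(b - s) > rel_tol * max(abs(b), abs(s), 1):
            mismatches[key] = (b, s)  # values drifted beyond tolerance
    return mismatches
```

A check like this only detects drift after the fact. It does not remove the root cause, which is maintaining the same logic twice.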
Lakehouse 2.0 upgrades the architecture by adding StarRocks for real‑time analytics and simplifying the data flow to Iceberg → StarRocks → Iceberg, which keeps batch and streaming results consistent while reducing cost and improving development efficiency.
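One way to see why a single computation path helps consistency is to define the aggregation once and apply it both to the full data (batch) and to micro-batches whose partial results are merged (streaming). The sketch below assumes a hypothetical `PlayAgg` with one additive metric (plays) and one distinct-count metric (users). It also shows a classic pitfall at higher aggregation layers: distinct counts cannot be merged by adding them.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class PlayAgg:
    plays: int = 0
    # Exact distinct users; production engines often use bitmap or HyperLogLog sketches instead.
    users: set = field(default_factory=set)

    def add(self, user_id):
        self.plays += 1
        self.users.add(user_id)
        return self

    def merge(self, other):
        # Additive counters merge by summation; distinct counts must merge
        # the underlying sets, not add their sizes.
        self.plays += other.plays
        self.users |= other.users
        return self


def aggregate(user_ids):
    """The single aggregation definition shared by the batch and streaming paths."""
    return reduce(PlayAgg.add, user_ids, PlayAgg())
```

For play rows from users `u1, u2, u1, u3, u2`, aggregating everything at once and merging three micro-batches (`[u1, u2]`, `[u1, u3]`, `[u2]`) both yield 5 plays and 3 users. Summing the per-batch distinct counts instead would report 5 users.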
Future plans focus on a metric‑driven data consumption model that unifies definition, production, consumption, and quality assurance, further standardizing configuration, reducing costs, and enhancing both hot and cold data management through adaptive storage and materialized views.
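To illustrate how materialized views can serve hot queries, here is a simplified sketch of aggregate-aware routing. It picks the smallest view whose dimensions and metrics cover the query, and otherwise falls back to the detail table. The view names, row counts, and the base table name are hypothetical. The rule also assumes filters only touch the query's dimensions and that every metric can be re-aggregated from the view (sums and counts can; precomputed distinct counts generally cannot).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterializedView:
    name: str
    dimensions: frozenset  # group-by columns the view keeps
    metrics: frozenset     # re-aggregatable metrics the view stores
    row_count: int


def pick_source(query_dims, query_metrics, views, base_table):
    """Choose the smallest view that can answer the query, else the base table."""
    candidates = [
        v for v in views
        if set(query_dims) <= v.dimensions and set(query_metrics) <= v.metrics
    ]
    if not candidates:
        return base_table  # cold path: scan detail data
    return min(candidates, key=lambda v: v.row_count).name
```

Preferring the smallest covering view minimizes scanned rows, while the fallback keeps rarely used, fine-grained queries answerable from the detail data.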
DataFunTalk