
Tencent Video Metrics Middle Platform and Lakehouse Integration: Architecture, Governance, and Practices

This article describes Tencent Video's data business and the design and implementation of its metrics middle platform and lakehouse integration. It covers architecture, governance, consistency, timeliness, usability, cost optimization, and future plans, along with technology choices such as Iceberg, StarRocks, and MQL.

DataFunTalk

In the digital era, data underpins enterprise decision‑making, and Tencent Video, as a leading online video platform in China, faces significant data‑related challenges due to its massive user base and rich content.

The presentation begins with an overview of Tencent Video’s data business, outlining key user interaction metrics such as active users, poster exposures, clicks, searches, plays, and comments, which are essential for product and operational decisions.

It then introduces the overall architecture of the metrics middle platform and explains why a unified platform is needed to address problems with metric consistency, timeliness, usability, and cost. The platform centralizes metric definitions, lineage, and governance. It provides a unified query service that abstracts the underlying storage engines (MySQL, ClickHouse, StarRocks) and offers MQL, a SQL-like query language, for simplified access.
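To make the idea of centralized definitions behind a unified query service more concrete, here is a minimal sketch in Python. It registers a metric once and compiles a request into plain SQL for whichever engine holds the data. All names here (`MetricDefinition`, `MetricRegistry`, the table `dws_user_activity`) are hypothetical, and the request shape is not MQL, whose actual syntax the article does not show.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """A single, centrally owned definition of a metric (hypothetical shape)."""
    name: str        # metric identifier exposed to consumers
    expression: str  # aggregate expression that computes the metric
    table: str       # physical table that stores the source data
    engine: str      # storage engine that serves the table


class MetricRegistry:
    """Holds metric definitions and compiles requests into engine SQL."""

    def __init__(self) -> None:
        self._metrics: Dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        # Reject duplicates so each metric has exactly one definition.
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def compile(
        self,
        name: str,
        dimensions: List[str],
        where: Optional[str] = None,
    ) -> Tuple[str, str]:
        """Return (engine, sql) for a metric grouped by the given dimensions."""
        metric = self._metrics[name]
        columns = list(dimensions) + [f"{metric.expression} AS {metric.name}"]
        sql = f"SELECT {', '.join(columns)} FROM {metric.table}"
        if where:
            sql += f" WHERE {where}"
        if dimensions:
            sql += f" GROUP BY {', '.join(dimensions)}"
        return metric.engine, sql


registry = MetricRegistry()
registry.register(
    MetricDefinition(
        name="daily_active_users",
        expression="COUNT(DISTINCT user_id)",
        table="dws_user_activity",  # hypothetical table name
        engine="starrocks",
    )
)
engine, sql = registry.compile("daily_active_users", ["dt"], where="dt = '2024-01-01'")
print(engine, sql)
```

Consumers only name the metric and its dimensions. The table, the aggregate expression, and the engine stay inside the registry, so changing where a metric is stored does not change how it is requested.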

The article discusses industry research on metric platforms, highlighting three needs: metrics that are defined once and reused everywhere, unified management, and low-code delivery. It also details the platform's governance capabilities, including metric certification, SLA monitoring, data maps for discoverability, and self-service analysis tools that let users build queries by drag and drop without writing SQL.
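A simple way to picture SLA monitoring for a certified metric is a check that compares when the data became queryable with the allowed delay. The sketch below is an illustration only. The `MetricSla` record, its fields, and the two-hour threshold are assumptions, not details from the platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MetricSla:
    """SLA attached to a metric (hypothetical fields)."""
    metric: str
    certified: bool       # whether the metric has passed certification
    max_delay: timedelta  # allowed lag between data time and availability


def check_sla(sla: MetricSla, data_time: datetime, ready_time: datetime) -> dict:
    """Compare the observed delay with the SLA and report the outcome."""
    delay = ready_time - data_time
    return {
        "metric": sla.metric,
        "certified": sla.certified,
        "delay_minutes": delay.total_seconds() / 60,
        "breached": delay > sla.max_delay,
    }


sla = MetricSla(metric="daily_plays", certified=True, max_delay=timedelta(hours=2))
report = check_sla(
    sla,
    data_time=datetime(2024, 1, 2, 0, 0),    # end of the data window
    ready_time=datetime(2024, 1, 2, 1, 30),  # when the metric became queryable
)
print(report)
```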

The article then describes the lakehouse integration practice. The initial solution, Lakehouse 1.0, introduced Iceberg as the data lake table format and combined batch and stream processing, but it ran into consistency and latency issues at the higher aggregation layers.
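The consistency problem between batch and streaming results can be illustrated with a small reconciliation check that compares the same metric computed on both paths. This is a sketch under assumed inputs. The function name, the per-day totals, and the 0.1% tolerance are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[float], Optional[float]]


def reconcile(
    batch: Dict[str, float],
    stream: Dict[str, float],
    rel_tol: float = 0.001,
) -> List[Mismatch]:
    """Return keys whose batch and stream values differ beyond rel_tol.

    A key present on only one side is always reported.
    """
    mismatches: List[Mismatch] = []
    for key in sorted(set(batch) | set(stream)):
        b = batch.get(key)
        s = stream.get(key)
        if b is None or s is None:
            mismatches.append((key, b, s))
            continue
        baseline = max(abs(b), abs(s))
        # Skip the division when both values are zero.
        if baseline and abs(b - s) / baseline > rel_tol:
            mismatches.append((key, b, s))
    return mismatches


# Hypothetical per-day play counts from the two pipelines.
batch_plays = {"2024-01-01": 1000, "2024-01-02": 2000}
stream_plays = {"2024-01-01": 1000, "2024-01-02": 1950, "2024-01-03": 10}
print(reconcile(batch_plays, stream_plays))
```

A check like this only detects drift after the fact. It does not remove the cause, which is that two separate code paths compute the same number.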

Lakehouse 2.0 upgrades the architecture by incorporating StarRocks for real‑time analytics, simplifying the data flow (Iceberg → StarRocks → Iceberg) to ensure consistency between batch and streaming data while reducing cost and improving development efficiency.

Future plans focus on a metric‑driven data consumption model that unifies definition, production, consumption, and quality assurance, further standardizing configuration, reducing costs, and enhancing both hot and cold data management through adaptive storage and materialized views.
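As one possible reading of adaptive hot and cold data management, the sketch below assigns each partition to a storage tier based on its age and recent query volume. The thresholds, the `PartitionStats` record, and the tier names are assumptions made for illustration. The article does not describe the actual policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PartitionStats:
    """Usage statistics for one date partition (hypothetical fields)."""
    partition: date
    queries_last_7d: int


def choose_tier(
    stats: PartitionStats,
    today: date,
    hot_days: int = 30,
    min_queries: int = 5,
) -> str:
    """Keep recent or frequently queried partitions hot; move the rest to cold."""
    age_days = (today - stats.partition).days
    if age_days <= hot_days or stats.queries_last_7d >= min_queries:
        return "hot"
    return "cold"


today = date(2024, 3, 1)
for stats in [
    PartitionStats(date(2024, 2, 20), queries_last_7d=0),   # recent
    PartitionStats(date(2023, 12, 1), queries_last_7d=12),  # old but still queried
    PartitionStats(date(2023, 12, 1), queries_last_7d=1),   # old and rarely queried
]:
    print(stats.partition, choose_tier(stats, today))
```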

data engineering, Big Data, StarRocks, data governance, lakehouse, metrics platform