How Tencent Music Scaled Its Content Data Platform with Apache Doris: From ClickHouse to 4.0 Architecture
This article traces the evolution of Tencent Music's content data platform from version 1.0 to 4.0: the migration from ClickHouse to Apache Doris, the introduction of a semantic layer, and optimizations to data ingestion, query performance, and cost that markedly improved data timeliness, operational efficiency, and storage spend.
Business Background
Tencent Music Entertainment operates a massive content library covering recorded music, live performances, audio, and video. To support comprehensive analysis of songs, albums, artists, and other content objects, the Content Library Data Platform integrates data sources into a unified asset with metrics and tag systems, providing inventory, segmentation, metric analysis, and tag selection services for downstream applications.
Data Architecture Evolution
Architecture 1.0
The initial architecture consisted of three layers: a data warehouse layer (ODS‑DWD‑DWS) building wide tables, an acceleration layer using ClickHouse for analysis and Elasticsearch for search, and an application layer creating DataSets for logical views. Problems included delayed data updates, high storage costs, complex coupling between ClickHouse and Elasticsearch, and difficulty maintaining both engines.
Architecture 2.0
To address these issues, the analysis engine was switched from ClickHouse to Apache Doris. Doris offers simple deployment, MySQL compatibility, partial column updates, federated queries across Hive, Iceberg, Hudi, MySQL, and Elasticsearch, and multiple import methods (batch from HDFS/S3, MySQL binlog, Kafka, Flink). The migration solved data latency, reduced storage redundancy, and improved query throughput.
Architecture 3.0
A semantic layer was introduced to centralize metric and tag definitions, turning them into first‑class citizens. This layer decouples the application from physical storage, allowing analysts to define unified metrics and tags, and enabling business users to query without building DataSets.
Architecture 4.0
The semantic layer became the core of the architecture, handling SQL generation for the warehouse, configuring acceleration tasks, and directing application queries. This unified view standardizes metric and tag management, simplifies optimization, and resolves data‑management fragmentation.
Optimization Experience
Data Ingestion
Two approaches for building wide tables were evaluated: generating them in TDW (high storage cost, poor real‑time performance) versus using Flink to aggregate streams and write directly to Doris. The Flink approach reduced storage, improved latency, and allowed flexible ETL reuse.
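The article does not show the Flink job itself; as a rough in-memory illustration of the merge-then-write idea (stream names and columns are hypothetical), each source stream contributes its columns to one wide row per content ID:

```python
from collections import defaultdict

# Sketch only, not Flink code: merge partial records from several
# source streams into one wide row per content ID before the row
# is written to Doris.
wide_rows = defaultdict(dict)

def on_event(source_name: str, record: dict) -> None:
    key = record["song_id"]
    for col, val in record.items():
        if col != "song_id":
            # Prefix columns by source so streams cannot clobber each other.
            wide_rows[key][f"{source_name}_{col}"] = val

on_event("playback", {"song_id": 1, "plays": 10})
on_event("metadata", {"song_id": 1, "title": "Song A"})
print(wide_rows[1])
# {'playback_plays': 10, 'metadata_title': 'Song A'}
```

Because the streams are merged before landing in Doris, no intermediate wide table has to be materialized in TDW.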
Data Model Selection
Doris 1.1.3 supports three data models: Unique (primary-key updates), Aggregate (partial column updates via REPLACE_IF_NOT_NULL), and Duplicate (raw storage). The Aggregate model was chosen to meet the partial-update requirement.
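Doris applies this aggregation internally; a minimal sketch of the REPLACE_IF_NOT_NULL semantics (column names are illustrative) shows why it enables partial updates:

```python
def replace_if_not_null(existing: dict, incoming: dict) -> dict:
    """Mimic Doris' REPLACE_IF_NOT_NULL aggregation: an incoming NULL
    (None) leaves the stored value untouched; anything else replaces it."""
    merged = dict(existing)
    for col, val in incoming.items():
        if val is not None:
            merged[col] = val
    return merged

stored  = {"song_id": 1, "play_count": 100, "artist": "A"}
partial = {"song_id": 1, "play_count": 120, "artist": None}  # touch one column
print(replace_if_not_null(stored, partial))
# {'song_id': 1, 'play_count': 120, 'artist': 'A'}
```

A writer can therefore send only the columns it owns, setting the rest to NULL, and other columns keep their last value.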
Schema Management
Metadata for tags/metrics is stored in a MySQL table mapping business names to internal column IDs (e.g., `song_name` → `a4`). Queries rewrite column names to IDs, allowing name changes without altering Doris tables. Tag lifecycle is managed via metadata status, and pre-allocated ID columns enable rapid onboarding of new tags/metrics with minimal schema-change impact.
Write Optimizations
Flink pre‑aggregation reduces write pressure.
Adaptive batch sizes lower Flink resource usage.
Doris BE tuning (partition size, compaction parameters) improves write stability.
BE submission logic caches BE nodes and randomizes batch submissions for balanced load.
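Two of the points above — the cached, randomized BE selection and the adaptive batch size — can be sketched as follows (this is an illustration of the policy, not Doris client code; thresholds and node names are assumptions):

```python
import random

class BatchWriter:
    """Sketch of the write path: cache the BE node list, pick a node at
    random per batch, and grow or shrink the batch size with latency."""

    def __init__(self, be_nodes, batch_size=5000, min_size=1000, max_size=50000):
        self.be_nodes = list(be_nodes)  # cached once; refreshed out of band
        self.batch_size = batch_size
        self.min_size = min_size
        self.max_size = max_size

    def pick_node(self) -> str:
        # Random choice spreads batches evenly across BE nodes.
        return random.choice(self.be_nodes)

    def adjust(self, latency_s: float, target_s: float = 1.0) -> None:
        # Fast writes earn larger batches; slow writes back off.
        if latency_s < target_s:
            self.batch_size = min(self.max_size, int(self.batch_size * 1.5))
        else:
            self.batch_size = max(self.min_size, self.batch_size // 2)

w = BatchWriter(["be1:9060", "be2:9060", "be3:9060"])
w.adjust(0.4)            # fast write -> grow the batch
print(w.batch_size)      # 7500
w.adjust(2.5)            # slow write -> halve it
print(w.batch_size)      # 3750
```

Growing batches multiplicatively while halving on pressure keeps Flink resource usage low without starving throughput.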
Compaction parameters adjusted during BE tuning:

```
max_XXXX_compaction_thread
max_cumulative_compaction_num_singleton_deltas
```

Query Optimizations
Metrics are stored as partitioned tables in Doris, while tags reside in Elasticsearch. A "Doris on ES" solution combines Doris' distributed query planning with ES's full-text capabilities. Query sessions can enable `es_optimize` to push bucket joins, compress ES results with bitmaps, and limit data transfer, reducing a million-record tag-selection query from 60 seconds to 3.7 seconds.
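Doris performs the bitmap compression internally; as a toy model of why it shrinks the transfer and speeds up the join, matching row IDs from each side can be packed into bitmaps and intersected with a single AND (IDs below are made up):

```python
def to_bitmap(ids):
    """Pack non-negative integer row IDs into one Python int as a bitmap."""
    bm = 0
    for i in ids:
        bm |= 1 << i
    return bm

def from_bitmap(bm):
    """Unpack a bitmap back into the sorted list of set row IDs."""
    out, i = [], 0
    while bm:
        if bm & 1:
            out.append(i)
        bm >>= 1
        i += 1
    return out

es_hits    = to_bitmap([1, 5, 9, 12])  # rows matching the tag filter in ES
doris_hits = to_bitmap([5, 9, 42])     # rows matching the metric filter in Doris
print(from_bitmap(es_hits & doris_hits))
# [5, 9]
```

Shipping a compact bitmap instead of full result rows is what keeps the cross-engine join cheap.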
Cost Optimizations
Hot data kept in Doris (TTL for one year); older data moved to TDW.
Partition‑level replica settings (3 replicas for recent data, decreasing for older data).
Cold data migrated from SSD to HDD.
Tag/metric lifecycle management reduces unnecessary storage.
Doris' simple FE/BE architecture, MySQL compatibility, and federated query support lower operational overhead.
These measures lowered storage costs by 42% and development/time costs by 40%.
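The tiered-replica idea above can be expressed as a simple policy function (the 90-day and one-year thresholds here are illustrative assumptions, not figures from the article):

```python
from datetime import date

def replica_count(partition_date: date, today: date) -> int:
    """Hypothetical tiering policy: full redundancy for recent partitions,
    fewer replicas as data ages, archival to TDW after the one-year TTL."""
    age_days = (today - partition_date).days
    if age_days <= 90:
        return 3   # hot: recent partitions keep 3 replicas on SSD
    if age_days <= 365:
        return 2   # warm: still in Doris, reduced redundancy
    return 0       # expired: moved out of Doris into TDW

today = date(2022, 12, 31)
print(replica_count(date(2022, 12, 1), today))  # 3
print(replica_count(date(2022, 3, 1), today))   # 2
```

In Doris itself this maps to per-partition replication settings rather than application code.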
Future Plans
Automatic hot/cold data identification using Doris for hot data and Iceberg for cold data.
Materialized views for frequently used tag/metric combinations.
Exploring Doris for data‑warehouse compute tasks to simplify logic and improve timeliness.
The authors thank the Apache Doris community and SelectDB for their support and contributions.