Design and Evolution of Alibaba Advertising Real-Time Data Warehouse
Alimama’s advertising platform migrated from a monolithic Flink‑Kafka pipeline to a layered Paimon lakehouse, adding a DWS layer with upsert support and multi‑layer storage. The new architecture delivers minute‑level data freshness, cuts end‑to‑end latency by 2.5 hours, reduces resource consumption by more than 40%, halves development effort, and sustains ≥99.9% availability.
1. Business Background
Alimama operates a large‑scale advertising ecosystem covering search, display, and feed ads across Alibaba’s apps. Advertisers create campaigns via DSPs, and the system tracks impressions, clicks, conversions, and budget consumption. Real‑time data is critical for performance monitoring, fraud detection, and budget enforcement.
1.1 Advertising Data Scenarios and Requirements
External real‑time reports: metrics such as impressions, CTR, and ROI.
Internal real‑time analysis: audience, product, and region analysis.
Business monitoring: high‑traffic events (e.g., 618, Double 11) need instant feedback.
1.2 Technical Goals for the Data Warehouse
Data timeliness: minute‑level freshness, ideally near‑zero delay.
System throughput: billions of events per day, with peak throughput in the tens of millions of TPS.
Stability: ≥99.9% availability with fast rollback.
Cost efficiency: minimal resource footprint and reduced operational burden.
2. Architecture Design
The architecture evolved from a monolithic Flink‑Kafka pipeline to a layered lakehouse built on Paimon.
2.1 Why a Real‑Time Data Warehouse?
The initial design prioritized speed but ran into problems: pipelines that were hard to adapt to new requirements, duplicated development across teams, wasted resources, schema‑less intermediate data, and high operational cost.
2.2 Evolution of the Real‑Time Warehouse
2.2.1 TT‑Based Real‑Time Warehouse
Data flow: ODS (raw logs) → TT → Flink → DWD (processed) → downstream stores (OceanBase, Hologres, ClickHouse). Problems included data duplicated across the downstream stores, a missing DWS layer, no upsert support, and offline pipelines that re‑implemented the same logic.
2.2.2 Paimon Lakehouse Solution
Paimon replaced the DWD storage, and a DWS layer was added that supports upsert and changelog consumption. Schemas are now materialized, enabling direct queries for BI and algorithms. The same ODS data is ingested via Flink into Paimon tables.
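As a minimal sketch of what upsert with changelog consumption means for downstream jobs: a primary‑key table keeps one row per key, and each write emits change records that streaming consumers can subscribe to. This is illustrative Python, not Paimon’s API; the table shape and records are hypothetical, though the change‑type tags (`+I`, `-U`, `+U`) follow Flink’s RowKind naming.

```python
# Sketch of primary-key upsert semantics with changelog emission
# (illustrative only; not Paimon's actual implementation).

def upsert(table: dict, key, row):
    """Apply an upsert and return the changelog entries it emits."""
    if key in table:
        before = table[key]
        table[key] = row
        # Downstream consumers see a retraction of the old value
        # followed by the new value.
        return [("-U", before), ("+U", row)]
    table[key] = row
    return [("+I", row)]

# Hypothetical DWS aggregate keyed by (campaign_id, minute).
dws = {}
upsert(dws, ("c1", "12:00"), {"clicks": 10})
changelog = upsert(dws, ("c1", "12:00"), {"clicks": 15})
# changelog now carries the -U/+U pair a streaming consumer would read.
```

Because the DWS table exposes this changelog, downstream Flink jobs can consume incremental updates instead of re‑reading full snapshots.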
2.2.3 Full Lakehouse Architecture
Four layers plus an ops platform: Data layer (TT → Paimon), Compute layer (Flink, MaxCompute), Storage layer (dual‑system for HA), Application layer (reports, BI, algorithms), and Operations platform (monitoring, stress testing).
2.2.4 Common Paimon Optimizations
Asynchronous compaction for higher write throughput.
Dynamic checkpoint intervals based on traffic.
Resource allocation matching bucket count.
File lifecycle management to avoid small‑file explosion.
Consumer ID retention to enable job recovery.
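The “dynamic checkpoint intervals based on traffic” item above can be sketched as a simple policy: shorter intervals under light load keep freshness high, while longer intervals under peak load (e.g., Double 11) reduce the overhead of frequent Paimon snapshot commits. The thresholds, interval values, and function name below are hypothetical assumptions, not values from the article.

```python
# Illustrative policy for adapting the Flink checkpoint interval to
# observed traffic. All thresholds are made-up placeholders.

def checkpoint_interval_ms(events_per_sec: float) -> int:
    """Pick a checkpoint interval from the current ingest rate."""
    if events_per_sec < 100_000:
        return 60_000        # 1 min: low traffic, prioritize freshness
    if events_per_sec < 1_000_000:
        return 120_000       # 2 min: moderate traffic
    return 300_000           # 5 min: peak traffic, reduce commit overhead

interval = checkpoint_interval_ms(5_000_000)  # e.g., a Double 11 peak
```

Since each checkpoint triggers a Paimon snapshot commit, lengthening the interval during peaks also limits small‑file creation, complementing the asynchronous compaction and file‑lifecycle optimizations listed above.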
3. Benefits of the Lakehouse
The layered design reduced development time by more than 30%, cut average data latency by 2.5 hours, and enabled minute‑level budget tracking. Resource consumption dropped by more than 40% and development effort fell by 50%, with near‑zero downtime at three‑nines (99.9%) availability.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.