
Xiaomi Streaming Platform: Evolution, Architecture, and Flink‑Based Real‑Time Data Warehouse

The article presents a comprehensive overview of Xiaomi's streaming platform, detailing its three‑stage evolution from a Scribe‑Kafka‑Storm stack to a Flink‑driven real‑time data warehouse, describing its architecture, components, challenges, migration strategies, job and SQL management, and future roadmap.

DataFunTalk

Abstract: Xiaomi operates numerous business lines, including information flow, e‑commerce, advertising, and finance. Its streaming platform gives these businesses an integrated streaming data solution covering data collection, integration, and real‑time computation. The platform handles up to 1.2 trillion messages per day and runs 15,000 real‑time sync tasks.

Background: To meet growing business demands, the platform has undergone three major upgrades. The latest is built on Apache Flink, which is replacing Spark Streaming across Xiaomi.

Platform Vision: Deliver a unified streaming data solution for all of Xiaomi's business lines, encompassing streaming data storage (Talos, a proprietary message queue similar to Kafka), data ingestion and dumping, and stream processing with engines such as Flink, Spark Streaming, and Storm.

Architecture Overview: Data sources (user logs, MySQL, HBase, etc.) feed into the Talos message queue. Talos Source collects the data, and Talos Sink transfers it with low latency to downstream systems. The platform supports multi‑source and multi‑sink designs, configuration and package management, and end‑to‑end monitoring.

Streaming Platform 1.0: Built in 2010 on Scribe, Kafka, and Storm. Offline processing ran on HDFS/Hive and real‑time processing on Kafka/Storm. Its problems included an excessive number of Scribe agents, a lack of buffering, and limited observability.

Streaming Platform 2.0: Introduced Talos (Xiaomi’s own message queue), a multi‑source/multi‑sink architecture, configuration and package management, and end‑to‑end data monitoring. Because every source and every sink now connects only to Talos, integrating M sources with N sinks drops in complexity from O(M·N) to O(M+N).
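The O(M·N) to O(M+N) reduction can be illustrated with a small count. This is a sketch that assumes M stands for the number of data sources and N for the number of sinks, which the summary does not state explicitly. The function names are hypothetical.

```python
def point_to_point_links(num_sources: int, num_sinks: int) -> int:
    """Each source ships directly to each sink: M * N separate pipelines."""
    return num_sources * num_sinks


def hub_links(num_sources: int, num_sinks: int) -> int:
    """Each source writes to the queue once and each sink reads from it once: M + N."""
    return num_sources + num_sinks


# Compare the two designs for a few (M, N) combinations.
for m, n in [(3, 4), (10, 10), (50, 20)]:
    print(f"M={m}, N={n}: direct={point_to_point_links(m, n)}, via queue={hub_links(m, n)}")
```

With 10 sources and 10 sinks, the direct design needs 100 pipelines while the queue‑centred design needs 20, and the gap widens as either side grows.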

Streaming Platform 3.0: Added an abstract Table concept, job management, and SQL management; enhanced Talos Sink with SQL‑driven features; and extended platformization to cover debugging, monitoring, and operations.

Flink‑Based Real‑Time Data Warehouse: Version 2.0 had several limitations: no schema management, the need for custom sinks, and Spark Streaming’s lack of event‑time support and exactly‑once semantics. To address them, Xiaomi migrated to Flink, implementing end‑to‑end schema support, leveraging the Flink community, productizing streaming jobs, and redesigning Talos Sink on top of Flink SQL.
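To make the event‑time point concrete, the sketch below assigns events to tumbling windows using the timestamp carried by each event rather than the time it arrives. It is plain Python, not Flink code, and it deliberately leaves out watermarks and late‑data handling, which a real engine needs. The function name and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start_ms, key), keyed by each event's own timestamp."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time_ms, key in events:
        # Align the event timestamp down to the start of its window.
        window_start = event_time_ms - (event_time_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


# The third event arrives after the second but belongs to an earlier window.
events = [(1_000, "click"), (61_000, "click"), (59_000, "click"), (2_000, "view")]
print(tumbling_window_counts(events, window_ms=60_000))
# {(0, 'click'): 2, (60000, 'click'): 1, (0, 'view'): 1}
```

A processing‑time system would have counted the out‑of‑order click in whatever window was open when it arrived; grouping by event time places it in the window where it actually occurred.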

Job Management: Covers the full job lifecycle, permissions, and tagging; keeps job history; monitors job status and latency; and automatically restarts failed jobs.
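The automatic‑restart behaviour might look roughly like the following. This is a minimal sketch, not Xiaomi's implementation: the `Job` fields, the state names, the restart budget, and the `submit` callback are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    status: str = "RUNNING"  # hypothetical states: RUNNING, FAILED
    restarts: int = 0
    history: List[str] = field(default_factory=list)


def restart_failed_jobs(
    jobs: Dict[str, Job], max_restarts: int, submit: Callable[[Job], bool]
) -> List[str]:
    """Resubmit each FAILED job that still has restart budget; return the names restarted."""
    restarted = []
    for job in jobs.values():
        if job.status != "FAILED":
            continue
        if job.restarts >= max_restarts:
            # Leave the job FAILED so an operator can investigate.
            job.history.append("restart budget exhausted")
            continue
        job.restarts += 1
        if submit(job):  # hypothetical hook that resubmits the job to the cluster
            job.status = "RUNNING"
            job.history.append(f"restarted (attempt {job.restarts})")
            restarted.append(job.name)
        else:
            job.history.append(f"restart attempt {job.restarts} failed")
    return restarted
```

Capping the number of restarts keeps a job that fails deterministically from being resubmitted forever, and the per‑job history gives the status view something to display.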

SQL Management: Converts external tables into Flink DDL, combines the DDL with DML into a SQL Config, translates the SQL Config into a Job Config by adding resource and state settings, and finally compiles it into a JobGraph for submission. It supports schema discovery, UDFs, lookup joins, and automatic DDL generation.
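The first steps of that pipeline can be sketched as plain data transformations. Everything here is illustrative: the `ExternalTable` shape, the dictionary layouts for SQL Config and Job Config, and the `'connector' = 'talos'` option are assumptions, since the Talos connector's real options are not given in the source. The last step, compiling to a JobGraph, is done by Flink itself and is not shown.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ExternalTable:
    name: str
    columns: List[Tuple[str, str]]  # (column name, Flink SQL type)
    options: Dict[str, str]         # connector options; keys here are hypothetical


def to_flink_ddl(table: ExternalTable) -> str:
    """Render a CREATE TABLE statement from a discovered external schema."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in table.columns)
    opts = ",\n".join(f"  '{key}' = '{value}'" for key, value in table.options.items())
    return f"CREATE TABLE `{table.name}` (\n{cols}\n) WITH (\n{opts}\n)"


def build_sql_config(tables: List[ExternalTable], dml: str) -> Dict[str, Any]:
    """SQL Config = generated DDL statements plus the DML."""
    return {"ddl": [to_flink_ddl(t) for t in tables], "dml": dml}


def build_job_config(sql_config: Dict[str, Any], parallelism: int, state_backend: str) -> Dict[str, Any]:
    """Job Config = SQL Config plus resource and state settings."""
    return {"sql": sql_config, "parallelism": parallelism, "state_backend": state_backend}


source = ExternalTable(
    name="user_log",
    columns=[("user_id", "BIGINT"), ("action", "STRING")],
    options={"connector": "talos", "topic": "user_log"},  # hypothetical connector options
)
sql_config = build_sql_config([source], "INSERT INTO sink_table SELECT user_id, action FROM user_log")
job_config = build_job_config(sql_config, parallelism=4, state_backend="rocksdb")
print(job_config["sql"]["ddl"][0])
```

Generating the DDL from the registered schema means users only write the DML; the table definitions stay consistent with whatever schema the platform has recorded.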

Future Plans: Continue Flink adoption, unify offline and real‑time warehouses via Flink SQL, implement data lineage and governance on top of schema, and contribute actively to the Flink community.

Author: Xia Jun, head of Xiaomi Streaming Platform, responsible for streaming computation, message queues, and big‑data integration, working with Flink, Spark Streaming, Storm, Kafka, and related open‑source and proprietary systems.
