Big Data 20 min read

Xiaomi Growth Analytics System: Architecture Evolution and Doris Optimization

The article details Xiaomi's growth analytics platform evolution from a Lambda architecture using SparkSQL, Kudu, and HDFS to a streamlined MPP solution with Apache Doris, covering performance gains, real‑time data ingestion, query tuning, and operational improvements for large‑scale analytics.

DataFunTalk
DataFunTalk
DataFunTalk
Xiaomi Growth Analytics System: Architecture Evolution and Doris Optimization

With the rapid growth of Xiaomi's internet services, product lines needed a unified, low‑cost, and efficient growth analytics platform to avoid building separate systems for each line.

The initial solution, launched in 2018, combined existing big‑data components (HDFS, Kudu, SparkSQL) in a Lambda architecture, handling data collection, cleaning, storage, and BI reporting, but suffered from high operational costs, query latency, and limited schema flexibility.

To address these issues, the team evaluated MPP databases and selected Apache Doris for its fast query performance, full SQL support, and simple deployment, replacing the SparkSQL‑Kudu‑HDFS stack.

Performance tests on a daily 1‑billion‑row workload showed Doris reduced event‑analysis query time by ~85% and retention/funnel analysis time by ~50% compared to the previous stack.

Data ingestion was re‑engineered using Spark Streaming to perform periodic stream load operations into Doris, with optimizations such as specifying target partitions, limiting daily partition shards, and adjusting batch intervals to improve stability.

Operational challenges like FE master thread spikes and transaction bottlenecks were mitigated by limiting thread‑pool size, adding request timeouts, improving transaction isolation, and adding metrics for latency tracking.

Online query performance was further tuned by adjusting parallel_fragment_exec_instance_num and introducing doris_exchange_instances (later renamed parallel_exchange_instance_num ) to control post‑exchange parallelism, yielding noticeable gains for medium‑size workloads.

Doris's UDF/UDAF framework was employed to implement custom analytics functions, such as retention and funnel calculations, enabling flexible, real‑time analysis.

Table management practices include partitioned event tables (daily partitions) and non‑partitioned tables for user segments, with automated partition lifecycle handling and schema‑change optimizations to reduce lock contention and tablet creation latency.

Since its first deployment in September 2019, the Doris‑based system now runs across multiple clusters serving tens of thousands of daily queries, delivering faster response times and a simpler architecture, while future work focuses on improving real‑time ingestion efficiency and supporting UNIQUE KEY models.

performance optimizationbig dataOLAPApache DorisSQL TuningReal-time IngestionGrowth Analytics
DataFunTalk
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.