
Xiaomi Sales Data Warehouse: Construction Practices, Architecture, and Capability Evolution

This article presents a comprehensive overview of Xiaomi's sales data warehouse, detailing its development history, dimensional modeling theory, multi‑layer architecture, Lambda design with batch and streaming processing, capability layers, security measures, and answers to common technical questions.

DataFunTalk

The presentation introduces Xiaomi's sales data warehouse, outlining its evolution from siloed warehouses before 2019 to a unified, real‑time capable platform built under the ABC (AI, Big data, Cloud) strategy, with offline construction completed in 2020 and real‑time capabilities added in 2021.

It explains the warehouse's scope (orders, logistics, store, user behavior, and product data) and describes how data is acquired through data maps, data factories, and data encyclopedias. It also covers the storage and query technologies in use: Hive, Iceberg, Talos, OLAP engines, and real‑time query layers.

The construction theory section covers business analysis, domain modeling, fact and dimension table design, metric definition, dimensional modeling, layer separation (ODS, DWD, DWM, DIM, DM, ADS, TMP), and key modeling principles like high cohesion, low coupling, public logic sinking, cost‑performance balance, consistency, and data rollback.
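
To make the fact/dimension split and metric definition concrete, here is a minimal sketch in plain Python. The table contents, column names, and the `sales_amount_by` helper are hypothetical and are not taken from Xiaomi's warehouse; the sketch only illustrates how a fact table of additive measures is joined to a dimension table so that one metric is computed the same way for any dimension attribute.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per product, keyed by product_id.
dim_product = {
    "p1": {"name": "Phone A", "category": "phone"},
    "p2": {"name": "Band B", "category": "wearable"},
}

# Hypothetical fact table: one row per order line, holding a foreign key
# to the product dimension plus additive measures.
fact_order = [
    {"order_id": 1, "product_id": "p1", "quantity": 2, "amount": 600.0},
    {"order_id": 2, "product_id": "p2", "quantity": 1, "amount": 50.0},
    {"order_id": 3, "product_id": "p1", "quantity": 1, "amount": 300.0},
]


def sales_amount_by(attribute, facts, dimension):
    """Sum the 'amount' measure grouped by one attribute of the dimension."""
    totals = defaultdict(float)
    for row in facts:
        dim_row = dimension[row["product_id"]]  # join fact -> dimension on the key
        totals[dim_row[attribute]] += row["amount"]
    return dict(totals)


sales_by_category = sales_amount_by("category", fact_order, dim_product)
print(sales_by_category)  # {'phone': 900.0, 'wearable': 50.0}
```

Because the metric logic lives in one function, grouping by `name` instead of `category` reuses the same definition, which is the kind of consistency the modeling principles above aim for.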

The architecture part details the Lambda architecture employed: batch processing with Spark + Hive, streaming with Flink + Talos, acceleration via Hologres, and a unified Iceberg‑based batch‑stream solution that supports both structured and unstructured data while offering transactional writes.
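
The Lambda pattern named above can be sketched as a batch view and a speed view that are merged at query time. This is an illustrative toy in plain Python, not the Spark, Hive, Flink, or Talos jobs themselves; the event fields, the `cutoff` convention, and all function and class names are assumptions made for the example.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute totals from all events strictly before the cutoff."""
    totals = Counter()
    for event in events:
        if event["ts"] < cutoff:
            totals[event["store"]] += event["amount"]
    return totals


class SpeedView:
    """Speed layer: incrementally accumulate events at or after the cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.totals = Counter()

    def on_event(self, event):
        if event["ts"] >= self.cutoff:
            self.totals[event["store"]] += event["amount"]


def serve(batch_totals, speed_totals):
    """Serving layer: merge both views so queries see complete, fresh totals."""
    merged = Counter(batch_totals)
    merged.update(speed_totals)  # adds the per-store sums together
    return dict(merged)


# Hypothetical sales events; 'ts' is an abstract timestamp.
events = [
    {"ts": 1, "store": "s1", "amount": 100},
    {"ts": 2, "store": "s2", "amount": 40},
    {"ts": 5, "store": "s1", "amount": 25},
    {"ts": 6, "store": "s3", "amount": 10},
]

cutoff = 5  # the batch job has processed everything before ts=5
speed = SpeedView(cutoff)
for event in events:
    speed.on_event(event)

result = serve(batch_view(events, cutoff), speed.totals)
print(result)  # {'s1': 125, 's2': 40, 's3': 10}
```

The cutoff keeps the two views disjoint, so no event is counted twice. A unified batch-stream table format removes the need for this merge step by letting both paths write to the same table.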

The capability layer discusses unified data architecture, real‑time processing standards, strict development and quality guidelines, data security compliance (including GDPR), and metric management through a data encyclopedia linked to corporate dashboards.

The Q&A segment addresses practical issues such as handling periodic refunds, data permission layers, replacement of Kudu with Hologres/Doris, the role of DWD/DWM as detail layers, exposure of DM‑level data, and storage of dimension metrics.

Big Data, Flink, metrics, Data Warehouse, Iceberg, Lambda architecture
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
