Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Layers
This article presents Xiaomi's sales data warehouse practice, detailing its evolution, positioning, dimensional modeling, layered architecture, Lambda design, Iceberg integration, capability building, security governance, and future directions toward data value and real‑time metrics.
The article introduces Xiaomi's sales data warehouse practice: its development history, positioning, content, and scale, along with generally applicable dimensional modeling and layering theory, the evolution of the architecture, and the capabilities accumulated along the way.
The sales warehouse evolved from siloed warehouses before 2019 to a unified platform guided by the ABC strategy. Offline construction was completed in 2020, and real‑time capabilities were added in 2021. The warehouse stores order, logistics, store, user behavior, and product data; integrates data and logs from online systems; manages metadata globally; and serves dashboards, real‑time screens, and reports.
Construction theory emphasizes business analysis, theme domain definition, fact and dimension table design, metric management via a data encyclopedia, and dimensional modeling. The warehouse uses a multi‑layer structure (ODS, DWD, DWM, DIM, DM, ADS, TMP) built on principles such as high cohesion and low coupling, sinking public logic into shared layers, cost‑performance balance, consistency, and rollback capability.
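The layering principle can be illustrated with a small dependency check. This is a minimal sketch, not the team's tooling: the layer ranking, the rule that a table may read only from the same or a lower layer, the rule that published tables must not read from TMP, and the `layer_domain_name` table naming are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed ordering of the layers named in the article, rawest first.
# The article lists the layers but does not state this ranking.
LAYER_RANK = {"ODS": 0, "DWD": 1, "DIM": 1, "DWM": 2, "DM": 3, "ADS": 4}


def layer_of(table: str) -> str:
    """Read the layer from a hypothetical 'layer_domain_name' table name."""
    prefix = table.split("_", 1)[0].upper()
    if prefix != "TMP" and prefix not in LAYER_RANK:
        raise ValueError(f"unknown layer prefix in table name: {table}")
    return prefix


def find_violations(deps: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (table, upstream) pairs that break the assumed layering rules."""
    violations = []
    for table, upstreams in deps.items():
        target = layer_of(table)
        if target == "TMP":
            continue  # scratch tables may read from anything
        for upstream in upstreams:
            source = layer_of(upstream)
            # Published tables must not depend on scratch or higher layers.
            if source == "TMP" or LAYER_RANK[source] > LAYER_RANK[target]:
                violations.append((table, upstream))
    return violations


if __name__ == "__main__":
    deps = {
        "dwd_sales_order": ["ods_sales_order"],
        "dm_sales_daily": ["dwm_sales_order_agg", "dim_store"],
        "dwd_sales_refund": ["ads_refund_report"],  # reads from a higher layer
    }
    print(find_violations(deps))
```

A check like this could run against lineage metadata before a job is published, which is one way to keep coupling between layers low.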
The architecture adopts a Lambda model: batch processing with Spark + Hive, streaming with Flink + Talos, and DW/DM layers accelerated by Hologres. To handle state that expires in the real‑time pipeline, an offline stream is merged in. Hive and Talos are then replaced with Iceberg for unified batch‑stream processing; the article notes the benefits of this change as well as one limitation, commit latency.
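The serving side of a Lambda design can be sketched as a merge of a batch view and a real‑time view. The example below is a generic illustration under stated assumptions, not Xiaomi's implementation: the function name `merge_views`, the per‑day metric dictionaries, and the rule that batch results win for every date up to a batch cutoff are all hypothetical.

```python
from datetime import date
from typing import Dict


def merge_views(
    batch_view: Dict[date, float],
    realtime_view: Dict[date, float],
    batch_cutoff: date,
) -> Dict[date, float]:
    """Combine per-day metrics from a batch view and a real-time view.

    Assumption: the batch view is authoritative up to and including
    batch_cutoff, so it overrides real-time values for those dates.
    The real-time view only fills in dates after the cutoff.
    """
    merged = {d: v for d, v in realtime_view.items() if d > batch_cutoff}
    merged.update({d: v for d, v in batch_view.items() if d <= batch_cutoff})
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    batch = {date(2021, 6, 1): 100.0, date(2021, 6, 2): 120.0}
    realtime = {date(2021, 6, 2): 118.0, date(2021, 6, 3): 40.0}
    for day, value in merge_views(batch, realtime, date(2021, 6, 2)).items():
        print(day, value)
```

Letting the batch result take precedence is one common way to correct real‑time figures whose streaming state has already expired, such as late changes to old orders.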
Capability layers provide a unified data architecture, minute‑level batch‑stream processing via Iceberg, second‑level real‑time via Flink + Talos, strict data governance, security compliance (GDPR, internal audits), DQC checks, and metric application through a data encyclopedia linked to corporate dashboards.
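The article does not say which rules the DQC checks apply, so the following is only a sketch of what such checks commonly look like: an empty‑table check, a uniqueness check on a key column, and a null check on required columns. The function `run_dqc`, the column names, and the message formats are assumptions for illustration.

```python
from typing import Any, Dict, List


def run_dqc(rows: List[Dict[str, Any]], key: str, not_null: List[str]) -> List[str]:
    """Run three illustrative data quality checks and return issue messages."""
    if not rows:
        return ["table is empty"]

    issues = []

    # Uniqueness: count rows whose key value was already seen.
    seen = set()
    duplicates = 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        seen.add(value)
    if duplicates:
        issues.append(f"{duplicates} duplicate value(s) in key column '{key}'")

    # Completeness: required columns must not contain nulls.
    for column in not_null:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls:
            issues.append(f"{nulls} null value(s) in column '{column}'")

    return issues


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 1, "amount": None},
        {"order_id": 2, "amount": 5.0},
    ]
    for issue in run_dqc(sample, key="order_id", not_null=["amount"]):
        print(issue)
```

In practice a non‑empty result would typically block downstream jobs or raise an alert rather than just print.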
In summary, the offline sales warehouse is largely built and widely used; the team continues to refine architecture, share best practices, and focus on future trends of data value realization and real‑time metric delivery.
The Q&A section answers questions on handling refund state changes with offline full‑load, permission control across layers, replacing Kudu with Hologres, clarifying DWD/DWM as detail layers, access permissions for various layers, and storage of dimension metrics.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.