Design and Practice of Xiaomi’s One‑Stop Data Production Platform

This article gives an overview of Xiaomi’s data production platform: the full data lifecycle, the technology-driven product design methodology, the platform’s architecture and core capabilities, and real-world case studies with a Q&A session that show how the system improves data collection, storage, processing, and usage across the organization.

DataFunTalk

Dec 5, 2023

The article introduces Xiaomi’s one-stop data production platform, which was built to unify fragmented data development tools and to address problems with permission management, performance, and data consistency.

It explains the complete data lifecycle in five stages (generation, collection, storage, processing, and application) using a water-cycle analogy, and notes that the platform focuses on the first four stages, collectively called the data production chain.

Detailed discussions cover data generation (online and offline sources), collection methods (client‑side, server‑side, IoT devices, questionnaires, and web crawling), and storage choices (relational databases, NoSQL, message queues, file systems, and big‑data stores, with Xiaomi adopting Iceberg and Hologres).

The processing phase is broken down into ETL, data cleaning, and offline/real‑time development, highlighting how these steps transform raw data into valuable assets for downstream applications.
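
The processing steps named above can be sketched as a tiny pipeline. This is a minimal illustration only, not the platform's implementation: the record fields, the `RAW_EVENTS` sample, and the in-memory `warehouse` sink are all hypothetical stand-ins for collected logs and a real storage layer.

```python
from collections import defaultdict

# Hypothetical raw events standing in for collected client/server logs.
RAW_EVENTS = [
    {"user_id": "u1", "event": "click", "ts": "2023-12-05T10:00:00"},
    {"user_id": "u1", "event": "click", "ts": "2023-12-05T10:00:00"},   # exact duplicate
    {"user_id": None, "event": "view", "ts": "2023-12-05T10:01:00"},    # missing user_id
    {"user_id": "u2", "event": " View ", "ts": "2023-12-05T10:02:00"},  # untidy value
]


def extract(source):
    """Extract: yield raw records from the source unchanged."""
    yield from source


def clean(records):
    """Cleaning: drop incomplete rows, normalize text, remove exact duplicates."""
    seen = set()
    for record in records:
        if not record.get("user_id") or not record.get("event"):
            continue  # incomplete row
        row = (record["user_id"], record["event"].strip().lower(), record["ts"])
        if row in seen:
            continue  # duplicate row
        seen.add(row)
        yield {"user_id": row[0], "event": row[1], "ts": row[2]}


def transform(records):
    """Transform: aggregate cleaned events into per-user, per-event counts."""
    counts = defaultdict(int)
    for record in records:
        counts[(record["user_id"], record["event"])] += 1
    return dict(counts)


def load(result, sink):
    """Load: write the aggregated result into the sink (a dict in this sketch)."""
    sink.update(result)
    return sink


warehouse = {}
load(transform(clean(extract(RAW_EVENTS))), warehouse)
print(warehouse)
```

The same shape applies to both offline and real-time development; only the source (a batch file versus a message queue) and the scheduling differ.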

The article then presents the perspective of a technology-driven product, describing core characteristics such as heavy reliance on underlying technology, a focus on performance, and a user base of engineers; it also discusses the product manager’s role in translating technical capabilities into user-friendly solutions.

Three practical case studies illustrate the platform’s evolution: (1) a technology-led architecture upgrade that introduces Iceberg to build a data lake, (2) a product-driven efficiency improvement based on drag-and-drop lineage and SQL parsing, and (3) performance improvements from integrating Presto and Spark 3.x, which enable multi-engine queries.
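
To make the idea of deriving lineage from SQL parsing in case (2) concrete, here is a deliberately naive sketch. The article does not describe how the platform's parser works, so this is an assumption-laden illustration: it uses regular expressions rather than a real SQL parser, the function name `extract_lineage` is hypothetical, and the table names in the sample query are invented.

```python
import re

# Naive patterns. A production system would build a full syntax tree and
# handle CTEs, subqueries, quoted identifiers, and comments.
_TARGET = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return (target_table, [source_tables]) for a simple INSERT ... SELECT."""
    target_match = _TARGET.search(sql)
    target = target_match.group(1) if target_match else None
    sources = []
    for name in _SOURCE.findall(sql):
        # Keep first-seen order and skip self-references and repeats.
        if name != target and name not in sources:
            sources.append(name)
    return target, sources


# Hypothetical job SQL; the table names are made up for this example.
SQL = """
INSERT OVERWRITE TABLE dw.user_daily
SELECT u.user_id, COUNT(*) AS events
FROM ods.events e
JOIN ods.users u ON e.user_id = u.user_id
GROUP BY u.user_id
"""

target, sources = extract_lineage(SQL)
edges = [(source, target) for source in sources]  # upstream -> downstream
print(edges)
```

Each edge points from an upstream table to the table the job writes, which is the kind of dependency graph a lineage view could draw between jobs.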

The article then outlines the platform’s construction roadmap, from initial analysis and MVP development (0→1) to scaling and standardization (1→10), and finally to continuous innovation while maintaining focus on technical users.

A Q&A section answers common questions about monitoring mechanisms, engine division of labor between Presto and Doris, the roles of data lake versus data warehouse, and product manager involvement in technology selection.

Overall, the piece demonstrates how a well‑designed, technology‑driven data production platform can streamline data workflows, improve efficiency, and support a wide range of analytical and operational use cases within a large organization.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
