Design, Challenges, and Evolution of Lianjia's Big Data Platform Architecture (0 → 1.0 → 2.0)
This article traces the evolution of Lianjia's big data platform from its beginnings (version 0) through versions 1.0 and 2.0. It covers the architectural challenges, the three-layer redesign, data-engine selection (ROLAP, MOLAP, Kylin), transparent compression, and practical lessons for large-scale data systems.
Introduction: The article is based on Zhao Guoxian's talk at DataFun about selecting and applying data engines for massive data, summarizing the evolution from version 0 to 1.0 and 2.0 of Lianjia's big‑data platform.
Challenges of the original 1.0 architecture: a simple but tightly coupled design, long development cycles, engineers reduced to pulling data on request, frequent failures, and high maintenance cost.
Redesign to a three‑layer platform: (1) Cluster layer with Hadoop, HBase, Kafka, Sqoop, YARN, MapReduce, Spark, Presto; (2) Toolchain layer with a custom scheduling system, metadata management (Meta) and data‑service middleware (Adhoc); (3) API layer exposing internal, business‑oriented and generic APIs.
Data engine selection: a comparison of ROLAP (SQL computed at query time on Hive/Presto), MOLAP (pre-computed cubes using Kylin/Druid), and hybrid approaches, followed by the reasons for choosing Apache Kylin for high-concurrency offline analytics.
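The ROLAP/MOLAP trade-off can be illustrated with a minimal Python sketch. It is not Kylin's implementation: the fact rows, the dimension names (`city`, `house_type`), and all function names are hypothetical and chosen only for illustration. The ROLAP-style function scans detail rows on every query, while the MOLAP-style functions pre-compute one aggregate table (cuboid) per subset of dimensions so that a query becomes a single lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, house_type, deals). Not real Lianjia data.
ROWS = [
    ("Beijing", "resale", 3),
    ("Beijing", "new", 2),
    ("Shanghai", "resale", 4),
    ("Beijing", "resale", 1),
]
DIMS = ("city", "house_type")


def rolap_query(rows, filters):
    """ROLAP style: scan the detail rows and aggregate at query time."""
    total = 0
    for city, house_type, deals in rows:
        record = {"city": city, "house_type": house_type}
        if all(record[dim] == value for dim, value in filters.items()):
            total += deals
    return total


def build_cube(rows):
    """MOLAP style: pre-compute one cuboid per subset of dimensions."""
    cube = {}
    for size in range(len(DIMS) + 1):
        for dims in combinations(DIMS, size):
            cuboid = defaultdict(int)
            for city, house_type, deals in rows:
                record = {"city": city, "house_type": house_type}
                cuboid[tuple(record[dim] for dim in dims)] += deals
            cube[dims] = dict(cuboid)
    return cube


def molap_query(cube, filters):
    """Answer a query with one lookup in the cuboid matching the filters."""
    dims = tuple(dim for dim in DIMS if dim in filters)
    key = tuple(filters[dim] for dim in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS)
```

Both paths return the same totals, but the cube does its work once at build time. With n dimensions the full cube holds 2^n cuboids, which is why pre-computation trades storage and build time for fast, highly concurrent reads.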
Transparent compression project: hierarchical storage with hot (SSD), warm (disk), and cold (archive) tiers, using ZFS with gzip compression to reduce storage cost, plus a workflow for migrating data from HDFS to ZFS.
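A tiering policy of this kind can be sketched in a few lines of Python. The age thresholds below are assumptions (the summary gives no cut-offs), the function names are hypothetical, and Python's standard `gzip` module stands in for filesystem-level gzip only to show how a compression ratio might be estimated on a data sample.

```python
import gzip

# Hypothetical thresholds; the real cut-offs are not given in the summary.
HOT_MAX_DAYS = 7
WARM_MAX_DAYS = 90


def choose_tier(days_since_last_access):
    """Map a dataset's access age to a storage tier."""
    if days_since_last_access <= HOT_MAX_DAYS:
        return "hot"   # SSD
    if days_since_last_access <= WARM_MAX_DAYS:
        return "warm"  # disk
    return "cold"      # archive


def gzip_ratio(data):
    """Return compressed size divided by original size for a byte sample."""
    if not data:
        return 1.0  # nothing to compress
    return len(gzip.compress(data)) / len(data)
```

For example, `choose_tier(30)` returns `"warm"` under these assumed thresholds, and running `gzip_ratio` on a sample of real files gives a rough idea of how much space compression could save before any data is moved.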
Results: compression saved roughly 3 million RMB (300 wan), slowed storage growth, and improved cost efficiency. Future work includes offloading compression to Intel QAT (QuickAssist Technology) cards, combining compression with erasure coding (EC), and intelligent acceleration of hot data.
Conclusion: the key lessons are thorough requirements analysis, a stable architecture that evolves iteratively, proactive monitoring, and continuous optimization of the production system.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.