Design, Challenges, and Evolution of Lianjia's Big Data Platform Architecture (0 → 1.0 → 2.0)

This article traces the evolution of Lianjia's big data platform from its initial version 0 through 1.0 and 2.0. It covers the architectural challenges, a three‑layer redesign, data‑engine selection (ROLAP, MOLAP, Kylin), a transparent compression project, and practical lessons for large‑scale data systems.

DataFunTalk

May 29, 2018

Introduction: The article is based on Zhao Guoxian's talk at DataFun about selecting and applying data engines for massive data, summarizing the evolution from version 0 to 1.0 and 2.0 of Lianjia's big‑data platform.

Challenges of the original 1.0 architecture: a simple but tightly coupled design, long development cycles, engineers reduced to pulling data on request, frequent failures, and high maintenance cost.

Redesign to a three‑layer platform: (1) Cluster layer with Hadoop, HBase, Kafka, Sqoop, YARN, MapReduce, Spark, Presto; (2) Toolchain layer with a custom scheduling system, metadata management (Meta) and data‑service middleware (Adhoc); (3) API layer exposing internal, business‑oriented and generic APIs.
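A scheduling system in the toolchain layer has to run jobs in dependency order. The following is a minimal sketch of that idea, not Lianjia's actual scheduler: the job names and the dependency graph are hypothetical, and it relies only on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
# The names are illustrative and do not come from the talk.
JOBS = {
    "sqoop_import_orders": set(),
    "hive_clean_orders": {"sqoop_import_orders"},
    "kylin_build_cube": {"hive_clean_orders"},
    "daily_report": {"hive_clean_orders", "kylin_build_cube"},
}


def execution_order(jobs):
    """Return the jobs in an order where every dependency runs first."""
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    for job in execution_order(JOBS):
        print("run", job)
```

A production scheduler would add retries, time triggers and parallel execution of independent jobs; the sketch only shows the ordering step.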

Data engine selection: a comparison of ROLAP (SQL computed at query time on engines such as Hive or Presto), MOLAP (pre‑computed cubes, as in Kylin or Druid), and hybrid approaches, followed by the reasons for choosing Apache Kylin for high‑concurrency offline analytics.
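The difference between the two engine families can be illustrated with a toy example. This is a sketch under stated assumptions, not how Kylin or Presto are implemented: the fact rows, dimension names and helper functions are all hypothetical. The ROLAP function scans the raw rows on every query, while the MOLAP functions pre-aggregate every combination of dimensions once and then answer queries with a single lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, house_type, deals). Illustrative values only.
ROWS = [
    ("Beijing", "apartment", 3),
    ("Beijing", "villa", 1),
    ("Shanghai", "apartment", 5),
    ("Shanghai", "apartment", 2),
]
DIMENSIONS = ("city", "house_type")


def rolap_query(rows, filters):
    """ROLAP style: scan the raw rows and aggregate at query time."""
    total = 0
    for city, house_type, deals in rows:
        record = {"city": city, "house_type": house_type}
        if all(record[dim] == value for dim, value in filters.items()):
            total += deals
    return total


def build_cube(rows):
    """MOLAP style: pre-aggregate every combination of dimensions."""
    cube = defaultdict(int)
    for city, house_type, deals in rows:
        record = {"city": city, "house_type": house_type}
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((dim, record[dim]) for dim in dims)
                cube[key] += deals
    return dict(cube)


def molap_query(cube, filters):
    """Answer a query with one lookup into the pre-built cube."""
    key = tuple((dim, filters[dim]) for dim in DIMENSIONS if dim in filters)
    return cube.get(key, 0)


CUBE = build_cube(ROWS)
```

Both paths return the same totals. The trade-off is that the cube costs build time and storage up front but makes each query cheap, which matters under high concurrency.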

Transparent compression project: hierarchical storage with hot (SSD), warm (disk) and cold (archive) tiers, using ZFS with gzip compression to reduce storage cost, plus a workflow for migrating data from HDFS to ZFS.
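The tiering decision and the effect of gzip can be sketched as follows. This is an illustration only: the 30-day and 180-day thresholds and both function names are assumptions, since the source does not state the actual cut-offs. In the real system ZFS compresses blocks transparently at the filesystem level; Python's `gzip` module is used here just to show how a compression ratio is measured.

```python
import gzip
from datetime import datetime, timedelta

# Hypothetical thresholds; the source does not give the real cut-offs.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=180)


def choose_tier(last_access, now):
    """Classify a dataset as hot, warm or cold by time since last access."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"   # keep on SSD
    if age <= WARM_WINDOW:
        return "warm"  # keep on regular disk
    return "cold"      # candidate for the compressed archive tier


def gzip_ratio(payload, level=6):
    """Return the original size divided by the gzip-compressed size."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(payload) / len(compressed)


if __name__ == "__main__":
    now = datetime(2018, 5, 29)
    print(choose_tier(datetime(2017, 1, 1), now))
    print(round(gzip_ratio(b"city=Beijing,deals=3\n" * 1000), 1))
```

Repetitive log-like data compresses well, which is why cold data is usually the first target for this kind of project.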

Results: compression saved roughly 3 million RMB (300 wan), slowed the storage growth rate, and improved cost efficiency; future work includes offloading compression to Intel QAT cards, combining compression with erasure coding (EC), and intelligent acceleration of hot data.

Conclusion: the talk emphasizes thorough requirement analysis, a stable architecture that evolves iteratively, proactive monitoring, and continuous optimization of the live system.
