Design, Challenges, and Evolution of Lianjia's Big Data Platform Architecture (0 → 1.0 → 2.0)

This article traces the evolution of Lianjia's big data platform from version 0 through 1.0 to 2.0. It covers the architectural challenges, the three‑layer redesign, data‑engine selection (ROLAP, MOLAP, Kylin), transparent compression, and practical lessons for large‑scale data systems.

DataFunTalk
Introduction: This article is based on Zhao Guoxian's DataFun talk on selecting and applying data engines for massive data. It summarizes how Lianjia's big data platform evolved from version 0 through 1.0 to 2.0.

Challenges of the original 1.0 architecture: a simple but tightly coupled design, long development cycles, engineers spending their time on manual data extraction, frequent failures, and high maintenance cost.

Redesign to a three‑layer platform: (1) Cluster layer with Hadoop, HBase, Kafka, Sqoop, YARN, MapReduce, Spark, Presto; (2) Toolchain layer with a custom scheduling system, metadata management (Meta) and data‑service middleware (Adhoc); (3) API layer exposing internal, business‑oriented and generic APIs.
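The layering can be written down as a small data structure. This is only an illustrative sketch: the component names come from the summary above, while the `LAYERS` mapping, the helper names, and the rule that a layer depends only on layers beneath it are assumptions, not details confirmed by the talk.

```python
# Layers ordered from top (closest to users) to bottom (closest to hardware).
# The ordering rule is an assumption made for illustration.
LAYER_ORDER = ("api", "toolchain", "cluster")

LAYERS = {
    "api": ["internal API", "business API", "generic API"],
    "toolchain": ["scheduler", "Meta", "Adhoc"],
    "cluster": ["Hadoop", "HBase", "Kafka", "Sqoop", "YARN",
                "MapReduce", "Spark", "Presto"],
}


def layer_of(component):
    """Return the name of the layer that owns a component."""
    for layer, components in LAYERS.items():
        if component in components:
            return layer
    raise KeyError(component)


def may_depend_on(upper, lower):
    """Assumed rule: a component may depend only on a strictly lower layer."""
    return LAYER_ORDER.index(layer_of(upper)) < LAYER_ORDER.index(layer_of(lower))
```

Under this assumed rule, `may_depend_on("Adhoc", "Presto")` holds while `may_depend_on("Kafka", "Meta")` does not, which is one way to keep the coupling problems of the earlier architecture from returning.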

Data engine selection: a comparison of ROLAP (SQL aggregated at query time on engines such as Hive and Presto), MOLAP (cubes pre‑computed with Kylin or Druid), and hybrid approaches, followed by the reasons for choosing Apache Kylin for high‑concurrency offline analytics.
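The difference between the two styles can be shown with a toy example in plain Python. It is a minimal sketch of the general idea, not Kylin's or Presto's actual implementation; the fact rows, dimension names, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, house_type, deals)
FACTS = [
    ("Beijing", "new", 3),
    ("Beijing", "resale", 5),
    ("Shanghai", "new", 2),
    ("Shanghai", "resale", 4),
]
DIMS = ("city", "house_type")


def rolap_sum(facts, group_by):
    """ROLAP style: scan the raw rows and aggregate at query time."""
    idx = [DIMS.index(d) for d in group_by]
    out = defaultdict(int)
    for row in facts:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def build_cube(facts):
    """MOLAP style: precompute one aggregate table per subset of dimensions."""
    cube = {}
    for r in range(len(DIMS) + 1):
        for group_by in combinations(DIMS, r):
            cube[group_by] = rolap_sum(facts, group_by)
    return cube


def molap_sum(cube, group_by):
    """Answer by lookup in the precomputed table; group_by must follow DIMS order."""
    return cube[tuple(group_by)]


cube = build_cube(FACTS)
print(molap_sum(cube, ("city",)))  # same answer as rolap_sum(FACTS, ("city",))
```

The trade-off is visible even at this scale: the cube holds 2^n aggregate tables for n dimensions and must be rebuilt when data changes, but each query becomes a lookup, which is why pre-computation suits many concurrent queries over offline data.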

Transparent compression project: tiered storage with hot (SSD), warm (disk), and cold (archive) tiers; ZFS with gzip compression to reduce storage cost; and a workflow for migrating data from HDFS to ZFS.
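The sketch below mimics the idea in user space with Python's standard `gzip` module. In the real project the compression happens inside ZFS, below the application; the day thresholds, the choice to compress only the cold tier, and every name here are assumptions made for illustration.

```python
import gzip

HOT_DAYS = 7    # hypothetical threshold, not from the talk
WARM_DAYS = 90  # hypothetical threshold, not from the talk


def choose_tier(days_since_access):
    """Pick a storage tier from how recently the data was read."""
    if days_since_access <= HOT_DAYS:
        return "hot"
    if days_since_access <= WARM_DAYS:
        return "warm"
    return "cold"


class TieredStore:
    """Toy store that compresses cold data but always returns the original bytes."""

    def __init__(self):
        self._blobs = {}  # path -> (tier, stored bytes)

    def write(self, path, data, days_since_access):
        tier = choose_tier(days_since_access)
        stored = gzip.compress(data) if tier == "cold" else data
        self._blobs[path] = (tier, stored)

    def read(self, path):
        tier, stored = self._blobs[path]
        return gzip.decompress(stored) if tier == "cold" else stored

    def stored_size(self, path):
        return len(self._blobs[path][1])
```

Callers of `read` never see whether a blob was compressed, which is what "transparent" means here: the saving shows up in `stored_size`, not in the interface.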

Results: compression saved about 3 million RMB (300 wan), slowed storage growth, and improved cost efficiency. Future work includes offloading compression to QAT cards, combining compression with erasure coding (EC), and intelligent acceleration for hot data.

Conclusion: the talk emphasizes thorough requirement analysis, a stable architecture that evolves iteratively, proactive monitoring, and continuous optimization in production.

Architecture, Big Data, data platform, OLAP, Hadoop, Kylin, transparent compression
Written by

DataFunTalk