Evolution and Practice of BEIKE OLAP Platform Architecture and Engine Selection
This article details the three‑stage evolution of BEIKE's OLAP platform, from the early Hive‑to‑MySQL phase, through a Kylin‑based architecture, to a flexible multi‑engine design. It covers metric modeling, engine selection, performance trade‑offs, and the roadmap for expanding the use of Druid, ClickHouse, and Doris and for supporting real‑time analytics.
This article is based on the notes of senior BEIKE engineer Xiao Zan’s 2020 talk "Engineering Architecture Practice for AI Technologies".
1 BEIKE OLAP Platform Architecture Evolution
The evolution can be divided into three stages:
Stage 0 (2015‑2016): Hive to MySQL – an initial, simple data pipeline.
Stage 1 (2016‑early 2019): Kylin‑based OLAP platform – building on Apache Kylin.
Stage 2 (early 2019‑present): Flexible multi‑engine OLAP platform – decoupling from Kylin and supporting various engines.
Stage 0 – Hive to MySQL
Data (logs, Dblog, etc.) is ingested via Sqoop or Kafka into HDFS, processed with ETL, then batch‑written daily to MySQL, which serves as the source for reports. This approach is simple and quick to launch but suffers from MySQL’s limited scalability and lack of reusable common capabilities.
MySQL cannot handle massive data volumes; performance degrades after a few million rows.
Development is case‑by‑case, leading to long feature cycles.
Because of these limitations, the system at this stage could not yet be called a platform.
To address these problems, a dedicated OLAP engine (Kylin) was introduced for large‑scale analytics, and a metric platform was built to provide unified metric definitions and APIs for business units.
Metrics consist of dimensions (e.g., time, location) and measures (e.g., GMV, view count). They abstract the underlying star or snowflake schema.
Stage 1 – Kylin‑based OLAP Platform
The architecture consists of three layers from bottom to top: the OLAP engine layer (Apache Kylin), the metric platform layer (providing unified APIs, metric definitions, and management), and the application layer (visualization products like Odin that consume metrics via the API). Below Kylin are the data‑warehouse layers (ODS, DWD, DWS, OLAP tables).
The metric platform defines metrics such as "Group_View_Count", which has dozens of dimensions and a distinct‑count measure on the field show_num. Metric types include atomic, derived, and composite metrics.
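As a rough illustration of the model described above, a metric can be represented as a named measure plus the dimensions it may be sliced by. This is a minimal sketch, not BEIKE's actual schema: the class names, the field names, the three dimension names, and the one-line glosses of the metric types (a common convention, assumed here) are all hypothetical. Only the metric name, the field show_num, and the distinct‑count measure come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class MetricType(Enum):
    # The glosses below follow a common convention and are assumptions.
    ATOMIC = "atomic"        # a single measure over one fact table
    DERIVED = "derived"      # an atomic metric plus fixed filters or a time window
    COMPOSITE = "composite"  # an expression over other metrics


@dataclass(frozen=True)
class Measure:
    field: str        # source column, e.g. show_num
    aggregation: str  # e.g. "COUNT_DISTINCT" or "SUM"


@dataclass(frozen=True)
class Metric:
    name: str
    metric_type: MetricType
    measure: Measure
    dimensions: tuple

    def supports(self, requested):
        """Return True if every requested dimension is defined for this metric."""
        return set(requested) <= set(self.dimensions)


group_view_count = Metric(
    name="Group_View_Count",
    metric_type=MetricType.ATOMIC,
    measure=Measure(field="show_num", aggregation="COUNT_DISTINCT"),
    # Hypothetical dimensions; the real metric has dozens.
    dimensions=("dt", "city", "channel"),
)

print(group_view_count.supports(["dt", "city"]))
print(group_view_count.supports(["dt", "agent_id"]))
```

Keeping the dimension list on the metric itself lets the platform reject a query for an undefined dimension before any engine is contacted.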
Metric queries are sent to the metric platform, which translates them into Kylin SQL, executes the query, and optionally performs post‑processing (e.g., period‑over‑period calculations).
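The translation and post‑processing steps can be sketched as follows. This is an assumption‑laden illustration rather than the metric platform's real code: the function names, the table name olap_group_view, and the column names are hypothetical, and the filter values are interpolated directly into the SQL string only to keep the example short (production code would validate or parameterize them).

```python
def build_sql(table, measure_field, dimensions, filters):
    """Build a GROUP BY query with a distinct-count measure."""
    select_cols = list(dimensions) + [f"COUNT(DISTINCT {measure_field}) AS value"]
    sql = f"SELECT {', '.join(select_cols)} FROM {table}"
    if filters:
        # Sort the filters so the generated SQL is deterministic.
        where = " AND ".join(
            f"{col} = '{val}'" for col, val in sorted(filters.items())
        )
        sql += f" WHERE {where}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


def period_over_period(current, previous):
    """Relative change versus the previous period; None when the base is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous


# Hypothetical table and column names.
print(build_sql("olap_group_view", "show_num", ["city"], {"dt": "2020-06-01"}))
print(period_over_period(120, 100))
```

The period‑over‑period step runs in the platform after the engine returns, which is why it is modeled here as a plain function over two result values rather than as part of the SQL.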
Kylin’s core idea is pre‑computation: users define cubes over a star or snowflake data model, and the cubes are built by MapReduce, Spark, or Flink jobs and stored in HBase. Queries arrive through a RESTful or JDBC interface, and Kylin translates the SQL into HBase scans over the pre‑aggregated results. Because the number of dimension combinations grows exponentially with the number of dimensions, pre‑computation can cause "dimension explosion"; Kylin mitigates this with optimizations such as mandatory dimensions and hierarchy dimensions.
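The effect of these optimizations on the number of dimension combinations (cuboids) can be shown with simple counting. The sketch below is textbook arithmetic under stated assumptions, not Kylin's cube planner: a freely combinable dimension doubles the count, a mandatory dimension appears in every cuboid and so contributes a factor of 1, and a hierarchy of k levels allows only k + 1 prefixes (none, level 1, levels 1 to 2, and so on) instead of 2^k subsets. The function name and parameters are hypothetical.

```python
from math import prod


def cuboid_count(n_dims, n_mandatory=0, hierarchies=()):
    """Count the dimension combinations to pre-compute.

    n_dims:      total number of dimensions in the cube
    n_mandatory: dimensions that must appear in every cuboid
    hierarchies: level counts of hierarchy groups, e.g. (3,) for
                 country > province > city
    """
    n_free = n_dims - n_mandatory - sum(hierarchies)
    if n_free < 0:
        raise ValueError("mandatory and hierarchy dimensions exceed n_dims")
    # Free dimensions double the count; each hierarchy adds levels + 1 choices.
    return 2 ** n_free * prod(levels + 1 for levels in hierarchies)


print(cuboid_count(10))                                   # no optimization
print(cuboid_count(10, n_mandatory=2))                    # two mandatory dims
print(cuboid_count(10, n_mandatory=2, hierarchies=(3,)))  # plus one 3-level hierarchy
```

With 10 dimensions the count drops from 1,024 to 256 with two mandatory dimensions, and to 128 when three of the remaining dimensions form a hierarchy.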
After 2‑3 years of rollout, the metric system supports over 6,600 metrics, >20 M daily calls, and 99.5 % of queries return within 3 seconds.
Key Kylin‑related work includes monitoring, optimization/customization, and integration with BEIKE’s data‑pipeline ecosystem.
At its peak, the Kylin platform hosted more than 800 cubes holding over 300 TB of data and 1.6 trillion rows, and served more than 20 M queries per day.
2 OLAP Engine Selection and Practice
Engine selection focuses on three aspects: data volume (TB‑scale support), query performance (sub‑second latency, high QPS), and flexibility (SQL support, real‑time ingestion, schema changes).
Open‑source OLAP engines fall into categories such as SQL‑on‑Hadoop engines (MPP‑style or batch‑oriented) and engines with integrated storage (MPP‑style or pre‑computation‑based). BEIKE evaluated Kylin, Druid, ClickHouse, and Doris.
Druid is currently used for offline metrics, handling ~50 % of traffic with 99.7 % of queries returning within 3 seconds. Compared with Kylin, Druid builds its data faster and consumes less storage, especially for high‑cardinality dimensions.
Benchmark charts show that Druid’s build is often more than 2× faster than Kylin’s cube build, and that its data‑inflation ratio is dramatically lower (up to 342× lower for some tables).
Work around Druid includes monitoring, optimization (precise deduplication, query tuning), and integration with BEIKE’s data‑platform.
ClickHouse and Doris (both MPP engines with their own storage) are used for real‑time metrics and detail queries. They account for 1‑2 % of traffic and are still undergoing in‑depth testing.
3 Future Work Plan
The multi‑engine OLAP architecture is now stable. Upcoming tasks include expanding Druid/ClickHouse/Doris adoption, building intelligent routing across engines based on data size and query characteristics, integrating with the Adhoc platform for slow‑query offloading, and enhancing real‑time metric support.
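Routing of this kind might look like the sketch below. It is purely illustrative: the talk describes routing as future work and gives no rules, so the profile fields, the row threshold, and the order of the checks are all assumptions. The engine roles loosely follow the usage described earlier (ClickHouse or Doris for real‑time and detail queries, Kylin where a cube already exists, Druid for other offline metrics).

```python
from dataclasses import dataclass

# Hypothetical cutoff above which a query is sent to the ad hoc platform.
ADHOC_ROW_THRESHOLD = 1_000_000_000


@dataclass(frozen=True)
class QueryProfile:
    estimated_scan_rows: int  # rough size of the data the query touches
    needs_realtime: bool      # reads recently ingested data
    is_detail_query: bool     # row-level lookup rather than aggregation
    has_prebuilt_cube: bool   # a Kylin cube already covers the dimensions


def route(profile):
    """Pick an engine name for a query; the rule order is an assumption."""
    if profile.estimated_scan_rows > ADHOC_ROW_THRESHOLD:
        return "adhoc"       # offload expected slow queries
    if profile.needs_realtime or profile.is_detail_query:
        return "clickhouse"  # Doris could fill this role as well
    if profile.has_prebuilt_cube:
        return "kylin"
    return "druid"


print(route(QueryProfile(5_000_000, False, False, True)))
print(route(QueryProfile(5_000_000, True, False, True)))
```

Expressing the rules as an ordered chain of checks keeps the policy easy to audit; a real router would more likely load the thresholds and the engine mapping from configuration.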
Thank you for listening.