Evolution of Beike's OLAP Platform Architecture: From Hive‑MySQL to Multi‑Engine Support
This article reviews the evolution of Beike's OLAP platform, from the early Hive‑to‑MySQL stage, through a Kylin‑based architecture, to a flexible multi‑engine solution. It covers the design choices, the metric system, the engine selection criteria, the challenges encountered, and future development plans.
With the continuous growth of big data and the rise of digital transformation, the demand for OLAP analysis has become increasingly urgent. Companies of all sizes are actively exploring OLAP engine selection and platform architecture. This talk shares the evolution of Beike's OLAP platform, the reasons behind each architectural change, and future directions.
Agenda
OLAP platform architecture evolution
OLAP engine selection
Future work plan
1. OLAP Platform Architecture Evolution
The platform's evolution can be divided into three stages, numbered from 0:
Stage 0 (2015‑mid‑2016): Hive → MySQL – Data from logs and DBs is ingested via Sqoop or Kafka into HDFS, processed with ETL, and then batch‑written to MySQL for reporting. This simple architecture can be built quickly by junior engineers but suffers from poor query performance, heavy reliance on OLTP‑type MySQL, and a lack of reusable, platform‑wide capabilities.
Stage 1 (2016‑early‑2019): Kylin‑based OLAP – Introduced Apache Kylin as the sole OLAP engine, added a metric platform that provides a unified API, metric definitions, and dimension management, and built the application layer (e.g., Odin visualization) on top of the metric API.
Stage 2 (2019‑present): Multi‑engine support – Decoupled the OLAP engine from the platform, allowing plug‑in engines such as Druid, ClickHouse, and Doris while keeping the metric API unchanged.
Stage 0 Details
The initial architecture is straightforward: data is collected, stored in HDFS, transformed, and finally written to MySQL. Its main characteristics are simplicity, low development cost, poor reporting performance (all data resides in an OLTP database), and a "smoke‑stack" development model where each business need results in a custom implementation, leading to duplicated effort and limited platformization.
Stage 1 – Kylin‑Based Architecture
The platform consists of three layers:
OLAP Engine Layer: Apache Kylin (only Kylin supported at this stage).
Metric Platform Layer: Provides a unified API, metric definitions, and dimension management. Metrics are defined with names (e.g., show_count_group), supported dimensions (e.g., branch code, region code), and calculation formulas.
Application Layer: Visualization tools such as Odin and other data‑product applications that consume metrics via the API instead of issuing raw SQL against Kylin.
Metrics are categorized into three types:
Atomic metrics – basic measurements.
Derived metrics – built on existing metrics with additional filter conditions.
Composite metrics – created by arithmetic operations on other metrics.
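The three metric types can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the show_cnt column, and the channel filter are hypothetical, and rendering a composite metric directly in SQL is only one option (the platform may instead compute ratios in post-processing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    """A basic measurement: one aggregate over one column."""
    name: str
    agg: str      # e.g. "SUM"
    column: str   # hypothetical fact-table column


@dataclass(frozen=True)
class Derived:
    """An existing atomic metric restricted by an extra filter condition."""
    name: str
    base: Atomic
    condition: str  # SQL predicate, e.g. "channel = 'app'"


@dataclass(frozen=True)
class Composite:
    """Arithmetic over other metrics; only a ratio is sketched here."""
    name: str
    numerator: object
    denominator: object


def to_sql(metric) -> str:
    """Render a metric as a SQL aggregate expression."""
    if isinstance(metric, Atomic):
        return f"{metric.agg}({metric.column})"
    if isinstance(metric, Derived):
        base = metric.base
        # Conditional aggregation keeps the filter local to this metric.
        return f"{base.agg}(CASE WHEN {metric.condition} THEN {base.column} END)"
    if isinstance(metric, Composite):
        # NULLIF guards against division by zero.
        return f"({to_sql(metric.numerator)}) / NULLIF({to_sql(metric.denominator)}, 0)"
    raise TypeError(f"unknown metric type: {type(metric).__name__}")


show = Atomic("show_count", "SUM", "show_cnt")
app_show = Derived("app_show_count", show, "channel = 'app'")
app_ratio = Composite("app_show_ratio", app_show, show)
print(to_sql(app_ratio))
```

Because a derived metric wraps an atomic one and a composite wraps any other metric, each definition is written once and reused, which matches the intent of replacing per-report custom logic.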
Metric queries are expressed as JSON parameters (e.g., startDate, endDate, filters, pagination, and whether to compute period‑over‑period). The metric platform translates these parameters into Kylin SQL, executes the query, and optionally performs post‑processing such as ratio calculations.
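The translation step might look roughly like the sketch below. It is not the platform's actual code: the registry layout, the fact_show table, the dt date column, and the dimensions, page, and pageSize keys are assumptions; only startDate, endDate, and filters come from the description above. Filter values are passed as bind parameters rather than spliced into the SQL text.

```python
import json

# Hypothetical registry: metric name -> (aggregate expression, table, allowed dimensions).
METRICS = {
    "show_count_group": ("SUM(show_cnt)", "fact_show", {"branch_code", "region_code"}),
}


def build_query(request_json):
    """Translate a JSON metric request into (sql, params)."""
    req = json.loads(request_json)
    expr, table, allowed = METRICS[req["metric"]]
    dims = list(req.get("dimensions", []))
    filters = req.get("filters", {})

    # Reject dimensions the metric definition does not support.
    unknown = (set(dims) | set(filters)) - allowed
    if unknown:
        raise ValueError(f"unsupported dimensions: {sorted(unknown)}")

    select = ", ".join(dims + [f"{expr} AS {req['metric']}"])
    where = ["dt BETWEEN ? AND ?"]  # dt is an assumed date column
    params = [req["startDate"], req["endDate"]]
    for column in sorted(filters):
        where.append(f"{column} = ?")
        params.append(filters[column])

    sql = f"SELECT {select} FROM {table} WHERE {' AND '.join(where)}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    page = int(req.get("page", 1))
    size = int(req.get("pageSize", 20))
    sql += f" LIMIT {size} OFFSET {(page - 1) * size}"
    return sql, params


def period_over_period(current, previous):
    """Post-processing step: relative change against the previous period."""
    if not previous:
        return None  # undefined when the previous value is zero or missing
    return (current - previous) / previous
```

A request for show_count_group grouped by region_code and filtered on one branch_code yields a single SELECT with a GROUP BY and a LIMIT/OFFSET pair. The period‑over‑period ratio is then computed outside the engine from two such results.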
2. Kylin Selection and Overview
Beike chose Kylin because it meets three core requirements: support for hundred‑billion‑row datasets, fast response time, and high concurrency. Kylin’s architecture includes:
Metadata: Manages cube definitions, dimensions, and measures.
Cube Build Engine: Performs pre‑computation using MapReduce, Spark, or Flink and stores results in HBase.
Query Engine: Exposes REST, JDBC, and ODBC interfaces; converts user SQL into queries against the pre‑computed cubes.
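Kylin’s pre‑computation approach makes large‑scale analytical queries fast, but it introduces the "dimension explosion" problem: a cube with n dimensions has up to 2^n dimension combinations (cuboids) to pre‑compute, so build time and storage grow exponentially with the number of dimensions. Kylin mitigates this with various optimization techniques.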
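To make the scale of the problem concrete, the following Python sketch counts cuboids under simplified assumptions. Mandatory dimensions and aggregation groups are real Kylin pruning options, but the formulas here are rough upper bounds for illustration (they ignore cuboids shared between groups and other pruning rules), and the function names are hypothetical.

```python
def full_cuboids(n_dims: int) -> int:
    """Every subset of the dimensions is a cuboid: 2^n in total."""
    return 2 ** n_dims


def with_mandatory(n_dims: int, n_mandatory: int) -> int:
    """Mandatory dimensions appear in every cuboid, so only the rest vary."""
    return 2 ** (n_dims - n_mandatory)


def with_groups(group_sizes) -> int:
    """Dimensions split into independent aggregation groups.

    Each group is enumerated on its own, so the total is a sum of
    small powers of two instead of one large power of two.
    """
    return sum(2 ** size for size in group_sizes)


print(full_cuboids(20))       # full cube over 20 dimensions
print(with_mandatory(20, 2))  # 2 of the 20 dimensions made mandatory
print(with_groups([10, 10]))  # the 20 dimensions split into two groups of 10
```

Under these assumptions, 20 dimensions give 1,048,576 cuboids for the full cube, 262,144 with two mandatory dimensions, and 2,048 when split into two groups of ten. The trade‑off is that queries combining dimensions from different groups can no longer be answered from a small pre‑computed cuboid.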
3. Problems Encountered with Kylin
Limited number of supported dimensions per metric, causing developers to split metrics and making maintenance difficult.
Long cube build times, especially as data volume grows, delaying metric delivery.
Low flexibility: any dimension change requires a full cube rebuild.
Performance tuning challenges due to HBase row‑key design; many data‑warehouse engineers lack deep HBase expertise.
Real‑time metric support was absent until Kylin 3.0.
These issues stem from Kylin’s full pre‑computation model, which is costly in both time and storage.
4. Stage 2 – Multi‑Engine OLAP Platform
To address the limitations, the platform was redesigned to support multiple OLAP engines. The new architecture adds a Query Engine Layer that abstracts engine‑specific query languages and routes requests to the appropriate engine (Kylin, Druid, ClickHouse, Doris). Cube management is moved to the metric platform, allowing dynamic binding of cubes to any engine.
Key changes include:
Unified cube definition and management, decoupled from Kylin.
Standardized query interface that converts metric API calls into engine‑specific queries.
Engine‑specific semantics: cubes bound to Druid are built as wide tables, while ClickHouse and Doris keep relational joins, so each cube is modeled to suit the storage model of its engine.
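The decoupling described in this list can be sketched as a small adapter layer. This is an illustrative Python sketch, not Beike's implementation: the class names, the _wide and _fact table suffixes, and the dim_region table are all hypothetical. It shows one engine‑neutral query being rendered differently depending on which engine the cube is bound to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:
    """Engine-neutral query produced from a metric API call."""
    cube: str
    measure: str     # e.g. "SUM(show_cnt)"
    dimension: str   # e.g. "region_name"


class WideTableAdapter:
    """For engines fed a pre-joined wide table (Druid in the text)."""

    def translate(self, q):
        return (f"SELECT {q.dimension}, {q.measure} "
                f"FROM {q.cube}_wide GROUP BY {q.dimension}")


class JoinAdapter:
    """For engines that keep fact and dimension tables separate."""

    def __init__(self, dim_table, join_key):
        self.dim_table = dim_table
        self.join_key = join_key

    def translate(self, q):
        return (f"SELECT d.{q.dimension}, {q.measure} "
                f"FROM {q.cube}_fact f JOIN {self.dim_table} d "
                f"ON f.{self.join_key} = d.{self.join_key} "
                f"GROUP BY d.{q.dimension}")


class QueryEngineLayer:
    """Routes a logical query to whichever engine its cube is bound to."""

    def __init__(self, adapters):
        self.adapters = adapters
        self.bindings = {}

    def bind(self, cube, engine):
        if engine not in self.adapters:
            raise KeyError(f"no adapter registered for engine: {engine}")
        self.bindings[cube] = engine

    def translate(self, q):
        engine = self.bindings[q.cube]
        return engine, self.adapters[engine].translate(q)


layer = QueryEngineLayer({
    "druid": WideTableAdapter(),
    "clickhouse": JoinAdapter("dim_region", "region_code"),
})
query = LogicalQuery("show", "SUM(show_cnt)", "region_name")
layer.bind("show", "druid")
print(layer.translate(query))
layer.bind("show", "clickhouse")  # rebinding needs no change to the caller
print(layer.translate(query))
```

The caller builds the same LogicalQuery in both cases. Only the binding changes, which is the property that lets the metric API stay stable while engines are added or swapped.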
A one‑stop metric development tool (VILI) was built to handle data‑warehouse planning, cube modeling, metric definition, and composite metric processing.
5. Engine Selection and Practice
When selecting an OLAP engine, three factors are considered: data volume (TB‑scale), query performance (sub‑second latency and high QPS), and flexibility (SQL support, real‑time ingestion, schema changes). Open‑source engines were evaluated and grouped by architecture (MPP, batch, pre‑computation). Beike focused on engines with built‑in storage: Kylin, Druid, ClickHouse, and Doris.
Comparative tests showed that Druid’s cube build time and storage footprint are significantly better than Kylin’s, especially for high‑cardinality dimensions. Druid now handles roughly 50% of platform traffic, with 99.7% of queries returning within 3 seconds.
ClickHouse and Doris are used for real‑time metrics and detailed queries, accounting for 1‑2 % of traffic, and are still under evaluation.
6. Future Work
Promote the use of Druid, ClickHouse, and Doris, and further improve their monitoring and performance.
Implement intelligent routing between engines based on data size and query characteristics.
Integrate the OLAP platform with the Adhoc platform to offload heavy queries.
Enhance real‑time metric support across all engines.
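Since engine routing is still planned work, the following Python sketch is speculative. The QueryProfile fields, the row threshold, and the rule order are placeholders rather than anything the platform has implemented. The rules loosely follow the roles described earlier: ClickHouse or Doris for real‑time and detail queries, the Adhoc platform for heavy queries, and Druid for the bulk of aggregate traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical summary of a query's characteristics."""
    needs_detail_rows: bool
    needs_realtime: bool
    estimated_rows: int


# Placeholder threshold; a real value would come from measurement.
HEAVY_SCAN_ROWS = 1_000_000_000


def choose_engine(profile: QueryProfile) -> str:
    """Pick a target by query characteristics first, then by data size."""
    if profile.needs_detail_rows or profile.needs_realtime:
        return "clickhouse"
    if profile.estimated_rows >= HEAVY_SCAN_ROWS:
        return "adhoc"  # offload heavy queries to the Adhoc platform
    return "druid"


print(choose_engine(QueryProfile(False, False, 5_000_000)))
```

Keeping the rules in one function makes it straightforward to replace the fixed threshold later with cost estimates or observed latency.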
In summary, the OLAP platform has evolved from a simple Hive‑MySQL pipeline to a sophisticated, plug‑in‑based architecture that can flexibly leverage multiple analytical engines to meet diverse business needs.
Thank you for listening.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.