
Beike OLAP Platform: Druid Adoption, Architecture, Performance Comparison, and Operational Optimizations

The article details Beike's OLAP platform built on Druid, covering its four‑layer architecture, selection criteria, performance comparisons with Kylin, data ingestion workflows, custom improvements for data import and real‑time distinct counting, and operational measures such as caching, dynamic throttling, and HDFS storage optimization.

Big Data Technology Architecture

Introduction

Beike, a leading online real‑estate platform in China, processes massive real‑time and offline data streams. To meet diverse analytical needs across business lines, the company evaluated several OLAP engines and selected Druid as a core component of its data platform.

Beike OLAP Platform Overview

The platform consists of four layers: an application layer (dashboards and reports), a metric layer (a one‑stop metric platform for data‑warehouse engineers), a routing layer (a unified query engine handling query translation, caching, and fallback), and the OLAP engine layer (primarily Druid, Kylin, and ClickHouse). As of May 2020, Druid has accounted for about 60% of query traffic.
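The routing layer's responsibilities can be illustrated with a minimal sketch. All names here (`QueryRouter`, the engine callables) are hypothetical, not Beike's actual implementation: the router consults a query cache, dispatches to the preferred engine, and falls back to the next one on failure.

```python
class QueryRouter:
    """Toy unified-query-engine router: cache lookup, dispatch, fallback."""

    def __init__(self, engines, order):
        self.engines = engines   # engine name -> callable(sql) -> result rows
        self.order = order       # preference order, e.g. ["druid", "kylin"]
        self.cache = {}          # sql text -> cached result

    def query(self, sql):
        if sql in self.cache:                  # serve repeated queries from cache
            return self.cache[sql]
        last_error = None
        for name in self.order:
            try:
                result = self.engines[name](sql)
                self.cache[sql] = result       # populate cache on success
                return result
            except Exception as exc:           # engine down: try the next one
                last_error = exc
        raise RuntimeError("all OLAP engines failed") from last_error
```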

OLAP Engine Selection Strategy

The selection criteria focused on PB‑scale data volume, sub‑second response times, high concurrency (average QPS 500‑600, peak 2000), flexible query interfaces, and rapid data ingestion. Compared with Kylin, Doris, and ClickHouse, Druid offered comparable concurrency, native SQL distinct counting, and lower operational cost, leading to its adoption.

Druid vs. Kylin Benchmark

Tests showed Druid's data ingestion time is roughly one‑third of Kylin's, while query latency is comparable. Druid also demonstrated significantly lower HDFS storage usage and data‑size inflation (1‑3× vs. 18‑100× for Kylin).

Druid Architecture

Druid's architecture includes a query service layer (Broker), a storage layer (hot and cold segments), a cluster management layer (Overlord and Coordinator), and a data ingestion layer handling both batch and real‑time jobs.
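Queries enter through the Broker, which exposes a SQL endpoint (`POST /druid/v2/sql/` is Druid's standard SQL API). A minimal client sketch, where the broker URL and datasource are assumptions:

```python
import json
from urllib import request

def build_sql_payload(sql):
    # Druid's Broker accepts SQL wrapped in JSON: {"query": ..., "resultFormat": ...}
    return {"query": sql, "resultFormat": "object"}

def druid_sql(broker_url, sql):
    """POST a SQL query to the Broker's /druid/v2/sql/ endpoint and decode rows."""
    req = request.Request(
        broker_url.rstrip("/") + "/druid/v2/sql/",
        data=json.dumps(build_sql_payload(sql)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (assumes a reachable broker):
# rows = druid_sql("http://broker:8082", "SELECT COUNT(*) FROM some_datasource")
```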

Practical Application at Beike

Metrics are built via a one‑stop platform that creates models and cubes (similar to Kylin), then automatically launches Hive‑to‑Druid jobs. The system supports hourly, daily, weekly, and monthly offline intervals and allows complex time expressions for historical data re‑processing.
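Such a Hive‑to‑Druid job ultimately submits a Hadoop‑based ingestion spec to the Overlord. The sketch below builds one per offline interval; the field layout follows Druid's `index_hadoop` task format, but the datasource name and HDFS path are illustrative assumptions, not Beike's actual configuration.

```python
def build_hadoop_index_spec(datasource, hdfs_path, interval, granularity="DAY"):
    """Build a Druid Hadoop-based batch ingestion spec for one offline interval."""
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": granularity,  # HOUR / DAY / WEEK / MONTH
                    "queryGranularity": "none",
                    "intervals": [interval],            # ISO interval to (re)process
                },
            },
            "ioConfig": {
                "type": "hadoop",
                # Files exported from Hive onto HDFS (path is hypothetical)
                "inputSpec": {"type": "static", "paths": hdfs_path},
            },
        },
    }
```

Historical re‑processing then amounts to resubmitting the same spec with a different `intervals` value.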

Improvements Tailored to Business Needs

Two major enhancements were made: (1) optimized Hadoop index jobs for faster data import, reducing ingestion time to about one‑third of the original; (2) multi‑column real‑time distinct counting for Kafka index jobs via a CommonUnique extension that leverages Snowflake‑style IDs, Redis‑based dictionary storage, and 64‑bit bitmap aggregation.
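The dictionary‑plus‑bitmap idea behind the second enhancement can be sketched as follows. This is an illustration of the technique, not the actual CommonUnique extension: each raw value is mapped to a dense integer id (at Beike the dictionary lives in Redis and ids come from a Snowflake‑style service; a plain in‑memory dict stands in here), so exact distinct counting reduces to setting and popcounting bits.

```python
class BitmapDistinct:
    """Exact distinct counting via dictionary encoding plus a bitmap."""

    def __init__(self, dictionary=None):
        # raw value -> dense integer id (stand-in for a shared Redis dictionary)
        self.dictionary = dictionary if dictionary is not None else {}
        self.bitmap = 0  # Python int used as an arbitrary-width bitmap

    def add(self, value):
        idx = self.dictionary.setdefault(value, len(self.dictionary))
        self.bitmap |= 1 << idx          # mark this id as seen

    def merge(self, other):
        # Bitmaps from different ingestion tasks OR together cheaply at query
        # time, provided they were encoded against the same global dictionary.
        self.bitmap |= other.bitmap

    def cardinality(self):
        return bin(self.bitmap).count("1")   # popcount = exact distinct count
```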

Cluster Stability Measures

To ensure reliable service during peak loads (7 am–12 pm, up to 1200 QPS on Druid, 2000 QPS on the unified SQL layer), Beike implemented three measures: query result caching at the API and SQL layers, dynamic throttling based on Historical node CPU load and business‑line priority, and HDFS storage optimizations (segment merging, compact tasks, and retention policies) to reduce the number of small files and directories.
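The priority‑aware throttling can be sketched as a simple admission rule. The thresholds and function name here are hypothetical; the point is the described behavior: as CPU load rises, low‑priority business lines are shed first while high‑priority queries keep flowing.

```python
def admit(cpu_load, priority, high_water=0.7, low_water=0.9):
    """Decide whether to admit a query, given recent node CPU load.

    cpu_load: recent Historical-node CPU utilization in [0, 1]
    priority: 0 = highest business-line priority, larger = lower priority
    """
    if cpu_load < high_water:
        return True              # cluster healthy: admit everything
    if cpu_load < low_water:
        return priority == 0     # under pressure: high-priority lines only
    return False                 # overloaded: shed all incoming load
```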

Future Plans

The roadmap includes migrating real‑time metrics from ClickHouse to Druid, exploring Spark‑based and parallel index jobs for faster offline ingestion, and using Hive as a global dictionary to improve high‑cardinality distinct counting without excessive GC pressure.

Overall, the article demonstrates how Druid can be effectively integrated, tuned, and extended to meet large‑scale, high‑concurrency analytical workloads in a production environment.

Tags: performance optimization, Real-time Analytics, data platform, OLAP, Druid, Beike
Written by Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies