
Kylin at Autohome: Development History, Deployment Practices, Optimizations, and Future Roadmap

This article details Autohome's use of Apache Kylin as its core OLAP engine, covering its architecture, large‑scale Cube deployment, real‑world business applications, a series of performance and operational optimizations, cluster upgrade experiences, and upcoming plans for real‑time OLAP and cloud‑native evolution.

Big Data Technology Architecture

1. Kylin Introduction and Architecture
Apache Kylin is a scalable, ultra-fast big-data analytical data warehouse with a friendly web UI, interactive query capability, a standard SQL interface, and JDBC support. It follows a pre-computed multi-dimensional cube model and, since version 3.0, also supports real-time OLAP. The architecture consists of data sources (Hive tables, Kafka streams, relational databases), a REST API layer, a SQL parsing engine based on Apache Calcite, a routing layer that directs queries to pre-built Cubes, metadata stored in memory, and a build engine that generates HBase tables.
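
As a rough illustration of the REST API layer, the sketch below builds (but does not send) a SQL query request. It assumes the `/kylin/api/query` endpoint and the JSON fields described in Kylin's public REST documentation; the host, project name, credentials, and SQL are hypothetical placeholders, and the details should be checked against the Kylin version actually deployed.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build a POST request for Kylin's SQL query endpoint (not sent here)."""
    payload = {
        "sql": sql,
        "project": project,      # hypothetical project name supplied by the caller
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic authentication: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical host, project, table, and credentials, for illustration only.
request = build_query_request(
    "http://kylin.example.com:7070/",
    "demo_project",
    "SELECT city, COUNT(*) FROM page_views GROUP BY city",
    "ADMIN",
    "KYLIN",
)
```

Passing `request` to `urllib.request.urlopen` would submit the query; the routing layer would then decide which pre-built Cube answers it.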

2. Cube Pre-computation Principle
A Cube represents all possible dimension combinations of a source table; each combination is a Cuboid. Kylin encodes dimension values into dictionaries to reduce storage and accelerate queries. RowKey design reflects dimension ordering, and aggregated values (e.g., Count) are stored per Cuboid.
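
The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kylin's actual build engine: the dimensions, rows, and function names are hypothetical, the dictionaries are plain sorted mappings, and the only measure is a row count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source table with three dimensions.
DIMENSIONS = ("city", "brand", "device")
ROWS = [
    ("Beijing", "Audi", "ios"),
    ("Beijing", "Audi", "android"),
    ("Shanghai", "BMW", "ios"),
    ("Beijing", "BMW", "ios"),
]


def build_dictionaries(rows, dimensions):
    """Map each distinct value of every dimension to a small integer ID."""
    dictionaries = []
    for i in range(len(dimensions)):
        values = sorted({row[i] for row in rows})
        dictionaries.append({value: idx for idx, value in enumerate(values)})
    return dictionaries


def encode(rows, dictionaries):
    """Replace raw dimension values with their dictionary IDs."""
    return [tuple(d[v] for d, v in zip(dictionaries, row)) for row in rows]


def build_cube(encoded_rows, dimensions):
    """Return {cuboid: Counter({encoded key: row count})} for every dimension subset."""
    cube = {}
    n = len(dimensions)
    for size in range(n + 1):
        for idxs in combinations(range(n), size):
            # The key keeps the dimension order of the table, like a RowKey.
            counts = Counter(tuple(row[i] for i in idxs) for row in encoded_rows)
            cube[tuple(dimensions[i] for i in idxs)] = counts
    return cube


dictionaries = build_dictionaries(ROWS, DIMENSIONS)
cube = build_cube(encode(ROWS, dictionaries), DIMENSIONS)
```

With three dimensions the toy cube holds 2^3 = 8 Cuboids; a query grouped by `city` and `brand` reads the pre-aggregated counts in `cube[("city", "brand")]` instead of scanning the source rows.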

3. Current Usage at Autohome
Kylin serves multiple business lines, supporting traffic, lead, user-behavior, and recommendation-effect analysis. Autohome operates over 500 Cubes storing about 300 TB in roughly 16,000 Segments, spread across roughly 120,000 HBase regions. Individual Cubes have up to 31 dimensions and hold over a trillion rows. The 95th-percentile query latency stays under 2 seconds.

4. Development Timeline
• 2016: Initial evaluation (Kylin 1.5.4).
• 2017: Deep adoption for the Car-Smart-Cloud project, upgrade to 1.6, added monitoring.
• 2018: Model optimization, stability improvements, HBase disaster recovery.
• 2019: Upgrade to 2.6.3, migration to the Spark build engine, integration with the internal AutoBI BI tool.

5. Application in Commercial Data Products
The strategic product “Car-Smart-Cloud” leverages Kylin for massive user-behavior data, providing a UVN user-segmentation model and a marketing funnel analytics platform. Kylin enables the rapid, multi-dimensional queries required by these commercial services.

6. Optimization Practices
• Cuboid pruning and maximum dimension combination control to limit the number of Cuboids built.
• Segment-level dictionary filtering to avoid loading unnecessary high-cardinality dictionaries.
• Hybrid Cubes for mixing precise and approximate distinct-count measures.
• Scheduling performance improvements by skipping already-successful jobs.
• Temporarily disabling Cubes during incremental builds to prevent premature routing.
• Monitoring via native Kylin metrics, Prometheus, Grafana, and automated health-check scripts that restart services on threshold breaches.
• HBase master-slave replication (T+1 backup) with custom bulk-load-compatible backup scripts.
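
To see why Cuboid pruning matters, note that a Cube with 31 dimensions has 2^31 (about 2.1 billion) possible Cuboids if nothing is pruned. The sketch below is a simplified model of two pruning rules, mandatory dimensions and a cap on the number of dimensions per Cuboid. The function and parameter names are hypothetical, and the counting rules are assumptions that may differ from Kylin's exact aggregation-group semantics.

```python
from itertools import combinations


def enumerate_cuboids(dimensions, mandatory=(), max_dim_combination=None):
    """List the Cuboids that survive pruning.

    mandatory: dimensions that must appear in every Cuboid.
    max_dim_combination: drop Cuboids with more dimensions than this cap;
        the base Cuboid (all dimensions) is always kept in this sketch.
    """
    base = tuple(dimensions)
    optional = [d for d in dimensions if d not in mandatory]
    kept = []
    for size in range(len(optional) + 1):
        for combo in combinations(optional, size):
            # Preserve the original dimension order inside each Cuboid.
            cuboid = tuple(d for d in dimensions if d in mandatory or d in combo)
            too_wide = (
                max_dim_combination is not None
                and len(cuboid) > max_dim_combination
            )
            if too_wide and cuboid != base:
                continue
            kept.append(cuboid)
    return kept


dims = ("a", "b", "c", "d")          # hypothetical dimension names
unpruned = enumerate_cuboids(dims)
with_mandatory = enumerate_cuboids(dims, mandatory=("a",))
capped = enumerate_cuboids(dims, max_dim_combination=2)
both = enumerate_cuboids(dims, mandatory=("a",), max_dim_combination=2)
```

In this toy case, four dimensions yield 16 Cuboids unpruned, 8 with one mandatory dimension, 12 with a cap of two dimensions, and 5 with both rules combined.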

7. Cluster Upgrade Experience
Upgrading from Kylin 1.6 to 2.6 required running the old and new clusters in parallel, synchronizing metadata, rebuilding Segments, and replaying SQL for validation. The migration used KylinSide tools for automated builds, replay, and reporting, which ensured data consistency and kept service interruption to a minimum.
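
A SQL replay check of this kind can be sketched as follows. This is not the KylinSide implementation, whose internals the article does not describe. It is a minimal illustration that assumes each cluster is wrapped in a callable returning result rows as tuples; all names are hypothetical.

```python
import math


def values_match(x, y, rel_tol):
    """Compare two cells, tolerating tiny float differences between engines."""
    if isinstance(x, float) and isinstance(y, float):
        return math.isclose(x, y, rel_tol=rel_tol)
    return x == y


def rows_match(old_rows, new_rows, rel_tol=1e-9):
    """Order-insensitive comparison of two result sets."""
    if len(old_rows) != len(new_rows):
        return False

    def sort_key(row):
        return tuple(str(v) for v in row)

    for a, b in zip(sorted(old_rows, key=sort_key), sorted(new_rows, key=sort_key)):
        if len(a) != len(b):
            return False
        if not all(values_match(x, y, rel_tol) for x, y in zip(a, b)):
            return False
    return True


def replay(queries, run_old, run_new):
    """Run each query on both clusters and record whether the results agree."""
    return [
        {"sql": sql, "consistent": rows_match(run_old(sql), run_new(sql))}
        for sql in queries
    ]


# Stand-ins for the two clusters: canned results keyed by hypothetical query IDs.
old_cluster = {"q1": [("a", 1), ("b", 2)], "q2": [("x", 1.0)]}
new_cluster = {"q1": [("b", 2), ("a", 1)], "q2": [("x", 2.0)]}
report = replay(["q1", "q2"], old_cluster.get, new_cluster.get)
```

Here `q1` is reported as consistent despite the different row order, while `q2` is flagged because the aggregated value differs. In practice `run_old` and `run_new` would issue the recorded production SQL against the 1.6 and 2.6 clusters.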

8. Future Planning
Autohome aims to adopt real-time OLAP and to move to the cloud-native architecture planned for the upcoming Kylin 4.0 release, with the goals of simplifying streaming receiver management and improving operational efficiency.

Written by

Big Data Technology Architecture
