
Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article describes how Kuaishou uses Apache Flink for large-scale real-time multi-dimensional analytics, details the architecture of its analytics platform built on Kudu storage and KwaiBI, and introduces SlimBase, a lightweight, embedded, shared state backend that replaces RocksDB to reduce I/O, latency, and CPU overhead.

Big Data Technology Architecture

1. Flink in Kuaishou: Scenarios and Scale

Kuaishou uses Flink for short-video and live-stream quality monitoring, user growth analysis, real-time data processing, and live CDN scheduling. The data pipeline ingests DB/Binlog and WebService logs into Kafka, processes them with Flink, and writes the results to Druid, Kudu, HBase, or ClickHouse, while the raw data is also dumped to Hadoop for offline jobs. Flink use cases split roughly into 80% monitoring, 15% data cleaning and splitting, and 5% business-specific processing. The Flink cluster has about 1,500 nodes, handles about 3 trillion events per day with a peak throughput of roughly 300 million events per second, and is deployed on YARN with isolated real-time clusters.

2. Kuaishou Real-time Multi-dimensional Analysis Platform

The platform lets analysts select up to five dimensions and compute metrics such as PV, UV, new users, and retention in real time. Users configure cube models in the internal KwaiBI tool, which translates each model into Flink jobs that pre-compute the metrics and store the results in Kudu for instant dashboard queries. The solution was evaluated against Druid and ClickHouse on computation capability, aggregation strength, query concurrency, and latency. Flink+Kudu offers low-latency queries because the results are materialized directly in Kudu.
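The pre-computation idea can be illustrated with a small, self-contained sketch. It is not the KwaiBI or Flink implementation: the function name `build_cube`, the event fields, and the use of exact in-memory sets for UV are all assumptions made for illustration. It materializes PV and UV for every combination of the selected dimensions, so a dashboard query becomes a plain key lookup.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(events, dimensions, max_dims=5):
    """Pre-compute PV and UV for every combination of the chosen dimensions.

    Hypothetical helper: a real job would keep this in Flink keyed state and
    write the results to Kudu instead of returning a dict.
    """
    if len(dimensions) > max_dims:
        raise ValueError(f"at most {max_dims} dimensions are supported")

    pv = defaultdict(int)      # group key -> page views
    users = defaultdict(set)   # group key -> distinct user ids (exact UV)

    for event in events:
        # Every subset of the dimensions is one "cuboid" of the cube.
        for size in range(len(dimensions) + 1):
            for combo in combinations(dimensions, size):
                key = tuple((dim, event[dim]) for dim in combo)
                pv[key] += 1
                users[key].add(event["user_id"])

    return {key: {"pv": pv[key], "uv": len(users[key])} for key in pv}


# Illustrative sample data (assumed, not from the article).
events = [
    {"user_id": "u1", "city": "Beijing", "os": "ios"},
    {"user_id": "u1", "city": "Beijing", "os": "ios"},
    {"user_id": "u2", "city": "Beijing", "os": "android"},
    {"user_id": "u3", "city": "Shanghai", "os": "ios"},
]
cube = build_cube(events, ["city", "os"])
print(cube[()])                       # totals across all dimensions
print(cube[(("city", "Beijing"),)])   # one-dimensional slice
```

The number of materialized groups grows with the number of dimension subsets, which is one plausible reason to cap the selection at a small number of dimensions.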

3. SlimBase: A More I/O-Efficient Embedded State Store

In a real-time ad-click join scenario, RocksDB checkpointing caused excessive I/O (disk utilization reaching 100%) and long latency. To address this, Kuaishou explored shared storage, Size-TieredCompaction, and FIFOCompaction, and ultimately chose to slim down HBase and embed it as a shared state store called SlimBase. The redesign removes the client, ZooKeeper, and master components, keeps only the core RegionServer modules, and adds a lightweight state backend that supports ListState, MapState, ValueState, and ReducingState. The new backend reduces checkpoint and restore latency from minutes to seconds, and cuts disk I/O by 66%, disk write throughput by 50%, and CPU usage by 33%.
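To make the four state types concrete, here is a minimal sketch of how they could be layered over a single key-value store. This is not SlimBase's code or Flink's Java API: the class `KeyedStateStore` and its method names are hypothetical, and a plain dict stands in for the embedded storage engine. The point is only that each state type reduces to a key layout plus a small read-modify-write rule.

```python
class KeyedStateStore:
    """Toy in-memory stand-in for an embedded key-value state store."""

    def __init__(self):
        self._kv = {}
        self._current_key = None

    def set_current_key(self, key):
        # State is always scoped to the key of the record being processed.
        self._current_key = key

    # ValueState: one row per (state name, key).
    def value_update(self, name, value):
        self._kv[(name, self._current_key)] = value

    def value_get(self, name):
        return self._kv.get((name, self._current_key))

    # ListState: one row holding the list; add() appends to it.
    def list_add(self, name, item):
        self._kv.setdefault((name, self._current_key), []).append(item)

    def list_get(self, name):
        return list(self._kv.get((name, self._current_key), []))

    # MapState: one row per (state name, key, map key), so a single entry
    # can be read or written without loading the whole map.
    def map_put(self, name, map_key, value):
        self._kv[(name, self._current_key, map_key)] = value

    def map_get(self, name, map_key):
        return self._kv.get((name, self._current_key, map_key))

    # ReducingState: add() folds the new value into the stored one.
    def reducing_add(self, name, value, reduce_fn):
        row = (name, self._current_key)
        self._kv[row] = value if row not in self._kv else reduce_fn(self._kv[row], value)

    def reducing_get(self, name):
        return self._kv.get((name, self._current_key))


store = KeyedStateStore()
store.set_current_key("ad-42")
store.value_update("last_click_ts", 1700000000)
store.list_add("pending_clicks", "c1")
store.list_add("pending_clicks", "c2")
store.map_put("clicks_by_user", "u1", 3)
store.reducing_add("click_count", 1, lambda a, b: a + b)
store.reducing_add("click_count", 1, lambda a, b: a + b)

store.set_current_key("ad-99")  # a different key sees none of the above
print(store.value_get("last_click_ts"))
```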

4. Testing and Future Plans

Benchmarks confirm significant performance gains. Further optimizations are planned, including InMemoryCompaction, prefix Bloom filters, short-circuit reads, and a TTL-based FIFOCompaction strategy that eliminates compaction disk I/O. The long-term goal is to replace RocksDB with SlimBase across all Kuaishou Flink workloads.
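The reason a TTL-based FIFO strategy avoids compaction I/O can be shown with a short sketch. It models the general idea only and is not SlimBase or HBase code: `StoreFile` and `fifo_compact` are hypothetical names. A file is deleted as a whole once its newest entry is older than the TTL, so no data is ever read back and rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    name: str
    max_timestamp: int  # write time of the newest entry in the file (seconds)
    size_bytes: int


def fifo_compact(files, now, ttl_seconds):
    """Split files into (kept, dropped) using whole-file TTL expiry.

    A file is dropped only when even its newest entry has expired, which
    means every entry in it has expired. Nothing is merged or rewritten.
    """
    kept, dropped = [], []
    for f in sorted(files, key=lambda f: f.max_timestamp):  # oldest first
        if now - f.max_timestamp > ttl_seconds:
            dropped.append(f)
        else:
            kept.append(f)
    return kept, dropped


# Illustrative values (assumed): three files and a 600-second TTL.
files = [
    StoreFile("f3", max_timestamp=900, size_bytes=30),
    StoreFile("f1", max_timestamp=100, size_bytes=10),
    StoreFile("f2", max_timestamp=500, size_bytes=20),
]
kept, dropped = fifo_compact(files, now=1000, ttl_seconds=600)
print([f.name for f in dropped], [f.name for f in kept])
```

The trade-off is that this only works for state with a TTL, and read cost can grow because files are never merged.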

Written by

Big Data Technology Architecture
