Building a Sub‑Second Response Lakehouse Platform with Apache Iceberg at Bilibili
This article details Bilibili's implementation of a sub‑second response lakehouse platform using Apache Iceberg, covering background challenges, query acceleration techniques such as multi‑dimensional sorting, indexing, cube pre‑aggregation, and intelligent automated optimizations via the Magnus service, and reports current production metrics.
Background: Bilibili built a lakehouse platform on Apache Iceberg to address Hive's performance, complexity, data-silo, and latency issues, aiming for interoperable access, fast interactive queries, and a low barrier to entry for users.
Architecture: Data is stored as Iceberg tables on HDFS, ingested via Flink, Spark, or Java APIs; Magnus continuously optimizes tables (sorting, indexing, cube building). Alluxio provides caching, and Trino serves interactive queries, with fallback to ClickHouse or Elasticsearch for millisecond‑level latency.
Iceberg Table Structure: Iceberg manages file‑level metadata with snapshots and manifests, offering an open storage format that facilitates extensions.
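The file-level metadata described above is what makes most of the later acceleration techniques possible: each manifest entry records per-column statistics (such as lower/upper bounds) for one data file, so the planner can skip files whose ranges cannot match a predicate. A minimal sketch of that pruning idea, with hypothetical file names and a simplified stats model (real Iceberg manifests carry much richer metadata):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DataFile:
    """Simplified stand-in for a manifest entry: one data file plus
    per-column (min, max) value ranges recorded at write time."""
    path: str
    stats: Dict[str, Tuple[int, int]]

def prune_files(files: List[DataFile], column: str, value: int) -> List[DataFile]:
    """Keep only files whose min/max range for `column` can contain `value`."""
    kept = []
    for f in files:
        lo, hi = f.stats[column]
        if lo <= value <= hi:
            kept.append(f)
    return kept

files = [
    DataFile("f1.parquet", {"user_id": (1, 100)}),
    DataFile("f2.parquet", {"user_id": (101, 200)}),
    DataFile("f3.parquet", {"user_id": (201, 300)}),
]
# A point lookup on user_id = 150 only needs to scan f2.parquet.
hits = prune_files(files, "user_id", 150)
```

How well this works depends on how data is laid out across files, which is exactly why the platform invests in multi-dimensional sorting: sorting tightens each file's min/max ranges so more files can be skipped.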
Query Acceleration: Multi‑dimensional sorting (a Hilbert curve ordering, preferred over Z-order for better locality) reduces the number of files scanned; file‑level indexes (BloomFilter, Bitmap, BloomRF, and Token/N-gram variants) prune data further. Pre‑computed aggregates (cubes) are generated per file and stored in manifests, allowing a global merge at query time. Star‑Tree indexes improve cube performance across varied dimension combinations.
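To make the sorting idea concrete, here is a sketch of the simpler of the two curves mentioned above, Z-order, which interleaves the bits of multiple sort columns into a single key (the Hilbert curve the article prefers follows the same principle but preserves locality better; this is an illustration, not Bilibili's implementation):

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative coordinates into one
    Z-order key. Rows sorted by this key cluster nearby (x, y) pairs
    into the same files, so a predicate on *either* dimension prunes
    more files than a lexicographic sort on (x, y) would."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x supplies the even bits
        key |= ((y >> i) & 1) << (2 * i + 1)  # y supplies the odd bits
    return key

# Sorting rows by this key before writing groups multi-dimensional
# neighbors together, tightening per-file min/max stats on both columns.
rows = [(x, y) for x in range(4) for y in range(4)]
rows.sort(key=lambda r: z_order_key(*r))
```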
Intelligent Optimization (Magnus): The service automatically monitors Iceberg writes, triggers Spark jobs for sorting, indexing, and cube creation, visualizes table metadata, and provides recommendations based on query logs, table statistics, and user‑defined preferences.
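A commit-triggered optimization loop of this kind can be sketched as follows. All names and thresholds here are illustrative assumptions, not Magnus's actual API; the point is the shape of the decision: inspect the latest snapshot, then schedule asynchronous rewrite jobs only where they pay off.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Snapshot:
    """Hypothetical summary of one Iceberg commit."""
    snapshot_id: int
    added_files: int
    added_small_files: int  # files below the target size

@dataclass
class OptimizationPlan:
    table: str
    actions: List[str] = field(default_factory=list)

def plan_optimizations(table: str, snapshot: Snapshot,
                       small_file_threshold: int = 10) -> OptimizationPlan:
    """Decide which asynchronous Spark jobs to schedule after a commit."""
    plan = OptimizationPlan(table)
    if snapshot.added_small_files >= small_file_threshold:
        # Compaction doubles as a chance to re-sort (e.g. Hilbert order).
        plan.actions.append("compact-and-sort")
    if snapshot.added_files > 0:
        # New files lack file-level indexes until a job builds them.
        plan.actions.append("rebuild-file-indexes")
    return plan

plan = plan_optimizations("dw.ads_events",
                          Snapshot(42, added_files=25, added_small_files=12))
```

In a real service the plan would also consult query logs and table statistics, as the article describes, to decide sort columns, index types, and which cubes are worth materializing.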
Current Status: The platform serves BI reports, metric services, A/B testing, audience selection, and log analysis. Iceberg tables total ~5 PB with a daily growth of 75 TB; Trino handles ~200 k queries per day, with 95th‑percentile latency around 5 seconds, targeting sub‑second to 10‑second response times.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.