Design and Evolution of Bilibili's Billions 3.0 Log Platform: A Lakehouse Architecture with ClickHouse, Iceberg, and Trino
Bilibili evolved its log platform from the ClickHouse‑based Billions 2.0 to the Billions 3.0 lakehouse, built on Iceberg, HDFS, and Trino while retaining ClickHouse as an acceleration layer. The move cuts storage cost by more than 20%, improves observability, resolves the compute‑storage mismatch, adds flexible indexing, and supports complex ETL while staying fully open source.
The article presents the background and motivations for evolving Bilibili's log system from the ClickHouse‑based Billions 2.0 solution to the new Billions 3.0 architecture. Although Billions 2.0 reduced storage cost by 60%, several issues remained: high storage cost due to double‑replication, limited troubleshooting capability, the mismatch between compute and storage in a Share‑Nothing ClickHouse design, and costly ETL between ClickHouse and external big‑data systems.
Key challenges addressed:
Optimizing storage cost.
Improving log‑troubleshooting (observability) capability.
Handling the compute‑storage integration dilemma.
Supporting complex ETL and data‑processing requirements.
Increasing overall resource utilization.
Industry research: The team evaluated OpenSearch, ClickHouse, Loki, and several proprietary solutions, distinguishing between compute‑storage integrated and compute‑storage separated approaches. OpenSearch and Loki were deemed unsuitable for Bilibili’s scale and open‑source requirements.
Billions 3.0 architecture: The new platform adopts a lakehouse model built on the Iceberg table format, HDFS storage, and Trino as the default query engine, while retaining ClickHouse as an access‑acceleration layer. The architecture consists of:
Log‑agent (data collection, supports OTEL and multiple formats).
Log‑gateway (kafka‑proxy) for routing to different Kafka clusters.
Data pipeline (Kafka clusters for buffering).
Processing layer (log‑consumer written in Go for simple cases, Flink jobs for complex scenarios).
Log engine (ClickHouse + Iceberg + HDFS + Trino).
Query gateway (unified DSL/SQL interface).
Unified access (API, tenant management, monitoring).
Storage layer decisions: Iceberg was chosen as the table format for its industry acceptance and schema‑evolution support. HDFS was selected over object storage for its larger capacity and tighter integration with Bilibili’s existing big‑data ecosystem. Trino was preferred as the query engine because of its close collaboration with Iceberg, strong performance on log workloads (e.g., 1.4 trillion rows in 20 seconds), and containerized deployment.
Log table design: Tables are partitioned by day, with three field categories: common fields (e.g., timestamp, app_id), log_msg (text for full‑text search), and private business‑specific fields stored as Map/JSON.
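A minimal sketch of this three‑part layout (field names beyond `timestamp`, `app_id`, and `log_msg` are illustrative, not Bilibili's actual schema): common fields are promoted to top‑level columns, the message body goes into `log_msg` for full‑text search, and everything else falls into a private Map, with the day partition derived from the timestamp.

```python
from datetime import datetime, timezone

def to_table_row(event: dict) -> dict:
    """Split a raw log event into the three field categories: common
    fields, the full-text log_msg, and a private Map of business-specific
    fields. The day partition key is derived from the event timestamp."""
    ts = event["timestamp"]  # epoch milliseconds
    common = {
        "_timestamp": ts,
        "app_id": event["app_id"],
        # Day-granularity partition value, e.g. "20231114".
        "log_date": datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime("%Y%m%d"),
    }
    log_msg = event.get("message", "")
    # Everything not recognized as a common field lands in the private Map/JSON column.
    private = {k: v for k, v in event.items()
               if k not in ("timestamp", "app_id", "message")}
    return {**common, "log_msg": log_msg, "private": private}

row = to_table_row({"timestamp": 1700000000000, "app_id": "bili.live",
                    "message": "GET /room 200", "uid": "42"})
```

Keeping unpredictable business fields in a single Map column lets each tenant log arbitrary keys without schema changes, while the common columns stay cheap to index and partition on.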
Indexing strategy:
Forward indexes: time partitioning, Iceberg _timestamp sorting, MinMax metrics for Data Skipping, BloomFilter for low‑cardinality fields, and BloomRangeFilter for range queries.
Reverse indexes: TokenBloomFilter (a lightweight Bloom filter over the tokenized log_msg) and TokenBitMap (a Lucene‑based token dictionary plus bitmap for precise row‑level skipping). TokenBitMap performs better on medium‑ and low‑frequency terms but consumes more storage, so it is applied selectively.
Performance evaluation: Using a 330 GB ORC log dataset, TokenBloomFilter generated 2.1 GB of index, while TokenBitMap generated 76.6 GB. Query latency and scanned data volume were significantly lower for TokenBitMap on low‑frequency terms.
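The TokenBloomFilter idea can be approximated in a few lines: tokenize each log_msg, insert the tokens into a per‑file Bloom filter, and let the query planner skip any file whose filter rules the search term out. The tokenizer, filter size, and hash scheme below are illustrative assumptions, not Bilibili's actual implementation.

```python
import hashlib
import re

class TokenBloomFilter:
    """Tiny Bloom filter over tokens of log_msg. A real per-file index
    would size m (bits) and k (hash functions) from token cardinality."""
    def __init__(self, m: int = 8192, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, token: str):
        # Derive k independent bit positions via keyed blake2b digests.
        for i in range(self.k):
            h = hashlib.blake2b(token.encode(), person=bytes([i]) * 16).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add_msg(self, msg: str):
        for tok in re.findall(r"\w+", msg.lower()):
            for p in self._positions(tok):
                self.bits |= 1 << p

    def might_contain(self, token: str) -> bool:
        # False means the token is definitely absent from this file,
        # so the whole file can be skipped at query time.
        return all(self.bits >> p & 1 for p in self._positions(token.lower()))

# One filter per data file; the query path checks the search term first.
f = TokenBloomFilter()
f.add_msg("ERROR timeout connecting to upstream 10.0.0.1")
```

This explains the size gap in the evaluation above: a Bloom filter only answers "maybe present / definitely absent" per file, so it stays tiny (2.1 GB for 330 GB of data), whereas TokenBitMap pays for a full token dictionary and row‑level bitmaps to skip precisely.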
Additional innovations:
Map/JSON expression‑based indexes for dynamic fields.
Computation push‑down in log‑agent to reduce backend load.
Flexible consumption scheduling across high‑priority, dedicated, and general Kafka clusters.
Integration with Bilibili’s big‑data stack via Iceberg, enabling seamless batch and streaming processing.
Lightweight log clustering using a fixed‑depth parsing tree, tokenization, and pattern merging to provide fast, stable clustering results for anomaly detection.
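The fixed‑depth parsing‑tree approach to clustering (the Drain family of algorithms) can be sketched as: route each tokenized line through a shallow tree keyed by token count and its first few tokens, then merge it into an existing pattern when token‑level similarity is high enough, replacing mismatched tokens with a wildcard. The depth and threshold here are illustrative assumptions.

```python
class LogClusterer:
    """Drain-style sketch: a fixed-depth routing key plus pattern merging.
    Variable tokens collapse to "<*>" so similar lines share one pattern."""
    def __init__(self, depth: int = 2, sim_threshold: float = 0.5):
        self.depth = depth          # leading tokens used for tree routing
        self.sim = sim_threshold    # min fraction of matching tokens to merge
        self.tree = {}              # routing key -> list of token patterns

    def _key(self, toks):
        # Lines of different length or different prefix never compare.
        return (len(toks), *toks[: self.depth])

    def add(self, line: str):
        toks = line.split()
        group = self.tree.setdefault(self._key(toks), [])
        for i, pat in enumerate(group):
            same = sum(a == b for a, b in zip(pat, toks))
            if same / len(toks) >= self.sim:
                # Merge: keep matching tokens, wildcard the rest.
                group[i] = [a if a == b else "<*>" for a, b in zip(pat, toks)]
                return
        group.append(list(toks))    # no match: start a new pattern

    def patterns(self):
        return [" ".join(p) for pats in self.tree.values() for p in pats]

c = LogClusterer()
c.add("Connected to host 10.0.0.1 port 8080")
c.add("Connected to host 10.0.0.2 port 9090")
```

Because routing and merging are both single passes over the tokens, the result is stable and fast enough to run online, which is what makes the clustered patterns usable for anomaly detection.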
Overall benefits: The evolution lowered storage cost by at least 20%, improved system stability, reduced mean time to recovery (MTTR) through one‑stop analysis, clustering, and alerting, and kept the architecture open for future extensions.
Future work: Plans include ClickHouse multi‑cluster smooth splitting, enhanced log insight capabilities, tighter integration with OpenTelemetry, overseas cloud deployment, unified observability platform support, Iceberg meta‑service and index‑service research, more elastic data pipelines, and WebAssembly‑based dynamic operator push‑down in log‑agent.
For collaboration, interested engineers are invited to apply via the provided email address.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.