ClickHouse Projection: Design, Implementation, and Production Performance
This article presents an in‑depth overview of ClickHouse Projection, covering its background, definition, practical use cases, underlying architecture, query analysis, consistency guarantees, performance comparisons, and real‑world production results, highlighting how it enhances OLAP workloads while maintaining strong data consistency.
This write-up, based on Zheng Tianqi's talk at the Kuaishou Big Data Architecture Exchange, introduces Projection, the latest contribution to ClickHouse, and details its motivation, design, and impact on large-scale OLAP workloads.
ClickHouse Background: ClickHouse originated in a Russian web-analytics system (Yandex.Metrica). Its columnar storage, vectorized execution, and loosely coupled P2P distributed architecture have made it a popular open-source OLAP engine.
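Projection Concept: The idea is borrowed from Vertica. A Projection is a set of columns stored in a specific sort order or in pre-aggregated form, and it is defined with ALTER TABLE statements. A normal Projection stores the columns re-sorted by a different key; an aggregate Projection stores pre-aggregated results. Projections also support lazy materialization: adding one to an existing table does not rebuild the data already on disk until the Projection is explicitly materialized.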
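In ClickHouse, a normal Projection is declared with a statement of the form `ALTER TABLE events ADD PROJECTION p_by_device (SELECT * ORDER BY device_id)`, an aggregate one with a `SELECT ... GROUP BY ...` body, and `ALTER TABLE events MATERIALIZE PROJECTION p_by_device` builds it for parts already on disk (the table and projection names here are hypothetical). The Python toy below only models what each form stores; the row layout `(device_id, hour, domain, duration)` and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (device_id, hour, domain, duration)
Row = Tuple[str, int, str, int]


def normal_projection(rows: List[Row], key_index: int) -> List[Row]:
    """Normal form: the same rows and columns, stored in a different sort order."""
    return sorted(rows, key=lambda r: r[key_index])


def aggregate_projection(rows: List[Row]) -> Dict[Tuple[int, str], Tuple[int, int]]:
    """Aggregate form: one entry per (hour, domain) holding (count, sum(duration))."""
    agg = defaultdict(lambda: [0, 0])
    for _device, hour, domain, duration in rows:
        state = agg[(hour, domain)]
        state[0] += 1          # count()
        state[1] += duration   # sum(duration)
    return {key: (cnt, total) for key, (cnt, total) in agg.items()}


rows = [
    ("d2", 10, "a.com", 30),
    ("d1", 10, "a.com", 20),
    ("d3", 11, "b.com", 5),
]
by_device = normal_projection(rows, key_index=0)  # ordered d1, d2, d3
hourly = aggregate_projection(rows)               # 2 groups instead of 3 rows
```

The normal form costs a second copy of the data but keeps every row, while the aggregate form shrinks to one entry per group, which is why the two kinds trade storage for speed differently.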
Use Cases: Examples include speeding up device_id queries on a video_log table with a Projection that stores the data sorted by device_id, defining aggregate Projections for hourly domain-level metrics, and simplifying ETL pipelines by replacing separate materialized-view tables.
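Why re-sorting helps the device_id lookup can be sketched with a toy sparse index. ClickHouse keeps one primary-index mark per granule (8192 rows by default) and reads only granules whose key range can contain the filtered value. The model below is an assumption-laden simplification: it uses a hypothetical 4-row granule and made-up `(device_id, time)` rows, and counts granules read for a device_id filter on time-ordered base data versus on a copy sorted by device_id.

```python
import bisect
import math
from typing import List

GRANULE_ROWS = 4  # toy value; ClickHouse's default index_granularity is 8192


def full_scan_granules(n_rows: int) -> int:
    # The filter column is not the sort key, so no granule can be skipped.
    return math.ceil(n_rows / GRANULE_ROWS)


def sorted_lookup_granules(sorted_keys: List[str], key: str) -> int:
    # Sparse index: the first key of each granule.
    marks = sorted_keys[::GRANULE_ROWS]
    # The granule just before the first mark >= key may still end with `key`.
    start = max(bisect.bisect_left(marks, key) - 1, 0)
    # Every granule whose mark is <= key may contain it.
    end = bisect.bisect_right(marks, key)
    return end - start


# Base data ordered by time: each device's rows are spread across all granules.
base = [(f"d{i % 4}", i) for i in range(16)]
# Normal projection: the same rows ordered by device_id (then time).
proj = sorted(base)
proj_keys = [device for device, _ in proj]

scan_cost = full_scan_granules(len(base))           # all 4 granules
proj_cost = sorted_lookup_granules(proj_keys, "d2")  # 2 granules
matches = [r for r in proj if r[0] == "d2"]
```

In this toy the sorted copy reads half the granules; the saving grows as the table gets larger relative to the number of rows per device, because the base order still forces a full scan.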
Implementation Details: A Projection has three components: its definition, its storage, and query analysis. Each Projection is stored as a sub-Part inside the Part that holds the original data, so partition pruning applies to it, and merges and mutations process it together with its parent Part, which keeps the two consistent. During query analysis, ClickHouse automatically selects the most suitable Projection, so existing queries benefit without being rewritten.
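The automatic selection step can be pictured as "find the projections that can answer the query, then take the one that reads the least." The sketch below is a hypothetical simplification, not ClickHouse's analyzer (which works on index marks and full expression analysis): an aggregate projection qualifies only if it covers the query's grouping keys, filter column, and aggregates; normal projections are assumed to keep every column; estimated rows read decide the winner, and the base data wins ties. All names and row counts are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Projection:
    name: str
    rows: int                            # rows stored (toy estimate)
    sort_key: tuple = ()                 # leading column enables index pruning
    group_by: frozenset = frozenset()    # non-empty => aggregate projection
    aggregates: frozenset = frozenset()  # aggregate expressions it stores


@dataclass(frozen=True)
class Query:
    filter_column: Optional[str]         # column in WHERE, if any
    selectivity: float = 1.0             # fraction of rows matching the filter
    group_by: frozenset = frozenset()
    aggregates: frozenset = frozenset()


def can_answer(p: Projection, q: Query) -> bool:
    if not p.group_by:
        return True  # toy assumption: normal projections keep every column (SELECT *)
    needed = set(q.group_by) | ({q.filter_column} if q.filter_column else set())
    return bool(q.aggregates) and needed <= p.group_by and q.aggregates <= p.aggregates


def estimated_rows(p: Projection, q: Query) -> float:
    if p.sort_key and q.filter_column == p.sort_key[0]:
        return p.rows * q.selectivity  # filter on the leading sort column: prune
    return float(p.rows)               # otherwise read everything


def choose(base: Projection, projections: List[Projection], q: Query) -> str:
    candidates = [base] + [p for p in projections if can_answer(p, q)]
    # min() keeps the first of equal candidates, so the base data wins ties.
    return min(candidates, key=lambda p: estimated_rows(p, q)).name


base = Projection("base", rows=1_000_000, sort_key=("event_time",))
p_device = Projection("p_device", rows=1_000_000, sort_key=("device_id",))
p_hourly = Projection("p_hourly", rows=24_000, sort_key=("hour", "domain"),
                      group_by=frozenset({"hour", "domain"}),
                      aggregates=frozenset({"sum(duration)", "count()"}))
projs = [p_device, p_hourly]

q_device = Query("device_id", selectivity=0.001)
q_hourly = Query(None, group_by=frozenset({"hour", "domain"}),
                 aggregates=frozenset({"sum(duration)"}))
q_time = Query("event_time", selectivity=0.01)
```

Here the device_id filter lands on `p_device`, the hourly rollup on `p_hourly`, and a time-range filter stays on the base data, all without the query text changing.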
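Consistency Guarantees: An INSERT writes each block into the original Part and into all of that Part's Projections. A SELECT can combine Projection data from some Parts with original data from others, and because a Part and its Projections are merged and replaced atomically, a query never sees them out of step. A MUTATION re-materializes the affected Projections.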
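These guarantees can be modeled as one invariant: every materialized projection equals what would be rebuilt from its Part's base rows. The toy below is a sketch under stated assumptions (hypothetical columns `(device_id, domain, duration)`, one normal and one aggregate projection, no concurrency, replication, or on-disk format). It checks that INSERT, MERGE, and a delete MUTATION preserve the invariant, and shows a SELECT combining projection data from one Part with base rows from a Part written before the projection existed.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, int]  # hypothetical columns: (device_id, domain, duration)


def build_sorted(rows: List[Row]) -> List[Row]:
    # Normal projection: the same rows ordered by device_id.
    return sorted(rows)


def build_agg(rows: List[Row]) -> Dict[str, int]:
    # Aggregate projection: sum(duration) per domain.
    agg: Dict[str, int] = {}
    for _device, domain, duration in rows:
        agg[domain] = agg.get(domain, 0) + duration
    return agg


def merge_agg(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


@dataclass
class Part:
    rows: List[Row]
    by_device: Optional[List[Row]]        # None: projection not materialized here
    per_domain: Optional[Dict[str, int]]


def insert(block: List[Row]) -> Part:
    # INSERT: base rows and every projection go into the same new part.
    return Part(list(block), build_sorted(block), build_agg(block))


def merge(a: Part, b: Part) -> Part:
    # MERGE: projection data is merged alongside the base rows, and the result
    # replaces both inputs as one unit. Assumes both inputs have the projections.
    return Part(a.rows + b.rows,
                list(heapq.merge(a.by_device, b.by_device)),
                merge_agg(a.per_domain, b.per_domain))


def mutate_delete(part: Part, predicate) -> Part:
    # MUTATION: rewrite the base rows, then re-materialize projections from them.
    kept = [r for r in part.rows if not predicate(r)]
    return Part(kept, build_sorted(kept), build_agg(kept))


def consistent(part: Part) -> bool:
    # Invariant: a materialized projection always matches its part's base rows.
    return (part.by_device == build_sorted(part.rows)
            and part.per_domain == build_agg(part.rows))


def query_sum_per_domain(parts: List[Part]) -> Dict[str, int]:
    # SELECT: parts with the projection answer from it; others use base rows.
    total: Dict[str, int] = {}
    for part in parts:
        source = part.per_domain if part.per_domain is not None else build_agg(part.rows)
        total = merge_agg(total, source)
    return total


p1 = insert([("d2", "a.com", 30), ("d1", "b.com", 10)])
p2 = insert([("d1", "a.com", 5)])
merged = merge(p1, p2)
after = mutate_delete(merged, lambda r: r[0] == "d1")
old = Part([("d3", "b.com", 7)], None, None)  # written before the projection existed
```

Because each projection lives in, and is replaced with, its own Part, the invariant only has to hold per Part; the per-Part results are then combined at query time.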
Feature Comparison and Production Impact: In production, Projections cut query latency substantially (up to 150× in some cases) and enabled high-concurrency dashboard rendering. The storage overhead is about 40% for normal Projections and negligible for aggregate ones. Limitations: Projections work at part granularity, so each one is built within a single Part and cannot span Parts, and they do not support joins.
Conclusion: Projections act as a production-ready materialized view mechanism for ClickHouse, offering strong consistency, automatic optimization, and significant performance gains for OLAP workloads.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.