Enhancing ClickHouse Resource Isolation with ByteHouse Resource Group
This article explains how ByteHouse extends ClickHouse with a Resource Group mechanism that provides fine‑grained concurrency, memory, and CPU isolation, improving query latency, reducing variance, and increasing cluster stability for large‑scale ad‑tech workloads.
ClickHouse is renowned for its powerful analytical performance, but in ByteDance's large‑scale production use cases it still suffers from limitations such as lack of full upsert/delete support, weak multi‑table joins, degraded availability at large cluster sizes, and no resource isolation.
To address these issues, ByteHouse—a ClickHouse‑based enhanced version—introduces a Resource Group component that partitions resources (concurrency, memory, CPU) into separate groups with parent‑child relationships, allowing queries to be scheduled based on defined rules and resource availability.
Concurrency control: The max_concurrent_queries setting limits the number of simultaneous queries per group. When the limit is reached, additional queries are queued and run in FIFO order as slots free up. A parent group's quota is shared among its children, so a child query needs a free slot in both its own group and its parent.
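The queuing behaviour described above can be sketched as a toy model. This is an illustration only, not ByteHouse's implementation: the `ResourceGroup` class, its `submit` and `finish` methods, and the group names are hypothetical; only `max_concurrent_queries` comes from the text. For brevity, a finishing query only wakes waiters in its own group; a fuller scheduler would presumably also check sibling queues.

```python
from collections import deque
from typing import Optional


class ResourceGroup:
    """Hypothetical model: a concurrency cap plus a FIFO wait queue per group."""

    def __init__(self, name: str, max_concurrent_queries: int,
                 parent: Optional["ResourceGroup"] = None):
        self.name = name
        self.max_concurrent_queries = max_concurrent_queries
        self.parent = parent
        self.running = 0      # queries running in this group or its descendants
        self.queue = deque()  # query ids waiting for a slot, oldest first

    def _has_slot(self) -> bool:
        # A query needs a free slot in this group and in every ancestor.
        group = self
        while group is not None:
            if group.running >= group.max_concurrent_queries:
                return False
            group = group.parent
        return True

    def _occupy(self, delta: int) -> None:
        # Charge (or release) one slot along the whole parent chain.
        group = self
        while group is not None:
            group.running += delta
            group = group.parent

    def submit(self, query_id: str) -> bool:
        """Return True if the query starts now, False if it was queued."""
        if not self.queue and self._has_slot():
            self._occupy(+1)
            return True
        self.queue.append(query_id)
        return False

    def finish(self) -> Optional[str]:
        """Release a slot, then start the oldest waiting query if one fits."""
        self._occupy(-1)
        if self.queue and self._has_slot():
            self._occupy(+1)
            return self.queue.popleft()
        return None


if __name__ == "__main__":
    root = ResourceGroup("root", max_concurrent_queries=2)
    ads = ResourceGroup("ads", max_concurrent_queries=1, parent=root)
    print(ads.submit("q1"), ads.submit("q2"))  # second query is queued
    print(ads.finish())                        # q1 ends, q2 takes the slot
```

Charging each running query to every ancestor is one simple way to make children share a parent's quota: a child can be under its own cap and still have to wait because its siblings have used up the parent's slots.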
Memory control: Each group can define a soft memory limit; queries exceeding this limit are placed in a waiting queue. The system estimates a query's memory usage when the query starts, then corrects that estimate during execution using the Memory_Tracker.
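A minimal sketch of this estimate-then-correct accounting follows. It rests on several assumptions that the text does not confirm: that admission compares the group's current usage plus the new query's estimate against the soft limit, that waiting queries are released in FIFO order, and that running queries are never killed when a correction pushes usage past the limit. The `MemoryGroup` class and its method names are hypothetical.

```python
from collections import deque


class MemoryGroup:
    """Hypothetical model of a group with a soft memory limit."""

    def __init__(self, soft_limit_bytes: int):
        self.soft_limit_bytes = soft_limit_bytes
        self.charged = {}       # query_id -> bytes currently charged to the group
        self.waiting = deque()  # (query_id, estimated_bytes), oldest first

    def used(self) -> int:
        return sum(self.charged.values())

    def admit(self, query_id: str, estimated_bytes: int) -> bool:
        """Start the query if its estimate fits under the limit, else queue it."""
        if self.used() + estimated_bytes > self.soft_limit_bytes:
            self.waiting.append((query_id, estimated_bytes))
            return False
        self.charged[query_id] = estimated_bytes
        return True

    def correct(self, query_id: str, tracked_bytes: int) -> None:
        """Replace the start-time estimate with the usage measured at runtime."""
        # The limit is soft: usage may rise above it here without killing anything.
        self.charged[query_id] = tracked_bytes

    def release(self, query_id: str) -> list:
        """Free a finished query's memory and start waiters that now fit."""
        del self.charged[query_id]
        started = []
        while self.waiting and \
                self.used() + self.waiting[0][1] <= self.soft_limit_bytes:
            waiting_id, estimate = self.waiting.popleft()
            self.charged[waiting_id] = estimate
            started.append(waiting_id)
        return started


if __name__ == "__main__":
    group = MemoryGroup(soft_limit_bytes=100)
    group.admit("a", 40)
    group.admit("b", 50)
    print(group.admit("c", 30))  # does not fit, so it waits
    group.correct("a", 80)       # runtime tracking shows "a" uses more
    print(group.release("b"), group.release("a"))
```

In this sketch a query whose estimate alone exceeds the limit would wait indefinitely; a real system would need a policy for that case.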
CPU control: ByteHouse uses the Linux cgroups CPU controller (CFS scheduler) to allocate CPU shares per group. Under contention, the fraction of CPU a group receives is cpu_shares / sum(cpu_shares) taken over the competing groups. This guarantees each group a minimum share while still letting a single active group use the full CPU.
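The proportional-share rule can be worked through with a few lines of arithmetic. The group names and share values below are made up for illustration, and the helper `cpu_fractions` is hypothetical; it only evaluates the cpu_shares / sum(cpu_shares) formula over whichever groups are currently busy and does not touch cgroups itself.

```python
def cpu_fractions(cpu_shares: dict, active: set) -> dict:
    """Fraction of CPU each group gets when only `active` groups are busy."""
    total = sum(cpu_shares[name] for name in active)
    if total == 0:
        # No busy groups: nobody is consuming CPU.
        return {name: 0.0 for name in cpu_shares}
    return {
        name: (cpu_shares[name] / total if name in active else 0.0)
        for name in cpu_shares
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from the article.
    shares = {"ads": 1024, "etl": 512, "adhoc": 512}
    print(cpu_fractions(shares, {"ads", "etl", "adhoc"}))  # all groups compete
    print(cpu_fractions(shares, {"etl"}))                  # etl runs alone
```

With all three groups busy, the group holding half of the total shares gets half of the CPU. When a single group runs alone, the sum in the denominator is just its own shares, so it can use the whole CPU.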
The Resource Group brings several benefits: it significantly improves query experience for high‑priority workloads, reduces query latency variance, prevents OOM‑induced query kills, and enhances overall cluster stability.
Performance tests on the advertising workload show average query latency dropping from 2.3‑14.1 seconds (pre‑deployment) to 0.4‑1.7 seconds after deploying ByteHouse Resource Group, demonstrating the effectiveness of the isolation mechanisms.
ByteHouse is now publicly available with multiple versions to suit different users, and a free trial can be requested via the official website.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.