
How ByteHouse Enhances ClickHouse with Resource Isolation and High Availability

This article explains how ByteHouse, an enhanced version of ClickHouse used at ByteDance, adds full upsert support, multi‑table joins, high‑availability features, and, most importantly, a Resource Group mechanism that provides fine‑grained CPU, memory, and concurrency isolation to improve query performance and stability.

ByteDance Data Platform

ByteHouse is ByteDance's internally enhanced version of ClickHouse. It addresses four limitations of the open-source engine:

Missing complete upsert and delete operations.

Weak multi‑table join capabilities.

Degraded availability when the cluster scales large.

Lack of resource isolation.

To overcome these issues, ByteDance launched a five‑part enhancement plan covering upsert support, multi‑table joins, query optimization, high availability, and resource isolation. This article focuses on the resource isolation component.

Resource Group Solution

ByteHouse introduces a Resource Group component that partitions CPU, memory, and concurrency resources into distinct groups. Groups are organized in parent-child relationships, so child groups can share their parent's resources while isolation between groups is still enforced.

Concurrency Control

The max_concurrent_queries setting limits the number of simultaneous queries per group. When the limit is reached, additional queries are queued until a running query finishes, after which the earliest queued query is executed. If a child group's queue is empty, the parent group releases resources, enabling queued queries in sibling groups.
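The queuing behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not ByteHouse's implementation: the ResourceGroup class, its submit and finish methods, and the rule of draining the group's own queue before sibling queues are assumptions made for the example. Only the max_concurrent_queries name comes from the article.

```python
from collections import deque


class ResourceGroup:
    """Hypothetical model of a resource group with a concurrency cap."""

    def __init__(self, name, max_concurrent_queries, parent=None):
        self.name = name
        self.max_concurrent_queries = max_concurrent_queries
        self.parent = parent
        self.children = []
        self.running = 0      # queries running in this group and its descendants
        self.queue = deque()  # FIFO queue of waiting query ids
        if parent is not None:
            parent.children.append(self)

    def _has_capacity(self):
        # A query needs a free slot in this group and in every ancestor.
        group = self
        while group is not None:
            if group.running >= group.max_concurrent_queries:
                return False
            group = group.parent
        return True

    def _start(self, query_id, started):
        # Occupy one slot in this group and in every ancestor.
        group = self
        while group is not None:
            group.running += 1
            group = group.parent
        started.append((self.name, query_id))

    def submit(self, query_id):
        """Start the query if a slot is free, otherwise queue it."""
        if self._has_capacity():
            self._start(query_id, [])
            return True
        self.queue.append(query_id)
        return False

    def finish(self):
        """Release one slot, then start queued queries that now fit.

        Assumption: the group's own queue is drained first, then siblings'.
        Returns the list of (group name, query id) pairs that were started.
        """
        group = self
        while group is not None:
            group.running -= 1
            group = group.parent
        started = []
        candidates = [self]
        if self.parent is not None:
            candidates += [c for c in self.parent.children if c is not self]
        for candidate in candidates:
            while candidate.queue and candidate._has_capacity():
                candidate._start(candidate.queue.popleft(), started)
        return started
```

For example, with a parent capped at two queries and two children a and b, two running queries in a force a query submitted to b to wait. When one query in a finishes and a has nothing queued, the freed parent slot goes to b's waiting query.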

Memory Control

Each group can define a soft memory limit. Queries exceeding this limit are placed in a waiting queue, and the parent group's limit is also considered, allowing memory sharing across groups. Since precise memory-usage estimation is unavailable, ByteHouse uses an estimate-plus-correction approach based on the memory_tracker.
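One way to picture the estimate-plus-correction idea is the sketch below. It is an illustration under stated assumptions, not ByteHouse code: the MemoryGroup class, the admit, correct, and release methods, and the byte figures are hypothetical, and the tracked value passed to correct stands in for whatever the memory_tracker reports at runtime. Retrying queued queries is left out to keep the sketch short.

```python
class MemoryGroup:
    """Hypothetical model of a group with a soft memory limit."""

    def __init__(self, name, soft_limit, parent=None):
        self.name = name
        self.soft_limit = soft_limit
        self.parent = parent
        self.used = 0       # bytes charged to this group and its descendants
        self.charged = {}   # query id -> bytes currently charged here
        self.waiting = []   # (query id, estimate) pairs that did not fit

    def _chain(self):
        # This group followed by each ancestor up to the root.
        group = self
        while group is not None:
            yield group
            group = group.parent

    def _charge(self, query_id, amount):
        # Replace the previous charge for this query with the new amount.
        delta = amount - self.charged.get(query_id, 0)
        self.charged[query_id] = amount
        for group in self._chain():
            group.used += delta

    def admit(self, query_id, estimate):
        """Admit using an up-front estimate; queue if any limit would be hit."""
        if any(g.used + estimate > g.soft_limit for g in self._chain()):
            self.waiting.append((query_id, estimate))
            return False
        self._charge(query_id, estimate)
        return True

    def correct(self, query_id, tracked):
        """Swap the estimate for the usage observed while the query runs."""
        self._charge(query_id, tracked)

    def release(self, query_id):
        """Return the query's memory to the group and its ancestors."""
        self._charge(query_id, 0)
        del self.charged[query_id]
```

In this model a query that was admitted on a pessimistic estimate frees headroom in its parent as soon as the tracked usage turns out lower, which lets a sibling group's query fit where it previously would have waited.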

CPU Control

ByteHouse leverages Linux cgroups' CPU controller. CPU shares are allocated according to predefined cpu_shares values. The effective CPU proportion for a group is cpu_shares / sum(cpu_shares) when multiple groups are active, or 100% when only one group runs. Thus the usable CPU range is [cpu_shares / sum(cpu_shares), 100%], guaranteeing a minimum CPU share for each group and high overall CPU utilization.
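The arithmetic behind that range can be checked with a short Python sketch. The function names, the group names, and the share values are hypothetical and chosen only for illustration. The code computes proportions and does not touch cgroups itself.

```python
def guaranteed_minimum(cpu_shares, group):
    """Lower bound of the range: the group's share when every group is busy."""
    return cpu_shares[group] / sum(cpu_shares.values())


def effective_cpu_fractions(cpu_shares, active_groups):
    """Share of CPU each active group receives; idle groups' shares are reused."""
    total = sum(cpu_shares[name] for name in active_groups)
    return {name: cpu_shares[name] / total for name in active_groups}


# Hypothetical configuration: a dashboard group weighted 3x an ETL group.
shares = {"etl": 1024, "dashboard": 3072}

both_busy = effective_cpu_fractions(shares, {"etl", "dashboard"})
etl_alone = effective_cpu_fractions(shares, {"etl"})
```

With both groups active, etl receives 1024 / 4096 of the CPU, which is its guaranteed minimum. When dashboard is idle, etl can use the whole machine, which is the upper end of the range.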

Impact of Resource Groups

Resource Groups dramatically improve query experience by protecting high‑priority workloads, reducing query latency variance, and preventing out‑of‑memory kills that could destabilize the cluster.

Full‑stack isolation for CPU, memory, and concurrency.

Limiting impact of low‑priority queries.

Mitigating adverse effects of heavy write statements.

Performance tests in ByteDance's advertising business show average query latency dropping from a range of 2.3 to 14.1 seconds before deployment to a range of 0.4 to 1.7 seconds after deployment.

Written by

ByteDance Data Platform

The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
