Enhancing ClickHouse Multi‑Table Join Capability with ByteHouse
This article explains the limitations of ClickHouse for multi‑table joins, describes ByteHouse’s staged execution model, various join strategies (Shuffle, Broadcast, Colocate) and runtime filters, and presents performance benchmarks that show significant speed‑ups over the original ClickHouse engine.
ClickHouse is renowned for its high‑performance single‑table analytics, but in large‑scale production at ByteDance it shows several shortcomings: lack of full upsert/delete support, weak multi‑table join capabilities, reduced availability at scale, and no resource isolation.
To overcome these issues, ByteHouse extends ClickHouse by redesigning the query execution engine to handle complex multi‑table joins through a multi‑stage architecture. Instead of ClickHouse’s two‑stage model, a query is split into several independent Stages that exchange data via three exchange patterns: key‑based shuffle, gather, and broadcast.
The system defines clear module interfaces to minimize coupling and adopts a plugin‑based design for flexible strategy configuration. Based on data size and distribution, ByteHouse currently supports three join implementations:
Shuffle Join – the most generic join, shuffling rows by join key across workers.
Broadcast Join – suitable for joining a large table with a small one by broadcasting the small table to all workers.
Colocate Join – used when both tables share the same key distribution, eliminating data movement.
Join operators are often the most expensive in OLAP workloads. ByteHouse improves them by optimizing hash table implementations, parallelism, and by reducing the amount of data participating in the join. Runtime Filters further prune non‑matching rows on the probe side, especially effective in star‑schema queries.
Performance tests on the SSB 100G benchmark demonstrate the benefits. Without materializing a wide table, ClickHouse 22.2.3.1 frequently fails or exceeds 60 seconds, whereas ByteHouse 2.0.1 consistently returns results within one second. Specific cases show:
Case 1 (Hash Join with a large right table): query time drops from 17.210 s to 1.749 s.
Case 2 (Five‑table join without runtime filter): query time drops from 8.583 s to 4.464 s, with further gains expected when runtime filters are enabled.
These results illustrate that strengthening multi‑table join capabilities allows ByteHouse to support diverse analytical workloads without relying on pre‑flattened wide tables, delivering a more flexible and efficient analysis experience.
ByteHouse’s superior performance is largely attributed to its custom query optimizer, which addresses key limitations of the open‑source ClickHouse engine. Future articles will dive deeper into the optimizer’s design and the user‑visible improvements it brings.
ByteHouse is now publicly available; interested users can request a free trial on the official website.
SQL SELECT sum(LO_REVENUE) - sum(LO_SUPPLYCOST) AS profit FROM customer INNER JOIN (SELECT LO_REVENUE, LO_SUPPLYCOST, LO_CUSTKEY FROM lineorder WHERE toYear(LO_ORDERDATE) = 1997 and toMonth(LO_ORDERDATE) = 1) as lineorder ON LO_CUSTKEY = C_CUSTKEY WHERE C_REGION = 'AMERICA' SQL SELECT D_YEAR, S_CITY, P_BRAND, sum(LO_REVENUE) - sum(LO_SUPPLYCOST) AS profit FROM ssb1000.lineorder INNER JOIN (SELECT C_CUSTKEY FROM ssb1000.customer WHERE C_REGION = 'AMERICA') AS customer ON LO_CUSTKEY = C_CUSTKEY INNER JOIN (SELECT D_DATEKEY, D_YEAR FROM date WHERE (D_YEAR = 1997) OR (D_YEAR = 1998)) AS dates ON LO_ORDERDATE = toDate(D_DATEKEY) INNER JOIN (SELECT S_SUPPKEY, S_CITY FROM ssb1000.supplier WHERE S_NATION = 'UNITED STATES') AS supplier ON LO_SUPPKEY = S_SUPPKEY INNER JOIN (SELECT P_PARTKEY, P_BRAND FROM ssb1000.part WHERE P_CATEGORY = 'MFGR#14') AS part ON LO_PARTKEY = P_PARTKEY GROUP BY D_YEAR, S_CITY, P_BRAND ORDER BY D_YEAR ASC, S_CITY ASC, P_BRAND ASC SETTINGS enable_distributed_stages = 1, exchange_source_pipeline_threads = 32DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.