
Understanding ClickHouse Distributed DDL Execution: Cases, Principles, and Mitigation Guide

This article analyzes ClickHouse distributed DDL execution by presenting typical failure scenarios, dissecting the underlying Zookeeper‑based workflow, and offering practical mitigation steps to avoid DDL timeouts and improve cluster stability for large‑scale data operations.

DataFunSummit

The article begins with a typical case where a ClickHouse cluster with three shards and replicated tables experiences DDL timeouts when altering or deleting data across the cluster, causing subsequent DDL tasks to block.

Two statements are shown as examples of problematic DDL:

```sql
ALTER TABLE t1 ON CLUSTER '{cluster}' MODIFY TTL dt + INTERVAL 1 WEEK;
ALTER TABLE t1 ON CLUSTER '{cluster}' DELETE WHERE c1 = 1;
```

Operational responses such as restarting ClickHouse or ZooKeeper, deleting ZooKeeper nodes, detaching/attaching tables, and killing long‑running mutations are listed, highlighting the difficulty of choosing the safest fix.

The article then explains the execution principle: a client sends a DDL request to a coordinator node, which stores the task in ZooKeeper under /distributed_ddl/queue-xxxx. The task is split into sub-tasks for each shard and replica, with coordination via the ZooKeeper child nodes active, shards, and finished.
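The queue described above can be inspected from any replica through the system.zookeeper table. A minimal sketch, assuming the cluster uses the /distributed_ddl path mentioned in the article (the stock default is /clickhouse/task_queue/ddl, and the task name query-0000000123 is a hypothetical example):

```sql
-- List the pending and recent distributed DDL task entries.
SELECT name, value
FROM system.zookeeper
WHERE path = '/distributed_ddl';

-- Per-task coordination nodes (active, finished) live under each entry;
-- 'query-0000000123' is a placeholder task name for illustration.
SELECT name
FROM system.zookeeper
WHERE path = '/distributed_ddl/query-0000000123/finished';
```

If the finished node is missing entries for some hosts, those replicas have not yet executed (or acknowledged) the task, which is exactly the situation that leads to the 180 s coordinator timeout.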

Key steps include creating task directories, waiting for completion signals (default 180 s timeout), and the role of the DDLWorker thread that processes tasks serially on each replica. The article shows how logs and mutations are written to ZooKeeper paths like /log/log-xxxxxx and /mutations/xxxxxx, and how replicas pull these entries into local queues for execution.
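The 180 s wait corresponds to the distributed_ddl_task_timeout setting, and the per-replica local queue can be observed directly. A sketch of both, assuming a reasonably recent ClickHouse release (column sets vary slightly across versions):

```sql
-- Coordinator-side wait: how long (seconds) to wait for every host to
-- report into the finished/ node before returning a timeout to the client.
SET distributed_ddl_task_timeout = 180;

-- Replica-side view: log entries pulled from ZooKeeper into the local
-- replication queue, waiting for the DDLWorker/background threads.
SELECT database, table, type, create_time, last_exception
FROM system.replication_queue;
```

A growing replication_queue with old create_time values on one replica is a common sign that a single heavy entry is holding up everything behind it.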

Because DDLWorker runs with a default pool size of 1, tasks are processed sequentially; increasing the pool size can improve concurrency but may break ordering guarantees.
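The pool size is a server-level setting in config.xml. A hedged fragment, assuming a ClickHouse version new enough to support pool_size under distributed_ddl (verify against your release's documentation before relying on it):

```xml
<!-- config.xml: distributed DDL queue settings.
     pool_size = 1 (the default) means strictly serial task processing;
     raising it allows concurrent DDL but may break ordering guarantees. -->
<distributed_ddl>
    <path>/distributed_ddl</path>
    <pool_size>1</pool_size>
</distributed_ddl>
```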

Root causes of DDL blockage are identified as large data modifications (e.g., a 90 GB merge) that occupy the single worker thread, causing later DDL statements to time out.

The mitigation guide advises:

1. Clearly define the operation scope to reduce I/O.
2. Minimize I/O and enable delayed materialization or deletion via parameters such as materialize_ttl_after_modify=0 and ttl_only_drop_parts=1.
3. Use partition keys, or replace deletes with updates.
4. Monitor the cluster with queries such as show processlist, select * from system.merges, and select * from system.mutations where is_done=0, killing long-running mutations when necessary.
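The recommendations above can be sketched as concrete statements. This is illustrative only: the partition value and mutation_id below are hypothetical placeholders, and ttl_only_drop_parts is a MergeTree table setting applied via MODIFY SETTING:

```sql
-- (2) Delayed materialization: don't rewrite existing parts on MODIFY TTL,
--     and drop whole expired parts instead of filtering individual rows.
SET materialize_ttl_after_modify = 0;
ALTER TABLE t1 ON CLUSTER '{cluster}' MODIFY SETTING ttl_only_drop_parts = 1;

-- (3) Prefer dropping a partition over a row-level DELETE where the
--     partition key allows it ('2023-01-01' is a placeholder partition).
ALTER TABLE t1 ON CLUSTER '{cluster}' DROP PARTITION '2023-01-01';

-- (4) Find unfinished mutations, then kill a stuck one if necessary
--     (the mutation_id shown is a placeholder).
SELECT database, table, mutation_id, command, create_time
FROM system.mutations
WHERE is_done = 0;

KILL MUTATION
WHERE database = 'default' AND table = 't1'
  AND mutation_id = 'mutation_123.txt';
```

Dropping a partition is a metadata operation on whole parts, which is why it avoids the heavy rewrite that makes ALTER ... DELETE monopolize the worker thread.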

Finally, the article mentions a community-driven lightweight delete project (GitHub PR #37893) that may soon be production-ready.

Tags: Performance, Big Data, ZooKeeper, ClickHouse, Database Operations, Distributed DDL
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
