Databases 5 min read

Resolving Stuck Distributed DDL in ClickHouse: Diagnosis and Workaround

An online ClickHouse 10-node cluster experienced a distributed DDL task that hung for over 100 hours, blocking subsequent operations; the article explains the FIFO execution model, diagnosis steps, and a workaround using table renaming to unblock the queue, recommending version upgrade.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Resolving Stuck Distributed DDL in ClickHouse: Diagnosis and Workaround

Background: An online ClickHouse cluster with 10 nodes (5 shards × 2 replicas) running version 19.7.3 was used to create a distributed table. Nine nodes succeeded, but one node returned an error indicating the DDL task exceeded the timeout.

Code: 159. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: Watching task /clickhouse/task_queue/ddl/query‑0003271440 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background.

Diagnosis: The problematic node showed a long‑running distributed DDL in its process list, lasting over 100 hours and blocking subsequent tasks. ClickHouse stores distributed DDL tasks in ZooKeeper under /clickhouse/task_queue/ddl and executes them FIFO per node, so a single stuck task blocks the queue.

select * from zookeeper where path='/clickhouse/task_queue/ddl' order by ctime desc\G

Attempts to kill the task or restart the instance failed; even a kill ‑9 left the task lingering. The workaround was to rename the affected table, which caused the hanging DDL (an OPTIMIZE TABLE operation) to finish, allowing the queued tasks to proceed. After the queue cleared, the table was renamed back.

rename table log_catch_data_local to log_catch_data_local1;

Summary: ClickHouse’s distributed DDL runs serially per node in FIFO order, so any DDL that hangs blocks all later DDLs. Renaming the table can bypass a stuck OPTIMIZE TABLE DDL; for schema changes, rename then apply local DDL. If necessary, drop the local table on the problematic node and let it be recreated from replicas. Upgrading from the outdated 19.x series to 20.x is also recommended.

PerformanceSQLZookeeperClickHouseDatabase TroubleshootingDistributed DDL
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.