
Investigation of Dubbo Thread‑Pool Exhaustion Caused by Redis Cluster Pipeline Deadlock

The article details how a Dubbo thread‑pool exhaustion incident was traced to a deadlock in Redis Cluster pipeline usage, caused by Jedis pools lacking a borrow timeout, which let threads block indefinitely while holding cross‑node connections, and recommends configuring maxWaitMillis or enlarging the pool to prevent recurrence.

vivo Internet Technology

This article documents a complete troubleshooting process for a Dubbo thread‑pool exhaustion incident. The root cause was identified as a deadlock triggered by using Redis Cluster pipeline without a timeout configuration.

Background

Redis Pipeline is a mechanism for batching commands to reduce network latency. In cluster mode, a JedisClusterPipeline aggregates commands for multiple Redis nodes, which can improve throughput for batch queries such as fetching reservation game information.

The problematic scenario involved batch querying a Redis Cluster using an internal JedisClusterPipeline utility. The code used was:

JedisClusterPipeline jedisClusterPipeline = redisService.clusterPipelined();
List<Object> response;
try {
    // Queue one HMGET per key; commands are buffered per target node.
    for (String key : keys) {
        jedisClusterPipeline.hmget(key, VALUE1, VALUE2);
    }
    // Flush the buffered commands to each node and collect the replies.
    response = jedisClusterPipeline.syncAndReturnAll();
} finally {
    // Return the per-node connections to their pools.
    jedisClusterPipeline.close();
}

Fault Record

An alert indicated that the Dubbo thread pool was exhausted. Only one machine showed the problem, and its completed-task count never increased. Monitoring showed zero request traffic on that instance, as if the JVM were hung.

Using arthas, all 400 Dubbo threads were observed in the WAITING state. Stack traces revealed that the threads were blocked while trying to borrow a Redis connection from the Jedis pool.

Analysis – Why Threads Were Waiting

The borrowObject method of GenericObjectPool (used internally by Jedis) blocks indefinitely when blockWhenExhausted is true and borrowMaxWaitMillis is negative (the default is -1). The relevant source snippet:

public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    ...
    while (p == null) {
        if (blockWhenExhausted) {
            p = idleObjects.pollFirst();
            if (p == null) {
                if (borrowMaxWaitMillis < 0) {
                    p = idleObjects.takeFirst(); // blocks forever
                } else {
                    p = idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                }
            }
            if (p == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            if (!p.allocate()) {
                p = null;
            }
        }
        ...
    }
    return p.getObject();
}
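The branch above can be exercised with plain JDK primitives. The following is a minimal, self-contained sketch (not the commons-pool2 code itself) that mirrors the pollFirst/takeFirst logic: a negative maxWait parks the caller forever, while a non-negative one bounds the wait and surfaces exhaustion as an exception.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class BorrowTimeoutDemo {

    // Mirrors GenericObjectPool.borrowObject(): a negative maxWait blocks
    // forever on takeFirst(); a non-negative maxWait bounds the wait and
    // turns exhaustion into a NoSuchElementException instead of a hang.
    static Object borrow(LinkedBlockingDeque<Object> idle, long maxWaitMillis)
            throws InterruptedException {
        Object p = idle.pollFirst();               // fast path: grab an idle object
        if (p == null) {
            if (maxWaitMillis < 0) {
                p = idle.takeFirst();              // blocks until something is returned
            } else {
                p = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
            }
        }
        if (p == null) {
            throw new NoSuchElementException("Timeout waiting for idle object");
        }
        return p;
    }

    public static void main(String[] args) throws Exception {
        LinkedBlockingDeque<Object> pool = new LinkedBlockingDeque<>();
        pool.add("conn-1");
        System.out.println(borrow(pool, 100));     // succeeds: conn-1
        try {
            borrow(pool, 100);                     // pool now empty: bounded wait, then throws
        } catch (NoSuchElementException e) {
            System.out.println("timed out: " + e.getMessage());
        }
        // borrow(pool, -1) here would park this thread forever, which is
        // exactly what the Dubbo threads in the incident were doing.
    }
}
```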

Because the application never set borrowMaxWaitMillis, any thread that hit an exhausted pool waited indefinitely for a free connection.
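A minimal sketch of the preventive configuration, assuming the pool is built from a JedisPoolConfig (which extends commons-pool2's GenericObjectPoolConfig; in the Jedis 2.x line the property is maxWaitMillis, and the values below are illustrative, not the project's actual settings):

```java
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxTotal(20);             // per-node pool size
poolConfig.setBlockWhenExhausted(true); // still queue borrowers when the pool is empty...
poolConfig.setMaxWaitMillis(2000);      // ...but bound the wait: fail fast instead of hanging
```

With a bound in place, borrowObject throws NoSuchElementException after the timeout instead of parking the Dubbo thread forever.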

Analysis – Why No Connection Was Obtained

Four hypotheses were examined:

Redis was unreachable – disproved: ping succeeded once the network jitter subsided.

Business code failed to return connections – all pipeline usages called close() in a finally block.

Jedis connection leak – the project used Jedis 2.9.0, which does not exhibit the known leak in 2.10.0.

Deadlock – confirmed as the remaining plausible cause. In cluster pipeline mode, each thread may acquire connections from multiple node pools. If one thread holds a connection to node A while waiting for node B, and another thread holds node B while waiting for node A, the classic "hold-and-wait" deadlock condition occurs.

Deadlock Proof

The author used jstack and jmap to capture thread dumps and heap dumps, then employed MAT (Memory Analyzer Tool) to trace which connection pools each thread was waiting on and which pools it already held. The steps were:

Identify the lock object each Dubbo thread was blocked on (e.g., 0x6a3305858).

Map the lock to a LinkedBlockingDeque → GenericObjectPool → JedisPool address.

Inspect each JedisClusterPipeline instance's poolToJedisMap to see which pools it currently held connections for.

Analysis showed that all 400 Dubbo threads were waiting for connections from only five of the twelve Redis master nodes, and those five pools (each of size 20) were fully occupied by 100 threads. The remaining 300 threads were blocked behind them, confirming a deadlock.

Conclusion

The investigation demonstrates a systematic approach to diagnosing production incidents: capture runtime state, use low-level tools (arthas, jstack, jmap, MAT), read the relevant source code, and apply concurrency theory to pinpoint the root cause. The fix is to configure a timeout for Jedis pool acquisition (set maxWaitMillis) and/or enlarge the pool so the deadlock condition cannot arise.
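The effect of bounding the wait can be shown with plain JDK primitives. In this sketch each Semaphore stands in for one Redis node's connection pool (a deliberate simplification; real Jedis pools hold sockets, and the incident pools had 20 slots, not 1). An unbounded acquire() in this shape can deadlock exactly as described above; a bounded tryAcquire, the analogue of maxWaitMillis, lets a thread release what it holds and back off instead of waiting forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ClusterPipelineDeadlockDemo {

    // Each Semaphore models one node's connection pool
    // (capacity 1 keeps the example small).
    public static final Semaphore nodeA = new Semaphore(1);
    public static final Semaphore nodeB = new Semaphore(1);

    // Acquire two "node pools" with a bounded wait. If the second pool
    // cannot be obtained in time, the first is released and the caller
    // backs off - so no thread holds one pool while waiting forever on
    // another, which breaks the hold-and-wait cycle.
    public static boolean pipelineWithTimeout(Semaphore first, Semaphore second,
                                              long waitMillis) throws InterruptedException {
        if (!first.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            if (!second.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
                return false;                      // back off instead of deadlocking
            }
            try {
                return true;                       // both connections held: run the pipeline
            } finally {
                second.release();
            }
        } finally {
            first.release();
        }
    }

    public static void main(String[] args) throws Exception {
        nodeB.acquire();   // simulate another pipeline holding node B's only connection
        System.out.println(pipelineWithTimeout(nodeA, nodeB, 100)); // false: backed off
        nodeB.release();
        System.out.println(pipelineWithTimeout(nodeA, nodeB, 100)); // true: both acquired
    }
}
```

Retrying after a back-off is left to the caller; the essential point is that a bounded wait converts a permanent deadlock into a recoverable failure.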

Tags: Java, deadlock, Redis, Dubbo, ThreadPool, Jedis
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
