Backend Development · 13 min read

Why Our Redis Cluster Pipeline Deadlocked: Thread Locks Explained

This article walks through a production incident where a Redis Cluster pipeline caused Dubbo threads to block and eventually deadlock, detailing the root‑cause analysis, code inspection, and verification steps using jstack, jmap, and MAT to confirm the deadlock and propose fixes.


1. Background

Redis Pipeline is an efficient batch‑command mechanism that reduces network latency and improves read/write throughput. Redis Cluster Pipeline extends this to a Redis Cluster, packaging multiple operations and sending them to several nodes at once.

The project uses a pipeline to batch-query reservation game information from a Redis Cluster via an internal JedisClusterPipeline utility.

2. Incident Record

An alert indicated that a Dubbo thread pool was exhausted. Only one machine showed the problem, and the number of completed tasks never increased.

Monitoring revealed that request counts dropped to zero, confirming the machine had hung. Arthas showed all 400 Dubbo threads in a WAITING state.

3. Fault Analysis

3.1 Threads waiting for a connection

The thread stack traces showed they were blocked inside org.apache.commons.pool2.impl.GenericObjectPool#borrowObject(long). Because the pool's blockWhenExhausted default is true and borrowMaxWaitMillis was not set (default -1), threads waited indefinitely for an idle connection.

<code>public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    // ...
    while (p == null) {
        if (blockWhenExhausted) {
            p = idleObjects.pollFirst();
            if (p == null) {
                if (borrowMaxWaitMillis < 0) {
                    p = idleObjects.takeFirst(); // blocks forever
                } else {
                    p = idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                }
            }
            // ...
        }
    }
    return p.getObject();
}
</code>

Since the business code did not set borrowMaxWaitMillis, threads kept waiting for a connection indefinitely.
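One mitigation this analysis points to is bounding the wait. A minimal sketch, assuming the commons-pool2 configuration API that Jedis builds on (the exact pool wiring inside the project's JedisClusterPipeline utility is not shown in the article):

```java
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

GenericObjectPoolConfig config = new GenericObjectPoolConfig();
config.setMaxTotal(20);              // pool size used in the incident
config.setBlockWhenExhausted(true);  // the default: block when no idle connection
config.setMaxWaitMillis(2000);       // bound the wait: borrowObject() now throws
                                     // NoSuchElementException instead of blocking forever
```

With a bounded wait, a thread stuck in a circular acquisition eventually fails, releases its held connections, and breaks the cycle.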

3.2 Threads unable to obtain a connection

Two possibilities were considered: inability to create a Redis connection, or all connections in the pool being occupied. Network jitter was observed, but the problematic machine could still connect to Redis, ruling out the first case.

Connection leakage was also examined; the project uses Jedis 2.9.0, which does not exhibit the connection leak known to affect version 2.10.0.

3.3 Potential deadlock

Without a timeout, pipeline mode can deadlock when threads acquire connections from multiple pools in different orders (the classic hold-and-wait condition). In the example, four threads each need connections from two Redis nodes; acquiring them in opposite orders produces a circular wait.
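The hold-and-wait pattern can be reproduced with plain JDK primitives. In this sketch (the class name is illustrative, and a ReentrantLock stands in for an unbounded borrowObject call on each node's pool), two threads each grab one "pool" and then wait forever for the other's:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class PoolDeadlockDemo {
    // Two locks stand in for connections from two per-node connection pools.
    static final ReentrantLock poolA = new ReentrantLock();
    static final ReentrantLock poolB = new ReentrantLock();
    // Ensures both threads hold their first "connection" before requesting the second.
    static final CountDownLatch bothHoldFirst = new CountDownLatch(2);

    static void acquireInOrder(ReentrantLock first, ReentrantLock second) {
        Thread t = new Thread(() -> {
            first.lock();               // take a connection from the first pool
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();  // the other thread now holds the lock we need next
                second.lock();          // circular wait: this never returns
                second.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                first.unlock();
            }
        });
        t.setDaemon(true);              // let the JVM exit despite the stuck threads
        t.start();
    }

    public static boolean deadlockDetected() throws InterruptedException {
        acquireInOrder(poolA, poolB);   // thread 1: pool A, then pool B
        acquireInOrder(poolB, poolA);   // thread 2: pool B, then pool A
        Thread.sleep(500);              // give both threads time to block
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // detects ReentrantLock cycles
        return ids != null && ids.length == 2;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

The production case is the same cycle scaled up: instead of two threads and two locks, 400 threads held connections from some pools while waiting on others.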

4. Deadlock Proof

4.1 Identify which pool each thread is waiting on

Using

jstack

and

jmap

, the lock address each thread waited for was extracted (e.g., thread 383 waiting on

0x6a3305858

). MAT was then used to trace the lock back to a specific

JedisPool

instance.
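The dumps referenced here are typically captured with the standard JDK tools (the &lt;pid&gt; placeholder is the hung JVM's process id):

```shell
jstack -l <pid> > threads.txt                   # thread dump, -l adds lock details
jmap -dump:live,format=b,file=heap.hprof <pid>  # binary heap dump for MAT
```

The thread dump gives each waiting thread's lock address; the heap dump lets MAT resolve that address to a concrete object.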

4.2 Identify which pools each thread currently holds

MAT was used to search for all JedisClusterPipeline objects (one per Dubbo thread). The poolToJedisMap field revealed which connection pools each pipeline currently held connections from.
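In MAT, those instances can be listed with an OQL query along these lines (the regex class pattern is an assumption, since the utility's full package name is not given in the article):

```sql
SELECT * FROM ".*JedisClusterPipeline"
```

From each result, expanding the poolToJedisMap field shows the pools whose connections that pipeline is holding.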

4.3 Analyze deadlock conditions

Out of 12 Redis master nodes, all 400 Dubbo threads were waiting on only five connection pools, each sized at 20. Those five pools' 100 connections were all held by threads that were themselves waiting on other pools in the set, leaving no free connection for anyone and confirming a circular wait, i.e., a deadlock.

5. Summary

The article demonstrates a systematic approach to diagnosing a production failure: capturing heap and thread dumps, using Arthas for live inspection, reading source code to understand blocking behavior, forming hypotheses, and finally confirming a deadlock with MAT by correlating waiting locks and held connections. It highlights the importance of configuring connection‑pool timeouts and sizing pools appropriately to avoid similar deadlocks.

Tags: Java, Performance, deadlock, Redis, Jedis, Troubleshooting, pipeline

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
