
Why Did Our JSF Thread Pool Exhaust? A Deep Dive into Redis Lock Contention

This article analyzes a thread‑pool exhaustion issue in JD's JSF framework during a Redis cache upgrade, detailing log evidence, stack‑trace analysis, lock contention caused by read/write locks and ForkJoinPool usage, and the steps taken to diagnose and resolve the problem.

JD Cloud Developers

1. Problem Background

During an upgrade of the R2M distributed cache (a high‑performance, highly‑available Redis‑based service), upstream calls began failing with the error `RpcException: [Biz thread pool of provider has been exhausted]`. Monitoring showed the issue occurring on only one or two nodes and persisting over time. The problematic nodes were taken offline via JSF (Jingdong Service Framework) to restore service.

2. Error Logs

```
24-03-13 02:21:20.188 [JSF-SEV-WORKER-57-T-5] ERROR BaseServerHandler - handlerRequest error msg:[JSF-23003] Biz thread pool of provider has been exhausted, the server port is 22003
24-03-13 02:21:20.658 [JSF-SEV-WORKER-57-T-5] WARN  BusinessPool - [JSF-23002] Task:com.alibaba.ttl.TtlRunnable - com.jd.jsf.gd.server.JSFTask@0 has been reject for ThreadPool exhausted! pool:80, active:80, queue:300, taskcnt: 1067777
```

3. Investigation Steps

JSF allocates a fixed‑size thread pool at startup. When all threads are busy and new traffic arrives, requests cannot be processed, leading to the observed exception. The question was what the JSF threads were doing.
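This failure mode is easy to reproduce in miniature. The sketch below uses a bounded `ThreadPoolExecutor` (sizes are illustrative, not JSF's actual 80‑thread / 300‑slot configuration): once every worker is parked and the queue is full, new submissions are rejected, just as JSF rejects incoming requests with the exhaustion error.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {

    /** Submits blocking tasks to a bounded pool and counts the rejections. */
    static int countRejections(int workers, int queueSize, int submissions)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy()); // reject rather than block the caller
        CountDownLatch gate = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < submissions; i++) {
            try {
                // Each task parks on the latch, like a JSF worker stuck on a lock.
                pool.execute(() -> {
                    try { gate.await(); } catch (InterruptedException ignored) { }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // analogous to "[JSF-23003] Biz thread pool ... exhausted"
            }
        }
        gate.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws Exception {
        // 2 workers + 1 queue slot, 5 blocking submissions => 2 rejections.
        System.out.println("rejected=" + countRejections(2, 1, 5));
    }
}
```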

3.1 Print Stack Information
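The dump itself appears as images in the original post. A thread dump of a running JVM is typically captured with `jstack <pid>`; the JDK also exposes the same information programmatically through `ThreadMXBean`, for example:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackDump {

    /** Builds a jstack-style dump of every live thread. */
    static String dumpAll() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // dumpAllThreads(true, true) also reports held monitors and ownable
        // synchronizers, which is what exposes who owns a contended lock.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append(info); // thread name, state (e.g. WAITING), and top frames
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAll());
    }
}
```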

3.2 Analyze the Stack

Using an online thread‑dump analyzer (http://spotify.github.io/threaddump-analyzer/), most JSF threads were found to be stuck in `JedisClusterInfoCache#getSlaveOfSlotFromDc`:

3.3 Examine the Stuck Method

The method acquires a read lock on entry; the read/write lock itself is declared as a global field elsewhere:

It appears a business operation acquired the write lock and failed to release it promptly; because the read‑lock acquisition has no timeout, JSF threads blocked indefinitely while waiting for it.
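This behavior can be reproduced in isolation. In the sketch below, the calling thread stands in for the stuck write‑lock holder and a second thread for a JSF worker entering the cache; a plain `lock()` call parks the reader with no way out:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockBlockingDemo {
    // A single global read/write lock, mirroring the one JedisClusterInfoCache declares.
    static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    /** Holds the write lock on the caller, then reports a reader thread's state. */
    static Thread.State readerStateWhileWriterHolds() throws InterruptedException {
        rwl.writeLock().lock(); // stand-in for the stuck topology-update task
        Thread reader = new Thread(() -> {
            // getSlaveOfSlotFromDc-style entry: plain lock() has no timeout,
            // so this thread parks for as long as the writer holds on.
            rwl.readLock().lock();
            try { /* read the slot cache */ } finally { rwl.readLock().unlock(); }
        });
        reader.start();
        TimeUnit.MILLISECONDS.sleep(200); // give the reader time to park
        Thread.State state = reader.getState();
        rwl.writeLock().unlock(); // only now can the reader proceed
        reader.join();
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader state: " + readerStateWhileWriterHolds());
    }
}
```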

3.4 Business‑Level Analysis of Write‑Lock Usage

Collaboration with the middleware team identified a scheduled topology‑update task that obtains the write lock before execution. The task’s stack trace confirms this:

Code snippets (shown as images) illustrate the lock acquisition.
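Since those snippets are images, here is a minimal sketch of the pattern they illustrate: a scheduled task that takes the write lock for the duration of each refresh (the refresh step is represented by a hypothetical comment; the real task's internals are not shown in the source):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TopologyUpdaterSketch {
    static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    /** Runs a periodic "topology refresh" under the write lock and counts the runs. */
    static int runRefreshes(long durationMillis) throws InterruptedException {
        AtomicInteger refreshes = new AtomicInteger();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            rwl.writeLock().lock(); // every reader is blocked for the duration
            try {
                // rebuildSlotCache(); // hypothetical refresh work
                refreshes.incrementAndGet();
            } finally {
                rwl.writeLock().unlock(); // prompt release is what failed in the incident
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        TimeUnit.MILLISECONDS.sleep(durationMillis);
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return refreshes.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("refreshes: " + runRefreshes(250));
    }
}
```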

3.5 Deep Dive into the Root Cause

If `parallelStream().forEach` is used without specifying a custom executor, it runs on `ForkJoinPool.commonPool()`, which by default creates (CPU cores − 1) worker threads. Caffeine's asynchronous refresh also uses this pool when no executor is provided. Consequently, threads doing work on behalf of the write‑lock holder share the same pool as threads waiting for the read lock, leading to deadlock‑like behavior.
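A short sketch of both the default behavior and the usual mitigation (running the pipeline inside an explicitly created `ForkJoinPool` so a stall cannot starve unrelated common‑pool users):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolDemo {

    /** Records which threads execute a plain parallel pipeline. */
    static Set<String> threadsUsedByParallelForEach() {
        Set<String> names = new ConcurrentSkipListSet<>();
        // No executor is specified, so the work lands on ForkJoinPool.commonPool
        // (plus the calling thread) -- shared with every other default user,
        // such as Caffeine's asynchronous refresh.
        IntStream.range(0, 1_000).parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // commonPool parallelism defaults to (available processors - 1), minimum 1.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + threadsUsedByParallelForEach());

        // Mitigation sketch: submit the pipeline to a dedicated pool so a stall
        // inside forEach cannot starve unrelated commonPool users.
        ForkJoinPool dedicated = new ForkJoinPool(4);
        dedicated.submit(() ->
                IntStream.range(0, 1_000).parallel().forEach(i -> { /* work */ })
        ).join();
        dedicated.shutdown();
    }
}
```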

3.6 Verification

Three `ForkJoinPool.commonPool-worker` threads were observed blocked while trying to obtain a Redis connection, all waiting for the read lock:

The local Caffeine cache had no custom thread pool configured, and the topology‑updater task was stuck inside the `forEach` business logic:
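Caffeine's builder does accept an explicit executor (`Caffeine.newBuilder().executor(...)`), which would keep refreshes off the common pool. The same isolation principle is shown below with the JDK's own `CompletableFuture`, which likewise defaults to `ForkJoinPool.commonPool()` when no executor is passed; the "cache-refresh" pool name is purely illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncExecutorDemo {

    /** Runs an async task on a named, dedicated pool and returns the worker's name. */
    static String isolatedThreadName() throws Exception {
        ExecutorService refreshPool = Executors.newFixedThreadPool(2,
                r -> new Thread(r, "cache-refresh"));
        try {
            // Passing an explicit executor keeps this work off commonPool.
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), refreshPool)
                    .get();
        } finally {
            refreshPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // With no executor argument, supplyAsync defaults to the common pool
        // (when its parallelism is greater than 1).
        System.out.println("default:  " + CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName()).get());
        System.out.println("isolated: " + isolatedThreadName());
    }
}
```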

4. Post‑mortem

The issue occurs only under specific load conditions and is therefore rare; thanks to the middleware team for their assistance. As a fix, the topology update will be changed from asynchronous to synchronous execution.

Java provides many asynchronous capabilities, but using shared thread pools without proper timeouts or isolation can cause severe contention.
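One such guard is bounding lock acquisitions. The sketch below shows how a read‑lock timeout would have turned this incident's silent, pool‑freezing waits into fast, visible failures that monitoring could catch (the one‑second writer delay is illustrative):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TryLockTimeoutDemo {
    static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    /** Attempts a bounded read-lock acquisition while another thread writes. */
    static boolean tryReadWhileWriterBusy() throws InterruptedException {
        Thread writer = new Thread(() -> {
            rwl.writeLock().lock(); // a slow update holding the lock too long
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException ignored) {
            } finally {
                rwl.writeLock().unlock();
            }
        });
        writer.start();
        TimeUnit.MILLISECONDS.sleep(50); // let the writer grab the lock first

        // tryLock with a timeout fails fast instead of parking forever,
        // turning thread-pool exhaustion into a visible, retryable error.
        boolean acquired = rwl.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) rwl.readLock().unlock();
        writer.interrupt(); // don't wait out the full second in this demo
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tryReadWhileWriterBusy()
                ? "read lock acquired" : "read lock timed out");
    }
}
```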

Robust monitoring is essential to detect and address such problems promptly.

Tags: Debugging, Java, Concurrency, Redis, Thread Pool, JSF
Written by

JD Cloud Developers

JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product and technology information, industry content, and tech event news.
