Investigation of JSF Thread‑Pool Exhaustion During R2M Redis Upgrade
This article details a step‑by‑step investigation of a JSF thread‑pool exhaustion error that occurred when upgrading the Redis version of JD's internal R2M distributed cache, analyzing stack traces, lock contention, ForkJoinPool behavior, and the eventual remediation steps.
The issue originated when upgrading the Redis version of JD's internal distributed cache service R2M, where a few nodes began reporting the error RpcException: [Biz thread pool of provider has been exhausted]. Monitoring showed the problem was isolated to one or two nodes, prompting an immediate shutdown of the affected nodes via the JSF framework.
Log excerpts captured the error:
24-03-13 02:21:20.188 [JSF-SEV-WORKER-57-T-5] ERROR BaseServerHandler - handlerRequest error msg:[JSF-23003] Biz thread pool of provider has been exhausted, the server port is 22003
24-03-13 02:21:20.658 [JSF-SEV-WORKER-57-T-5] WARN BusinessPool - [JSF-23002] Task:com.alibaba.ttl.TtlRunnable - com.jd.jsf.gd.server.JSFTask@0 has been reject for ThreadPool exhausted! pool:80, active:80, queue:300, taskcnt: 1067777
Initial analysis suggested that the JSF business thread pool, sized statically at service start (here 80 threads with a queue of 300), became saturated when incoming traffic exceeded the available threads, leaving no thread to handle new requests.
Further investigation using SGM stack dumps and an online thread‑dump analyzer revealed that many JSF threads were blocked inside JedisClusterInfoCache#getSlaveOfSlotFromDc, which acquires a read lock at method entry. The read lock is paired with a write lock that is held by a periodic topology‑update task.
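The blocking pattern can be sketched with a plain ReentrantReadWriteLock. The class and method names below echo the Jedis internals described above, but the bodies are simplified assumptions, not the real Jedis source:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch of the JedisClusterInfoCache locking pattern described
// above; field values and method bodies are illustrative assumptions.
class ClusterInfoCacheSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private volatile String slotOwner = "node-a";

    // Worker threads: take the read lock with no timeout, so they block
    // for as long as any writer holds the write lock.
    String getSlaveOfSlot(int slot) {
        rwl.readLock().lock(); // blocks indefinitely while a writer is active
        try {
            return slotOwner;
        } finally {
            rwl.readLock().unlock();
        }
    }

    // Periodic topology update: holds the write lock for the whole refresh,
    // so every reader in the JVM waits until it finishes.
    void renewSlotCache() {
        rwl.writeLock().lock();
        try {
            slotOwner = "node-b"; // stand-in for the real topology refresh
        } finally {
            rwl.writeLock().unlock();
        }
    }
}
```

If the writer thread stalls inside renewSlotCache before reaching unlock, every caller of getSlaveOfSlot parks forever, which matches the thread dumps described above.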
In normal operation the topology‑update task acquires the write lock, refreshes the Redis topology, and releases the lock. In this incident, however, the task stalled mid‑refresh and never reached the release, and because the worker threads acquired the read lock without a timeout, they remained blocked waiting for it indefinitely.
Additional analysis showed that the application relied on parallelStream().forEach and Caffeine's asynchronous refresh, both of which run on ForkJoinPool.commonPool() by default. The common pool's default size (CPU cores − 1) was too small for the workload: its few shared workers blocked on the same Redis lock, and the topology updater's own work queued behind them, producing dead‑lock‑like behavior.
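The shared-pool hazard is easy to demonstrate: both parallel streams and Caffeine's default async refresh use ForkJoinPool.commonPool(), whose default parallelism is one less than the number of available processors. A minimal check:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Default parallelism is Runtime.getRuntime().availableProcessors() - 1
        // (minimum 1), unless overridden with
        // -Djava.util.concurrent.ForkJoinPool.common.parallelism=N
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("commonPool parallelism = " + parallelism);

        // parallelStream() work runs on the common pool by default, so any
        // task that blocks (e.g. waiting on the Redis read lock) ties up one
        // of these few workers for every common-pool user in the whole JVM.
        List<String> threadNames = IntStream.range(0, 4)
                .parallel()
                .mapToObj(i -> Thread.currentThread().getName())
                .distinct()
                .collect(Collectors.toList());
        System.out.println(threadNames);
    }
}
```

On a small container with few cores, the printed parallelism is correspondingly tiny, which is exactly why a handful of blocked tasks was enough to stall both the business logic and the topology updater.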
Verification confirmed that three ForkJoinPool.commonPool‑worker threads were stuck waiting for the Redis connection lock, while the topology‑updater thread was blocked in the for‑each business logic.
Root cause: improper use of shared thread pools without custom sizing or timeout settings, combined with a write‑lock that was not released, resulted in thread‑pool exhaustion and service disruption.
Remediation steps included making the topology‑update operation synchronous so it no longer depended on the shared common pool, configuring dedicated thread pools for Caffeine refresh and for parallel streams, and adding proper lock‑timeout handling.
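Two of these remediations can be sketched with JDK‑only code: running a parallel stream in a dedicated ForkJoinPool instead of the shared common pool, and acquiring the read lock with a timeout instead of blocking forever. (Caffeine's refresh work can be isolated the same way via Caffeine.newBuilder().executor(dedicatedPool); that is omitted here to keep the sketch dependency‑free.) The pool size of 8 is an arbitrary illustrative choice:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RemediationSketch {
    public static void main(String[] args) throws Exception {
        // 1. Submit the parallel stream to a dedicated pool: tasks forked
        //    inside the submitted lambda run in this pool, not commonPool(),
        //    so blocked tasks cannot starve other common-pool users.
        ForkJoinPool dedicated = new ForkJoinPool(8);
        List<Integer> doubled = dedicated.submit(() ->
                IntStream.range(0, 4).parallel()
                         .map(i -> i * 2)
                         .boxed()
                         .collect(Collectors.toList())
        ).get();
        System.out.println(doubled);

        // 2. Take the read lock with a timeout so callers fail fast instead
        //    of parking forever when the topology writer holds the write lock.
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        boolean acquired = rwl.readLock().tryLock(500, TimeUnit.MILLISECONDS);
        System.out.println("read lock acquired = " + acquired);
        if (acquired) {
            rwl.readLock().unlock();
        }
        dedicated.shutdown();
    }
}
```

When tryLock times out, the caller can return a degraded result or retry, rather than silently consuming a pooled thread for an unbounded time.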
Key takeaways: when using asynchronous processing in Java, always configure thread‑pool sizes and timeouts; monitor lock usage; and ensure that long‑running tasks do not hold write locks indefinitely.
JD Tech