
Root Cause Analysis and Optimization of Long Young GC Times in gRPC/Netty Services

Long Young GC pauses in a gRPC/Netty service were traced to Netty's thread-local allocator cache, enabled by default, which creates many MpscArrayQueue objects. Disabling the cache with the JVM options -Dio.netty.allocator.useCacheForAllThreads=false and -Dio.grpc.netty.shaded.io.netty.allocator.useCacheForAllThreads=false cut Young GC time from up to 900 ms to about 100 ms and stabilized the service.

HelloTech
**Problem Scenario**

During each SOA deployment, a small number of errors occur, mainly on upstream services with short RPC timeouts (e.g., 300 ms). The errors disappear once the deployment finishes. No new middleware version had been released, so a middleware change is unlikely to be the cause.

**Technology Stack**

The SOA framework uses gRPC for communication, and gRPC relies on Netty underneath.

**Investigation – GC Logs**

GC logs show that the 4th and 5th Young GC cycles take an unusually long time, reaching up to 900 ms in production. The same behavior is reproduced in the test environment, where the 4th Young GC also exceeds 500 ms.
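Pauses of this kind are easiest to spot with unified JVM GC logging (JDK 9+). A minimal flag set, assuming a JDK 11 runtime and an illustrative log path, might look like:

```
-Xlog:gc*:file=gc.log:time,uptime,level,tags
```

Each Young GC then appears in gc.log with its pause time, making outliers such as the 500–900 ms cycles above easy to pick out.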

**Investigation – Dump File**

The heap dump reveals that MpscArrayQueue instances occupy a large portion of the heap.

**Root Cause Analysis**

Netty’s thread‑local cache is enabled by default (property io.netty.allocator.useCacheForAllThreads = true). When enabled, Netty creates a PoolThreadCache for each thread, which in turn constructs many MpscArrayQueue objects. These queues consume a lot of memory, leading to long Young GC pauses.
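To see why per-thread queues matter at scale, the footprint can be sketched as threads × queues per thread × queue capacity × per-entry overhead. Every number below is an assumption chosen for illustration, not a measured Netty default:

```java
// Back-of-the-envelope estimate of per-thread cache overhead.
// All constants are illustrative assumptions, not verified Netty defaults.
public class CacheFootprintEstimate {
    public static void main(String[] args) {
        int threads = 200;                 // assumed thread count using the allocator
        int queuesPerThread = 32 + 4 + 3;  // assumed tiny + small + normal cache queues
        int entriesPerQueue = 256;         // assumed queue capacity
        long bytesPerEntry = 32;           // assumed per-entry object overhead
        long total = (long) threads * queuesPerThread * entriesPerQueue * bytesPerEntry;
        System.out.println("Approx. cache overhead: " + (total / (1024 * 1024)) + " MiB");
    }
}
```

Even with modest assumptions, hundreds of threads each holding dozens of pre-sized queues add tens of megabytes of long-lived-looking objects, which inflates Young GC scanning and copying work.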

**Solution**

Disable the thread‑cache by adding the following JVM options:

```
-Dio.netty.allocator.useCacheForAllThreads=false
-Dio.grpc.netty.shaded.io.netty.allocator.useCacheForAllThreads=false
```
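Where the launch command cannot be changed, the same properties can be set programmatically at the very top of main(), before any Netty or gRPC class is initialized. The class name and structure below are illustrative; passing the -D flags on the command line remains the safer option:

```java
// Sketch: set the allocator properties before any Netty/gRPC class loads.
// If a Netty class has already been initialized, these calls have no effect.
public class Bootstrap {
    public static void main(String[] args) {
        System.setProperty("io.netty.allocator.useCacheForAllThreads", "false");
        System.setProperty(
                "io.grpc.netty.shaded.io.netty.allocator.useCacheForAllThreads", "false");
        // ... start the gRPC/Netty server only after the properties are set
        System.out.println(
                System.getProperty("io.netty.allocator.useCacheForAllThreads"));
    }
}
```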

After applying the settings, Young GC time drops to around 100 ms, and the service meets QPS and resource requirements.

**Source Code Insight**

Netty's source shows how the cache flag is defined and how PoolThreadCache is constructed:

```java
private static final boolean DEFAULT_USE_CACHE_FOR_ALL_THREADS;

static {
    DEFAULT_USE_CACHE_FOR_ALL_THREADS = SystemPropertyUtil.getBoolean(
            "io.netty.allocator.useCacheForAllThreads", true);
}
```

Further down, the initialValue() method creates the cache when the flag is true, allocating several arenas and queues. When the flag is false, a minimal cache (all sizes set to 0) is created, avoiding the heavy MpscArrayQueue structures.

```java
@Override
protected synchronized PoolThreadCache initialValue() {
    final PoolArena heapArena = leastUsedArena(heapArenas);
    final PoolArena directArena = leastUsedArena(directArenas);
    final Thread current = Thread.currentThread();
    // Thread cache switch
    if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
        final PoolThreadCache cache = new PoolThreadCache(
                heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        // schedule trim task ...
        return cache;
    }
    // No caching so just use 0 as sizes.
    return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
}
```

**Conclusion**

Disabling Netty’s thread‑local cache removes the large MpscArrayQueue allocations, significantly shortening Young GC pauses and stabilizing the service.

Tags: Java, performance, gRPC, Netty, GC optimization, thread cache
Written by

HelloTech

Official Hello technology account, sharing tech insights and developments.
