Root Cause Analysis of Memory Leak and High Latency in a Netty‑Based Real‑Time Risk Control System Using JDK 17 ZGC
This article investigates the severe memory growth and latency spikes observed when synchronizing data across data centers in a Netty‑driven online computation service, analyzes the impact of JDK 17 ZGC and direct‑buffer allocation, and presents the debugging steps, source‑code insights, and configuration changes that ultimately resolved the issue.
The Tianwang risk‑control system is an in‑memory, high‑throughput online computation service built on Netty for TCP communication between client and server. Initial optimizations achieved throughput above 200,000 QPS per core, but under a cross‑data‑center test the server exhibited rapid memory growth, persistent 20% CPU usage, and frequent GC pauses.
To address the latency problem, the team upgraded to JDK 11+ ZGC and later JDK 17, noting that ZGC can reduce pause times to sub‑millisecond levels. However, the issue persisted, prompting a deeper investigation.
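The upgrade itself is a launcher‑flag change. A minimal sketch of the relevant JVM options (heap size and application jar are placeholders, not values from the article):

```shell
# JDK 17: ZGC is production-ready and enabled with a single flag
java -XX:+UseZGC -Xmx16g -Xlog:gc* -jar app.jar

# JDK 11: ZGC was still experimental and must be unlocked first
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g -jar app.jar
```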
Investigation Steps
Enabled Netty leak detection (PARANOID) – no leak logs were produced.
Observed that WriteTask objects accumulated in Netty's MpscUnboundedArrayQueue, causing memory bloat.
Compared JDK 8 and JDK 17 behavior; the problem disappeared on JDK 8.
Debug logs revealed a critical message about the direct‑buffer constructor being unavailable:
```
[2023-08-23 11:16:16.163] DEBUG [] - io.netty.util.internal.PlatformDependent0 - direct buffer constructor: unavailable: Reflective setAccessible(true) disabled
```

Source‑code analysis showed that Netty allocates direct memory via PooledByteBufAllocator. When PlatformDependent.useDirectBufferNoCleaner() returns false (the default on JDK 17 without special JVM flags), Netty falls back to ByteBuffer.allocateDirect, which can trigger a synchronous System.gc() and block EventLoop threads when direct memory is exhausted.
Key code excerpts:

```java
// PoolArena.DirectArena
protected PoolChunk<ByteBuffer> newChunk(...) {
    // critical code
    ByteBuffer memory = allocateDirect(chunkSize);
    ...
}

// PoolArena.DirectArena#allocateDirect
private static ByteBuffer allocateDirect(int capacity) {
    return PlatformDependent.useDirectBufferNoCleaner()
            ? PlatformDependent.allocateDirectNoCleaner(capacity)
            : ByteBuffer.allocateDirect(capacity);
}

// PlatformDependent (static initializer)
if (maxDirectMemory == 0 || !hasUnsafe() || !PlatformDependent0.hasDirectBufferNoCleanerConstructor()) {
    USE_DIRECT_BUFFER_NO_CLEANER = false;
} else {
    USE_DIRECT_BUFFER_NO_CLEANER = true;
}
```

On JDK 9+ the private constructor java.nio.DirectByteBuffer(long, int) is only reflectively accessible when the JVM is started with -Dio.netty.tryReflectionSetAccessible=true and the java.nio package is opened via --add-opens=java.base/java.nio=ALL-UNNAMED. Without these flags the constructor is unavailable, so Netty takes the problematic ByteBuffer.allocateDirect allocation path.
The root cause was identified as the EventLoop thread blocking inside direct‑memory allocation while executing a WriteTask, so subsequent WriteTasks piled up unexecuted, leaving a backlog of unflushed entries and massive memory consumption.
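The failure mode can be shown in miniature with plain JDK classes: a single‑threaded pool stands in for the EventLoop, and a task that never completes stands in for a write stuck in direct‑memory allocation (all names here are illustrative, not Netty's):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class EventLoopBacklogDemo {

    /** Blocks the single worker thread, then counts tasks stuck in the queue. */
    public static int backlogAfterBlocking(int producers) {
        // One thread + unbounded queue, like an EventLoop with an MPSC task queue.
        ThreadPoolExecutor loop = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch running = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Stand-in for a write blocked inside direct-memory allocation.
            loop.submit(() -> {
                running.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
            });
            running.await(); // ensure the "blocked write" occupies the only thread
            // Every further "WriteTask" just accumulates in the queue.
            for (int i = 0; i < producers; i++) {
                loop.submit(() -> { });
            }
            return loop.getQueue().size();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued tasks while the loop is blocked: "
                + backlogAfterBlocking(100));
    }
}
```

With the single thread stalled, every submitted task is retained in the queue, which is exactly how unflushed WriteTasks inflate memory.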
Solution steps included:
Adding a connection pool and random channel selection for cross‑data‑center sync to increase parallelism.
Enabling the required JVM flags (-Dio.netty.tryReflectionSetAccessible=true and --add-opens=java.base/java.nio=ALL-UNNAMED) so Netty can use allocateDirectNoCleaner instead of the blocking ByteBuffer.allocateDirect path.
Monitoring non‑heap memory accurately and adding write‑and‑flush error listeners to detect OutOfMemoryError early.
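On the monitoring point, one subtlety is worth a sketch: the JDK's "direct" BufferPoolMXBean only tracks cleaner‑managed buffers, so memory allocated through the no‑cleaner path (Unsafe‑based) does not appear there, which is why dashboards must be aligned with the actual allocation path. A minimal pure‑JDK reading (class name is mine):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryMonitor {

    /** Returns bytes used by the JVM's "direct" buffer pool (cleaner-managed only). */
    public static long directPoolUsed() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1; // pool not found (should not happen on HotSpot)
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB via the cleaner path
        System.out.println("direct pool used: " + directPoolUsed()
                + " bytes (capacity of our buffer: " + buf.capacity() + ")");
    }
}
```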
Additional reflections highlighted the importance of proper back‑pressure handling (low/high watermarks) and the need to align monitoring metrics with actual direct‑memory usage.
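For the watermark point, a configuration sketch using Netty 4.x's real WriteBufferWaterMark API (the threshold values are illustrative, not from the article; this fragment assumes an existing channel and msg):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

// Bound the per-channel outbound buffer: once pending bytes exceed the high
// watermark, channel.isWritable() flips to false until they drain below the low one.
ServerBootstrap b = new ServerBootstrap();
b.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
        new WriteBufferWaterMark(32 * 1024, 64 * 1024)); // low = 32 KiB, high = 64 KiB

// Producers should consult writability instead of writing blindly:
if (channel.isWritable()) {
    channel.writeAndFlush(msg);
} else {
    // back-pressure: pause the upstream producer, or fail fast on a bounded queue
}
```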
Overall, the case demonstrates how JVM version differences, Netty’s memory allocation strategy, and missing JVM options can combine to produce severe latency and memory‑leak‑like symptoms in high‑throughput backend services.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.