How to Detect and Fix Netty Direct Memory Leaks in Janus Service
This article explains how a Janus service gateway ran out of memory due to unreleased Netty off‑heap ByteBufs, how enabling Netty's leak detection and adjusting JVM parameters helped reproduce the issue, and how to fix the leak by explicitly releasing the buffers.
Recently the Janus service gateway experienced an OOM crash because Netty could not allocate the required direct memory.
After the issue appeared, we tried load testing to reproduce the exception, but the exception chain was not hit, and logs from Mercury only showed Netty internal classes, indicating that off‑heap memory was not being released.
Following an article by a colleague, we set the JVM parameter -Dio.netty.leakDetectionLevel to paranoid (Netty's default level is simple; the four levels are disabled, simple, advanced, and paranoid) and sent POST requests with a 100 KB JSON body to trigger young-generation GC (YGC) while watching the backend logs for the keyword LEAK.
Set the Janus JVM parameter -Dio.netty.leakDetectionLevel to paranoid.
Send continuous POST requests with a 100 KB JSON body to trigger YGC and monitor logs for LEAK entries.
With leak detection enabled, Netty tracks the ByteBufs it samples (every ByteBuf at the paranoid level). Once a GC such as a YGC collects a tracked ByteBuf that was never released, Netty logs a message containing the keyword LEAK.
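The load step above can be sketched with the JDK's built-in HTTP client (Java 11 or later). This is only an illustration: the class name, the method names, and the target URL are hypothetical, and the original test may well have used a dedicated load-testing tool instead.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Janus or Netty.
class PostLoadSketch {

    /** Builds an ASCII-only JSON object whose UTF-8 encoding is exactly targetBytes long. */
    static String buildJsonBody(int targetBytes) {
        String prefix = "{\"payload\":\"";
        String suffix = "\"}";
        int fill = targetBytes - prefix.length() - suffix.length();
        if (fill < 0) {
            throw new IllegalArgumentException("targetBytes too small: " + targetBytes);
        }
        return prefix + "x".repeat(fill) + suffix;
    }

    /** Sends the same 100 KB POST sequentially and returns how many responses came back. */
    static int sendPosts(String url, int requests) throws Exception {
        String body = buildJsonBody(100 * 1024);
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        int received = 0;
        for (int i = 0; i < requests; i++) {
            // The response body is irrelevant here; only the request payload matters.
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() > 0) {
                received++;
            }
        }
        return received;
    }
}
```

Calling something like `PostLoadSketch.sendPosts("http://localhost:8080/demo", 10000)` (an assumed address) while tailing the gateway log for LEAK would mirror the procedure described above.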
After some time the logs finally showed a LEAK entry:
<code>[2017-05-05 19:53:29.013] [4428112630540205003] [ERROR] [Janus-Http-Worker-4-2] [io.netty.util.ResourceLeakDetector] >>> LEAK: ByteBuf.release() was not called before it's garbage-collected. ...</code>
Tracing the LEAK log led to the following problematic code:
<code>@Override
public String getBodyStrForPost() {
    // content() returns a reference-counted ByteBuf that is never released here
    return httpRequest.content().toString(CharsetUtil.UTF_8);
}</code>
In this code, httpRequest is an io.netty.handler.codec.http.FullHttpRequest whose content() returns a Netty-allocated off-heap ByteBuf. The ByteBuf was read by external methods such as this one and never released, so unreleased buffers accumulated until the OOM.
The fix is to use Netty’s ReferenceCountUtil to explicitly release the off‑heap memory.
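A minimal sketch of such a fix follows. It assumes the method is the last consumer of the request, so releasing there is safe; the class and method names are hypothetical, and the actual Janus change may release the buffer elsewhere in the pipeline.

```java
import io.netty.buffer.Unpooled;
import io.netty.handler.codec.http.DefaultFullHttpRequest;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.HttpMethod;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.util.CharsetUtil;
import io.netty.util.ReferenceCountUtil;

// Hypothetical helper illustrating the release pattern.
class PostBodyReader {

    /** Copies the body into a heap String, then releases the request and its content buffer. */
    static String readBodyAndRelease(FullHttpRequest httpRequest) {
        try {
            return httpRequest.content().toString(CharsetUtil.UTF_8);
        } finally {
            // Runs even if decoding throws, so the buffer's count is always decremented.
            ReferenceCountUtil.release(httpRequest);
        }
    }

    public static void main(String[] args) {
        FullHttpRequest request = new DefaultFullHttpRequest(
                HttpVersion.HTTP_1_1, HttpMethod.POST, "/demo",
                Unpooled.copiedBuffer("{\"k\":1}", CharsetUtil.UTF_8));
        System.out.println(readBodyAndRelease(request));
        System.out.println("refCnt after release: " + request.refCnt());
    }
}
```

The release must happen exactly once per owner: if another handler also releases the same request, Netty throws an IllegalReferenceCountException, so the release belongs at the point where the message is known to be finished with.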
Memory leaks mainly affect pooled ByteBuf objects. If release() is not called before the JVM GC reclaims the object, the underlying DirectByteBuffer or byte[] is not returned to the pool, causing the pool to grow. Non‑pooled ByteBufs will eventually be reclaimed.
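The reference-count contract behind this can be illustrated with a small sketch using Netty's default pooled allocator. The class name is hypothetical and the 1024-byte size is arbitrary.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

// Hypothetical demo class.
class PooledRefCountDemo {

    /** Returns the buffer's reference count before and after release(). */
    static int[] allocateAndRelease() {
        // A freshly allocated buffer starts with a reference count of 1.
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1024);
        int before = buf.refCnt();
        // Dropping the count to 0 hands the memory back to the pool.
        buf.release();
        int after = buf.refCnt();
        return new int[] {before, after};
    }
}
```

Skipping the release() call in this sketch is exactly the situation the leak detector reports: the ByteBuf object becomes garbage while its pooled memory is still marked as in use.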
Netty provides a leak detection mechanism that, at its default simple level, samples about 1% of allocated ByteBufs. When a leak is detected, it prints a message such as:
LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call ResourceLeakDetector.setLevel().
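As an alternative to the JVM option, the level can be set in code, as the message suggests. The sketch below assumes it runs early in startup, before any buffers are allocated; the wrapper class is hypothetical.

```java
import io.netty.util.ResourceLeakDetector;

// Hypothetical startup helper.
class LeakLevelConfig {

    /** Switches leak detection to PARANOID and returns the level now in effect. */
    static ResourceLeakDetector.Level enableParanoid() {
        // PARANOID tracks every buffer; it is costly and intended for testing only.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        return ResourceLeakDetector.getLevel();
    }
}
```

The equivalent command-line form is the -Dio.netty.leakDetectionLevel=paranoid option used in the reproduction steps.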
Further details are available in Netty’s reference‑counted objects documentation.
Vipshop Quality Engineering
Technology exchange and sharing for quality engineering