ZSTD Compression and GC Optimization in Java Netty Backend
Switching a Java Netty gateway from GZIP to ZSTD compression via zstd-jni doubled GC time and introduced both heap and Netty off-heap memory leaks. These were resolved by using the library's direct off-heap API with a NoFinalizer compressor, releasing ByteBufs promptly, avoiding finalize(), and adopting jemalloc to reduce fragmentation.
To reduce bandwidth costs, the team replaced GZIP with ZSTD compression in a Java gateway SDK using the zstd-jni library, which calls the native ZSTD C++ library.
Observed issues during performance testing:
GC time doubled while GC count remained unchanged.
Process memory leaks leading to OOM Killer termination.
Netty off‑heap (DirectByteBuf) memory leaks introduced after fixing issue 1.
GC time-doubling analysis: ZSTD compression runs through JNI, which copies data from the Java heap to off-heap memory and back. Each request therefore carries extra heap allocations (original data plus compressed data) as well as off-heap buffers, and the JNI calls interfere with GC efficiency. Many Finalizer objects that reference large objects further prolong GC cycles.
Solution for the GC issue: use the direct off-heap compression API provided by zstd-jni to write compressed output directly into off-heap memory, releasing the original heap buffer early. Switching to a NoFinalizer compressor eliminated the GC time increase.
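zstd-jni's exact API is not reproduced here; as an analogous stdlib sketch of the same pattern, `java.util.zip.Deflater` (Java 11+) can consume and produce direct `ByteBuffer`s, keeping the payload off-heap and releasing native resources deterministically with `end()` rather than via finalization:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DirectCompressSketch {
    // Compress from a direct (off-heap) source into a direct destination,
    // so the payload never needs a long-lived heap byte[] copy.
    static int compressDirect(ByteBuffer src, ByteBuffer dst) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src);   // Java 11+: ByteBuffer input, no caller-side heap copy
            deflater.finish();
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(dst);
            }
            return total;
        } finally {
            deflater.end();           // release native resources eagerly, not via a finalizer
        }
    }

    public static void main(String[] args) {
        byte[] payload = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer src = ByteBuffer.allocateDirect(payload.length);
        src.put(payload).flip();
        ByteBuffer dst = ByteBuffer.allocateDirect(256);
        int n = compressDirect(src, dst);
        System.out.println(n > 0 && n < payload.length * 2);
    }
}
```

With zstd-jni the analogous move is its direct-ByteBuffer compression methods together with the NoFinalizer compressor variants, so that release is an explicit call instead of finalizer-driven.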
Finalizer problem: the application created many Finalizer objects (e.g., for ZstdJNIDirectByteBufCompressor and DefaultInvocation) that retained large objects, causing extra memory usage and longer GC pauses. The article explains the JVM finalize mechanism and its overhead, and notes that finalization is deprecated for removal since JDK 18. The recommended practice is to avoid finalize() entirely.
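Since finalization is deprecated, the JDK's recommended replacement is `java.lang.ref.Cleaner` paired with `AutoCloseable` for deterministic release. A minimal sketch (the `Resource` and `NativeState` names are illustrative, not from the article):

```java
import java.lang.ref.Cleaner;
import java.nio.ByteBuffer;

public class CleanerSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must NOT reference the owning object,
    // otherwise the object can never become unreachable.
    static final class NativeState implements Runnable {
        ByteBuffer buf; // stand-in for an off-heap / native resource
        NativeState(ByteBuffer buf) { this.buf = buf; }
        @Override public void run() { buf = null; } // release on clean
    }

    static final class Resource implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;
        Resource() {
            this.cleanable = CLEANER.register(this, new NativeState(ByteBuffer.allocateDirect(1024)));
        }
        @Override public void close() { cleanable.clean(); } // deterministic release path
    }

    public static void main(String[] args) {
        try (Resource r = new Resource()) {
            System.out.println("using resource");
        } // close() runs the cleanup immediately; the Cleaner is only a safety net
        System.out.println("released");
    }
}
```

Unlike finalize(), the cleanup action holds no reference to the owner, so it cannot resurrect the object or keep large object graphs alive across GC cycles.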
Netty ByteBuf memory leak: under high QPS, the channel is deregistered and its handlers are marked REMOVE_COMPLETE, so subsequent writes bypass the custom handler that would have released the ReferenceCounted ByteBuf. The off-heap buffer is never released, causing a leak. The stack trace of the leak:
setRemoved:911, AbstractChannelHandlerContext (io.netty.channel)
callHandlerRemoved:950, AbstractChannelHandlerContext (io.netty.channel)
callHandlerRemoved0:637, DefaultChannelPipeline (io.netty.channel)
... (full stack omitted for brevity) ...
close:92, DefaultHttpStream (com.alibaba.xxx.xxx.xxx.inbound.http)
onRequestReceived:111, DefaultHttpStreamTest$getHttpServerRequestListener$1 (com.alibaba.xxx.xxx.xxx.inbound.http)
Leak mitigation:
Check channel.isActive() before writing.
Send objects that implement ReferenceCounted so Netty can release them automatically.
Limit the scope of ByteBuf usage and move compression logic to the Netty layer when possible.
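The ReferenceCounted discipline behind the mitigations above can be sketched without a Netty dependency. This minimal stand-in (names are hypothetical) shows why a skipped release() leaks: the underlying memory is freed only when the last owner releases its reference:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {
    // Minimal stand-in for Netty's ReferenceCounted contract:
    // the final release() frees the underlying resource exactly once.
    static final class CountedBuf {
        private final AtomicInteger refCnt = new AtomicInteger(1);
        private volatile boolean freed = false;

        CountedBuf retain() {
            if (refCnt.getAndIncrement() <= 0) throw new IllegalStateException("already released");
            return this;
        }
        boolean release() {
            int left = refCnt.decrementAndGet();
            if (left == 0) { freed = true; return true; } // free off-heap memory here
            if (left < 0) throw new IllegalStateException("over-released");
            return false;
        }
        boolean isFreed() { return freed; }
    }

    public static void main(String[] args) {
        CountedBuf buf = new CountedBuf();
        buf.retain();                      // a second owner, e.g. a pending write
        System.out.println(buf.release()); // false: one reference still held
        System.out.println(buf.release()); // true: last release frees the buffer
        System.out.println(buf.isFreed());
    }
}
```

If a write is silently dropped after the handler is removed, its matching release() never runs and the count never reaches zero, which is exactly the leak described above.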
Off-heap memory leak investigation: the application uses two kinds of off-heap memory: JVM off-heap (mainly Netty) and native memory allocated by ZSTD. Profiling showed that ZSTD's native memory dominates when compression is enabled. No native leaks were detected, but the memory usage pattern differed between glibc's ptmalloc and jemalloc; switching to jemalloc reduced fragmentation and overall memory pressure.
Take-aways:
Avoid finalize() and large object references in finalizers.
Prefer direct off‑heap operations and release buffers promptly.
Select an appropriate memory allocator (e.g., jemalloc) for high‑throughput services.
Implement defensive checks (channel active, ReferenceCounted) to prevent Netty leaks.
DaTaobao Tech
Official account of DaTaobao Technology