Root Cause Analysis of Metaspace OOM Triggered by Arthas Trace
Tracing a large method with Arthas triggered a Metaspace OOM: constant‑pool rewrites inflated the Non‑Class space, because each rewrite copied a massive StackMapTable. Preserving the original constant‑pool layout in Arthas stops the growth, and exposing off‑heap metrics makes this kind of problem easier to catch in production.
Arthas is an open‑source Java diagnostic tool from Alibaba that can attach to a running JVM without modifying the application. While tracing a large method with many getter/setter calls, the trace command hung and later reported a failure, and the server’s Metaspace OOM alarm fired.
The author reproduced the issue on OpenJDK 8, 12 and 14 and observed that the Non‑Class space of Metaspace grew dramatically while the Class space and loaded class count stayed stable.
Metaspace structure: Metaspace consists of a Class space (holding Klass objects, vtables, itables, etc.) and a Non‑Class space (constant pool, method metadata, JIT data, annotations, …). The problem was traced to the Non‑Class space.
Diagnostic commands used:
jstat -gc <pid>
and on newer JDKs:
jcmd <pid> VM.metaspace
These showed a rapid increase of Metaspace usage after the trace.
Investigation of bytecode enhancement: Arthas enhances bytecode by inserting timing code. The enhancement process creates a new .class file whose constant‑pool layout differs drastically from the original. The new constant pool contains many more string entries, so the JVM has to rewrite many ldc instructions to ldc_w: ldc encodes its constant‑pool index in a single byte, and the new indices exceed max_jubyte (255). Each rewrite forces the JVM to copy the method’s StackMapTable. The original method’s StackMapTable was about 900 KB, so each rewrite allocated roughly 1 MB of Metaspace. With over a thousand ldc instructions in the method, Metaspace consumption exploded.
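The arithmetic behind that explosion can be sketched in a few lines of Java. This is only a back-of-the-envelope model, not JVM code: the class and method names are hypothetical, the 255 limit and the 900 KB and 1,000-instruction figures are the ones quoted above, and the estimate assumes every StackMapTable copy stays allocated until the redefinition finishes.

```java
// Hypothetical helper for illustration; not part of Arthas or HotSpot.
final class LdcRewriteEstimate {
    // ldc carries a one-byte constant-pool index; ldc_w carries a two-byte index.
    static final int MAX_LDC_INDEX = 255;

    // True when a constant at this index can no longer be loaded with plain ldc.
    static boolean needsLdcW(int constantPoolIndex) {
        return constantPoolIndex > MAX_LDC_INDEX;
    }

    // Rough upper bound, assuming one full StackMapTable copy per rewritten
    // instruction and no copy being freed in between.
    static long estimatedBytes(int rewrittenInstructions, long stackMapTableBytes) {
        return (long) rewrittenInstructions * stackMapTableBytes;
    }

    public static void main(String[] args) {
        System.out.println("index 255 needs ldc_w: " + needsLdcW(255));
        System.out.println("index 256 needs ldc_w: " + needsLdcW(256));
        long bytes = estimatedBytes(1_000, 900L * 1024);
        System.out.println("estimated Metaspace demand: " + bytes / (1024 * 1024) + " MiB");
    }
}
```

Under these assumptions, a thousand rewrites of a method with a 900 KB StackMapTable already ask for several hundred megabytes of Non‑Class space, which is consistent with the OOM described here.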
Key source snippets:
/*** SpaceManager::get_new_chunk ***/
Metachunk* SpaceManager::get_new_chunk(size_t chunk_word_size) {
  // ...
  if (log.is_trace() && next != NULL &&
      SpaceManager::is_humongous(next->word_size())) {
    log.trace(" new humongous chunk word size ", next->word_size());
  }
  return next;
}

// rewrite constant pool references
if (!rewrite_cp_refs(scratch_class, THREAD)) {
  return JVMTI_ERROR_INTERNAL;
}
Further analysis of VM_RedefineClasses::load_new_class_versions revealed that the JVM parses the StackMapTable only when bytecode verification is enabled. Disabling verification (with -noverify) skips the parsing, which prevents the costly copy and avoids the Metaspace surge, but running without verification is unsafe in production.
Solution: Modify Arthas to preserve the original constant‑pool layout when generating enhanced classes. The changes include keeping the original ClassReader and passing it to ASM’s ClassWriter so that the constant pool is copied unchanged. After rebuilding Arthas with these patches, tracing the same method no longer causes Metaspace growth on either AliJDK or OpenJDK.
The article concludes that Metaspace OOMs are hard to diagnose because they are off‑heap, and suggests exposing detailed off‑heap metrics (e.g., Non‑Class Space size, Class Space size, loaded class count) in production monitoring.
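As a minimal sketch of what exposing such metrics could look like, the following Java class reads the non-heap memory pools and the loaded class count through the standard java.lang.management API. The class name OffHeapMetrics and the metric key format are hypothetical, and the pool names are whatever the running JVM reports. On HotSpot these are typically "Metaspace" and "Compressed Class Space", with the former believed to cover both spaces, so the Non‑Class figure would be the difference between the two; treat that as an assumption to verify on your JDK.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector for illustration; wire the map into your own metrics system.
final class OffHeapMetrics {
    // Returns used/committed bytes for every non-heap pool plus the loaded class count.
    static Map<String, Long> snapshot() {
        Map<String, Long> metrics = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            metrics.put(pool.getName() + ".used", usage.getUsed());
            metrics.put(pool.getName() + ".committed", usage.getCommitted());
        }
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        metrics.put("classes.loaded", (long) classes.getLoadedClassCount());
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Sampling this snapshot periodically and alerting on sustained growth in the Metaspace pool while classes.loaded stays flat would have flagged the pattern seen in this incident.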
Xianyu Technology
Official account of the Xianyu technology team