Diagnosing a Java Application Hang: Thread Dump, GC, and JVM Parameter Analysis
The article recounts a step‑by‑step investigation of an intermittent Java service freeze caused by blocked threads, excessive Metaspace usage, and conflicting JVM flags, showing how thread dumps, GC statistics, and startup‑parameter tweaks revealed a hidden HotSpot bug that was resolved by upgrading the JDK.
Preface – The author, a senior Java engineer at Lianjia, shares a post‑mortem of a mysterious Java application hang where a CPU core spiked to 100% and the process became unresponsive.
On‑site Observation – The initial symptoms suggested either a GC problem or an infinite loop. Evidence was gathered from GC logs, thread dumps, heap dumps, and resource metrics. A thread dump taken with jstack (without the -l flag) showed most threads in the BLOCKED state with no business‑logic stack frames, while a single thread consumed an entire CPU core.
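The tallying of thread states described above can be approximated from inside a running JVM. The following is a minimal sketch, not the author's tooling: the class name ThreadStateSummary and its helper methods are hypothetical, and it relies only on the standard java.lang.management API. It sees Java threads only (JVM-internal threads such as GC threads are not listed), and it cannot run inside a process that is already hung, which is why an external jstack dump was used in the incident.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarizes thread states much like reading a jstack dump by hand.
final class ThreadStateSummary {

    /** Counts threads by state; null entries (threads that already exited) are skipped. */
    static Map<Thread.State, Integer> countByState(ThreadInfo[] infos) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread died between listing the ids and fetching its info
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the id of the Java thread with the most accumulated CPU time, or -1 if unavailable. */
    static long busiestThreadId(ThreadMXBean threads) {
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long bestId = -1;
        long bestNanos = -1;
        for (long id : threads.getAllThreadIds()) {
            long nanos = threads.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (nanos > bestNanos) {
                bestNanos = nanos;
                bestId = id;
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // maxDepth 0: only state information is needed here, no stack frames.
        ThreadInfo[] infos = threads.getThreadInfo(threads.getAllThreadIds(), 0);
        System.out.println("Threads by state: " + countByState(infos));
        System.out.println("Busiest Java thread id: " + busiestThreadId(threads));
    }
}
```

A large BLOCKED count combined with one thread dominating CPU time would match the pattern reported in the dump.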
GC Statistics – jstat output showed the Metaspace column approaching 100%, indicating that class metadata had nearly exhausted the allocated space.
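The same Metaspace reading can be taken programmatically. This is a minimal sketch under stated assumptions: the class name MetaspaceUsage and the helper percentOf are hypothetical, and the pool name "Metaspace" is the one HotSpot exposes on JDK 8 and later. It prints usage relative to both the committed size and the configured maximum, since the two ratios can differ considerably.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how full the Metaspace memory pool is.
final class MetaspaceUsage {

    /** Percent of 'limit' taken by 'used'; returns -1 when the limit is undefined (<= 0). */
    static double percentOf(long used, long limit) {
        return limit <= 0 ? -1.0 : 100.0 * used / limit;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!"Metaspace".equals(pool.getName())) {
                continue; // assumption: HotSpot's pool name on JDK 8+
            }
            MemoryUsage usage = pool.getUsage();
            System.out.printf("used=%d KB, committed=%d KB%n",
                    usage.getUsed() / 1024, usage.getCommitted() / 1024);
            System.out.printf("used/committed: %.1f%%%n",
                    percentOf(usage.getUsed(), usage.getCommitted()));
            // getMax() is -1 when no -XX:MaxMetaspaceSize limit applies; percentOf then yields -1.
            System.out.printf("used/max: %.1f%%%n",
                    percentOf(usage.getUsed(), usage.getMax()));
        }
    }
}
```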
JVM Startup Parameters – The problematic JVM was started with HotSpot 8u40 and the following options (illustrated in a screenshot): -XX:MaxMetaspaceSize, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Xnoclassgc, and others. These flags control Metaspace limits, the CMS collector, and class‑unloading behavior.
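A startup check for this flag combination might look like the sketch below. The class FlagConflictCheck and its method are hypothetical, the matching is a plain string comparison against the flags named above, and it assumes that an explicit -XX:-CMSClassUnloadingEnabled would remove the conflict. It does not inspect defaults that the JVM applies internally.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check for the flag combination discussed in the article.
final class FlagConflictCheck {

    /**
     * True when CMS is selected and -Xnoclassgc is present, but CMS class unloading
     * has not been switched off explicitly on the command line.
     */
    static boolean hasRiskyCombination(List<String> jvmArgs) {
        boolean cms = jvmArgs.contains("-XX:+UseConcMarkSweepGC");
        boolean noClassGc = jvmArgs.contains("-Xnoclassgc");
        boolean cmsUnloadingOff = jvmArgs.contains("-XX:-CMSClassUnloadingEnabled");
        return cms && noClassGc && !cmsUnloadingOff;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (hasRiskyCombination(jvmArgs)) {
            System.err.println("Warning: -Xnoclassgc combined with CMS; check the JDK version.");
        } else {
            System.out.println("No -Xnoclassgc/CMS combination detected.");
        }
    }
}
```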
Analysis of Class‑Unloading Flags – The article explains the interaction between CMSClassUnloadingEnabled, -Xnoclassgc, and the internal ClassUnloading flag. Source‑code excerpts from OpenJDK (e.g., globals.hpp, arguments.cpp, concurrentMarkSweepGeneration.cpp) show that enabling -Xnoclassgc forces ClassUnloading to false, while CMSClassUnloadingEnabled defaults to true in JDK 8, creating a conflict.
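Rather than reading the defaults out of the source, the effective values of these flags can be queried from a live HotSpot JVM. The sketch below is an illustration, not part of the original investigation: the class EffectiveFlags is hypothetical, while HotSpotDiagnosticMXBean and getVMOption are the real com.sun.management API. Flags that a given JDK does not know (for example CMS flags on releases that no longer ship CMS) are reported as absent.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper: looks up the effective value of a HotSpot flag at runtime.
final class EffectiveFlags {

    /** Returns the flag's current value, or empty if this JVM does not expose such a flag. */
    static Optional<String> effectiveValue(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty(); // not a HotSpot-compatible JVM
            }
            return Optional.of(bean.getVMOption(flagName).getValue());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // unknown flag name on this JDK
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ClassUnloading", "CMSClassUnloadingEnabled"}) {
            System.out.println(name + " = " + effectiveValue(name).orElse("(not available)"));
        }
    }
}
```

Running this with and without -Xnoclassgc on the JDK in question shows whether the two flags end up disagreeing.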
Root Cause – Because of this conflict, the third step of class unloading (pruning dead classes from the hierarchy lists) was skipped, which triggered a latent HotSpot bug and produced the observed hang.
Resolution – Upgrading the JDK to a version where the bug was fixed (8u60) eliminated the issue. The fix involved setting both CMSClassUnloadingEnabled and ClassUnloading to false when -Xnoclassgc is used.
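The difference between the two behaviors can be shown with a toy model. This is not HotSpot source code: the class ClassUnloadingModel and its methods are hypothetical and only encode what the text states, namely that -Xnoclassgc turns ClassUnloading off, that CMSClassUnloadingEnabled defaults to true in JDK 8, and that the fixed version turns both off together.

```java
// Toy model of the flag resolution described in the text; not actual HotSpot logic.
final class ClassUnloadingModel {

    /** Resolved values of the two flags after argument processing. */
    static final class Flags {
        final boolean classUnloading;
        final boolean cmsClassUnloadingEnabled;

        Flags(boolean classUnloading, boolean cmsClassUnloadingEnabled) {
            this.classUnloading = classUnloading;
            this.cmsClassUnloadingEnabled = cmsClassUnloadingEnabled;
        }

        /** Consistent means CMS never tries to unload classes while unloading is globally off. */
        boolean isConsistent() {
            return classUnloading || !cmsClassUnloadingEnabled;
        }
    }

    /** Affected versions: -Xnoclassgc only clears ClassUnloading; the CMS flag keeps its default. */
    static Flags resolveBeforeFix(boolean xnoclassgc) {
        return new Flags(!xnoclassgc, true);
    }

    /** Fixed versions: when ClassUnloading is off, the CMS flag is forced off as well. */
    static Flags resolveAfterFix(boolean xnoclassgc) {
        boolean classUnloading = !xnoclassgc;
        return new Flags(classUnloading, classUnloading);
    }

    public static void main(String[] args) {
        System.out.println("before fix consistent: " + resolveBeforeFix(true).isConsistent());
        System.out.println("after fix consistent:  " + resolveAfterFix(true).isConsistent());
    }
}
```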
Takeaways – Preserve fault evidence before restoring service, avoid using unfamiliar JVM options, keep the JDK patched, and recognize that the sheer number of HotSpot flags makes it easy to create subtle conflicts.
Beike Product & Technology