Master JVM Memory Troubleshooting: A Step‑by‑Step Guide
This comprehensive guide walks you through systematic JVM memory issue diagnosis, covering initial data collection, analysis of heap, metaspace, direct memory, stack problems, and practical command‑line tools, while offering actionable tips and real‑world examples for effective troubleshooting.
Purpose
Goal: systematically organize troubleshooting steps for JVM memory issues and collect community cases for future reference.
Basic Principles
Cookbook style: focus on practical steps rather than theory; the JVM memory model is left to extended reading. The main text provides troubleshooting ideas and suggestions, and the tools it uses must be either Alibaba Cloud PaaS products or out‑of‑the‑box utilities. The process follows a step‑by‑step flow to keep the reasoning coherent. Complex analysis is delegated to GPT or to extended reading.
Scope
Applicable to JDK 8 through JDK 11; other versions follow the same troubleshooting flow, although individual commands may differ.
Step 1: Receive Issue
1.1 Basic Information Collection
Identify the phenomenon, recent changes, and monitoring data.
Is memory usage constantly high, slowly increasing, or does the process exit abnormally?
Which node is affected? Were there any recent deployments? Are there relevant log traces?
1.2 Basic Judgment
Distinguish between scenarios with and without business impact, and suggest appropriate actions such as scaling out, checking periodic tasks, reproducing the issue, or applying quick mitigation to stop the damage.
1.3 Preserve the Scene (Collect Artifacts)
Collect a heap dump, the JVM startup parameters, GC logs, Linux system logs, Java application logs, etc.
Heap Dump
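<code># Dump the whole heap, including unreachable objects
jmap -dump:format=b,file=heap.bin <pid>
# Dump live objects only (triggers a full GC first)
jmap -dump:live,format=b,file=heap.bin <pid>
# jcmd alternative (dumps live objects by default; add -all to include unreachable ones)
jcmd <pid> GC.heap_dump filename=heap.bin
# JVM startup flags: write a dump automatically on OutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.bin</code>
JVM Startup Parameters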
<code>ps -ef | grep java</code>
Example parameters are shown in the original article.
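The same startup parameters can also be read from inside a running JVM, which helps when `ps` output is truncated or the process runs in a container. The following is a minimal sketch using the standard `RuntimeMXBean`; the class name `JvmArgsCheck` and the helper `hasArg` are illustrative names, not part of the original article.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmArgsCheck {
    // Returns true if any JVM argument starts with the given prefix (hypothetical helper).
    static boolean hasArg(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] unused) {
        // Arguments passed to the JVM itself, not the arguments of the application's main method.
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        args.forEach(System.out::println);
        System.out.println("heap dump on OOM enabled: "
                + hasArg(args, "-XX:+HeapDumpOnOutOfMemoryError"));
        System.out.println("max heap set explicitly: " + hasArg(args, "-Xmx"));
    }
}
```

A missing `-XX:+HeapDumpOnOutOfMemoryError` is worth noting at this stage, because without it the next OOM leaves no heap dump behind.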
Prompt for ChatGPT analysis
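<code>I want you to act as a JVM tuning expert. I will provide the machine specifications and the current JVM parameters, possibly together with other input. Your job is to explain the JVM parameters one by one and then, based on the machine specifications and my other input, judge whether the current JVM parameters are reasonable. If any parameter is unreasonable, give specific modification suggestions and the reasons for them.</code>
Step 2: Determine Source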
2.1 Identify Process
Use top, ps, or system logs to confirm which process caused the OOM.
2.2 Check for JVM Memory Leak
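A slow increase is not always a leak; it may be delayed allocation, where the operating system only backs heap pages with physical memory when they are first used. Example: adding -XX:+AlwaysPreTouch forces the JVM to touch all heap pages at startup, so resident memory reaches its full size immediately instead of growing gradually.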
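One way to tell the two apart is to watch used heap after garbage collection rather than process memory: with delayed allocation the post-GC figure stays roughly flat, while a leak tends to push it upward. The sketch below samples `MemoryMXBean`; the class name `HeapTrend`, the sample count, and the one-second interval are arbitrary assumptions, and `System.gc()` is only a hint that may be ignored (for example under `-XX:+DisableExplicitGC`). On a production system, reading post-GC occupancy from the GC logs is less intrusive.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapTrend {
    // True if there are at least two samples and each one is larger than the previous one.
    static boolean strictlyIncreasing(long[] samples) {
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return samples.length > 1;
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 5; // assumption: five samples, one second apart
        long[] usedAfterGc = new long[count];
        for (int i = 0; i < count; i++) {
            System.gc(); // a request only; the JVM may ignore it
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            usedAfterGc[i] = heap.getUsed();
            System.out.printf("used=%d committed=%d max=%d%n",
                    heap.getUsed(), heap.getCommitted(), heap.getMax());
            Thread.sleep(1000);
        }
        System.out.println("used heap rose on every sample: " + strictlyIncreasing(usedAfterGc));
    }
}
```

A rising result over a handful of samples is only a hint; a real leak shows the same trend over hours and across several full GCs.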
2.3 Analyze Logs
Search for OOM keywords in application and system logs.
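The keyword search can be sketched as a small scanner that maps well-known messages to the memory region they usually point to. The class name `OomLogClassifier` and the region labels are illustrative; the message list covers common HotSpot and Linux kernel wording and is not exhaustive, since exact texts vary between JDK and kernel versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

final class OomLogClassifier {
    // Substring to look for -> region it usually indicates (assumed labels).
    private static final Map<String, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("OutOfMemoryError: Java heap space", "heap");
        PATTERNS.put("OutOfMemoryError: GC overhead limit exceeded", "heap");
        PATTERNS.put("OutOfMemoryError: Metaspace", "metaspace");
        PATTERNS.put("OutOfMemoryError: Direct buffer memory", "direct memory");
        PATTERNS.put("OutOfMemoryError: unable to create new native thread", "threads/stack");
        PATTERNS.put("StackOverflowError", "stack");
        PATTERNS.put("invoked oom-killer", "linux oom killer");
        PATTERNS.put("Out of memory: Kill", "linux oom killer");
    }

    // Returns the region label for a log line, or null if no known keyword matches.
    static String classify(String line) {
        for (Map.Entry<String, String> e : PATTERNS.entrySet()) {
            if (line.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OomLogClassifier <log-file>");
            return;
        }
        // Files.lines decodes as UTF-8; a log in another encoding needs an explicit charset.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            lines.forEach(line -> {
                String region = classify(line);
                if (region != null) {
                    System.out.println(region + " <- " + line);
                }
            });
        }
    }
}
```

Run it once against the application log and once against the system log (for example the output of `dmesg` saved to a file) so that both JVM errors and kernel OOM kills are covered.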
2.4 Preliminary Region Judgment
Use ARMS monitoring to see which memory region (heap, metaspace, direct memory, stack) is abnormal and jump to the corresponding step.
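Where a monitoring product is not available, a rough per-region view can be taken from the JVM's own memory pool beans. The sketch below is an assumption-laden substitute for the ARMS view, not a description of how ARMS works; the class name `MemoryPools` is illustrative, and pool names depend on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MemoryPools {
    // Usage as a percentage of max, or -1 when the pool has no defined maximum.
    static double percentOfMax(long used, long max) {
        return max <= 0 ? -1.0 : 100.0 * used / max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no limit is defined, e.g. Metaspace without -XX:MaxMetaspaceSize.
            System.out.printf("%-30s %-9s used=%d committed=%d max=%d (%.1f%%)%n",
                    pool.getName(), pool.getType(), usage.getUsed(),
                    usage.getCommitted(), usage.getMax(),
                    percentOfMax(usage.getUsed(), usage.getMax()));
        }
    }
}
```

These pools cover heap and non-heap regions managed by the JVM; direct buffers and native allocations are not included and need the checks described in Step 4.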
Step 3: Heap Issue Investigation
Use commands and tools to analyze heap problems.
jstat -gcutil <pid>
jmap -heap <pid> (JDK 8 only; on JDK 9 and later use jhsdb jmap --heap --pid <pid>)
jmap -histo <pid>
Arthas memory command
pmap -x <pid>
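Histogram output from `jmap -histo` gets long quickly, so it often helps to total the bytes held by one package. The sketch below assumes the usual row layout of rank, instance count, byte count, and class name; the class name `HistoRow` and the prefix filter are illustrative, and the layout should be checked against the actual output of the JDK in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

final class HistoRow {
    final long instances;
    final long bytes;
    final String className;

    HistoRow(long instances, long bytes, String className) {
        this.instances = instances;
        this.bytes = bytes;
        this.className = className;
    }

    // Parses a data row such as "   1:   1000   24000  java.lang.String";
    // returns null for header, footer, and blank lines.
    static HistoRow parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 4 || !t[0].endsWith(":")) {
            return null;
        }
        try {
            return new HistoRow(Long.parseLong(t[1]), Long.parseLong(t[2]), t[3]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        String prefix = args.length > 0 ? args[0] : ""; // e.g. "com.example." (hypothetical package)
        long totalInstances = 0;
        long totalBytes = 0;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            HistoRow row = parse(line);
            if (row != null && row.className.startsWith(prefix)) {
                totalInstances += row.instances;
                totalBytes += row.bytes;
            }
        }
        System.out.printf("prefix '%s': %d instances, %d bytes%n", prefix, totalInstances, totalBytes);
    }
}
```

The program reads the histogram from standard input, so the output of `jmap -histo <pid>` can be piped into it or saved to a file first and compared across two points in time.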
Step 4: Off‑Heap Issue Investigation
Metaspace
Typical symptom: Metaspace usage keeps growing after startup and GC cannot reclaim it. Relevant parameters: -XX:MetaspaceSize, -XX:MaxMetaspaceSize, -XX:MinMetaspaceFreeRatio, -XX:MaxMetaspaceFreeRatio.
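Growing Metaspace often goes hand in hand with a growing number of loaded classes, so the class-loading counters are a cheap first check. This is a minimal sketch using the standard `ClassLoadingMXBean`; the class name `ClassLoadWatch` and the helper `netLoaded` are illustrative.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassLoadWatch {
    // Classes loaded since JVM start minus classes unloaded since then.
    static long netLoaded(long totalLoaded, long unloaded) {
        return totalLoaded - unloaded;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded: " + cl.getLoadedClassCount());
        System.out.println("total loaded:     " + cl.getTotalLoadedClassCount());
        System.out.println("unloaded:         " + cl.getUnloadedClassCount());
        System.out.println("net loaded:       "
                + netLoaded(cl.getTotalLoadedClassCount(), cl.getUnloadedClassCount()));
    }
}
```

If the currently loaded count keeps climbing while the unloaded count stays near zero, dynamically generated classes that are never released are a likely suspect.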
Direct Memory and JNI Memory
Limit direct memory with -XX:MaxDirectMemorySize and enable tracking with -XX:NativeMemoryTracking=detail, then use jcmd <pid> VM.native_memory detail to view the distribution. Netty leak detection levels (set with -Dio.netty.leakDetection.level): disabled, simple, advanced, paranoid.
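Direct buffers allocated through `ByteBuffer.allocateDirect` are also visible from inside the JVM via the platform buffer pool beans. The sketch below is a minimal illustration; the class name `DirectBufferUsage` is an assumption, and memory allocated by native code through JNI, or by libraries that bypass `ByteBuffer.allocateDirect`, will not show up in these counters.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferUsage {
    // Bytes currently used by the named buffer pool ("direct" or "mapped"), or -1 if no such pool exists.
    static long memoryUsed(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = memoryUsed("direct");
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MiB off-heap
        long after = memoryUsed("direct");
        System.out.println("direct pool before: " + before);
        System.out.println("direct pool after:  " + after);
        // Referencing the buffer here keeps it reachable while the second reading is taken.
        System.out.println("buffer capacity:    " + buffer.capacity());
    }
}
```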
Stack Issues
Typical symptoms: StackOverflowError and OutOfMemoryError: unable to create new native thread. Adjust the per-thread stack size with -Xss or -XX:ThreadStackSize.
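The relationship between stack size and recursion depth can be probed with a thread that recurses until it overflows. This is a sketch only: the class name `StackDepthProbe` is illustrative, the `stackSize` argument of the `Thread` constructor is a hint that some platforms round or ignore, and the depth reached also depends on frame size and JIT state.

```java
final class StackDepthProbe {
    // Unbounded recursion; each call adds one frame and bumps the shared counter.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    // Recurses on a new thread created with the requested stack size and returns the depth reached.
    static int maxDepth(long stackSizeBytes) throws InterruptedException {
        int[] counter = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                recurse(counter);
            } catch (StackOverflowError expected) {
                // The stack is exhausted; the counter holds the depth that was reached.
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        probe.join(); // join makes the counter written by the probe thread visible here
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("depth with 256 KiB stack: " + maxDepth(256 * 1024));
        System.out.println("depth with 2 MiB stack:   " + maxDepth(2 * 1024 * 1024));
    }
}
```

A larger stack usually allows deeper recursion but also costs more memory per thread, which matters when the symptom is a failure to create new native threads rather than a StackOverflowError.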
Common Cases
JNI memory overflow (ptmalloc2 fragmentation), Metaspace leaks (Fastjson, Groovy, BeanCopy), DirectMemory leaks (Netty), Stack overflow (recursive calls), OOM Killer.
Common Linux Commands
top
Explains fields such as PID, USER, VIRT, RES, SHR, %CPU, %MEM, etc.
pmap
Shows the memory map of a process; use -x for extended output, sort by size, and compare snapshots with icdiff.
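Sorting the mappings by resident size can be sketched as below. The parser assumes the common `pmap -x` column order of address, Kbytes, RSS, Dirty, mode, and mapping name; that layout is an assumption to verify on the target system, and the class name `PmapTopRss` is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

final class PmapTopRss {
    // Returns the RSS column (in KB) of a mapping row, or -1 for header, separator, and total lines.
    static long rssKb(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 5 || !t[0].matches("[0-9a-fA-F]+")) {
            return -1;
        }
        try {
            return Long.parseLong(t[2]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = new ArrayList<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (rssKb(line) >= 0) {
                rows.add(line);
            }
        }
        // Largest resident mappings first.
        rows.sort((a, b) -> Long.compare(rssKb(b), rssKb(a)));
        for (int i = 0; i < Math.min(10, rows.size()); i++) {
            System.out.println(rows.get(i));
        }
    }
}
```

The program reads the `pmap -x <pid>` output from standard input and prints the ten mappings with the largest RSS; running it on two snapshots taken some time apart shows which mappings are growing.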
jcmd, jhat, jps, jinfo, jstat, jstack, jmap
Brief usage examples for each command.
Tools
ATP – Alibaba Cloud memory analysis platform.
ARMS – APM monitoring service.
MAT – Eclipse Memory Analyzer.
Arthas – Java diagnostic tool.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!