How to Diagnose and Fix Java Application Slowdowns: CPU, GC, and Thread Issues
This guide explains how to identify and resolve common Java production problems, including sudden CPU spikes, excessive Full GC, blocked threads, threads stuck in WAITING, and deadlocks, using tools such as top, jstack, jstat, and heap-dump analysis to pinpoint the root cause and apply the appropriate fix.
Developers often encounter production issues where a system suddenly becomes slow, CPU reaches 100%, and Full GC occurs frequently.
The article provides a step‑by‑step troubleshooting approach to locate the problematic code and propose solutions.
Typical Causes of System Slowdown
Reading a large amount of data into memory at once, exhausting the heap and triggering frequent Full GC.
CPU‑intensive operations leading to high CPU usage.
Other less severe cases include occasional blocking operations, threads stuck in WAITING state, and deadlocks.
Excessive Full GC
Excessive Full GC is common, especially right after new features are deployed. Two telltale signs:
top shows several threads consuming nearly all of a core, and they turn out to be GC threads.
jstat shows the Full GC count climbing rapidly.
<code>top - 08:31:10 up 30 min,  0 users,  load average: 0.73, 0.58, 0.34
KiB Mem:  2046460 total,  1923864 used,  122596 free,  14388 buffers
  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
    9 root  20   0 2557160 288976 15812 S 98.0 14.1 0:42.60 java</code>

Identify the Java process (PID 9) and inspect its threads:
<code>top -Hp 9</code>

Thread 10 shows high CPU usage. Convert its ID to hexadecimal for jstack:
<code>printf "%x\n" 10
a</code>

jstack shows that the thread with nid=0xa is the VM Thread (a GC thread), confirming that frequent Full GC is the cause of the slowdown.
Use jstat to monitor GC:
<code>jstat -gcutil 9 1000 10
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00   0.00  75.07  59.09  59.60   3259    0.919  6517    7.715    8.635</code>

A high and rapidly climbing FGC count (6517 in the sample above) points to a memory leak. Dump the heap and analyze it with Eclipse MAT to find the objects dominating the heap (e.g., PrintStream). If explicit <code>System.gc()</code> calls turn out to be the trigger, disable them with <code>-XX:+DisableExplicitGC</code>.
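If you suspect explicit GC calls, the collector MXBeans make it easy to confirm from inside the JVM. A minimal sketch (the <code>ExplicitGcCheck</code> class name is hypothetical): run normally, the counter jumps after <code>System.gc()</code>; run with <code>-XX:+DisableExplicitGC</code>, it stays flat.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcCheck {
    // Sum collection counts across all collectors (young + old generation).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += gc.getCollectionCount();
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // with -XX:+DisableExplicitGC this request is silently ignored
        long after = totalCollections();
        System.out.println("Collections triggered by System.gc(): " + (after - before));
    }
}
```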
High CPU Usage
Use <code>top</code> to find the high-CPU process, then <code>top -Hp <pid></code> to locate the hot thread (CPU >80%). Convert the thread ID to hex and search for it in the jstack output to see whether it is a GC thread or user code.
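The same hunt can also be done from inside the JVM with <code>ThreadMXBean</code>, which reports per-thread CPU time for Java threads (GC/VM threads are not Java threads, so jstack is still needed for those). A minimal sketch, with a hypothetical <code>busy-spinner</code> thread standing in for the real hot code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class HotThreadFinder {
    // Scan all Java threads and report the one that has burned the most CPU
    // time -- an in-process analogue of `top -Hp <pid>`.
    static String hottestThreadName() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        String hottest = null;
        long maxCpu = -1;
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // nanoseconds, -1 if unsupported
            ThreadInfo info = mx.getThreadInfo(id);
            if (cpu > maxCpu && info != null) {
                maxCpu = cpu;
                hottest = info.getThreadName();
            }
        }
        return hottest;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical hot spot: a busy-spin thread standing in for real workload.
        Thread spinner = new Thread(() -> { while (true) { } }, "busy-spinner");
        spinner.setDaemon(true);
        spinner.start();
        Thread.sleep(500); // let it accumulate CPU time
        System.out.println("Hottest thread: " + hottestThreadName());
    }
}
```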
Intermittent Interface Latency
When an API occasionally takes 2–3 seconds, load-test the endpoint to raise the request rate. Multiple threads will then block at the same stack frame, revealing the slow code (e.g., line 34 in <code>UserController</code>).
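The effect is easy to reproduce in miniature: when several threads hit the same slow, synchronized code path, a single stack snapshot shows most of them BLOCKED at the same frame. A minimal sketch with a hypothetical <code>slowHandler</code> standing in for the real endpoint:

```java
import java.util.Map;

public class BlockedSnapshot {
    static final Object LOCK = new Object();

    // Hypothetical slow handler standing in for the real endpoint code.
    static void slowHandler() {
        synchronized (LOCK) {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        }
    }

    // One "jstack snapshot": count worker threads blocked on the same monitor.
    static int countBlockedWorkers() {
        int blocked = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getName().startsWith("worker-") && t.getState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        // "Load test": fire several concurrent requests at the slow handler.
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(BlockedSnapshot::slowHandler, "worker-" + i);
            t.setDaemon(true);
            t.start();
        }
        Thread.sleep(300); // let the workers pile up behind the lock
        System.out.println("Workers blocked at the same frame: " + countBlockedWorkers());
    }
}
```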
Thread Stuck in WAITING State
Export the TIMED_WAITING threads from several consecutive jstack dumps with <code>grep</code> into files (a1.log, a2.log, …). Compare the files for threads that persist across snapshots; these are the likely culprits. Examine their stack traces to locate the waiting code (e.g., line 8 in <code>SyncTask</code>).
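The snapshot-comparison idea can be sketched in a few lines: a thread waiting on a monitor that is never notified shows up as WAITING in every snapshot. The <code>sync-task</code> thread below is a hypothetical stand-in for the real stuck task:

```java
public class WaitingThreadDemo {
    static final Object SIGNAL = new Object();

    // Hypothetical stand-in for the stuck task: waits for a notify that never comes.
    static Thread startWaiter() {
        Thread waiter = new Thread(() -> {
            synchronized (SIGNAL) {
                try { SIGNAL.wait(); } catch (InterruptedException ignored) { }
            }
        }, "sync-task");
        waiter.setDaemon(true);
        waiter.start();
        return waiter;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = startWaiter();
        // Two "snapshots", like comparing a1.log and a2.log: the same thread is
        // WAITING in both, which marks it as a suspect worth inspecting.
        Thread.sleep(200);
        Thread.State first = waiter.getState();
        Thread.sleep(200);
        Thread.State second = waiter.getState();
        System.out.println("Snapshot 1: " + first + ", snapshot 2: " + second);
    }
}
```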
Deadlock Detection
jstack detects deadlocks automatically and prints the threads involved. Identify the lock-contending code (e.g., line 5 in <code>ConnectTask</code>) and fix the synchronization.
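The classic cause is two threads taking the same pair of locks in opposite order. The sketch below provokes exactly that, then detects it programmatically via <code>ThreadMXBean.findDeadlockedThreads()</code>, the same check jstack performs (class and thread names are hypothetical):

```java
import java.lang.management.ManagementFactory;

public class DeadlockDemo {
    static final Object A = new Object();
    static final Object B = new Object();

    static Thread lockInOrder(String name, Object first, Object second) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                synchronized (second) { /* never reached once deadlocked */ }
            }
        }, name);
        t.setDaemon(true); // daemons, so the stuck threads don't keep the JVM alive
        return t;
    }

    // Provoke a classic lock-ordering deadlock and detect it the way jstack does.
    static long[] provoke() throws InterruptedException {
        lockInOrder("t1", A, B).start();
        lockInOrder("t2", B, A).start();
        Thread.sleep(500); // give both threads time to grab their first lock
        return ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = provoke();
        System.out.println("Deadlocked threads: " + (ids == null ? 0 : ids.length));
    }
}
```

The fix is to make every code path acquire the locks in the same order (or replace nested <code>synchronized</code> blocks with a single lock or <code>tryLock</code> with timeout).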
Summary of Troubleshooting Steps
Run <code>top</code>; if CPU is high, use <code>top -Hp <pid></code> to find the hot thread.
Convert the thread ID to hex and locate it in the jstack output.
If it is a VM thread, monitor GC with <code>jstat -gcutil</code> and dump the heap for analysis.
If CPU is normal but latency persists, use load testing and compare multiple jstack snapshots to find waiting or blocked threads.
Detect deadlocks with jstack and resolve lock ordering issues.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations work and aim to accompany you throughout your operations career.