
Analysis of Java Thread States, Flame Graphs, and Performance Optimizations

The article examines Java thread states—WAITING and TIMED_WAITING—through monitoring screenshots, explains their definitions, trigger conditions, and recovery mechanisms, analyzes real‑world code involving concurrent tasks, and presents flame‑graph based performance optimizations for backend services.

JD Tech Talk

Mar 14, 2025

1. Thread Run States

1.1 Total

1.2 TIMED_WAITING

The monitoring screenshot shows that the top N threads in TIMED_WAITING are the ones querying national subsidy qualifications.

1.3 WAITING

The monitoring screenshot shows that the top N threads in WAITING are the ones querying national subsidy activities.

1.4 Thread Analysis

Below we analyze the two states:

1. WAITING state

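• Definition: A thread in the WAITING state is waiting indefinitely for a specific action from another thread (such as a notification or an interruption) and does not continue executing until that action occurs.

• Trigger conditions: Common ways a thread enters WAITING include:

Calling Object.wait() with no timeout: the thread releases the object's monitor and waits until another thread calls notify() or notifyAll() on that object.

Calling Thread.join() with no timeout: the thread waits for another thread to finish.

Calling LockSupport.park(): the thread is blocked until another thread wakes it with LockSupport.unpark().

• Recovery: The thread remains in WAITING until the matching action occurs: another thread calls notify() or notifyAll() (for Object.wait()), the joined thread terminates (for Thread.join()), another thread calls LockSupport.unpark() (for LockSupport.park()), or the waiting thread is interrupted.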
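The trigger conditions above can be observed directly with Thread.getState(). The following is a minimal sketch, not code from the service under analysis: the class name WaitingStateDemo and its helper methods are hypothetical, and the polling helper exists only because a freshly started thread needs a moment to reach the blocking call.

```java
import java.util.concurrent.locks.LockSupport;

final class WaitingStateDemo {

    /** Polls until the thread reaches the expected state or roughly 2 seconds pass. */
    static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return t.getState();
    }

    /** Blocks a thread in LockSupport.park() and reports the state seen from outside. */
    static Thread.State stateWhileParked() {
        Thread worker = new Thread(() -> LockSupport.park(), "park-demo");
        worker.setDaemon(true);
        worker.start();
        Thread.State observed = awaitState(worker, Thread.State.WAITING);
        LockSupport.unpark(worker); // recovery path for park()
        return observed;
    }

    /** Blocks a thread in Object.wait() and reports the state seen from outside. */
    static Thread.State stateWhileInObjectWait() {
        Object monitor = new Object();
        Thread worker = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait(); // no timeout: waits until notified or interrupted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "wait-demo");
        worker.setDaemon(true);
        worker.start();
        Thread.State observed = awaitState(worker, Thread.State.WAITING);
        synchronized (monitor) {
            monitor.notifyAll(); // recovery path for wait()
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("park(): " + stateWhileParked());
        System.out.println("wait(): " + stateWhileInObjectWait());
    }
}
```

In a thread dump, these two cases typically appear as "WAITING (parking)" and "WAITING (on object monitor)", which helps tell a lock-based wait from a monitor-based one.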

2. TIMED_WAITING state

• Definition: A thread in the TIMED_WAITING state is waiting for a condition, but returns automatically once a specified time has elapsed.

• Trigger conditions: Common ways a thread enters TIMED_WAITING include:

Calling Thread.sleep(milliseconds): the thread sleeps for the given time.

Calling Object.wait(milliseconds): the thread waits on the object's monitor with a timeout.

Calling Thread.join(milliseconds): the thread waits for another thread to finish, with a time limit.

Calling LockSupport.parkNanos() or LockSupport.parkUntil(): the thread is parked with a timeout or a deadline.

• Recovery: The thread resumes automatically after the timeout, or earlier if the awaited action occurs: another thread calls notify() or notifyAll() (for Object.wait(milliseconds)), the joined thread terminates, another thread calls LockSupport.unpark(), or the waiting thread is interrupted.

| State | Description | Trigger conditions | Recovery |
|-------|-------------|--------------------|----------|
| **WAITING** | The thread waits for a specific action from another thread and does not continue executing | `Object.wait()`, `Thread.join()`, `LockSupport.park()` | Another thread calls `notify()`/`notifyAll()`, or the thread is interrupted |
| **TIMED_WAITING** | The thread waits for a condition to occur, but with a time limit | `Thread.sleep(milliseconds)`, `Object.wait(milliseconds)`, `Thread.join(milliseconds)`, `LockSupport.parkNanos()`, `LockSupport.parkUntil()` | Resumes automatically once the specified time has elapsed, or when another thread calls `notify()`/`notifyAll()` |
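The timed variants can be checked the same way. This is an illustrative sketch only: TimedWaitingStateDemo and its methods are hypothetical names, and the 5-second timeouts are arbitrary values chosen so that the thread is still blocked when its state is sampled.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

final class TimedWaitingStateDemo {

    /** Polls until the thread reaches the expected state or roughly 2 seconds pass. */
    static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return t.getState();
    }

    /** Runs the blocking call on a daemon thread and samples its state from outside. */
    static Thread.State observe(Runnable blockingCall) {
        Thread worker = new Thread(blockingCall, "timed-wait-demo");
        worker.setDaemon(true);
        worker.start();
        Thread.State observed = awaitState(worker, Thread.State.TIMED_WAITING);
        worker.interrupt(); // ends both sleep() and parkNanos() before the timeout
        return observed;
    }

    static Thread.State stateWhileSleeping() {
        return observe(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    static Thread.State stateWhileParkedWithTimeout() {
        return observe(() -> LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(5)));
    }

    public static void main(String[] args) {
        System.out.println("sleep(): " + stateWhileSleeping());
        System.out.println("parkNanos(): " + stateWhileParkedWithTimeout());
    }
}
```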

In the actual code, queryActTp runs getActivityInfo with two subtasks, and queryQualityTp runs getQualityInfo with five subtasks; both are executed in parallel within queryActAndQualityTp.

Monitoring screenshot for getActivityInfo.

Monitoring screenshot for getQualityInfo.

Although the two call paths follow the same pattern, their threads end up in two different states; in theory both should be TIMED_WAITING. The stack trace shows LockSupport.park (an untimed wait) instead of LockSupport.parkNanos, which requires further analysis.
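One way such a difference can arise in general is that a future is awaited without a timeout on one path and with a timeout on the other. The sketch below only illustrates that mechanism; it is not a diagnosis of the service described here, and FutureWaitStateDemo and stateWhileWaiting are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureWaitStateDemo {

    /** Polls until the thread reaches the expected state or roughly 2 seconds pass. */
    static Thread.State awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        while (t.getState() != expected && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return t.getState();
    }

    /** Reports the state of a thread blocked on a future, with or without a timeout. */
    static Thread.State stateWhileWaiting(boolean withTimeout) {
        // Completed only at the end of this method, so the caller really blocks.
        CompletableFuture<String> pending = new CompletableFuture<>();
        Thread caller = new Thread(() -> {
            try {
                if (withTimeout) {
                    pending.get(5, TimeUnit.SECONDS); // timed wait
                } else {
                    pending.get();                    // untimed wait
                }
            } catch (InterruptedException | ExecutionException | TimeoutException e) {
                // The outcome is irrelevant for this demo.
            }
        }, "future-wait-demo");
        caller.setDaemon(true);
        caller.start();
        Thread.State expected = withTimeout ? Thread.State.TIMED_WAITING : Thread.State.WAITING;
        Thread.State observed = awaitState(caller, expected);
        pending.complete("done"); // release the caller
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("get():          " + stateWhileWaiting(false));
        System.out.println("get(timeout):   " + stateWhileWaiting(true));
    }
}
```

Idle worker threads of a pool that are blocked waiting for the next task can also show up as parked threads, so the thread names alone do not prove which call site is responsible.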

Another issue is that thread pool A invokes pools B and C in parallel, so under high load the frequent CPU context switches add pressure. The logic has been rewritten to run the activity and qualification queries concurrently on a single pool, but the change has not been deployed yet.
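A rough sketch of the single-pool idea follows. It is written under assumptions: SinglePoolQuery, queryActivity, and queryQualification are hypothetical stand-ins for the real subtasks, and the counts of two and five subtasks mirror the description above rather than the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SinglePoolQuery {

    // Hypothetical stand-ins for the real activity and qualification subtasks.
    static String queryActivity(int i) { return "activity-" + i; }
    static String queryQualification(int i) { return "qualification-" + i; }

    /** Submits all seven subtasks to one pool and waits once, with a single deadline. */
    static List<String> queryAll(ExecutorService pool, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            final int n = i;
            futures.add(CompletableFuture.supplyAsync(() -> queryActivity(n), pool));
        }
        for (int i = 0; i < 5; i++) {
            final int n = i;
            futures.add(CompletableFuture.supplyAsync(() -> queryQualification(n), pool));
        }
        // One timed wait in the calling thread; no intermediate pool blocks on another pool.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .get(timeoutMillis, TimeUnit.MILLISECONDS);
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join()); // already completed, so this does not block
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(queryAll(pool, 1_000));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, no worker thread sits blocked while it waits for subtasks running on a different pool, which removes one layer of waiting threads and the context switches that come with it.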

2. Flame Graph Analysis

2.1 wait thread

2.2 lock performance

2.3 CPU sampling

2.3.1 getFatherActivity analysis

Q1: Call scenario – loop calling getFatherActivity.

Q2: Configuration data is a JSON string of about 50,000 characters, leading to large object deserialization.

Q3: Using new ArrayList() creates a new object each time.

Q4: After grouping only the first element is used; using toMap would be better.

Optimization 1:

We still see multiple stream calls inside the loop; moving the toMap logic outside the loop, so that the lookup map is built once rather than on every iteration, reduces the overhead.
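The idea of building the lookup once with toMap can be sketched as follows. The Activity class, its fields, and the method names are hypothetical, since the real getFatherActivity code is not shown here; the merge function keeps the first entry per key, which matches the observation that only the first element of each group was used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FatherActivityLookup {

    /** Hypothetical configuration entry: an activity and the id of its parent. */
    static final class Activity {
        final String id;
        final String fatherId;

        Activity(String id, String fatherId) {
            this.id = id;
            this.fatherId = fatherId;
        }
    }

    /** Builds the id -> activity index once; on duplicate ids the first entry wins. */
    static Map<String, Activity> indexById(List<Activity> configured) {
        return configured.stream()
                .collect(Collectors.toMap(a -> a.id, Function.identity(), (first, dup) -> first));
    }

    /** Resolves parent ids with one map lookup per element instead of one stream per element. */
    static List<String> fatherIdsOf(List<String> ids, List<Activity> configured) {
        Map<String, Activity> byId = indexById(configured); // hoisted out of the loop
        List<String> result = new ArrayList<>(ids.size());
        for (String id : ids) {
            Activity activity = byId.get(id);
            if (activity != null) {
                result.add(activity.fatherId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Activity> configured = Arrays.asList(
                new Activity("a", "F1"), new Activity("b", "F2"), new Activity("a", "F3"));
        System.out.println(fatherIdsOf(Arrays.asList("a", "b", "missing"), configured));
    }
}
```

If the configuration itself changes rarely, the same index could also be cached across requests so that the large JSON string is not deserialized on every call, though that goes beyond what is described here.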

Other methods also consume high CPU and are left unchanged for now.

Further optimization introduces a utility class for collecting concurrent thread results:

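1. When allOf fails with an exception, cancel every thread that is still running. This prevents threads from continuing to execute after a timeout and wasting CPU; production monitoring showed that timeouts do occur fairly often.

2. This code path produced a large number of exception logs, so exceptions are now distinguished by type and the useless stack-trace logging is removed.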
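A utility along these lines might look like the sketch below. It is an assumption about the general shape, not the author's class: ConcurrentCollector and collectAll are hypothetical names. One detail worth noting is that CompletableFuture.cancel(true) does not interrupt the thread running the task, so this sketch keeps the Future returned by ExecutorService.submit and cancels that instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ConcurrentCollector {

    /**
     * Runs all tasks on the pool and waits up to timeoutMillis for every result.
     * If the wait fails or times out, every unfinished task is cancelled with interruption.
     */
    static <T> List<T> collectAll(ExecutorService pool, List<Callable<T>> tasks, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        List<Future<?>> running = new ArrayList<>();
        List<CompletableFuture<T>> results = new ArrayList<>();
        for (Callable<T> task : tasks) {
            CompletableFuture<T> result = new CompletableFuture<>();
            results.add(result);
            // The Future from submit() supports cancel(true), which interrupts the worker.
            running.add(pool.submit(() -> {
                try {
                    result.complete(task.call());
                } catch (Throwable t) {
                    result.completeExceptionally(t);
                }
            }));
        }
        try {
            CompletableFuture.allOf(results.toArray(new CompletableFuture[0]))
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            for (Future<?> f : running) {
                f.cancel(true); // no effect on tasks that already finished
            }
            throw e;
        }
        List<T> values = new ArrayList<>(results.size());
        for (CompletableFuture<T> r : results) {
            values.add(r.join()); // all futures are complete at this point
        }
        return values;
    }
}
```

Cancellation by interruption only helps if the subtasks respond to it, for example by blocking in interruptible calls or by checking Thread.currentThread().isInterrupted() in long loops. A caller can then log a TimeoutException as a single line and reserve full stack traces for unexpected exception types.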

All waiting on concurrent threads now goes through the unified method described above. The queryActTp threads that remain in WAITING may be caused by the missing cancellation, while the queryQualityTp threads in TIMED_WAITING could be caused by longer subtask execution times. Further observation after deployment is needed.
