
Diagnosing and Optimizing Throughput and CPU Usage in a Java Spring Backend Service

The article details a step‑by‑step investigation of a Java Spring backend that initially achieved only 50 req/s under load, identifies bottlenecks such as slow SQL, excessive logging, thread‑pool misconfiguration and costly Spring bean creation, and demonstrates how targeted optimizations roughly doubled throughput while reducing response times.


The author describes a business-facing (ToB) system that, under normal conditions, handled requests quickly (under 100 ms) but could not meet a new client requirement of 500 requests per second. Initial load testing with 100 concurrent users yielded only 50 req/s, with CPU usage near 80%.

Analysis Process – Locating the "slow" causes

Key suspects were locks (synchronized blocks, distributed locks, database locks), time-consuming operations (network I/O, SQL) and missing metrics. The team added instrumentation to log warnings when response times exceeded thresholds (500 ms for the API, 200 ms for internal calls, 10 ms for Redis, 100 ms for SQL).
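This kind of threshold-based instrumentation can be sketched as a small wrapper; this is a hypothetical sketch rather than the team's actual code, and the class name, method name, and plain System.out logging are all assumptions:

```java
import java.util.function.Supplier;

public class SlowCallLogger {

    /** Runs the call and warns when it exceeds its latency budget. */
    public static <T> T timed(String name, long thresholdMs, Supplier<T> call) {
        long start = System.currentTimeMillis();
        try {
            return call.get();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > thresholdMs) {
                // The real service would route this through its logging framework.
                System.out.printf("WARN slow call: %s took %d ms (threshold %d ms)%n",
                        name, elapsed, thresholdMs);
            }
        }
    }
}
```

A Redis read, for example, would be wrapped with the 10 ms budget from the thresholds above, and an SQL call with the 100 ms budget.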

Log inspection revealed a slow SQL statement that updated a single row in a high‑contention table:

update table set field = field - 1 where type = 1 and field > 1;

Because the statement locked the same row under high concurrency, it accounted for over 80% of the request latency. The query was changed to asynchronous execution, which cut the maximum response time from 5 s to 2 s and the 95th‑percentile from 4 s to 1 s.
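The switch to asynchronous execution can be illustrated without the Spring plumbing. In a Spring service this would typically be an @Async method, but the core idea is simply to move the contended UPDATE off the request thread; this is a sketch, and the pool size and the Runnable stand-in for the actual JDBC call are assumptions:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCounterUpdate {
    // Dedicated pool for the contended update; size is illustrative.
    private final ExecutorService dbExecutor = Executors.newFixedThreadPool(4);

    /**
     * The request thread returns immediately; the row lock for
     * "update table set field = field - 1 ..." is taken (and contended)
     * on the background pool instead of the request path.
     */
    public CompletableFuture<Void> decrementAsync(Runnable update) {
        return CompletableFuture.runAsync(update, dbExecutor);
    }

    public void shutdown() {
        dbExecutor.shutdown();
    }
}
```

The trade-off is eventual consistency: the counter update may lag the response, which must be acceptable for the business logic in question.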

Further profiling showed irregular gaps of ~100 ms in the logs, suggesting thread switches, excessive logging, or stop-the-world GC pauses. The team reduced logging verbosity (disabling DEBUG output), consolidated the @Async thread pools into a single pool capped at 50 threads, and increased the JVM heap from 512 MB to 4 GB. GC frequency dropped, but throughput only rose to ~200 req/s.
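A consolidated pool with the 50-thread cap might look like the following plain-JDK sketch. In Spring this would normally be a single ThreadPoolTaskExecutor bean shared by all @Async methods; the core size, queue length, and rejection policy here are illustrative assumptions, as the article only states the 50-thread maximum:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SharedAsyncPool {
    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                10,                    // core threads kept alive (assumed)
                50,                    // hard cap from the article
                60, TimeUnit.SECONDS,  // idle threads above core die off
                new ArrayBlockingQueue<>(1000),             // bounded backlog (assumed)
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure on overflow
    }
}
```

A single bounded pool keeps the total thread count predictable, so context-switch overhead no longer grows with the number of ad-hoc pools.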

CPU usage remained high despite fewer threads. Stack traces indicated frequent calls to BeanUtils.getBean(), which internally invoked createBean for a prototype-scoped Redis helper. Because the bean was prototype-scoped, each call performed full bean initialization, causing significant overhead under load.

The problematic code:

RedisTool redisTool = BeanUtils.getBean(RedisMaster.class);

was replaced with direct new instantiation, eliminating the repeated bean creation cost.
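The fix can be sketched as holding one instance instead of fetching a prototype bean on every call. RedisTool's real constructor is not shown in the article, so a no-arg stand-in class is assumed here:

```java
public class RedisToolHolder {
    // Stand-in for the real helper; its actual constructor is not shown
    // in the article.
    static class RedisTool { }

    // Before: RedisTool redisTool = BeanUtils.getBean(RedisMaster.class);
    //         (prototype scope => full createBean work on every call)
    // After:  one shared instance, constructed exactly once.
    private static final RedisTool INSTANCE = new RedisTool();

    public static RedisTool get() {
        return INSTANCE;
    }
}
```

An equivalent container-friendly fix would be to declare the helper as a singleton-scoped bean and inject it once, rather than bypassing Spring entirely; which is preferable depends on whether the helper genuinely needs per-call state.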

Additional timing utilities (manual System.currentTimeMillis() calls and Hutool's StopWatch) were identified as non-negligible CPU contributors in high-concurrency scenarios.
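One common way to keep such timing cost out of hot paths is to sample: only a fraction of requests pay for the clock calls. This is a generic sketch, not the article's code, and the 1% rate is an arbitrary illustrative choice:

```java
import java.util.concurrent.ThreadLocalRandom;

public class SampledTiming {
    private static final double SAMPLE_RATE = 0.01; // time ~1% of requests

    /**
     * Runs the work; returns elapsed milliseconds if this request was
     * sampled, or -1 if it skipped the clock calls entirely.
     */
    public static long run(Runnable work) {
        boolean sampled = ThreadLocalRandom.current().nextDouble() < SAMPLE_RATE;
        long start = sampled ? System.nanoTime() : 0L;
        work.run();
        return sampled ? (System.nanoTime() - start) / 1_000_000 : -1L;
    }
}
```

Sampled measurements still expose latency trends under load while removing two clock calls from the vast majority of requests.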

Final results, shown in the last chart, demonstrated a roughly two‑fold improvement in throughput and a substantial reduction in response time after applying the above optimizations.

Summary of actions:

Identified and async‑executed a slow SQL statement.

Reduced logging verbosity and consolidated thread pools.

Increased JVM heap size and monitored GC.

Replaced prototype‑scoped bean retrieval with direct object creation to avoid costly createBean calls.

Reviewed and minimized use of timestamp APIs in hot paths.

The author notes that while the immediate issues were resolved, deeper understanding of performance fundamentals and systematic troubleshooting methods are still needed.

Tags: Java, performance, optimization, Spring, CPU, throughput, profiling
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
