
Root Cause Analysis and Resolution of OutOfMemoryError in a Java Backend Service

This article details a comprehensive investigation of a Java backend service suffering from severe OutOfMemoryError due to an unbounded userId list in a count query, describing monitoring findings, heap dump analysis, and practical mitigation steps including request limiting and JVM tuning.

IT Services Circle

Phenomenon

A certain online service exhibited extremely slow response times. Monitoring showed a large gap between when requests arrived and when processing began, even though the actual handler time was short, and many requests exhibited this pattern during the incident window.

Root Cause Analysis

Tracing the monitoring chain showed that requests reached the service but waited about 3 seconds before being processed. CPU usage spiked, frequent long GC cycles followed, and the heap eventually filled, causing the pod to be terminated. The logs contained an OutOfMemoryError:

system error: org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1055)
    ...

A large batch job running at the same time was suspected, but its code showed no obvious issue. The JVM parameters -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=/logs/oom_dump/xxx.log -XX:HeapDumpPath=/logs/oom_dump/xxx.hprof were already set, yet the container killed the pod before the dump file could be retained.

After the pod restart, a 4.8 GB heap dump was generated; jvisualvm could not fully load it but did identify the thread that triggered the OOM. The dump revealed a massive count SQL statement backed by a 1.07 GB byte array and a 1.03 GB char array.

The offending userId list, received from an external system, was 64 MB in size. Investigation uncovered a bug in the upstream system that caused it to send every user ID in the query.
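To make the heap numbers concrete, here is a minimal sketch of how an IN-list count query grows with the ID list. The table and column names (`orders`, `user_id`) and the builder method are hypothetical, not taken from the incident code; the point is that the statement text alone reaches hundreds of megabytes for millions of IDs, before counting the intermediate copies made while joining and by the JDBC driver.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class InListSizeDemo {
    // Hypothetical helper mirroring how an unbounded IN-list count query might be built.
    static String buildCountSql(List<Long> userIds) {
        return "SELECT COUNT(*) FROM orders WHERE user_id IN ("
                + userIds.stream().map(String::valueOf).collect(Collectors.joining(","))
                + ")";
    }

    public static void main(String[] args) {
        // 100k IDs already yield a statement of several hundred thousand characters;
        // scale to millions of IDs and the String plus its char[]/byte[] copies
        // account for the gigabyte-sized arrays seen in the heap dump.
        List<Long> ids = LongStream.rangeClosed(1, 100_000).boxed().collect(Collectors.toList());
        String sql = buildCountSql(ids);
        System.out.println("SQL length for 100k IDs: " + sql.length() + " chars");
    }
}
```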

Solution

The upstream system now limits the number of userId values it sends, and the same restriction is enforced on our side as a defensive check, turning a complex troubleshooting effort into a simple configuration change.
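A minimal sketch of what that defensive check might look like. The class, the cap of 1,000, and the batch size are illustrative assumptions, not the actual values used; in practice the limit would come from configuration, and a caller that legitimately needs more IDs can split them into bounded batches instead.

```java
import java.util.ArrayList;
import java.util.List;

public class UserIdGuard {
    // Illustrative cap; the real limit should come from configuration.
    static final int MAX_USER_IDS = 1_000;

    // Reject oversized requests before they ever reach the database.
    static void validate(List<Long> userIds) {
        if (userIds.size() > MAX_USER_IDS) {
            throw new IllegalArgumentException(
                "userId list of " + userIds.size() + " exceeds limit " + MAX_USER_IDS);
        }
    }

    // Alternative: split a large list into bounded batches and query per batch.
    static List<List<Long>> partition(List<Long> userIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < userIds.size(); i += batchSize) {
            batches.add(userIds.subList(i, Math.min(i + batchSize, userIds.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) ids.add(i);
        validate(ids); // within the limit, so no exception
        System.out.println("batches: " + partition(ids, 4).size()); // prints "batches: 3"
    }
}
```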

Wait, There’s Another One

A similar incident occurred later: multiple alerts and machine crashes due to memory exhaustion. Heap dumps (up to 12 GB) were analyzed with MAT, revealing huge String objects. The root cause was a full‑table query without a WHERE clause on TiDB, which loaded the entire user table into memory.

Conclusion

When facing OOM issues without obvious code defects, the following JVM options are valuable, especially in containerized environments:

-XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=/logs/oom_dump/xxx.log -XX:HeapDumpPath=/logs/oom_dump/xxx.hprof

Additionally, -XX:+ExitOnOutOfMemoryError forces the JVM to terminate promptly on the first OutOfMemoryError, allowing Kubernetes to restart a fresh instance rather than leave a degraded process serving traffic.

For SQL statements lacking a WHERE clause, enforce a sensible LIMIT to prevent full‑table scans from overwhelming the system.
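One way to apply that rule defensively is to append a LIMIT to any statement that carries neither a WHERE nor a LIMIT clause. The sketch below uses a crude substring check purely for illustration (it is an assumption, not the method used in the incident); production code should inspect a parsed statement, e.g. via a SQL parser or an interceptor in the data-access layer.

```java
public class SqlGuard {
    // Illustrative default cap on unfiltered queries.
    static final int DEFAULT_LIMIT = 10_000;

    // Append a defensive LIMIT to queries that have neither WHERE nor LIMIT.
    // Naive string matching for demonstration only; it would miss subqueries,
    // comments, and lowercase-with-no-spaces edge cases a real parser handles.
    static String enforceLimit(String sql) {
        String upper = sql.toUpperCase();
        if (!upper.contains(" WHERE ") && !upper.contains(" LIMIT ")) {
            return sql + " LIMIT " + DEFAULT_LIMIT;
        }
        return sql;
    }

    public static void main(String[] args) {
        System.out.println(enforceLimit("SELECT id, name FROM users"));
        // prints "SELECT id, name FROM users LIMIT 10000"
    }
}
```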

Tags: backend, Java, JVM, monitoring, performance, database, OutOfMemoryError
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
