Diagnosing and Resolving OutOfMemoryError Caused by Zipkin Reporter in Spring Cloud Applications
This article details a step‑by‑step investigation of a Java OutOfMemoryError caused by the Zipkin reporter in a Spring Cloud application, covering symptom identification, resource monitoring, heap dump analysis, code inspection, and the final fix of upgrading the zipkin‑reporter dependency.
The author encountered repeated application‑exception alerts indicating service failures, 100% CPU, and a Java heap OOM (GC overhead limit exceeded) in a Spring Cloud project that uses spring-cloud-sleuth-zipkin, which transitively pulls in zipkin‑reporter version 2.7.3.
Problem Symptoms
The alert shows "connection refused" and the service becomes unresponsive. The logs reveal java.lang.OutOfMemoryError: GC overhead limit exceeded.
Environment
Spring Cloud (F version) with dependency:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
Version: 2.0.0.RELEASE
Investigation Steps
1. Check service status and health‑check URL
Run ps -ef | grep <service-name> to confirm the process is alive, and verify the health‑check endpoint (e.g., curl http://192.168.1.110:20606/serviceCheck).
2. Review service logs
The logs confirm the service died with an OutOfMemoryError.
3. Examine server resource usage
Use top -H -p <pid> to list the busiest threads; several threads together consume more than 300% CPU. Save the output with top -b -n 2 > /tmp/top.txt.
4. Collect diagnostic data
CPU snapshot: top -b -n 2 > /tmp/top.txt
Thread list: ps -mp <pid> -o THREAD,tid,time | sort -k2r > /tmp/<pid>_threads.txt
Open connections: lsof -p <pid> > /tmp/<pid>_lsof.txt
Thread dump: jstack -l <pid> > /tmp/<pid>_jstack.txt
Heap summary: jmap -heap <pid> > /tmp/<pid>_jmap_heap.txt
Heap histogram: jmap -histo <pid> | head -n 100 > /tmp/<pid>_jmap_histo.txt
GC stats: jstat -gcutil <pid> > /tmp/<pid>_jstat_gc.txt
Heap dump: jmap -dump:live,format=b,file=/tmp/<pid>_jmap_dump.hprof <pid>
Analysis
CPU‑intensive threads
Thread IDs 0x2cb4‑0x2cb7 correspond to GC task threads, indicating excessive GC activity.
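The hex IDs come from jstack, which prints each thread's native ID in the nid field in hexadecimal, while top -H reports the same ID in decimal; converting between the two is how a CPU‑hungry row in top is matched to a stack trace. A minimal sketch (11444 is assumed here purely because it is the decimal form of 0x2cb4):

```java
public class TidToNid {
    public static void main(String[] args) {
        long tid = 11444L; // decimal thread id as reported by top -H
        // jstack prints this value as the "nid" field, in hex
        String nid = "0x" + Long.toHexString(tid);
        System.out.println(nid); // prints "0x2cb4"
    }
}
```

Searching the thread dump for that nid value locates the offending thread's stack, which in this case pointed at the parallel GC task threads.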
GC statistics
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 0.00 100.00 99.94 90.56 87.86 875 9.307 3223 5313.139 5322.446
Eden (E) is 100% full and the old generation (O) is 99.94% full; 3223 Full GCs (FGC) totalling 5313 seconds of the 5322‑second total GC time confirm the JVM is spending nearly all its time collecting a heap it cannot reclaim.
Heap dump inspection
Opening the dump in Eclipse MAT shows the largest retained object is zipkin2.reporter.InMemoryReporterMetrics, retaining ≈926 MB: the root cause of the OOM.
Root cause
The older zipkin-reporter-2.7.3.jar tracks send failures in InMemoryReporterMetrics using a ConcurrentHashMap<Throwable, AtomicLong>. Throwable does not override equals/hashCode, so every failed send to Zipkin inserts a new exception instance as a fresh key, and the map grows without bound. Newer versions key the map by Class<? extends Throwable> instead, which bounds its size to the number of distinct exception types and eliminates the leak.
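The difference between the two key types can be demonstrated in isolation. This is an illustrative sketch mirroring the map shapes described above, not the reporter's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ReporterMetricsLeakSketch {
    public static void main(String[] args) {
        // Old scheme: keyed by Throwable *instance*. Throwable uses identity
        // hashing, so every new exception object becomes a distinct key.
        ConcurrentHashMap<Throwable, AtomicLong> leaky = new ConcurrentHashMap<>();

        // Fixed scheme: keyed by the exception *class*, so repeated failures
        // of the same type collapse into one entry.
        ConcurrentHashMap<Class<? extends Throwable>, AtomicLong> bounded =
                new ConcurrentHashMap<>();

        for (int i = 0; i < 10_000; i++) {
            // Simulates 10,000 failed sends raising the same kind of error
            Throwable t = new RuntimeException("send to zipkin failed");
            leaky.computeIfAbsent(t, k -> new AtomicLong()).incrementAndGet();
            bounded.computeIfAbsent(t.getClass(), k -> new AtomicLong()).incrementAndGet();
        }

        System.out.println("leaky entries:   " + leaky.size());   // 10000
        System.out.println("bounded entries: " + bounded.size()); // 1
    }
}
```

With a Zipkin collector unreachable for hours, the instance‑keyed map accumulates one entry per failed send, which is exactly the ≈926 MB of retained exceptions MAT found in the heap dump.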
Solution
Upgrade so that zipkin-reporter 2.8.4 (or later) ends up on the classpath, then rebuild the service. Because spring-cloud-sleuth-zipkin pulls the reporter in transitively, the fix here overrides the brave dependency, which brings in a fixed reporter (running mvn dependency:tree afterwards confirms which zipkin-reporter version is resolved):
<dependency>
<groupId>io.zipkin.brave</groupId>
<artifactId>brave</artifactId>
<version>5.6.4</version>
</dependency>
Optionally, add JVM flags so a heap dump is generated automatically on the next OOM:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=path/filename.hprof
References
Original article: "记一次sleuth发送zipkin异常引起的OOM" (https://www.jianshu.com/p/f8c74943ccd8)
Zipkin reporter GitHub issue: https://github.com/openzipkin/zipkin-reporter-java/issues/139
Patch fixing the leak: https://github.com/openzipkin/zipkin-reporter-java/pull/119/files