
Diagnosing and Resolving OutOfMemoryError Caused by Zipkin Reporter in Spring Cloud Applications

This article details a step‑by‑step investigation of a Java OutOfMemoryError caused by the Zipkin reporter in a Spring Cloud application, covering symptom identification, resource monitoring, heap dump analysis, code inspection, and the final fix of upgrading the zipkin‑reporter dependency.


In a Spring Cloud project using spring-cloud-sleuth-zipkin (which transitively pulls in zipkin-reporter 2.7.3), the author received repeated application-exception alerts: service failures, CPU pinned at 100%, and a Java heap OutOfMemoryError (GC overhead limit exceeded).

Problem Symptoms

The alert reports "connection refused" and the service becomes unresponsive. The logs reveal java.lang.OutOfMemoryError: GC overhead limit exceeded.
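As a quick triage step, the fatal line can be located in the log with grep. A small sketch (the helper name and the log path in the usage example are hypothetical; substitute your own):

```shell
# Triage helper: print the last few OutOfMemoryError lines (with line
# numbers) from a service log. The log path varies per deployment, so it
# is taken as an argument.
find_oom() {
  grep -n "java.lang.OutOfMemoryError" "$1" | tail -n 5
}
```

Usage: `find_oom /var/log/service/app.log`.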

Environment

Spring Cloud (Finchley release train) with dependency:

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
spring-cloud-sleuth-zipkin version: 2.0.0.RELEASE (transitively pulling in zipkin-reporter 2.7.3)

Investigation Steps

1. Check service status and health‑check URL

Run ps -ef | grep <service-name> to confirm the process is alive, then verify the health-check endpoint (e.g., curl http://192.168.1.110:20606/serviceCheck).

2. Review service logs

The logs confirm the service went down with an OutOfMemoryError.

3. Examine server resource usage

Use top -H -p <pid> to inspect per-thread CPU usage; the Java process was consuming over 300% CPU, concentrated in a handful of threads. Save the output with top -b -n 2 > /tmp/top.txt.

4. Collect diagnostic data

CPU snapshot: top -b -n 2 > /tmp/top.txt

Thread list: ps -mp <pid> -o THREAD,tid,time | sort -k2r > /tmp/<pid>_threads.txt

Open connections: lsof -p <pid> > /tmp/<pid>_lsof.txt

Thread dump: jstack -l <pid> > /tmp/<pid>_jstack.txt

Heap summary: jmap -heap <pid> > /tmp/<pid>_jmap_heap.txt

Heap histogram: jmap -histo <pid> | head -n 100 > /tmp/<pid>_jmap_histo.txt

GC stats: jstat -gcutil <pid> > /tmp/<pid>_jstat_gc.txt

Heap dump: jmap -dump:live,format=b,file=/tmp/<pid>_jmap_dump.hprof <pid>
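The individual commands above can be bundled into one collection step. A sketch, parameterized on the target PID, that skips any JDK tool not installed on the host (the function name and /tmp output naming are this sketch's own conventions):

```shell
# Sketch: collect CPU, thread, and heap diagnostics for a Java process
# into /tmp, skipping JDK tools (jstack/jmap/jstat) missing from PATH.
collect_diag() {
  pid=$1
  top -b -n 2 > "/tmp/${pid}_top.txt"
  ps -mp "$pid" -o THREAD,tid,time | sort -k2r > "/tmp/${pid}_threads.txt"
  command -v lsof   >/dev/null && lsof -p "$pid" > "/tmp/${pid}_lsof.txt"
  command -v jstack >/dev/null && jstack -l "$pid" > "/tmp/${pid}_jstack.txt"
  command -v jmap   >/dev/null && jmap -histo "$pid" | head -n 100 > "/tmp/${pid}_jmap_histo.txt"
  command -v jstat  >/dev/null && jstat -gcutil "$pid" > "/tmp/${pid}_jstat_gc.txt"
  echo "diagnostics written to /tmp/${pid}_*"
}
```

Run it once while the process is still spinning; a second pass a few minutes later makes CPU and GC trends visible.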

Analysis

CPU‑intensive threads

Thread IDs 0x2cb4‑0x2cb7 correspond to GC task threads, indicating excessive GC activity.
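Note that top reports thread IDs in decimal while jstack prints them as hexadecimal nid values, so matching the two requires a base conversion; for example, nid 0x2cb4 above corresponds to thread 11444 in top:

```shell
# Convert between top's decimal thread IDs and jstack's hex nid values.
printf '0x%x\n' 11444   # decimal TID from top -H  -> prints 0x2cb4
printf '%d\n' 0x2cb4    # hex nid from jstack      -> prints 11444
```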

GC statistics

S0     S1     E       O      M      CCS    YGC   YGCT    FGC    FGCT      GCT
0.00   0.00   100.00  99.94  90.56  87.86  875   9.307   3223   5313.139  5322.446

The old generation (O) is 99.94% full, and 3223 Full GCs account for 5313 of the 5322 seconds of total GC time: the JVM spends nearly all of its time in Full GC while reclaiming almost nothing, which is exactly what "GC overhead limit exceeded" signals.
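The dominance of Full GC can be quantified directly from the jstat -gcutil data row, since FGCT divided by GCT gives the share of GC time spent in Full GC. A one-liner over the row captured above:

```shell
# Compute the Full-GC share of total GC time from a jstat -gcutil data row.
# Columns: S0 S1 E O M CCS YGC YGCT FGC FGCT GCT  (FGC=$9, FGCT=$10, GCT=$11)
echo "0.00 0.00 100.00 99.94 90.56 87.86 875 9.307 3223 5313.139 5322.446" \
  | awk '{printf "Full GC: %d runs, %.1f%% of GC time\n", $9, $10/$11*100}'
# prints: Full GC: 3223 runs, 99.8% of GC time
```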

Heap dump inspection

Opening the heap dump in Eclipse MAT shows that the largest retained object is a zipkin2.reporter.InMemoryReporterMetrics instance holding ≈926 MB of heap: the root cause of the OOM.

Root cause

The InMemoryReporterMetrics in zipkin-reporter 2.7.3 counts send errors in a ConcurrentHashMap<Throwable, AtomicLong>. Every failed send creates a new Throwable instance, so each failure adds a new map entry (together with its captured stack trace) that is never removed, and the map grows without bound. Newer versions key the map by exception class instead, ConcurrentHashMap<Class<? extends Throwable>, AtomicLong>, so repeated failures of the same type share a single entry and the leak disappears.

Solution

Upgrade so that the service resolves zipkin-reporter 2.8.4 (or later) and rebuild the service. Here this is done by pinning brave, whose newer releases depend on a fixed reporter:

<dependency>
  <groupId>io.zipkin.brave</groupId>
  <artifactId>brave</artifactId>
  <version>5.6.4</version>
</dependency>
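After changing the POM, it is worth confirming which zipkin-reporter version the build actually resolves, since the fix arrives transitively. A guarded sketch using mvn dependency:tree (the helper name is this sketch's own; run from the project root):

```shell
# Show which zipkin-reporter version the build resolves to.
# Requires Maven on the PATH; prints a note instead if it is missing.
check_reporter_version() {
  if command -v mvn >/dev/null; then
    mvn dependency:tree -Dincludes=io.zipkin.reporter2:zipkin-reporter
  else
    echo "mvn not found; skipping dependency check"
  fi
}
```

The filtered tree should list io.zipkin.reporter2:zipkin-reporter at 2.8.4 or later once the upgrade has taken effect.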

Optionally add JVM flags to generate a heap dump on OOM:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=path/filename.hprof

References

Original article: "记一次sleuth发送zipkin异常引起的OOM" ("Notes on an OOM caused by a Sleuth-to-Zipkin send failure"), https://www.jianshu.com/p/f8c74943ccd8

Zipkin reporter GitHub issue: https://github.com/openzipkin/zipkin-reporter-java/issues/139

Patch fixing the leak: https://github.com/openzipkin/zipkin-reporter-java/pull/119/files

Tags: backend, debugging, Java, Performance, Spring Cloud, Zipkin, OutOfMemoryError
Written by Full-Stack Internet Architecture: introducing full-stack Internet architecture technologies centered on Java.