
JVM GC Tuning for Java Service Migration to a Private Cloud: A Multi‑Round Optimization Case Study

During the migration of a Java service holding billions of records from physical servers to a private-cloud Docker environment, a series of JVM GC tuning steps significantly reduced stop-the-world pauses and restored service performance. The steps were: letting the JVM size the young generation, enlarging the young generation, reducing the number of parallel GC threads, and cleaning up phantom references.

58 Tech

Background: A Java service storing billions of user records in MySQL (sharded across more than five instances) was migrated from a 32-core/128 GB physical server to an 8-core/8 GB private-cloud Docker host. After the migration, the team observed cache timeouts, higher response latency, and occasional latency spikes. JVM GC stop-the-world pauses were the suspected cause.

Initial Environment: The service ran on JDK 1.6 with the following JVM options (ParNew + CMS):

-server
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+DisableExplicitGC
-Xms10g
-Xmx10g
-Xmn4g
-Xss1024K
-XX:PermSize=256m
-XX:MaxPermSize=512m
-XX:SurvivorRatio=10
-XX:+ParallelRefProcEnabled
-XX:+CMSParallelRemarkEnabled
-XX:+UseCMSCompactAtFullCollection
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:CMSMaxAbortablePrecleanTime=30000
-XX:SoftRefLRUPolicyMSPerMB=0

After the migration, the JVM memory was reduced to 7 GB.
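Every later round assumes the container really runs ParNew + CMS with the reduced heap, so it can be worth confirming both from inside the JVM. The following minimal sketch uses only the standard java.lang.management API. The class name GcConfigCheck is hypothetical. Under ParNew + CMS, HotSpot registers collectors named "ParNew" and "ConcurrentMarkSweep". Runtime.maxMemory() reports a value close to -Xmx, and often slightly below it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the active collectors and the maximum heap.
final class GcConfigCheck {
    private GcConfigCheck() {}

    // Collector names as registered by the JVM, e.g. "ParNew" and "ConcurrentMarkSweep".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // True when both the ParNew (young) and CMS (old) collectors are present.
    static boolean isParNewPlusCms(List<String> names) {
        return names.contains("ParNew") && names.contains("ConcurrentMarkSweep");
    }

    // Maximum heap in MB; close to -Xmx, often slightly below it.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("ParNew + CMS: " + isParNewPlusCms(names));
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```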

Problem Identification: GC logs showed frequent young-generation pauses of 10-13 ms roughly every 4 s. They also showed occasional old-generation pauses longer than 100 ms, which coincided with the cache timeouts.
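Pause lengths and intervals like these can be extracted from the GC log mechanically rather than by eye. This minimal sketch assumes JDK 6 style -XX:+PrintGCDetails output with -XX:+PrintGCTimeStamps enabled, and the class name is hypothetical. It keeps only ParNew entries and computes the mean interval between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts ParNew pauses from JDK 6 style GC log lines.
final class ParNewLogScanner {
    private ParNewLogScanner() {}

    // JVM uptime printed just before "[GC", e.g. "4.000: [GC" (needs -XX:+PrintGCTimeStamps).
    private static final Pattern GC_START = Pattern.compile("([0-9]+\\.[0-9]+): \\[GC");
    // "[ParNew: 139776K->17472K(157248K), 0.0112340 secs]": group 1 = capacity KB, group 2 = seconds.
    private static final Pattern PARNEW =
            Pattern.compile("\\[ParNew: \\d+K->\\d+K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    // One young collection parsed from the log.
    static final class YoungGc {
        final double uptimeSec;
        final double pauseMs;
        final long youngCapacityKb;

        YoungGc(double uptimeSec, double pauseMs, long youngCapacityKb) {
            this.uptimeSec = uptimeSec;
            this.pauseMs = pauseMs;
            this.youngCapacityKb = youngCapacityKb;
        }
    }

    // Keeps only lines with a ParNew section; CMS phases and other events are skipped.
    static List<YoungGc> parse(List<String> lines) {
        List<YoungGc> result = new ArrayList<YoungGc>();
        for (String line : lines) {
            Matcher start = GC_START.matcher(line);
            Matcher young = PARNEW.matcher(line);
            if (start.find() && young.find()) {
                double pauseMs = Double.parseDouble(young.group(2)) * 1000.0;
                result.add(new YoungGc(Double.parseDouble(start.group(1)), pauseMs,
                        Long.parseLong(young.group(1))));
            }
        }
        return result;
    }

    // Mean seconds between consecutive young GCs; NaN with fewer than two samples.
    static double meanIntervalSec(List<YoungGc> gcs) {
        if (gcs.size() < 2) {
            return Double.NaN;
        }
        double span = gcs.get(gcs.size() - 1).uptimeSec - gcs.get(0).uptimeSec;
        return span / (gcs.size() - 1);
    }
}
```

Feeding it the lines of a GC log (for example, read with a BufferedReader) yields per-collection pause times. meanIntervalSec then gives the average spacing between young collections, which corresponds to the "every ~4 s" figure.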

Round 1 – Adaptive Young Generation: Enabled GC logging and removed the explicit -Xmn setting so that the JVM would choose the young-generation size itself. The new logs showed a JVM-chosen young generation of about 156 MB. Pauses stayed at 10-13 ms.
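The young-generation size the JVM picks on its own can also be read at runtime instead of from the GC log. This minimal sketch assumes heap pool names contain "Eden" or "Survivor". Under ParNew, the pools are "Par Eden Space" and "Par Survivor Space". The sketch sums the committed size of those pools, so the result may differ slightly from the capacity printed in the log. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: committed size of the young-generation pools.
final class YoungGenSize {
    private YoungGenSize() {}

    // Eden and survivor pools; under ParNew these are "Par Eden Space" and "Par Survivor Space".
    static boolean isYoungPool(String poolName) {
        return poolName.contains("Eden") || poolName.contains("Survivor");
    }

    // Sum of committed bytes across young pools, in MB.
    static long committedYoungMb() {
        long bytes = 0L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isYoungPool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                bytes += usage.getCommitted();
            }
        }
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Committed young generation (MB): " + committedYoungMb());
    }
}
```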

Round 2 – Increase Young Generation Size: Set -Xmn2g so that young collections would happen less often. Young-generation pauses grew to 12-17 ms, and old-generation pauses measured 8 ms and 698 ms, so the improvement was limited.
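The effect of -Xmn on the young-GC interval is simple arithmetic. Assume -XX:SurvivorRatio=10 from the initial option list was kept. Then eden is ten times the size of each survivor space, and a young generation of size Y splits into an eden of 10Y/12 plus two survivors of Y/12 each. For -Xmn2g, that is roughly 1707 MB of eden and 171 MB per survivor. HotSpot aligns the real sizes, so they differ slightly.

At a steady allocation rate, the interval between young GCs grows in proportion to eden. A ParNew pause, however, depends mainly on how much data survives and must be copied, plus the scanning of old-to-young references. That is consistent with Round 2 spacing collections further apart without shortening them. The class name and the allocation rate in main are hypothetical.

```java
// Hypothetical helper: young-generation split and GC-interval arithmetic.
final class YoungGenMath {
    private YoungGenMath() {}

    // -XX:SurvivorRatio=R means eden = R * survivor, and young = eden + 2 * survivor.
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    // At a steady allocation rate, a young GC happens roughly each time eden fills.
    static double youngGcIntervalSec(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    public static void main(String[] args) {
        double young = 2048.0; // -Xmn2g
        int ratio = 10;        // -XX:SurvivorRatio=10 from the initial option list
        double eden = edenMb(young, ratio);
        System.out.printf("eden=%.1f MB, each survivor=%.1f MB%n", eden, survivorMb(young, ratio));
        // 200 MB/s is a hypothetical allocation rate, not a measured one.
        System.out.printf("interval at 200 MB/s: %.1f s%n", youngGcIntervalSec(eden, 200.0));
    }
}
```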

Round 3 – Reduce Parallel GC Threads: Set -XX:ParallelGCThreads=8 to match the 8 CPU cores available to the container. Without this setting, JDK 1.6 sizes its GC thread pool from the host's CPU count. Young-generation pauses dropped to 8-10 ms, but the old-generation remark pause increased to about 729 ms.
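An explicit thread count matters because JDK 1.6 is not container-aware. Runtime.availableProcessors() and HotSpot's own sizing see the host's CPUs rather than the container's limit. Container support arrived in JDK 10 and was backported to 8u191.

The sketch below reproduces HotSpot's default ParallelGCThreads heuristic on x86, as we understand it: one thread per CPU up to 8, then 5/8 of each further CPU, rounded down. The host sizes in main are hypothetical, since the article does not state the physical core count behind the container. On a real JVM, confirm the effective value, for example with -XX:+PrintFlagsFinal where the JDK update supports it.

```java
// Hypothetical helper reproducing HotSpot's default ParallelGCThreads heuristic on x86.
final class GcThreadHeuristic {
    private GcThreadHeuristic() {}

    // One thread per CPU up to 8 CPUs, then 5/8 of each CPU beyond 8 (rounded down).
    static int defaultParallelGcThreads(int cpus) {
        if (cpus <= 8) {
            return cpus;
        }
        return 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        // On JDK 6 inside Docker this typically reports the host's CPUs, not the container's limit.
        int visible = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs visible to the JVM: " + visible);
        System.out.println("Default ParallelGCThreads: " + defaultParallelGcThreads(visible));
        // Hypothetical host sizes, for comparison with an explicit -XX:ParallelGCThreads=8.
        for (int cpus : new int[] {8, 32, 48}) {
            System.out.println(cpus + " CPUs -> " + defaultParallelGcThreads(cpus) + " threads");
        }
    }
}
```

On a hypothetical 32-CPU host, for example, the default would be 23 GC threads competing for a container limited to 8 cores.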

Round 4 – Full Heap Dump & Analysis: Analyzing a full heap dump with Eclipse MAT showed that the MySQL driver class com.mysql.jdbc.NonRegisteringDriver held a large hash map of PhantomReference objects with a retained size of about 812 MB. Processing these references ("weak refs processing" in the GC log) made the remark phase long.
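Why a map of phantom references can pin so much memory is easier to see in a toy model. The sketch below is a hypothetical, simplified registry, not Connector/J's actual code. Each tracked object gets a PhantomReference stored in a static map. An entry leaves the map only when a cleanup routine polls the ReferenceQueue and removes it.

Two facts make this pattern expensive on JDK 1.6. First, the map keeps every PhantomReference strongly reachable. Second, before JDK 9 the collector does not clear a phantom referent until its PhantomReference is cleared or itself becomes unreachable, so the referents (here, connection objects) stay in the heap. Each such reference must also be discovered and processed by the collector, which is consistent with the long reference-processing step observed during remark.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a driver that tracks each connection with a
// PhantomReference held in a static map. It is NOT Connector/J's actual code.
final class PhantomTrackingRegistry {
    private PhantomTrackingRegistry() {}

    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<Object>();

    // The map holds every PhantomReference strongly; entries leave only via remove().
    static final Map<Reference<?>, Boolean> TRACKED =
            new ConcurrentHashMap<Reference<?>, Boolean>();

    // Registers a connection-like object; the map entry outlives the caller's reference.
    static void track(Object connection) {
        TRACKED.put(new PhantomReference<Object>(connection, QUEUE), Boolean.TRUE);
    }

    // Removes entries the GC has already enqueued; returns how many were removed.
    static int drainEnqueued() {
        int removed = 0;
        Reference<?> ref;
        while ((ref = QUEUE.poll()) != null) {
            if (TRACKED.remove(ref) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```

If nothing ever calls drainEnqueued, or connections are created faster than it runs, TRACKED only grows. That is the shape of the 812 MB map found in the heap dump.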

Round 5 – Clean Connection Reference Map: Added a background thread that periodically clears the driver's connection reference map. In the subsequent GC logs, young-generation pauses were about 8 ms at roughly 8 s intervals, and old-generation pauses fell to 11-131 ms. Performance gains remained noticeable over several days of operation.
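The article does not show the cleanup thread itself. The following is a minimal sketch of one way to write it, in Java 6 syntax to match the service's JDK: a single daemon thread clears a static Map field through reflection at a fixed delay. The field name is an assumption. In Connector/J 5.1.x the map is believed to be the static field connectionPhantomRefs on com.mysql.jdbc.NonRegisteringDriver, but verify this against the driver version in use. StaticMapCleaner and its DemoDriver stand-in are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (Java 6 syntax): periodically clears a static Map field via reflection.
final class StaticMapCleaner {
    private StaticMapCleaner() {}

    // Clears owner.fieldName if it is a static Map. Returns the approximate number of
    // entries removed, or -1 if the field is missing, not static, or not a Map.
    static int clear(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            if (!Modifier.isStatic(field.getModifiers())) {
                return -1;
            }
            field.setAccessible(true);
            Object value = field.get(null);
            if (!(value instanceof Map)) {
                return -1;
            }
            Map<?, ?> map = (Map<?, ?>) value;
            int size = map.size(); // other threads may add entries concurrently
            map.clear();
            return size;
        } catch (NoSuchFieldException e) {
            return -1;
        } catch (IllegalAccessException e) {
            return -1;
        } catch (SecurityException e) {
            return -1;
        }
    }

    // Resolves the class by name so the driver is not a compile-time dependency.
    static int clear(String className, String fieldName) {
        try {
            return clear(Class.forName(className), fieldName);
        } catch (ClassNotFoundException e) {
            return -1;
        }
    }

    // Starts a single daemon thread that clears the map every periodSeconds.
    static ScheduledExecutorService start(final String className, final String fieldName,
                                          long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(
                new ThreadFactory() {
                    @Override
                    public Thread newThread(Runnable task) {
                        Thread thread = new Thread(task, "phantom-ref-map-cleaner");
                        thread.setDaemon(true);
                        return thread;
                    }
                });
        executor.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                clear(className, fieldName);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }

    // Hypothetical stand-in for the driver class, used only to exercise clear().
    static final class DemoDriver {
        static final Map<Object, Object> connectionPhantomRefs =
                new ConcurrentHashMap<Object, Object>();
    }
}
```

In the service, this might be started once at startup, for example StaticMapCleaner.start("com.mysql.jdbc.NonRegisteringDriver", "connectionPhantomRefs", 60). The 60-second period is an arbitrary example. Clearing the map means the driver no longer runs its own cleanup for those connections. That trade-off is acceptable here only because the DBCP pool already manages connection lifecycles.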

Additional Notes: The service uses Tomcat DBCP for connection pooling, so the MySQL driver's own connection cleanup is unnecessary. The pool is configured with <initialSize>16</initialSize>, <maxActive>16</maxActive>, etc. Upgrading to JDK 1.8 was considered but judged too risky because of compatibility concerns.

Conclusion: The team iteratively tuned GC parameters, aligned GC thread counts with the CPUs actually available to the container, and fixed a memory-leak-like issue in the MySQL driver. Together, these steps dramatically reduced GC-induced pauses. Cache timeouts disappeared, and overall response times returned to or exceeded pre-migration levels.
