JVM FullGC Optimization: Reducing FullGC Frequency from 40 Times a Day to Once Every 10 Days
This article documents a month-long JVM garbage-collection tuning effort that reduced FullGC occurrences from over 40 per day to roughly one every ten days. The work involved adjusting heap parameters, fixing memory leaks, and tuning Metaspace, and the article explains the analysis behind each step.
After more than a month of effort, the FullGC frequency was reduced from about 40 times per day to roughly once every ten days, and YoungGC time was cut by more than half; the article records the intermediate tuning steps.
Initially, the production servers (2 CPU, 4 GB RAM, four‑node cluster) suffered frequent FullGCs, causing automatic restarts. The JVM start‑up parameters were:
-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-Xmx1800M: sets the maximum heap size to 1800 MB.
-Xms1000M: sets the initial heap size to 1000 MB. Setting it equal to -Xmx (which this configuration does not do) keeps the JVM from resizing the heap after each GC.
-Xmn350M: sets the young generation size to 350 MB (a common recommendation is about 3/8 of the total heap).
-Xss300K: sets the per-thread stack size; smaller values allow more threads within the same physical memory.
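Before tuning, it helps to confirm which flags a running JVM actually received and what heap sizes resulted. The sketch below uses only the standard java.lang.management API; the class name HeapSettingsCheck is hypothetical. The reported init and max values may not match -Xms and -Xmx byte for byte, since the JVM aligns sizes and may exclude some space from the reported maximum.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper: prints the JVM arguments the process was started with
// and the resulting heap sizes, to check that flags such as -Xms/-Xmx took effect.
final class HeapSettingsCheck {

    /** JVM command-line arguments (e.g. -Xmx1800M) as reported by the runtime MXBean. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Heap usage snapshot; init roughly reflects -Xms, max roughly reflects -Xmx (-1 if undefined). */
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Converts bytes to whole megabytes, passing through -1 ("undefined") unchanged. */
    static long toMb(long bytes) {
        return bytes < 0 ? bytes : bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        for (String arg : jvmArguments()) {
            System.out.println("JVM arg: " + arg);
        }
        MemoryUsage heap = heapUsage();
        System.out.println("heap init MB:      " + toMb(heap.getInit()));
        System.out.println("heap committed MB: " + toMb(heap.getCommitted()));
        System.out.println("heap max MB:       " + toMb(heap.getMax()));
    }
}
```

Running it with the original flags should show a committed heap that starts near 1000 MB and can grow toward 1800 MB, which is exactly the resizing that matching -Xms to -Xmx avoids.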
Problem
The servers experienced very frequent FullGCs (average >40 per day) and occasional automatic restarts, indicating severe instability and prompting a tuning effort.
First Optimization
The original configuration had a small young generation, which led to frequent YoungGCs and a long cumulative YoungGC time (830 s). The first tuning enlarged the young generation and set the initial heap equal to the maximum heap:
-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M
SurvivorRatio was raised to 8 so that Eden takes a larger share of the young generation, with the aim of having more garbage collected there before it is promoted. After the new settings had run on two production servers (prod1 and prod2) for five days, the YoungGC count had dropped by more than half and total YoungGC time by 400 s, but the average FullGC count had risen by 41, so for FullGC this change was a regression.
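The effect of -Xmn and -XX:SurvivorRatio can be worked out by hand: SurvivorRatio is the ratio of Eden to one survivor space, and the young generation holds Eden plus two survivor spaces. The sketch below does that arithmetic for the before and after settings. The allocation rate used to estimate the interval between YoungGCs is a hypothetical 50 MB/s chosen only for illustration; the article does not report one.

```java
// Back-of-the-envelope HotSpot generation sizing for given -Xmx, -Xmn and -XX:SurvivorRatio.
// Young gen = Eden + 2 survivors, and SurvivorRatio = Eden / one survivor.
final class GenerationSizing {

    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb * survivorRatio / (survivorRatio + 2);
    }

    /** Maximum old generation size once the heap is fully expanded: -Xmx minus -Xmn. */
    static double oldGenMaxMb(double maxHeapMb, double youngMb) {
        return maxHeapMb - youngMb;
    }

    /** Rough seconds between YoungGCs: Eden fills at the (assumed) allocation rate. */
    static double secondsBetweenYoungGcs(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    static void report(String label, double xmxMb, double xmnMb, int ratio, double allocMbPerSec) {
        double eden = edenMb(xmnMb, ratio);
        System.out.printf("%s: eden=%.1f MB, each survivor=%.1f MB, old max=%.0f MB, ~%.1f s between YoungGCs%n",
                label, eden, survivorMb(xmnMb, ratio), oldGenMaxMb(xmxMb, xmnMb),
                secondsBetweenYoungGcs(eden, allocMbPerSec));
    }

    public static void main(String[] args) {
        double assumedAllocRate = 50.0; // hypothetical MB/s, for illustration only
        report("before (-Xmn350M, SurvivorRatio=4)", 1800, 350, 4, assumedAllocRate);
        report("after  (-Xmn800M, SurvivorRatio=8)", 1800, 800, 8, assumedAllocRate);
    }
}
```

With these inputs, Eden grows from about 233 MB to 640 MB, so YoungGCs become roughly 2.7 times less frequent at any fixed allocation rate. The same arithmetic shows the old generation's maximum shrinking from 1450 MB to 1000 MB, leaving less room for long-lived or leaked objects.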
Second Optimization – Memory Leak Investigation
A code review revealed an anonymous inner-class listener that kept a reference to a generic object T for up to a minute after a timeout. Until that reference was released, T could not be garbage-collected, so thousands of instances accumulated. Fixing the listener partially mitigated the leak, but FullGC frequency remained high.
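The article does not show the offending code, so the sketch below is a hypothetical reconstruction of the general pattern: an anonymous listener captures a payload object, and a registry keeps the listener (and therefore the payload) reachable until something removes it. All names (ResponseListener, Registry, send, timeoutLeaky, timeoutFixed) are invented for illustration. The fix shown, removing the listener from the registry at timeout, is one common remedy, not necessarily the author's exact change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a listener-retention leak and its fix.
final class ListenerLeakSketch {

    interface ResponseListener {
        void onResponse(String response);
        void onTimeout();
    }

    static final class Registry {
        private final Map<Long, ResponseListener> pending = new ConcurrentHashMap<>();

        void register(long requestId, ResponseListener listener) {
            pending.put(requestId, listener);
        }

        // Leaky variant: notifies the listener but leaves it in the map, so the
        // captured payload stays reachable until some later cleanup removes it.
        void timeoutLeaky(long requestId) {
            ResponseListener listener = pending.get(requestId);
            if (listener != null) listener.onTimeout();
        }

        // Fixed variant: removes the listener first, so nothing keeps the payload alive.
        void timeoutFixed(long requestId) {
            ResponseListener listener = pending.remove(requestId);
            if (listener != null) listener.onTimeout();
        }

        int size() {
            return pending.size();
        }
    }

    static <T> void send(Registry registry, long requestId, T payload) {
        // The anonymous inner class captures 'payload', so the registry entry pins it in memory.
        registry.register(requestId, new ResponseListener() {
            @Override public void onResponse(String response) {
                System.out.println("response for " + payload + ": " + response);
            }
            @Override public void onTimeout() {
                // Nothing to do here; the leak or fix depends on whether the registry drops us.
            }
        });
    }

    public static void main(String[] args) {
        Registry leaky = new Registry();
        Registry fixed = new Registry();
        for (long id = 0; id < 1000; id++) {
            send(leaky, id, new byte[1024]);
            send(fixed, id, new byte[1024]);
            leaky.timeoutLeaky(id);
            fixed.timeoutFixed(id);
        }
        System.out.println("leaky registry still holds " + leaky.size() + " listeners");
        System.out.println("fixed registry holds " + fixed.size() + " listeners");
    }
}
```

Running main reports 1000 retained listeners for the leaky registry and 0 for the fixed one. In a heap dump, the leaky case shows up as a large number of payload instances reachable only through the listener map.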
Further heap-dump analysis uncovered a huge number of ByteArrayRow objects (about 40,000), produced by a database query that was missing a crucial filter condition. The unfiltered query returned far more rows than intended, causing a sudden surge in data processing and high inbound network traffic (about 83 MB/s). Adding the missing filter condition brought the object count down and stabilized the servers.
Second Tuning – Metaspace Adjustment
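With the memory leak resolved, attention turned to Metaspace, which the GC logs showed growing to about 200 MB, far above the default MetaspaceSize of about 21 MB (the size at which Metaspace growth first triggers a GC). The following parameters were applied to two servers (prod1, prod2), while the other two (prod3, prod4) kept the original settings: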
-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=70 -> -XX:CMSInitiatingOccupancyFraction=75
prod1 and prod2 differed in young generation size: 800 MB on prod1 and 600 MB on prod2. After ten days of observation, FullGC and YoungGC counts on prod1 and prod2 were significantly lower than on prod3 and prod4, and throughput on prod1 improved noticeably.
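Because -XX:+UseCMSInitiatingOccupancyOnly is set, CMS starts a concurrent cycle when old-generation occupancy reaches CMSInitiatingOccupancyFraction percent of the old generation's capacity. The sketch below estimates that trigger point for the original settings and for prod1 and prod2, assuming the heap is fully expanded (old generation = Xmx minus Xmn). It is arithmetic only; CMS can also start for other reasons, such as a promotion failure.

```java
// Estimates the old-gen occupancy (MB) at which CMS starts a cycle when
// -XX:+UseCMSInitiatingOccupancyOnly is set, assuming old gen = Xmx - Xmn.
final class CmsTrigger {

    static double triggerMb(double xmxMb, double xmnMb, int occupancyFractionPercent) {
        double oldGenMb = xmxMb - xmnMb;
        return oldGenMb * occupancyFractionPercent / 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("original (Xmn350M, 70%%): CMS starts near %.0f MB of old gen%n", triggerMb(1800, 350, 70));
        System.out.printf("prod1    (Xmn800M, 75%%): CMS starts near %.0f MB of old gen%n", triggerMb(1800, 800, 75));
        System.out.printf("prod2    (Xmn600M, 75%%): CMS starts near %.0f MB of old gen%n", triggerMb(1800, 600, 75));
    }
}
```

The trigger points come out at 1015 MB, 750 MB and 900 MB respectively. A larger young generation therefore lowers the absolute occupancy at which CMS kicks in, which is one reason the fraction was raised alongside -Xmn.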
Summary
More than one FullGC per day is abnormal.
When FullGC spikes, prioritize investigating memory leaks.
After fixing leaks, JVM tuning opportunities become limited; further effort may not be worthwhile.
If CPU usage stays high after the code has been checked, contact the cloud provider (here, Alibaba Cloud); in this case a problem on the provider's server caused 100% CPU.
High inbound traffic can stem from large database queries; check the query conditions before assuming an attack.
Regularly monitor GC to detect problems early.
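For regular monitoring, the GC logs enabled by -XX:+PrintGCDetails and external tools such as jstat work well, and the same counters are also available in-process. The sketch below, using only java.lang.management, reads cumulative collection counts and times per collector plus current Metaspace usage; the class name GcMonitor is hypothetical. Collector and pool names depend on the JVM and collector in use. On HotSpot JDK 8 with this article's flags, the collectors are typically reported as "ParNew" and "ConcurrentMarkSweep", and the Metaspace pool as "Metaspace".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Hypothetical in-process GC monitor: cumulative GC counts/times and Metaspace usage.
final class GcMonitor {

    /** Sum of collection counts across all collectors (collectors reporting -1 are skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    /** Metaspace bytes in use, or -1 if the JVM exposes no memory pool named "Metaspace". */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    static void report() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-20s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long meta = metaspaceUsedBytes();
        System.out.println("Metaspace used: " + (meta < 0 ? "n/a" : (meta / (1024 * 1024)) + " MB"));
    }

    public static void main(String[] args) throws InterruptedException {
        // A real service would sample on a scheduler and export to a metrics system;
        // this just takes three samples one second apart.
        for (int i = 0; i < 3; i++) {
            report();
            Thread.sleep(1000);
        }
    }
}
```

Tracking the difference between successive samples, rather than the cumulative totals, gives the per-day FullGC and YoungGC counts that the summary above treats as the health signal.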
Overall, the month‑long tuning reduced GC frequency and time by more than half, increased throughput, and stabilized the production environment.
Java Architect Essentials