Memory Optimization for Core Services: JVM Tuning and Large Object Management
This article details the memory optimization process for a core music service at vivo, addressing JVM memory issues that were causing frequent garbage collection, long GC pauses, and RPC timeouts during peak periods. The service provides metadata and user asset queries for songs and artists.
Initial analysis revealed young GC (YGC) running 12 times per minute (peaking at 24) with a 327 ms average pause, and full GC (FGC) roughly every 10 minutes (peaking at 1) with a 30-second average pause. During problem periods, heap usage spiked abnormally, FGC grew more frequent, and each collection released progressively less memory.
The optimization process involved four key steps:
Step 1: JVM Optimization. The default JVM configuration (Parallel Scavenge + Parallel Old) was replaced with the ParNew + CMS collector pair, a better fit for a service dominated by many short-lived objects with modest throughput requirements. Key parameters were also adjusted: the young generation was enlarged to 1.5x its original size, and CMSScavengeBeforeRemark was enabled to shorten remark pauses. This significantly reduced heap memory usage.
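As an illustration, a ParNew + CMS setup along these lines could be used; the flags are standard HotSpot options, but the heap sizes here are placeholders, not the article's production values:

```
# Illustrative flags only; sizes are placeholders
-Xms4g -Xmx4g
-Xmn1536m                      # young generation enlarged ~1.5x
-XX:+UseParNewGC               # ParNew for the young generation
-XX:+UseConcMarkSweepGC        # CMS for the old generation
-XX:+CMSScavengeBeforeRemark   # young GC before remark to shorten the remark pause
-XX:+PrintGCDetails            # keep GC logs for before/after comparison
```

Note that ParNew + CMS is deprecated in JDK 9+ and removed in JDK 14, so this configuration applies to the JDK 8-era JVMs the article targets.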
Step 2: Fault Transfer Strategy A monitoring-based fault transfer mechanism was implemented. When API services encountered exceptions calling core services, problematic machine IPs were reported to the monitoring platform. Alert rules and callback functions were configured to automatically remove problematic IPs from the provider cluster, preventing further calls to faulty machines.
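The fault-transfer flow above can be sketched as follows. This is a minimal standalone illustration, not vivo's actual platform API: the class names, the `onAlert` callback, and the threshold logic it implies are all hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the fault-transfer idea: API services report faulty
// provider IPs to the monitoring platform, and the platform's alert callback
// removes them from the live provider set so no further calls reach them.
public class FaultTransfer {
    private final Set<String> providers = ConcurrentHashMap.newKeySet();

    public FaultTransfer(Set<String> initialProviders) {
        providers.addAll(initialProviders);
    }

    // Invoked by the monitoring platform's alert callback once a reported
    // IP exceeds the configured exception-rate threshold.
    public void onAlert(String faultyIp) {
        if (providers.remove(faultyIp)) {
            System.out.println("Removed faulty provider: " + faultyIp);
        }
    }

    public Set<String> liveProviders() {
        return providers;
    }
}
```

A thread-safe set stands in for the provider cluster here; in the real system the removal would go through the RPC registry.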
Step 3: Large Object Optimization. Analysis of thread dumps revealed many threads blocked in Dubbo's encoding path, suggesting oversized response objects. Heap dumps confirmed this, showing a 258 MB Netty taskQueue and individual 9 MB responses; the offending interface was returning far more information than callers needed. After the fix, total YGC count dropped by 76.5%, cumulative peak-period YGC time dropped by 75.5%, FGC fell to roughly once every three days, and cumulative peak-period FGC time dropped by 90.1%.
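The shape of the fix can be illustrated as below; the entity and field names are hypothetical, since the article does not show the interface's actual schema:

```java
// Hypothetical illustration of the fix: instead of serializing the full song
// entity (including large, rarely needed fields) into every RPC response, the
// interface returns a slim summary object with only the fields callers use,
// shrinking each response and the objects queued in Netty for encoding.
public class SongQuery {

    /** Full entity as stored internally (fields abbreviated). */
    public static class Song {
        public final long id;
        public final String name;
        public final String artist;
        public final String lyrics; // large field most callers never need

        public Song(long id, String name, String artist, String lyrics) {
            this.id = id;
            this.name = name;
            this.artist = artist;
            this.lyrics = lyrics;
        }
    }

    /** Slim response object exposed over RPC. */
    public static class SongSummary {
        public final long id;
        public final String name;
        public final String artist;

        public SongSummary(long id, String name, String artist) {
            this.id = id;
            this.name = name;
            this.artist = artist;
        }
    }

    public static SongSummary toSummary(Song song) {
        return new SongSummary(song.id, song.name, song.artist);
    }
}
```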
Step 4: Non-invasive Memory Object Monitoring. Inspired by Dubbo's payload checking mechanism, a custom codec was developed to monitor objects exceeding thresholds without impacting performance. The solution leveraged buffer position changes before and after encoding to detect oversized objects. For objects between the alert threshold and the payload limit, direct size judgment and key-information logging were performed. For objects exceeding the payload limit, IDs were cached during retransmission and logged during the original transmission process.
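The buffer-position trick can be demonstrated standalone; this sketch substitutes plain Java serialization for Dubbo's codec, and the class name and 4 MB threshold are illustrative, not the article's values:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of the buffer-position technique: record the buffer position before
// and after encoding, and the delta is the encoded size of the object, so
// oversized payloads can be detected without serializing anything twice.
public class PayloadMonitor {
    static final int ALERT_THRESHOLD = 4 * 1024 * 1024; // 4 MB, illustrative

    // Encodes obj into out and returns the encoded size in bytes, logging
    // a warning when the size crosses the alert threshold.
    public static int encodeAndMeasure(ByteArrayOutputStream out, Serializable obj)
            throws IOException {
        int before = out.size();              // buffer position before encoding
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj);
        }
        int encoded = out.size() - before;    // position delta = encoded size
        if (encoded > ALERT_THRESHOLD) {
            System.err.println("Oversized payload: " + encoded + " bytes");
        }
        return encoded;
    }
}
```

In a real Dubbo codec the same measurement would read the `ChannelBuffer` write index before and after the encode call, which is what makes the check effectively free.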
The article concludes that memory optimization is an ongoing comprehensive process involving log analysis, monitoring tools, stack information, and various optimization strategies including scheduled tasks, code refactoring, and caching.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.