Optimizing Hotel Query Service Memory Usage: GC Tuning, Native Memory Management, and Migration to jemalloc
This article details the systematic reduction of memory consumption for Ctrip's hotel query service by halving container memory, evaluating and tuning modern garbage collectors, diagnosing off‑heap leaks, and ultimately replacing the default ptmalloc allocator with jemalloc to achieve stable performance and lower resource costs.
Background and Goal
In a container‑centric deployment, reducing per‑container memory improves cluster elasticity, recovery time, and scheduling, but aggressive compression can hurt stability and throughput. The hotel query service, a major cost driver with thousands of servers and tens of terabytes of Redis, needed its container memory cut from 32 GB to 16 GB, with the work focused on curbing memory growth and improving management efficiency.
Heap Memory Management
The service migrated from JDK 8 with CMS to newer collectors on JDK 11 and 17, comparing G1, ZGC, and Shenandoah. ZGC achieves near‑zero stop‑the‑world pauses via colored pointers and read barriers, while Shenandoah relies on Brooks pointers and a connection matrix. G1 remains the mature default, balancing pause time and throughput. Test configurations combined the JVM versions with each collector.
G1 Tuning Practices
Raised MaxGCPauseMillis (e.g., from 200 ms to 300 ms) to reduce young‑collection frequency, and lowered InitiatingHeapOccupancyPercent so concurrent marking and the subsequent mixed collections start earlier, improving old‑generation reclamation.
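As a sketch, the G1 settings above might look like the following launch command; the heap size, flag values, and jar name are illustrative assumptions, not the production configuration:

```shell
# Illustrative G1 tuning: allow longer pauses to cut young-GC frequency,
# and start concurrent marking earlier than the 45% default so mixed
# collections reclaim old-generation garbage sooner.
java -Xms12g -Xmx12g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=300 \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -Xlog:gc*:file=gc.log:time,uptime,level,tags \
  -jar query-service.jar
```

Logging via `-Xlog:gc*` makes it possible to confirm the change in young-collection frequency and mixed-collection timing after each adjustment.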
ZGC Tuning Practices
Increased ZAllocationSpikeTolerance and enabled proactive collection (ZProactive) with a shorter ZCollectionInterval to smooth response spikes under load; also monitored allocation and relocation stalls, raising ConcGCThreads and ParallelGCThreads when GC worker resources fell short.
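A corresponding sketch of the ZGC settings on JDK 17; again the numeric values are assumptions for illustration, not the article's production figures:

```shell
# Illustrative ZGC tuning: tolerate larger allocation spikes, run
# proactive cycles on a shorter interval (seconds), and size the
# concurrent/parallel GC worker pools to avoid allocation stalls.
java -Xms12g -Xmx12g \
  -XX:+UseZGC \
  -XX:ZAllocationSpikeTolerance=5 \
  -XX:+ZProactive \
  -XX:ZCollectionInterval=30 \
  -XX:ConcGCThreads=4 \
  -XX:ParallelGCThreads=8 \
  -Xlog:gc*:file=gc.log:time,uptime,level,tags \
  -jar query-service.jar
```

The GC log will report "Allocation Stall" and "Relocation Stall" events when worker threads cannot keep up, which is the signal the text describes for raising ConcGCThreads and ParallelGCThreads.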
Benchmark Results
ZGC achieved sub‑100 µs pauses but consumed excessive CPU (four Z worker threads could saturate cores) and more off‑heap memory, producing roughly a 70 % increase in response latency under peak load. G1, though its pauses are longer than ZGC's, maintained comparable end‑to‑end latency with half the memory and modest CPU overhead.
Native Memory Management
Off‑heap usage grew due to heavy NIO, serialization, and compression, causing RSS to climb until the kernel OOM killer terminated the process. Forcing the allocator to release free pages with gdb --batch --pid 36563 --ex 'call malloc_trim(0)' dropped RSS sharply, showing that glibc's ptmalloc was retaining freed memory rather than returning it to the OS.
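The diagnosis workflow can be sketched as follows, using standard JDK and glibc tooling; the PID is the example from the text, and the jar name is an assumption:

```shell
# 1. Start the JVM with Native Memory Tracking enabled, to rule out
#    JVM-managed native memory (threads, code cache, direct buffers).
java -XX:NativeMemoryTracking=summary -jar query-service.jar &

# 2. Compare NMT's reported total against the actual process RSS.
jcmd 36563 VM.native_memory summary
ps -o rss= -p 36563

# 3. If RSS far exceeds the NMT total, test whether the allocator is
#    hoarding freed memory: malloc_trim(0) asks ptmalloc to return
#    free pages to the kernel. A large RSS drop implicates allocator
#    retention/fragmentation rather than a true application leak.
gdb --batch --pid 36563 --ex 'call (int) malloc_trim(0)'
```

The `(int)` cast tells gdb the return type when the target process lacks debug symbols for glibc.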
ptmalloc Limitations
ptmalloc's arena, bin, and chunk structures add metadata overhead, fragmentation, lock contention, and poor memory reclamation, especially under high concurrency, leading to out‑of‑memory conditions within the constrained 2.5 GB off‑heap budget.
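A common stopgap before replacing the allocator is capping glibc's per-thread arenas, trading some lock contention for lower memory retention. This is a standard glibc tunable offered here as a mitigation sketch; the article's actual fix was switching allocators:

```shell
# Limit ptmalloc to 2 arenas instead of the default (8 * CPU cores
# on 64-bit), shrinking per-arena free-list bloat when a service
# runs hundreds of threads. Must be set before the JVM starts.
export MALLOC_ARENA_MAX=2
java -jar query-service.jar
```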
Switch to jemalloc
Replaced ptmalloc with jemalloc, which offers lower fragmentation (below 20 %), per‑thread caches, arena‑based allocation that sharply reduces lock contention, and proactive purging of dirty pages. Migration required only setting LD_PRELOAD to the jemalloc shared library in the Tomcat start script and rebuilding the container image.
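The migration step might look like the snippet below in Tomcat's bin/setenv.sh; the library path is an assumption and varies by distribution and jemalloc version:

```shell
# Interpose jemalloc over glibc malloc for the Tomcat JVM.
# Path shown is typical for Debian/Ubuntu; adjust for your image.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

# After restart, verify the substitution took effect:
#   lsof -p <tomcat-pid> | grep jemalloc
```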
Benefits of jemalloc
The change saved 1–1.5 GB of off‑heap memory per machine, eliminated RSS spikes, improved stability under traffic bursts, and required minimal operational effort. Gains include better multi‑threaded allocation, reduced interference with GC, and lower cost when scaling the fleet.
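To validate results like these in your own service, jemalloc can be asked to report its internal statistics via the MALLOC_CONF environment variable (standard jemalloc options; profiling additionally requires a build configured with --enable-prof):

```shell
# Dump jemalloc allocator statistics (arenas, fragmentation, dirty
# pages) to stderr when the process exits.
export MALLOC_CONF="stats_print:true"

# With a profiling-enabled jemalloc build (an assumption), sample
# allocations and write a heap profile every 2^30 bytes allocated:
# export MALLOC_CONF="prof:true,lg_prof_interval:30,prof_prefix:jeprof.out"
```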
Conclusion
The case study demonstrates a repeatable optimization loop: hypothesize, benchmark, tune, and validate. While specific numbers depend on the application, the methodology and the jemalloc migration offer useful guidance for backend services facing tight memory constraints.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.