Optimizing Hotel Query Service Memory Usage: GC Tuning, Native Memory Management, and Migration to jemalloc
This article details the systematic reduction of memory consumption for Ctrip's hotel query service by halving container memory, evaluating and tuning modern garbage collectors, diagnosing off‑heap leaks, and ultimately replacing the default ptmalloc allocator with jemalloc to achieve stable performance and lower resource costs.
Background and Goal
In a container‑centric deployment, reducing per‑container memory improves cluster elasticity, recovery time, and scheduling, but aggressive compression can hurt stability and throughput. The hotel query service, a major cost driver with thousands of servers and tens of terabytes of Redis, needed its container memory cut from 32 GB to 16 GB, with the work focused on curbing memory growth and improving management efficiency.
Heap Memory Management
The service migrated from JDK 8 with CMS to newer collectors on JDK 11 and 17, comparing G1, ZGC, and Shenandoah. ZGC achieves near‑zero stop‑the‑world pauses via colored pointers and read barriers, while Shenandoah relies on Brooks pointers and a connection matrix. G1 remains the mature default, balancing pause time and throughput. Test configurations combined the JVM versions with each collector.
G1 Tuning Practices
Raised MaxGCPauseMillis (e.g., from 200 ms to 300 ms) to reduce young‑collection frequency, and lowered InitiatingHeapOccupancyPercent so concurrent marking and the subsequent mixed collections start earlier, improving old‑generation reclamation.
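As a sketch, the G1 settings above might look like the following launch command; the heap size, flag values, and jar name are illustrative assumptions, not the production configuration:

```shell
# Illustrative G1 tuning: allow longer pauses to cut young-GC frequency,
# and start concurrent marking earlier than the 45% default so mixed
# collections reclaim old-generation garbage sooner.
java -Xms12g -Xmx12g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=300 \
  -XX:InitiatingHeapOccupancyPercent=35 \
  -Xlog:gc*:file=gc.log:time,uptime,level,tags \
  -jar query-service.jar
```

Logging via `-Xlog:gc*` makes it possible to confirm the change in young-collection frequency and mixed-collection timing after each adjustment.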
ZGC Tuning Practices
Increased ZAllocationSpikeTolerance and enabled proactive collection (ZProactive) with a shorter ZCollectionInterval to smooth response spikes under load; also monitored allocation and relocation stalls, raising ConcGCThreads and ParallelGCThreads when GC worker resources fell short.
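A corresponding sketch of the ZGC settings on JDK 17; again the numeric values are assumptions for illustration, not the article's production figures:

```shell
# Illustrative ZGC tuning: tolerate larger allocation spikes, run
# proactive cycles on a shorter interval (seconds), and size the
# concurrent/parallel GC worker pools to avoid allocation stalls.
java -Xms12g -Xmx12g \
  -XX:+UseZGC \
  -XX:ZAllocationSpikeTolerance=5 \
  -XX:+ZProactive \
  -XX:ZCollectionInterval=30 \
  -XX:ConcGCThreads=4 \
  -XX:ParallelGCThreads=8 \
  -Xlog:gc*:file=gc.log:time,uptime,level,tags \
  -jar query-service.jar
```

The GC log will report "Allocation Stall" and "Relocation Stall" events when worker threads cannot keep up, which is the signal the text describes for raising ConcGCThreads and ParallelGCThreads.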
Benchmark Results
ZGC achieved sub‑100 µs pauses but consumed excessive CPU (four Z worker threads could saturate cores) and more off‑heap memory, producing roughly a 70 % increase in response latency under peak load. G1, though its pauses are longer than ZGC's, maintained comparable end‑to‑end latency with half the memory and modest CPU overhead.
Native Memory Management
Off‑heap usage grew due to heavy NIO, serialization, and compression, causing RSS to climb until the kernel OOM killer terminated the process. Forcing the allocator to release free pages with gdb --batch --pid 36563 --ex 'call malloc_trim(0)' dropped RSS sharply, showing that glibc's ptmalloc was retaining freed memory rather than returning it to the OS.
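The diagnosis workflow can be sketched as follows, using standard JDK and glibc tooling; the PID is the example from the text, and the jar name is an assumption:

```shell
# 1. Start the JVM with Native Memory Tracking enabled, to rule out
#    JVM-managed native memory (threads, code cache, direct buffers).
java -XX:NativeMemoryTracking=summary -jar query-service.jar &

# 2. Compare NMT's reported total against the actual process RSS.
jcmd 36563 VM.native_memory summary
ps -o rss= -p 36563

# 3. If RSS far exceeds the NMT total, test whether the allocator is
#    hoarding freed memory: malloc_trim(0) asks ptmalloc to return
#    free pages to the kernel. A large RSS drop implicates allocator
#    retention/fragmentation rather than a true application leak.
gdb --batch --pid 36563 --ex 'call (int) malloc_trim(0)'
```

The `(int)` cast tells gdb the return type when the target process lacks debug symbols for glibc.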
ptmalloc Limitations
ptmalloc's arena, bin, and chunk structures add metadata overhead, fragmentation, lock contention, and poor memory reclamation, especially under high concurrency, leading to out‑of‑memory conditions within the constrained 2.5 GB off‑heap budget.
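A common stopgap before replacing the allocator is capping glibc's per-thread arenas, trading some lock contention for lower memory retention. This is a standard glibc tunable offered here as a mitigation sketch; the article's actual fix was switching allocators:

```shell
# Limit ptmalloc to 2 arenas instead of the default (8 * CPU cores
# on 64-bit), shrinking per-arena free-list bloat when a service
# runs hundreds of threads. Must be set before the JVM starts.
export MALLOC_ARENA_MAX=2
java -jar query-service.jar
```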
Switch to jemalloc
Replaced ptmalloc with jemalloc, which offers lower fragmentation (below 20 %), per‑thread caches, arena‑based allocation that sharply reduces lock contention, and proactive purging of dirty pages. Migration required only setting LD_PRELOAD to the jemalloc shared library in the Tomcat start script and rebuilding the container image.
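The migration step might look like the snippet below in Tomcat's bin/setenv.sh; the library path is an assumption and varies by distribution and jemalloc version:

```shell
# Interpose jemalloc over glibc malloc for the Tomcat JVM.
# Path shown is typical for Debian/Ubuntu; adjust for your image.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

# After restart, verify the substitution took effect:
#   lsof -p <tomcat-pid> | grep jemalloc
```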
Benefits of jemalloc
The change saved 1–1.5 GB of off‑heap memory per machine, eliminated RSS spikes, improved stability under traffic bursts, and required minimal operational effort. Gains include better multi‑threaded allocation, reduced interference with GC, and lower cost when scaling the fleet.
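To validate results like these in your own service, jemalloc can be asked to report its internal statistics via the MALLOC_CONF environment variable (standard jemalloc options; profiling additionally requires a build configured with --enable-prof):

```shell
# Dump jemalloc allocator statistics (arenas, fragmentation, dirty
# pages) to stderr when the process exits.
export MALLOC_CONF="stats_print:true"

# With a profiling-enabled jemalloc build (an assumption), sample
# allocations and write a heap profile every 2^30 bytes allocated:
# export MALLOC_CONF="prof:true,lg_prof_interval:30,prof_prefix:jeprof.out"
```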
Conclusion
The case study demonstrates a repeatable optimization loop: hypothesize, benchmark, tune, and validate. While specific numbers depend on the application, the methodology and the jemalloc migration offer useful guidance for backend services facing tight memory constraints.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.