Off‑Heap Cache (OHC) Practice: Reducing JVM GC Impact and Boosting C‑Side Interface Throughput
This article explains how using an off‑heap local cache (OHC) can dramatically lower GC pauses and cut interface latency by up to tenfold, covering the underlying principles, configuration, custom serializers, performance testing, monitoring metrics, and practical optimization recommendations for high‑traffic Java backend services.
The article starts by describing the performance problems of a high‑traffic C‑side store service whose in‑process local cache grows to several gigabytes, causing long Young GC pauses (averaging 100 ms) and noticeable latency spikes under heavy traffic.
To mitigate the GC impact, the authors introduce an off‑heap cache (OHC) that stores data outside the JVM heap, thus reducing GC pressure while avoiding the network overhead of Redis.
Background
The local cache hit rate is 99 %, but its size (≈3 GB) triggers frequent, lengthy GC cycles. Expanding the heap does not help: the cached objects stay on‑heap, so the collector still has to scan them.
OHC Overview
OHC uses DirectByteBuffer to allocate off‑heap memory and stores key‑value pairs as binary arrays. Two implementations exist: OHCacheLinkedImpl (per‑entry off‑heap allocation, suitable for medium/large entries) and OHCacheChunkedImpl (segment‑wise allocation, experimental).
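The off‑heap principle is easy to demonstrate with the JDK alone: a DirectByteBuffer's backing memory lives outside the heap, so the collector never scans its contents, only the small on‑heap wrapper object. A minimal sketch (plain JDK, no OHC involved):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Allocate 16 bytes outside the JVM heap; GC never scans this region,
        // only the small DirectByteBuffer wrapper object that points at it.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        direct.put(value);           // copy the bytes off-heap
        direct.flip();               // switch from writing to reading
        byte[] back = new byte[value.length];
        direct.get(back);            // copy them back on-heap to use
        System.out.println(new String(back, StandardCharsets.UTF_8)); // prints "hello"
        System.out.println(direct.isDirect());                        // prints "true"
    }
}
```

This copy‑in/copy‑out step is exactly why OHC needs serializers: every value crossing the heap boundary is turned into bytes and back.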
Implementation Details
Custom serializers are required because OHC stores byte arrays. The article provides a String serializer and a Protostuff‑based serializer for a sample XxxxInfo class, along with Maven dependencies.
<!-- OHC dependency -->
<dependency>
    <groupId>org.caffinitas.ohc</groupId>
    <artifactId>ohc-core</artifactId>
    <version>0.7.4</version>
</dependency>
<!-- Protostuff dependencies -->
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-core</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-runtime</artifactId>
    <version>1.6.0</version>
</dependency>
Cache creation example:
OHCache<String, XxxxInfo> basicStoreInfoCache = OHCacheBuilder.<String, XxxxInfo>newBuilder()
.keySerializer(new OhcStringSerializer()) // key serializer
.valueSerializer(new OhcProtostuffXxxxInfoSerializer()) // value serializer
.segmentCount(512)
.hashTableSize(100000)
.capacity(1024 * 1024 * 1024) // 1 GB
.eviction(Eviction.LRU)
.timeouts(false)
.build();
The article also shows the custom serializer implementations for keys and values, using ByteBuffer operations and Protostuff utilities, and a utility class that pools LinkedBuffer via FastThreadLocal to avoid repeated allocations.
public class OhcStringSerializer implements CacheSerializer<String>
{
    @Override
    public int serializedSize(String value) { return writeUTFLen(value); }

    @Override
    public void serialize(String value, ByteBuffer buf) {
        byte[] bytes = value.getBytes(Charsets.UTF_8);
        // 2-byte big-endian length prefix, then the UTF-8 payload
        buf.put((byte) ((bytes.length >>> 8) & 0xFF));
        buf.put((byte) (bytes.length & 0xFF));
        buf.put(bytes);
    }

    @Override
    public String deserialize(ByteBuffer buf) {
        int length = ((buf.get() & 0xFF) << 8) + (buf.get() & 0xFF);
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, Charsets.UTF_8);
    }

    // Must match serialize(): 2 length bytes plus the UTF-8 payload
    static int writeUTFLen(String str) { return 2 + str.getBytes(Charsets.UTF_8).length; }
}
public class OhcProtostuffXxxxInfoSerializer implements CacheSerializer<XxxxInfo>
{
    @Override
    public void serialize(XxxxInfo t, ByteBuffer byteBuffer) {
        byteBuffer.put(ProtostuffUtils.serialize(t));
    }

    @Override
    public XxxxInfo deserialize(ByteBuffer byteBuffer) {
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);
        return ProtostuffUtils.deserialize(bytes, XxxxInfo.class);
    }

    // Note: this serializes the object once for sizing and again for writing;
    // using the same serializer for both paths avoids size mismatches.
    @Override
    public int serializedSize(XxxxInfo t) { return ProtostuffUtils.serialize(t).length; }
}
public class ProtostuffUtils {
    // Pool a LinkedBuffer per thread so serialization does not reallocate one on every call
    private static final FastThreadLocal<LinkedBuffer> bufferPool = new FastThreadLocal<LinkedBuffer>() {
        @Override
        protected LinkedBuffer initialValue() {
            return LinkedBuffer.allocate(4 * 2 * LinkedBuffer.DEFAULT_BUFFER_SIZE);
        }
    };

    // Cache runtime schemas, which are expensive to build
    private static final Map<Class<?>, Schema<?>> schemaCache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <T> byte[] serialize(T obj) {
        Class<T> clazz = (Class<T>) obj.getClass();
        Schema<T> schema = getSchema(clazz);
        LinkedBuffer buffer = bufferPool.get();
        try {
            return ProtostuffIOUtil.toByteArray(obj, schema, buffer);
        } finally {
            buffer.clear(); // reset the pooled buffer for the next use
        }
    }

    public static <T> T deserialize(byte[] data, Class<T> clazz) {
        Schema<T> schema = getSchema(clazz);
        T obj = schema.newMessage();
        ProtostuffIOUtil.mergeFrom(data, obj, schema);
        return obj;
    }

    @SuppressWarnings("unchecked")
    private static <T> Schema<T> getSchema(Class<T> clazz) {
        return (Schema<T>) schemaCache.computeIfAbsent(clazz, RuntimeSchema::getSchema);
    }
}
Performance Results
After switching to OHC, the maximum latency (MAX) decreased by a factor of ten, and GC pause time also dropped tenfold, as shown by the before/after charts.
Monitoring
OHC provides OHCacheStats exposing hitCount, missCount, evictionCount, size, capacity, free, etc. Regular collection of these metrics enables hit‑rate calculation and alerting.
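A periodic collection loop can turn those counters into a hit ratio and an alert. The sketch below is illustrative: the CacheMetrics class and the 95 % threshold are invented for this example, and in a real service the two counts would come from the OHCacheStats object rather than being hard‑coded.

```java
// Sketch: derive a hit ratio from OHC's hit/miss counters on a schedule.
// In production the counts would come from the stats object OHC exposes;
// here they are hard-coded so the example is self-contained.
public class CacheMetrics {
    public static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        // Report a perfect ratio before any traffic to avoid divide-by-zero
        return total == 0 ? 1.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        long hits = 990_000, misses = 10_000; // stand-ins for the OHC counters
        double ratio = hitRatio(hits, misses);
        if (ratio < 0.95) { // illustrative alert threshold
            System.out.println("ALERT: cache hit ratio dropped to " + ratio);
        }
        System.out.println("hit ratio = " + ratio); // ≈0.99
    }
}
```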
Key Takeaways & Recommendations
When local cache hurts GC, consider off‑heap caching to isolate memory pressure.
Choose a serialization framework that balances speed and size (Protostuff, Kryo, Hessian, etc.).
Keep the serializer used for size calculation and actual serialization identical to avoid mismatched allocations.
Split hot data across multiple layers (in‑heap → off‑heap → Redis) and shrink object payloads (e.g., shorten JSON field names).
Adjust OHC parameters (segmentCount, hashTableSize, eviction policy) through iterative load testing.
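The layering recommendation above can be sketched as a read path that consults each tier in order and promotes values on a hit. Everything here is a stand‑in: plain maps play the roles of the on‑heap cache and OHC, and a loader function models the Redis client; the point is the lookup‑and‑promote flow, not the concrete stores.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a tiered read path: small on-heap map for the hottest keys,
// larger off-heap cache (OHC in the article; a map stands in here),
// and a remote store like Redis as the final fallback (modeled by a function).
public class TieredCache {
    private final Map<String, String> onHeap = new ConcurrentHashMap<>();
    private final Map<String, String> offHeap = new ConcurrentHashMap<>(); // stand-in for OHCache
    private final Function<String, String> remote; // stand-in for a Redis client

    public TieredCache(Function<String, String> remote) { this.remote = remote; }

    public String get(String key) {
        String v = onHeap.get(key);
        if (v != null) return v;                          // hottest tier: no copy, no network
        v = offHeap.get(key);
        if (v != null) { onHeap.put(key, v); return v; }  // promote to the hot tier
        v = remote.apply(key);                            // missed both tiers: go remote
        if (v != null) offHeap.put(key, v);               // backfill the off-heap tier
        return v;
    }
}
```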
The article concludes that OHC can significantly improve memory utilization and throughput compared with traditional in‑heap caches such as Guava, especially for large‑scale Java backend services.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.