
Improving HBase Cluster Performance: Cache Optimization, GC Tuning, and Multiget Concurrency

This article details a series of practical enhancements applied to an HBase 1.2.4‑based cluster—including layered BucketCache, data pre‑heating, GC‑friendly object pooling, and a multiget concurrency model—that together raise throughput several‑fold and consistently keep P99 latency below 50 ms in YCSB benchmarks.


The original HBase 1.2.4 deployment suffered from severe GC spikes and could only guarantee a 100 ms SLA for P99 latency, limiting its use to near‑line or offline workloads. A combination of community patches and in‑house fixes has since raised throughput 3–7×, with the largest gains on multi‑hop graph queries.

Efficient Cache Utilization – HBase provides LruBlockCache, BucketCache, and MemcachedBlockCache. By combining LruBlockCache (for index and Bloom blocks in‑heap) with a two‑level BucketCache (L1 off‑heap, L2 on SSD/PMEM), the cluster gains large‑capacity caching while avoiding excessive heap pressure. The CompositeBucketCache implementation replaces the original CombinedBlockCache, and the changes have been contributed back to the community (see HBASE‑23296).
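On stock HBase 1.2.x, the on‑heap/off‑heap split that this layered design builds on can be wired up roughly as follows. This is a minimal sketch of the standard configuration keys only; the extra L1/L2 keys introduced by the CompositeBucketCache patch are specific to that patch and not shown here, and the sizes are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// On-heap LruBlockCache: keep it small, mainly for index and Bloom blocks.
conf.setFloat("hfile.block.cache.size", 0.2f);
// Off-heap BucketCache for data blocks (ioengine may also be "file:/path"
// to back the cache with SSD instead of direct memory).
conf.set("hbase.bucketcache.ioengine", "offheap");
conf.setInt("hbase.bucketcache.size", 65536); // bucket cache size in MB
```

With `ioengine=offheap`, data blocks bypass the Java heap entirely, which is what keeps the large cache from adding GC pressure.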

Data Pre‑heating – To eliminate costly disk seeks, all index and Bloom blocks are cached proactively. CacheOnWrite and prefetchOnOpen are enabled, and a threshold‑based policy decides whether to cache only hot blocks or the entire HFile. Additional logic ensures new HFiles are warmed before they become visible and avoids duplicate pre‑heating during scans (patches HBASE‑22888, HBASE‑23355).
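The pre‑heating flags above are ordinary column‑family attributes in HBase 1.x. A minimal sketch (the family name "cf" is a placeholder; the threshold‑based hot‑block policy from the patches is custom logic on top of these flags, not part of the stock API):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

HColumnDescriptor cf = new HColumnDescriptor("cf");
cf.setCacheDataOnWrite(true);      // cache data blocks as they are written
cf.setCacheIndexesOnWrite(true);   // cache index blocks at write time
cf.setCacheBloomsOnWrite(true);    // cache Bloom blocks at write time
cf.setPrefetchBlocksOnOpen(true);  // warm blocks when an HFile is opened
```

Because blocks are cached at write and open time, a read after a flush or compaction hits the cache instead of paying a cold disk seek.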

Read‑Write Path GC Optimization – Heavy allocation of short‑lived objects on the read and write paths caused frequent young‑generation GC (YGC) pauses. The solution introduced off‑heap pooling for CellBlock serialization (HBASE‑11425), off‑heap memstore chunk pools (HBASE‑15179), in‑memory WAL writes (HBASE‑14790), and an in‑memory‑flush mechanism (HBASE‑14918). BucketCache reads from SSD were also pooled (HBASE‑21879), reducing memory allocation from ~80 % to ~5 % and cutting GC latency roughly in half. Additional pooling was applied to client‑side CellBlock serialization (HBASE‑22905), BucketCache file reads (HBASE‑22802), CacheOnWrite buffers (HBASE‑23107), and DFSPacket handling (enabling ByteArrayManager).
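The pooling idea behind these patches can be illustrated with a minimal, hypothetical sketch (this is not the actual HBase implementation, just the pattern): instead of allocating a fresh buffer per read, a fixed pool of off‑heap `ByteBuffer`s is recycled, so steady‑state reads allocate nothing on the heap.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

// Illustrative buffer pool: reuse direct (off-heap) buffers across reads
// rather than allocating a new one per request, as in HBASE-21879/22802.
final class ByteBufferPool {
    private final ArrayBlockingQueue<ByteBuffer> pool;
    private final int bufferSize;

    ByteBufferPool(int capacity, int bufferSize) {
        this.pool = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer when one is available; otherwise allocate
    // a fresh direct buffer (off-heap, so invisible to the GC).
    ByteBuffer acquire() {
        ByteBuffer buf = pool.poll();
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(bufferSize);
        }
        buf.clear();
        return buf;
    }

    // Return the buffer to the pool; silently drop it if the pool is full.
    void release(ByteBuffer buf) {
        pool.offer(buf);
    }
}
```

Under load, nearly every `acquire` is satisfied from the pool, which is how per‑request allocation drops to a few percent.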

Multiget Concurrency Enhancement – The default single‑threaded handler for multiget requests limited server utilization. A new thread‑pool model dispatches each get within a multiget to separate workers, increasing parallelism and delivering up to 40 % latency reduction (patch HBASE‑23063).
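The dispatch model can be sketched as follows. This is a hypothetical simplification of the idea in HBASE‑23063, not the patch itself: each get in a multiget is submitted to a shared worker pool and the results are joined in request order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative dispatcher: fan a multiget out over a worker pool instead
// of serving all of its gets on a single handler thread.
final class MultigetDispatcher {
    private final ExecutorService workers;

    MultigetDispatcher(int poolSize) {
        this.workers = Executors.newFixedThreadPool(poolSize);
    }

    // Runs one task per key in parallel; results preserve request order.
    <K, V> List<V> multiget(List<K> keys, Function<K, V> singleGet)
            throws Exception {
        List<Future<V>> futures = new ArrayList<>();
        for (K key : keys) {
            futures.add(workers.submit(() -> singleGet.apply(key)));
        }
        List<V> results = new ArrayList<>();
        for (Future<V> f : futures) {
            results.add(f.get()); // join in submission order
        }
        return results;
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

Latency of the whole multiget then approaches that of its slowest single get, rather than the sum of all of them.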

YCSB Benchmark Results – The cluster was tested with 2 RegionServers (32 GB heap, 64 GB off‑heap, 2 TB SSD L2 cache) loading 2 billion rows (20 GB total). Random get tests (40 threads, 100 M ops) and multiget tests (30 threads, 10 M multigets returning 50 rows each) showed P999 latency consistently under 50 ms, server‑side GC time averaging 6.5 ms, and stable throughput (≈60 k gets/s) without GC spikes.

In summary, layered off‑heap caching, aggressive data pre‑warming, extensive object‑pooling, and multiget parallelism together transform HBase from a latency‑constrained store into a high‑throughput, low‑latency platform suitable for online services.

Tags: performance, Big Data, Cache, HBase, benchmark, GC optimization
Written by Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies