
HBase Compaction Types and Parameter Tuning Guide

This article explains how HBase uses WAL and MemStore to create HFiles, describes the two compaction types (Minor and Major), and provides detailed recommendations for tuning key compaction-related configuration parameters to improve query performance and reduce HDFS impact.

Big Data Technology Architecture

In HBase, data is first written to the WAL and the MemStore, and the MemStore is periodically flushed to disk as HFiles. As the number of HFiles grows, reads must consult more files, so query performance degrades and HDFS load increases. HBase therefore runs compactions regularly to merge HFiles and reduce their count.

1. Two Compaction Types

Minor Compaction selects a few small, adjacent HFiles and merges them into a larger HFile while removing expired data.

Major Compaction merges all HFiles of a column family into a single large HFile and removes expired data, deleted data, and versions beyond the column family's configured maximum.
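
The difference between the two types can be sketched with a toy model. This is a minimal illustration, not HBase's implementation: the `Cell` class, the use of `value=None` as a delete marker, and the `ttl` and `max_versions` arguments are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int                # write timestamp
    value: Optional[str]   # None stands for a delete marker (toy convention)


def minor_compact(files: List[List[Cell]], now: int, ttl: int) -> List[Cell]:
    """Merge the given files into one, dropping only TTL-expired cells."""
    merged = [c for f in files for c in f if now - c.ts <= ttl]
    # Newest first within a row; a delete marker sorts before a value
    # that carries the same timestamp.
    return sorted(merged, key=lambda c: (c.row, -c.ts, c.value is not None))


def major_compact(files: List[List[Cell]], now: int, ttl: int,
                  max_versions: int) -> List[Cell]:
    """Merge all files; also drop deleted data, markers, and extra versions."""
    out: List[Cell] = []
    kept: Dict[str, int] = {}
    deleted_up_to: Dict[str, int] = {}
    for c in minor_compact(files, now, ttl):
        if c.value is None:
            # Remember the marker, but do not write it to the output.
            deleted_up_to[c.row] = max(deleted_up_to.get(c.row, -1), c.ts)
            continue
        if c.ts <= deleted_up_to.get(c.row, -1):
            continue  # covered by a delete marker
        if kept.get(c.row, 0) >= max_versions:
            continue  # over the version limit
        kept[c.row] = kept.get(c.row, 0) + 1
        out.append(c)
    return out
```

In this model the output of `minor_compact` still contains delete markers and every unexpired version, while `major_compact` returns only the live cells within the version limit.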

2. Parameter Tuning

hbase.hstore.compaction.min (default 3): the minimum number of HFiles in a column family that must be eligible before a Minor Compaction can run; the recommendation is to raise it to 5‑10.

hbase.hstore.compaction.max (default 10): maximum number of HFiles merged in one Minor Compaction; should be 2‑3 times larger than the min value.
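
How the min and max values bound a Minor Compaction can be sketched as follows. This is a deliberately simplified toy rule, not HBase's actual selection policy (which also compares file sizes against a ratio). The function name `select_minor` and the "cheapest contiguous window" heuristic are assumptions made for illustration.

```python
from typing import List


def select_minor(file_sizes: List[int],
                 compaction_min: int = 3,
                 compaction_max: int = 10) -> List[int]:
    """Return indexes of the files to merge, or [] if too few are eligible.

    file_sizes is ordered oldest to newest, and every file is treated as
    eligible (a simplification).
    """
    n = len(file_sizes)
    # HBase documents compaction.min as a minimum count, hence ">=" here.
    if n < compaction_min:
        return []
    width = min(n, compaction_max)  # never merge more than compaction.max files
    # Toy heuristic: take the adjacent run of files with the smallest total size.
    best_start = min(range(n - width + 1),
                     key=lambda s: sum(file_sizes[s:s + width]))
    return list(range(best_start, best_start + width))
```

Raising `compaction_min` makes compactions less frequent, and raising `compaction_max` lets each one absorb more files, which is why the two values are tuned together.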

hbase.regionserver.thread.compaction.throttle: the size threshold that determines whether a compaction is handled by the large‑compaction or the small‑compaction thread pool; the default is 2 × hbase.hstore.compaction.max × hbase.hregion.memstore.flush.size, which is 2.5 GB with the default values of 10 and 128 MB. It is usually left unchanged or increased slightly.
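
The default threshold can be checked with a few lines of arithmetic. This is a sketch under the default values of 10 files and a 128 MB flush size; the helper names `default_throttle_bytes` and `pick_pool` are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_throttle_bytes(compaction_max: int = 10,
                           flush_size_bytes: int = 128 * MB) -> int:
    """2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size."""
    return 2 * compaction_max * flush_size_bytes


def pick_pool(total_compaction_bytes: int, throttle_bytes: int) -> str:
    """Compactions larger than the threshold go to the large pool."""
    return "large" if total_compaction_bytes > throttle_bytes else "small"


throttle = default_throttle_bytes()
print(throttle, throttle / GB)
```

Note that raising hbase.hstore.compaction.max without setting the throttle explicitly also raises this derived default, so more compactions fall into the small pool.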

hbase.regionserver.thread.compaction.small (default 1): size of the small‑compaction thread pool; typically set to 2‑5.

hbase.regionserver.thread.compaction.large (default 1): size of the large‑compaction thread pool; adjust similarly to the small pool.

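hbase.hstore.blockingStoreFiles (default 10 in HBase 1.x, 16 in 2.x): when the number of HFiles in a column family exceeds this value, MemStore flushes for the region are blocked, which stalls writes, until compaction finishes or hbase.hstore.blockingWaitTime elapses; in production, increase it to around 100 to avoid write stalls.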

hbase.hregion.majorcompaction (default 604800000 ms, i.e., 7 days): interval for periodic Major Compaction; because Major Compaction is resource‑intensive, it is often disabled (set to 0) and run manually during low‑traffic periods.
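
As a sketch of how the recommendations above might be written down, the snippet below renders a set of values as `hbase-site.xml` properties. The specific numbers are illustrative picks from within the ranges suggested in this article, not tested recommendations for any particular cluster, and the helper name `to_hbase_site` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative values chosen from the ranges discussed above (assumptions).
TUNED = {
    "hbase.hstore.compaction.min": "5",
    "hbase.hstore.compaction.max": "15",   # 3 x the min value
    "hbase.regionserver.thread.compaction.small": "3",
    "hbase.regionserver.thread.compaction.large": "3",
    "hbase.hstore.blockingStoreFiles": "100",
    "hbase.hregion.majorcompaction": "0",  # disable periodic Major Compaction
}


def to_hbase_site(settings: dict) -> str:
    """Render name/value pairs as a <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_hbase_site(TUNED)
print(xml_text)
```

With periodic Major Compaction disabled this way, a manual run during a low-traffic window is still needed, for example with the `major_compact` command in the HBase shell.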
