Troubleshooting and Repairing HBase Meta Table Issues

This article explains how HBase's meta table stores region metadata, outlines common failures such as slow startup, RIT, region holes and overlaps, and provides step-by-step online and offline repair procedures, including command-line tools and configuration tweaks, for both HBase 1.x and 2.x clusters.

vivo Internet Technology

HBase is an open‑source, highly reliable, scalable and high‑performance distributed NoSQL database widely used in big‑data processing, real‑time computation, data storage and retrieval. In a distributed cluster, hardware failures are common and can cause node or cluster‑level service interruptions, meta‑table corruption, long‑lasting RIT (Region In Transition), region holes, overlaps, etc. This article focuses on common failures of the HBase meta table and provides corresponding solutions.

Background

The HBase meta (catalog) table stores information about every region and the RegionServer that hosts it. Its correctness is critical for cluster stability: inconsistencies can lead to RIT or even prevent HMaster from initializing, which makes the whole cluster unavailable.

Meta Table Structure

The meta table consists of three column families, info, table and rep_barrier, which record region information and table status. (Diagram omitted.)
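Each row in the meta table is keyed by a region name. The sketch below splits such a row key into its parts, assuming the commonly documented layout `<table>,<startKey>,<regionId>.<encodedName>.`; the class name MetaRowKey and the region id in the sample are hypothetical and chosen only for illustration.

```java
// Minimal sketch: split an hbase:meta row key (a region name) into its parts.
// Assumed layout: <table>,<startKey>,<regionId>.<encodedName>.
final class MetaRowKey {
    final String table;
    final String startKey;
    final String regionId;
    final String encodedName;

    private MetaRowKey(String table, String startKey, String regionId, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.encodedName = encodedName;
    }

    static MetaRowKey parse(String rowKey) {
        // Table names cannot contain commas, but start keys can, so the
        // first and the last comma are used as the separators.
        int firstComma = rowKey.indexOf(',');
        int lastComma = rowKey.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma) {
            throw new IllegalArgumentException("not a region name: " + rowKey);
        }
        String tail = rowKey.substring(lastComma + 1); // "<regionId>.<encodedName>."
        int dot = tail.indexOf('.');
        if (dot < 0 || !tail.endsWith(".") || dot == tail.length() - 1) {
            throw new IllegalArgumentException("missing encoded name: " + rowKey);
        }
        return new MetaRowKey(
            rowKey.substring(0, firstComma),
            rowKey.substring(firstComma + 1, lastComma),
            tail.substring(0, dot),
            tail.substring(dot + 1, tail.length() - 1));
    }

    public static void main(String[] args) {
        // The region id (timestamp) below is made up for this example.
        MetaRowKey k = parse("migrate:test_hole2,06,1700000000000.c8662e08f6ae705237e390029161f58f.");
        System.out.println(k.table + " start=" + k.startKey + " encoded=" + k.encodedName);
    }
}
```

The encoded name is the same string that appears as the region's directory name under the table directory in HDFS, which is what links a meta row to its data on disk.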

Meta Table Loading Process

The loading flow pre-loads all table descriptors and then brings the offline regions online. Slow or failed cluster startups are often caused by delays in these two steps.

Common Issues and Their Causes

Slow cluster startup: A large number of tables lengthens descriptor pre-loading; HMaster may spend 15-30 minutes loading meta.

Cluster startup failure: The meta region cannot be opened after a RegionServer crash, because the crash changes the server's startcode.

RIT (Region In Transition): Occurs during region assign/unassign, split or merge, or when a RegionServer crashes. Long-lasting RIT may need manual intervention.

Region holes: A key range that no region covers leaves a gap in the table's region chain; hbck reports errors like “There is a hole in the region chain…”.

Region overlaps: Two regions share the same start key or end key, or have intersecting key ranges; hbck reports errors such as “Multiple regions have the same startkey”.
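To make the hole and overlap definitions concrete, here is a minimal, self-contained sketch that walks a table's regions in start-key order and flags gaps and intersections. It is not how hbck is implemented: the class RegionChainCheck is hypothetical, and keys are compared as plain strings, whereas HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: detect holes and overlaps in a list of region key ranges.
final class RegionChainCheck {
    /** Key range [start, end); an empty string means "unbounded". */
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    static List<String> check(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<String> problems = new ArrayList<>();
        if (sorted.isEmpty()) {
            return problems;
        }
        if (!sorted.get(0).start.isEmpty()) {
            // The first region must start at the empty key.
            problems.add("hole [, " + sorted.get(0).start + ")");
        }
        String coveredUpTo = sorted.get(0).end; // furthest end key seen so far
        for (int i = 1; i < sorted.size(); i++) {
            Range r = sorted.get(i);
            if (coveredUpTo.isEmpty()) {
                // An earlier region already reaches the end of the table.
                problems.add("overlap at " + r.start);
            } else {
                int cmp = r.start.compareTo(coveredUpTo);
                if (cmp > 0) {
                    problems.add("hole [" + coveredUpTo + ", " + r.start + ")");
                } else if (cmp < 0) {
                    problems.add("overlap at " + r.start);
                }
            }
            // Extend the covered range if this region reaches further.
            if (!coveredUpTo.isEmpty()
                    && (r.end.isEmpty() || r.end.compareTo(coveredUpTo) > 0)) {
                coveredUpTo = r.end;
            }
        }
        if (!coveredUpTo.isEmpty()) {
            // The last region must end at the empty key.
            problems.add("hole [" + coveredUpTo + ", )");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Regions ["", 06) and [07, ""): nothing covers the range [06, 07).
        System.out.println(check(Arrays.asList(new Range("", "06"), new Range("07", ""))));
    }
}
```

A hole found this way gives the start and end keys that a replacement region would need to cover.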

Repair Methods

RIT handling:

For large tables, let HBase self-heal; if regions are opened too slowly, increase hbase.master.executor.openregion.threads.

On HBase 1.x, increase hbase.assignment.maximum.attempts or assign the stuck regions manually.

If a region is assigned to a server that no longer exists, trigger an HMaster failover (switch the active master) to recover.

Region hole fixing (HBase 1.x):

java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=createRegion -DRegion.startkey=06 -DRegion.endkey=07 hbase-meta-tool-0.0.1.jar

Then move the newly created region directory from the temporary location into the table directory in HDFS:

sudo -uhdfs hdfs dfs -mv /tmp/.tmp/data/migrate/test_hole2/c8662e08f6ae705237e390029161f58f /hbase/data/migrate/test_hole2

Delete the table's broken meta entries and rebuild them from HDFS:

java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=delete hbase-meta-tool-0.0.1.jar
java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=fixFromHdfs hbase-meta-tool-0.0.1.jar
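Because the order of these four commands matters, it can help to generate them as a dry-run plan and review it before running anything. The sketch below only assembles the command lines shown above from their parameters and does not execute them; the class HoleRepairPlan and its method names are hypothetical, while the flags and jar name are copied from the commands in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: build the hole-repair command sequence as a reviewable plan.
final class HoleRepairPlan {
    static final String TOOL_JAR = "hbase-meta-tool-0.0.1.jar";

    // One invocation of the meta tool for the given table and operator.
    private static List<String> tool(String table, String operator, String... extra) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
            "java", "-jar", "-Drepair.tableName=" + table, "-Dfix.operator=" + operator));
        cmd.addAll(Arrays.asList(extra));
        cmd.add(TOOL_JAR);
        return cmd;
    }

    static List<List<String>> plan(String table, String startKey, String endKey,
                                   String tmpRegionDir, String tableDir) {
        List<List<String>> steps = new ArrayList<>();
        // 1. Create an empty region covering the hole.
        steps.add(tool(table, "createRegion",
            "-DRegion.startkey=" + startKey, "-DRegion.endkey=" + endKey));
        // 2. Move the new region directory into the table directory.
        steps.add(Arrays.asList("sudo", "-uhdfs", "hdfs", "dfs", "-mv", tmpRegionDir, tableDir));
        // 3. Delete the table's meta entries.
        steps.add(tool(table, "delete"));
        // 4. Rebuild the meta entries from the region directories in HDFS.
        steps.add(tool(table, "fixFromHdfs"));
        return steps;
    }

    public static void main(String[] args) {
        List<List<String>> steps = plan("migrate:test_hole2", "06", "07",
            "/tmp/.tmp/data/migrate/test_hole2/c8662e08f6ae705237e390029161f58f",
            "/hbase/data/migrate/test_hole2");
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

Printing the plan rather than executing it keeps a human in the loop, which seems prudent for commands that rewrite meta entries.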

Region overlap fixing (one‑click tool):

java -jar -Dfix.operator=fixOverlapAndHole hbase-meta-tool-0.0.1.jar

For large-scale overlaps, delete the problematic meta data, back up the original HDFS data, and reload it using LoadIncrementalHFiles:

hdfs dfs -mv /hbase/data/migrate/test/ /back
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /back/test/region01-regionN migrate:test1

Meta Table Data Repair (Offline)

When the meta table itself is corrupted, rebuild it by following the InitMetaProcedure steps:

// Excerpt from InitMetaProcedure: write the hbase:meta layout to the
// filesystem, then assign the meta region.
protected Flow executeFromState(MasterProcedureEnv env, InitMetaState state)
        throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
    try {
        switch (state) {
            case INIT_META_WRITE_FS_LAYOUT:
                // Step 1: (re)create the meta table directory and its descriptor.
                Configuration conf = env.getMasterConfiguration();
                Path rootDir = CommonFSUtils.getRootDir(conf);
                TableDescriptor td = writeFsLayout(rootDir, conf);
                env.getMasterServices().getTableDescriptors().update(td, true);
                setNextState(InitMetaState.INIT_META_ASSIGN_META);
                return Flow.HAS_MORE_STATE;
            case INIT_META_ASSIGN_META:
                // Step 2: assign the meta region to a RegionServer.
                addChildProcedure(env.getAssignmentManager()
                    .createAssignProcedures(Arrays.asList(RegionInfoBuilder.FIRST_META_REGIONINFO)));
                return Flow.NO_MORE_STATE;
            default:
                throw new UnsupportedOperationException("unhandled state=" + state);
        }
    } catch (IOException e) {
        // Retry and backoff bookkeeping is elided in this excerpt; the upstream
        // code ends this branch by suspending the procedure until the retry.
        throw new ProcedureSuspendedException();
    }
}

private static TableDescriptor writeFsLayout(Path rootDir, Configuration conf) throws IOException {
    LOG.info("BOOTSTRAP: creating hbase:meta region");
    FileSystem fs = rootDir.getFileSystem(conf);
    Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    // Remove any partially created meta table left over from an earlier attempt.
    if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
        LOG.warn("Can not delete partial created meta table, continue...");
    }
    // Write the meta table descriptor, then create (and immediately close) an empty meta region.
    TableDescriptor metaDescriptor = FSTableDescriptors.tryUpdateAndGetMetaTableDescriptor(conf, fs, rootDir);
    HRegion.createHRegion(RegionInfoBuilder.FIRST_META_REGIONINFO, rootDir, conf, metaDescriptor, null).close();
    return metaDescriptor;
}

Offline repair steps (HBase 1.x):

Stop the HBase cluster.

Run hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix

Restart the cluster.

Offline repair steps (HBase 2.4.8), using hbase-operator-tools:

Stop the cluster.

Execute hbase org.apache.hbase.hbck1.OfflineMetaRepair -fix (the class ships in the hbase-operator-tools HBCK2 jar, which must be on the HBase classpath).

Restart the cluster.

Single‑table repair involves deleting the problematic meta entries and re‑adding them from HDFS:

java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=delete hbase-meta-tool-0.0.1.jar
java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=fixFromHdfs hbase-meta-tool-0.0.1.jar

Precautions

Offline repair requires stopping the cluster; expect roughly 10-15 minutes of downtime.

If region holes or overlaps exist, they must be resolved before running the offline tool.

Conclusion

The correctness of the HBase meta table is essential for cluster stability. This article categorizes meta‑table problems into online and offline repair scenarios, describes the underlying mechanisms (RIT, holes, overlaps, data corruption), and provides concrete command‑line solutions for both HBase 1.x and 2.x versions.
