Troubleshooting and Repairing HBase Meta Table Issues
This article explains how HBase's meta table stores region metadata, outlines common failures such as slow startup, RIT, region holes and overlaps, and walks through online and offline repair procedures, including command-line tools and configuration tweaks, for both HBase 1.x and 2.x clusters.
HBase is an open‑source, highly reliable, scalable and high‑performance distributed NoSQL database widely used in big‑data processing, real‑time computation, data storage and retrieval. In a distributed cluster, hardware failures are common and can cause node or cluster‑level service interruptions, meta‑table corruption, long‑lasting RIT (Region In Transition), region holes, overlaps, etc. This article focuses on common failures of the HBase meta table and provides corresponding solutions.
Background
The HBase meta (catalog) table, hbase:meta, stores information about all Regions and the RegionServers that host them. Its correctness is critical for cluster stability: inconsistencies can lead to RIT or even prevent HMaster from initializing, leaving the whole cluster unavailable.
Meta Table Structure
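In HBase 2.x the meta table consists of three column families: info, table and rep_barrier. The info family records each region's descriptor and current location (for example the regioninfo, server and serverstartcode columns), table records table state, and rep_barrier records the barriers used by serial replication. (Diagram omitted.)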
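Each row in the meta table is keyed by the region name, which has the form `<table>,<startKey>,<regionId>.<encodedName>.`. The following is a minimal sketch of splitting such a row key into its parts. The class `MetaRowKey` is a hypothetical helper, not an HBase API, and the region id in the sample key is made up; only the table name and encoded name come from the examples in this article. It also assumes a default (non-replica) region, since replica regions append a replica id to the region id.

```java
// Hypothetical helper for illustration; not an HBase API.
final class MetaRowKey {
    final String table;
    final String startKey;
    final String regionId;
    final String encodedName;

    private MetaRowKey(String table, String startKey, String regionId, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.encodedName = encodedName;
    }

    // Parses "<table>,<startKey>,<regionId>.<encodedName>."
    static MetaRowKey parse(String rowKey) {
        // Table names cannot contain commas, so the first comma ends the table name.
        int firstComma = rowKey.indexOf(',');
        // The start key may itself contain commas, so the id part starts at the last comma.
        int lastComma = rowKey.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma || !rowKey.endsWith(".")) {
            throw new IllegalArgumentException("not a region row key: " + rowKey);
        }
        String table = rowKey.substring(0, firstComma);
        String startKey = rowKey.substring(firstComma + 1, lastComma);
        // Drop the trailing '.' and split "<regionId>.<encodedName>".
        String tail = rowKey.substring(lastComma + 1, rowKey.length() - 1);
        int dot = tail.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("missing encoded name: " + rowKey);
        }
        return new MetaRowKey(table, startKey, tail.substring(0, dot), tail.substring(dot + 1));
    }

    public static void main(String[] args) {
        // The region id (1700000000000) is a made-up creation timestamp.
        MetaRowKey k = parse(
            "migrate:test_hole2,06,1700000000000.c8662e08f6ae705237e390029161f58f.");
        System.out.println(k.table + " | " + k.startKey + " | " + k.regionId + " | " + k.encodedName);
    }
}
```

The encoded name is the same string that appears as the region's directory name under the table directory in HDFS, which is why it is useful when matching meta rows against the filesystem.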
Meta Table Loading Process
The loading flow pre-loads all table descriptors and then brings the offline regions online. Slow or failed cluster startups are often caused by delays in these steps.
Common Issues and Their Causes
Slow cluster startup: A large number of tables increases pre-loading time; HMaster may spend 15-30 minutes on meta loading.
Cluster startup failure: The meta region cannot be opened, for example after a RegionServer crash changes the startcode.
RIT (Region In Transition): Occurs during region assign/unassign, split, merge, or when a RegionServer crashes. Long-lasting RIT may need manual intervention.
Region holes: A key range that no region covers leaves a gap in the region chain; hbck reports errors like “There is a hole in the region chain…”.
Region overlaps: Two regions share the same start key and end key, or their key ranges intersect; hbck reports errors like “Multiple regions have the same startkey”.
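To make the difference between a hole and an overlap concrete, here is a minimal sketch that walks a table's regions in start-key order and reports both. `RegionChainCheck` is a hypothetical helper, not part of HBase, hbck, or hbase-meta-tool, and it simplifies keys to ASCII strings compared with `String.compareTo`, whereas HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase, hbck, or hbase-meta-tool.
final class RegionChainCheck {

    // One region's key range [startKey, endKey). An empty string stands for the
    // open end of the key space, as an empty byte[] does in HBase.
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Walks the regions in start-key order and reports every gap or intersection.
    static List<String> findProblems(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.startKey));

        List<String> problems = new ArrayList<>();
        String covered = "";          // exclusive upper bound of the keys covered so far
        boolean coveredToEnd = false; // true once a region with an empty end key was seen

        for (Range r : sorted) {
            int cmp = coveredToEnd ? -1 : r.startKey.compareTo(covered);
            if (cmp > 0) {
                // The region starts after the covered range ends: nothing serves the gap.
                problems.add("hole [" + covered + ", " + r.startKey + ")");
            } else if (cmp < 0) {
                // The region starts inside a range that is already covered.
                problems.add("overlap at startkey " + r.startKey);
            }
            if (r.endKey.isEmpty()) {
                coveredToEnd = true;
            } else if (r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!coveredToEnd) {
            problems.add("hole [" + covered + ", )");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Keys 06 to 07 are not covered by any region.
        List<Range> chain = List.of(
            new Range("", "03"), new Range("03", "06"), new Range("07", ""));
        System.out.println(findProblems(chain));
    }
}
```

A healthy chain starts with an empty start key, ends with an empty end key, and each region's start key equals the previous region's end key, so the method returns an empty list for it.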
Repair Methods
RIT handling:
For large tables, let HBase self-heal; if regions open too slowly, increase hbase.master.executor.openregion.threads.
On HBase 1.x, increase hbase.assignment.maximum.attempts or assign the stuck regions manually.
If a region is assigned to a non-existent server, switch the active HMaster (fail over to the standby) to recover.
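The last case, a region recorded against a server that no longer exists, can be spotted by comparing the location stored in meta with the list of live RegionServers. The sketch below assumes both have already been collected by some other means (for example from a meta scan and the Master UI). `StaleAssignmentCheck` is a hypothetical helper, and every host, start code, and region name in it is made up. It relies on the fact that a server is identified by host, port, and startcode together, so a restarted RegionServer on the same host and port counts as a different server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not an HBase API.
final class StaleAssignmentCheck {

    // Combines meta's info:server ("host:port") and info:serverstartcode values
    // into the "host,port,startcode" form of an HBase ServerName.
    static String serverName(String hostAndPort, long startCode) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + hostAndPort);
        }
        return hostAndPort.substring(0, colon) + ","
            + hostAndPort.substring(colon + 1) + "," + startCode;
    }

    // Returns the regions whose recorded server is not among the live servers.
    static List<String> findStale(Map<String, String> regionToServer, Set<String> liveServers) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (!liveServers.contains(e.getValue())) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // All hosts, start codes, and region names below are made up.
        Set<String> live = Set.of(serverName("rs1.example.com:16020", 1700000000002L));

        Map<String, String> assignments = new LinkedHashMap<>();
        assignments.put("region-a", serverName("rs1.example.com:16020", 1700000000002L));
        // Same host and port, but the startcode from before the restart.
        assignments.put("region-b", serverName("rs1.example.com:16020", 1700000000001L));

        System.out.println(findStale(assignments, live));
    }
}
```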
Region hole fixing (HBase 1.x):
java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=createRegion -DRegion.startkey=06 -DRegion.endkey=07 hbase-meta-tool-0.0.1.jar
Then move the region directory in HDFS and clean meta entries:
sudo -uhdfs hdfs dfs -mv /tmp/.tmp/data/migrate/test_hole2/c8662e08f6ae705237e390029161f58f /hbase/data/migrate/test_hole2
Delete the broken meta entry and rebuild it:
java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=delete hbase-meta-tool-0.0.1.jar
java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=fixFromHdfs hbase-meta-tool-0.0.1.jar
Region overlap fixing (one-click tool):
java -jar -Dfix.operator=fixOverlapAndHole hbase-meta-tool-0.0.1.jar
For large-scale overlaps, delete the problematic meta data, back up the original HDFS data, and reload it using LoadIncrementalHFiles:
hdfs dfs -mv /hbase/data/migrate/test/ /back
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /back/test/region01-regionN migrate:test1
Meta Table Data Repair (Offline)
When the meta table itself is corrupted, rebuild it by following the InitMetaProcedure steps:
protected Flow executeFromState(MasterProcedureEnv env, InitMetaState state)
    throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
  try {
    switch (state) {
      case INIT_META_WRITE_FS_LAYOUT:
        // Step 1: recreate the hbase:meta layout on the filesystem.
        Configuration conf = env.getMasterConfiguration();
        Path rootDir = CommonFSUtils.getRootDir(conf);
        TableDescriptor td = writeFsLayout(rootDir, conf);
        env.getMasterServices().getTableDescriptors().update(td, true);
        setNextState(InitMetaState.INIT_META_ASSIGN_META);
        return Flow.HAS_MORE_STATE;
      case INIT_META_ASSIGN_META:
        // Step 2: assign the first meta region to a RegionServer.
        addChildProcedure(env.getAssignmentManager()
            .createAssignProcedures(Arrays.asList(RegionInfoBuilder.FIRST_META_REGIONINFO)));
        return Flow.NO_MORE_STATE;
      default:
        throw new UnsupportedOperationException("unhandled state=" + state);
    }
  } catch (IOException e) {
    // Error handling is elided in this excerpt.
  }
}

private static TableDescriptor writeFsLayout(Path rootDir, Configuration conf) throws IOException {
  LOG.info("BOOTSTRAP: creating hbase:meta region");
  FileSystem fs = rootDir.getFileSystem(conf);
  Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
  // Remove any partially created meta table directory before recreating it.
  if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
    LOG.warn("Can not delete partial created meta table, continue...");
  }
  TableDescriptor metaDescriptor = FSTableDescriptors.tryUpdateAndGetMetaTableDescriptor(conf, fs, rootDir);
  // Create the meta region on disk, then close it so it can be assigned normally.
  HRegion.createHRegion(RegionInfoBuilder.FIRST_META_REGIONINFO, rootDir, conf, metaDescriptor, null).close();
  return metaDescriptor;
}
Offline repair steps (HBase 1.x):
Stop the HBase cluster.
Run: hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
Restart the cluster.
Offline repair steps (HBase 2.4.8) using hbase-operator-tools:
Stop the cluster.
Run: hbase org.apache.hbase.hbck1.OfflineMetaRepair -fix
Restart the cluster.
Single‑table repair involves deleting the problematic meta entries and re‑adding them from HDFS:
java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=delete hbase-meta-tool-0.0.1.jar
java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=fixFromHdfs hbase-meta-tool-0.0.1.jar
Precautions
Offline repair requires stopping the cluster (≈10‑15 minutes).
If region holes or overlaps exist, they must be resolved before running the offline tool.
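Because the single-table repair runs two commands in a fixed order (delete, then fixFromHdfs), it can help to generate them from one place and review them before anything touches meta. The following is a minimal dry-run sketch that only prints the command lines. `MetaToolCommands` is a hypothetical wrapper; the jar name, flags, and operator names are the ones used in this article, and nothing else about the tool is assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper for illustration; it only builds and prints command lines.
final class MetaToolCommands {
    static final String JAR = "hbase-meta-tool-0.0.1.jar";

    // Builds one hbase-meta-tool invocation for the given table and operator.
    static List<String> command(String table, String operator) {
        return Arrays.asList(
            "java", "-jar",
            "-Drepair.tableName=" + table,
            "-Dfix.operator=" + operator,
            JAR);
    }

    // Single-table repair: delete the table's meta entries, then rebuild them from HDFS.
    static List<List<String>> singleTableRepair(String table) {
        return Arrays.asList(command(table, "delete"), command(table, "fixFromHdfs"));
    }

    public static void main(String[] args) {
        for (List<String> cmd : singleTableRepair("migrate:test1")) {
            // Dry run: print the command instead of executing it.
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Each list can be passed to `java.lang.ProcessBuilder` to run the command once it has been reviewed; keeping the sketch as a dry run avoids modifying meta by accident.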
Conclusion
The correctness of the HBase meta table is essential for cluster stability. This article categorizes meta‑table problems into online and offline repair scenarios, describes the underlying mechanisms (RIT, holes, overlaps, data corruption), and provides concrete command‑line solutions for both HBase 1.x and 2.x versions.
vivo Internet Technology