
Didi HBase Team’s Upgrade from 0.98 to 1.4.8: Challenges, Solutions, and Lessons Learned

Didi's HBase team upgraded eleven clusters from version 0.98 to 1.4.8, tackling maintenance burdens and custom‑patch divergence, validating RPC and HFile compatibility, performing extensive functional and performance tests, opting for a rolling upgrade, fixing a region‑split data‑loss bug, merging critical upstream patches, and establishing a reusable migration methodology.

Didi Tech

Background: Didi’s HBase service runs 11 clusters (domestic and overseas) with a total throughput of over 1 k ops/s, serving most business lines (maps, finance, ride‑hailing, etc.). The production clusters were still on version 0.98 while the community’s latest release was 2.3, creating a large gap.

Key challenges of upgrading:

High cost of introducing new features: 0.98 is the first stable release and is no longer maintained; back‑porting new features is increasingly difficult.

Maintenance cost of custom patches: dozens of in‑house patches (label grouping, ACL, monitoring, audit logs, etc.) either diverge from upstream or cannot be merged due to version gaps.

Upper‑layer component requirements: the engines built on top of HBase (Kylin, GeoMesa, OpenTSDB, JanusGraph) all depend on newer HBase features, and none of them support 0.98.

Therefore, an upgrade was deemed urgent.

Technical challenges identified:

RPC interface compatibility: the upgrade must ensure old and new RPC calls work seamlessly.

HFile format compatibility: 1.4.8 writes HFile v3 by default, while 0.98 defaults to v2; fortunately 0.98 can already read v3 files, which avoided a costly back‑port.

Custom patch compatibility: each custom patch needed verification for replacement or migration.

Upper‑layer engine compatibility: every dependent engine must be validated against the new version.

Potential new issues: active community releases may introduce regressions; continuous monitoring is required.
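During a rolling upgrade, old and new servers coexist, so every server must be able to read files written by any other. The sketch below checks that condition for HFile versions. The table layout and the function name are hypothetical; only the HFile facts for 0.98 and 1.4.8 come from the text above.

```python
# Hypothetical compatibility table, for illustration only.
# "writes" is the default HFile version a release produces;
# "reads_up_to" is the newest HFile version it can open.
HFILE_SUPPORT = {
    "0.98": {"writes": 2, "reads_up_to": 3},
    "1.4.8": {"writes": 3, "reads_up_to": 3},
}


def unreadable_pairs(versions):
    """Return (writer, reader) pairs where the reader cannot open the writer's files."""
    problems = []
    for writer in versions:
        for reader in versions:
            if HFILE_SUPPORT[writer]["writes"] > HFILE_SUPPORT[reader]["reads_up_to"]:
                problems.append((writer, reader))
    return problems


print(unreadable_pairs(["0.98", "1.4.8"]))  # prints []
```

An empty result means a mixed 0.98/1.4.8 cluster has no HFile read gap in either direction, which is what makes an in-place upgrade feasible on this axis.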

Preparation work performed:

Release note review

Migration and testing of custom patches

Basic functionality and performance testing

Advanced feature testing (Bulkload, Snapshot, Replication, Coprocessor, etc.)

Tracking upstream community patches (over 100 merged)

Cross‑version compatibility testing, especially RPC compatibility

Full test suite covering HBase, Phoenix, GeoMesa, OpenTSDB, JanusGraph

Packaging and configuration preparation
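Cross‑version testing of this kind is commonly organized as a matrix of client version, server version, and operation. The following is a minimal sketch of such a matrix, not the team's actual test harness; the operation names are a hypothetical selection.

```python
from itertools import product

CLIENT_VERSIONS = ["0.98", "1.4.8"]
SERVER_VERSIONS = ["0.98", "1.4.8"]
# Hypothetical operation list; a real suite would cover far more.
OPERATIONS = ["put", "get", "scan", "bulkload", "snapshot"]


def build_matrix(clients, servers, operations):
    """Enumerate every (client, server, operation) combination as a test case."""
    return [
        {"client": c, "server": s, "op": op}
        for c, s, op in product(clients, servers, operations)
    ]


def cross_version_only(matrix):
    """Keep only the cases where client and server versions differ."""
    return [case for case in matrix if case["client"] != case["server"]]


matrix = build_matrix(CLIENT_VERSIONS, SERVER_VERSIONS, OPERATIONS)
print(len(matrix), len(cross_version_only(matrix)))  # prints 20 10
```

The cross‑version subset is the part that exercises RPC compatibility, since it pairs an old client with a new server and vice versa.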

Upgrade options evaluated:

New cluster + data migration. Pros: fast rollback, low risk. Cons: requires a user‑side switch, a longer upgrade window (>6 months), and higher operational cost.

Rolling upgrade. Pros: transparent to users, short upgrade window, low cost. Cons: higher risk, since failures hit the live cluster and may require an immediate rollback.

The team chose the rolling upgrade, judging its risk acceptable given the extensive preparation.

Rolling‑upgrade steps:

Resolve compatibility issues (create new rsgroup metadata, rewrite data, mount new coprocessors, etc.).

Upgrade Master nodes.

Upgrade the meta region.

Upgrade business regions one by one.
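The ordering above can be sketched as a staged loop that upgrades one host at a time and stops at the first failed health check. This is an illustration under assumptions, not Didi's tooling: the host names, the `upgrade` callback, and the `is_healthy` callback are all hypothetical.

```python
from typing import Callable, List, Sequence


def rolling_upgrade(
    stages: Sequence[Sequence[str]],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade hosts stage by stage, one host at a time.

    Stops at the first host that fails its health check and returns the
    hosts upgraded successfully so far, so the caller can decide on rollback.
    """
    upgraded: List[str] = []
    for stage in stages:
        for host in stage:
            upgrade(host)
            if not is_healthy(host):
                return upgraded
            upgraded.append(host)
    return upgraded


# Hypothetical hosts: masters first, then the meta host, then business hosts.
stages = [["master-1", "master-2"], ["meta-rs-1"], ["biz-rs-1", "biz-rs-2"]]
attempted: List[str] = []
result = rolling_upgrade(stages, attempted.append, lambda h: h != "biz-rs-1")
print(result)  # prints ['master-1', 'master-2', 'meta-rs-1']
```

Stopping at the first unhealthy host keeps the blast radius to one machine, which matters because a rolling upgrade runs against the live cluster.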

Critical incident encountered:

During a region split, a RegionServer (RS) crash caused the master’s rollback procedure to delete both the parent and child regions, leading to data loss. The issue was fixed upstream in HBASE‑23693.
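When a parent region and its children are all deleted, the table is left with a gap in its key space. The sketch below shows one way to detect such gaps from a list of region boundaries. It is not the HBASE‑23693 fix, only a hypothetical consistency check, and it assumes regions do not overlap, that start keys are inclusive and end keys exclusive, and that an empty key means an unbounded edge.

```python
from typing import List, Tuple

# (start_key inclusive, end_key exclusive); b"" marks an unbounded edge.
Region = Tuple[bytes, bytes]


def find_holes(regions: List[Region]) -> List[Region]:
    """Return key ranges not covered by any region (assumes no overlaps)."""
    holes: List[Region] = []
    expected_start = b""
    for start, end in sorted(regions, key=lambda r: r[0]):
        if start != expected_start:
            # Nothing covers the range between the previous end and this start.
            holes.append((expected_start, start))
        if end == b"":
            return holes  # reached the last region of the table
        expected_start = end
    # No region extends to the end of the key space.
    holes.append((expected_start, b""))
    return holes


healthy = [(b"", b"g"), (b"g", b"n"), (b"n", b"")]
damaged = [(b"", b"g"), (b"n", b"")]  # the region covering [g, n) is gone
print(find_holes(healthy))  # prints []
print(find_holes(damaged))  # prints [(b'g', b'n')]
```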

Other notable patches merged during the upgrade:

HBASE‑22620 – replication znode backlog fix

HBASE‑21964 – throttle‑type quota removal

HBASE‑24401 – fix append failure when hbase.server.keyvalue.maxsize=0

HBASE‑24184 – snapshot ACL fix for simple ACL

HBASE‑24485 – off‑heap memory init optimization

HBASE‑24501 – remove unnecessary lock in ByteBufferArray

HBASE‑24453 – add validation when moving table groups

Summary: The upgrade, spanning nearly a year from planning to completion, successfully aligned Didi’s HBase clusters with the community release, bringing improved stability, usability, and new features. The experience yielded a systematic upgrade methodology that can be reused for future version migrations, delivering greater value to the business.
