
Design and Evolution of WTable’s Scaling Process Using RocksDB

This article explains how the WTable distributed key‑value store leverages RocksDB’s LSM‑tree architecture and slot‑based data distribution to redesign its scaling workflow, separating full and incremental data migration to reduce compaction overhead and achieve high‑speed, low‑impact cluster expansion.

58 Tech

RocksDB, an open‑source high‑performance key‑value storage engine from Facebook, is widely used in industry and serves as the storage layer of WTable, 58's internally developed distributed KV system.

WTable achieves scalability by partitioning data into multiple slots, each being the smallest unit of migration. A key's slot is computed by hashing ( SlotId = Hash(Key) mod N, where N is the total number of slots ), and the slot ID is prefixed to the physical key, which ensures that keys belonging to the same slot are stored contiguously and that slots remain ordered within RocksDB's SST files.
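The slot mapping and key encoding can be sketched as follows. This is an illustrative model only: the CRC32 hash, the slot count of 1024, and the 2‑byte prefix width are assumptions, not WTable's actual parameters.

```python
import binascii

N_SLOTS = 1024  # assumed slot count; the article does not give a value for N

def slot_id(key: bytes) -> int:
    # SlotId = Hash(Key) mod N; CRC32 stands in for WTable's actual hash.
    return binascii.crc32(key) % N_SLOTS

def physical_key(key: bytes) -> bytes:
    # Prefix the 2-byte big-endian slot ID so that keys of the same slot
    # sort contiguously under RocksDB's default bytewise comparator.
    return slot_id(key).to_bytes(2, "big") + key

# Because the prefix is the most significant part of the physical key,
# sorting physical keys groups each slot into one contiguous run.
keys = [b"user:1", b"user:2", b"order:9"]
ordered = sorted(physical_key(k) for k in keys)
```

Sorting by physical key is exactly what RocksDB's SST files do, which is why a whole slot can be read or migrated as one contiguous key range.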

The initial scaling approach added new nodes, created a migration plan, and migrated each slot in three phases: full data transfer using an iterator over a RocksDB snapshot, incremental transfer of writes recorded in the binlog, and finally switching routing information in etcd so that the new node serves the slot.
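The original per‑slot flow can be sketched with in‑memory stand‑ins. The `Source`/`Target` classes and the shape of the binlog here are illustrative assumptions, not WTable's real components or API.

```python
class Source:
    """In-memory stand-in for the source node's view of one slot."""
    def __init__(self, data, binlog):
        self.data = dict(data)      # slot contents at snapshot time
        self.binlog = list(binlog)  # writes recorded after the snapshot

class Target:
    """In-memory stand-in for the new node receiving the slot."""
    def __init__(self):
        self.data = {}
    def put(self, k, v):
        self.data[k] = v

def migrate_slot(slot, source, target, routes):
    # Phase 1: full transfer — iterate a consistent snapshot of the slot.
    for k, v in source.data.items():
        target.put(k, v)
    # Phase 2: incremental transfer — replay binlog writes made during phase 1.
    for k, v in source.binlog:
        target.put(k, v)
    # Phase 3: switch routing (etcd in WTable) so the new node serves the slot.
    routes[slot] = "new-node"

routes = {7: "old-node"}
src = Source({b"a": b"1"}, [(b"b", b"2")])
dst = Target()
migrate_slot(7, src, dst, routes)
```

After the call, the target holds both the snapshot data and the binlog writes, and routing for slot 7 points at the new node.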

This method caused problems in production: when migration speed was high, the mixture of ordered full‑data writes and unordered incremental writes triggered heavy compaction across levels, consuming I/O and leading to write stalls, which degraded service performance.
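A toy model illustrates why the interleaving hurts: summarize each flushed memtable as a (min key, max key) range. Ordered full‑data writes produce files with disjoint ranges, while interleaved incremental writes revisit earlier keys and produce overlapping ranges that compaction must merge. The memtable size and key format below are arbitrary.

```python
def flush_ranges(write_stream, memtable_size):
    # Split the write stream into flushed "files" and record each file's
    # (min_key, max_key) range, as an SST file's metadata would.
    ranges = []
    for i in range(0, len(write_stream), memtable_size):
        batch = sorted(write_stream[i:i + memtable_size])
        ranges.append((batch[0], batch[-1]))
    return ranges

def has_overlap(ranges):
    # Overlapping file ranges force merge compaction; disjoint ranges
    # can be moved between levels without rewriting data.
    rs = sorted(ranges)
    return any(rs[i][1] >= rs[i + 1][0] for i in range(len(rs) - 1))

# Ordered full-data writes: flushed files cover disjoint, increasing ranges.
ordered = [f"{k:04d}" for k in range(12)]
# Interleaved incremental writes revisit earlier keys out of order.
interleaved = ordered[:6] + ["0002", "0009", "0001"] + ordered[6:]

assert not has_overlap(flush_ranges(ordered, 4))
assert has_overlap(flush_ranges(interleaved, 4))
```

In the real system the merging happens across LSM levels and consumes disk I/O, which is what surfaced as write stalls under fast migration.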

To address these issues, the process was re‑engineered: the full data of every slot is transferred first, without interleaved incremental writes, so that compaction reduces to simple file moves; incremental data is then sent at a slower pace; and routing changes are applied only after all slots have completed migration. This yields three clear stages: full data migration, incremental data migration, and a unified routing switch.
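The redesigned cluster‑wide flow can be sketched the same way. The `FakeSource` class is an in‑memory stand‑in, and the real system would rate‑limit stage 2 rather than replay it at full speed; both are assumptions of this sketch.

```python
class FakeSource:
    """In-memory stand-in: per-slot full data plus a binlog of later writes."""
    def __init__(self, full, logs):
        self._full, self._logs = full, logs
    def full_data(self, slot):
        return self._full.get(slot, {}).items()
    def binlog(self, slot):
        return self._logs.get(slot, [])

def migrate_cluster(slots, source, target, routes):
    # Stage 1: full data for every slot, with no interleaved incremental
    # writes, so flushed files stay ordered and compaction degrades to moves.
    for slot in slots:
        for k, v in source.full_data(slot):
            target[k] = v
    # Stage 2: incremental binlog replay, sent afterward at a slower pace
    # (real code would throttle here; omitted in this sketch).
    for slot in slots:
        for k, v in source.binlog(slot):
            target[k] = v
    # Stage 3: unified routing switch, only after every slot has migrated.
    for slot in slots:
        routes[slot] = "new-node"

src = FakeSource({1: {b"a": b"1"}, 2: {b"b": b"2"}},
                 {1: [(b"a", b"9")]})
target, routes = {}, {1: "old", 2: "old"}
migrate_cluster([1, 2], src, target, routes)
```

Compared with the per‑slot flow, no incremental write lands on the target until all full data is in place, which is what keeps the target's write stream ordered during the heavy stage.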

After these improvements, WTable can expand at 50–100 MB/s without impacting online services. Alternative approaches such as transferring SST files via IngestExternalFile were evaluated but rejected because they would break the existing binlog‑based replication mechanism and require extensive additional work.

Tags: Data Migration, Compaction, Distributed Storage, Scaling, RocksDB, WTable
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
