
Design and Migration Strategies for the WLock Distributed Lock Service

This article presents the architecture of WLock, a Paxos‑based distributed lock service, analyzes key isolation schemes, and evaluates cluster expansion and splitting. It then details a multi‑step key migration process, covering forward and reverse migration, node scaling, and consistency safeguards, that achieves highly available, isolated lock handling in multi‑tenant environments.

58 Tech

WLock is a distributed lock service built on the open‑source WPaxos consensus algorithm. It offers rich lock types, flexible operations, high reliability, high throughput, multi‑tenant support, and ease of use, making it suitable for coordinating access to shared resources and for master election in distributed systems.

To isolate tenant keys without increasing per‑key resource contention, two isolation schemes were considered: assigning each key a dedicated Paxos group (which still shares node resources) and hashing keys across all Paxos groups with distributed rate limiting (the chosen approach). However, when a key’s traffic grows beyond its allocated quota, further scaling is needed.
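The chosen scheme can be sketched as a simple hash of each tenant key onto a shared pool of Paxos groups. This is an illustrative sketch, not WLock's actual implementation; the group count, quota value, and function names are assumptions.

```python
import hashlib

# Illustrative values only; WLock's real configuration may differ.
NUM_GROUPS = 16
PER_KEY_QPS_QUOTA = 5000  # hypothetical distributed rate-limit quota per key

def group_for_key(key: str, num_groups: int = NUM_GROUPS) -> int:
    """Map a tenant key to a Paxos group index (deterministic hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_groups
```

Because the mapping is deterministic, all clients route the same key to the same group, while different keys spread across all groups; the per‑key quota is then enforced by distributed rate limiting rather than by dedicating a group to each key.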

Simply adding nodes to the cluster (expanding from three to five nodes) increases the number of nodes required for a Paxos Propose quorum, which can cause higher latency and does not improve throughput. Therefore, node‑level scaling alone cannot solve high‑concurrency key issues.
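The quorum arithmetic behind this observation is straightforward; a minimal sketch:

```python
def propose_quorum(n: int) -> int:
    """Majority quorum for a Paxos group with n nodes."""
    return n // 2 + 1

# Growing a group from 3 to 5 nodes raises the Propose quorum from 2 to 3:
# each proposal now waits on one more acceptor's reply, so latency rises
# while per-group throughput does not improve.
```

This is why adding acceptors to the same Paxos group makes proposals slower, not faster.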

Instead, the service splits the cluster: the original three‑node cluster is expanded to six nodes while keeping each Paxos group at three nodes, then the three newly added nodes are separated into a new cluster. This reduces the number of groups per node and improves per‑group processing capacity without increasing the quorum size.
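A toy layout illustrates the effect of the split; node names and the group count are hypothetical.

```python
# Before: one 3-node cluster hosts all 4 groups -> 4 groups per node.
before = {g: {"n1", "n2", "n3"} for g in range(4)}

# After expansion and split: groups 0-1 stay on the original nodes, groups
# 2-3 move to the 3 new nodes -> 2 groups per node, quorum still 2 of 3.
after = {0: {"n1", "n2", "n3"}, 1: {"n1", "n2", "n3"},
         2: {"n4", "n5", "n6"}, 3: {"n4", "n5", "n6"}}

def groups_per_node(layout: dict, node: str) -> int:
    """Count how many Paxos groups a node participates in."""
    return sum(node in members for members in layout.values())
```

Each node now serves half as many groups, so per‑group capacity roughly doubles, while every group keeps its 3‑node membership and its 2‑of‑3 quorum.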

Key migration is performed by creating redundant Paxos groups that mirror original groups. When a key’s load spikes, it is migrated to a redundant group, and the original group’s master records the maximum InstanceId to ensure ordering. The migration follows six steps: initialization, migration preparation, migration start, entering a safe state, client configuration change, and migration completion.
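The six steps above form a linear state machine; the enum and helper below are an illustrative sketch, not WLock's actual API.

```python
from enum import Enum

class MigrationStep(Enum):
    INIT = 1
    PREPARE = 2
    START = 3
    SAFE = 4           # original and redundant groups held consistent
    CLIENT_SWITCH = 5  # clients pushed the new group configuration
    DONE = 6

def advance(step: MigrationStep) -> MigrationStep:
    """Move to the next step of a forward migration."""
    return MigrationStep(step.value + 1)
```

Modeling the migration as an explicit state machine makes it clear which phases can be rolled back (see below) and where a crashed coordinator must resume.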

Critical points include keeping the master node consistent between original and redundant groups, ensuring lock operation continuity by checking InstanceId ordering, and handling lock version increments by splitting the 64‑bit InstanceId into a 16‑bit group‑change counter and a 48‑bit sequence.
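The 16/48 split of the InstanceId can be sketched with plain bit operations (function names are illustrative):

```python
GROUP_BITS = 16
SEQ_BITS = 48
SEQ_MASK = (1 << SEQ_BITS) - 1

def pack_instance_id(group_epoch: int, seq: int) -> int:
    """64-bit InstanceId: high 16 bits hold the group-change counter,
    low 48 bits hold the per-group sequence."""
    assert 0 <= group_epoch < (1 << GROUP_BITS) and 0 <= seq <= SEQ_MASK
    return (group_epoch << SEQ_BITS) | seq

def unpack_instance_id(iid: int) -> tuple[int, int]:
    return iid >> SEQ_BITS, iid & SEQ_MASK
```

Incrementing the group‑change counter on each migration guarantees that every InstanceId issued by the new group compares greater than any issued by the old one, even though the 48‑bit sequence restarts, so lock versions remain monotonic across migrations.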

Both forward and reverse migrations are supported, with rollback mechanisms for preparation, start, and safe‑state phases, and idempotent retry logic to maintain consistency even under node failures.
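The idempotent-retry idea can be reduced to a small helper: record completed steps durably, and make re-execution of a completed step a no-op. This is a sketch under the assumption of a persistent key-value progress store; the names are hypothetical.

```python
def run_step(progress: dict, step: str, action) -> bool:
    """Execute a migration step idempotently: a retry after a crash that
    already completed the step becomes a no-op. Returns True iff the
    action actually ran."""
    if progress.get(step) == "done":
        return False
    action()
    progress[step] = "done"
    return True
```

With this pattern, a coordinator that fails mid‑migration can simply replay the step sequence from the beginning; already‑applied steps are skipped and the migration converges to the same end state.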

After migration, node changes and cluster splitting are performed: new nodes are added, old nodes are removed one‑by‑one to avoid quorum reduction, and the redundant group’s master election is managed to minimize master drift.
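The one-by-one replacement can be sketched as an add-then-remove loop, so the replica count never dips below the group size; node names are illustrative.

```python
def replace_members(members: set, old_nodes: list, new_nodes: list):
    """Swap replicas one pair at a time: add a new node before removing an
    old one, so the group never shrinks below its original size and the
    quorum is never endangered."""
    sizes = []
    for new, old in zip(new_nodes, old_nodes):
        members.add(new)       # brief overlap: one extra live replica
        members.discard(old)   # back to the original size
        sizes.append(len(members))
    return members, sizes
```

Removing all old nodes at once would momentarily shrink the group and risk losing the majority; interleaving adds and removes keeps a 2‑of‑3 quorum reachable throughout.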

The overall process—key migration, client reconnection, node scaling, cluster splitting, and reverse migration—achieves isolated, high‑throughput lock handling while preserving data consistency and providing disaster recovery capabilities.

Tags: multi-tenant, Distributed Lock, consistency, Cluster Scaling, Paxos, Key Migration, WLock
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.