
Design and Implementation of Distributed Lock Services: Redis, ZooKeeper, and SharkLock

This article explains the principles, requirements, and implementation details of distributed lock services, comparing Redis and ZooKeeper approaches, and introduces SharkLock's design built on SharkStore with Raft-based replication, covering lock acquisition, release, reliability, scaling, and failover mechanisms.

JD Tech

Distributed Lock Overview

A distributed lock is a mechanism for controlling exclusive access to shared resources in a distributed system. A correct implementation ensures that only one thread holds the lock at a time, that the lock is re‑entrant, avoids deadlocks, provides blocking semantics, and maintains high performance and availability.

Requirements of a Distributed Lock Service

Only one thread can hold the lock at any moment.

The lock must be re‑entrant.

Deadlocks must not occur.

The lock should support blocking waits, and blocked waiters should be woken up promptly.

The service must guarantee high performance and high availability.

Redis‑Based Lock Service

Lock acquisition process

SET resource_name my_random_value NX PX max-lock-time

Explanation: with the NX option, the SET command succeeds only when the key does not exist, guaranteeing a single lock holder; the PX expiration time prevents deadlocks if the holder crashes; and the random holder value is recorded so that ownership can be verified during unlock.
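As a sketch of these acquisition semantics, the following Python snippet simulates the SET NX PX behaviour against a toy in‑memory store (the `MiniStore` class and `acquire_lock` helper are illustrative stand‑ins, not a real Redis client):

```python
import time
import uuid

class MiniStore:
    """Toy in-memory stand-in for Redis, just enough to show SET NX PX semantics."""
    def __init__(self):
        self._data = {}  # key -> (value, expire_at)

    def set_nx_px(self, key, value, ttl_ms):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and has not expired: NX fails
        self._data[key] = (value, now + ttl_ms / 1000.0)
        return True

def acquire_lock(store, resource, ttl_ms=10_000):
    """Try to take the lock; return the random holder token on success, else None."""
    token = uuid.uuid4().hex  # my_random_value: proves ownership at unlock time
    if store.set_nx_px(resource, token, ttl_ms):
        return token
    return None
```

A second caller attempting to acquire the same resource before the TTL elapses gets None back, which is exactly the mutual-exclusion guarantee NX provides.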

Unlock process

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

The unlock logic is wrapped in a Lua script so that the check‑and‑delete operation is atomic; without this, the lock could expire and be acquired by another client between the GET and the DEL, and the DEL would then remove a lock that is no longer ours.

Problems with this scheme:

Choosing an appropriate expiration time is difficult; a short TTL may cause premature expiration, while a long TTL can lead to long‑lasting invalid waits if the holder crashes.

Redis’s asynchronous master‑slave replication may lose lock data: if the master crashes before the lock record is replicated, the promoted slave will not have it, so another client can acquire the same lock, causing concurrent access.

ZooKeeper‑Based Lock Service

Lock acquisition process

Create an ephemeral sequential node under /resource_name.

List all child nodes of /resource_name and check whether the created node has the smallest sequence number. If it is the smallest, the lock is acquired; otherwise, watch the predecessor node.

ZooKeeper’s ZAB consensus protocol guarantees the safety of lock data, and the heartbeat mechanism prevents deadlocks by automatically releasing locks when a client disappears.

Unlock process

Delete the temporary node created by the client.

This approach has its own issue: if a client’s heartbeat stops (e.g., due to a network failure), the server deletes the ephemeral node and releases the lock, and another client acquires it. When the original client’s connection later recovers, it may still believe it holds the lock, leading to simultaneous access.

SharkLock Design Choices

Lock metadata includes:

lockBy: unique client identifier.

condition: client‑provided policy for server‑side handling of abnormal situations.

lockTime: timestamp of lock acquisition.

txID: globally incrementing transaction ID.

lease: lease information for automatic expiration.
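The metadata above could be modelled as a simple record; the field names come from the article, while the concrete types and units here are assumptions:

```python
from dataclasses import dataclass

@dataclass
class LockMeta:
    """Sketch of SharkLock per-lock metadata (types and units are assumed)."""
    lockBy: str     # unique client identifier
    condition: str  # client-provided policy for handling abnormal situations
    lockTime: int   # timestamp of lock acquisition, e.g. in milliseconds
    txID: int       # globally incrementing transaction ID
    lease: int      # lease duration for automatic expiration, e.g. in milliseconds
```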

Ensuring Lock Reliability

SharkLock stores lock data in SharkStore, a distributed persistent key‑value system that uses multiple replicas for safety and the Raft algorithm for strong consistency across replicas.

SharkStore Core Modules

Master Server: manages metadata, sharding, scaling, and failover scheduling.

Data Server: provides RPC access to stored KV data.

Gateway Server: handles client entry.

Sharding, Scaling, and Failover

SharkStore replicates data across multiple nodes; when a shard reaches a size threshold it splits into two. Failover operates at the range level: leaders send heartbeats, missing heartbeats trigger master‑initiated failover, and new nodes are added via Raft replication.
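The size‑threshold split can be illustrated with a toy function; `maybe_split` and the midpoint cut are illustrative choices, not SharkStore’s actual split algorithm:

```python
def maybe_split(shard, max_bytes):
    """Toy range split: when a shard exceeds the size threshold, cut its
    sorted key space at the midpoint into two child ranges."""
    size = sum(len(k) + len(v) for k, v in shard.items())
    if size <= max_bytes:
        return [shard]
    keys = sorted(shard)
    mid = keys[len(keys) // 2]
    left = {k: v for k, v in shard.items() if k < mid}
    right = {k: v for k, v in shard.items() if k >= mid}
    return [left, right]
```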

Balance Strategy

Masters collect range‑level and node‑level heartbeat information to rebalance replicas, adding them to low‑traffic nodes and removing from high‑traffic nodes to keep the cluster balanced.

Raft Practices – MultiRaft

Heartbeat Merging: consolidate heartbeats per dataserver, sending only range IDs to reduce overhead.

Snapshot Management: use ACKs, limit concurrent snapshots, and apply rate‑limiting to minimise impact on normal traffic. Also adopt the PreVote algorithm to avoid unnecessary leader changes during network partitions.
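The heartbeat‑merging idea can be sketched as grouping per‑range heartbeats by destination node; `merge_heartbeats` and its input shape are illustrative assumptions:

```python
from collections import defaultdict

def merge_heartbeats(range_leaders):
    """Consolidate per-range Raft heartbeats into one message per dataserver,
    carrying only the range IDs (MultiRaft heartbeat merging, as a sketch).

    range_leaders: iterable of (range_id, dataserver) pairs."""
    per_node = defaultdict(list)
    for range_id, node in range_leaders:
        per_node[node].append(range_id)
    # one heartbeat per dataserver instead of one per range
    return {node: sorted(ids) for node, ids in per_node.items()}
```

With thousands of ranges per node, this turns O(ranges) heartbeat messages into O(dataservers).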

Raft Practices – NonVoter (Learner)

New members join as learners (non‑voting) to avoid increasing quorum size prematurely. Once a learner’s log lag falls below a configured threshold, the leader promotes it to a full voting member.
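The promotion condition reduces to a simple lag check; the function name and the use of log-index distance as the lag measure are assumptions for illustration:

```python
def should_promote(leader_log_index, learner_log_index, lag_threshold):
    """Promote a learner (non-voting member) to a full voter once its
    replication lag, measured in log entries, is within the threshold."""
    return leader_log_index - learner_log_index <= lag_threshold
```

Keeping the member as a learner until it catches up means quorum size never grows while a far-behind replica would slow down commits.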

Open‑Source Availability

SharkStore is open‑source; interested readers can explore the repository at https://github.com/tiglabs/sharkstore.

Written by

JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
