Redis Distributed Locks: Safety Issues, Redlock Debate, and Best Practices
This article thoroughly examines how Redis distributed locks work, the safety challenges they face—including deadlocks, lock expiration, and node failures—explores the Redlock algorithm and its controversies, compares Redis with Zookeeper implementations, and offers practical guidelines and best‑practice solutions for reliable distributed locking.
Distributed locks are needed when multiple processes must coordinate access to a shared resource, such as updating the same database row in a micro‑service architecture. Redis can provide this capability using the SETNX command to create a lock key only if it does not already exist.
Basic lock acquisition and release look like this:
127.0.0.1:6379> SETNX lock 1 // client 1 acquires lock (returns 1)
127.0.0.1:6379> SETNX lock 1 // client 2 fails to acquire lock (returns 0)
127.0.0.1:6379> DEL lock // client 1 releases lock
If a client crashes while holding the lock, the key is never deleted and the lock is held indefinitely. A common mitigation is to set an expiration time. Before Redis 2.6.12 this required two separate commands (SETNX and EXPIRE), which are not atomic: if the client crashes or the connection drops between them, the key has no TTL and the lock is never released, which leads to a deadlock.
Redis 2.6.12 introduced the extended SET syntax that combines creation and expiration atomically:
127.0.0.1:6379> SET lock 1 EX 10 NX // acquire lock with 10 s TTL
Even with a TTL, a client whose lock has already expired may delete a lock that another client has since acquired. To avoid this, the lock value should contain a unique identifier (e.g., a UUID) and the release operation must verify ownership before deleting the key.
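A minimal sketch of this acquire step from application code, assuming a redis-py style client whose set method accepts the nx and ex keyword arguments. The function name acquire_lock and its parameters are illustrative and do not come from the original article.

```python
import uuid
from typing import Any, Optional


def acquire_lock(client: Any, key: str, ttl_seconds: int = 10) -> Optional[str]:
    """Try once to take the lock.

    Returns the owner token on success, or None if the lock is already held.
    `client` is assumed to behave like a redis-py client.
    """
    # A fresh random value per acquisition identifies this particular owner.
    token = uuid.uuid4().hex
    # Equivalent to: SET key token NX EX ttl_seconds (a single atomic command).
    if client.set(key, token, nx=True, ex=ttl_seconds):
        return token
    return None
```

The caller keeps the returned token and presents it again on release, so that only the current owner can delete the key.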
A safe release can be performed with a Lua script that runs atomically on the Redis server:
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
For Java developers, the Redisson library implements this pattern and adds an automatic renewal (watchdog) thread that extends the TTL while the client still holds the lock, which reduces the need to tune the expiration time by hand.
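The ownership check and the renewal idea can be sketched outside Java as well. The example below assumes a redis-py style client with an eval(script, numkeys, *keys_and_args) method. The renewal script and the names release_lock and renew_lock are illustrative assumptions, not Redisson's actual implementation; a watchdog would call something like renew_lock periodically from a background thread.

```python
from typing import Any

# Delete the key only if it still holds our token (same script as above).
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""

# Hypothetical renewal script: reset the TTL only if we still own the lock.
RENEW_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("EXPIRE", KEYS[1], ARGV[2])
else
    return 0
end
"""


def release_lock(client: Any, key: str, token: str) -> bool:
    """Return True if the lock was ours and has been deleted."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1


def renew_lock(client: Any, key: str, token: str, ttl_seconds: int) -> bool:
    """Return True if the lock was ours and its TTL has been reset."""
    return client.eval(RENEW_SCRIPT, 1, key, token, ttl_seconds) == 1
```

Both operations run as a single script on the server, so the ownership check and the following command cannot be separated by another client's write.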
When Redis runs with master-replica replication, a failover can cause the lock to disappear: replication is asynchronous, so the master may crash before the lock key has reached the replica that gets promoted. To address this, the Redis author proposed the Redlock algorithm, which runs against several independent Redis master instances (five is the recommended number). The client records the start time, attempts to acquire the lock on every instance using the same key and unique value, and treats the lock as held only if a majority (at least 3 of 5) granted it and the total acquisition time is less than the lock's TTL. If either condition fails, the client releases the lock on all instances.
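The acquisition rule can be sketched as follows. This is a simplified illustration under stated assumptions, not a production Redlock client: each element of clients is assumed to be a redis-py style client already configured with a short connection timeout, the function name redlock_acquire and the injectable clock parameter are hypothetical, and the clock-drift allowance that the published algorithm subtracts from the validity time is omitted.

```python
import time
import uuid
from typing import Any, Callable, Optional, Sequence, Tuple

RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


def redlock_acquire(
    clients: Sequence[Any],
    key: str,
    ttl_ms: int,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Tuple[str, float]]:
    """Return (token, remaining validity in ms) on success, else None."""
    token = uuid.uuid4().hex
    start = clock()
    granted = 0
    for client in clients:
        try:
            # SET key token NX PX ttl_ms on each independent instance.
            if client.set(key, token, nx=True, px=ttl_ms):
                granted += 1
        except Exception:
            pass  # an unreachable instance simply does not count as a grant

    elapsed_ms = (clock() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms
    quorum = len(clients) // 2 + 1
    if granted >= quorum and validity_ms > 0:
        return token, validity_ms

    # No majority, or acquisition took too long: undo on every instance.
    for client in clients:
        try:
            client.eval(RELEASE_SCRIPT, 1, key, token)
        except Exception:
            pass
    return None
```

The returned validity time, not the original TTL, is what the caller has left to do its work, because the TTL on the first instance started counting before the last instance answered.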
Redlock's safety has been hotly debated. Martin Kleppmann (a distributed-systems researcher at Cambridge) argues that Redlock cannot guarantee correctness because it depends on timing assumptions that break under network delays, process pauses, and clock drift (the "NPC" problems). He proposes a fencing-token approach: the lock service issues a monotonically increasing token with each grant, the client sends that token with every write, and the storage layer rejects any write whose token is older than one it has already seen, so stale operations are rejected.
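A small in-memory sketch of the fencing-token idea. The classes LockService and FencedStore are hypothetical stand-ins for a real lock service and a real storage system; the point is only the comparison inside write.

```python
from typing import Optional


class LockService:
    """Hypothetical lock service that hands out an increasing token per grant."""

    def __init__(self) -> None:
        self._counter = 0

    def grant(self) -> int:
        self._counter += 1
        return self._counter


class FencedStore:
    """Hypothetical storage layer that remembers the highest token it has seen."""

    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.highest_token = 0

    def write(self, token: int, value: str) -> bool:
        # A token lower than one already seen belongs to an expired lock holder.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.grant()  # client A gets the lock, then stalls (e.g. a long GC pause)
token_b = locks.grant()  # A's lock expires; client B gets the lock with a higher token

accepted_b = store.write(token_b, "written by B")
accepted_a = store.write(token_a, "written by A")  # A wakes up and writes late
```

In this run the write from B is accepted and the late write from A is rejected, so the stored value stays "written by B" even though A still believed it held the lock.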
Antirez (Salvatore Sanfilippo, the creator of Redis) counters that Redlock only needs loosely synchronized clocks, and that the elapsed-time check made after the acquisition attempts (step 3 of the algorithm) already detects excessive acquisition latency. He also notes that any lock service (including Zookeeper) suffers from the same post-acquisition failure scenarios, so the criticism is not unique to Redlock.
Zookeeper implements locks via ephemeral znodes. A client tries to create an EPHEMERAL node; if the create succeeds, it holds the lock. The node is removed automatically when the client's session expires (e.g., due to missed heartbeats). This avoids the need for a TTL, but it also means that a long GC pause can expire the session, release the lock, and let another client acquire it while the first client still believes it is the owner, which is the same safety gap as with Redis.
In practice, the article recommends using Redis locks for performance‑critical sections while accepting that extreme edge cases may still cause lock loss. For strong correctness guarantees, combine a distributed lock with application‑level fencing (e.g., UUID verification or token‑based checks) or use a consensus system such as Zookeeper when strict ordering is required.
Overall, the article provides a comprehensive view of Redis‑based distributed locking, the Redlock controversy, and practical guidance for building reliable concurrency control in backend systems.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java