Why Redis Distributed Locks Still Let Flash Sales Oversell—and How to Fix It
This article analyzes a real‑world flash‑sale overselling incident caused by unsafe Redis distributed locks, explains the root causes such as lock expiration and non‑atomic stock checks, and presents safer lock implementations and atomic stock decrement strategies to prevent future overselling.
Preface
Using Redis for distributed locks is common. This article shares a real incident from a project in which a flash sale of a scarce product oversold badly, along with the post-mortem and the fix.
Incident Scene
A flash sale of 100 bottles of a rare product sold more bottles than were in stock, even though the handler was guarded by a distributed lock. The core code (simplified) is shown below.
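<code>public SeckillActivityRequestVO seckillHandle(SeckillActivityRequestVO request) {
    SeckillActivityRequestVO response = null;
    String key = "key:" + request.getSeckillId();
    try {
        // Flaw 1: fixed 10 s TTL, and every request stores the same value "val",
        // so a slow request can outlive its lock and owners cannot be told apart
        Boolean lockFlag = redisTemplate.opsForValue().setIfAbsent(key, "val", 10, TimeUnit.SECONDS);
        if (Boolean.TRUE.equals(lockFlag)) {
            // user validation, activity validation (remote calls, may be slow)
            Object stock = redisTemplate.opsForHash().get(key + ":info", "stock");
            assert stock != null;
            // Flaw 2: read, compare, then decrement is not atomic
            if (Integer.parseInt(stock.toString()) <= 0) {
                // business exception
            } else {
                redisTemplate.opsForHash().increment(key + ":info", "stock", -1);
                // generate order, publish event, build response
            }
        }
    } finally {
        // release lock
        // Flaw 3: unconditional delete; it runs even if this request never got the lock,
        // or if its lock expired and now belongs to another request
        stringRedisTemplate.delete(key);
    }
    return response;
}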
</code>The code sets a 10‑second lock and checks stock, appearing safe at first glance.
Root Causes
Under heavy user-validation load, the user-service gateway slowed down, and some requests took longer than the 10-second lock TTL. Their locks expired while they were still processing, so other requests acquired the same lock. When the original requests finished, the unconditional delete in the finally block removed locks that now belonged to other requests, letting even more requests in at once. Because the stock check was also a non-atomic "get and compare" followed by a separate decrement, requests running concurrently could all pass the check, which led directly to overselling.
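To make that timeline concrete, here is a small Java sketch that replays it against an in-memory stand-in for Redis. FakeLockStore and LockExpiryDemo are hypothetical helpers, not part of the project: a logical clock in seconds replaces real time, and setIfAbsent only mimics the semantics of SET key value NX EX (succeed if the key is absent or expired).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for Redis SET NX EX and DEL, driven by a logical clock (seconds). */
class FakeLockStore {
    private static final class Entry {
        final String token;
        final long expiresAt;

        Entry(String token, long expiresAt) {
            this.token = token;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    /** Mimics SET key token NX EX ttl: succeeds only if the key is absent or already expired. */
    boolean setIfAbsent(String key, String token, long ttl, long now) {
        Entry e = data.get(key);
        if (e != null && e.expiresAt > now) {
            return false; // someone still holds an unexpired lock
        }
        data.put(key, new Entry(token, now + ttl));
        return true;
    }

    /** Mimics DEL key: removes the lock no matter who holds it, like the incident's unlock. */
    void delete(String key) {
        data.remove(key);
    }

    /** Token of the current holder, or null if the key is absent or expired. */
    String holder(String key, long now) {
        Entry e = data.get(key);
        return (e == null || e.expiresAt <= now) ? null : e.token;
    }
}

class LockExpiryDemo {
    public static void main(String[] args) {
        FakeLockStore redis = new FakeLockStore();
        String key = "key:42";

        // t=0: request A locks for 10 s, then stalls on the slow user-service gateway
        System.out.println("A acquires at t=0:  " + redis.setIfAbsent(key, "A", 10, 0));
        // t=11: A's lock has expired, so request B acquires the same key
        System.out.println("B acquires at t=11: " + redis.setIfAbsent(key, "B", 10, 11));
        // t=12: A finishes and runs its unconditional delete, removing B's lock
        redis.delete(key);
        // t=13: C acquires while B is still inside the critical section
        System.out.println("C acquires at t=13: " + redis.setIfAbsent(key, "C", 10, 13));
        System.out.println("holder at t=13: " + redis.holder(key, 13));
    }
}
```

All three acquisitions succeed, so at t=13 both B and C believe they own the lock. That overlap is exactly the window in which the non-atomic stock check oversells.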
Analysis
No system-level fault tolerance: when the user service was overloaded, nothing stopped its slow gateway from stretching requests past the lock TTL, so a downstream slowdown became the trigger for overselling.
The distributed lock is not truly safe: even with SET key value [EX seconds] NX, if a thread holds the lock longer than its TTL, another thread can acquire it, and the original thread may later delete the new lock.
Non-atomic stock verification: concurrent requests can all read the same stock value before any of them decrements it, so each one passes the check and each one deducts.
The fundamental issue is that stock verification relied heavily on the lock’s correctness; when the lock fails, the verification becomes ineffective.
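The stale-read problem can be replayed without any threads. The Java sketch below is illustrative only: a local variable stands in for the Redis hash field, and both reads happen before either write, which is the interleaving that becomes possible once two requests are inside the critical section together.

```java
/** Illustrative replay of the stale-read interleaving; a local int stands in for the Redis hash field. */
class CheckThenActReplay {
    /** Returns {ordersCreated, finalStock} when two requests both read before either writes. */
    static int[] replay(int initialStock) {
        int stock = initialStock;
        int orders = 0;

        int seenByA = stock; // request A: HGET stock
        int seenByB = stock; // request B: HGET stock, before A has written anything

        if (seenByA > 0) {   // A passes the check
            stock -= 1;      // A: HINCRBY stock -1
            orders++;        // A: creates an order
        }
        if (seenByB > 0) {   // B passes the check on its stale read
            stock -= 1;
            orders++;
        }
        return new int[] {orders, stock};
    }

    public static void main(String[] args) {
        int[] result = replay(1);
        System.out.println("orders=" + result[0] + ", stock=" + result[1]);
    }
}
```

With one unit in stock, replay(1) returns {2, -1}: two orders for one bottle.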
Solutions
Implement a Safer Distributed Lock
A safer lock makes every DEL match the SET that created it: each request stores a unique value (a token) in the lock and deletes the lock only if the stored value still equals its own token. The compare and the delete must run as one atomic step, which a Lua script provides:
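<code>public boolean safedUnLock(String key, String val) {
    // Atomically: delete the key only if it still stores this caller's token
    String luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then "
            + "return redis.call('del', KEYS[1]) else return 0 end";
    RedisScript<Long> redisScript = RedisScript.of(luaScript, Long.class);
    // Use the same StringRedisTemplate that set the lock, so key and token bytes match
    Long deleted = stringRedisTemplate.execute(redisScript, Collections.singletonList(key), val);
    return Long.valueOf(1L).equals(deleted);
}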
</code>Implement Safe Stock Decrement
For a single-item purchase, Redis's atomic HINCRBY can decrement the stock safely without an extra Lua script:
<code>// HINCRBY decrements and returns the new value in one atomic command
Long currStock = stringRedisTemplate.opsForHash().increment(key + ":info", "stock", -1);
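</code>The separate read-and-compare step is no longer needed: because HINCRBY changes the value and returns the result in one command, it is enough to reject any request whose returned stock is negative.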
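As an in-process analogy (not Redis code), AtomicLong#decrementAndGet behaves like HINCRBY key stock -1: it changes the value and returns the new one in a single step. The sketch below, whose class and method names are hypothetical, fires many concurrent purchases at a fixed stock and counts how many succeed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Decrement-then-check; AtomicLong#decrementAndGet stands in for HINCRBY key stock -1. */
class AtomicStockDemo {
    /** Runs `buyers` concurrent purchases against `initialStock` and returns how many succeeded. */
    static int simulate(long initialStock, int buyers) {
        AtomicLong stock = new AtomicLong(initialStock);
        AtomicInteger sold = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < buyers; i++) {
            pool.execute(() -> {
                long remaining = stock.decrementAndGet(); // atomic, like HINCRBY -1
                if (remaining >= 0) {
                    sold.incrementAndGet();               // this buyer got a unit
                }
                // remaining < 0: rejected; the counter goes negative but no order is created
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("purchases did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for purchases", e);
        }
        return sold.get();
    }

    public static void main(String[] args) {
        System.out.println("sold = " + simulate(100, 1_000));
    }
}
```

Because each decrement returns a distinct value, exactly initialStock callers see a result of zero or more regardless of thread scheduling, so simulate(100, 1_000) returns 100. The counter itself ends up negative, which is harmless as long as negative results never create an order.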
Refactored Code
A new DistributedLocker class handles locking, and the business logic uses the atomic decrement directly:
<code>public SeckillActivityRequestVO seckillHandle(SeckillActivityRequestVO request) {
    SeckillActivityRequestVO response = null;
    String key = "key:" + request.getSeckillId();
    // Unique token per request, so the unlock can tell this lock apart from others
    String val = UUID.randomUUID().toString();
    try {
        Boolean lockFlag = distributedLocker.lock(key, val, 10, TimeUnit.SECONDS);
        if (!Boolean.TRUE.equals(lockFlag)) {
            // throw a business exception: lock not acquired, reject the request early
        }
        // user validation omitted for brevity
        // Atomic decrement; the returned value decides whether this request gets a unit
        Long currStock = stringRedisTemplate.opsForHash().increment(key + ":info", "stock", -1);
        if (currStock < 0) {
            log.error("[Flash sale] No stock");
            // business exception
        } else {
            // generate order, publish event, build response
        }
    } finally {
        // Deletes the lock only if it still holds this request's token
        distributedLocker.safedUnLock(key, val);
    }
    return response;
}
</code>Deep Thinking
Is a Distributed Lock Necessary?
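Even with an atomic stock decrement, a lock can reduce pressure on downstream services: requests that fail to acquire it are rejected early, before they call user validation and other services. However, it adds a Redis round trip per request and extra complexity, so the trade-off must be evaluated.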
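A rough in-process sketch of that short-circuit effect follows. ConcurrentHashMap#putIfAbsent stands in for SET NX, TTLs are left out, and ShortCircuitGate with its work callback is a hypothetical name; the point is only that a request which cannot take the lock is turned away before it reaches the downstream service.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lock-as-filter sketch; putIfAbsent stands in for SET NX, TTLs are omitted. */
class ShortCircuitGate {
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();
    final AtomicInteger downstreamCalls = new AtomicInteger();

    /** Runs work only if this request obtains the lock; otherwise returns "BUSY" immediately. */
    String handle(String lockKey, String token, Runnable work) {
        if (locks.putIfAbsent(lockKey, token) != null) {
            return "BUSY"; // short-circuit: no downstream call is made
        }
        try {
            downstreamCalls.incrementAndGet(); // stands in for user validation and other remote calls
            work.run();
            return "PROCESSED";
        } finally {
            locks.remove(lockKey, token); // release only if we still own it
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ShortCircuitGate gate = new ShortCircuitGate();
        CountDownLatch aHoldsLock = new CountDownLatch(1);
        CountDownLatch bFinished = new CountDownLatch(1);

        // Request A takes the lock and stays busy until B has been handled
        Thread a = new Thread(() -> gate.handle("key:42", "A", () -> {
            aHoldsLock.countDown();
            try {
                bFinished.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        a.start();
        aHoldsLock.await();

        String b = gate.handle("key:42", "B", () -> { });
        bFinished.countDown();
        a.join();
        System.out.println("B -> " + b + ", downstream calls = " + gate.downstreamCalls.get());
    }
}
```

In main, B arrives while A still holds the lock, so B gets BUSY and the downstream counter stays at 1. Without the gate, both requests would have reached user validation before the stock decrement rejected one of them.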
Lock Selection
RedLock, which acquires the lock on a majority of independent Redis masters, offers higher reliability at the cost of performance; for this scenario, the simpler single-instance lock with the Lua-based safe unlock is sufficient.
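For context on that trade-off, RedLock as described in the Redis documentation tries the lock on N independent masters and treats it as acquired only if a majority granted it and enough of the TTL remains after the time spent acquiring. The sketch below captures just that decision rule; node I/O is omitted, the names are hypothetical, and the drift allowance used in main (1% of the TTL plus 2 ms) is an assumed value, similar to what some client implementations use.

```java
/** Hypothetical sketch of RedLock's acquisition rule; talking to the nodes is left out. */
class RedLockRule {
    /**
     * Returns the remaining lock validity in ms, or -1 if the attempt failed
     * (in which case the client should release the lock on every node).
     */
    static long validityMillis(int granted, int nodes, long ttlMs, long elapsedMs, long driftMs) {
        int quorum = nodes / 2 + 1;                  // majority, e.g. 3 of 5
        long validity = ttlMs - elapsedMs - driftMs; // time left after acquiring, minus drift
        return (granted >= quorum && validity > 0) ? validity : -1;
    }

    public static void main(String[] args) {
        // 5 nodes, 10 s TTL, 40 ms spent contacting nodes, assumed drift 10_000 * 0.01 + 2 = 102 ms
        System.out.println(validityMillis(3, 5, 10_000, 40, 102)); // majority reached
        System.out.println(validityMillis(2, 5, 10_000, 40, 102)); // no majority: -1
    }
}
```

The round trips to every node, plus releasing all nodes when the quorum is missed, are where the extra latency comes from.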
Further Optimizations
After the fix, overselling stopped and performance improved slightly. Further gains are possible by splitting the stock across servers and routing each request to a server via consistent hashing, so that each server decrements its own share locally and Redis is no longer needed on this path.
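A minimal sketch of the routing part, assuming requests are routed by user ID and the 100 units are split evenly across four servers. The ring uses virtual nodes and CRC32 hashing; the server names, shard sizes and class name are illustrative choices, not the project's design.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.CRC32;

/** Minimal consistent-hash ring with virtual nodes; CRC32 and all names are illustrative choices. */
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server); // several points per server smooth the load
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i), server); // only remove points owned by this server
        }
    }

    /** First server clockwise from the key's hash, wrapping around to the start of the ring. */
    String route(String requestKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        String[] servers = {"app-1", "app-2", "app-3", "app-4"}; // hypothetical server names
        ConsistentHashRing ring = new ConsistentHashRing(100);
        Map<String, AtomicInteger> localStock = new HashMap<>();
        for (String s : servers) {
            ring.addServer(s);
            localStock.put(s, new AtomicInteger(25)); // assumed even split of 100 units
        }
        // Route by user ID, then decrement that server's local share (no Redis involved)
        String server = ring.route("user-1001");
        boolean bought = localStock.get(server).decrementAndGet() >= 0;
        System.out.println("user-1001 -> " + server + ", bought = " + bought);
    }
}
```

Removing a server only remaps the keys that were on it, which is the main reason to prefer consistent hashing over hash % N here. A trade-off of splitting stock this way is that one shard can sell out while others still hold units, so a real design would need some rebalancing or fallback.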
Conclusion
Overselling scarce items can cause severe business impact. This case shows that even seemingly safe code can become a fatal flaw under high concurrency. Thorough design, atomic operations, and proper fault tolerance are essential for reliable systems.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.