Investigation of Snowflake ID Generation Failures Caused by Clock Rollback and Mitigation Strategies
This article analyzes a production incident where duplicate IDs were generated by the Snowflake algorithm due to server clock rollback, explains the algorithm's structure and pitfalls, and presents several open‑source solutions such as Meituan Leaf, Baidu UidGenerator, and Redis for reliable distributed ID generation.
In a recent production incident, creating work orders and orders failed with a duplicate key error, indicating that IDs generated by the Snowflake algorithm were colliding despite its claim of uniqueness.
The Snowflake algorithm, originally open-sourced by Twitter, produces 64-bit IDs composed of a sign bit (always 0), a 41-bit millisecond timestamp, 10 bits for data-center and machine identifiers (5 bits each in Twitter's original layout), and a 12-bit sequence number. Its advantages are IDs that increase roughly with time and high throughput with no database dependence.
However, the algorithm relies heavily on the server's clock. If the clock moves backward (e.g., due to NTP synchronization or leap seconds), the current timestamp can become smaller than the previously recorded lastTimestamp. In the implementation involved in this incident, the generator then returned -1 instead of a valid ID, which ultimately produced duplicate keys.
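The bit layout described above can be sketched in a few lines of Java. This is only an illustration, not Twitter's or the incident's code: the class name `SnowflakeLayout`, the helper names, and the custom epoch (2020-01-01 UTC) are assumptions, and the 10 identifier bits are treated as a single worker field for brevity.

```java
final class SnowflakeLayout {
    // 1 sign bit + 41 timestamp bits + 10 worker bits + 12 sequence bits = 64 bits.
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS; // 22

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    // Hypothetical custom epoch (2020-01-01T00:00:00Z); the article does not state one.
    static final long EPOCH_MS = 1577836800000L;

    /** Packs the three fields into one positive 64-bit value. */
    static long compose(long timestampMs, long workerId, long sequence) {
        long elapsed = timestampMs - EPOCH_MS;
        if (elapsed < 0 || elapsed > MAX_TIMESTAMP) {
            throw new IllegalArgumentException("timestamp out of range: " + timestampMs);
        }
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        return (elapsed << TIMESTAMP_SHIFT) | (workerId << SEQUENCE_BITS) | sequence;
    }

    /** Recovers the wall-clock timestamp (ms) embedded in an ID. */
    static long timestampOf(long id) {
        return (id >>> TIMESTAMP_SHIFT) + EPOCH_MS;
    }

    static long workerOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER;
    }

    static long sequenceOf(long id) {
        return id & MAX_SEQUENCE;
    }
}
```

Because the timestamp occupies the highest bits, an ID built from a later millisecond always compares greater than one built from an earlier millisecond. That ordering, and uniqueness itself, holds only as long as the clock never goes backward.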
Investigation revealed that the ID generation code contained a guard:
    if (timestamp < this.lastTimestamp) {
        return -1;
    }

Because the server's clock was out of sync by several minutes, every call hit this branch and returned -1. Each caller therefore received the same value as its "ID", which explains both the observed -1 and the duplicate key errors.
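One way to keep such a guard from silently handing out a sentinel value is to fail loudly instead. The following is a minimal sketch, not the code from the incident: the class name `GuardedSnowflake`, the injected `LongSupplier` clock, and the epoch constant are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

final class GuardedSnowflake {
    private static final long EPOCH_MS = 1577836800000L; // hypothetical custom epoch
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private final LongSupplier clock; // injected so tests can simulate a rollback
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    GuardedSnowflake(long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long timestamp = clock.getAsLong();
        if (timestamp < lastTimestamp) {
            // Throw instead of returning -1, so callers cannot persist a bogus ID.
            throw new IllegalStateException("Clock moved backwards by "
                    + (lastTimestamp - timestamp) + " ms; refusing to generate an ID");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the clock advances.
                while ((timestamp = clock.getAsLong()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. With an exception, a rollback surfaces as a failed request that can be alerted on, rather than as a duplicate key error discovered later at the database.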
Root causes identified include clock rollback or jumps caused by NTP adjustments, network issues after a server restart, and the occasional leap‑second correction.
To mitigate these risks, three alternative solutions are recommended:
Use Meituan's open‑source Leaf project, which relies on ZooKeeper and waits for the clock to catch up when a rollback is within tolerable limits.
Adopt Baidu's UidGenerator, which pre-generates IDs using a RingBuffer, eliminating real-time clock dependence.
Employ Redis to generate auto‑incrementing distributed IDs, noting the trade‑off of predictability and potential security concerns.
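The "wait for the clock to catch up" idea attributed to Leaf above can be sketched as follows. This is not Leaf's actual code: the class name `RollbackTolerance`, the method `awaitClock`, and the 5 ms threshold are assumptions chosen for illustration.

```java
import java.util.function.LongSupplier;

final class RollbackTolerance {
    // Hypothetical tolerance: rollbacks larger than this are treated as errors.
    static final long MAX_WAIT_MS = 5L;

    /**
     * Returns a timestamp that is not earlier than lastTimestamp.
     * Small rollbacks are absorbed by sleeping; large ones are rejected.
     */
    static long awaitClock(long lastTimestamp, LongSupplier clock) {
        long now = clock.getAsLong();
        if (now >= lastTimestamp) {
            return now; // no rollback, nothing to do
        }
        long offset = lastTimestamp - now;
        if (offset > MAX_WAIT_MS) {
            throw new IllegalStateException(
                    "Clock is " + offset + " ms behind; beyond tolerance of " + MAX_WAIT_MS + " ms");
        }
        try {
            Thread.sleep(offset); // give the clock time to catch up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the clock", e);
        }
        now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException("Clock still behind after waiting");
        }
        return now;
    }
}
```

A generator would call `awaitClock(lastTimestamp, System::currentTimeMillis)` before composing each ID. The threshold is a trade-off: a small value keeps request latency bounded, while rollbacks of several minutes, as in this incident, still have to be rejected and alerted on.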
In summary, the incident highlights the fragility of clock‑dependent ID generators; while NTP synchronization reduces the likelihood of rollbacks, it is not foolproof, and adopting a more resilient ID generation strategy is advisable.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.