Why Did My Snowflake IDs Collide? Lessons and Fixes for Distributed Systems
An unexpected primary-key duplicate error in a low-traffic Spring Cloud app turned out to be caused by multiple servers sharing the same Snowflake workId, which led to ID collisions. This article explains Snowflake's structure and trade-offs, then walks through three practical ways to guarantee a globally unique workId: IP-based calculation, environment variables, and middleware hosting.
Background: A bizarre primary-key duplicate incident
One day the production logs began intermittently reporting "primary key duplicate" errors. The scenario seemed simple: an app continuously uploads data, with fewer than 10,000 users, very low concurrency, and only single-table inserts.
However, this simple insert led the team into a deep investigation.
Problem identification
The project uses Spring Cloud + MybatisPlus, with Snowflake IDs as the default primary key. In production, a distributed cluster (machines A/B/C) was deployed without configuring a unique workId.
Conclusion: Multiple machines shared the same workId, causing Snowflake ID collisions.
Knowledge Card: What is a Snowflake ID?
Core principle
Snowflake is Twitter’s open-source distributed ID generation algorithm that creates a 64-bit long ID with the following structure:
0 | timestamp (41 bits) | datacenter ID (5 bits) | machine ID (5 bits) | sequence (12 bits)
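The four fields above are packed into one long by bit-shifting. A minimal sketch (not Twitter's full implementation, which also tracks the last timestamp and sequence state) of how the layout composes:

```java
// Minimal sketch of the Snowflake bit layout: shift each field into place
// and OR them together. Bit widths match the structure shown above.
public class SnowflakeLayout {
    static final long SEQUENCE_BITS = 12L;
    static final long WORKER_BITS = 5L;
    static final long DATACENTER_BITS = 5L;

    static long compose(long timestamp, long datacenterId, long workerId, long sequence) {
        return (timestamp << (DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS)) // bits 22-62
             | (datacenterId << (WORKER_BITS + SEQUENCE_BITS))               // bits 17-21
             | (workerId << SEQUENCE_BITS)                                   // bits 12-16
             | sequence;                                                     // bits 0-11
    }

    public static void main(String[] args) {
        // timestamp = 1 ms, datacenter 1, worker 1, sequence 1
        System.out.println(SnowflakeLayout.compose(1L, 1L, 1L, 1L)); // prints 4329473
    }
}
```

Because the timestamp occupies the highest bits, IDs generated later always sort higher, which is what makes Snowflake IDs roughly ordered.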
Advantages
High performance: the 12-bit sequence allows up to 4,096 IDs per millisecond per node — roughly four million per second in theory.
Monotonic increase: IDs are roughly ordered, suitable for database indexes.
Decentralized: no external service such as Redis or Zookeeper is required.
Critical drawbacks
Clock rollback: server time reversal can cause duplicate IDs.
Machine-ID conflict: identical workId in a distributed environment inevitably leads to ID duplication.
Non-contiguous IDs: the sequence resets each millisecond, so IDs are only trend-increasing and "gaps" appear between them.
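The clock-rollback risk is usually handled by remembering the last timestamp and refusing to issue IDs if the clock moves backwards. A minimal sketch of that guard (real implementations may instead wait out a small rollback):

```java
// Sketch of the clock-rollback guard found in most Snowflake implementations:
// if the OS clock moves backwards past the last issued timestamp, fail fast
// rather than risk handing out duplicate IDs.
public class ClockGuard {
    private long lastTimestamp = -1L;

    // Accepts a millisecond timestamp; rejects it if the clock went backwards.
    public synchronized long accept(long now) {
        if (now < lastTimestamp) {
            throw new IllegalStateException(
                "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        lastTimestamp = now;
        return now;
    }
}
```

In production the caller would pass `System.currentTimeMillis()`; taking the timestamp as a parameter here just makes the guard easy to exercise.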
Avoiding pitfalls: How to guarantee a globally unique workId?
Solution 1: IP-based dynamic calculation (recommended)
Derive workId from the last segment of the server’s IP address:
// Example: generate workId from IP
String hostAddress = InetAddress.getLocalHost().getHostAddress();
int ipLastSegment = Integer.parseInt(hostAddress.split("\\.")[3]);
return ipLastSegment % 32; // keep workId within the 5-bit range 0-31
Pros: No manual intervention; the IP is naturally unique per host.
Note: Ensure the last segments of your servers' IPs do not collide modulo 32; best suited to static-IP environments.
Solution 2: Environment-variable injection (Docker-friendly)
Pass workId through startup commands:
# Docker run example
docker run -e WORKER_ID=2 -e DATACENTER_ID=1 your-service-image
Applicable scenario: Containerized deployments with flexible control.
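With Docker Compose the same variables can be set declaratively per replica. A minimal sketch, assuming hypothetical service and image names:

```yaml
# docker-compose.yml sketch: each replica gets a distinct WORKER_ID
services:
  order-service-a:
    image: your-service-image
    environment:
      WORKER_ID: "1"
      DATACENTER_ID: "1"
  order-service-b:
    image: your-service-image
    environment:
      WORKER_ID: "2"
      DATACENTER_ID: "1"
```

The same idea carries over to Kubernetes via per-pod environment variables, though auto-scaled replicas then need a scheme (such as Solution 3) to assign distinct values.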
Solution 3: Middleware hosting (high-availability)
Maintain a workId mapping table in Redis or a configuration center such as Nacos:
Key: service-name@ip → Value: workId
Pros: Suitable for dynamic scaling and avoids issues caused by IP changes.
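The allocation logic can be sketched as follows. A `ConcurrentHashMap` stands in for Redis here so the example is self-contained; in a real deployment the same check-then-allocate step maps onto Redis `SETNX`/`INCR` (or a Nacos config entry) so each `service-name@ip` key is assigned a workId exactly once:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the middleware-hosted approach: a central registry hands each
// service instance a stable workId keyed by "service-name@ip".
public class WorkIdRegistry {
    private final Map<String, Long> assigned = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong(0);

    // Returns the existing workId for this instance, or allocates the next one.
    public long workIdFor(String serviceName, String ip) {
        String key = serviceName + "@" + ip;
        return assigned.computeIfAbsent(key,
                k -> counter.getAndIncrement() % 32); // 5-bit workId: 0-31
    }
}
```

Note the modulo: once more than 32 instances register, workIds wrap around and collide, so the registry should alert (or refuse) at that point rather than silently reuse an ID.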
Practical code: MybatisPlus dynamic workId configuration
@Slf4j // Lombok; provides the log field used below
@Configuration
public class MybatisPlusConfig {

    @Bean
    public IdentifierGenerator identifierGenerator() {
        return new DefaultIdentifierGenerator(getWorkerId(), getDatacenterId());
    }

    // Core logic: prefer the environment variable, then fall back to IP calculation
    private long getWorkerId() {
        try {
            String workerIdStr = System.getenv("WORKER_ID");
            if (workerIdStr != null) {
                return Long.parseLong(workerIdStr);
            }
            String hostAddress = InetAddress.getLocalHost().getHostAddress();
            int ipLastSegment = Integer.parseInt(hostAddress.split("\\.")[3]);
            return ipLastSegment % 32;
        } catch (Exception e) {
            log.error("Failed to resolve workId, falling back to default 1", e);
            return 1L; // fallback
        }
    }

    // getDatacenterId() follows the same pattern and is omitted here
}

Key points
Priority: environment variable > IP calculation > default fallback.
When IP retrieval fails, emit a log alert so operators can intervene manually — falling back to a fixed default on multiple machines would recreate the original collision.
Summary: Core principles of distributed ID design
Global uniqueness: workId must never repeat within the cluster.
Fault tolerance: strategies for clock rollback, IP changes, etc.
Observability: add logging at critical points such as workId generation.
Further thinking
If the service scale exceeds 32 nodes (the 5-bit workId limit), how can it be extended?
How to combine Leaf, UUID, and other schemes for multi-level disaster recovery?
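On the first question, one common answer (sketched here as an assumption, not the only option) is to rebalance the bit layout: if all machines live in one logical datacenter, the 5+5 datacenter/worker split can be merged into a single 10-bit machine ID, raising the node limit from 32 to 1,024 without changing the 64-bit ID width:

```java
// Sketch: spend all 10 middle bits on one machine ID instead of 5+5,
// supporting up to 1024 nodes while keeping the 41-bit timestamp
// and 12-bit sequence unchanged.
public class WideWorkerLayout {
    static final long SEQUENCE_BITS = 12L;
    static final long MACHINE_BITS = 10L;                // was 5 (datacenter) + 5 (worker)
    static final long MAX_MACHINES = 1L << MACHINE_BITS; // 1024

    static long compose(long timestamp, long machineId, long sequence) {
        if (machineId < 0 || machineId >= MAX_MACHINES) {
            throw new IllegalArgumentException("machineId out of range: " + machineId);
        }
        return (timestamp << (MACHINE_BITS + SEQUENCE_BITS))
             | (machineId << SEQUENCE_BITS)
             | sequence;
    }
}
```

The trade-off is losing the datacenter dimension; deployments spanning datacenters would instead need the registry from Solution 3 to hand out globally unique machine IDs.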