Mastering Distributed ID Generation: From UUID to Snowflake and Beyond

This article explores common distributed ID generation strategies—including UUID, database auto‑increment, segment allocation, Redis, Zookeeper, Snowflake, and open‑source solutions like Leaf, Tinyid, and Baidu UID‑Generator—detailing their principles, advantages, drawbacks, and implementation examples for high‑performance backend systems.

Su San Talks Tech

Sep 15, 2024

1 UUID

UUID (Universally Unique Identifier), also known as GUID, is a 128-bit identifier that can be generated without a central coordinator and is unique for all practical purposes. The format originated in Apollo's Network Computing System and was later adopted by OSF DCE. In Java, UUID.randomUUID() returns a random (version 4) UUID.

import java.util.UUID;

public class UuidTest {
    public static void main(String[] args) {
        String uuid = UUID.randomUUID().toString();
        System.out.println(uuid);
    }
}

Example output: 22527933-d0a7-4c2b-a377-aeb438a31b02

Advantages: No central node is required, making the system independent and suitable for data migration across databases.

Disadvantages: The canonical string form is 36 characters long, which makes indexes larger and lookups slower than with numeric keys, and random UUIDs carry no ordering.

In distributed logging or tracing systems, UUID can be used to generate unique identifiers for linking request logs.
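
To make the size and ordering points concrete, the following minimal sketch inspects a freshly generated UUID. The class name UuidInspect and the compact helper are illustrative only; every call is from the standard java.util.UUID class.

```java
import java.util.UUID;

class UuidInspect {
    /** Returns the UUID as 32 hex characters, without the four hyphens. */
    static String compact(UUID id) {
        return id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String text = id.toString();
        System.out.println("canonical: " + text + " (" + text.length() + " chars)");
        System.out.println("compact:   " + compact(id));
        System.out.println("version:   " + id.version());
        // The 128 bits are exposed as two 64-bit halves.
        System.out.println("high bits: " + Long.toHexString(id.getMostSignificantBits()));
        System.out.println("low bits:  " + Long.toHexString(id.getLeastSignificantBits()));
    }
}
```

Running it several times shows that consecutive values share no prefix, which is why random UUIDs scatter inserts across a B-tree index.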

2 Database Auto‑Increment ID

Most relational databases provide auto-increment primary keys that guarantee uniqueness, such as MySQL's auto_increment columns and Oracle's sequences. The database assigns the ID on insert, so the application needs no ID-generation logic of its own.

Advantages: Simple implementation and high query efficiency.

Disadvantages: Uniqueness is limited to a single table; across tables or databases IDs may collide. The sequential nature makes IDs predictable, posing security risks, and high concurrency can cause performance bottlenecks.

Legacy internal systems sometimes use database auto‑increment IDs as a distributed ID solution when concurrency and data volume are low.

3 Database Segment Mode

To reduce database load in high‑concurrency scenarios, a range (segment) of IDs can be allocated at once (e.g., a step of 1000). The segment is cached in server memory, and IDs are served from this cache until exhausted, after which a new segment is fetched.

Advantages: Simple implementation, and the database is only hit once per segment rather than once per ID, which improves throughput.

Disadvantages: IDs remain sequential and predictable, and a single‑node database remains a single point of failure.
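
The segment mode described above can be sketched as follows. This is a minimal illustration, not a production implementation: SegmentStore is a hypothetical in-memory stand-in for the database row that tracks the highest allocated ID, and the class and method names are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the database row that tracks the highest allocated ID. */
class SegmentStore {
    private final AtomicLong maxId = new AtomicLong(0);

    /** Reserves the next `step` IDs and returns the new inclusive upper bound. */
    long allocate(int step) {
        // A real system would run an UPDATE that adds `step` to the stored
        // maximum and read the result back within one transaction.
        return maxId.addAndGet(step);
    }
}

class SegmentIdGenerator {
    private final SegmentStore store;
    private final int step;
    private long next = 1; // next ID to hand out
    private long max = 0;  // last ID of the cached segment; 0 forces a fetch on first use

    SegmentIdGenerator(SegmentStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {
            // Cached segment exhausted: fetch a new one from the store.
            max = store.allocate(step);
            next = max - step + 1;
        }
        return next++;
    }
}
```

Two generators that share one store receive disjoint segments, so with a step of 1000 the first instance serves 1 to 1000 while the second serves 1001 to 2000. IDs in a cached segment are lost if the process restarts, which leaves gaps but never duplicates.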

4 Database Multi‑Master Mode

To avoid the single‑node risk of segment mode, multiple master database instances can be used. Each master is assigned a distinct large ID range (e.g., 10xxxx, 11xxxx, 12xxxx). Within each master, the segment mode is still applied.

Advantages: Eliminates the single‑node bottleneck and improves stability while retaining the performance benefits of segment mode.

Disadvantages: IDs generated across masters are not strictly incremental.
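
A rough sketch of the multi-master idea follows. The source does not say how requests are routed between masters, so the round-robin selection and the healthy flag are assumptions of this example, and MasterRange is a hypothetical in-memory stand-in for one master database. Segment caching inside each master is omitted for brevity.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for one master database that owns a disjoint ID range. */
class MasterRange {
    private final AtomicLong next;
    private final long endExclusive;
    volatile boolean healthy = true;

    MasterRange(long start, long endExclusive) {
        this.next = new AtomicLong(start);
        this.endExclusive = endExclusive;
    }

    long nextId() {
        long id = next.getAndIncrement();
        if (id >= endExclusive) {
            throw new IllegalStateException("ID range exhausted");
        }
        return id;
    }
}

class MultiMasterIdGenerator {
    private final List<MasterRange> masters;
    private final AtomicInteger turn = new AtomicInteger();

    MultiMasterIdGenerator(List<MasterRange> masters) {
        this.masters = masters;
    }

    /** Round-robins across masters (an assumption) and skips unhealthy ones. */
    long nextId() {
        for (int attempt = 0; attempt < masters.size(); attempt++) {
            int index = Math.floorMod(turn.getAndIncrement(), masters.size());
            MasterRange master = masters.get(index);
            if (master.healthy) {
                return master.nextId();
            }
        }
        throw new IllegalStateException("no healthy master available");
    }
}
```

With ranges starting at 100000, 110000, and 120000, successive calls return 100000, 110000, 120000, 100001, and so on. That makes the trade-off visible: the generator survives the loss of a master, but the sequence is not monotonically increasing.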

5 Redis ID Generation

Redis can generate auto‑increment IDs using the INCR command.

redis> SET ID_VALUE 1000
OK

redis> INCR ID_VALUE
(integer) 1001

redis> GET ID_VALUE
"1001"

Advantages: Simple implementation, better performance than database auto‑increment, and avoids cross‑table or cross‑database ID collisions.

Disadvantages: IDs are predictable, and a single‑node Redis still represents a single point of failure.

6 Zookeeper ID Generation

Zookeeper can generate sequence numbers via znode version numbers, producing 32- or 64-bit values. The approach creates a hard dependency on Zookeeper, and every ID requires a synchronous call to it, which limits performance under high contention.

7 Snowflake Algorithm

Twitter's Snowflake is a 64‑bit ID scheme composed of:

1 sign bit (always 0)

41 bits timestamp (milliseconds, supporting ~69 years)

10 bits machine identifier (up to 1024 nodes)

12 bits sequence number (up to 4096 IDs per millisecond per node)
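
The bit layout above maps directly onto shift and mask operations. The sketch below packs and unpacks the three fields and derives the capacity figures quoted in the list; the class and method names are illustrative and not taken from Twitter's code.

```java
class SnowflakeLayout {
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS; // 22
    static final long MAX_TIMESTAMP = (1L << 41) - 1;
    static final long MAX_WORKER = (1L << WORKER_BITS) - 1;         // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;     // 4095

    /** Combines the three fields into one 64-bit value; the sign bit stays 0. */
    static long pack(long timestamp, long worker, long sequence) {
        return (timestamp << TIMESTAMP_SHIFT) | (worker << SEQUENCE_BITS) | sequence;
    }

    static long timestampOf(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long workerOf(long id)    { return (id >>> SEQUENCE_BITS) & MAX_WORKER; }
    static long sequenceOf(long id)  { return id & MAX_SEQUENCE; }

    /** Lifetime of the 41-bit millisecond timestamp, in years. */
    static double timestampYears() {
        return MAX_TIMESTAMP / (1000.0 * 60 * 60 * 24 * 365.25);
    }

    public static void main(String[] args) {
        long id = pack(123456789L, 42, 7);
        System.out.println("id        = " + id);
        System.out.println("timestamp = " + timestampOf(id));
        System.out.println("worker    = " + workerOf(id));
        System.out.println("sequence  = " + sequenceOf(id));
        System.out.printf("lifetime  = %.1f years%n", timestampYears());
    }
}
```

Because the timestamp occupies the most significant bits, comparing two IDs numerically compares their creation times first, which is what makes the IDs time-ordered.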

Advantages: Simple, in‑memory generation with high throughput (millions of IDs per second), time‑ordered IDs, and no third‑party dependencies.

Disadvantages: Relies on accurate system clocks; clock rollback can cause duplicate IDs.
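
A minimal Snowflake-style generator might look like the sketch below. It is an illustration under stated assumptions rather than Twitter's implementation: the epoch constant is an arbitrary choice (2024-01-01 UTC), the clock is injected as a LongSupplier so the behavior can be tested, and a clock rollback is handled by simply refusing to generate an ID.

```java
import java.util.function.LongSupplier;

class SnowflakeIdGenerator {
    // Arbitrary custom epoch chosen for this sketch: 2024-01-01T00:00:00Z.
    static final long EPOCH = 1704067200000L;
    private static final long MAX_WORKER = 1023;   // 10 bits
    private static final long MAX_SEQUENCE = 4095; // 12 bits

    private final long workerId;
    private final LongSupplier clock; // returns the current time in milliseconds
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in [0, 1023]");
        }
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback: reusing an old timestamp could duplicate IDs, so fail fast.
            throw new IllegalStateException(
                "clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

In normal use the clock argument would be System::currentTimeMillis. Throwing on rollback is the simplest policy; waiting until the clock catches up is another option when the rollback is small.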

8 Leaf

Meituan's open‑source Leaf provides two ID generation modes:

Leaf‑segment (database segment mode)

Leaf‑snowflake (Snowflake algorithm with Zookeeper for worker ID allocation)

Leaf‑segment requires a table with biz_tag, max_id, and step. The segment is cached in memory to reduce database writes.

Leaf‑snowflake adds Zookeeper to obtain a workId and includes a mechanism to handle clock rollback.

9 Tinyid

Didi's Tinyid is a Java distributed ID system based on the database segment algorithm. It extends Leaf with multi-master support and offers both a REST API and a Java client.

Key optimizations:

Tinyid‑client: Retrieves a segment from the server and generates IDs locally, eliminating network latency for each ID.

Double‑segment cache: Pre‑loads the next segment before the current one is exhausted.

Multi‑DB support: Multiple databases generate IDs independently, improving availability.
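
The double-segment cache can be sketched as below. This is not Tinyid's code: the database is replaced by a hypothetical in-memory counter, the half-consumed preload threshold is an arbitrary choice for the example, and all names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

class DoubleSegmentGenerator {
    /** A half-open ID range [next, end). */
    static final class Segment {
        long next;
        final long end;

        Segment(long start, long size) {
            this.next = start;
            this.end = start + size;
        }

        long remaining() { return end - next; }
    }

    // Hypothetical stand-in for the database row holding the highest allocated ID.
    private final AtomicLong dbMaxId = new AtomicLong(0);
    private final int step;
    private Segment current;
    private CompletableFuture<Segment> pending; // next segment being loaded, or null

    DoubleSegmentGenerator(int step) {
        this.step = step;
        this.current = fetch();
    }

    /** Simulates the database round trip that reserves `step` IDs. */
    private Segment fetch() {
        long end = dbMaxId.addAndGet(step);
        return new Segment(end - step + 1, step);
    }

    synchronized long nextId() {
        // Start loading the next segment in the background once half of the
        // current one is consumed (the threshold is an assumption).
        if (pending == null && current.remaining() * 2 <= step) {
            pending = CompletableFuture.supplyAsync(this::fetch);
        }
        if (current.remaining() == 0) {
            // Swap in the preloaded segment; join() only blocks if loading is slow.
            current = pending.join();
            pending = null;
        }
        return current.next++;
    }
}
```

The point of the second buffer is that the caller that exhausts a segment rarely has to wait for the database, because the replacement was requested well before it was needed.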

10 Baidu UID‑Generator

Baidu's UID-Generator implements a Snowflake-like algorithm with a custom bit allocation: 1 sign bit, 28 bits for delta seconds (relative to 2016-05-20), 22 bits for the worker ID, and 13 bits for the sequence, which allows up to 8,192 (2^13) IDs per second per node.

It uses a RingBuffer to cache pre-generated IDs, achieving up to 6 million QPS on a single machine. The RingBuffer uses a Tail pointer to track production and a Cursor pointer to track consumption. The buffer is filled at startup, whenever the remaining IDs drop below 50%, and periodically by a scheduled task.
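
The Tail and Cursor mechanics can be illustrated with a simplified ring buffer. This sketch is not UID-Generator's implementation: it uses synchronized methods for clarity instead of a lock-free design, and the class and method names are assumptions made for the example.

```java
class IdRingBuffer {
    private final long[] slots;
    private final int mask;
    private long tail = -1;   // index of the most recently produced slot
    private long cursor = -1; // index of the most recently consumed slot

    IdRingBuffer(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        this.slots = new long[size];
        this.mask = size - 1;
    }

    /** Producer side: stores an ID, or returns false if the buffer is full. */
    synchronized boolean put(long id) {
        if (tail - cursor == slots.length) {
            return false;
        }
        slots[(int) (++tail & mask)] = id;
        return true;
    }

    /** Consumer side: returns the oldest unconsumed ID. */
    synchronized long take() {
        if (cursor == tail) {
            throw new IllegalStateException("buffer is empty");
        }
        return slots[(int) (++cursor & mask)];
    }

    synchronized int remaining() {
        return (int) (tail - cursor);
    }

    /** True when fewer than 50% of the slots hold unconsumed IDs. */
    synchronized boolean needsRefill() {
        return remaining() * 2 < slots.length;
    }
}
```

A filler thread would call put until it returns false, and consumers would check needsRefill after each take to decide whether to trigger another fill. Requiring a power-of-two size lets the index wrap with a bit mask instead of a modulo.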
