How We Built a High‑Availability Distributed ID Service with Leaf

This article details the motivation, design choices, architecture, performance optimizations, and operational lessons learned while implementing a distributed ID generation system using Leaf's segment mode to achieve global uniqueness, high availability, and low latency for large‑scale e‑commerce services.

NetEase Yanxuan Technology Product Team

Mar 4, 2021

Why Distributed IDs?

Background

Different business lines (main site, distribution, B2B) each generated their own order IDs. When synchronising to the order centre, an additional internal ID was created for downstream warehouse coordination, resulting in inconsistent ID formats across the system.

Problem

The coexistence of multiple, inconsistent ID schemes hampered communication between teams, made governance difficult and led to code decay.

Goal

The distributed‑ID service must produce globally unique, highly secure IDs while providing high availability and low latency.

Architecture Principles

Technology Selection

Among common industry solutions (Snowflake, Sonyflake, UUID, etc.), Leaf's segment mode was chosen because it supports horizontal scaling and configurable ID lengths.

Leaf Segment Architecture

Leaf pre‑allocates a fixed‑length segment of IDs from a relational database (MySQL) to each server. The segment is stored in memory; only the maximum ID of the segment is persisted. This enables fast, in‑memory ID generation.
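The allocation step can be sketched in Java as below. This is a minimal illustration, not the team's code. The database is replaced by a hypothetical in-memory AllocTable. The leaf_alloc table and the SQL in the comments follow the schema published with open-source Leaf, which is an assumption about this deployment. Each segment covers [maxId - step, maxId).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the leaf_alloc table. Against MySQL the
// same allocation is one transaction:
//   UPDATE leaf_alloc SET max_id = max_id + step WHERE biz_tag = ?;
//   SELECT max_id, step FROM leaf_alloc WHERE biz_tag = ?;
class AllocTable {
    private final ConcurrentHashMap<String, Long> maxIds = new ConcurrentHashMap<>();

    AllocTable(String bizTag, long initialMaxId) {
        maxIds.put(bizTag, initialMaxId);
    }

    // Atomically advances max_id by step and returns the new max_id,
    // which is the exclusive upper bound of the segment just handed out.
    long allocate(String bizTag, long step) {
        return maxIds.merge(bizTag, step, Long::sum);
    }
}

// One in-memory segment covering [maxId - step, maxId).
class Segment {
    private final AtomicLong next;
    private final long maxId;

    Segment(long maxId, long step) {
        this.maxId = maxId;
        this.next = new AtomicLong(maxId - step);
    }

    // Returns the next ID, or -1 once the segment is exhausted.
    long nextId() {
        long id = next.getAndIncrement();
        return id < maxId ? id : -1;
    }
}
```

Suppose max_id starts at 1 and the step is 1,000. The first allocation returns 1001, and the server then hands out IDs 1 to 1000 from memory. The next allocation covers 1001 to 2000. Because only max_id is persisted, a restart discards the unused part of the in-memory segment. IDs stay unique but may have gaps.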

Production Issues

When a segment is exhausted, TP999 (99.9th‑percentile) latency spikes because the request thread blocks while the next segment is fetched from the database.

If the database becomes unavailable (crash or master‑slave switch), the service experiences a temporary outage.

Availability Optimizations

A double‑buffer mechanism with asynchronous updates was added. When the active buffer reaches a configurable threshold, a background task loads the next segment into a standby buffer. If the DB fails, the service continues to serve IDs from the standby buffer until the DB recovers.
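Below is a minimal sketch of the double buffer. It assumes a fixed step and a hypothetical LongSupplier that performs the database update and returns the new max_id. Once a configurable fraction of the active segment has been consumed, a background thread loads the next segment. When the active segment runs out, the generator switches to the standby segment and blocks only if that load has not finished yet.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongSupplier;

// Minimal double-buffer sketch. loadMaxId is a hypothetical hook that runs the
// DB update and returns the new max_id; each segment covers [maxId - step, maxId).
class DoubleBuffer {
    private final LongSupplier loadMaxId;
    private final long step;
    private final double prefetchRatio;   // fraction of the active segment consumed before prefetching
    private final ExecutorService loader = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "segment-loader");
        t.setDaemon(true);
        return t;
    });

    private long cur;                 // next ID in the active segment
    private long max;                 // exclusive upper bound of the active segment
    private Future<Long> standby;     // pending or completed load of the next segment

    DoubleBuffer(LongSupplier loadMaxId, long step, double prefetchRatio) {
        this.loadMaxId = loadMaxId;
        this.step = step;
        this.prefetchRatio = prefetchRatio;
        activate(loadMaxId.getAsLong());   // the first segment is loaded synchronously
    }

    synchronized long nextId() {
        if (cur >= max) {
            // Active segment exhausted: switch to the standby segment,
            // blocking only if its background load has not finished yet.
            Future<Long> next = standby;
            if (next == null) {
                next = loader.submit(() -> loadMaxId.getAsLong());
            }
            standby = null;                // a failed load is retried on the next call
            activate(await(next));
        }
        long consumed = cur - (max - step);
        if (standby == null && consumed >= prefetchRatio * step) {
            standby = loader.submit(() -> loadMaxId.getAsLong());   // asynchronous prefetch
        }
        return cur++;
    }

    private void activate(long newMax) {
        max = newMax;
        cur = newMax - step;
    }

    private static long await(Future<Long> f) {
        try {
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading segment", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment load failed", e.getCause());
        }
    }
}
```

If the database fails after the standby segment has loaded, the active and standby segments together keep serving IDs. A single synchronized method keeps the sketch short. A production generator would more likely use a lock-free counter on the hot path and take a lock only for the buffer switch.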

Dynamic Step Adjustment

Segment length (step) is adjusted based on recent traffic using the relation Q × T = L, where Q is the QPS, L is the segment length, and T is the time it takes to consume one segment (the interval between segment updates). The rules are:

If T < 15 min, nextStep = step × 2.

If 15 min ≤ T ≤ 30 min, nextStep = step.

If T > 30 min, nextStep = step ÷ 2.

The step is capped at an upper limit (e.g., 1,000,000) and is never reduced below its initially configured value.
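A worked example of Q × T = L: at 1,000 QPS with a 15‑minute target interval (900 s), a segment of 1,000 × 900 = 900,000 IDs lasts exactly 15 minutes. The rules can be sketched as a pure function. Two parts of this sketch are assumptions about how the bounds are enforced: the initially configured step serves as the floor (minStep), and doubling is clamped to the cap.

```java
import java.time.Duration;

// Sketch of the step-adjustment rules. MAX_STEP mirrors the example cap in the
// text; using the initially configured step as minStep is an assumption.
final class StepPolicy {
    static final long MAX_STEP = 1_000_000L;
    private static final Duration LOWER = Duration.ofMinutes(15);
    private static final Duration UPPER = Duration.ofMinutes(30);

    private StepPolicy() {}

    // elapsed is T, the time it took to consume the previous segment.
    static long nextStep(long step, long minStep, Duration elapsed) {
        if (elapsed.compareTo(LOWER) < 0) {
            return Math.min(step * 2, MAX_STEP);   // consumed too fast: double, up to the cap
        }
        if (elapsed.compareTo(UPPER) <= 0) {
            return step;                           // within the 15 to 30 min window: keep
        }
        return Math.max(step / 2, minStep);        // consumed too slowly: halve, down to the floor
    }
}
```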

Improvements

Feature Enrichment

Additional capabilities were added to meet business scenarios:

Batch ID acquisition for bulk operations.

Pre‑scaling the segment size ahead of large promotional events.

Early segment skipping to avoid hot‑spot consumption.
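Batch acquisition can be sketched as claiming a contiguous block of IDs from the current segment and spilling into a newly loaded segment when the current one runs short. The names BatchIdGenerator and nextIds are hypothetical, and the database update is abstracted as a LongSupplier that returns the new max_id.

```java
import java.util.function.LongSupplier;

// Hypothetical batch API: claims up to n IDs from the current segment in one
// step and spills into a freshly loaded segment when the current one runs short.
class BatchIdGenerator {
    private final LongSupplier loadMaxId;   // runs the DB update, returns the new max_id
    private final long step;
    private long cur;                       // next free ID in the current segment
    private long max;                       // exclusive upper bound; cur == max means "empty"

    BatchIdGenerator(LongSupplier loadMaxId, long step) {
        this.loadMaxId = loadMaxId;
        this.step = step;
    }

    synchronized long[] nextIds(int n) {
        long[] ids = new long[n];
        int filled = 0;
        while (filled < n) {
            if (cur >= max) {               // segment exhausted: load the next one
                max = loadMaxId.getAsLong();
                cur = max - step;
            }
            int take = (int) Math.min(n - filled, max - cur);
            for (int i = 0; i < take; i++) {
                ids[filled + i] = cur + i;
            }
            cur += take;
            filled += take;
        }
        return ids;
    }
}
```

With a step of 5, a request for 3 IDs returns 1 to 3. A following request for 4 IDs takes 4 and 5 from the first segment and 6 and 7 from the next one.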

Availability Guarantees – Database

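MySQL is deployed in a master‑slave (read‑write separation) topology with one master and two slaves. Semi‑synchronous replication keeps the slaves consistent with the master. The "dual‑1" configuration sets sync_binlog=1 and innodb_flush_log_at_trx_commit=1, so every commit flushes both the binlog and the InnoDB redo log to disk. Together these settings guard against losing committed transactions during failover.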

Availability Guarantees – SDK

An SDK mirrors the server’s double‑buffer strategy on the client side. It reduces integration effort, offloads the Leaf server, and provides graceful degradation when the server is unavailable.
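The SDK's graceful degradation can be sketched as a client-side cache. It keeps serving its local ID range while the server is unreachable and fails only once that range is exhausted. RangeClient and LocalIdCache are hypothetical names. For brevity the prefetch runs on the calling thread. A client that fully mirrors the server's double buffer would load the standby range asynchronously.

```java
// Hypothetical transport to the Leaf server; returns {start, endExclusive}.
interface RangeClient {
    long[] fetchRange(int size) throws Exception;
}

// Client-side cache that degrades gracefully: while the server is unreachable it
// keeps serving its local range and fails only once that range is used up.
class LocalIdCache {
    private final RangeClient client;
    private final int rangeSize;
    private final int lowWaterMark;   // prefetch once this many IDs or fewer remain
    private long cur;                 // next ID in the active range
    private long end;                 // exclusive end of the active range
    private long[] standby;           // next range, if already fetched

    LocalIdCache(RangeClient client, int rangeSize, int lowWaterMark) {
        this.client = client;
        this.rangeSize = rangeSize;
        this.lowWaterMark = lowWaterMark;
    }

    synchronized long nextId() {
        if (cur >= end) {
            if (standby == null) {
                tryPrefetch();
            }
            if (standby == null) {
                throw new IllegalStateException("ID server unavailable and local range exhausted");
            }
            cur = standby[0];
            end = standby[1];
            standby = null;
        }
        if (standby == null && end - cur <= lowWaterMark) {
            tryPrefetch();            // runs on the caller's thread for brevity
        }
        return cur++;
    }

    private void tryPrefetch() {
        try {
            standby = client.fetchRange(rangeSize);
        } catch (Exception e) {
            // Server unreachable: keep serving the active range and retry on a later call.
        }
    }
}
```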

Stability Measures – Operations

Three pillars ensure runtime stability:

Log monitoring to detect abnormal patterns.

Traffic monitoring to evaluate segment consumption and adjust step size.

Online health checks (heartbeat) to verify service liveness.

Stability Measures – SLA

SLA metrics focus on request latency (p99) and error rate. These indicators are used to trigger alerts and guide capacity planning.
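The article does not say how p99 is computed. A common choice is the nearest-rank percentile over a window of latency samples, sketched below together with the error rate. The class name SlaStats and the microsecond unit are assumptions.

```java
import java.util.Arrays;

// Sketch of the two SLA indicators over one window of samples.
// Microsecond latencies and the nearest-rank definition are assumptions.
final class SlaStats {
    private SlaStats() {}

    // Nearest-rank percentile: the smallest sample with at least p% of all samples <= it.
    static long percentile(long[] latenciesMicros, double p) {
        if (latenciesMicros.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latenciesMicros.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);   // 1-based rank
        return sorted[rank - 1];
    }

    static double errorRate(long errors, long total) {
        return total == 0 ? 0.0 : (double) errors / total;
    }
}
```

For example, over 1,000 requests the p99 is the 990th smallest latency.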

Pitfalls Encountered

Problem Discovery

Right after start‑up, request latency spikes sharply before stabilising, causing high response times for the first few minutes.

Root Cause Analysis – JVM Tiered Compilation

HotSpot JVM uses a tiered compilation pipeline (L0–L4):

L0 – interpreter (with profiling).

L1 – C1 without profiling.

L2 – C1 with limited profiling (invocation and back‑edge counters only).

L3 – C1 with full profiling.

L4 – C2 (optimised) compilation.

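During start‑up, hot methods begin in L0. Once they cross the C1 thresholds they are compiled at L3, and once enough profiling data has accumulated they are recompiled at L4. The transition from C1 to C2 can take several seconds. Until the C2 code is installed, requests run through slower interpreted or profiled C1 code, which explains the initial latency spike.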

Solutions

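Disable tiered compilation (-XX:-TieredCompilation) and lower the compilation threshold (-XX:CompileThreshold, which only takes effect when tiered compilation is disabled). The size limit for inlining frequently executed methods (-XX:FreqInlineSize) can be tuned as well.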

Generate mock traffic during deployment to trigger JIT and C2 compilation early.
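Mock warm-up traffic can be sketched as a hook that calls the ID path many times before the instance reports ready. This gives C1 and C2 a chance to compile the hot methods before real requests arrive. WarmUpGate is a hypothetical name, and the iteration count is an assumption to tune. In practice the warm-up would exercise the full request path, including RPC or HTTP handling, against a dedicated test tag so that business IDs are not consumed.

```java
import java.util.function.LongSupplier;

// Hypothetical readiness gate: the instance reports ready only after the ID path
// has been exercised enough times for the JIT to compile its hot methods.
final class WarmUpGate {
    private volatile boolean ready;

    boolean isReady() {
        return ready;
    }

    // Calls idSource `iterations` times, then opens the gate.
    // Returns the number of calls that succeeded.
    int warmUp(LongSupplier idSource, int iterations) {
        int succeeded = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                idSource.getAsLong();
                succeeded++;
            } catch (RuntimeException e) {
                // Failures during warm-up are tolerated; the goal is to run the code, not to collect IDs.
            }
        }
        ready = true;
        return succeeded;
    }
}
```

A health check that consults isReady() keeps the load balancer from routing traffic to the instance until the warm-up has finished.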

Use the experimental ahead‑of‑time compiler introduced in Java 9 (jaotc) to precompile classes and shorten the interpreter warm‑up phase. Note that jaotc was removed in JDK 17.

Usage

Leaf is now in production serving three business domains: order ID, order snapshot ID, and order item snapshot ID. The service has successfully handled peak loads during Double 11 and Double 12 sales events.

Conclusion

The article documents the selection, implementation, and operational experience of a distributed ID service based on Leaf Segment mode. Key improvements include double‑buffered availability, dynamic step adjustment, SDK integration, and JVM warm‑up mitigation. While availability and stability have been significantly enhanced, further performance optimisation (e.g., reducing DB round‑trips, fine‑tuning JIT) remains an open area for future work.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
