How We Built a High‑Availability Distributed ID Service with Leaf
This article details the motivation, design choices, architecture, performance optimizations, and operational lessons learned while implementing a distributed ID generation system using Leaf's segment mode to achieve global uniqueness, high availability, and low latency for large‑scale e‑commerce services.
Why Distributed IDs?
Background
Different business lines (main site, distribution, B2B) each generated their own order IDs. When synchronising to the order centre, an additional internal ID was created for downstream warehouse coordination, resulting in inconsistent ID formats across the system.
Problem
The mix of ID schemes made cross‑team communication harder, complicated governance, and accelerated code decay.
Goal
The distributed‑ID service must produce globally unique, highly secure IDs while providing high availability and low latency.
Architecture Principles
Technology Selection
Among common industry solutions (Snowflake, Sonyflake, UUID, etc.), Leaf's Segment mode was chosen because it supports horizontal scaling and configurable ID length.
Leaf Segment Architecture
Leaf pre‑allocates a fixed‑length segment of IDs from a relational database (MySQL) to each server. The segment is stored in memory; only the maximum ID of the segment is persisted. This enables fast, in‑memory ID generation.
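The allocation scheme described above can be sketched as follows. This is a minimal illustration, not Leaf's actual code: `MaxIdStore` is a hypothetical in-memory stand-in for the MySQL row that holds the persisted maximum ID, and the class and method names are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the database row that persists the maximum ID. */
class MaxIdStore {
    private final AtomicLong maxId;

    MaxIdStore(long initialMaxId) {
        this.maxId = new AtomicLong(initialMaxId);
    }

    /** Mimics "advance max_id by step, then read it back" as one atomic operation. */
    long advance(int step) {
        return maxId.addAndGet(step);
    }
}

/** Serves IDs from an in-memory segment and touches the store only when it runs out. */
class SegmentIdGenerator {
    private final MaxIdStore store;
    private final int step;
    private long next; // next ID to hand out
    private long max;  // exclusive upper bound of the current segment

    SegmentIdGenerator(MaxIdStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next >= max) {              // segment exhausted, or not loaded yet
            max = store.advance(step);  // the only "database" round-trip
            next = max - step;          // this server now owns [max - step, max)
        }
        return next++;
    }
}
```

Two generators that share one store receive disjoint ranges, which is what keeps IDs unique across servers. The sketch also shows where the latency spike comes from: the caller that hits an exhausted segment waits for the store.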
Production Issues
When a segment is exhausted, a sudden spike in TP999 latency occurs because the server blocks while updating the database.
If the database becomes unavailable (crash or master‑slave switch), the service experiences a temporary outage.
Availability Optimizations
A double‑buffer mechanism with asynchronous updates was added. When consumption of the active buffer reaches a configurable threshold, a background task loads the next segment into a standby buffer, so switching to a new segment no longer blocks on the database. If the DB fails, the service keeps serving IDs from the segments already held in memory, which buys time until the DB recovers.
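A simplified sketch of the double‑buffer idea is shown below. It is an illustration under stated assumptions rather than the production code: the `loader` supplier is a hypothetical database call that returns the next free segment and throws while the database is down, and the fallback to a synchronous load when both buffers are empty is a choice made for this example.

```java
import java.util.concurrent.Executor;
import java.util.function.Supplier;

class DoubleBufferIdGenerator {
    /** Half-open ID range [start, end); next is the next ID to hand out. */
    static final class Segment {
        final long start;
        final long end;
        long next;

        Segment(long start, long end) {
            this.start = start;
            this.end = end;
            this.next = start;
        }
    }

    private final Supplier<Segment> loader; // hypothetical DB call, may throw
    private final Executor executor;        // runs the background prefetch
    private final double threshold;         // consumed fraction that triggers prefetch
    private Segment active;
    private Segment standby;
    private boolean loading;

    DoubleBufferIdGenerator(Supplier<Segment> loader, Executor executor, double threshold) {
        this.loader = loader;
        this.executor = executor;
        this.threshold = threshold;
        this.active = loader.get();          // the first segment is loaded synchronously
    }

    synchronized long nextId() {
        if (active.next >= active.end) {     // active buffer exhausted
            if (standby != null) {
                active = standby;            // in-memory switch, no DB access
                standby = null;
            } else {
                active = loader.get();       // last resort: blocks, throws if DB is down
            }
        }
        long id = active.next++;
        long used = active.next - active.start;
        if (standby == null && !loading && used >= threshold * (active.end - active.start)) {
            loading = true;
            executor.execute(this::loadStandby);
        }
        return id;
    }

    private void loadStandby() {
        Segment loaded = null;
        try {
            loaded = loader.get();
        } catch (RuntimeException e) {
            // DB unavailable: leave standby empty so a later call retries the prefetch.
        }
        synchronized (this) {
            standby = loaded;
            loading = false;
        }
    }
}
```

In this sketch a failed prefetch is retried on every subsequent call; a real service would probably add a back‑off so that a struggling database is not hammered.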
Dynamic Step Adjustment
Segment length (step) is adjusted based on recent traffic using the relation Q × T = L where Q is QPS, L is segment length, and T is the update interval. The rules are:
If T < 15 min, nextStep = step × 2
If 15 min ≤ T ≤ 30 min, nextStep = step
If T > 30 min, nextStep = step ÷ 2
The step is capped at an upper limit (e.g., 1,000,000) and is never reduced below a configured minimum step.
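As a worked example of Q × T = L: at Q = 1,000 QPS with a step of L = 600,000, a segment lasts T = 600,000 ÷ 1,000 = 600 s = 10 min, which is under 15 min, so the next step doubles. The rules can be sketched as a small function. The names `StepAdjuster`, `TARGET`, and `minStep` are assumptions for this example; in particular, treating the lower bound as a configured minimum step is an interpretation, and clamping to the cap (rather than skipping the doubling) is a simplification.

```java
import java.time.Duration;

class StepAdjuster {
    static final Duration TARGET = Duration.ofMinutes(15); // lower edge of the "keep" band
    static final int MAX_STEP = 1_000_000;                 // upper limit from the example above

    /**
     * Returns the step for the next segment, given how long the
     * previous segment lasted before it was exhausted.
     */
    static int nextStep(int step, int minStep, Duration lastSegmentLifetime) {
        if (lastSegmentLifetime.compareTo(TARGET) < 0) {
            // Consumed too quickly: double, but never exceed the cap.
            return (int) Math.min((long) step * 2, MAX_STEP);
        }
        if (lastSegmentLifetime.compareTo(TARGET.multipliedBy(2)) <= 0) {
            // Between 15 and 30 minutes: the step fits current traffic.
            return step;
        }
        // Consumed too slowly: halve, but never go below the floor.
        return Math.max(step / 2, minStep);
    }
}
```

Keeping a segment's lifetime inside a fixed time band means the amount of traffic the service can absorb during a database outage scales with QPS instead of being tied to a static step.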
Improvements
Feature Enrichment
Additional capabilities were added to meet business scenarios:
Batch ID acquisition for bulk operations.
Pre‑scale of segment size ahead of large promotional events.
Early segment skipping to avoid hot‑spot consumption.
Availability Guarantees – Database
MySQL is deployed in a master‑slave topology with read‑write separation: one master and two slaves. Semi‑synchronous replication keeps the slaves consistent with the master, and a “dual‑1” configuration (sync_binlog=1 and innodb_flush_log_at_trx_commit=1) prevents data loss during failover.
Availability Guarantees – SDK
An SDK mirrors the server’s double‑buffer strategy on the client side. It reduces integration effort, offloads the Leaf server, and provides graceful degradation when the server is unavailable.
Stability Measures – Operations
Three pillars ensure runtime stability:
Log monitoring to detect abnormal patterns.
Traffic monitoring to evaluate segment consumption and adjust step size.
Online health checks (heartbeat) to verify service liveness.
Stability Measures – SLA
SLA metrics focus on request latency (p99) and error rate. These indicators are used to trigger alerts and guide capacity planning.
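For readers who want to reproduce these indicators from raw samples, the sketch below computes a nearest‑rank percentile and an error rate. It is a generic illustration, not the team's monitoring code; the class name `SlaMetrics` and the microsecond unit are assumptions.

```java
import java.util.Arrays;

class SlaMetrics {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] latenciesMicros, double p) {
        if (latenciesMicros.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMicros.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Fraction of failed requests; defined as 0 when there was no traffic. */
    static double errorRate(long errors, long total) {
        return total == 0 ? 0.0 : (double) errors / total;
    }
}
```

Calling `percentile(samples, 99)` gives the p99 value, and `percentile(samples, 99.9)` gives the TP999 figure mentioned earlier; an alert rule would then compare these against the agreed thresholds.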
Pitfalls Encountered
Problem Discovery
Service start‑up latency spikes dramatically before stabilising, causing high response times for the first few minutes.
Root Cause Analysis – JVM Tiered Compilation
HotSpot JVM uses a tiered compilation pipeline (L0–L4):
L0 – interpreter (with profiling).
L1 – C1 without profiling.
L2 – C1 with limited profiling (invocation and back‑edge counters).
L3 – C1 with full profiling.
L4 – C2 (optimised) compilation.
During start‑up, code runs in L0, then progresses to L3 and finally L4. The transition from C1 to C2 can take several seconds, which explains the initial latency spike.
Solutions
Disable tiered compilation (-XX:-TieredCompilation) and tune the JIT settings: -XX:CompileThreshold (which only takes effect when tiered compilation is off) and -XX:FreqInlineSize.
Generate mock traffic during deployment to trigger JIT and C2 compilation early.
Use Java 9 AOT (jaotc) to pre‑compile classes, so that pre‑compiled methods skip the interpreter warm‑up phase. Note that jaotc was an experimental tool and was removed in JDK 17.
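The mock‑traffic option can be sketched as a warm‑up loop that runs before the instance is registered for real traffic. This is a minimal illustration: `JitWarmup` and the `idPath` parameter are hypothetical names, and the iteration count needed to reach C2 depends on the JVM's compilation thresholds, so it should be tuned rather than taken from this example.

```java
import java.util.function.LongSupplier;

class JitWarmup {
    /**
     * Invokes the ID-generation path repeatedly so the JIT compiler
     * treats it as hot before real requests arrive.
     * Returns a checksum so the calls have an observable result.
     */
    static long warmUp(LongSupplier idPath, int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            checksum += idPath.getAsLong();
        }
        return checksum;
    }

    public static void main(String[] args) {
        long[] counter = {0};                 // stand-in for the real generator
        long checksum = warmUp(() -> counter[0]++, 20_000);
        System.out.println("warm-up finished, checksum=" + checksum);
    }
}
```

One design point to keep in mind: if the warm‑up drives the real generator, it consumes real IDs and leaves gaps in the sequence, so a dedicated test business tag is probably a safer target than a production one.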
Usage
Leaf is now in production serving three business domains: order ID, order snapshot ID, and order item snapshot ID. The service has successfully handled peak loads during Double 11 and Double 12 sales events.
Conclusion
The article documents the selection, implementation, and operational experience of a distributed ID service based on Leaf Segment mode. Key improvements include double‑buffered availability, dynamic step adjustment, SDK integration, and JVM warm‑up mitigation. While availability and stability have been significantly enhanced, further performance optimisation (e.g., reducing DB round‑trips, fine‑tuning JIT) remains an open area for future work.
NetEase Yanxuan Technology Product Team
The NetEase Yanxuan Technology Product Team shares practical tech insights for the e‑commerce ecosystem. This official channel periodically publishes technical articles, team events, recruitment information, and more.
