Comprehensive Guide to Rate Limiting Algorithms and Distributed Rate Limiting Solutions
This guide explains why rate limiting is essential for micro‑service stability, outlines six design principles, details four classic algorithms—fixed window, sliding window, leaky bucket, and token bucket—and compares centralized Redis, load‑balancer cache, and coordination‑service distributed solutions.
With the rise of micro‑services, service dependencies and call relationships become increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request timeouts or even server crashes. To protect both the system itself and its upstream/downstream services, rate limiting is commonly applied to quickly reject requests that exceed configured limits, thereby ensuring stability.
A good rate‑limiting design must consider business characteristics and include six key points: multi‑level limiting, dynamic threshold adjustment, flexible dimensions, decoupling from business logic, fault tolerance, and monitoring/alerting.
Rate‑limiting concepts
Two fundamental concepts are:
Threshold – the maximum number of requests allowed per unit time (e.g., 500 QPS).
Reject strategy – how to handle requests that exceed the threshold (e.g., immediate reject or queue).
Rate‑limiting can be implemented as single‑machine or distributed solutions. The four classic algorithms are described below.
1. Fixed Window Limiting
The fixed‑window algorithm divides time into equal windows (e.g., one second) and counts requests within each window. If the count exceeds the limit, further requests are rejected until the window resets.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window size
	maxRequests int           // maximum requests per window
	requests    int           // requests seen in the current window
	lastReset   int64         // last window reset time (Unix seconds)
	resetMutex  sync.Mutex    // guards the counter and reset time
}

func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (limiter *FixedWindowLimiter) AllowRequest() bool {
	limiter.resetMutex.Lock()
	defer limiter.resetMutex.Unlock()
	// Start a new window once the current one has elapsed.
	if time.Now().Unix()-limiter.lastReset >= int64(limiter.windowSize.Seconds()) {
		limiter.requests = 0
		limiter.lastReset = time.Now().Unix()
	}
	if limiter.requests >= limiter.maxRequests {
		return false
	}
	limiter.requests++
	return true
}

func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rate-limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: simple to implement, stable for steady traffic, easy rate control.
Disadvantages: cannot absorb short-term bursts well, and traffic clustered around a window boundary can briefly pass at up to twice the configured limit.
2. Sliding Window Limiting
The sliding‑window algorithm improves on fixed windows by continuously moving the window, providing finer granularity and smoother handling of bursts.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window size
	maxRequests  int           // maximum requests per window
	requests     []time.Time   // timestamps of requests inside the window
	requestsLock sync.Mutex    // guards the requests slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}

func (limiter *SlidingWindowLimiter) AllowRequest() bool {
	limiter.requestsLock.Lock()
	defer limiter.requestsLock.Unlock()
	now := time.Now()
	// Evict timestamps that have slid out of the window.
	for len(limiter.requests) > 0 && now.Sub(limiter.requests[0]) > limiter.windowSize {
		limiter.requests = limiter.requests[1:]
	}
	if len(limiter.requests) >= limiter.maxRequests {
		return false
	}
	limiter.requests = append(limiter.requests, now)
	return true
}

func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rate-limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: smooth burst handling, higher precision, and real-time response to traffic changes.
Disadvantages: higher memory consumption (one timestamp is stored per request in the window), more complex implementation.
3. Leaky Bucket Limiting
The leaky‑bucket algorithm models a bucket that fills with incoming requests and leaks at a constant rate, smoothing traffic and preventing overload.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate       float64    // leak rate, requests per second
	capacity   int        // bucket capacity: max requests it can hold
	water      int        // current water level (requests in the bucket)
	lastLeakMs int64      // last leak timestamp, in milliseconds
	mu         sync.Mutex // guards the fields above
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	now := time.Now().UnixMilli()
	elapsed := now - lb.lastLeakMs
	// Leak water at the configured rate for the elapsed time.
	leakAmount := int(float64(elapsed) / 1000 * lb.rate)
	if leakAmount > 0 {
		if leakAmount > lb.water {
			lb.water = 0
		} else {
			lb.water -= leakAmount
		}
		lb.lastLeakMs = now
	}
	// Reject when the bucket is already full.
	if lb.water >= lb.capacity {
		return false
	}
	lb.water++
	return true
}

func main() {
	leakyBucket := NewLeakyBucket(3, 4)
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if leakyBucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rate-limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: simple, effective at smoothing bursts, and prevents downstream overload.
Disadvantages: the fixed outflow rate leaves no headroom for legitimate spikes and can leave capacity idle when traffic is low.
4. Token Bucket Limiting
The token‑bucket algorithm adds tokens to a bucket at a fixed rate; each request consumes a token. If the bucket is empty, the request is rejected.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64    // token generation rate, tokens per second
	capacity   float64    // maximum bucket capacity
	tokens     float64    // current number of tokens
	lastUpdate time.Time  // last refill time
	mu         sync.Mutex // guards the fields above
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// Refill tokens for the time elapsed since the last update.
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	// Advance lastUpdate even on rejection, so elapsed time is never counted twice.
	tb.lastUpdate = now
	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(2.0, 3.0)
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rate-limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: smooths bursts while still allowing short spikes; flexible via the token rate and bucket size.
Disadvantages: more complex to implement, requires precise time handling, and tokens may sit unused during idle periods.
5. Comparison of the Four Algorithms
The table below summarizes the four algorithms:

| Algorithm | Strengths | Weaknesses | Suitable scenarios |
| --- | --- | --- | --- |
| Fixed window | Simple, predictable for steady traffic | Boundary bursts can briefly double the allowed rate | Coarse per-interval quotas |
| Sliding window | Smoother and more precise | Stores a timestamp per request (more memory) | Fine-grained limits at moderate QPS |
| Leaky bucket | Constant outflow smooths bursts | No headroom for legitimate spikes | Shaping traffic to a fixed processing rate |
| Token bucket | Allows controlled bursts, tunable | Needs careful time handling | General-purpose API limiting |
6. Distributed Rate‑Limiting Solutions
6.1 Centralized Redis‑based Token Bucket
A Lua script stored in Redis implements a token bucket. Each request executes the script to atomically check and consume a token.
-- Token bucket rate-limiting script
local bucket = KEYS[1]
local capacity = tonumber(ARGV[1])
local tokenRate = tonumber(ARGV[2])
local redisTime = redis.call('TIME')
local now = tonumber(redisTime[1])
local tokens, lastRefill = unpack(redis.call('hmget', bucket, 'tokens', 'lastRefill'))
tokens = tonumber(tokens)
lastRefill = tonumber(lastRefill)
if not tokens or not lastRefill then
tokens = capacity; lastRefill = now
else
local tokensToAdd = (now - lastRefill) * tokenRate
tokens = math.min(capacity, tokens + tokensToAdd)
end
if tokens < 1 then
	return 0
else
	redis.call('hmset', bucket, 'tokens', tokens - 1, 'lastRefill', now)
	return 1
end

The Go client executes the script via Do(ctx, "eval", script, 1, key, capacity, tokenRate). Because Redis runs the script atomically, check-and-consume is race-free; however, the centralized store is a performance bottleneck and a single point of failure.
6.2 Load‑Balancer + Local Cache
Requests are evenly distributed by a load balancer (or service discovery). Each instance runs a local token‑bucket limiter, reducing remote calls. Dynamic adjustment can tune thresholds per instance based on CPU/memory usage.
6.3 Coordination Service (ZooKeeper / etcd)
Each server acquires a token by creating a distributed lock on a node representing the bucket. Tokens are added periodically by a background task. This provides global consistency but adds complexity and depends on the coordination service’s performance.
7. Conclusion
There is no universally best rate‑limiting solution; the optimal choice depends on system requirements, existing tech stack, load characteristics, and underlying infrastructure. Understanding each algorithm’s principles and trade‑offs enables informed decisions.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.