Comprehensive Guide to Rate Limiting Algorithms and Distributed Rate Limiting Solutions
This guide explains why rate limiting is essential for micro‑service stability, outlines six design principles, details four classic algorithms—fixed window, sliding window, leaky bucket, and token bucket—and compares centralized Redis, load‑balancer cache, and coordination‑service distributed solutions.
With the rise of micro‑services, service dependencies and call relationships become increasingly complex, making service stability a critical concern. Sudden traffic spikes can cause request timeouts or even server crashes. To protect both the system itself and its upstream/downstream services, rate limiting is commonly applied to quickly reject requests that exceed configured limits, thereby ensuring stability.
A good rate‑limiting design must consider business characteristics and include six key points: multi‑level limiting, dynamic threshold adjustment, flexible dimensions, decoupling from business logic, fault tolerance, and monitoring/alerting.
Rate‑limiting concepts
Two fundamental concepts are:
Threshold – the maximum number of requests allowed per unit time (e.g., 500 QPS).
Reject strategy – how to handle requests that exceed the threshold (e.g., immediate reject or queue).
Rate‑limiting can be implemented as single‑machine or distributed solutions. The four classic algorithms are described below.
1. Fixed Window Limiting
The fixed‑window algorithm divides time into equal windows (e.g., one second) and counts requests within each window. If the count exceeds the limit, further requests are rejected until the window resets.
package main

import (
	"fmt"
	"sync"
	"time"
)

type FixedWindowLimiter struct {
	windowSize  time.Duration // window size
	maxRequests int           // maximum requests per window
	requests    int           // requests seen in the current window
	lastReset   int64         // last window reset time (Unix seconds)
	resetMutex  sync.Mutex    // guards the counter and reset time
}

func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
	return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (limiter *FixedWindowLimiter) AllowRequest() bool {
	limiter.resetMutex.Lock()
	defer limiter.resetMutex.Unlock()
	// Start a new window once the current one has elapsed.
	if time.Now().Unix()-limiter.lastReset >= int64(limiter.windowSize.Seconds()) {
		limiter.requests = 0
		limiter.lastReset = time.Now().Unix()
	}
	if limiter.requests >= limiter.maxRequests {
		return false
	}
	limiter.requests++
	return true
}

func main() {
	limiter := NewFixedWindowLimiter(1*time.Second, 3)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rate-limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: simple to implement, stable for steady traffic, easy rate control.
Disadvantages: cannot absorb short-term bursts well, and traffic clustered around a window boundary can briefly pass at up to twice the configured limit.
2. Sliding Window Limiting
The sliding‑window algorithm improves on fixed windows by continuously moving the window, providing finer granularity and smoother handling of bursts.
package main

import (
	"fmt"
	"sync"
	"time"
)

type SlidingWindowLimiter struct {
	windowSize   time.Duration // window size
	maxRequests  int           // maximum requests per window
	requests     []time.Time   // timestamps of requests inside the window
	requestsLock sync.Mutex    // guards the requests slice
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}

func (limiter *SlidingWindowLimiter) AllowRequest() bool {
	limiter.requestsLock.Lock()
	defer limiter.requestsLock.Unlock()
	now := time.Now()
	// Evict timestamps that have slid out of the window.
	for len(limiter.requests) > 0 && now.Sub(limiter.requests[0]) > limiter.windowSize {
		limiter.requests = limiter.requests[1:]
	}
	if len(limiter.requests) >= limiter.maxRequests {
		return false
	}
	limiter.requests = append(limiter.requests, now)
	return true
}

func main() {
	limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
	for i := 0; i < 15; i++ {
		now := time.Now().Format("15:04:05")
		if limiter.AllowRequest() {
			fmt.Println(now + " request allowed")
		} else {
			fmt.Println(now + " request rate-limited")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Advantages: smooth burst handling, higher precision, and real-time response to traffic changes.
Disadvantages: higher memory consumption (one timestamp is stored per request in the window), more complex implementation.
3. Leaky Bucket Limiting
The leaky‑bucket algorithm models a bucket that fills with incoming requests and leaks at a constant rate, smoothing traffic and preventing overload.
package main

import (
	"fmt"
	"sync"
	"time"
)

type LeakyBucket struct {
	rate       float64    // leak rate, requests per second
	capacity   int        // bucket capacity: max requests it can hold
	water      int        // current water level (requests in the bucket)
	lastLeakMs int64      // last leak timestamp, in milliseconds
	mu         sync.Mutex // guards the fields above
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
	return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
	lb.mu.Lock()
	defer lb.mu.Unlock()
	now := time.Now().UnixMilli()
	elapsed := now - lb.lastLeakMs
	// Leak water at the configured rate for the elapsed time.
	leakAmount := int(float64(elapsed) / 1000 * lb.rate)
	if leakAmount > 0 {
		if leakAmount > lb.water {
			lb.water = 0
		} else {
			lb.water -= leakAmount
		}
		lb.lastLeakMs = now
	}
	// Reject when the bucket is already full.
	if lb.water >= lb.capacity {
		return false
	}
	lb.water++
	return true
}

func main() {
	leakyBucket := NewLeakyBucket(3, 4)
	for i := 1; i <= 15; i++ {
		now := time.Now().Format("15:04:05")
		if leakyBucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rate-limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: simple, effective at smoothing bursts, and prevents downstream overload.
Disadvantages: the fixed outflow rate leaves no headroom for legitimate spikes and can leave capacity idle when traffic is low.
4. Token Bucket Limiting
The token‑bucket algorithm adds tokens to a bucket at a fixed rate; each request consumes a token. If the bucket is empty, the request is rejected.
package main

import (
	"fmt"
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64    // token generation rate, tokens per second
	capacity   float64    // maximum bucket capacity
	tokens     float64    // current number of tokens
	lastUpdate time.Time  // last refill time
	mu         sync.Mutex // guards the fields above
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	// Refill tokens for the time elapsed since the last update.
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	if tb.tokens > tb.capacity {
		tb.tokens = tb.capacity
	}
	// Advance lastUpdate even on rejection, so elapsed time is never counted twice.
	tb.lastUpdate = now
	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(2.0, 3.0)
	for i := 1; i <= 10; i++ {
		now := time.Now().Format("15:04:05")
		if bucket.Allow() {
			fmt.Printf(now+" request %d allowed\n", i)
		} else {
			fmt.Printf(now+" request %d rate-limited\n", i)
		}
		time.Sleep(200 * time.Millisecond)
	}
}

Advantages: smooths bursts while still allowing short spikes; flexible via the token rate and bucket size.
Disadvantages: more complex to implement, requires precise time handling, and tokens may sit unused during idle periods.
5. Comparison of the Four Algorithms
The table below summarizes the four algorithms:

| Algorithm | Strengths | Weaknesses | Suitable scenarios |
| --- | --- | --- | --- |
| Fixed window | Simple, predictable for steady traffic | Boundary bursts can briefly double the allowed rate | Coarse per-interval quotas |
| Sliding window | Smoother and more precise | Stores a timestamp per request (more memory) | Fine-grained limits at moderate QPS |
| Leaky bucket | Constant outflow smooths bursts | No headroom for legitimate spikes | Shaping traffic to a fixed processing rate |
| Token bucket | Allows controlled bursts, tunable | Needs careful time handling | General-purpose API limiting |
6. Distributed Rate‑Limiting Solutions
6.1 Centralized Redis‑based Token Bucket
A Lua script stored in Redis implements a token bucket. Each request executes the script to atomically check and consume a token.
-- Token bucket rate-limiting script
local bucket = KEYS[1]
local capacity = tonumber(ARGV[1])
local tokenRate = tonumber(ARGV[2])
local redisTime = redis.call('TIME')
local now = tonumber(redisTime[1])
local tokens, lastRefill = unpack(redis.call('hmget', bucket, 'tokens', 'lastRefill'))
tokens = tonumber(tokens)
lastRefill = tonumber(lastRefill)
if not tokens or not lastRefill then
tokens = capacity; lastRefill = now
else
local tokensToAdd = (now - lastRefill) * tokenRate
tokens = math.min(capacity, tokens + tokensToAdd)
end
if tokens < 1 then
	return 0
else
	redis.call('hmset', bucket, 'tokens', tokens - 1, 'lastRefill', now)
	return 1
end

The Go client executes the script via Do(ctx, "eval", script, 1, key, capacity, tokenRate). Because Redis runs the script atomically, check-and-consume is race-free; however, the centralized store is a performance bottleneck and a single point of failure.
6.2 Load‑Balancer + Local Cache
Requests are evenly distributed by a load balancer (or service discovery). Each instance runs a local token‑bucket limiter, reducing remote calls. Dynamic adjustment can tune thresholds per instance based on CPU/memory usage.
6.3 Coordination Service (ZooKeeper / etcd)
Each server acquires a token by creating a distributed lock on a node representing the bucket. Tokens are added periodically by a background task. This provides global consistency but adds complexity and depends on the coordination service’s performance.
7. Conclusion
There is no universally best rate‑limiting solution; the optimal choice depends on system requirements, existing tech stack, load characteristics, and underlying infrastructure. Understanding each algorithm’s principles and trade‑offs enables informed decisions.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.