
Mastering Rate Limiting: Algorithms, Pros, Cons, and Distributed Solutions

This article explores why rate limiting is essential for high‑concurrency services, introduces four core algorithms with Go implementations, compares their strengths and weaknesses, and presents practical distributed limiting strategies using Redis, load balancers, and coordination services.

Sanyou's Java Diary

Introduction

With the rise of microservices, dependencies and call relationships become increasingly complex, making service stability crucial. Sudden traffic spikes can cause timeouts or server crashes, so rate limiting is applied to quickly reject requests that exceed configured limits, protecting both the system and its upstream/downstream services.

Key Design Principles

Multi‑level limiting: Apply limits at the application, service, and data layers.

Dynamic thresholds: Adjust limits based on real‑time load.

Flexible dimensions: Limit by interface, device, IP, account ID, or behavior pattern.

Decoupling: Keep limiting as a separate service from business logic.

Fault tolerance: Provide a fallback (circuit breaking, degradation) when the limiter itself fails.

Monitoring & alerts: Monitor in real time and raise an alarm when limits are triggered.

Background

Three main tools protect high‑concurrency services: caching, degradation, and rate limiting.

Rate limiting controls the request rate to prevent overload when resources are limited.

Fundamental Concepts

A threshold defines the allowed number of requests per unit time (e.g., 500 QPS). A reject strategy determines how to handle excess requests, such as immediate rejection or queuing.

Rate limiting can be single‑machine or distributed. Single‑machine algorithms include fixed window, sliding window, leaky bucket, and token bucket.

Basic Algorithms

1. Fixed Window

Divides time into fixed windows and counts requests within each window.

<code>package main

import (
    "fmt"
    "sync"
    "time"
)

type FixedWindowLimiter struct {
    windowSize  time.Duration // window size
    maxRequests int           // max requests per window
    requests    int           // current count
    lastReset   int64         // last reset timestamp (Unix seconds)
    resetMutex  sync.Mutex    // guards requests and lastReset
}

func NewFixedWindowLimiter(windowSize time.Duration, maxRequests int) *FixedWindowLimiter {
    return &FixedWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, lastReset: time.Now().Unix()}
}

func (l *FixedWindowLimiter) AllowRequest() bool {
    l.resetMutex.Lock()
    defer l.resetMutex.Unlock()
    if time.Now().Unix()-l.lastReset >= int64(l.windowSize.Seconds()) {
        l.requests = 0
        l.lastReset = time.Now().Unix()
    }
    if l.requests >= l.maxRequests {
        return false
    }
    l.requests++
    return true
}

func main() {
    limiter := NewFixedWindowLimiter(1*time.Second, 3)
    for i := 0; i < 15; i++ {
        now := time.Now().Format("15:04:05")
        if limiter.AllowRequest() {
            fmt.Println(now + " request passed")
        } else {
            fmt.Println(now + " request throttled")
        }
        time.Sleep(100 * time.Millisecond)
    }
}
</code>

Advantages: Simple to implement, stable for steady traffic, easy rate control.

Disadvantages: Handles burst traffic poorly; requests clustered around a window boundary can briefly pass at up to twice the configured rate.

2. Sliding Window

Improves fixed window by sliding the window continuously, providing finer granularity.

<code>package main

import (
    "fmt"
    "sync"
    "time"
)

type SlidingWindowLimiter struct {
    windowSize   time.Duration // window size
    maxRequests  int           // max requests within the window
    requests     []time.Time   // timestamps of accepted requests
    requestsLock sync.Mutex    // guards requests
}

func NewSlidingWindowLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowLimiter {
    return &SlidingWindowLimiter{windowSize: windowSize, maxRequests: maxRequests, requests: make([]time.Time, 0)}
}

func (l *SlidingWindowLimiter) AllowRequest() bool {
    l.requestsLock.Lock()
    defer l.requestsLock.Unlock()
    now := time.Now()
    for len(l.requests) > 0 && now.Sub(l.requests[0]) > l.windowSize {
        l.requests = l.requests[1:]
    }
    if len(l.requests) >= l.maxRequests {
        return false
    }
    l.requests = append(l.requests, now)
    return true
}

func main() {
    limiter := NewSlidingWindowLimiter(500*time.Millisecond, 2)
    for i := 0; i < 15; i++ {
        now := time.Now().Format("15:04:05")
        if limiter.AllowRequest() {
            fmt.Println(now + " request passed")
        } else {
            fmt.Println(now + " request throttled")
        }
        time.Sleep(100 * time.Millisecond)
    }
}
</code>

Advantages: Handles bursts more smoothly and avoids the fixed window's boundary problem; higher precision.

Disadvantages: Stores a timestamp per accepted request, so memory use grows with the limit; the implementation is more complex.

3. Leaky Bucket

Models a bucket that leaks at a constant rate; excess requests are dropped.

<code>package main

import (
    "fmt"
    "sync"
    "time"
)

type LeakyBucket struct {
    rate       float64    // leak rate (requests per second)
    capacity   int        // bucket capacity
    water      int        // current water level (pending requests)
    lastLeakMs int64      // last leak timestamp in milliseconds
    mu         sync.Mutex // guards water and lastLeakMs
}

func NewLeakyBucket(rate float64, capacity int) *LeakyBucket {
    return &LeakyBucket{rate: rate, capacity: capacity, water: 0, lastLeakMs: time.Now().UnixMilli()}
}

func (lb *LeakyBucket) Allow() bool {
    lb.mu.Lock()
    defer lb.mu.Unlock()
    now := time.Now().UnixMilli()
    // Leak water in proportion to the time elapsed since the last leak.
    elapsed := now - lb.lastLeakMs
    leakAmount := int(float64(elapsed) / 1000 * lb.rate)
    if leakAmount > 0 {
        if leakAmount > lb.water {
            lb.water = 0
        } else {
            lb.water -= leakAmount
        }
        lb.lastLeakMs = now
    }
    // Reject when the bucket is full; otherwise admit the request.
    if lb.water >= lb.capacity {
        return false
    }
    lb.water++
    return true
}

func main() {
    bucket := NewLeakyBucket(3, 4)
    for i := 1; i <= 15; i++ {
        now := time.Now().Format("15:04:05")
        if bucket.Allow() {
            fmt.Printf(now+"  request %d passed\n", i)
        } else {
            fmt.Printf(now+"  request %d throttled\n", i)
        }
        time.Sleep(200 * time.Millisecond)
    }
}
</code>

Advantages: Smooths burst traffic into a constant outflow, simple to implement, prevents downstream overload.

Disadvantages: The outflow rate is fixed, so even short legitimate bursts are delayed or dropped.

4. Token Bucket

Tokens are added at a steady rate; each request consumes a token, allowing bursts up to the bucket capacity.

<code>package main

import (
    "fmt"
    "sync"
    "time"
)

type TokenBucket struct {
    rate       float64    // tokens added per second
    capacity   float64    // maximum number of tokens
    tokens     float64    // current token count
    lastUpdate time.Time  // last refill timestamp
    mu         sync.Mutex // guards tokens and lastUpdate
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
    return &TokenBucket{rate: rate, capacity: capacity, tokens: capacity, lastUpdate: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()
    // Refill tokens based on the time elapsed since the last update.
    now := time.Now()
    elapsed := now.Sub(tb.lastUpdate).Seconds()
    tb.tokens += elapsed * tb.rate
    if tb.tokens > tb.capacity {
        tb.tokens = tb.capacity
    }
    // Advance lastUpdate unconditionally so elapsed time is never counted twice.
    tb.lastUpdate = now
    if tb.tokens >= 1.0 {
        tb.tokens--
        return true
    }
    return false
}

func main() {
    bucket := NewTokenBucket(2.0, 3.0)
    for i := 1; i <= 10; i++ {
        now := time.Now().Format("15:04:05")
        if bucket.Allow() {
            fmt.Printf(now+"  request %d passed\n", i)
        } else {
            fmt.Printf(now+"  request %d throttled\n", i)
        }
        time.Sleep(200 * time.Millisecond)
    }
}
</code>

Advantages: Absorbs bursts up to the bucket capacity, the refill rate is easy to adjust, adapts to varying traffic.

Disadvantages: Slightly more complex to implement and requires careful time accounting.

Distributed Rate Limiting

1. Centralized (Redis) Token Bucket

Use a Redis instance to store token bucket state and execute a Lua script atomically.

<code>// RedisClient abstracts the minimal Redis operation the limiter needs;
// redis.Int and redis.GetPoolByName used below come from the author's
// Redis client library.
type RedisClient interface {
    Do(ctx context.Context, commandName string, args ...interface{}) (interface{}, error)
}

type Client struct {
    client RedisClient // Redis operations
    script string      // Lua token-bucket script
}

func NewBucketClient(redis RedisClient) *Client {
    return &Client{client: redis, script: `
        -- Token bucket script
        local bucket = KEYS[1]
        local capacity = tonumber(ARGV[1])
        local tokenRate = tonumber(ARGV[2])
        local redisTime = redis.call('TIME')
        local now = tonumber(redisTime[1])
        local tokens, lastRefill = unpack(redis.call('hmget', bucket, 'tokens', 'lastRefill'))
        tokens = tonumber(tokens)
        lastRefill = tonumber(lastRefill)
        if not tokens or not lastRefill then
            tokens = capacity
            lastRefill = now
        else
            local intervalsSinceLast = (now - lastRefill) * tokenRate
            tokens = math.min(capacity, tokens + intervalsSinceLast)
        end
        if tokens < 1 then
            return 0
        else
            redis.call('hmset', bucket, 'tokens', tokens - 1, 'lastRefill', now)
            return 1
        end
    `}
}

func (c *Client) isAllowed(ctx context.Context, key string, capacity, tokenRate int64) (bool, error) {
    result, err := redis.Int(c.client.Do(ctx, "eval", c.script, 1, key, capacity, tokenRate))
    if err != nil {
        return false, err
    }
    return result == 1, nil
}

func main() {
    c := NewBucketClient(redis.GetPoolByName("redis://127.0.0.1:6379"))
    var wg sync.WaitGroup
    wg.Add(120)
    var count atomic.Int64
    for i := 0; i < 120; i++ {
        go func(i int) {
            defer wg.Done()
            ok, _ := c.isAllowed(context.Background(), "test", 100, 10)
            if ok {
                count.Add(1)
            }
            fmt.Printf("go %d status:%v\n", i, ok)
        }(i)
    }
    wg.Wait()
    fmt.Printf("allowed %d\n", count.Load())
}
</code>

Issues : Redis can become a performance bottleneck or single point of failure; network bandwidth and latency affect throughput.

2. Load‑Balancer Based Approach

Distribute requests evenly with a load balancer, then apply local token‑bucket limiting on each instance. Dynamically adjust per‑instance thresholds based on CPU/memory usage.

Issues : Requires reliable local cache, may suffer from uneven global limiting precision, and the load balancer itself can be a single point of failure.

3. Coordination‑Service Approach (ZooKeeper/etcd)

Store the token count in a coordination service. Each instance acquires a lock to decrement the count for a request and increments it after processing. A background task replenishes tokens.

Issues : Implementation complexity and high demand on the coordination service; if the service cannot handle the request volume, it becomes a bottleneck.

Conclusion

There is no universally best rate‑limiting solution; the optimal choice depends on system requirements, existing technology stack, load characteristics, and underlying infrastructure performance. Understanding each algorithm’s behavior and trade‑offs enables architects to combine rate limiting with other techniques such as caching, degradation, load balancing, and asynchronous processing to build resilient, high‑performance services.

Tags: distributed systems · Performance · Algorithm · microservices · Golang · rate limiting
Written by Sanyou's Java Diary: passionate about technology, though not great at solving problems; eager to share, never tiring of learning!