Understanding Rate Limiting: Counter, Sliding Window, Leaky Bucket, and Token Bucket Algorithms
The article introduces the concept of rate limiting, explains why it is needed both offline (e.g., crowded theme parks) and online (e.g., flash sales), and details four common algorithms—counter, sliding window, leaky bucket, and token bucket—along with their implementation considerations and trade‑offs.
During the National Day holiday the author visited Shanghai Disneyland with his girlfriend, only to encounter massive crowds that made the experience chaotic, illustrating the real‑world problem of uncontrolled traffic.
Just as a theme park needs to limit the number of visitors, online systems must enforce flow control to protect core services during traffic spikes such as Spring Festival train ticket rushes, Double‑11 shopping festivals, or flash sales.
🌴 What Is Rate Limiting
Definition: Rate limiting caps the number of requests a system accepts within a given time window (or concurrently), ensuring it can still respond to a subset of users while rejecting excess traffic to keep the system as a whole available.
🌴 Counter Rate Limiting
This simple algorithm increments a counter for each incoming request within a fixed time window and compares the value against a threshold; if the limit is exceeded, the request is rejected, and the counter resets when a new window begins. It suffers from a "critical point" problem at window boundaries: a burst that straddles the end of one window and the start of the next can briefly admit up to twice the configured limit.
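The fixed-window counter can be sketched in a few lines. This is a minimal illustration, not code from the article; the class and method names are invented, and time is passed in explicitly to keep the logic deterministic and testable.

```java
// Minimal sketch of a fixed-window counter limiter (names are illustrative).
class CounterLimiter {
    private final int limit;          // max requests allowed per window
    private final long windowMillis;  // window length in milliseconds
    private long windowStart;         // start time of the current window
    private int count = 0;            // requests seen in the current window

    CounterLimiter(int limit, long windowMillis, long nowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = nowMillis;
    }

    // Returns true if the request is admitted, false if it is rejected.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // A new window has begun: reset the counter.
            windowStart = nowMillis;
            count = 0;
        }
        return ++count <= limit;
    }
}
```

Note how the hard reset is exactly what causes the critical-point issue: requests arriving just before and just after `windowStart` rolls over are counted in different windows, so back-to-back bursts can slip through.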
🌴 Sliding Window Rate Limiting
To solve the counter’s precision problem, the sliding window divides the time interval into smaller slots (e.g., 1‑second windows within a 5‑second period) and sums the counts of the most recent slots, providing smoother and more accurate throttling.
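A sliding window can be sketched as a ring of per-slot counters whose recent slots are summed on each request. This is an assumed implementation for illustration; the class name, slot bookkeeping, and explicit time parameter are my own choices, not the article's.

```java
// Sketch of a sliding-window limiter using a ring buffer of slot counters.
class SlidingWindowLimiter {
    private final int limit;       // max requests across the whole window
    private final long slotMillis; // length of one slot in milliseconds
    private final int slotCount;   // number of slots making up the window
    private final int[] counts;    // requests counted per slot bucket
    private final long[] slotIds;  // which absolute slot each bucket holds

    SlidingWindowLimiter(int limit, long slotMillis, int slotCount) {
        this.limit = limit;
        this.slotMillis = slotMillis;
        this.slotCount = slotCount;
        this.counts = new int[slotCount];
        this.slotIds = new long[slotCount];
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long slot = nowMillis / slotMillis;     // absolute slot index
        int idx = (int) (slot % slotCount);     // ring-buffer position
        if (slotIds[idx] != slot) {
            // This bucket still holds a stale slot: recycle it.
            slotIds[idx] = slot;
            counts[idx] = 0;
        }
        int total = 0;
        for (int i = 0; i < slotCount; i++) {
            // Sum only slots that fall inside the current window.
            if (slot - slotIds[i] < slotCount) total += counts[i];
        }
        if (total + 1 > limit) return false;
        counts[idx]++;
        return true;
    }
}
```

With 1-second slots and a 5-slot window, a burst at the boundary is still counted against the previous four seconds, which is exactly the precision the fixed-window counter lacks.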
🌴 Leaky Bucket Rate Limiting
A leaky bucket placed between the traffic producer and consumer buffers incoming requests; excess bursts are dropped, and the bucket releases requests at a fixed rate, smoothing traffic and preventing sudden overloads.
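The leaky bucket's behavior can be modeled with a "water level" that drains at a fixed rate; a request is dropped if adding it would overflow the bucket. This is a hedged sketch under assumed names and units (capacity in requests, leak rate in requests per second), not the article's code.

```java
// Sketch of a leaky-bucket limiter: fixed outflow, overflow is dropped.
class LeakyBucket {
    private final int capacity;        // max requests the bucket can hold
    private final double leakPerSecond; // fixed drain (processing) rate
    private double water = 0;          // current queue depth
    private long lastLeakMillis;       // when we last drained the bucket

    LeakyBucket(int capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastLeakMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drain water in proportion to the elapsed time.
        double leaked = (nowMillis - lastLeakMillis) * leakPerSecond / 1000.0;
        water = Math.max(0, water - leaked);
        lastLeakMillis = nowMillis;
        if (water + 1 > capacity) return false; // bucket full: drop request
        water += 1;
        return true;
    }
}
```

Because the outflow rate is constant regardless of how fast requests arrive, downstream consumers always see smooth traffic; the trade-off is that legitimate bursts above the drain rate are simply discarded once the bucket fills.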
🌴 Token Bucket Rate Limiting
Tokens are added to a bucket at a constant rate; each request must consume a token before proceeding. If the bucket is empty, the request is delayed or rejected. This algorithm allows short bursts (up to the bucket's capacity) while enforcing an average rate, and can be implemented with in‑memory counters, Redis, or libraries such as Guava's RateLimiter.
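A token bucket can be refilled lazily, crediting tokens based on elapsed time whenever a request arrives. The sketch below is an assumed minimal implementation (invented names, refill rate in tokens per second); for production use, Guava's `RateLimiter.create(permitsPerSecond)` with `tryAcquire()` offers a battle-tested variant of this idea.

```java
// Sketch of a token-bucket limiter with lazy refill on each request.
class TokenBucket {
    private final int capacity;           // max tokens, i.e. allowed burst size
    private final double refillPerSecond; // steady token refill rate
    private double tokens;                // tokens currently available
    private long lastRefillMillis;        // when tokens were last credited

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Credit tokens for the time elapsed, capped at the bucket capacity.
        double refill = (nowMillis - lastRefillMillis) * refillPerSecond / 1000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillMillis = nowMillis;
        if (tokens < 1) return false; // no token available: reject
        tokens -= 1;
        return true;
    }
}
```

The key contrast with the leaky bucket: tokens accumulate while traffic is quiet, so a burst can spend them all at once, whereas the leaky bucket never lets the outflow exceed its fixed drain rate.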
In practice, choosing the right algorithm and configuring limits (static or dynamic) requires careful testing and monitoring to balance protection and user experience.