Understanding Service Rate Limiting: Concepts, Algorithms, and Distributed Implementation
The article explains why service rate limiting is essential for high‑concurrency systems, compares it with service degradation, describes common algorithms such as counter, leaky bucket and token bucket, and shows how to implement distributed rate limiting using Redis.
When building high‑concurrency systems, three main techniques are used to keep services stable: service degradation, rate limiting, and caching. This article focuses on service rate limiting, distinguishing it from service degradation.
Service degradation is the practice of disabling certain interfaces or pages under sudden server pressure to free resources for core tasks.
Rate limiting, by contrast, caps the volume of incoming traffic so that it never exceeds the service's processing capacity; unlike degradation, it does not reduce the service's capability, only the number of requests it admits.
The article clarifies that the discussed rate limiting is implemented in business code rather than at the Nginx layer.
Why Do We Need Service Rate Limiting?
From the caller's perspective, services fall into two categories. User‑facing services (web pages, external APIs) may face rapid user growth, hot events, malicious attacks, or crawler traffic. Internal RPC services can be overloaded when a traffic surge in one upstream service cascades to the services it depends on.
Common Rate‑Limiting Algorithms
The article reviews three widely used algorithms: counter algorithm, leaky bucket algorithm, and token bucket algorithm.
Counter Algorithm
The simplest approach counts requests within a fixed time window and rejects those that exceed a predefined threshold (e.g., 100 QPS). Its weakness is the window boundary: if a full threshold's worth of requests arrives at the end of one window and another full batch at the start of the next, up to twice the limit passes within a short span. A sliding window mitigates this by dividing the large window into smaller sub‑windows and counting over the sub‑windows the sliding window currently covers.
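The sliding sub‑window idea can be sketched in a few lines. This is a minimal single‑process illustration, not code from the article; the class and parameter names are made up, and the caller passes a monotonic timestamp explicitly so the logic is easy to test.

```python
class SlidingWindowCounter:
    """Sliding-window counter: at most `limit` requests per `window` seconds,
    tracked across `buckets` sub-windows."""

    def __init__(self, limit, window=1.0, buckets=10):
        self.limit = limit
        self.buckets = buckets
        self.bucket_span = window / buckets
        self.counts = {}  # sub-window index -> request count

    def allow(self, now):
        current = int(now / self.bucket_span)
        oldest = current - self.buckets + 1
        # Drop sub-windows that have slid out of the large window.
        self.counts = {b: c for b, c in self.counts.items() if b >= oldest}
        if sum(self.counts.values()) < self.limit:
            self.counts[current] = self.counts.get(current, 0) + 1
            return True
        return False
```

With more sub-windows the boundary burst shrinks, at the cost of a little extra bookkeeping; a single sub-window degenerates to the plain counter.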
Leaky Bucket Algorithm
This classic algorithm models a bucket that leaks at a constant rate; excess requests overflow and are rejected. It ensures a steady processing rate but cannot handle short‑term traffic spikes.
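A leaky bucket can be modeled without an actual timer by "leaking" lazily based on elapsed time. This is an illustrative single‑process sketch with made‑up names; timestamps are passed in by the caller.

```python
class LeakyBucket:
    """Leaky bucket: requests fill the bucket up to `capacity`;
    it drains at a constant `rate` (requests per second)."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.water = 0.0  # current bucket level
        self.last = 0.0   # timestamp of the previous call

    def allow(self, now):
        # Leak water in proportion to elapsed time.
        self.water = max(0.0, self.water - (now - self.last) * self.rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False  # bucket would overflow: reject
```

Note the constant drain rate: even if the bucket has room, admitted requests are processed at `rate`, which is exactly why short spikes cannot be absorbed faster.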
Token Bucket Algorithm
A bucket holds a fixed number of tokens that are replenished at a constant rate. Each request must acquire a token; if none are available, the request is delayed or rejected. When the bucket is full, excess tokens are discarded, allowing bursts up to the current token count. This algorithm combines the steady rate of the leaky bucket with the ability to absorb short bursts.
One implementation keeps tokens in a queue and uses a scheduled task to generate tokens at the configured rate; each incoming request must take a token from the queue before proceeding. A common alternative computes the refill lazily from elapsed time, avoiding the background thread entirely.
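The lazy-refill variant can be sketched as follows. This is an illustrative single‑process example (names are made up, and the caller supplies timestamps); it is equivalent to the queue-plus-scheduler design but needs no background thread.

```python
class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`;
    each request consumes one token or is rejected."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = 0.0                # timestamp of the previous call

    def allow(self, now):
        # Refill lazily based on elapsed time; excess tokens are discarded
        # by capping at capacity, matching the "bucket is full" rule.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting with a full bucket is what lets the token bucket absorb a burst up to `capacity` immediately, which the leaky bucket cannot do.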
Distributed Rate Limiting
Single‑machine rate limiting cannot satisfy complex business requirements such as limiting a user to three accesses per 10 seconds or 100 accesses per day. Distributed rate limiting solves this by using a shared counter, typically backed by Redis, which offers native support for distributed operations.
Each request atomically increments a Redis key (e.g., composed of user ID and API name) whose expiration defines the window; if the counter exceeds the limit before the key expires, the request is rejected. Because every node in the cluster increments the same key, the limit is enforced globally.
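The increment-with-expiry pattern can be sketched as below. To keep the example self-contained and testable, a tiny in-memory class stands in for Redis; with a real client the same logic maps onto Redis's INCR and EXPIRE commands (ideally wrapped in a Lua script so the two steps are atomic). The key pattern, function names, and parameters here are illustrative assumptions, not from the article.

```python
class FakeRedis:
    """In-memory stand-in for Redis so the sketch runs without a server.
    Mimics INCR on a key that expires `ttl` seconds after its first write."""

    def __init__(self):
        self.store = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key, ttl, now):
        count, expires = self.store.get(key, (0, now + ttl))
        if now >= expires:            # key expired: a new window begins
            count, expires = 0, now + ttl
        count += 1
        self.store[key] = (count, expires)
        return count


def allow_request(client, user_id, api, limit, window_s, now):
    # Hypothetical key pattern: one counter per user per API per window.
    key = f"rate:{user_id}:{api}"
    return client.incr_with_ttl(key, window_s, now) <= limit
```

Rules like "three accesses per 10 seconds" or "100 accesses per day" then differ only in the `limit` and `window_s` chosen for the key.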
IT Architects Alliance