Understanding Service Rate Limiting: Algorithms and Distributed Implementation
This article explains why service rate limiting is essential for high‑concurrency systems, compares it with service degradation, introduces common limiting algorithms such as counter, leaky‑bucket and token‑bucket, and shows how to implement distributed rate limiting using Redis for real‑world backend services.
When building high‑concurrency systems, stability is usually ensured through three mechanisms: service degradation, rate limiting, and caching. Among them, rate limiting is the focus of this article.
Difference Between Service Degradation and Rate Limiting
Service degradation shuts down certain interfaces or pages under sudden load to free resources for core tasks.
Rate limiting reduces incoming traffic while keeping the service’s processing capacity unchanged; degradation reduces the service’s capacity and reallocates resources.
Note that the rate limiting discussed here is implemented in business code, not at the Nginx layer.
Why Do We Need Rate Limiting?
From the caller’s perspective, services can be user‑facing (web, external APIs) or internal RPC services. Factors that can cause traffic spikes include rapid user growth, hot events, malicious attacks or scraping, and sudden, unpredictable surges that leave no time to scale out.
Common Rate‑Limiting Algorithms
Three widely used algorithms are reviewed below.
Counter Algorithm
The simplest method counts requests within a fixed time window (e.g., 100 requests per second) and rejects anything over the limit. It has two drawbacks: a burst early in the window exhausts the quota and starves later requests, and a burst straddling a window boundary can briefly pass up to twice the limit, which attackers can exploit.
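A minimal single-process sketch of the fixed-window counter (class and parameter names are illustrative, not from the original article; the clock is injectable only to make the behavior easy to test):

```python
import time

class FixedWindowCounter:
    """Reject requests once the count in the current window exceeds the limit."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock              # injectable clock for deterministic tests
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Because the counter resets abruptly at the boundary, traffic concentrated just before and just after the reset can briefly double the intended rate, which motivates the sliding-window refinement below.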
A classic improvement is the sliding‑window technique, which divides a larger window into smaller sub‑windows to smooth out bursts.
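A runnable sketch of that sliding-window idea, again with illustrative names and an injectable clock: the window is split into sub-windows, and a request is allowed only if the sum over the most recent sub-windows is under the limit.

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Split the window into sub-windows and rate-limit on their rolling sum."""

    def __init__(self, limit, window_seconds, buckets=10, clock=time.monotonic):
        self.limit = limit
        self.bucket_span = window_seconds / buckets   # width of one sub-window
        self.buckets = buckets
        self.clock = clock
        self.counts = deque()                         # (bucket_index, count) pairs

    def allow(self):
        idx = int(self.clock() / self.bucket_span)
        # Evict sub-windows that have slid entirely out of the window.
        while self.counts and self.counts[0][0] <= idx - self.buckets:
            self.counts.popleft()
        if sum(c for _, c in self.counts) >= self.limit:
            return False
        if self.counts and self.counts[-1][0] == idx:
            self.counts[-1] = (idx, self.counts[-1][1] + 1)
        else:
            self.counts.append((idx, 1))
        return True
```

With more sub-windows the boundary effect shrinks, at the cost of a little extra bookkeeping per request.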
Leaky‑Bucket Algorithm
This algorithm models a bucket that drains at a constant rate; excess requests overflow and are dropped. It guarantees a steady processing rate but cannot handle short‑term traffic spikes.
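A sketch of the leaky bucket under the same conventions (illustrative names, injected clock): each request adds one unit of "water", the bucket drains continuously at a fixed rate, and requests that would overflow are dropped.

```python
import time

class LeakyBucket:
    """Requests fill the bucket; it drains at a constant rate; overflow is dropped."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity      # maximum amount of queued "water"
        self.leak_rate = leak_rate    # units drained per second
        self.clock = clock
        self.water = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain in proportion to the time elapsed since the last request.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False
```

Note how the outflow rate is constant regardless of how requests arrive, which is exactly why a legitimate short burst gets rejected once the bucket is full.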
Token‑Bucket Algorithm
A bucket holds a limited number of tokens that are replenished at a fixed rate. A request proceeds only after acquiring a token, allowing short bursts when tokens have accumulated, thus addressing the leaky‑bucket’s limitation.
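The token bucket differs from the leaky bucket in one line of accounting: capacity accumulates when the service is idle, so a burst can spend saved-up tokens. A sketch in the same style (names illustrative, clock injectable):

```python
import time

class TokenBucket:
    """Tokens refill at a fixed rate; each request consumes one token."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)    # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity bounds the burst size while the refill rate bounds the sustained average, which is the property the leaky bucket lacks.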
From Single‑Machine to Distributed Rate Limiting
Single‑machine limiting often cannot satisfy complex business requirements such as limiting a user to three accesses per 10 seconds or 100 accesses per day. Distributed limiting solves this by using a shared counter, typically backed by Redis, which offers native support for distributed operations.
Implementation sketch: for each request, build a Redis key from the user ID and API name, issue an INCR command, and set an expiration on the key the first time it is created to enforce the time window. In production the INCR and EXPIRE should be made atomic (for example, via a Lua script), so a crash between the two commands cannot leave a counter key that never expires.
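That sketch can be expressed as follows. To keep the example self-contained and runnable without a server, a tiny in-memory stand-in replaces the real Redis client; with a real client (e.g., redis-py) the `incr`/`expire` calls would be the same, but should be wrapped in a Lua script or pipeline for atomicity. All names here are illustrative.

```python
import time

class FakeRedis:
    """Minimal in-memory stand-in for a Redis client (incr/expire only),
    used so this sketch runs without a Redis server."""

    def __init__(self, clock=time.monotonic):
        self.store = {}                  # key -> (count, expires_at or None)
        self.clock = clock

    def incr(self, key):
        count, expires_at = self.store.get(key, (0, None))
        if expires_at is not None and self.clock() >= expires_at:
            count, expires_at = 0, None  # the key has expired: start over
        count += 1
        self.store[key] = (count, expires_at)
        return count

    def expire(self, key, seconds):
        count, _ = self.store.get(key, (0, None))
        self.store[key] = (count, self.clock() + seconds)

def allow_request(client, user_id, api, limit, window_seconds):
    """Distributed fixed-window limit: INCR a per-user, per-API key and
    set the expiry only on the first increment, anchoring the window."""
    key = f"ratelimit:{user_id}:{api}"
    count = client.incr(key)
    if count == 1:
        client.expire(key, window_seconds)
    return count <= limit
```

Because every application instance increments the same shared key, the limit holds across the whole cluster, which is what single-machine counters cannot provide.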