
Rate Limiting Strategies and Algorithms for High‑Concurrency Services

The article explains the need for rate limiting in high‑traffic scenarios, describes common algorithms such as token‑bucket and Guava RateLimiter, compares single‑node and distributed implementations, and outlines a Redis‑based solution for precise flow control across multiple service instances.


When a massive traffic event like a Double‑Eleven coupon grab occurs, the number of user requests can far exceed the service interface's capacity (e.g., 500 requests/s), leading to overload and dropped requests. To maintain service availability, the request rate must be limited.

What is rate limiting? It is the control of inbound and outbound traffic to prevent resource exhaustion and system instability. A rate‑limiting system typically provides two functions: a limiting strategy and a circuit‑breaker strategy, though this article focuses on the limiting strategy.

Rate limiting algorithms

Instant concurrency limit: Guava's RateLimiter implements the token‑bucket algorithm, offering SmoothBursty and SmoothWarmingUp modes.

Window‑based request limit: restricts the maximum number of requests within a time window (per second, minute, or day). This can be implemented by storing counters with expiration, e.g., using a Guava cache with a 2‑second TTL.
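The windowed counter described above can be sketched in plain Java. This is a minimal illustration, not the article's code: the class and method names are invented here, and the map of per‑window counters stands in for a cache with TTL‑based expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal fixed-window limiter sketch: counters are keyed by the current
// time window, and stale windows are dropped on each call (simulating TTL).
class FixedWindowLimiter {
    private final int maxRequests;    // allowed requests per window
    private final long windowMillis;  // window length in milliseconds
    private final Map<Long, AtomicInteger> counters = new ConcurrentHashMap<>();

    FixedWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    boolean tryAcquire() {
        long window = System.currentTimeMillis() / windowMillis;
        counters.keySet().removeIf(w -> w < window);  // expire old windows
        return counters.computeIfAbsent(window, w -> new AtomicInteger())
                       .incrementAndGet() <= maxRequests;
    }
}
```

Note the well‑known caveat of fixed windows: a client can send up to 2× the limit straddling a window boundary, which is one reason the token bucket below is often preferred.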

Token bucket: if the configured average rate is r, a token is added to the bucket every 1/r seconds. The bucket holds at most b tokens; tokens that arrive when the bucket is full are discarded. Incoming traffic at rate v consumes tokens: if a token is available the request passes, otherwise it is rejected (or handed to circuit‑breaker logic).

Properties of the token‑bucket algorithm:

Long‑term throughput stabilizes at the token‑addition rate r.

It can absorb traffic bursts due to its buffer capacity.

Additional formulas: if M is the maximum possible transmission rate (bytes/s) and M > r, then T_max = b/(M − r) is the longest time the system can sustain a burst at rate M, and B_max = T_max × M is the total data transferred during that burst.

Advantages: Traffic is smoothed and the algorithm can tolerate bursty loads.
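The token‑bucket behavior described above can be sketched in a few lines of Java. This is an illustrative implementation under stated assumptions (lazy refill on each call, bucket starts full), not the article's or Guava's actual code.

```java
// Minimal token-bucket sketch: tokens accrue at rate r per second up to
// capacity b; a request passes only if at least one token is available.
class TokenBucket {
    private final double rate;      // r: tokens added per second
    private final double capacity;  // b: maximum tokens the bucket holds
    private double tokens;          // current token count
    private long lastRefillNanos;

    TokenBucket(double rate, double capacity) {
        this.rate = rate;
        this.capacity = capacity;
        this.tokens = capacity;     // start full so an initial burst passes
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill lazily: add r tokens per elapsed second, capped at b.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * rate);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;            // token available: request passes
        }
        return false;               // bucket empty: request rejected
    }
}
```

Starting with a full bucket is what lets the algorithm absorb bursts up to b requests while long‑term throughput still converges to r.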

Implementation options

Guava's RateLimiter class (single‑node only; it is not suitable for distributed systems, where the cluster‑wide QPS would be the per‑node QPS multiplied by the number of nodes).

Redis‑based solution: store two keys, one for timing and one for counting. Each request increments the counter; if the counter stays below the threshold while the timer key is alive, the request is processed. Because both keys are shared by every instance, this approach enforces a precise global limit across the cluster.
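The two‑key control flow can be sketched as follows. This is a simulation only: an in‑memory deadline and counter stand in for the Redis timer key (EXPIRE) and counter key (INCR), and the class and method names are illustrative. In production both keys would live in Redis so that all service instances share them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the two-key scheme: a "timer" (window deadline, standing in for
// a Redis key with a TTL) and a "counter" (standing in for INCR on a key).
class RedisStyleLimiter {
    private final int threshold;      // max requests per timer window
    private final long windowMillis;  // the timer key's TTL
    private final AtomicLong counter = new AtomicLong();  // the "count" key
    private volatile long windowDeadline;                 // the "timer" key

    RedisStyleLimiter(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.windowDeadline = System.currentTimeMillis() + windowMillis;
    }

    synchronized boolean allow() {
        long now = System.currentTimeMillis();
        if (now > windowDeadline) {   // timer key expired: reset both keys
            windowDeadline = now + windowMillis;
            counter.set(0);
        }
        // INCR the counter, then compare against the threshold.
        return counter.incrementAndGet() <= threshold;
    }
}
```

With real Redis, the increment‑and‑compare step should be made atomic (e.g., via a Lua script or MULTI/EXEC) so that concurrent instances cannot race past the threshold.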

See the references below for a Redis‑based rate‑limiting design and example code repositories.

References

Design of a Redis‑based rate limiting system: https://www.zybuluo.com/kay2/note/949160

Sample implementations: https://github.com/wukq/rate-limiter

Source: http://www.54tianzhisheng.cn/2017/11/18/flow-control/

Tags: Backend, Redis, Guava, Rate Limiting, Distributed Systems, Token Bucket
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
