
Comprehensive Guide to Rate Limiting: Concepts, Algorithms, and Implementations

This article explains the principles and practical implementations of rate limiting in backend systems: real‑world scenarios; strategies such as circuit breaking, service degradation, delayed handling, and privileged handling; common algorithms including the counter, leaky bucket, and token bucket; and code examples using Guava and Nginx + Lua.


Rate Limiting Overview

Rate limiting is used to control traffic flow in both physical venues (e.g., tourist attractions) and online services, ensuring system availability by restricting the number of concurrent users or requests.

Limiting Strategies

Circuit Breaker

When a system encounters unrecoverable errors, a circuit breaker automatically rejects incoming traffic to prevent overload. Tools such as Hystrix and Alibaba Sentinel provide implementations.
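To make the mechanism concrete, here is a minimal circuit-breaker sketch; all names are hypothetical, and production systems should use Hystrix or Sentinel, which add half-open probing, rolling error-rate windows, and metrics.

```java
// Minimal circuit-breaker sketch (illustration only; names are hypothetical).
public class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before tripping
    private final long openMillis;        // how long to reject traffic once tripped
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    public SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns false while the breaker is open, i.e. traffic is rejected. */
    public synchronized boolean allowRequest() {
        if (openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis) {
            return false;                 // open: reject immediately
        }
        return true;                      // closed (or retry window reached)
    }

    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;                    // recover: close the breaker
    }

    public synchronized void recordFailure() {
        if (++consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis();  // trip the breaker
        }
    }
}
```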

Service Degradation

Non‑critical functionalities are temporarily disabled during traffic spikes, freeing resources for core services. Examples include disabling comments or points in e‑commerce platforms.

Delay Handling

Requests are buffered in a queue (leaky‑bucket concept) and processed sequentially, reducing immediate pressure on backend services.
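A sketch of this buffering idea, assuming a bounded in-memory queue (names are illustrative): producers enqueue without blocking and are rejected when the buffer is full, while a worker drains at its own pace.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical delay-handling sketch: requests are buffered in a bounded queue
// and drained by a worker sequentially, instead of hitting the backend at once.
public class DelayedProcessing {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

    /** Non-blocking enqueue: returns false when the buffer is full (request rejected). */
    public boolean submit(String request) {
        return queue.offer(request);
    }

    /** Worker side: blocks until a request is available, then hands it over. */
    public String takeNext() throws InterruptedException {
        return queue.take();
    }
}
```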

Privilege Handling

Requests are classified, giving priority to high‑value users while delaying or rejecting others.
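One way to sketch this classification, assuming a priority queue (all names here are illustrative): each request carries a priority and the queue serves high-value users first.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical privileged-handling sketch: requests are classified by priority
// and a priority queue serves high-value users first.
public class PrioritizedRequests {
    public static class Request {
        final String user;
        final int priority;            // lower value = served sooner

        public Request(String user, int priority) {
            this.user = user;
            this.priority = priority;
        }
    }

    private final PriorityBlockingQueue<Request> queue =
            new PriorityBlockingQueue<>(16, Comparator.comparingInt((Request r) -> r.priority));

    public void submit(Request r) { queue.offer(r); }          // classify and enqueue

    public Request next() throws InterruptedException {
        return queue.take();                                   // highest priority first
    }
}
```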

Differences Between Cache, Degradation, and Rate Limiting

Cache improves throughput, degradation shields the system when components fail, and rate limiting caps request rates when cache and degradation are insufficient.

Rate Limiting Algorithms

Counter Algorithm

A simple approach that limits the number of requests within a fixed time window (e.g., 100 requests per minute); once the count exceeds the limit, further requests are rejected. The Guava-based counter example later in this article relies on the following Maven dependency:

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Leaky Bucket Algorithm

Requests enter a bucket and are released at a constant rate; excess requests overflow, providing smooth traffic shaping.
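A minimal leaky-bucket sketch (names are hypothetical): the bucket drains at a fixed rate, a request is admitted only if the bucket has room, and anything beyond capacity overflows.

```java
// Minimal leaky-bucket sketch; names are hypothetical.
public class LeakyBucket {
    private final long capacity;      // maximum queued requests
    private final double leakPerMs;   // drain rate, requests per millisecond
    private double water = 0;         // current bucket level
    private long lastLeakMs = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakPerMs) {
        this.capacity = capacity;
        this.leakPerMs = leakPerMs;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak water for the time elapsed since the last call.
        water = Math.max(0, water - (now - lastLeakMs) * leakPerMs);
        lastLeakMs = now;
        if (water + 1 <= capacity) {
            water += 1;               // request fits in the bucket
            return true;
        }
        return false;                 // bucket full: overflow, reject
    }
}
```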

Token Bucket Algorithm

Tokens are added to a bucket at a steady rate; a request proceeds only if a token is available, allowing bursts while maintaining an average rate.
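A hand-rolled token-bucket sketch to show the mechanics (Guava's RateLimiter, shown later, is the production choice; names here are illustrative): tokens accrue at a fixed rate up to the bucket capacity, and each request consumes one token or is rejected.

```java
// Hand-rolled token-bucket sketch; names are illustrative.
public class TokenBucket {
    private final long capacity;      // maximum stored tokens (burst size)
    private final double tokensPerMs; // refill rate, tokens per millisecond
    private double tokens;
    private long lastRefillMs = System.currentTimeMillis();

    public TokenBucket(long capacity, double tokensPerMs) {
        this.capacity = capacity;
        this.tokensPerMs = tokensPerMs;
        this.tokens = capacity;       // start full so an initial burst is allowed
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill tokens for the elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillMs) * tokensPerMs);
        lastRefillMs = now;
        if (tokens >= 1) {
            tokens -= 1;              // spend one token for this request
            return true;
        }
        return false;                 // no token available: reject
    }
}
```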

Concurrency Limiting

System‑wide concurrency and QPS thresholds are set (e.g., Tomcat’s maxThreads, maxConnections, and acceptCount) to protect against sudden spikes.
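The same cap can be applied inside the application with a semaphore, analogous to Tomcat's maxThreads (sketch only; names are illustrative): at most `limit` requests run at once, and the rest are rejected immediately.

```java
import java.util.concurrent.Semaphore;

// Application-level concurrency cap: at most `limit` requests in flight.
public class ConcurrencyLimiter {
    private final Semaphore permits;

    public ConcurrencyLimiter(int limit) {
        this.permits = new Semaphore(limit);
    }

    /** Returns false immediately when the concurrency limit is reached. */
    public boolean tryEnter() { return permits.tryAcquire(); }

    /** Must be called (e.g., in a finally block) when the request finishes. */
    public void exit() { permits.release(); }
}
```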

Interface Limiting

Limits can be applied per API using fixed windows or sliding windows for more precise control.
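A sliding-window sketch for per-API limiting (names are illustrative): keep the timestamps of recent calls and admit a request only if fewer than `limit` calls landed in the last `windowMs` milliseconds, avoiding the boundary burst of a fixed window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window limiter sketch: one instance per API endpoint.
public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drop calls that have fallen out of the window.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMs) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);  // record this call
            return true;
        }
        return false;                 // window is full: reject
    }
}
```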

Implementation Examples

Guava RateLimiter

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// One AtomicLong counter per second, expired automatically after 2 seconds.
LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.SECONDS)
    .build(new CacheLoader<Long, AtomicLong>() {
        @Override
        public AtomicLong load(Long second) throws Exception {
            return new AtomicLong(0);
        }
    });
// Count the current second's requests; reject once the count exceeds the limit.
long currentSecond = System.currentTimeMillis() / 1000;
counter.get(currentSecond).incrementAndGet();

Token Bucket with Guava

public static void main(String[] args) throws InterruptedException {
    RateLimiter limiter = RateLimiter.create(2); // 2 tokens per second
    System.out.println(limiter.acquire());       // prints the seconds spent waiting
    Thread.sleep(2000);                          // bucket refills while we sleep
    System.out.println(limiter.acquire());
    // ... additional acquire calls
}

Distributed Limiting with Nginx + Lua

-- Assumes nginx.conf declares both shared dictionaries:
--   lua_shared_dict locks 1m;  lua_shared_dict limit_counter 10m;
local locks = require "resty.lock"

function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key")
    if not elapsed then
        return ngx.exit(500)               -- failed to take the lock
    end
    local limit_counter = ngx.shared.limit_counter
    -- One counter per client IP per second (fixed one-second window)
    local key = "ip:" .. ngx.var.remote_addr .. ":" .. os.time()
    local limit = 5                        -- at most 5 requests per second
    local current = limit_counter:get(key)
    if current ~= nil and current + 1 > limit then
        lock:unlock()
        return 0                           -- over the limit: reject
    end
    if current == nil then
        limit_counter:set(key, 1, 1)       -- first hit; expire after 1 second
    else
        limit_counter:incr(key, 1)
    end
    lock:unlock()
    return 1                               -- allowed
end

ngx.print(acquire())
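The script only works if the two shared dictionaries it touches are declared and the script itself is wired to a location. A hypothetical nginx.conf fragment (dictionary sizes and the script path are placeholders, not from the original article):

```nginx
http {
    lua_shared_dict locks 1m;           # backing store for resty.lock
    lua_shared_dict limit_counter 10m;  # per-second request counters

    server {
        listen 80;
        location /limit {
            content_by_lua_file /path/to/limit.lua;  # hypothetical path
        }
    }
}
```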

These examples demonstrate how to apply rate limiting in single‑node and distributed environments, balancing availability and performance.

Tags: backend · distributed systems · Java · rate limiting · token bucket · circuit breaker
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.