High Concurrency: Challenges, Caching Strategies, Rate Limiting, and Degradation
This article explains high concurrency and the challenges it brings, such as performance degradation and resource contention, then presents solutions: layered caching, several rate-limiting algorithms, and degradation and circuit-breaker strategies that keep a system stable under heavy load.
High concurrency in internet applications produces massive numbers of simultaneous requests, so systems must meet the three "high" goals: high performance, high concurrency, and high availability.
Challenges of high concurrency include performance degradation, resource competition, and stability problems, which can lead to system overload or failure.
Definition: High concurrency refers to a system's ability to process a large number of requests in the same time period without performance loss or response delay.
Typical Scenarios: E-commerce platforms, social media, and other popular web services where many users perform actions concurrently.
Cache Strategies: Various caching layers—browser cache, client cache, CDN cache, reverse-proxy cache, local cache, and distributed cache—are introduced with their principles, common technologies, advantages, disadvantages, and applicable scenarios.
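As a concrete illustration of the local-cache layer, a minimal in-process LRU cache can be built on Java's LinkedHashMap in access order. This is a sketch, not from the article; the class name and capacity are illustrative, and production systems would normally use a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal in-process LRU cache: a LinkedHashMap in access order
// evicts the least recently used entry once capacity is exceeded.
public class LocalLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LocalLruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }
}
```

Because eviction rides on LinkedHashMap's own ordering, a `get` counts as a "use" and protects hot entries from eviction.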
Common cache problems such as cache penetration, cache breakdown, cache avalanche, and cache consistency are described, followed by mitigation techniques (Bloom filter, empty‑object caching, delayed double‑check, random expiration, multi‑level cache, etc.).
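Two of the mitigations above, empty-object caching (against penetration) and randomized expiration (against avalanche), can be sketched together. All names here (PenetrationSafeCache, the TTL values, the null marker) are illustrative assumptions, not from the article; a real deployment would use Redis or a caching library rather than a bare in-memory map.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: lookups for nonexistent keys cache a NULL marker with a short
// TTL, so repeated misses stop hitting the database (cache penetration).
// Normal entries get a jittered TTL so they do not all expire at the
// same instant (cache avalanche).
public class PenetrationSafeCache {
    private static final String NULL_MARKER = "__NULL__";

    private static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final Random random = new Random();

    public String get(String key, Function<String, String> dbLoader) {
        long now = System.currentTimeMillis();
        Entry e = store.get(key);
        if (e != null && e.expiresAt > now) {
            return NULL_MARKER.equals(e.value) ? null : e.value;
        }
        String value = dbLoader.apply(key); // cache miss: hit the backing store
        long ttl = (value == null)
                ? 30_000L                             // short TTL for the empty object
                : 300_000L + random.nextInt(60_000);  // base TTL plus random jitter
        store.put(key, new Entry(value == null ? NULL_MARKER : value, now + ttl));
        return value;
    }
}
```

The short TTL on the null marker bounds how stale an "absent" answer can be once the row actually appears.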
Rate Limiting is essential to protect systems from overload. The article covers several classic algorithms:
Fixed Window Algorithm – simple counter per fixed time window.
public class FixedWindowRateLimiter {
    private int counter = 0;               // requests seen in the current window
    private long lastAcquireTime = 0L;     // start of the current window
    private static final long windowUnit = 1000L; // window length: 1 second
    private static final int threshold = 10;      // max requests per window

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (currentTime - lastAcquireTime > windowUnit) {
            // A new window has started: reset the counter
            counter = 0;
            lastAcquireTime = currentTime;
        }
        if (counter < threshold) {
            counter++;
            return true;
        }
        return false;
    }
}

Sliding Window Algorithm – divides a window into smaller sub-windows for smoother limiting.
import java.util.LinkedList;
import java.util.Queue;

public class SlidingWindowRateLimiter {
    private final Queue<Long> timestamps; // timestamps of accepted requests
    private final int windowSize;         // max requests allowed in the window
    private final long windowDuration;    // window length in milliseconds

    public SlidingWindowRateLimiter(int windowSize, long windowDuration) {
        this.windowSize = windowSize;
        this.windowDuration = windowDuration;
        this.timestamps = new LinkedList<>();
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        // Evict timestamps that have fallen out of the window
        while (!timestamps.isEmpty() && currentTime - timestamps.peek() > windowDuration) {
            timestamps.poll();
        }
        if (timestamps.size() < windowSize) {
            timestamps.offer(currentTime);
            return true;
        }
        return false;
    }
}

Leaky Bucket Algorithm – controls the outflow rate, discarding excess requests.
public class LeakyBucketRateLimiter {
    private final long capacity; // bucket capacity
    private final long rate;     // leak rate: requests drained per second
    private long water;          // current water level (queued requests)
    private long lastTime;       // time of the last leak calculation

    public LeakyBucketRateLimiter(long capacity, long rate) {
        this.capacity = capacity;
        this.rate = rate;
        this.water = 0;
        this.lastTime = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long elapsed = now - lastTime;
        // Leak water in proportion to the elapsed time
        water = Math.max(0, water - elapsed * rate / 1000);
        lastTime = now; // update even on rejection, so water is not leaked twice
        if (water < capacity) {
            water++;
            return true;
        }
        return false;
    }
}

Token Bucket Algorithm – generates tokens at a fixed rate; requests consume tokens.
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TokenBucketRateLimiter {
    private final long capacity; // max tokens the bucket can hold
    private final long rate;     // tokens added per second
    private long tokens;         // currently available tokens
    private final ScheduledExecutorService scheduler;

    public TokenBucketRateLimiter(long capacity, long rate) {
        this.capacity = capacity;
        this.rate = rate;
        this.tokens = capacity; // start with a full bucket
        this.scheduler = new ScheduledThreadPoolExecutor(1);
        scheduleRefill();
    }

    private void scheduleRefill() {
        // Refill the bucket once per second, capped at capacity
        scheduler.scheduleAtFixedRate(() -> {
            synchronized (this) {
                tokens = Math.min(capacity, tokens + rate);
            }
        }, 1, 1, TimeUnit.SECONDS);
    }

    public synchronized boolean tryAcquire() {
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}

Sliding Log Algorithm – records request timestamps in an ordered list for precise rate control.
import java.util.LinkedList;
import java.util.List;

public class SlidingLogRateLimiter {
    private final List<Long> timestamps; // log of accepted request timestamps
    private final long windowDuration;   // window length in milliseconds
    private final int threshold;         // max requests within the window

    public SlidingLogRateLimiter(int threshold, long windowDuration) {
        this.timestamps = new LinkedList<>();
        this.windowDuration = windowDuration;
        this.threshold = threshold;
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        // Drop log entries that have aged out of the window
        while (!timestamps.isEmpty() && currentTime - timestamps.get(0) > windowDuration) {
            timestamps.remove(0);
        }
        if (timestamps.size() < threshold) {
            timestamps.add(currentTime);
            return true;
        }
        return false;
    }
}

Each algorithm's advantages and disadvantages are discussed, along with typical use cases such as protecting third-party APIs, smoothing traffic spikes, and ensuring fair request distribution.
Degradation and Circuit Breaker – When a system is overloaded, degradation (fallback) and circuit-breaker mechanisms protect core functionality. The article explains the circuit-breaker state machine (closed, open, half-open), its trigger conditions, and tools such as Guava RateLimiter, Sentinel (single-node and cluster modes), and Nginx rate-limit directives.
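The state machine described above can be sketched as a minimal circuit breaker. This is an illustrative toy, not the Sentinel or Guava implementation; the class name, thresholds, and the single-trial half-open policy are all assumptions.

```java
// Minimal circuit-breaker sketch.
// CLOSED: calls flow normally and consecutive failures are counted.
// OPEN: calls are rejected immediately (fail fast) until a cooldown passes.
// HALF_OPEN: one trial call is allowed; success closes the breaker,
// failure reopens it.
public class SimpleCircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0L;
    private final int failureThreshold; // consecutive failures before opening
    private final long cooldownMillis;  // how long to stay open

    public SimpleCircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= cooldownMillis) {
                state = State.HALF_OPEN; // cooldown elapsed: allow one trial
                return true;
            }
            return false; // still cooling down: fail fast
        }
        return true; // CLOSED or HALF_OPEN
    }

    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }
}
```

The caller wraps each downstream call: check `allowRequest()` first, then report the outcome with `recordSuccess()` or `recordFailure()`; when the breaker is open, the caller serves a degraded fallback instead.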
Finally, the article emphasizes that combining caching, rate limiting, and degradation strategies provides a robust approach to handling high‑concurrency scenarios.
Source: the High Availability Architecture official account.