Service Rate Limiting, Degradation, and Caching Strategies for High-Concurrency E‑Commerce Systems
This article discusses how to handle sudden traffic spikes in e‑commerce APIs by employing caching, rate‑limiting (leaky bucket, token bucket, sliding window), Nginx and Java Semaphore limits, distributed queue buffering, service degradation, and cache‑consistency techniques to ensure system stability.
Service Rate Limiting
Rate limiting aims to control the speed of concurrent requests or the number of requests within a time window, rejecting, queuing, or degrading service when the limit is reached.
Rate Limiting Algorithms
Leaky Bucket – Requests are placed into a bucket; if the bucket is full, excess requests are dropped or trigger a limit strategy. The bucket releases requests at a fixed rate.
Token Bucket – Tokens are added to a bucket at a constant rate; a request consumes a token, allowing bursts when tokens are available.
Sliding Window – The time window is divided into sub‑intervals; counts are recorded per sub‑interval and old intervals are discarded as the window slides.
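As an illustration of the second algorithm above, here is a minimal token-bucket sketch (the class and method names are ours, not from any particular library): tokens refill at a fixed rate, and a request passes only if a token is available, which permits short bursts up to the bucket's capacity.

```java
// Minimal token-bucket rate limiter: tokens refill at a fixed rate,
// a request consumes one token, bursts are allowed up to capacity.
public class TokenBucket {
    private final long capacity;        // maximum tokens the bucket holds
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // bucket empty: reject, queue, or degrade
    }
}
```

A leaky bucket differs only in that the drain rate, not the token supply, is fixed; the sliding-window variant would replace the token count with per-sub-interval counters.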
Ingress Rate Limiting
Nginx implements the leaky-bucket algorithm in its limit_req module; requests can be keyed by client IP, User-Agent, or any other request variable.
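A minimal limit_req configuration keyed on client IP might look like the following (zone name, rates, and upstream are illustrative):

```nginx
http {
    # 10 MB shared zone keyed by client IP, allowing 10 requests/second
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        location /api/ {
            # burst queues up to 20 excess requests; beyond that,
            # Nginx rejects with 503 by default
            limit_req zone=api_limit burst=20;
            proxy_pass http://backend;
        }
    }
}
```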
Local Interface Rate Limiting
Java Semaphore can restrict concurrent access to a resource. Example:
private final Semaphore permit = new Semaphore(40, true);

public void process() {
    try {
        permit.acquire();
        try {
            // TODO: handle business logic
        } finally {
            permit.release(); // release only after a successful acquire
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }
}

Note that release() must run only after acquire() succeeds; releasing in an outer finally block would add a phantom permit whenever acquire() is interrupted.

Distributed Interface Rate Limiting
Message queues (MQ or Redis List) can act as a buffering layer based on the leaky‑bucket principle, smoothing bursts before consuming at the service’s throughput.
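In production the buffer would be an MQ topic or a Redis List; the sketch below uses an in-process BlockingQueue purely to illustrate the shape: producers enqueue bursts into a bounded buffer (overflow is rejected, as in a leaky bucket), while a single consumer drains at the service's sustainable rate.

```java
import java.util.concurrent.*;

public class QueueBuffer {
    // Bounded queue: when full, extra requests are rejected (bucket overflow).
    private final BlockingQueue<Runnable> buffer = new ArrayBlockingQueue<>(1000);

    // Producer side: try to enqueue; false means the buffer overflowed.
    public boolean submit(Runnable request) {
        return buffer.offer(request);
    }

    // Consumer side: drain at the service's sustainable throughput.
    public void startConsumer(int requestsPerSecond) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            Runnable r = buffer.poll();
            if (r != null) r.run();
        }, 0, 1_000_000 / requestsPerSecond, TimeUnit.MICROSECONDS);
    }
}
```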
Service Degradation
When traffic still spikes after risk-control filtering, a fallback plan can downgrade non-critical services, either delaying or pausing them.
Degradation Strategies
Stop edge‑case features (e.g., disable historical order queries during peak sales).
Reject requests using random rejection, reject oldest, or reject non‑core requests.
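The random-rejection strategy above can be sketched as a simple gate with an adjustable rejection percentage (class and method names are ours); operators raise the percentage as load grows and lower it back to zero during recovery.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class DegradationGate {
    // Percentage of requests to reject while degraded: 0 = normal, 100 = full stop.
    private final AtomicInteger rejectPercent = new AtomicInteger(0);

    public void setRejectPercent(int percent) {
        rejectPercent.set(percent);
    }

    // Random rejection: each request survives with probability (100 - rejectPercent)%.
    public boolean admit() {
        int p = rejectPercent.get();
        if (p <= 0) return true;
        return ThreadLocalRandom.current().nextInt(100) >= p;
    }
}
```

Reject-oldest and reject-non-core follow the same pattern, but key the decision on request age or a priority tag instead of a random draw.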
Recovery
After degrading, scale out by registering additional consumer instances and ramp traffic back gradually (slow loading) so the backlog is absorbed without triggering a second spike.
Data Caching
To protect hot data during spikes, use distributed locks, cache hot data in middleware, let requests read from cache, and asynchronously process results via a message queue.
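The "let requests read from cache" step can be sketched as a cache-aside read with per-key locking (a minimal in-process sketch; a real system would put the data in Redis or similar middleware): computeIfAbsent guarantees only one thread per key hits the backing store, so a spike of identical reads produces a single database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class HotDataCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Cache-aside read: computeIfAbsent locks the bucket, so only one
    // thread per key calls the loader; everyone else reads the cached copy.
    public Object get(String key, Function<String, Object> loadFromDb) {
        return cache.computeIfAbsent(key, loadFromDb);
    }

    // Invalidate after a write so the next read reloads fresh data.
    public void invalidate(String key) {
        cache.remove(key);
    }
}
```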
Cache Consistency Issues
For a stock-deduction interface with limited inventory, strategies include read-write separation with Redis Sentinel, load-balanced cache sharding, and page-cache aggregation; whichever is used, the deduction itself must be atomic so concurrent buyers cannot over-consume the stock.
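In Redis the atomic deduction is typically a Lua script or a DECR with a floor check; the in-process sketch below shows the same compare-and-set idea that prevents overselling under concurrency.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StockCounter {
    private final AtomicInteger remaining;

    public StockCounter(int initialStock) {
        this.remaining = new AtomicInteger(initialStock);
    }

    // Atomically claim one unit; the CAS loop never lets stock go below
    // zero, so concurrent buyers cannot over-consume the inventory.
    public boolean tryDeduct() {
        while (true) {
            int current = remaining.get();
            if (current <= 0) return false; // sold out
            if (remaining.compareAndSet(current, current - 1)) return true;
        }
    }
}
```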