Traffic Peak Shaving: Origins and Implementation Strategies for High‑Concurrency Scenarios
This article explains why traffic peak shaving is needed in high‑concurrency situations such as flash sales, and describes practical solutions including message‑queue buffering and multi‑layer funnel filtering, along with caching and CDN techniques that protect backend systems.
Origin of Traffic Peak Shaving
High‑concurrency business scenarios like railway ticket rushes during Chinese New Year or Alibaba's Double‑11 flash sales generate massive, simultaneous user requests that can overwhelm servers, cause crashes, and make the service unavailable.
This problem is analogous to road rush‑hour traffic, where peak‑hour restrictions are used to smooth demand; online systems need similar mechanisms to survive sudden traffic spikes.
How to Implement Traffic Peak Shaving
Fundamentally, peak shaving delays and filters user requests so that as few operations as possible ever reach the database.
1. Message‑Queue Solution
Using a message queue to buffer burst traffic converts synchronous calls into asynchronous pushes. The queue absorbs the instantaneous flood on one side and releases messages smoothly on the other, preventing the backend from being hit by millions of concurrent requests.
Common middleware includes ActiveMQ, RabbitMQ, ZeroMQ, Kafka, MetaMQ, RocketMQ, etc. The queue acts like a reservoir, storing upstream floodwater and releasing it downstream at a controlled rate.
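The reservoir idea can be sketched with a few lines of Python. This is a minimal illustration, not production middleware: a bounded in‑process `queue.Queue` stands in for Kafka or RocketMQ, the `handle_order` and `demo` names are invented for the example, and a single consumer drains the queue at a fixed rate so the "backend" never sees the raw burst.

```python
import queue
import threading
import time

def handle_order(order_id, processed):
    """Backend work: record the order (stand-in for a DB write)."""
    processed.append(order_id)

def consume(q, processed, rate_per_sec):
    """Drain the queue at a controlled rate instead of all at once."""
    interval = 1.0 / rate_per_sec
    while True:
        order_id = q.get()
        if order_id is None:          # sentinel: stop consuming
            break
        handle_order(order_id, processed)
        time.sleep(interval)          # throttle downstream pressure

def demo(burst_size=50, rate_per_sec=1000):
    q = queue.Queue(maxsize=10_000)   # the "reservoir"
    processed = []
    worker = threading.Thread(target=consume, args=(q, processed, rate_per_sec))
    worker.start()
    # The burst arrives all at once; enqueueing returns immediately,
    # so the front end stays responsive while the backend catches up.
    for order_id in range(burst_size):
        q.put(order_id)
    q.put(None)                       # signal shutdown
    worker.join()
    return processed

if __name__ == "__main__":
    print(len(demo()))                # all 50 requests handled, but smoothly
```

In a real deployment the producer and consumer run in separate services, and the queue's persistence and replication (Kafka partitions, RocketMQ brokers) are what let it survive a flood far larger than memory.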
2. Funnel‑Style Layered Filtering
Another approach is to filter requests at multiple layers, discarding invalid or unnecessary traffic before it reaches critical services.
The core ideas of layered filtering are:
Filter out invalid requests at each layer.
Use CDN to offload static resources (images, CSS, JS).
Leverage distributed caches such as Redis to intercept read requests upstream.
Basic principles include time‑based sharding of write data, rate‑limiting write requests, relaxing strong consistency checks for reads, and applying strong consistency only where necessary (e.g., final order‑payment flow).
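Rate‑limiting write requests, mentioned above, is commonly done with a token bucket. The sketch below is illustrative (the `TokenBucket` class and its parameters are invented for the example): requests beyond the refill rate fail fast instead of piling up on the database.

```python
import time

class TokenBucket:
    """Admit at most `rate` writes per second, with bursts up to
    `capacity`; requests beyond that are rejected immediately."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

if __name__ == "__main__":
    bucket = TokenBucket(rate=10, capacity=5)
    admitted = sum(bucket.allow() for _ in range(100))
    print(admitted)   # roughly 5: only the burst capacity passes instantly
```

Rejected requests can be answered with a friendly "sold out, try again" page, which is far cheaper than letting them contend for database row locks.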
Conclusion
1. In high‑concurrency scenarios like flash sales, intercept requests as early as possible to reduce downstream pressure and avoid database lock conflicts or system avalanches.
2. Separate static and dynamic resources; serve static assets via CDN.
3. Fully utilize caches (e.g., Redis) to increase QPS and overall throughput.
4. Deploy message queues (Kafka, RocketMQ, etc.) to absorb burst traffic and release it smoothly.
For deeper coverage of Redis, Dubbo micro‑services, database sharding, and other high‑concurrency architecture topics, refer to the related high‑concurrency series.
Mike Chen's Internet Architecture