
Overload Protection Strategies in WeChat's Large‑Scale Microservices

WeChat safeguards its billion‑user microservice platform by detecting overload when average queue wait exceeds 20 ms and applying a two‑dimensional priority system—business and hourly‑hashed user priorities—adjusted via histogram‑based admission controls and upstream back‑pressure, ensuring stable service during massive traffic spikes.

Tencent Cloud Developer

WeChat, a national‑level application with over one billion monthly active users, frequently encounters massive traffic spikes that can easily overload its services. Despite this, WeChat’s services remain highly stable. This article, based on the 2018 SOCC paper "Overload Control for Scaling WeChat Microservices" by Tencent WXG engineer alexccdong, introduces the overload protection strategies used in WeChat’s large‑scale microservice architecture.

What is service overload? Service overload occurs when the request volume exceeds the maximum capacity a service can handle, leading to high server load, increased latency, and user‑visible slow or failed loading. The resulting retries can cause a cascade of useless requests, potentially triggering a system‑wide avalanche.

Why does overload happen? The Internet naturally experiences bursty traffic (flash sales, viral events, holidays, malicious attacks, etc.). Popular platforms such as Weibo have experienced crashes due to sudden spikes caused by celebrity announcements.

Benefits of overload protection include improved user experience and guaranteed service quality during traffic bursts, preventing total system failure and the associated user loss, reputation damage, or even safety risks in critical applications.

WeChat’s services are built as microservices using a unified RPC framework and are organized into three layers: access services (login, messaging, payment), logic services, and foundation services. Most services belong to the logic layer, handling billions of requests per day. In a microservice call chain, a single user action may trigger dozens of downstream service calls; if any one of them overloads, the entire chain fails.

How overload is detected – WeChat monitors the average waiting time of requests in the queue, i.e., the time from a request's arrival to the start of its processing. When this average exceeds 20 ms (a threshold derived from five years of operational data), the service is considered overloaded. The default RPC timeout is 500 ms, and detection runs over windows of either 1 s or 2,000 requests, whichever elapses first.
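The detection rule above can be sketched as follows. This is a minimal illustration, not WeChat's implementation: the 20 ms threshold and the 1 s / 2,000-request windows come from the article, while the class and method names are invented for the example.

```python
import time


class OverloadDetector:
    """Flags overload when the average queue wait in a window exceeds 20 ms."""

    WAIT_THRESHOLD = 0.020   # 20 ms average queue wait => overloaded
    WINDOW_SECONDS = 1.0     # evaluate every 1 s ...
    WINDOW_REQUESTS = 2000   # ... or every 2000 requests, whichever comes first

    def __init__(self):
        self._waits = []
        self._window_start = time.monotonic()
        self.overloaded = False

    def record(self, enqueued_at: float, started_at: float) -> None:
        """Record one request's queue wait (arrival -> start of processing)."""
        self._waits.append(started_at - enqueued_at)
        now = time.monotonic()
        if (len(self._waits) >= self.WINDOW_REQUESTS
                or now - self._window_start >= self.WINDOW_SECONDS):
            avg = sum(self._waits) / len(self._waits)
            self.overloaded = avg > self.WAIT_THRESHOLD
            self._waits.clear()
            self._window_start = now
```

Note that the signal is queue wait, not response time: response time includes downstream latency, so a slow dependency would otherwise make a healthy service look overloaded.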

Overload protection policies:

1) Business priority – Different business scenarios have distinct priorities (e.g., login > payment > normal chat > timeline). Requests inherit the priority of the originating business, and lower‑priority requests are discarded first.

2) User priority – Each user is assigned a priority derived from a hash of the user’s unique ID, refreshed hourly to avoid “always‑lucky” users. Requests inherit the user priority, allowing finer‑grained control when business priority alone is insufficient.

3) Adaptive priority adjustment – The system maintains a two‑dimensional control plane (business priority B, user priority U). When overload is detected, the server’s admission priority is adjusted dynamically; a request passes only if its business priority > B or (business priority = B and user priority > U).
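The two-dimensional admission check described above is compact enough to state directly. This sketch follows the article's convention that a larger number means higher priority; the function name is illustrative.

```python
def admitted(req_business: int, req_user: int,
             threshold_b: int, threshold_u: int) -> bool:
    """Admit a request against the (B, U) admission threshold.

    Passes only with strictly higher business priority, or with equal
    business priority and strictly higher user priority.
    """
    return (req_business > threshold_b
            or (req_business == threshold_b and req_user > threshold_u))
```

The user-priority dimension matters because business priority alone is too coarse: raising B by one level can shed far more traffic than needed, while sliding U within a business level sheds load in fine increments.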

WeChat employs a histogram‑based method to quickly adjust admission priorities. For each priority level, the server records the request volume in the previous interval. If overload is detected, the system reduces the target request volume by a factor (e.g., 5 % per priority level) and restores it gradually (e.g., 1 % per level) when load eases.
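A sketch of that histogram-driven adjustment: per-priority request counts from the last window determine the new admission threshold, and the expected admitted volume shrinks under overload and recovers gradually. The 5 % shed and 1 % restore factors come from the article; the function names and data shapes are illustrative.

```python
def adjust_expected_volume(current_expected: int, overloaded: bool,
                           shed_frac: float = 0.05,
                           restore_frac: float = 0.01) -> int:
    """Shrink the target admitted volume under overload; restore it slowly."""
    if overloaded:
        return int(current_expected * (1 - shed_frac))
    return int(current_expected * (1 + restore_frac))


def pick_admission_threshold(histogram: dict[int, int], expected: int) -> int:
    """Walk priorities from highest to lowest, admitting whole levels
    until the cumulative count would exceed the expected volume.
    Returns the lowest priority level still admitted."""
    total = 0
    threshold = max(histogram)  # worst case: admit only the top level
    for level in sorted(histogram, reverse=True):
        total += histogram[level]
        if total > expected:
            break
        threshold = level       # this level still fits within the budget
    return threshold
```

The histogram makes the adjustment a single pass over priority levels instead of a slow one-level-at-a-time probe, which is why the threshold can converge quickly during a spike.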

To further reduce downstream pressure, upstream services query the downstream service’s current admission priority; if the request’s priority is below the downstream threshold, the upstream service discards the request without forwarding it.
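This upstream-side filter can be sketched as a small cache of each downstream service's last advertised admission threshold, used to drop doomed requests locally rather than waste a network round trip. Class and method names are illustrative; in the paper the threshold is piggybacked on RPC responses.

```python
class DownstreamGate:
    """Caches downstream admission thresholds for early local rejection."""

    def __init__(self):
        self._thresholds: dict[str, tuple[int, int]] = {}  # service -> (B, U)

    def update(self, service: str, b: int, u: int) -> None:
        """Record the admission threshold learned from a downstream reply."""
        self._thresholds[service] = (b, u)

    def should_send(self, service: str, b: int, u: int) -> bool:
        """Drop locally if the request would be rejected downstream anyway."""
        if service not in self._thresholds:
            return True  # no signal yet: send optimistically
        tb, tu = self._thresholds[service]
        return b > tb or (b == tb and u > tu)
```

Dropping at the caller is what turns local admission control into end-to-end back-pressure: the rejection cost is paid once, at the edge of the overloaded subtree, instead of on every hop.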

Overall flow – A user request enters the access layer, receives unified business and user priorities, and propagates these priorities through all downstream calls. Each service decides locally whether to accept or drop the request based on its current admission priority, which is periodically adjusted according to load. When a downstream service returns its admission priority, the upstream service updates its local record, enabling proactive back‑pressure.

The key characteristics of WeChat’s overload protection are:

• Business‑agnostic metric: average queue waiting time rather than response time.
• Combination of independent and joint control planes (service‑level and downstream‑level priorities).
• Efficient and fair handling: priority is consistent across a request chain, and user priorities are periodically re‑hashed to avoid long‑term bias.

Tags: backend engineering, microservices, WeChat, overload control, priority scheduling, service scaling
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
