Design and Implementation of a High‑Throughput WeChat Red Packet System

This article describes the design, implementation, and performance testing of a simulated WeChat red‑packet service, modelled on a production system that handled 10 billion requests. It covers target load calculations, hardware and software choices, concurrency strategies in Go, monitoring tools, and experimental results at several QPS levels.

Java Architect Essentials

Aug 13, 2020

Introduction

The author reflects on a 2015 article about building a reliable Spring Festival red‑packet system and uses its ideas to create a local simulation that can handle massive load, aiming for a single‑machine peak of 60 k QPS and support for one million concurrent connections.

Background Knowledge

QPS: Queries per second.

PPS: Packets per second.

Shake red‑packet: The client requests a random red packet; if one is available, the user receives it.

Send red‑packet: The system creates a red packet with a fixed amount and assigns it to several users, each of whom can claim a portion of that amount.

Target Determination

The target is derived from 638 backend servers and an estimated peak of 14 million QPS. That works out to roughly 900 k users per server and a single‑machine peak of about 23 k QPS (raised to 30 k and 60 k for testing), with a red‑packet issue rate of about 83 per second per machine.
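The per‑machine figures can be checked with a few integer divisions. The sketch below assumes a cluster‑wide peak of 14 million QPS and a total issue rate of 50,000 red packets per second. Neither total is confirmed here; they are chosen because they reproduce the 23 k and 83 figures when about 600 of the 638 servers are taken to be in service, which is also an assumption.

```go
package main

import "fmt"

// perMachine divides a cluster-wide total evenly across servers.
func perMachine(total, servers int) int {
	return total / servers
}

func main() {
	const (
		totalQPS      = 14000000 // assumed cluster-wide peak
		listedServers = 638      // server count given in the text
		activeServers = 600      // assumption: some machines held in reserve
		totalPackets  = 50000    // assumed red packets issued per second overall
	)
	fmt.Println("QPS per machine, 638 servers:", perMachine(totalQPS, listedServers))
	fmt.Println("QPS per machine, 600 servers:", perMachine(totalQPS, activeServers))
	fmt.Println("packets per machine per second:", perMachine(totalPackets, activeServers))
}
```

Dividing by all 638 servers gives a little under 22 k, while 600 servers gives a little over 23 k, so the rounded server count matches the stated target more closely.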

Hardware and Software

The prototype uses Go 1.8r3, shell, and Python on Ubuntu 12.04 servers (Dell R2950, 8‑core, 16 GB RAM) and Debian 5.0 client VMs (4 core, 5 GB RAM). A total of 17 client VMs simulate one million connections.

Technical Analysis and Implementation

Key techniques include:

Dividing one million connections into multiple independent SETs, each managing a few thousand connections, to keep goroutine count low.

Using a single goroutine per connection for reading, and a dedicated SET goroutine for processing messages, reducing CPU and memory usage.

Synchronizing client clocks via NTP and distributing request load by modulo arithmetic (userId % groupCount).

Embedding counters in the code to record per‑second request numbers.

Monitoring network traffic with a Python script that wraps ethtool.

Code Implementation

An example alias used for counting connections:

alias ss2='ss -ant | grep 1025 | grep EST | awk -F: "{print \$8}" | sort | uniq -c'
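The in‑code request counters mentioned among the key techniques could take a shape like the one below. This is a sketch under assumptions, not the author's implementation: the `Counter` type is hypothetical, and the request handler is simulated by a goroutine that increments roughly once per millisecond.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Counter counts requests between two samples.
type Counter struct{ n int64 }

// Inc records one request; it is safe for concurrent use.
func (c *Counter) Inc() { atomic.AddInt64(&c.n, 1) }

// Sample returns the count since the previous sample and resets it to zero.
func (c *Counter) Sample() int64 { return atomic.SwapInt64(&c.n, 0) }

func main() {
	var c Counter
	stop := make(chan struct{})
	go func() { // simulated request handler
		for {
			select {
			case <-stop:
				return
			default:
				c.Inc()
				time.Sleep(time.Millisecond)
			}
		}
	}()

	// Log one line per second, as a monitoring loop might.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; i < 2; i++ {
		t := <-ticker.C
		fmt.Printf("%s qps=%d\n", t.Format("15:04:05"), c.Sample())
	}
	close(stop)
}
```

Swapping the counter to zero on each sample keeps the hot path to a single atomic add and avoids any lock between handlers and the logger.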

Practice

The experiment proceeds in three phases:

Start server and monitoring, then launch 17 client VMs to establish one million connections.

Increase client QPS to 30 k, observe stable request rates and monitor network usage.

Raise client QPS to 60 k, note increased variance and occasional drops due to network and scheduler limits.
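One way clients with synchronised clocks can hold a target QPS without coordinating with each other is to split users into groups and let one group fire per second. The sketch below illustrates this reading of the modulo scheme; the `groupCount` and `shouldSend` helpers are hypothetical, and the user count of one million is taken from the connection target.

```go
package main

import (
	"fmt"
	"time"
)

// groupCount returns how many groups the users are split into so that one
// group firing per second yields roughly targetQPS requests.
func groupCount(users, targetQPS int) int {
	return users / targetQPS
}

// shouldSend reports whether userID's group is the one scheduled for the
// given Unix second.
func shouldSend(userID, groups int, unixSecond int64) bool {
	return int64(userID%groups) == unixSecond%int64(groups)
}

func main() {
	const users = 1000000
	groups := groupCount(users, 30000)
	now := time.Now().Unix() // assumed to be NTP-synchronised across clients

	sent := 0
	for userID := 0; userID < users; userID++ {
		if shouldSend(userID, groups, now) {
			sent++
		}
	}
	fmt.Printf("groups=%d requests this second=%d\n", groups, sent)
}
```

Integer division means the achieved rate only approximates the target: the request count depends on how evenly the user count divides into groups.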

During each phase, red‑packet generation services emit 200 packets per second, and a separate sending service distributes 20 k red‑packets at the same rate.

Data Analysis

Graphs of client‑side QPS, server‑side QPS, and red‑packet generation show:

Client QPS stays near the target values with minor spikes caused by goroutine scheduling and network latency.

Server QPS mirrors client behavior but exhibits a noticeable dip around 22:57, indicating a need for further optimization.

Red‑packet acquisition stabilises around 200 per second at 30 k QPS, but fluctuates at 60 k QPS due to network jitter.

Golang pprof data reveals occasional GC pauses >10 ms, acceptable given the old hardware.

Conclusion

The prototype successfully demonstrates a system that supports one million users and sustains 30 k–60 k QPS on a single machine, achieving the design goals. The author also lists differences from a production environment, such as the absence of payment services and the need for more robust monitoring and scaling mechanisms.

