Design and Scaling of a High‑Concurrency Spring Festival Shake‑and‑Red‑Packet System
This article details the architecture, challenges, and solutions behind a high‑traffic Spring Festival shake‑and‑red‑packet system, covering prototype design, bandwidth and request‑rate handling, access clusters, overload protection, and the iterative versions that ultimately handled 11 billion shakes with peak loads of 14 million requests per second.
The author presents a technical case study of the "shake" red‑packet system used during the Chinese New Year (Spring Festival) broadcast, describing how the system was designed to handle massive user interaction and monetary transactions.
V0.1 prototype: Users shake their phones, triggering a client request to a server that determines the outcome (e.g., cash greeting or red packet). The server returns a logo and background image, after which the client downloads these assets, and the red‑packet request flows through the payment and wealth‑transfer systems. The process is categorized into resource, information, business, and fund flows.
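The V0.1 decision step can be sketched as a single server-side function. This is an illustrative assumption, not the system's real code: the outcome names, probabilities, and asset paths below are all hypothetical.

```python
import random

# Hypothetical sketch of the V0.1 server-side decision.
# Outcome names, probabilities, and asset paths are illustrative only.
OUTCOMES = [("cash_greeting", 0.05), ("red_packet", 0.15), ("miss", 0.80)]

def handle_shake(user_id: str) -> dict:
    """Decide the outcome of one shake and point the client at the
    static assets (logo, background) it should fetch next."""
    r = random.random()
    cumulative = 0.0
    for outcome, p in OUTCOMES:
        cumulative += p
        if r < cumulative:
            break
    return {
        "user": user_id,
        "outcome": outcome,
        "logo": f"/assets/{outcome}/logo.png",        # resource flow
        "background": f"/assets/{outcome}/bg.png",
    }
```

The returned asset paths separate the information flow (the small JSON reply) from the resource flow (the bulky image downloads), mirroring the article's four-flow decomposition.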
Challenges: An anticipated 700 million viewers and peak traffic of up to 10 million requests per second, far exceeding typical peaks (e.g., 1.2 million for train ticketing, 3.3 million for WeChat messaging). Additional difficulties included unpredictable live‑show changes, a one‑time custom system with no prior reference, and the need for near‑perfect reliability.
Prototype problems: A projected bandwidth demand of 3000 PB/s, uncertain external network quality for an estimated 350 million concurrent users, and the need to absorb two simultaneous 10 million‑req/s streams (the shake service and the red‑packet backend).
Solutions: Pre‑push static resources to clients during idle periods, achieving 65 resource packages with 3.7 PB total traffic and a 1 TB/s peak. Built 18 access clusters across Shanghai and Shenzhen with 638 servers, supporting up to 1.46 billion concurrent connections. Merged the shake service into the access layer, using long‑connection I/O and modular Agent components to keep shake logic lightweight and stable.
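As a rough sanity check on the access-cluster figures quoted above, 1.46 billion connections spread over 638 servers implies each server holds on the order of two million long connections, which is why the shake logic merged into the access layer had to stay so lightweight:

```python
# Back-of-envelope check on the access-cluster figures in the text:
# 18 clusters, 638 servers, up to 1.46 billion concurrent connections.
servers = 638
clusters = 18
concurrent_connections = 1.46e9

per_server = concurrent_connections / servers    # long connections per server
per_cluster = concurrent_connections / clusters  # long connections per cluster

print(f"~{per_server / 1e6:.1f}M long connections per server")
print(f"~{per_cluster / 1e6:.0f}M long connections per cluster")
```

At roughly 2.3 million connections per server, any heavyweight per-request work would be ruinous, motivating the modular Agent design that keeps per-shake processing minimal.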
Red‑packet issuance: Seed files are generated and distributed to each access server, ensuring each packet is sent only once and rate‑limited. Mechanisms limit each user to three packets and each enterprise to one, with additional checks in the Agent to mitigate malicious claims. Large‑scale behavior analysis is used to detect abnormal account activity.
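The issuance rules above — each seed handed out at most once, a per-user cap of three packets, and per-server rate limiting — can be sketched as a small state machine on one access server. The class and method names are assumptions for illustration, not the system's real identifiers.

```python
from collections import defaultdict

# Illustrative sketch of seed-file issuance on a single access server.
# All names here are assumptions, not the real system's identifiers.
class SeedIssuer:
    MAX_PER_USER = 3  # each user may claim at most three packets

    def __init__(self, seeds, rate_per_tick):
        self.seeds = list(seeds)           # pre-generated packet seeds for this server
        self.rate = rate_per_tick          # issuance budget per time slice
        self.issued_this_tick = 0
        self.per_user = defaultdict(int)

    def tick(self):
        """Refill the rate-limit budget at the start of each time slice."""
        self.issued_this_tick = 0

    def claim(self, user_id):
        if self.issued_this_tick >= self.rate:
            return None                    # server-side rate limit hit
        if self.per_user[user_id] >= self.MAX_PER_USER:
            return None                    # per-user cap reached
        if not self.seeds:
            return None                    # this server's seed file is exhausted
        self.issued_this_tick += 1
        self.per_user[user_id] += 1
        return self.seeds.pop()            # each seed is issued exactly once
```

Because every server works only from its own seed file, no cross-server coordination is needed at claim time; the Agent-level abuse checks described above would sit in front of `claim`.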
Live‑broadcast interaction: Rapid configuration updates are delivered through a three‑step configuration service deployed in both Shanghai and Shenzhen, with countdown calibration driven by broadcast cues. Overload protection combines client‑side throttling (e.g., one request per 5–10 seconds) with server‑side rate limiting, allowing the system to sustain up to 40 million concurrent users.
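The client-side half of the overload protection amounts to refusing to send a new shake request until a server-assigned interval has elapsed. A minimal sketch, assuming the interval is pushed down in configuration (the names below are illustrative):

```python
import time

# Hedged sketch of the client-side throttle described above: the client
# drops shakes locally until the configured interval has elapsed, so
# excess shakes never cost any network traffic. Names are illustrative.
class ShakeThrottle:
    def __init__(self, interval_s: float, clock=time.monotonic):
        self.interval = interval_s       # e.g. 5-10 s, pushed by the server
        self.clock = clock               # injectable for testing
        self.last_sent = float("-inf")

    def try_send(self) -> bool:
        now = self.clock()
        if now - self.last_sent < self.interval:
            return False                 # swallow the shake locally
        self.last_sent = now
        return True
```

Because the interval lives in server-pushed configuration, operators can widen it mid-broadcast to shed load without shipping a client update, which pairs naturally with the rapid configuration service above.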
Version evolution: The V0.5 test version (confidence 50%) focused on basic shake‑to‑red‑packet functionality. The V0.8 preview (confidence 70%) added more robust testing and code reviews. The V1.0 official release (confidence 80%) handled the real Spring Festival event, achieving 11 billion shakes, a peak of 810 million shakes per minute, and 14 million requests per second.
The study concludes with reflections on the remaining 20% risk factors—live‑show variability, on‑site incident handling, and unforeseeable edge cases—emphasizing that perfect reliability is unattainable, but thorough preparation can keep the system operational.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.