
Full‑Chain Production Environment Load Testing for Double 11 Promotion: Process, Findings, and Lessons

This article details the end‑to‑end preparation, execution, reporting, and retrospective of a large‑scale production‑environment load test for the Double 11 shopping festival, covering data preparation, QPS target calculation, multi‑scenario testing, issue analysis, and continuous improvement practices.

转转QA

Background – Large‑scale promotional load testing (single‑scenario, multi‑scenario, and full‑chain) simulates massive volumes of user requests against the production environment to identify performance bottlenecks and resource capacity limits, enabling continuous optimization.

Preparation Phase

1. Identify the business flows, pages, and APIs to be tested.
2. Analyze dynamic parameters and record the interfaces.
3. Construct dynamic‑parameter data for load generation.
4. Set target QPS from last year's Double 11 and this year's 618 traffic peaks, applying a promotion factor (1.2×) and a safety multiplier (5‑10×). Calculation formula: TPS = (80% of requests × redundancy factor) ÷ (20% of the time window, or the peak duration).
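The formula in step 4 applies the 80/20 rule: roughly 80% of a period's requests are assumed to arrive within 20% of its duration. A minimal Python sketch of that arithmetic follows. The function names, the choice of the larger historical peak as the baseline, and the sample inputs are assumptions for illustration, not the team's tooling or data; the article also does not say how the redundancy factor relates to the 5‑10× safety multiplier, so the sketch keeps them as separate inputs.

```python
from typing import Iterable, Optional


def peak_tps(total_requests: float, redundancy: float,
             window_seconds: Optional[float] = None,
             peak_seconds: Optional[float] = None) -> float:
    """(80% of requests x redundancy) / (20% of the window, or the peak duration)."""
    if peak_seconds is not None:
        divisor = peak_seconds          # measured peak duration, used as-is
    elif window_seconds is not None:
        divisor = 0.2 * window_seconds  # 80/20 rule: 20% of the whole window
    else:
        raise ValueError("provide window_seconds or peak_seconds")
    if divisor <= 0:
        raise ValueError("duration must be positive")
    return 0.8 * total_requests * redundancy / divisor


def target_qps(historical_peaks: Iterable[float],
               promotion_factor: float = 1.2,
               safety_multiplier: float = 5.0) -> float:
    """Scale the highest historical peak (e.g. last Double 11 vs. this year's 618).

    Assumption: both factors are applied multiplicatively to that peak.
    """
    baseline = max(historical_peaks)
    return baseline * promotion_factor * safety_multiplier


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the article.
    print(peak_tps(1_800_000, 1.2, window_seconds=10 * 3600))
    print(target_qps([1000.0, 1500.0]))
```

With these hypothetical inputs, the first call works out to 1,728,000 ÷ 7,200 = 240 TPS and the second to 1,500 × 1.2 × 5 = 9,000 QPS.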

Target QPS – (image of QPS targets)

Issue Review

Token refresh errors caused by a short token cache lifetime.

Invalid‑token errors caused by tokens in cookies not being wrapped in double quotes.

Solutions

Pre‑generate tokens through data construction and export them to a CSV file.

Wrap each token string in double quotes in the load‑test data files.
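The two token fixes can be combined into a small data‑preparation step. The Python sketch below is illustrative only: the CSV layout (`user_id`, `token`), the cookie name `token`, and the helper names are hypothetical, not the team's actual files. It relies on the standard `csv` module, which escapes the literal quotes on write and restores them on read; whether a particular load tool does the same depends on how it parses CSV fields.

```python
import csv
import io
from typing import Iterable, Tuple


def quote_token(token: str) -> str:
    """Wrap a token in double quotes unless it is already quoted."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token
    return f'"{token}"'


def write_token_csv(rows: Iterable[Tuple[str, str]], out) -> None:
    """Write (user_id, token) rows; the token column stores the quoted token."""
    writer = csv.writer(out)
    writer.writerow(["user_id", "token"])
    for user_id, token in rows:
        # csv.writer doubles the embedded quotes; csv.reader undoes that.
        writer.writerow([user_id, quote_token(token)])


def cookie_header(token_value: str) -> str:
    """Build a Cookie header value; the cookie name "token" is hypothetical."""
    return f"token={quote_token(token_value)}"


if __name__ == "__main__":
    buf = io.StringIO(newline="")
    write_token_csv([("u1001", "abc123"), ("u1002", '"def456"')], buf)
    buf.seek(0)
    for record in csv.DictReader(buf):
        print(record["user_id"], cookie_header(record["token"]))
```

Pre‑generating the file also decouples token creation from the test run, so an expired cache cannot stall a round.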

Related Services Overview – (image of service map)

Load Test Scenarios – (images of scenario diagrams)

Execution Phase

Period: October 9 – November 9, 9 rounds total (4 single‑mall, 3 double‑mall, 2 full‑mall).

Single‑Mall – Goal: resolve single‑API performance issues and adjust rate limiting and resources. After four rounds, CPU allocations and container node counts were increased for services zljA‑D, which then reached the target QPS. The main issues were APIs missing rate‑limit configuration and insufficient resources.
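Rate‑limit omissions of this kind can be caught before a round starts by cross‑checking each API's target QPS against its configured limit. A hypothetical Python sketch, assuming both are available as simple mappings (the API paths and numbers are invented): it flags APIs with no limit at all and APIs whose limit would throttle traffic below the target.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RateLimitFinding:
    api: str
    target_qps: float
    limit_qps: Optional[float]
    issue: str  # "missing" or "below target"


def audit_rate_limits(targets: Dict[str, float],
                      limits: Dict[str, float]) -> List[RateLimitFinding]:
    """Flag APIs with no rate limit, or with a limit lower than the target QPS."""
    findings: List[RateLimitFinding] = []
    for api, target in sorted(targets.items()):
        limit = limits.get(api)
        if limit is None:
            findings.append(RateLimitFinding(api, target, None, "missing"))
        elif limit < target:
            # The limit would throttle the test before the target is reached.
            findings.append(RateLimitFinding(api, target, limit, "below target"))
    return findings


if __name__ == "__main__":
    # Hypothetical APIs and numbers for illustration only.
    targets = {"/order/create": 3000.0, "/search": 8000.0, "/coupon/claim": 5000.0}
    limits = {"/order/create": 3500.0, "/search": 6000.0}
    for finding in audit_rate_limits(targets, limits):
        print(finding)
```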

Double‑Mall – Goal: meet QPS targets, adjust resources, uncover dependency bottlenecks. After three rounds, QPS targets were met. Issues included search service timeouts under high thread counts, high Nginx reporting latency, and token cache expiration causing DB pressure. Solutions involved caching, hourly token refresh, and single‑API single‑thread recordings.
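Hourly token refresh can be sketched as a client‑side cache that reuses a token until a scheduled refresh time. This is a minimal Python illustration, not the team's implementation: `fetch_token` is a hypothetical callable returning a token and its lifetime in seconds, and the one‑hour interval and 60‑second margin are assumed defaults. The injectable clock makes the refresh logic testable.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse a token until a scheduled refresh time, then fetch a new one."""

    def __init__(self,
                 fetch_token: Callable[[], Tuple[str, float]],
                 refresh_interval: float = 3600.0,
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_token          # hypothetical: returns (token, ttl_seconds)
        self._interval = refresh_interval  # e.g. 3600.0 for an hourly refresh
        self._margin = refresh_margin      # refresh this long before server expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._refresh_at:
            token, ttl = self._fetch()
            # Refresh at whichever comes first: the fixed interval or
            # shortly before the token's own expiry.
            lifetime = min(self._interval, max(ttl - self._margin, 0.0))
            self._token = token
            self._refresh_at = now + lifetime
        return self._token


if __name__ == "__main__":
    counter = {"n": 0}

    def fake_fetch() -> Tuple[str, float]:
        counter["n"] += 1
        return f"token-{counter['n']}", 7200.0

    cache = TokenCache(fake_fetch)
    print(cache.get(), cache.get())  # second call reuses the cached token
```

The class is not thread‑safe; a load generator that shares one cache across threads would need a lock around the refresh.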

Full‑Chain – Goal: meet overall API expectations. Problems were high latency in promotion APIs, search bottlenecks, and DB pressure from red‑packet spikes. Solutions focused on cross‑team call‑chain optimization.

Report Compilation

Contents include resource configuration, data files, round‑by‑round summaries (pass/fail, root causes, owners), test records, timeout‑heavy interfaces, dependency services, and screenshots. (images of report examples)
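The timeout‑heavy interface list in such a report can be derived from raw request samples. A Python sketch, assuming samples are available as `(api, latency_ms)` pairs and that any request at or above a 1,000 ms threshold counts as a timeout (both assumptions): it ranks APIs by timeout rate and reports a nearest‑rank p99 latency.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ApiLatencySummary:
    api: str
    count: int
    timeout_rate: float
    p99_ms: float


def nearest_rank(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def timeout_heavy_apis(samples: Iterable[Tuple[str, float]],
                       timeout_ms: float = 1000.0) -> List[ApiLatencySummary]:
    """Group (api, latency_ms) samples and rank APIs by timeout rate, worst first."""
    by_api: Dict[str, List[float]] = defaultdict(list)
    for api, latency_ms in samples:
        by_api[api].append(latency_ms)
    summaries = []
    for api, latencies in by_api.items():
        latencies.sort()
        timeouts = sum(1 for v in latencies if v >= timeout_ms)
        summaries.append(ApiLatencySummary(
            api=api,
            count=len(latencies),
            timeout_rate=timeouts / len(latencies),
            p99_ms=nearest_rank(latencies, 99),
        ))
    # Worst timeout rate first; ties broken by the higher p99.
    summaries.sort(key=lambda s: (s.timeout_rate, s.p99_ms), reverse=True)
    return summaries


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    demo = [("/search", 850.0), ("/search", 1300.0), ("/promo/list", 120.0)]
    for summary in timeout_heavy_apis(demo):
        print(summary)
```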

Summary & Retrospective

Comparison of expected vs. actual QPS for Double 11 and peak business load. (images of QPS vs. business peaks)
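The expected‑versus‑actual comparison reduces to a per‑API ratio and a pass/fail flag. A hypothetical Python sketch, with invented API paths and numbers, that treats an API as passing when its achieved QPS reaches the target and assumes every target is positive:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QpsComparison:
    api: str
    expected_qps: float
    actual_qps: float

    @property
    def achievement(self) -> float:
        """Achieved QPS as a fraction of the (positive) target."""
        return self.actual_qps / self.expected_qps

    @property
    def passed(self) -> bool:
        return self.actual_qps >= self.expected_qps


def compare_qps(expected: Dict[str, float],
                actual: Dict[str, float]) -> List[QpsComparison]:
    """Pair each target with the measured peak; unmeasured APIs count as 0 QPS."""
    return [QpsComparison(api, target, actual.get(api, 0.0))
            for api, target in sorted(expected.items())]


if __name__ == "__main__":
    rows = compare_qps({"/order/create": 3000.0, "/search": 8000.0},
                       {"/order/create": 3200.0, "/search": 7400.0})
    for r in rows:
        status = "PASS" if r.passed else "FAIL"
        print(f"{r.api:<15} {r.actual_qps:>8.0f}/{r.expected_qps:<8.0f} "
              f"{r.achievement:6.1%} {status}")
```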

Growth Journey

From zero knowledge to mastering load testing: starting out unfamiliar, then learning to use the platform, record scripts, observe metrics, and generate reports; later coming to understand the objectives of different test types, using monitoring platforms, and identifying bottlenecks across services.

Key takeaways: systematic issue summarization, improving test plans, leveraging monitoring tools for root‑cause analysis, and proactive configuration (e.g., pre‑request token refresh, enabling circuit‑breakers). (image of personal growth)
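Enabling circuit breakers is usually a matter of framework or gateway configuration, but the mechanism itself is small. The Python sketch below is a generic consecutive‑failure circuit breaker, not the configuration the team used; the threshold, cool‑down, and class names are assumptions. After `failure_threshold` consecutive failures it rejects calls until `reset_timeout` seconds pass, then lets calls through again (half‑open), closing on success or reopening on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the downstream while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0                     # consecutive failures so far
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"                 # cool-down over: allow a trial
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; downstream call skipped")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failure while half-open, or too many in a row, (re)opens.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0                     # success closes the circuit
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)
    for _ in range(3):
        try:
            breaker.call(lambda: 1 / 0)
        except ZeroDivisionError:
            print("downstream failed")
        except CircuitOpenError:
            print("short-circuited")
```

Like the token cache, this sketch is single‑threaded; production breakers also track state per downstream and guard it with locks.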
