Capacity Design and Performance Evaluation: Estimating QPS, Concurrency, and System Scaling
The article explains how to assess system capacity by analyzing daily traffic, calculating average and peak QPS, applying the 80/20 rule, conducting stress tests, and adjusting resources, illustrated with a sports event example and a book reservation system case study.
Background
A company holds an annual sports meet with a 2000 m race; 40 male and 20 female participants normally register, requiring at least six 10‑person races, each lasting 30 minutes, totaling three hours of event time.
When the 4000 m race was cancelled, 50 extra participants switched to the 2000 m race. With 110 runners, the event now needed 11 races (5.5 hours), far exceeding the 3-hour slot, so roughly half of the races had to be postponed, highlighting the need for proper capacity design.
Concept
Design capacity is the process of estimating system capacity using various metrics such as data volume, concurrency, bandwidth, user counts, message length, image size, storage, CPU, and memory.
The article uses concurrency as a concrete example to demonstrate the analysis process.
Analysis Process
Understanding Key Metrics
TPS (Transactions Per Second) and QPS (Queries Per Second) measure throughput; concurrency indicates the number of simultaneous requests the system can handle.
Peak QPS can be estimated using the 80/20 rule: 80% of traffic occurs in 20% of the time.
Formula: Peak QPS = (Total PV × 80%) / (Total seconds per day × 20%).
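As a quick sanity check, the formula can be written as a few lines of Python (the helper name and the sample PV figure are illustrative, not from the article):

```python
def peak_qps(total_pv: float, window_seconds: float = 86_400) -> float:
    """Estimate peak QPS via the 80/20 rule:
    80% of traffic arrives in 20% of the time window."""
    return (total_pv * 0.8) / (window_seconds * 0.2)

# Example: 10,000,000 PV spread over a full day (86,400 s)
print(round(peak_qps(10_000_000)))  # ≈ 463 requests/s
```

Note how concentrating 80% of traffic into 20% of the time inflates the estimate to four times the naive average (10,000,000 / 86,400 ≈ 116 QPS).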
When to Evaluate System Capacity
1. Temporary traffic spikes (e.g., promotional events like 618, Double 11).
2. Initial system capacity assessment before launch.
3. Changes in baseline capacity due to feature growth, increased data flow, or higher active user numbers.
Capacity Evaluation Steps
1. Analyze daily total visits (PV/UV) from product, operations, and historical data.
2. Estimate average QPS for the active period.
3. Estimate peak QPS using traffic curves or the 80/20 rule.
4. Conduct performance stress testing (e.g., using nGrinder or JMeter) to determine the maximum QPS a single instance can sustain before response time exceeds the acceptable threshold (typically 2 seconds, preferably 1 second).
5. Adjust based on redundancy and actual performance; calculate the required number of instances to handle the peak load.
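The five steps above can be sketched as one small sizing calculation; the function name, the default redundancy factor, and the example figures are assumptions for illustration:

```python
import math

def required_instances(total_pv: float,
                       window_seconds: float,
                       sustainable_qps_per_instance: float,
                       redundancy_factor: float = 1.5) -> int:
    """Steps 2-5 combined: estimate peak QPS with the 80/20 rule,
    add a redundancy margin, then size the fleet against the
    stress-tested per-instance limit."""
    peak_qps = (total_pv * 0.8) / (window_seconds * 0.2)
    target_qps = peak_qps * redundancy_factor
    return math.ceil(target_qps / sustainable_qps_per_instance)

# Example: 10M PV/day, each instance sustains 200 QPS under the
# response-time threshold found in stress testing
print(required_instances(10_000_000, 86_400, 200))  # → 4
```

The redundancy factor covers measurement error and growth headroom; stress testing (step 4) supplies the per-instance limit rather than a vendor spec sheet.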
Case Study: Book Reservation System
Using the 80/20 rule, a system with 1 500 000 total PV over a 9‑hour active window yields a peak QPS of (1 500 000 × 80%) / (9 × 3600 × 20%) ≈ 185 requests/s.
With an average response time of 0.5 seconds, the required concurrency is 185 × 0.5 = 92.5 (concurrency = QPS × average response time), rounded up to 100; after applying a 2× pessimism factor, the final target concurrency is set to 200.
Performance testing showed that a single web instance can handle about 2 500 QPS; this is derated to 2 000 QPS to keep response time under 1 second. For a peak of 7 500 QPS, at least four web instances are recommended (7 500 / 2 000 = 3.75, rounded up to 4).
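Plugging the case-study figures into the formulas (the numbers are the ones quoted above; concurrency = QPS × response time is the standard relation):

```python
import math

total_pv = 1_500_000
window_s = 9 * 3600                            # 9-hour active window
peak = (total_pv * 0.8) / (window_s * 0.2)     # ≈ 185.2 QPS

avg_rt = 0.5                                   # average response time, seconds
concurrency = peak * avg_rt                    # ≈ 92.6 (≈ 92.5 if peak is
                                               # rounded to 185 first)

single_instance_qps = 2_000                    # derated stress-test limit
peak_target = 7_500                            # quoted peak for the sizing step
instances = math.ceil(peak_target / single_instance_qps)  # → 4
```

The case study then doubles the rounded concurrency (100 → 200) as a safety margin before committing to a target.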
Summary
System capacity should be evaluated during temporary traffic spikes, initial launch, and when baseline capacity changes; the steps include traffic analysis, QPS estimation, peak assessment, stress testing, and redundancy‑based adjustment.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with internet technologies. Architects who enjoy thinking and sharing are welcome to exchange ideas and learn together.