
Understanding QPS, TPS, Response Time, and Concurrency in High‑Performance Systems

This article explains the meanings, calculations, and differences of key high‑performance metrics such as system throughput, QPS, TPS, response time, and concurrency, illustrating each concept with real‑world examples and visual diagrams to help engineers evaluate and optimize large‑scale systems.

Mike Chen's Internet Architecture

System Throughput Metrics

System throughput is the number of transactions a system can process per unit of time and is a key performance indicator, analogous to an airport's annual passenger throughput as a measure of its capacity.

Important parameters include:

QPS

TPS

Response Time

Concurrency

Below each metric is explained in detail.

QPS

QPS (Queries Per Second) measures how many queries a server can handle each second, indicating the query handling capacity of a specific server.
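As a minimal sketch, QPS over a measurement window is just the query count divided by the window length. The counter value and window below are hypothetical:

```python
# Sketch: compute average QPS from a request counter over a time window.
# The query count and window length here are made-up illustration values.

def compute_qps(total_queries: int, window_seconds: float) -> float:
    """Average queries per second over a measurement window."""
    return total_queries / window_seconds

# e.g. 12,000 queries served in a 60-second window
qps = compute_qps(12_000, 60.0)  # 200.0 queries/second
```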

TPS

TPS (Transactions Per Second) measures how many transactions a system can complete per second; a transaction may be a single API call, multiple calls, or an entire business process.

In web performance testing, a transaction includes three steps: a request to the server, internal processing (application server, database, etc.), and the server returning the result to the client.

To calculate TPS, count the number of completed transactions each second—for example, 100 transactions per second yields TPS = 100.
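The counting step above can be sketched by bucketing transaction completion timestamps by the second in which they fall; the timestamps below are hypothetical:

```python
from collections import Counter

# Sketch: bucket completed transactions by the whole second in which they
# finish. TPS for any given second is then that bucket's count.

def tps_by_second(completion_times: list[float]) -> Counter:
    """Map each whole second to the number of transactions completed in it."""
    return Counter(int(t) for t in completion_times)

# Three transactions finish during second 100, one during second 101:
counts = tps_by_second([100.1, 100.5, 100.9, 101.2])
# counts[100] == 3, i.e. a TPS of 3 in that second
```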

Difference Between QPS and TPS

If one page view counts as one transaction but fans out into N server requests, then QPS = N × TPS. When a single query interface does not invoke other services, TPS equals QPS.
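The relationship can be shown with a small arithmetic sketch; the page-view rate and fan-out factor below are hypothetical:

```python
# Sketch of QPS = N * TPS: one transaction (e.g. a page view) that fans
# out into N backend requests multiplies the query rate by N.

def qps_from_tps(tps: float, requests_per_transaction: int) -> float:
    """Derive QPS from TPS given the fan-out per transaction."""
    return tps * requests_per_transaction

# A page that triggers 5 backend calls, at 100 page views/second:
qps = qps_from_tps(100, 5)  # 500 queries/second

# A single-call interface (N = 1) gives QPS == TPS:
assert qps_from_tps(100, 1) == 100
```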

Response Time (RT)

RT (Response Time) is the total time from when a client sends a request until it receives the response, encompassing both processing time and waiting time.

It reflects how fast a system feels to users and is a critical performance indicator.
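In practice, RT is measured by timing the round trip on the client side. The sketch below simulates a request with a sleep; `fake_request` is a hypothetical stand-in for a real client call:

```python
import time

# Sketch: measure response time as the wall-clock elapsed time around a
# request. `fake_request` is a hypothetical stand-in for a real call.

def fake_request() -> str:
    time.sleep(0.05)  # simulate ~50 ms of processing + waiting
    return "ok"

start = time.perf_counter()
result = fake_request()
rt_ms = (time.perf_counter() - start) * 1000
print(f"response time: {rt_ms:.1f} ms")  # roughly 50 ms
```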

Concurrency

Concurrency indicates the number of requests a system can handle simultaneously, reflecting its load‑handling capability.

Concurrency differs from parallelism: threads sharing CPU time slices can execute concurrently even on a single core, whereas parallelism requires multiple cores executing work truly simultaneously.

Understanding these metrics is essential for designing and evaluating high-traffic systems such as Alibaba's Double-11 shopping festival, where peak order throughput has reached 583,000 transactions per second.

Written by Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!