Operations 5 min read

Mastering API Latency: What P90, P95, P99 and SLA Really Mean

This article explains key performance metrics such as API latency, SLA commitments, and percentile indicators P90, P95, and P99, illustrating how to calculate and interpret these values along with average and maximum latency to improve system reliability and user experience.

Efficient Ops
Efficient Ops
Efficient Ops
Mastering API Latency: What P90, P95, P99 and SLA Really Mean

What is latency?

API latency is the time between sending a request to an API and receiving its response, usually measured in milliseconds (ms) or seconds (s). Lower latency means faster responses, while higher latency indicates slower performance. Monitoring and optimizing latency is essential for good application performance and user experience.

图片
图片

What is an SLA (Service Level Agreement)?

An SLA is a promise between a service provider and its customers, defining the expected availability and performance. For example, an SLA might guarantee that a network is operational 99% of the time, and if it falls below that, the provider must resolve the issue within a specified period.

What are P90, P95, P99?

These percentiles represent the response‑time thresholds for an API service. P90 means 90% of requests are faster than that value, P95 means 95% are faster, and P99 means 99% are faster. They help quantify the distribution of latency.

图片
图片

Example: given 100 latency measurements sorted from smallest to largest, P50 (median) is the 50th value, P90 is the 90th value, and P99 is the 99th value.

<code>[1, 2, 3, ..., 50, ..., 90, 95, 100, 200, 500]</code>

P50 (median) = 50 ms → 50% of requests ≤ 50 ms.

P90 = 90 ms → 90% of requests ≤ 90 ms.

P99 = 200 ms → 99% of requests ≤ 200 ms.

Average and maximum latency

Average latency is the mean response time, calculated by summing all response times and dividing by the number of requests. For example, with five measurements of 2 s, 3 s, 4 s, 5 s, and 6 s, the average latency is (2+3+4+5+6)/5 = 4 s.

Maximum latency is the longest response time observed. If most users see a video start within a few seconds but one user experiences a 20‑second delay, the maximum latency is 20 seconds, representing the worst‑case experience.

operationsperformance monitoringSLAapi latencypercentiles
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.