How Do Kubernetes Resource Limits Really Work? A Deep Dive into CPU Throttling
This article explains how Kubernetes resource limits work: how to interpret CPU limits as time slices, the Linux accounting system behind them, the Prometheus metrics that reveal CPU throttling, practical examples with multithreaded containers, and guidance on setting alerts and choosing appropriate limit values.
1. Understanding Limits
When configuring limits, we tell the Linux node how long a container may run within a fixed period, protecting other workloads from excessive CPU consumption.
Limits refer not to physical CPU cores but to the amount of CPU time a container's processes and threads may consume before being paused. This is easy to confuse, because at the scheduling level physical cores are what gets considered.
In other words, the Kubernetes scheduler reasons in physical cores, but inside a container a limit is best understood as CPU time.
2. Limits as Time
Consider a single‑threaded application that needs 1 second of CPU time to complete a transaction. Setting cpu: 1000m allocates 1000 millicores, i.e., 1 CPU second of runtime per second:
<code>resources:
  limits:
    cpu: 1000m</code>
If the application must run uninterrupted for a full second, the container must be allowed to run for 1000 ms before being throttled. One CPU second is thus the time block against which usage is measured.
3. Linux Accounting System
Limits act as an accounting system that tracks and restricts the total vCPU usage of a container within a fixed period, using a global pool of runtime.
The Linux kernel hands out runtime in 5 ms slices, so a default 100 ms period contains 20 slices; to run for half a period, a container needs half of them. Two cgroup files control this accounting: cpu.cfs_period_us defines the period length (default 100 ms) and cpu.cfs_quota_us sets the CPU time allowed within that period.
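The arithmetic that maps a Kubernetes millicore limit onto these two cgroup values can be sketched as follows (a minimal illustration, assuming the default 100 ms period; the function name is ours, not a real API):

```python
# Sketch: how a Kubernetes CPU limit in millicores maps onto the CFS
# cgroup values. 1000m = one full core = quota equal to the period.

CFS_PERIOD_US = 100_000  # cpu.cfs_period_us default: 100 ms


def cfs_quota_us(millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Return the cpu.cfs_quota_us value for a given CPU limit."""
    return millicores * period_us // 1000


print(cfs_quota_us(1000))  # 100000 us of runtime per 100 ms period
print(cfs_quota_us(500))   # 50000 us: half of each period
```

A limit of 2000m therefore yields a quota of 200000 µs, i.e., two full cores' worth of runtime per period.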
4. Multithreaded Containers
Containers often run many threads (hundreds in a Go or Java application), and the accounting system must track vCPU usage across all of them. Using container_cpu_usage_seconds_total, we can observe the total vCPU seconds consumed by all of an application's threads.
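A rate over this counter, for example, shows per-container vCPU usage summed across all threads (query shape assumed; the labels follow common cAdvisor conventions):
<code>sum by (pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))</code>
A value of 4 here means the container's threads together consumed four vCPU seconds per second.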
5. Global Accounting
When a CPU needs to run one of the container's threads, it first checks whether the container's global quota pool still holds enough runtime and, if so, draws a slice (e.g., 5 ms) from it. If the pool is empty, the thread is throttled until the next period refills the quota.
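This global accounting can be illustrated with a toy simulation (a sketch under the stated assumptions: 100 ms periods and 5 ms slices drawn from one per-container pool; all names are illustrative):

```python
# Toy model of CFS bandwidth accounting within a single period:
# threads draw 5 ms slices from the container's global quota pool
# until the pool runs dry, at which point they are throttled.

SLICE_MS = 5  # runtime handed out per draw


def run_period(quota_ms: int, thread_demands_ms: list[int]) -> dict:
    """Simulate one 100 ms period for a set of runnable threads."""
    pool = quota_ms
    used = [0] * len(thread_demands_ms)
    throttled = False
    progress = True
    while progress:
        progress = False
        for i, demand in enumerate(thread_demands_ms):
            if used[i] < demand:
                if pool >= SLICE_MS:
                    pool -= SLICE_MS      # draw a slice from the pool
                    used[i] += SLICE_MS
                    progress = True
                else:
                    throttled = True      # must wait for the next period
    return {"used_ms": used, "throttled": throttled}


# Four threads each wanting 100 ms, but only 200 ms of quota (cpu: 2000m):
print(run_period(200, [100, 100, 100, 100]))
# -> {'used_ms': [50, 50, 50, 50], 'throttled': True}
```

With quota equal to the aggregate demand (400 ms), the same workload completes the period unthrottled.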
6. Real‑World Scenario
Assume four threads each need 100 ms of CPU time per 100 ms period, totaling 400 ms, i.e., 4000 m. Setting the limit accordingly prevents throttling, but real workloads fluctuate, making static limits unreliable.
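Under that assumption, a limit sized to the aggregate demand of the four threads would look like this (illustrative values):
<code>resources:
  limits:
    cpu: 4000m</code>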
CPU burst features (e.g., cpu.cfs_burst_us) mitigate this by carrying unused quota over to the next period.
7. Common Metrics
Key metrics include container_cpu_cfs_throttled_periods_total (the number of periods in which the container was throttled) and container_cpu_cfs_periods_total (the total number of elapsed periods). Their ratio indicates throttling severity.
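The ratio of the two counters gives the fraction of periods in which the container was throttled (query shape assumed):
<code>sum by (pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
/
sum by (pod, container) (rate(container_cpu_cfs_periods_total[5m]))</code>
A value near 0 means the limit is rarely hit; values approaching 1 mean the container is throttled in almost every period.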
8. Determining Needed Limits
The container_cpu_cfs_throttled_seconds_total metric from cAdvisor aggregates the throttled 5 ms slices, allowing calculation of how much CPU time a workload wanted beyond its quota. First, observe actual usage:
<code>topk(3, max by (pod, container)(rate(container_cpu_usage_seconds_total{image!="", instance="$instance"}[$__rate_interval]))) / 10</code>
As an experiment, run a 4‑thread sysbench workload that needs 400 ms of CPU time within each real 100 ms interval:
<code>command:
  - sysbench
  - cpu
  - --threads=4
  - --time=0
  - run</code>
Observing roughly 400 ms of vCPU usage per 100 ms, we then apply limits:
<code>resources:
  limits:
    cpu: 2000m
    memory: 128Mi</code>
With a 2000 m limit against roughly 4000 m of demand, CPU usage halves, as expected. The throttled time shows up directly:
<code>topk(3, max by (pod, container)(rate(container_cpu_cfs_throttled_seconds_total{image!="", instance="$instance"}[$__rate_interval]))) / 10</code>
9. Alerting
<code># Alert when throttled CPU time exceeds 1 s per second
rate(container_cpu_cfs_throttled_seconds_total{namespace=~"wordpress-.*"}[1m]) > 1

# Alert when throttled periods exceed 50% of total periods
sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace) /
sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace) * 100 > 50</code>
10. Conclusion
This article covered how Kubernetes limits work, the metrics for choosing appropriate values, and how to diagnose throttling issues. Limits set too low throttle the workload and increase latency, while over‑provisioned limits leave vCPU idle; realistic workloads with many runtime threads typically need limits above their average usage to run without performance loss.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.