
How Do Kubernetes Resource Limits Really Work? A Deep Dive into CPU Throttling

This article explains how Kubernetes resource limits function, how to interpret CPU limits as time slices, the Linux accounting system behind them, relevant Prometheus metrics for detecting throttling, practical examples with multithreaded containers, and guidance on setting alerts and avoiding performance pitfalls.

Efficient Ops

1. Understanding Limits

When configuring limits, we tell the Linux node the runtime duration a container may use within a specific period, protecting other workloads from excessive CPU consumption.

Limits refer not to physical CPU cores but to the amount of CPU time a container's group of processes or threads may consume before it is paused. This is easy to confuse with scheduling: the Kubernetes scheduler does reason in terms of physical cores, but inside the container a limit is best understood as CPU time.

2. Limits as Time

Consider a single‑threaded application that needs 1 second of CPU time to complete a transaction. Setting

<code>resources:
  limits:
    cpu: 1000m</code>

allocates 1000 millicores, i.e., one CPU second.

If the application must run uninterrupted for a full second, the container must be allowed to run for 1000 ms before being throttled.

One CPU second can be treated as the period over which these time blocks are measured, though the kernel actually enforces the limit over much shorter periods (100 ms by default).
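To make the time‑slice view concrete, here is a minimal sketch (a hypothetical helper, assuming the default 100 ms period) of how a limit below the demand stretches that single‑threaded 1‑second transaction across multiple periods:

```python
import math

# Hypothetical helper, not a real Kubernetes API: how long (wall clock)
# a single-threaded transaction takes when its CPU limit caps the
# runtime it may accumulate in each 100 ms period.

def wall_clock_ms(cpu_needed_ms: float, limit_millicores: int,
                  period_ms: int = 100) -> float:
    """Wall-clock ms needed to accumulate cpu_needed_ms of runtime."""
    quota_ms = period_ms * limit_millicores / 1000  # runtime allowed per period
    periods = math.ceil(cpu_needed_ms / quota_ms)
    used_in_last = cpu_needed_ms - (periods - 1) * quota_ms
    # All but the last period elapse in full (run quota, then sit
    # throttled until the refill); the thread finishes partway through
    # the final period.
    return (periods - 1) * period_ms + used_in_last

print(wall_clock_ms(1000, 1000))  # 1000m: finishes in 1000 ms, never throttled
print(wall_clock_ms(1000, 500))   # 500m: throttled half of every period -> 1950 ms
```

With a 500m limit the same 1 second of work takes almost twice as long, because the thread spends half of every period waiting for the quota to refill.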

3. Linux Accounting System

Limits act as an accounting system that tracks and restricts the total vCPU usage of a container within a fixed period, using a global pool of runtime.

The Linux kernel divides each period into 20 slices of 5 ms each. To run for half of each period, allocate half the slices.

Similarly, <code>cpu.cfs_period_us</code> defines the period length (100 ms by default) and <code>cpu.cfs_quota_us</code> sets the CPU time allowed within that period.
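As an illustrative mapping (a sketch, not a real container-runtime API), a millicore limit translates into these two cgroup v1 files like this:

```python
# Illustrative mapping from a Kubernetes CPU limit in millicores to the
# two cgroup v1 files named above. Values are microseconds; 100000 us
# is the kernel's default period. Helper name is hypothetical.

def cgroup_cfs_settings(limit_millicores: int, period_us: int = 100_000) -> dict:
    return {
        "cpu.cfs_period_us": period_us,
        # 1000m buys the whole period, 500m buys half of it, and so on.
        "cpu.cfs_quota_us": period_us * limit_millicores // 1000,
    }

print(cgroup_cfs_settings(1000))  # quota equals the full 100000 us period
print(cgroup_cfs_settings(2500))  # 2.5 vCPUs -> 250000 us per 100000 us period
```

Note that the quota may exceed the period: a 2500m limit grants 250 ms of runtime per 100 ms period, which multiple threads can consume in parallel.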

4. Multithreaded Containers

Containers often run many threads (hundreds in Go or Java). The accounting system must track vCPU usage across all threads.

Using <code>container_cpu_usage_seconds_total</code>, we can observe the total vCPU seconds consumed across all of an application's threads.

5. Global Accounting

When a CPU needs to run a thread, it checks whether the container's global quota still has a time slice available (5 ms by default). If not, the thread is throttled until the next period.
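The check described above can be modeled as a toy simulation (names and structure are purely illustrative): each runnable thread asks the container's global pool for one 5 ms slice, and once the pool runs dry the rest wait for the refill.

```python
# Toy model of per-period global accounting: hand out 5 ms slices from
# the container's quota pool until it is empty; everyone else is
# throttled until the next period. Illustrative, not kernel code.

SLICE_MS = 5

def run_period(quota_ms: int, runnable: list) -> tuple:
    """Return (threads granted a slice, threads throttled) for one period."""
    pool = quota_ms
    granted, throttled = [], []
    for thread in runnable:
        if pool >= SLICE_MS:
            pool -= SLICE_MS
            granted.append(thread)
        else:
            throttled.append(thread)
    return granted, throttled

# A 20 ms quota holds four 5 ms slices, so two of six threads must wait.
granted, throttled = run_period(20, ["t1", "t2", "t3", "t4", "t5", "t6"])
print(len(granted), len(throttled))  # 4 2
```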

6. Real‑World Scenario

Assume four threads each need 100 ms of CPU time per period, totaling 400 ms (a 4000m limit). Setting limits accordingly prevents throttling, but real workloads fluctuate, which makes static limits unreliable.
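The arithmetic for that scenario, as a back-of-the-envelope sketch (the helper name is hypothetical):

```python
# Smallest limit that avoids throttling: n threads, each needing a
# given amount of CPU time per enforcement period. Default 100 ms
# period assumed; purely illustrative.

def required_millicores(threads: int, cpu_ms_per_thread: int,
                        period_ms: int = 100) -> int:
    return 1000 * threads * cpu_ms_per_thread // period_ms

print(required_millicores(4, 100))  # four fully busy threads -> 4000m
print(required_millicores(4, 50))   # four half-busy threads  -> 2000m
```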

CPU burst features (e.g., <code>cpu.cfs_burst_us</code>) can carry unused quota over to the next period.

7. Common Metrics

Key metrics include <code>container_cpu_cfs_throttled_periods_total</code> (periods in which the container was throttled) and <code>container_cpu_cfs_periods_total</code> (total enforcement periods). Their ratio indicates throttling severity.
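That ratio, as a small sketch (the helper name is illustrative; in practice you would compute this in PromQL from the two counters above):

```python
# Throttling severity: the share of enforcement periods in which the
# container hit its quota. Built from the two counters named above.

def throttle_ratio(throttled_periods: int, total_periods: int) -> float:
    return throttled_periods / total_periods if total_periods else 0.0

# 45 throttled periods out of 100 -> the container was capped 45% of the time.
print(throttle_ratio(45, 100))  # 0.45
```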

8. Determining Needed Limits

The <code>container_cpu_cfs_throttled_seconds_total</code> metric from cAdvisor aggregates the throttled 5 ms slices, allowing calculation of how much CPU time a workload wanted but could not get.

For example, the top three consumers by vCPU usage:

<code>topk(3, max by (pod, container)(rate(container_cpu_usage_seconds_total{image!="", instance="$instance"}[$__rate_interval]))) / 10</code>

Example experiment: run a 4‑thread sysbench workload that needs 400 ms of CPU time for every 100 ms of wall‑clock time.

<code>command:
  - sysbench
  - cpu
  - --threads=4
  - --time=0
  - run</code>

Observing ~400 ms vCPU usage, then applying limits:

<code>resources:
  limits:
    cpu: 2000m
    memory: 128Mi</code>

The CPU usage halves as expected.
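A quick sanity check on that result, assuming the default 100 ms period:

```python
# Why the usage halves: a 4000m demand under a 2000m limit gets only
# 200 ms of runtime per 100 ms period. Numbers match the sysbench
# experiment above; the calculation itself is illustrative.

period_ms = 100
demand_ms = 4 * 100                   # four sysbench threads, each fully busy
quota_ms = period_ms * 2000 // 1000   # 2000m limit -> 200 ms per period

ran = min(demand_ms, quota_ms)
print(ran)               # 200 ms of runtime actually granted per period
print(demand_ms - ran)   # 200 ms throttled per period, i.e. usage halves
```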

The throttled time can be confirmed with:

<code>topk(3, max by (pod, container)(rate(container_cpu_cfs_throttled_seconds_total{image!="", instance="$instance"}[$__rate_interval]))) / 10</code>

9. Alerting

<code># Alert when throttled time exceeds 1s
rate(container_cpu_cfs_throttled_seconds_total{namespace=~"wordpress-.*"}[1m]) > 1
# Alert when throttled periods exceed 50% of total periods
sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace) /
 sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace) * 100 > 50</code>

10. Conclusion

The article covered how Kubernetes limits work, which metrics help set appropriate values, and how to diagnose throttling issues. Limits set too low throttle the workload and increase latency, while over‑provisioned limits leave vCPU idle; realistic workloads with many runtime threads typically need higher limits than a naive estimate suggests, so measure actual usage with the metrics above before choosing a value.

Tags: Kubernetes, Prometheus, resource limits, CPU throttling, cAdvisor, Linux accounting
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together.