Mastering Kubernetes Pod Resource Requests, Limits, and QoS
This guide explains how to configure CPU and memory requests and limits for Kubernetes pods, implement QoS classes, use LimitRange and ResourceQuota, and monitor resource usage with Prometheus queries and Grafana dashboards to ensure stable cluster operations.
1. Overview
Pod CPU Request and Memory Request are critical parameters. If they are omitted, Kubernetes assumes minimal resource needs and may schedule the pod on any node, which can lead to resource starvation when the cluster is under pressure. In such cases the node may evict pods, but critical pods (e.g., those handling data storage, login, or balance queries) must be protected.
Enforce resource quotas so different pods can only consume allocated resources.
Allow over‑provisioning to improve cluster utilization.
Assign QoS classes to pods; low‑priority pods are evicted first when resources are scarce.
Kubernetes nodes provide compute resources (CPU, GPU, Memory). This article focuses on CPU and Memory, as most workloads do not need GPU.
CPU and Memory are specified per container via resources.requests and resources.limits. The scheduler uses the request values to find a node with sufficient capacity.
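For example, a container declaring both values might look like this (a minimal sketch; the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # placeholder name
spec:
  containers:
  - name: app
    image: nginx:1.25       # placeholder image
    resources:
      requests:
        cpu: 100m           # scheduler reserves 0.1 core on the chosen node
        memory: 500Mi
      limits:
        cpu: 120m           # CPU above this is throttled (compressible)
        memory: 600Mi       # memory above this triggers an OOM kill (hard limit)
```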
2. Pod Resource Usage Guidelines
Pod CPU and Memory usage is dynamic and depends on load; it is expressed as a range (e.g., 0.1‑1 CPU, 500 Mi‑1 Gi memory). Two key concepts:
Requests – reserved resources required for normal operation.
Limits – maximum resources a pod may consume; for CPU this is a compressible ceiling, for Memory it is a hard limit.
If a container exceeds its Memory limit, it is OOM-killed by the kernel and restarted by the kubelet according to the pod's restart policy. Therefore, Requests and Limits must be set carefully based on actual workload needs.
Example: a pod with a 1 Gi Memory request (and no limit) is scheduled on a node with 1.2 Gi free. After three days the pod's usage grows to 1.5 Gi, but the node only has 200 Mi left; the node comes under memory pressure and the pod is evicted or OOM-killed.
Pods without Limits (or with only one of CPU/Memory limits) appear flexible but are less stable than pods with all four parameters set.
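Requests and limits also determine the pod's QoS class: Guaranteed when every container's requests equal its limits, Burstable when at least one request or limit is set but they differ, and BestEffort when none are set. Guaranteed pods are evicted last. A minimal sketch of a Guaranteed-class resource section (values are placeholders):

```yaml
# Guaranteed QoS: requests == limits for every resource of every container
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 500m
    memory: 1Gi
```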
When managing hundreds of pods, manually setting Requests and Limits for each is impractical. Kubernetes provides LimitRange (default values and validation) and ResourceQuota (tenant‑level caps) to automate this.
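A LimitRange applies per-container defaults within one namespace, so pods that omit requests or limits still receive sane values. A sketch (the object name and values are placeholders):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits       # placeholder name
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container omits requests
      cpu: 100m
      memory: 256Mi
    default:                 # applied when a container omits limits
      cpu: 200m
      memory: 512Mi
```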
CPU Rules
Unit: millicores (m), where 10 m = 0.01 core, 1 core = 1000 m.
Requests: estimated based on actual usage.
Limits: Requests × 1.2 (i.e., Requests + 20%).
Memory Rules
Unit: Mi, where 1024 Mi = 1 Gi.
Requests: estimated based on actual usage.
Limits: Requests × 1.2 (i.e., Requests + 20%).
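The 20% headroom rule above can be sketched as a tiny helper (an illustration only; suggest_limit is a hypothetical name, not a Kubernetes API):

```python
import math


def suggest_limit(request: int, headroom: float = 0.2) -> int:
    """Return a limit of request + 20% headroom, rounded up.

    Works for CPU in millicores and memory in Mi alike,
    since both are plain integers in their respective units.
    """
    return math.ceil(request * (1 + headroom))


print(suggest_limit(500))   # 500m CPU request -> 600m CPU limit
print(suggest_limit(1024))  # 1 Gi (1024 Mi) memory request -> 1229 Mi memory limit
```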
3. Namespace Resource Management
Overall Requests and Limits should not exceed 80% of cluster capacity, to leave headroom for rolling updates.
3.1 Multi‑Tenant Resource Strategy
Use ResourceQuota to limit resource consumption per project/team.
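The quota inspected in section 4.3 could have been created from a manifest like this (reconstructed from the kubectl describe output there, so treat it as a sketch):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: cloudchain--staging
spec:
  hard:                      # namespace-wide caps, summed over all pods
    requests.cpu: 250m
    requests.memory: 250Mi
    limits.cpu: 500m
    limits.memory: 500Mi
```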
3.2 Resource Change Process
4. Resource Monitoring and Inspection
4.1 Resource Usage Monitoring
Namespace Requests usage rate
<code>sum (kube_resourcequota{type="used",resource="requests.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.cpu"}) by (resource,namespace) * 100
sum (kube_resourcequota{type="used",resource="requests.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.memory"}) by (resource,namespace) * 100</code>
Namespace Limits usage rate
<code>sum (kube_resourcequota{type="used",resource="limits.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.cpu"}) by (resource,namespace) * 100
sum (kube_resourcequota{type="used",resource="limits.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.memory"}) by (resource,namespace) * 100</code>
4.2 Viewing via Grafana
CPU request rate
<code>sum (kube_resourcequota{type="used",resource="requests.cpu",namespace=~"$NameSpace"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.cpu",namespace=~"$NameSpace"}) by (resource,namespace)</code>
Memory request rate
<code>sum (kube_resourcequota{type="used",resource="requests.memory",namespace=~"$NameSpace"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="requests.memory",namespace=~"$NameSpace"}) by (resource,namespace)</code>
CPU limit rate
<code>sum (kube_resourcequota{type="used",resource="limits.cpu"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.cpu"}) by (resource,namespace)</code>
Memory limit rate
<code>sum (kube_resourcequota{type="used",resource="limits.memory"}) by (resource,namespace) / sum (kube_resourcequota{type="hard",resource="limits.memory"}) by (resource,namespace)</code>
4.3 In-Cluster Resource Inspection
Check resource usage
<code>[root@k8s-dev-slave04 yaml]# kubectl describe resourcequotas -n cloudchain--staging
Name: mem-cpu-demo
Namespace: cloudchain--staging
Resource Used Hard
-------- ---- ----
limits.cpu 200m 500m
limits.memory 200Mi 500Mi
requests.cpu 150m 250m
requests.memory 150Mi 250Mi</code>
Check events for quota violations
<code>[root@kevin ~]# kubectl get event -n default
LAST SEEN TYPE REASON OBJECT MESSAGE
46m Warning FailedCreate replicaset/hpatest-57965d8c84 Error creating: pods "hpatest-57965d8c84-s78x6" is forbidden: exceeded quota: mem-cpu-demo, requested: limits.cpu=400m,limits.memory=400Mi, used: limits.cpu=200m,limits.memory=200Mi, limited: limits.cpu=500m,limits.memory=500Mi
... (additional similar events) ...</code>
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.