Optimize Kubernetes Resource Use with Requests, Limits, and Scheduling
This article explains common causes of resource waste in Kubernetes clusters, such as over‑provisioned requests and fluctuating workloads, and provides practical methods—including proper request/limit settings, ResourceQuota and LimitRange policies, node affinity, taints and tolerations, and HPA—to improve overall resource utilization and cluster stability.
Improving Resource Utilization
1.1 Resource Waste Scenarios
1. Over 50% waste due to resource reservation
Kubernetes uses the Request field to reserve CPU and memory for a container, guaranteeing a minimum amount of resources that cannot be preempted by other containers. If the Request is set too low, the workload may lack resources under high load; therefore users often set Requests high to ensure reliability.
For most of the time, however, actual load is not high. In a typical real‑world case, the reserved CPU (Request) far exceeds actual CPU usage, leaving reserved resources idle that other workloads cannot use.
To address this, set Requests based on observed load and cap unbounded resource requests, using the ResourceQuota and LimitRange mechanisms described below.
2. Workload peak‑valley patterns cause obvious waste
Most services exhibit peak‑valley patterns (e.g., bus systems busy by day, quiet at night; games peak on Friday evenings, dip on Sunday). Fixed Requests lead to low utilization during valleys.
Dynamic replica scaling (e.g., Kubernetes HPA) can handle these fluctuations.
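As a sketch, an HPA that scales a hypothetical `web` Deployment between 2 and 10 replicas based on average CPU utilization might look like this (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web           # hypothetical Deployment to scale
  minReplicas: 2        # floor during valleys
  maxReplicas: 10       # ceiling during peaks
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # scale out when average CPU exceeds 60% of Request
```

During valleys the HPA shrinks the Deployment toward `minReplicas`, releasing reserved resources for other workloads.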
3. Different workload types have varying resource needs
Online services require high performance during the day, while offline batch jobs can run during valleys. CPU‑intensive workloads consume more CPU; memory‑intensive workloads consume more memory.
Mixing offline and online workloads with appropriate affinity, taints, and tolerations improves overall utilization.
1.2 Methods to Improve Resource Utilization
There are two broad approaches: use native Kubernetes capabilities to partition and limit resources manually, and layer business‑specific automation on top. This section focuses on the native Kubernetes methods.
1.2.1 How to Partition and Limit Resources
Imagine you manage a cluster shared by four business units. To improve overall utilization, you need to cap each unit's resource usage and set sensible defaults.
Ideally, each workload sets appropriate Request (minimum guaranteed) and Limit (maximum allowed). In practice, users often forget or set them excessively high.
Example values:
CPU: Request 0.25, Limit 0.5
Memory: Request 256 MiB, Limit 1024 MiB
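The example values above, expressed in a container spec (Pod and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 250m        # 0.25 core guaranteed
        memory: 256Mi
      limits:
        cpu: 500m        # 0.5 core cap
        memory: 1Gi      # 1024 MiB cap
```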
For finer‑grained control, use the namespace‑level ResourceQuota and LimitRange.
1.2.2 Using ResourceQuota
ResourceQuota limits the total resources a namespace can consume (CPU, memory, storage, object counts). It helps isolate projects and prevent a single namespace from exhausting cluster resources.
Compute resources: sum of all container Requests and Limits
Storage resources: total PVC storage requests
Object counts: total number of PVC, Service, ConfigMap, Deployment, etc.
Typical scenarios:
Allocate separate namespaces for different teams and set quotas per namespace
Set upper limits to improve cluster stability and avoid resource hogging
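A minimal ResourceQuota sketch for a hypothetical `team-a` namespace (all values are illustrative and should reflect the team's observed usage):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"          # sum of all container CPU Requests
    requests.memory: 20Gi       # sum of all container memory Requests
    limits.cpu: "20"            # sum of all container CPU Limits
    limits.memory: 40Gi         # sum of all container memory Limits
    persistentvolumeclaims: "10"  # object-count cap
    services: "5"
```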
A helper script (built on the `kubectl-view-allocations` plugin) can generate initial ResourceQuota YAML files for each namespace, inflating each namespace's current total Request/Limit by 30%:

```shell
wget https://my-repo-1255440668.cos.ap-chengdu.myqcloud.com/ops/ResourceQuota-install.tar.gz
tar -xvf ResourceQuota-install.tar.gz
cd ResourceQuota-install && bash install.sh
```

After execution, a `resourceQuota` directory containing per‑namespace `ResourceQuota.yaml` files is created; adjust them as needed and apply with `kubectl apply`.
Note: If a namespace's total Request/Limit exceeds its ResourceQuota, new Pods cannot be created there. Pods in such a namespace must specify `requests.cpu`, `requests.memory`, `limits.cpu`, and `limits.memory`.
1.2.3 Using LimitRange
LimitRange sets default and min/max values for individual containers within a namespace, preventing users from creating pods with too small or too large resource specifications.
Compute resources: define CPU and memory ranges
Storage resources: define PVC size ranges
Ratio settings: control Request‑to‑Limit ratios
Default values: automatically apply when a pod omits explicit settings
Typical use cases:
Provide default Request/Limit values to avoid user omission and protect QoS
Set different defaults per namespace based on workload characteristics
Enforce upper and lower bounds to keep pods healthy while limiting over‑consumption
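The use cases above can be combined in one LimitRange sketch per namespace (namespace name and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:      # applied as Request when a container omits one
      cpu: 250m
      memory: 256Mi
    default:             # applied as Limit when a container omits one
      cpu: 500m
      memory: 1Gi
    min:                 # reject containers requesting less than this
      cpu: 100m
      memory: 64Mi
    max:                 # reject containers requesting more than this
      cpu: "2"
      memory: 4Gi
```

With this in place, a Pod created without explicit resource settings still gets sane defaults, preserving its QoS class.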
1.2.4 Scheduling Strategies
Kubernetes scheduling finds the most suitable node for each Pod. Proper scheduling policies, combined with business characteristics, can greatly improve cluster resource utilization.
1.2.4.1 Node Affinity
If a CPU‑intensive workload lands on a memory‑focused node, CPU resources may be wasted. By labeling nodes (e.g., `cpu-intensive=true`) and adding matching affinity rules to Pods, the scheduler places workloads on appropriate nodes, improving utilization.
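A Pod pinned to such labeled nodes might look like this (label key, Pod name, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-heavy-job
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule onto nodes labeled cpu-intensive=true
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cpu-intensive
            operator: In
            values: ["true"]
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
```

For a soft preference instead of a hard rule, `preferredDuringSchedulingIgnoredDuringExecution` can be used, letting the Pod fall back to other nodes when labeled ones are full.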
1.2.4.2 Taints and Tolerations
Taints mark nodes as unsuitable for Pods unless the Pod explicitly tolerates the taint. Tolerations allow Pods to run on tainted nodes. This mechanism can be used for node exclusivity, special hardware (e.g., GPUs), or handling node failures.
```shell
kubectl taint nodes nodename dedicated=groupName:NoSchedule
```

Pods with matching tolerations can then be scheduled onto those nodes.
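A matching toleration for the `dedicated=groupName:NoSchedule` taint, as it would appear in a Pod spec:

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "groupName"
  effect: "NoSchedule"   # must match the taint's effect
```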
```shell
kubectl taint nodes nodename special=true:NoSchedule
kubectl taint nodes nodename special=true:PreferNoSchedule
```

For node‑failure scenarios, a toleration such as the following lets a Pod survive a temporary network partition:
```yaml
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000   # evict only if the node stays unreachable this long
```

Improving cluster stability involves many techniques; resource utilization is just one of them. More methods will be shared in future articles.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.