Prevent a Single Pod from Crashing Your Kubernetes Cluster with Resource Quota
This article explains why missing ResourceQuota and LimitRange cause cluster-wide failures, walks through core concepts, provides step‑by‑step commands for quota inspection, creation, and validation, shares a real‑world outage case study, and offers best‑practice recommendations, advanced configurations, monitoring, and rollback procedures for Kubernetes resource management.
Problem Background
In production, a simple typo, such as writing replicas: 500 instead of replicas: 50 or setting an oversized memory request, can consume all node resources, trigger kubelet eviction, and leave the cluster partially unusable. The root cause is usually missing guardrails: no namespace-level ResourceQuota or LimitRange, and Pods that declare no requests or limits.
Core Concepts Overview
ResourceQuota
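Namespace‑level aggregate caps on compute resources (requests.cpu, requests.memory, limits.cpu, limits.memory), storage (requests.storage), and object counts (pods, services, persistentvolumeclaims, and so on). It limits the sum of requests and limits across all non-terminated Pods in the namespace, not the size of any single Pod.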
LimitRange
Namespace‑level defaults and bounds for individual containers or Pods. It injects default requests and limits when they are omitted and enforces min/max values.
Pod requests vs limits
The scheduler evaluates only requests. If two Pods each request 500m CPU (half a core) but have 2-CPU limits, the scheduler counts 1 CPU in total for them. If both Pods then use their full limits, they demand 4 CPU and the node is oversubscribed: CPU contention shows up as throttling, while memory overcommit leads to OOM kills or kubelet eviction.
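The arithmetic above can be made concrete with a small worked example. This is a sketch with hypothetical numbers (a node with 1 allocatable CPU, and a helper named `cpu_totals` that is illustrative only): the scheduler's view is the sum of requests, while the worst case is the sum of limits.

```python
# Worked example (hypothetical numbers): two Pods, each requesting 500m CPU with a
# 2-CPU limit, on a node with 1 allocatable CPU.
from typing import List, Tuple

def cpu_totals(pods: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (sum of requests, sum of limits) in cores for (request, limit) pairs."""
    requested = sum(req for req, _ in pods)
    worst_case = sum(lim for _, lim in pods)
    return requested, worst_case

node_allocatable_cpu = 1.0
pods = [(0.5, 2.0), (0.5, 2.0)]

requested, worst_case = cpu_totals(pods)
fits_for_scheduler = requested <= node_allocatable_cpu   # the scheduler checks requests only
overcommit_ratio = worst_case / node_allocatable_cpu     # demand if both Pods reach their limits

print(f"scheduler sees {requested} CPU, fits={fits_for_scheduler}")
print(f"worst-case demand {worst_case} CPU = {overcommit_ratio:.0f}x allocatable")
```

Both Pods are admitted onto the node, yet at full burst they would ask for four times what the node can give.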
QoS Classes
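Guaranteed: every container sets both CPU and memory requests and limits, and requests equal limits.
Burstable: at least one container sets a CPU or memory request or limit, but the Pod does not meet the Guaranteed criteria (for example, requests < limits, or limits omitted).
BestEffort: no container sets any CPU or memory request or limit.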
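The three QoS rules can be expressed as a short classifier. This is a simplified sketch, not the kubelet's code: containers are plain dicts, quantities are compared as strings (so "1" and "1000m" are treated as different), and it models the API-server default that a container with a limit but no request gets a request equal to its limit.

```python
# Sketch of the QoS rules above. Each container is a dict such as
# {"requests": {"cpu": "500m"}, "limits": {"cpu": "1", "memory": "256Mi"}}.
# Quantities are compared as strings, so "1" and "1000m" count as different here.
from typing import Dict, List

Container = Dict[str, Dict[str, str]]

def qos_class(containers: List[Container]) -> str:
    any_set = False
    all_guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        requests = dict(c.get("requests", {}))
        # API-server defaulting: a limit without a request implies request == limit.
        for res, lim in limits.items():
            requests.setdefault(res, lim)
        for res in ("cpu", "memory"):
            if res in requests or res in limits:
                any_set = True
            if res not in limits or requests.get(res) != limits[res]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"

print(qos_class([{"limits": {"cpu": "1", "memory": "1Gi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "250m"}}]))               # Burstable
print(qos_class([{}]))                                          # BestEffort
```

Note the first case: setting only limits is enough for Guaranteed, because the requests are defaulted to match.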
Practical ResourceQuota Workflow
1. View current quota status
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>
2. Calculate current resource requests
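Summing requests requires normalizing Kubernetes quantity strings such as 250m, 512Mi, or 1G. The same calculation as the jq commands, sketched in Python over data shaped like `kubectl get pods -o json` output; `parse_quantity` and `sum_requests` are hypothetical helper names, and only the common suffixes are handled.

```python
# Sketch: sum container requests from `kubectl get pods -o json`-shaped data.
# Handles m, k/M/G/T and Ki/Mi/Gi/Ti suffixes only (not P, E, Pi, Ei).
import json
from decimal import Decimal

_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}

def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string to a plain number (cores or bytes)."""
    for suffix in ("Ki", "Mi", "Gi", "Ti", "k", "M", "G", "T", "m"):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)

def sum_requests(pod_list: dict, resource: str) -> Decimal:
    total = Decimal(0)
    for pod in pod_list.get("items", []):
        for c in pod["spec"].get("containers", []):
            req = c.get("resources", {}).get("requests", {}).get(resource)
            if req is not None:
                total += parse_quantity(req)
    return total

sample = json.loads("""{"items": [
  {"spec": {"containers": [{"resources": {"requests": {"cpu": "500m", "memory": "256Mi"}}}]}},
  {"spec": {"containers": [{"resources": {"requests": {"cpu": "1", "memory": "1Gi"}}}, {}]}}
]}""")
print(sum_requests(sample, "cpu"))             # total CPU in cores
print(sum_requests(sample, "memory") / 2**20)  # total memory in Mi
```

Decimal avoids floating-point drift when many millicore values are added together.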
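# CPU requests (cores); regular containers only, init containers are ignored.
# Succeeded/Failed Pods are listed too, although the quota no longer counts them.
kubectl get pods -n <namespace> -o json | \
  jq -r '[.items[].spec.containers[].resources.requests.cpu // "0" |
    if endswith("m") then (rtrimstr("m")|tonumber)/1000 else tonumber end] | add // 0'
# Memory requests (Mi); handles Ki/Mi/Gi, decimal M/G, and plain byte values
kubectl get pods -n <namespace> -o json | \
  jq -r '[.items[].spec.containers[].resources.requests.memory // "0" |
    if endswith("Ki") then (rtrimstr("Ki")|tonumber)/1024
    elif endswith("Mi") then (rtrimstr("Mi")|tonumber)
    elif endswith("Gi") then (rtrimstr("Gi")|tonumber)*1024
    elif endswith("M") then (rtrimstr("M")|tonumber)*1000000/1048576
    elif endswith("G") then (rtrimstr("G")|tonumber)*1000000000/1048576
    else tonumber/1048576 end] | add // 0'
3. Create a ResourceQuota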
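apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "100"
    services: "20"
    configmaps: "50"
    persistentvolumeclaims: "10"
    requests.storage: "100Gi"
    # Per-StorageClass caps use the <class>.storageclass.storage.k8s.io/ prefix.
    # The namespace-wide requests.storage total above still applies on top of them.
    ssd.storageclass.storage.k8s.io/requests.storage: "200Gi"
    hdd.storageclass.storage.k8s.io/requests.storage: "500Gi"
4. Verify enforcement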
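Conceptually, enforcement is a per-resource check: a new Pod is admitted only if current usage plus the Pod's contribution stays within every hard cap, and when the quota constrains compute resources, a Pod that leaves them unset is rejected outright. A simplified Python model of that check; `admit` is a hypothetical helper, values are plain numbers (cores, GiB, counts), the usage figures are made up, and the real admission plugin also handles scopes and other object kinds.

```python
# Simplified model of ResourceQuota admission for a single Pod (not the real plugin).
# Values are plain numbers: cores for CPU, GiB for memory, counts for "pods".
from typing import Dict, Tuple

COMPUTE_KEYS = ("requests.cpu", "requests.memory", "limits.cpu", "limits.memory")

def admit(hard: Dict[str, float], used: Dict[str, float],
          pod: Dict[str, float]) -> Tuple[bool, str]:
    """`pod` maps quota keys such as 'requests.cpu' to the Pod's totals."""
    request = dict(pod, pods=1)  # every Pod also counts against the "pods" cap
    missing = sorted(k for k in hard if k in COMPUTE_KEYS and k not in request)
    if missing:
        return False, "must specify " + ", ".join(missing)
    for key, cap in hard.items():
        if used.get(key, 0) + request.get(key, 0) > cap:
            return False, f"exceeded quota: {key}"
    return True, "admitted"

# Caps from prod-compute-quota; the current usage figures are hypothetical.
hard = {"requests.cpu": 10, "requests.memory": 20, "limits.cpu": 20,
        "limits.memory": 40, "pods": 100}
used = {"requests.cpu": 6, "requests.memory": 8, "limits.cpu": 12,
        "limits.memory": 16, "pods": 30}

print(admit(hard, used, {"requests.cpu": 12, "requests.memory": 0.125,
                         "limits.cpu": 12, "limits.memory": 0.125}))
print(admit(hard, used, {}))
print(admit(hard, used, {"requests.cpu": 1, "requests.memory": 1,
                         "limits.cpu": 2, "limits.memory": 2}))
```

The first Pod breaches requests.cpu, the second declares nothing and is refused before any arithmetic, and the third fits.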
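# Attempt to create a Pod whose CPU request (12 cores) exceeds the 10-core quota
kubectl run test-nginx --image=nginx -n production \
  --overrides='{"spec":{"containers":[{"name":"test-nginx","image":"nginx","resources":{"requests":{"cpu":"12","memory":"128Mi"},"limits":{"cpu":"12","memory":"128Mi"}}}]}}'
# The API server should reject it with an "exceeded quota: prod-compute-quota" error.
# A Pod that declares no requests/limits is rejected with "must specify ..." instead,
# unless a LimitRange in the namespace supplies defaults.
LimitRange Workflow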
1. Create a LimitRange
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:             # default limits
      cpu: 500m
      memory: 256Mi
    defaultRequest:      # default requests
      cpu: 125m          # 500m / 125m = 4, the highest ratio maxLimitRequestRatio allows
      memory: 128Mi
    max:
      cpu: "8"
      memory: "32Gi"
    min:
      cpu: 10m
      memory: 16Mi
    maxLimitRequestRatio:
      cpu: "4"
      memory: "4"
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"
  - type: PersistentVolumeClaim
    min:
      storage: 1Gi
    max:
      storage: 100Gi
2. Verify default injection
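LimitRange admission first fills in whatever a container leaves unset and then validates the result against min, max, and maxLimitRequestRatio. A simplified Python model for CPU only, in millicores (`apply_limitrange` is a hypothetical name; the real LimitRanger also covers memory, Pod-level limits, and PVCs). It shows why the default limit and default request must respect the ratio: with a 500m default limit and a ratio cap of 4, a 100m default request would give a container that sets nothing a ratio of 5, and admission would reject it.

```python
# Simplified LimitRange defaulting + validation for one resource (CPU, in millicores).
from typing import Dict, Optional, Tuple

def apply_limitrange(request: Optional[int], limit: Optional[int],
                     lr: Dict[str, int]) -> Tuple[int, int]:
    """Return the (request, limit) a container ends up with, or raise ValueError."""
    if request is None and limit is not None:
        request = limit                    # API-server defaulting: request := declared limit
    if limit is None:
        limit = lr["default"]              # LimitRange default limit
    if request is None:
        request = lr["defaultRequest"]     # LimitRange default request
    if request > limit:
        raise ValueError("request exceeds limit")
    if request < lr["min"] or limit > lr["max"]:
        raise ValueError("outside min/max bounds")
    ratio = limit / request
    if ratio > lr["maxLimitRequestRatio"]:
        raise ValueError(f"limit/request ratio {ratio:g} exceeds {lr['maxLimitRequestRatio']}")
    return request, limit

lr = {"default": 500, "defaultRequest": 125, "min": 10, "max": 8000,
      "maxLimitRequestRatio": 4}
print(apply_limitrange(None, None, lr))   # container sets nothing
print(apply_limitrange(None, 2000, lr))   # limit only: request follows the limit
try:
    apply_limitrange(None, None, dict(lr, defaultRequest=100))
except ValueError as err:
    print(err)                             # 500m / 100m = 5, above the cap of 4
```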
cat <<'EOF' | kubectl apply -f - -n production
apiVersion: v1
kind: Pod
metadata:
  name: test-nolimits
spec:
  containers:
  - name: busybox
    image: busybox:1.36
    command: ["sleep", "3600"]
EOF
kubectl get pod test-nolimits -n production -o jsonpath='{.spec.containers[0].resources}'
The output shows the default requests and limits injected by the LimitRange.
Scheduling Details
The scheduler adds a Pod's requests.cpu and requests.memory to the requests of the Pods already placed on each node and compares the total against that node's allocatable resources. It ignores limits. Therefore a Pod with low requests but high limits can still overload a node once its containers actually use up to their limits.
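The fit check described above can be sketched in a few lines of Python. This is a simplification in the spirit of the scheduler's resource-fit filter, not its code: node names and numbers are hypothetical (CPU in millicores, memory in Mi), and the real scheduler also accounts for Pod overhead, extended resources, and scoring.

```python
# Simplified resource-fit filter: only requests are compared with allocatable.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    name: str
    allocatable: Dict[str, int]                     # e.g. {"cpu": 2000, "memory": 4096}
    pod_requests: List[Dict[str, int]] = field(default_factory=list)

def fits(node: Node, pod_request: Dict[str, int]) -> bool:
    for res, want in pod_request.items():
        already = sum(p.get(res, 0) for p in node.pod_requests)
        if already + want > node.allocatable.get(res, 0):
            return False
    return True  # limits are never consulted

def feasible_nodes(nodes: List[Node], pod_request: Dict[str, int]) -> List[str]:
    return [n.name for n in nodes if fits(n, pod_request)]

nodes = [
    Node("node-a", {"cpu": 2000, "memory": 4096}, [{"cpu": 1500, "memory": 1024}]),
    Node("node-b", {"cpu": 2000, "memory": 4096}, [{"cpu": 500, "memory": 3900}]),
]
pod = {"cpu": 500, "memory": 256}   # its limits could be far higher; they play no part here
print(feasible_nodes(nodes, pod))
```

node-a is chosen even though it would be fully committed on CPU requests; whether the Pod later bursts to its limit is not the scheduler's concern.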
Advanced Topics
Vertical Pod Autoscaler (VPA)
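apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: backend-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: backend
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
In production it is common to start with updateMode: "Off" (recommendations only) or "Initial" (resources set only when Pods are created) and switch to "Auto", which applies new values by evicting and recreating Pods, after reviewing the recommendations. The VPA is not built into Kubernetes; its CRDs and components from the kubernetes/autoscaler project must be installed before this manifest can be applied.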
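The minAllowed and maxAllowed bounds act as a clamp on the recommender's output. A Python sketch of that clamping using the bounds from the manifest (CPU in millicores, memory in Mi); the raw recommendation values are made up, and the real recommender derives its targets from usage history before any capping.

```python
# Clamp a raw recommendation into [minAllowed, maxAllowed], per resource.
from typing import Dict

MIN_ALLOWED = {"cpu": 50, "memory": 64}        # 50m, 64Mi
MAX_ALLOWED = {"cpu": 4000, "memory": 8192}    # 4 CPU, 8Gi

def clamp_recommendation(raw: Dict[str, int]) -> Dict[str, int]:
    return {res: max(MIN_ALLOWED[res], min(MAX_ALLOWED[res], value))
            for res, value in raw.items()}

print(clamp_recommendation({"cpu": 20, "memory": 12000}))   # {'cpu': 50, 'memory': 8192}
print(clamp_recommendation({"cpu": 750, "memory": 900}))    # {'cpu': 750, 'memory': 900}
```

maxAllowed is what keeps an "Auto" VPA from growing a leaky container past what a single node, or the namespace quota, can absorb.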
PriorityClass + ResourceQuota for Multi‑Tenant Clusters
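apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
description: "Critical production workloads"
value: 100000
globalDefault: false
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: production
spec:
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high-priority"]
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    pods: "80"
The scoped quota counts only Pods whose priorityClassName is high-priority. If such a Pod cannot be scheduled because no node has enough free capacity, the scheduler may preempt lower-priority Pods to make room, so this quota caps how much capacity the namespace can claim that way. All quotas in a namespace are enforced together: high-priority Pods also count against prod-compute-quota, whose 10-CPU and 20Gi request caps bind before the 20-CPU and 40Gi caps set here.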
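Which quotas a Pod counts against can be sketched as a filter over the namespace's ResourceQuotas. This Python model is a simplification: only the PriorityClass scope with operator "In" is handled, other scopes and operators are ignored, and the quota dicts mirror the manifests above rather than real API objects.

```python
# Which ResourceQuotas in a namespace does a Pod count against? Simplified: only the
# PriorityClass scope with operator "In" is modeled; other scopes/operators are ignored.
from typing import Dict, List, Optional

def quota_applies(quota: Dict, priority_class: Optional[str]) -> bool:
    exprs = quota.get("scopeSelector", {}).get("matchExpressions", [])
    for e in exprs:
        if e["scopeName"] == "PriorityClass" and e["operator"] == "In":
            if priority_class not in e["values"]:
                return False
    return True  # no matching restriction: the quota applies to the Pod

def applicable(quotas: List[Dict], priority_class: Optional[str]) -> List[str]:
    return [q["name"] for q in quotas if quota_applies(q, priority_class)]

quotas = [
    {"name": "prod-compute-quota"},                      # unscoped: applies to every Pod
    {"name": "high-priority-quota",
     "scopeSelector": {"matchExpressions": [
         {"operator": "In", "scopeName": "PriorityClass", "values": ["high-priority"]}]}},
]
print(applicable(quotas, "high-priority"))   # counted by both quotas
print(applicable(quotas, None))              # counted only by the unscoped quota
```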
Troubleshooting Path for Quota Exhaustion
Step 1 – Confirm quota exhaustion
# Check quota usage
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>
Step 2 – Identify the biggest consumers
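The --sort-by commands in this step look only at each Pod's first container, which misranks multi-container Pods. A Python sketch that ranks by the sum over all containers, assuming input shaped like `kubectl get pods -o json` output; `top_cpu_consumers` is a hypothetical helper and only "250m"-style and plain core values are parsed.

```python
# Rank Pods by total CPU request over all their containers.
from typing import Dict, List, Tuple

def cpu_cores(q: str) -> float:
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def top_cpu_consumers(pod_list: Dict, n: int = 5) -> List[Tuple[str, float]]:
    totals = []
    for pod in pod_list["items"]:
        total = sum(
            cpu_cores(c.get("resources", {}).get("requests", {}).get("cpu", "0"))
            for c in pod["spec"]["containers"]
        )
        totals.append((pod["metadata"]["name"], total))
    return sorted(totals, key=lambda t: t[1], reverse=True)[:n]

# Hypothetical Pods: "api" lists a 200m sidecar first and a 2-core main container,
# so a first-container view sees only its 200m.
sample = {"items": [
    {"metadata": {"name": "api"}, "spec": {"containers": [
        {"resources": {"requests": {"cpu": "200m"}}},
        {"resources": {"requests": {"cpu": "2"}}}]}},
    {"metadata": {"name": "worker"}, "spec": {"containers": [
        {"resources": {"requests": {"cpu": "1"}}}]}},
    {"metadata": {"name": "cron"}, "spec": {"containers": [{}]}},
]}
print(top_cpu_consumers(sample))
```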
# Sort Pods by CPU request (uses only the first container of each Pod)
kubectl get pods -n <namespace> -o wide --sort-by='.spec.containers[0].resources.requests.cpu'
# Sort Pods by memory request (first container only)
kubectl get pods -n <namespace> -o wide --sort-by='.spec.containers[0].resources.requests.memory'
# List Deployments to spot abnormal replica counts
kubectl get deployments -n <namespace> -o wide
Step 3 – Verify LimitRange defaults
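kubectl describe limitrange -n <namespace>
Defaults apply only to values a container leaves unset: a declared request or limit is kept as-is, but must still satisfy the min, max, and ratio bounds. If a container sets a limit but no request, its request defaults to that limit rather than to defaultRequest.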
Step 4 – Check node‑level allocatable resources
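Allocatable, the figure the scheduler compares requests against, is derived by the kubelet as capacity minus kube-reserved, system-reserved, and the hard eviction threshold. A worked Python example for memory in Mi; the node size and reservations are hypothetical, while 100Mi is the default Linux hard threshold for memory.available.

```python
# Worked example of the node-allocatable formula for memory (values in Mi).
def allocatable(capacity: int, kube_reserved: int, system_reserved: int,
                hard_eviction: int) -> int:
    return capacity - kube_reserved - system_reserved - hard_eviction

# Hypothetical node: 16Gi capacity, 1Gi kube-reserved, 512Mi system-reserved,
# and the default hard eviction threshold memory.available<100Mi.
alloc = allocatable(16 * 1024, 1024, 512, 100)
print(alloc)  # 14748
```

The sum of memory requests scheduled onto this node can never exceed 14748Mi, even though the machine reports 16Gi.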
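# Show allocatable resources per node
kubectl describe nodes | grep -A 5 "Allocatable"
# Show the requests and limits already scheduled onto each node
kubectl describe nodes | grep -A 8 "Allocated resources"
# Real usage (requires metrics-server)
kubectl top nodes
Step 5 – Detect evictions or OOM kills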
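Under memory pressure, the kubelet ranks eviction candidates by whether their usage exceeds their requests, then by Pod priority, then by how far usage exceeds requests. A simplified Python sketch of that ordering for memory; Pod names and numbers are hypothetical, and the real kubelet works from its own resource statistics.

```python
# Simplified kubelet eviction ordering under memory pressure (Mi).
from typing import List, NamedTuple

class PodStats(NamedTuple):
    name: str
    priority: int
    request: int   # memory request (0 for BestEffort Pods)
    usage: int     # current working-set memory

def eviction_order(pods: List[PodStats]) -> List[str]:
    ranked = sorted(
        pods,
        key=lambda p: (
            p.usage <= p.request,      # Pods using more than they requested go first
            p.priority,                # then lower priority first
            -(p.usage - p.request),    # then the largest overshoot first
        ),
    )
    return [p.name for p in ranked]

pods = [
    PodStats("db", 100000, 4096, 3000),    # within its request
    PodStats("batch", 0, 0, 800),          # BestEffort: any usage exceeds a zero request
    PodStats("api", 100000, 512, 1500),    # over request, high priority
    PodStats("cache", 0, 1024, 1200),      # over request, low priority
]
print(eviction_order(pods))
```

This is why BestEffort Pods, and Burstable Pods that run well above their requests, are the first to appear in eviction events.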
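# Look for eviction or OOM events
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep -i "evicted\|oom"
# Containers whose last termination reason was OOMKilled
kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled
# On the node, inspect kubelet logs
journalctl -u kubelet | grep -i "oom\|evict\|memory"
Rollback Strategies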
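# Backup current objects, stripping server-set fields so the files can be re-applied later
kubectl get resourcequota prod-compute-quota -n production -o json | \
  jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .metadata.managedFields, .status)' > /tmp/quota-backup.json
kubectl get limitrange production-limits -n production -o json | \
  jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .metadata.managedFields)' > /tmp/limitrange-backup.json
# Restore previous version
kubectl apply -f /tmp/quota-backup.json
kubectl apply -f /tmp/limitrange-backup.json
# Temporary relief: raise a single cap in place (here, the Pod count)
kubectl patch resourcequota prod-compute-quota -n production \
  --type=json -p='[{"op":"replace","path":"/spec/hard/pods","value":"300"}]'
Monitoring Quota Usage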
kube‑state‑metrics exposes kube_resourcequota metrics. Example PromQL for CPU request usage:
kube_resourcequota{type="used",resource="requests.cpu",namespace="production"}
/
ignoring(type) kube_resourcequota{type="hard",resource="requests.cpu",namespace="production"}
Grafana alerts can be set to fire when the usage ratio exceeds 80% for CPU, memory, or pod count.
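The alert condition is the same used/hard division as the PromQL query, compared with a threshold. A Python sketch of that evaluation; `over_threshold` is a hypothetical helper and the sample numbers are made up (cores, Gi, and Pod counts).

```python
# Evaluate quota usage ratios (used / hard) and flag those above a threshold.
from typing import Dict, List

def over_threshold(used: Dict[str, float], hard: Dict[str, float],
                   threshold: float = 0.8) -> List[str]:
    alerts = []
    for resource, cap in hard.items():
        ratio = used.get(resource, 0) / cap
        if ratio > threshold:
            alerts.append(f"{resource}: {ratio:.0%} of quota used")
    return alerts

hard = {"requests.cpu": 10, "requests.memory": 20, "pods": 100}
used = {"requests.cpu": 8.5, "requests.memory": 12, "pods": 97}
print(over_threshold(used, hard))
```

Alerting on the ratio rather than on absolute usage lets one rule cover namespaces with very different quota sizes.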
Key Takeaways
Enable both ResourceQuota and LimitRange in every namespace.
Keep the combined hard values across namespaces below the cluster's total allocatable capacity, leaving roughly 20% headroom for system add-ons (such as DaemonSets) and for node failures.
Never rely solely on defaults for critical workloads; always declare explicit requests and sensible limits in Deployment manifests.
Use QoS classes, PriorityClasses, and VPA to balance utilization, protect core services, and automate safe scaling.
Implement continuous monitoring and alerting on quota usage to catch over‑commit before it leads to a cluster‑wide outage.
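The headroom guideline is simple arithmetic: the sum of the namespaces' requests.cpu caps should stay at or below about 80% of the cluster's allocatable CPU. A Python sketch; the production cap of 10 comes from the manifest above, while the other namespaces and the cluster sizes are hypothetical.

```python
# Check that summed namespace CPU quotas leave ~20% of allocatable CPU uncommitted.
from typing import Dict, Tuple

def headroom_check(caps: Dict[str, float], allocatable_total: float,
                   headroom: float = 0.20) -> Tuple[float, float, bool]:
    budget = allocatable_total * (1 - headroom)
    committed = sum(caps.values())
    return committed, budget, committed <= budget

caps = {"production": 10, "staging": 6, "batch": 12}   # requests.cpu caps, in cores
print(headroom_check(caps, 32))   # 28 cores committed against a ~25.6-core budget
print(headroom_check(caps, 40))   # 28 cores committed against a 32-core budget
```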
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Ops Community
A leading IT operations community where professionals share and grow together.
