Prevent a Single Pod from Crashing Your Kubernetes Cluster with Resource Quota

This article explains why missing ResourceQuota and LimitRange cause cluster-wide failures, walks through core concepts, provides step‑by‑step commands for quota inspection, creation, and validation, shares a real‑world outage case study, and offers best‑practice recommendations, advanced configurations, monitoring, and rollback procedures for Kubernetes resource management.

Ops Community

Jun 1, 2026

Problem Background

In production, a simple typo such as writing replicas: 500 instead of replicas: 50, or assigning an oversized memory request, can consume all node resources, trigger kubelet eviction, and leave the cluster partially unusable. The root cause is usually that the namespace has no ResourceQuota or LimitRange to reject or cap the faulty workload.

Core Concepts Overview

ResourceQuota

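Namespace-level caps on aggregate consumption: compute totals such as requests.cpu, requests.memory, limits.cpu, and limits.memory, and object counts such as pods, services, and persistentvolumeclaims. A request that would push a namespace total past its cap is rejected at admission.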

LimitRange

Namespace‑level defaults and bounds for individual containers or Pods. It injects default requests and limits when they are omitted and enforces min/max values.

Pod requests vs limits

The scheduler evaluates only requests. If two Pods each request 500m CPU, the scheduler reserves 500m for each, even if their limits allow 2 CPUs. When the Pods consume up to their limits, the node becomes oversubscribed: CPU contention slows every workload on it, and memory pressure leads to OOM kills or kubelet evictions.
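
The gap between what the scheduler counts and what Pods may actually consume can be put into numbers. The following is a minimal Python sketch, assuming a hypothetical 4-CPU node and made-up Pod values in millicores; the function and field names are illustrative and not part of any Kubernetes API.

```python
# Illustrative only: node size and Pod values are assumptions, not real data.

def overcommit_ratio(pods, allocatable_millicores):
    """Return (sum of requests, sum of limits, limits / allocatable)."""
    total_requests = sum(p["request_m"] for p in pods)
    total_limits = sum(p["limit_m"] for p in pods)
    return total_requests, total_limits, total_limits / allocatable_millicores


# Six Pods, each requesting 500m but limited to 2000m, on a 4000m node.
pods = [{"request_m": 500, "limit_m": 2000} for _ in range(6)]
total_req, total_lim, ratio = overcommit_ratio(pods, allocatable_millicores=4000)

# Requests fit on the node, yet the limits add up to several times its size.
print(f"requests={total_req}m limits={total_lim}m limit overcommit={ratio:.1f}x")
```

The scheduler would place all six Pods because the requests fit, while the combined limits exceed what the node can deliver.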

QoS Classes

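Guaranteed: every container sets both CPU and memory requests and limits, and each request equals its limit.

Burstable: at least one container sets a CPU or memory request or limit, but the Pod does not meet the Guaranteed criteria.

BestEffort: no container sets any CPU or memory request or limit.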
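
The three rules can be expressed as a small classifier. This is a simplified Python sketch, not the kubelet's implementation: it assumes each container is a plain dict with optional "requests" and "limits" maps, and it compares quantities as strings, so "500m" and "0.5" would need to be normalized first.

```python
def qos_class(containers):
    """Classify a Pod from a list of container resource dicts.

    Each container looks like {"requests": {...}, "limits": {...}} with
    optional "cpu" and "memory" keys; a missing key means "not set".
    """
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        requests = dict(container.get("requests", {}))
        # A limit without a request implies request == limit.
        for resource, value in limits.items():
            requests.setdefault(resource, value)
        if requests:
            any_set = True
        # Guaranteed needs cpu and memory limits that equal the requests.
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"cpu": "100m"}}]))
```

Note that a single container without resources is enough to move an otherwise Guaranteed Pod into the Burstable class.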

Practical ResourceQuota Workflow

1. View current quota status

kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

2. Calculate current resource requests

# CPU requests (cores)
kubectl get pods -n <namespace> -o json | \
  jq -r '[.items[].spec.containers[].resources.requests.cpu // "0" |
        if endswith("m") then (split("m")[0]|tonumber)/1000 else tonumber end] | add'

# Memory requests (Mi); handles Ki, Mi, Gi, and plain bytes only
kubectl get pods -n <namespace> -o json | \
  jq -r '[.items[].spec.containers[].resources.requests.memory // "0" |
        if endswith("Ki") then (rtrimstr("Ki")|tonumber)/1024
        elif endswith("Mi") then (rtrimstr("Mi")|tonumber)
        elif endswith("Gi") then (rtrimstr("Gi")|tonumber)*1024
        else tonumber/1048576 end] | add'
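
The jq filters above cover only a few quantity suffixes. Where more formats appear, the same summation could be done in a script. The following Python sketch handles a common subset of suffixes (not every form Kubernetes accepts); the helper names parse_quantity and total_requests are hypothetical.

```python
from decimal import Decimal

# Multipliers for a common subset of Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
}


def parse_quantity(text):
    """Convert a quantity such as "500m", "128Mi" or "2" to a Decimal."""
    # Try two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def total_requests(pod_list, resource):
    """Sum one resource's requests over every container in a Pod list."""
    total = Decimal(0)
    for pod in pod_list.get("items", []):
        for container in pod["spec"]["containers"]:
            requests = container.get("resources", {}).get("requests", {})
            total += parse_quantity(requests.get(resource, "0"))
    return total
```

The pod_list argument is expected to have the shape of `kubectl get pods -o json` output, for example loaded with json.load(sys.stdin). CPU totals come back in cores and memory totals in bytes.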

3. Create a ResourceQuota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "100"
    services: "20"
    configmaps: "50"
    persistentvolumeclaims: "10"
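    requests.storage: "100Gi"   # total across all storage classes
    # Per-class caps use <storage-class-name>.storageclass.storage.k8s.io/<resource>;
    # "ssd" and "hdd" are assumed StorageClass names.
    # The 100Gi total above is reached first; raise it if the per-class caps should apply.
    ssd.storageclass.storage.k8s.io/requests.storage: "200Gi"
    hdd.storageclass.storage.k8s.io/requests.storage: "500Gi"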
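
Conceptually, quota admission compares current usage plus the new object's demand against each hard cap. The Python sketch below models that check in a simplified way. It assumes all values are already plain numbers (millicores for CPU, MiB for memory), the usage figures are hypothetical, and the function name is illustrative.

```python
def quota_violations(hard, used, pod_demand):
    """Return the resources for which used + demand would exceed hard.

    All values are plain numbers in one unit per resource
    (for example millicores for CPU, MiB for memory).
    """
    violations = []
    for resource, cap in hard.items():
        requested = pod_demand.get(resource, 0)
        if used.get(resource, 0) + requested > cap:
            violations.append(resource)
    return violations


# Caps mirror 10 CPUs, 20Gi, and 100 Pods; usage is a made-up snapshot.
hard = {"requests.cpu": 10_000, "requests.memory": 20_480, "pods": 100}
used = {"requests.cpu": 9_800, "requests.memory": 4_096, "pods": 42}
new_pod = {"requests.cpu": 500, "requests.memory": 512, "pods": 1}

print(quota_violations(hard, used, new_pod))
```

A non-empty result corresponds to the API server rejecting the Pod; an empty list means the Pod would be admitted and the usage counters increased.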

4. Verify enforcement

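# A Pod without requests/limits is rejected with "failed quota: ... must specify ..."
# (unless a LimitRange injects defaults), because the quota tracks compute resources.
kubectl run test-nginx --image=nginx -n production
# To trigger "exceeded quota", request more than the cap (11 CPUs against requests.cpu: 10):
kubectl run test-big --image=nginx -n production --overrides='{"spec":{"containers":[{"name":"test-big","image":"nginx","resources":{"requests":{"cpu":"11","memory":"1Gi"},"limits":{"cpu":"11","memory":"1Gi"}}}]}}'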

LimitRange Workflow

1. Create a LimitRange

apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 125m        # 500m / 125m = 4, within maxLimitRequestRatio below
      memory: 128Mi
    max:
      cpu: "8"
      memory: "16Gi"   # kept at or below the Pod-level max
    min:
      cpu: 10m
      memory: 16Mi
    maxLimitRequestRatio:
      cpu: "4"
      memory: "4"
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"
  - type: PersistentVolumeClaim
    min:
      storage: 1Gi
    max:
      storage: 100Gi
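
Defaulting and validation interact: injected defaults are themselves checked against min, max, and maxLimitRequestRatio. The Python sketch below models this for a single resource of a single container. It is a simplification under stated assumptions: values are plain numbers (millicores), the dict keys and function name are hypothetical, and the example values are illustrative rather than taken from a live cluster.

```python
def apply_limit_range(container, limit_range):
    """Inject defaults for one resource, then validate min/max/ratio.

    Values are numbers in a single unit (millicores here). Returns the
    effective (request, limit) pair or raises ValueError.
    """
    explicit_limit = container.get("limit")
    request = container.get("request")
    limit = explicit_limit if explicit_limit is not None else limit_range["default"]
    if request is None:
        # A limit set without a request makes the request equal the limit.
        if explicit_limit is not None:
            request = explicit_limit
        else:
            request = limit_range["default_request"]
    if request < limit_range["min"]:
        raise ValueError(f"request {request} is below min {limit_range['min']}")
    if limit > limit_range["max"]:
        raise ValueError(f"limit {limit} is above max {limit_range['max']}")
    if limit / request > limit_range["max_ratio"]:
        raise ValueError(f"limit/request ratio {limit / request} exceeds {limit_range['max_ratio']}")
    return request, limit


limit_range = {"default": 500, "default_request": 125,
               "min": 10, "max": 8000, "max_ratio": 4}
print(apply_limit_range({}, limit_range))
```

With a default limit of 500 and a default request of 100, the injected pair would have a ratio of 5 and fail a ratio cap of 4, which is why defaults and ratio caps need to be chosen together.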

2. Verify default injection

cat <<'EOF' | kubectl apply -f - -n production
apiVersion: v1
kind: Pod
metadata:
  name: test-nolimits
spec:
  containers:
  - name: busybox
    image: busybox:1.36
    command: ["sleep","3600"]
EOF

kubectl get pod test-nolimits -n production -o jsonpath='{.spec.containers[0].resources}'

The output shows the default requests and limits injected by the LimitRange.

Scheduling Details

The scheduler checks requests.cpu and requests.memory against each node’s allocatable resources. It ignores limits. Therefore a Pod with low requests but high limits can still overload a node once it runs at full capacity.
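
The resource-fit part of scheduling can be approximated as a filter over nodes. The Python sketch below is a deliberately reduced model: the real scheduler applies many more filters and scoring steps, and the node and Pod fields used here are hypothetical names with made-up values.

```python
def fits(node, pod):
    """True if the Pod's requests fit into the node's unreserved allocatable."""
    free_cpu = node["allocatable_cpu_m"] - node["requested_cpu_m"]
    free_mem = node["allocatable_mem_mi"] - node["requested_mem_mi"]
    return pod["cpu_request_m"] <= free_cpu and pod["mem_request_mi"] <= free_mem


def feasible_nodes(nodes, pod):
    """Names of nodes that pass the resource-fit check."""
    return [node["name"] for node in nodes if fits(node, pod)]


nodes = [
    {"name": "node-a", "allocatable_cpu_m": 4000, "requested_cpu_m": 3800,
     "allocatable_mem_mi": 16000, "requested_mem_mi": 4000},
    {"name": "node-b", "allocatable_cpu_m": 4000, "requested_cpu_m": 1000,
     "allocatable_mem_mi": 16000, "requested_mem_mi": 8000},
]
# The limit is carried along but never consulted by the fit check.
pod = {"cpu_request_m": 500, "mem_request_mi": 1024, "cpu_limit_m": 4000}

print(feasible_nodes(nodes, pod))
```

The Pod's 4000m limit plays no role in the result, which is the behavior the paragraph above warns about.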

Advanced Topics

Vertical Pod Autoscaler (VPA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: backend-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: backend
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

In production it is common to start with updateMode: "Off" or "Initial" and switch to "Auto" after manual review.
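
The minAllowed and maxAllowed bounds act as a clamp on whatever the recommender proposes. A minimal Python sketch of that clamping, assuming values in millicores and MiB and a hypothetical raw recommendation (the function name is illustrative, not a VPA API):

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """Clamp each recommended value into [min_allowed, max_allowed]."""
    return {
        resource: min(max(value, min_allowed[resource]), max_allowed[resource])
        for resource, value in recommended.items()
    }


# Bounds mirror the manifest: 50m to 4 CPUs, 64Mi to 8Gi.
min_allowed = {"cpu_m": 50, "memory_mi": 64}
max_allowed = {"cpu_m": 4000, "memory_mi": 8192}
# Hypothetical raw recommendation: too little CPU, too much memory.
raw = {"cpu_m": 20, "memory_mi": 12000}

print(clamp_recommendation(raw, min_allowed, max_allowed))
```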

PriorityClass + ResourceQuota for Multi‑Tenant Clusters

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
description: "Critical production workloads"
value: 100000
globalDefault: false
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: production
spec:
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high-priority"]
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    pods: "80"

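When a high-priority Pod cannot be scheduled because the nodes are full, the scheduler may preempt lower-priority Pods to make room for it. The scoped quota above caps how much CPU, memory, and how many Pods the namespace may run at that priority, so tenants cannot simply mark every workload as critical.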
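
A scoped quota only counts the Pods that match its scope. The Python sketch below illustrates that selection for a PriorityClass scope with the In operator. The Pod dicts, field names, and values are hypothetical simplifications of real Pod specs.

```python
def counts_against_quota(pod, scope_values):
    """True if the Pod's priority class is listed in the quota scope."""
    return pod.get("priorityClassName") in scope_values


def scoped_usage(pods, scope_values):
    """Aggregate usage over only the Pods that match the scope."""
    matching = [p for p in pods if counts_against_quota(p, scope_values)]
    return {
        "pods": len(matching),
        "requests.cpu_m": sum(p["cpu_request_m"] for p in matching),
    }


pods = [
    {"name": "api-1", "priorityClassName": "high-priority", "cpu_request_m": 500},
    {"name": "api-2", "priorityClassName": "high-priority", "cpu_request_m": 1500},
    {"name": "batch-1", "cpu_request_m": 250},  # no priority class set
]

print(scoped_usage(pods, ["high-priority"]))
```

Pods without the listed priority class are ignored by this quota and would need a separate, unscoped or differently scoped quota to be capped.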

Troubleshooting Path for Quota Exhaustion

Step 1 – Confirm quota exhaustion

# Check quota usage
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

Step 2 – Identify the biggest consumers

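# Sort Pods by CPU request (first container only) and show the value
kubectl get pods -n <namespace> --sort-by='.spec.containers[0].resources.requests.cpu' \
  -o custom-columns='NAME:.metadata.name,CPU_REQUEST:.spec.containers[0].resources.requests.cpu'
# Sort Pods by memory request (first container only) and show the value
kubectl get pods -n <namespace> --sort-by='.spec.containers[0].resources.requests.memory' \
  -o custom-columns='NAME:.metadata.name,MEM_REQUEST:.spec.containers[0].resources.requests.memory'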
# List Deployments to spot abnormal replica counts
kubectl get deployments -n <namespace> -o wide

Step 3 – Verify LimitRange defaults

kubectl describe limitrange -n <namespace>

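LimitRange defaults are injected only for containers that omit the value. An explicit request or limit is kept as declared but must still fall within the min, max, and maxLimitRequestRatio bounds. If a container sets a limit but no request, its request is set to the limit rather than to defaultRequest.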

Step 4 – Check node‑level allocatable resources

# Show allocatable resources per node (widen -A if the section is cut off)
kubectl describe nodes | grep -A 6 "Allocatable"
# Real usage (requires metrics‑server)
kubectl top nodes

Step 5 – Detect evictions or OOM kills

# Look for eviction or OOM events
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep -i "evicted\|oom"
# On the node, inspect kubelet logs
journalctl -u kubelet | grep -i "oom\|evict\|memory"

Rollback Strategies

# Back up current objects before changing them
kubectl get resourcequota prod-compute-quota -n production -o yaml > /tmp/quota-backup.yaml
kubectl get limitrange production-limits -n production -o yaml > /tmp/limitrange-backup.yaml

# Restore previous version
kubectl apply -f /tmp/quota-backup.yaml
kubectl apply -f /tmp/limitrange-backup.yaml

# Emergency relief: temporarily raise the pod cap (revert after the incident)
kubectl patch resourcequota prod-compute-quota -n production \
  --type=json -p='[{"op":"replace","path":"/spec/hard/pods","value":"300"}]'

Monitoring Quota Usage

kube‑state‑metrics exposes kube_resourcequota metrics. Example PromQL for CPU request usage:

kube_resourcequota{type="used",resource="requests.cpu",namespace="production"}
  /
ignoring(type) kube_resourcequota{type="hard",resource="requests.cpu",namespace="production"}

Grafana alerts can be set to fire when the usage ratio exceeds 80% for CPU, memory, or pod count.
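
The same used/hard ratio can be evaluated outside Prometheus, for example in a periodic check script. The Python sketch below assumes the values have already been converted to plain numbers; the figures are hypothetical and the function name is illustrative.

```python
def over_threshold(used, hard, threshold=0.8):
    """Return {resource: ratio} for every resource above the threshold."""
    alerts = {}
    for resource, cap in hard.items():
        if cap == 0:
            continue  # avoid dividing by zero for caps set to 0
        ratio = used.get(resource, 0) / cap
        if ratio > threshold:
            alerts[resource] = round(ratio, 3)
    return alerts


# Hypothetical snapshot: CPU in millicores, memory in MiB.
hard = {"requests.cpu": 10_000, "requests.memory": 20_480, "pods": 100}
used = {"requests.cpu": 8_500, "requests.memory": 10_240, "pods": 81}

print(over_threshold(used, hard))
```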

Key Takeaways

Enable both ResourceQuota and LimitRange in every namespace.

Set hard values so that their sum across namespaces stays below the cluster’s total allocatable capacity, leaving roughly 20% headroom for system components.

Never rely solely on defaults for critical workloads; always declare explicit requests and sensible limits in Deployment manifests.

Use QoS classes, PriorityClasses, and VPA to balance utilization, protect core services, and automate safe scaling.

Implement continuous monitoring and alerting on quota usage to catch over‑commit before it leads to a cluster‑wide outage.
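
The headroom recommendation above comes down to simple arithmetic. The Python sketch below splits the capacity that remains after headroom among namespaces by weight. The cluster size, namespace names, and weights are hypothetical, and integer division is used so results stay in whole millicores.

```python
def namespace_budgets(allocatable, weights, headroom_pct=20):
    """Split allocatable capacity, minus headroom, among namespaces by weight."""
    usable = allocatable * (100 - headroom_pct) // 100
    total_weight = sum(weights.values())
    return {ns: usable * weight // total_weight for ns, weight in weights.items()}


# Hypothetical cluster with 64 allocatable CPUs (64000 millicores).
budgets = namespace_budgets(64_000, {"production": 3, "staging": 1})

print(budgets)
```

Each resulting value is a candidate for the requests.cpu hard cap of the corresponding namespace.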
