
Prevent Kubernetes Cluster Collapse: Master Node Allocatable & Resource Reservations

This article explains how Kubernetes schedules pods based on a node's total capacity, why missing resource reservations can cause node failures and cluster avalanches, and walks step by step through configuring Node Allocatable, kube-reserved, system-reserved, and eviction settings to keep a cluster stable.


Node Allocatable

Kubernetes schedules pods according to a node's total resource capacity, allowing pods to use all available resources by default. Without reserving resources for system daemons, these processes compete with pods, leading to resource shortages.

In production, an uncontrolled pod can consume 100% of a node's CPU, starving the kubelet so that its heartbeats to the apiserver stop and the node becomes NotReady. By default, pods on a node that stays NotReady for five minutes are evicted and rescheduled elsewhere, potentially overloading another node and triggering a cascading "cluster avalanche" in which nodes go NotReady one after another.

To avoid this, configure resource reservations using the kubelet's Node Allocatable feature, which reserves compute resources for system daemons.

Environment: Kubernetes v1.22.1, containerd as the container runtime, systemd as the cgroup driver.

Understanding Allocatable Resources

The <code>Allocatable</code> value represents the amount of CPU, memory, and ephemeral-storage that pods can request. It is shown alongside <code>Capacity</code> when running:

<code>kubectl describe node &lt;node-name&gt;</code>

Typical output:

<code>Capacity:
  cpu: 4
  memory: 7990056Ki
  pods: 110
Allocatable:
  cpu: 4
  memory: 7887656Ki
  pods: 110</code>

When no reservations are set, <code>Capacity</code> and <code>Allocatable</code> are nearly identical. The relationship is:

<code>Node Allocatable Resource = Node Capacity - kube-reserved - system-reserved - eviction-threshold</code>
Pod requests summed across a node must not exceed its Allocatable value.
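The relationship can be checked with a short script. The quantity parser below is a simplified sketch for illustration only: it handles just the binary suffixes (Ki/Mi/Gi) that appear in this article, not the full Kubernetes quantity grammar.

```python
# Sketch: compute Node Allocatable memory from capacity and reservations.
# Only the binary suffixes used in this article are handled.

SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def to_bytes(quantity: str) -> int:
    """Convert a quantity like '1Gi' or '300Mi' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain integer, already bytes

def allocatable(capacity, kube_reserved, system_reserved, eviction_threshold):
    """Node Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold."""
    return (to_bytes(capacity) - to_bytes(kube_reserved)
            - to_bytes(system_reserved) - to_bytes(eviction_threshold))

# Values matching the reservations configured later in this article:
mem = allocatable("7990056Ki", "1Gi", "1Gi", "300Mi")
print(mem // 1024, "Ki")  # 5585704 Ki
```

Running this reproduces the Allocatable memory figure reported by <code>kubectl describe node</code> after the reservations are applied.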

Configuring Resource Reservations

Reserve resources for the system using kubelet flags:

<code>--enforce-node-allocatable=pods
--kube-reserved=memory=...
--system-reserved=memory=...
--eviction-hard=...</code>

For a specific node (e.g., <code>node2</code>), edit <code>/var/lib/kubelet/config.yaml</code>:

<code>apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
- pods
kubeReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi
systemReserved:
  memory: 1Gi
evictionHard:
  memory.available: "300Mi"
  nodefs.available: "10%"</code>

After restarting the kubelet, re-run <code>kubectl describe node</code> to see the reduced <code>Allocatable</code> values, confirming the reservation calculation.

<code>Allocatable CPU: 3500m (Capacity 4 - 500m kube-reserved)
Allocatable memory: 5585704Ki (Capacity 7990056Ki - 1Gi kube-reserved - 1Gi system-reserved - 300Mi eviction-hard)</code>

Eviction vs OOM

Eviction is kubelet-driven pod removal, triggered when observed node resources cross an eviction threshold; an OOM kill is performed by the kernel when a process or cgroup exceeds its memory limit.

Eviction thresholds (e.g., <code>--eviction-hard=memory.available&lt;20%</code>) cause pod eviction when host memory usage exceeds 80%, but they do not affect the cgroup limit <code>/sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes</code>, which equals capacity - kube-reserved - system-reserved.
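To make the distinction concrete, here is an arithmetic check using the reservation values from this article: the kubepods cgroup limit excludes the eviction threshold, so it sits above the Allocatable value reported by the API.

```python
GI = 1024 * 1024  # Ki per Gi
MI = 1024         # Ki per Mi

capacity_ki = 7990056            # node memory capacity from the example
kube_reserved_ki = 1 * GI
system_reserved_ki = 1 * GI
eviction_hard_ki = 300 * MI

# cgroup limit on kubepods.slice: the eviction threshold is NOT subtracted
kubepods_limit_ki = capacity_ki - kube_reserved_ki - system_reserved_ki

# Allocatable as reported by the API: the eviction threshold IS subtracted
allocatable_ki = kubepods_limit_ki - eviction_hard_ki

print(kubepods_limit_ki)  # 5892904
print(allocatable_ki)     # 5585704
```

The 300Mi gap between the two numbers is exactly the eviction-hard threshold: pods can be evicted well before the kubepods cgroup limit would trigger an OOM kill.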

When evicting, the kubelet roughly follows QoS order: BestEffort pods (no requests or limits) first, then Burstable pods (requests lower than limits), and finally Guaranteed pods (requests equal to limits).
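The ordering can be sketched as below. This is a deliberate simplification: the real kubelet also ranks candidates by pod priority and by how far usage exceeds requests, but the QoS class gives the rough order.

```python
# Simplified sketch of eviction ordering by QoS class.
# The real kubelet also weighs pod priority and usage above requests.

def qos_class(requests: dict, limits: dict) -> str:
    """Rough QoS classification from a pod's resource requests/limits."""
    if not requests and not limits:
        return "BestEffort"
    if requests and requests == limits:
        return "Guaranteed"
    return "Burstable"

EVICTION_ORDER = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

# Hypothetical pods: (name, requests, limits)
pods = [
    ("web",   {"memory": "1Gi"}, {"memory": "2Gi"}),  # Burstable
    ("batch", {},                {}),                 # BestEffort
    ("db",    {"memory": "2Gi"}, {"memory": "2Gi"}),  # Guaranteed
]

ranked = sorted(pods, key=lambda p: EVICTION_ORDER[qos_class(p[1], p[2])])
print([name for name, *_ in ranked])  # ['batch', 'web', 'db']
```

The BestEffort pod is first in line for eviction, the Guaranteed pod last.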

EnforceNodeAllocatable Details

The flag <code>--enforce-node-allocatable</code> accepts a comma-separated list drawn from <code>none</code>, <code>pods</code>, <code>system-reserved</code>, and <code>kube-reserved</code>. Setting it to <code>pods</code> enforces the Allocatable constraint on pods. Adding <code>kube-reserved</code> or <code>system-reserved</code> additionally requires specifying the corresponding cgroups.
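For illustration, a config enforcing all three would look roughly like the fragment below. The cgroup names <code>/kube.slice</code> and <code>/system.slice</code> are placeholder values chosen for this sketch; the cgroups must already exist on the node (the kubelet does not create them), which is why this mode needs care.

```yaml
# Sketch only: enforcing kube-reserved/system-reserved requires naming
# the cgroups they apply to. The paths below are illustrative.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
- pods
- kube-reserved
- system-reserved
kubeReservedCgroup: /kube.slice
systemReservedCgroup: /system.slice
kubeReserved:
  cpu: 500m
  memory: 1Gi
systemReserved:
  memory: 1Gi
```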

For most users, enabling <code>enforce-node-allocatable=pods</code> and reserving appropriate <code>kube-reserved</code> and <code>system-reserved</code> resources is sufficient to keep nodes reliable without deep cgroup tuning.

Tags: Kubernetes, kubelet, cluster stability, Node Allocatable, resource reservation
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
