Why Do Some Kubernetes Pods Stay Stuck in Terminating? Causes and Fixes
This article explains the Kubernetes pod lifecycle, the meaning of the Terminating state, detailed pod creation and deletion processes, and the eviction mechanisms of both kube‑controller‑manager and kubelet, offering troubleshooting steps and best practices to resolve pods that remain stuck in Terminating.
Pod Status: Terminating
When a node becomes NotReady, the control plane marks the pods on that node for deletion, so they show as Terminating, and workload controllers such as Deployments create replacement pods on other nodes. After the node recovers, the kubelet completes the deletion and those pods disappear. Occasionally some pods stay in Terminating and no replacement is scheduled, so the workload cannot serve traffic.
Terminating is not a value of PodStatus.phase. To explain where it comes from, this article reviews the pod lifecycle and the eviction mechanisms.
Pod lifecycle
From creation to termination, a Pod passes through several states and may run optional components such as init containers, post‑start hooks, liveness/readiness probes, and pre‑stop hooks, depending on its specification.
Pod lifecycle phases
PodStatus.phase can be one of the following:
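Pending: The API server has accepted the pod object, but one or more containers are not yet running, for example because the pod has not been scheduled or images are still being pulled.
Running: The pod is bound to a node, all containers have been created, and at least one container is running, starting, or restarting.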
Succeeded: All containers terminated successfully and will not be restarted.
Failed: All containers terminated and at least one exited with a non‑zero status or was killed.
Unknown: API server cannot obtain pod status, usually due to loss of communication with the node.
Note: While a pod is being deleted, kubectl shows Terminating in the STATUS column, but this is not a pod phase. The default graceful termination period is 30 seconds; to delete immediately, pass --grace-period=0 --force to kubectl delete.
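To make the distinction between phase and displayed status concrete, here is a minimal sketch of how a client might derive the status column from a pod object. It is a simplification of what kubectl does, not its actual implementation; the function name displayed_status and the sample pods are hypothetical.

```python
def displayed_status(pod: dict) -> str:
    """Return the status string a CLI might show for a pod object (simplified)."""
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})
    # A pod marked for deletion keeps its phase; only the displayed status changes.
    if metadata.get("deletionTimestamp"):
        return "Terminating"
    # Otherwise prefer an explicit reason (e.g. "Evicted"), then the phase.
    return status.get("reason") or status.get("phase", "Unknown")


running = {"metadata": {"name": "web-0"}, "status": {"phase": "Running"}}
deleting = {
    "metadata": {"name": "web-1", "deletionTimestamp": "2024-01-01T00:00:30Z"},
    "status": {"phase": "Running"},
}
print(displayed_status(running))   # Running
print(displayed_status(deleting))  # Terminating
```

Note that the second pod still has phase Running; only the presence of deletionTimestamp changes what is displayed.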
[Figure: Pod creation process]
Pod deletion logic
When a delete request is issued, the API server sets deletionTimestamp and deletionGracePeriodSeconds on the pod's metadata. The pod's phase remains Running, but the kubelet and kube‑proxy treat it as no longer serviceable and start cleanup.
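The grace period defaults to the pod's terminationGracePeriodSeconds (30 seconds unless set); kubectl delete pod can override it with --grace-period.
The endpoints controller removes the pod from Service Endpoints once it sees the deletionTimestamp, and kube‑proxy then drops the pod from its forwarding rules.
The kubelet runs the PreStop hook (if defined) and then sends SIGTERM to the containers; the time they have to stop is whatever remains of the grace period after PreStop, after which they are killed with SIGKILL.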
After both steps, kubelet notifies the API server, which finally removes the pod object.
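As a troubleshooting aid, the sketch below flags pods that have outlived their deletion deadline. It assumes deletionTimestamp holds the time by which the pod should be gone (request time plus grace period), as the API documentation describes, and that timestamps use the second-precision UTC form shown. The helper names, the 60-second slack, and the sample pods are hypothetical.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse timestamps such as 2024-01-01T00:00:30Z into aware datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def overdue_seconds(pod: dict, now: datetime) -> float:
    """Seconds by which a pod has outlived its deletion deadline (0 if not)."""
    deadline = pod.get("metadata", {}).get("deletionTimestamp")
    if not deadline:
        return 0.0  # not being deleted at all
    return max((now - parse_rfc3339(deadline)).total_seconds(), 0.0)


def stuck_pods(pods: list, now: datetime, slack_seconds: float = 60) -> list:
    """Names of pods still present more than slack_seconds past their deadline."""
    return [p["metadata"]["name"] for p in pods if overdue_seconds(p, now) > slack_seconds]


now = datetime(2024, 1, 1, 0, 10, 0, tzinfo=timezone.utc)
pods = [
    {"metadata": {"name": "api-1", "deletionTimestamp": "2024-01-01T00:00:30Z"}},
    {"metadata": {"name": "api-2", "deletionTimestamp": "2024-01-01T00:09:50Z"}},
    {"metadata": {"name": "api-3"}},
]
print(stuck_pods(pods, now))  # ['api-1']
```

A pod that appears in this list has not been confirmed as stopped by its kubelet, which usually points at an unreachable node or a container that cannot be killed.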
Eviction overview
Eviction removes pods from a node that becomes unhealthy. Two eviction mechanisms exist: one driven by kube‑controller‑manager (node‑level) and one by kubelet (pod‑level).
kube‑controller‑manager eviction
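The node controller periodically checks node health. If a node stays NotReady longer than pod-eviction-timeout (default 5 min), all pods on that node are evicted. In recent Kubernetes releases the same 5‑minute default is applied through NoExecute taints on the node and a default tolerationSeconds of 300 on pods. Several flags control the eviction behavior, such as node-eviction-rate, secondary-node-eviction-rate, unhealthy-zone-threshold, and large-cluster-size-threshold.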
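The interaction of these flags can be sketched as follows. This follows the behavior described in the upstream documentation as I understand it, not the controller's source code; the default values shown (0.1, 0.01, 0.55, 50, and 300 seconds) should be verified against your Kubernetes version, and both function names are hypothetical.

```python
def should_evict(not_ready_seconds: float, pod_eviction_timeout: float = 300) -> bool:
    """True once a node has been NotReady longer than the timeout (default 5 min)."""
    return not_ready_seconds > pod_eviction_timeout


def effective_eviction_rate(
    zone_nodes: int,
    unhealthy_nodes: int,
    node_eviction_rate: float = 0.1,
    secondary_node_eviction_rate: float = 0.01,
    unhealthy_zone_threshold: float = 0.55,
    large_cluster_size_threshold: int = 50,
) -> float:
    """Nodes per second whose pods may be evicted, given zone health."""
    if zone_nodes <= 0:
        return 0.0
    unhealthy_fraction = unhealthy_nodes / zone_nodes
    # Healthy zone: evict at the normal rate.
    if unhealthy_fraction < unhealthy_zone_threshold:
        return node_eviction_rate
    # Unhealthy and small: stop evicting, the control plane may be the problem.
    if zone_nodes <= large_cluster_size_threshold:
        return 0.0
    # Unhealthy and large: keep evicting, but much more slowly.
    return secondary_node_eviction_rate


print(should_evict(360))                 # True
print(effective_eviction_rate(100, 10))  # 0.1
print(effective_eviction_rate(100, 60))  # 0.01
print(effective_eviction_rate(20, 15))   # 0.0
```

The rate limiting exists so that a network partition affecting many nodes at once does not trigger a mass eviction.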
Pods left in Terminating after this eviction can be removed by:
Deleting the node (cloud providers typically do this automatically; on‑premises you must run kubectl delete node <node-name>).
Waiting for the node to recover, allowing kubelet to reconcile the pod state.
Force‑deleting the pod with kubectl delete pod <pod-name> --grace-period=0 --force. This is discouraged for StatefulSets, because the old pod may still be running on the unreachable node and two pods could end up with the same identity.
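The force‑delete option can be wrapped in a small guard that refuses to touch StatefulSet pods by default. This is only a sketch: force_delete_command and its allow_statefulset parameter are hypothetical, and the function only builds the kubectl argument list without running it.

```python
def force_delete_command(pod: dict, allow_statefulset: bool = False) -> list:
    """Build the kubectl argv that force-deletes a pod, guarding StatefulSets."""
    meta = pod["metadata"]
    owners = meta.get("ownerReferences", [])
    owned_by_statefulset = any(o.get("kind") == "StatefulSet" for o in owners)
    if owned_by_statefulset and not allow_statefulset:
        # Force deletion may leave the old pod running with the same identity.
        raise ValueError(f"refusing to force-delete StatefulSet pod {meta['name']}")
    return [
        "kubectl", "delete", "pod", meta["name"],
        "--namespace", meta.get("namespace", "default"),
        "--grace-period=0", "--force",
    ]


web = {
    "metadata": {
        "name": "web-7d9f",
        "namespace": "shop",
        "ownerReferences": [{"kind": "ReplicaSet", "name": "web-7d9f"}],
    }
}
print(" ".join(force_delete_command(web)))
# kubectl delete pod web-7d9f --namespace shop --grace-period=0 --force
```

Passing the resulting list to subprocess.run would execute the command; keeping the two steps separate makes it easy to review what will be deleted first.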
kubelet eviction
If a node experiences resource pressure, kubelet evicts pods based on QoS, priority, and resource usage. It supports soft and hard eviction thresholds, configurable via flags such as eviction-soft, eviction-soft-grace-period, eviction-max-pod-grace-period, eviction-pressure-transition-period, eviction-minimum-reclaim, and eviction-hard.
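The difference between soft and hard thresholds can be illustrated with a small decision function. It is a sketch under simplifying assumptions: a single memory signal measured in MiB, and threshold values chosen purely for illustration. The function name eviction_decision and its parameters are hypothetical and do not mirror kubelet internals.

```python
def eviction_decision(
    available_mib: float,
    hard_mib: float,
    soft_mib: float,
    soft_exceeded_seconds: float,
    soft_grace_seconds: float,
) -> str:
    """Return 'hard', 'soft', or 'none' for one memory-availability sample."""
    # Hard threshold: act immediately, without any grace period.
    if available_mib < hard_mib:
        return "hard"
    # Soft threshold: act only after it has been crossed for the whole grace period.
    if available_mib < soft_mib and soft_exceeded_seconds >= soft_grace_seconds:
        return "soft"
    return "none"


# Illustrative values: hard at 100 MiB, soft at 300 MiB with a 90 s grace period.
print(eviction_decision(80, 100, 300, 0, 90))     # hard
print(eviction_decision(250, 100, 300, 30, 90))   # none
print(eviction_decision(250, 100, 300, 120, 90))  # soft
```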
Pods evicted by the kubelet are left in the Failed phase with reason Evicted (kubectl shows Evicted in the STATUS column) and can be deleted directly.
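Evicted pods are easy to pick out from a pod listing before cleaning them up. The sketch below assumes pod objects shaped like the JSON returned by the API (status.phase and status.reason); the function name and sample data are hypothetical.

```python
def evicted_pod_names(pods: list) -> list:
    """Names of pods that the kubelet evicted (phase Failed, reason Evicted)."""
    names = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            names.append(pod["metadata"]["name"])
    return names


sample = [
    {"metadata": {"name": "cache-1"}, "status": {"phase": "Failed", "reason": "Evicted"}},
    {"metadata": {"name": "cache-2"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "job-9"}, "status": {"phase": "Failed", "reason": "DeadlineExceeded"}},
]
print(evicted_pod_names(sample))  # ['cache-1']
```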
Summary
Pods stuck in Terminating hurt most when a Deployment uses the Recreate strategy, because new pods are not created until the old ones have fully terminated; Deployments using the RollingUpdate strategy are not affected in the same way. For such workloads, prefer RollingUpdate. The three removal methods listed above can be used to clean up lingering Terminating pods.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.