Why Do Some Kubernetes Pods Stay Stuck in Terminating? Causes and Fixes

This article explains the Kubernetes pod lifecycle, the meaning of the Terminating state, detailed pod creation and deletion processes, and the eviction mechanisms of both kube‑controller‑manager and kubelet, offering troubleshooting steps and best practices to resolve pods that remain stuck in Terminating.

Ops Development Stories

Oct 9, 2021

Pod Status: Terminating

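When a node becomes NotReady, the node controller in kube-controller-manager evicts the pods on that node: they are marked as Terminating, and their controllers (for example, a Deployment's ReplicaSet) create replacements on other nodes. After the node recovers, the kubelet finishes deleting the old pods. Occasionally, however, some pods stay in Terminating and no replacement is scheduled, so the workload cannot serve traffic.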

Terminating is not a value of PodStatus.phase, so explaining it requires a look at the pod lifecycle and at how eviction works.

Pod lifecycle

From creation to termination, a Pod passes through several states and may run optional components such as init containers, post‑start hooks, liveness/readiness probes, and pre‑stop hooks, depending on its specification.

Pod lifecycle phases

PodStatus.phase can be one of the following:

Pending: The API server has accepted the pod, but it has not been scheduled yet or its containers are not yet running (for example, images are still being pulled).

Running: The pod is bound to a node, all of its containers have been created, and at least one is running or is starting or restarting.

Succeeded: All containers in the pod terminated successfully and will not be restarted.

Failed: All containers in the pod terminated, and at least one exited with a non-zero status or was killed by the system.

Unknown: The pod's status cannot be obtained, usually because communication with its node has been lost.
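The phase rules above can be summarized as a small decision function. This is a simplified Python sketch, not the kubelet's implementation: each container is reduced to a hypothetical `Container` record whose state is "waiting", "running" or "terminated", and `derive_phase` also takes the pod's restartPolicy, which decides whether fully terminated containers leave the pod Running (they will be restarted) or move it to Succeeded or Failed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    # "waiting":    never started yet (image pull or creation in progress)
    # "running":    the container process is up
    # "terminated": has exited and is not currently running
    state: str
    exit_code: int = 0  # only meaningful when state == "terminated"


def derive_phase(scheduled: bool, containers: List[Container],
                 restart_policy: str = "Always", node_reachable: bool = True) -> str:
    """Simplified mapping from container states to PodStatus.phase."""
    if not node_reachable:
        # The pod's status cannot be obtained from its node.
        return "Unknown"
    if not scheduled or any(c.state == "waiting" for c in containers):
        # Not yet bound to a node, or at least one container has not started.
        return "Pending"
    if any(c.state == "running" for c in containers):
        return "Running"
    # Every container has terminated; the restart policy decides what comes next.
    failed = any(c.exit_code != 0 for c in containers)
    if restart_policy == "Always":
        return "Running"  # containers will be restarted
    if restart_policy == "OnFailure":
        return "Running" if failed else "Succeeded"
    return "Failed" if failed else "Succeeded"  # restartPolicy: Never
```

For example, a pod with restartPolicy Never whose only container exited with code 1 ends up Failed, while the same pod under the default Always policy stays Running.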

Note: While a pod is being deleted, kubectl shows Terminating in the STATUS column, but Terminating is not a pod phase. The default grace period for termination is 30 seconds; you can force immediate deletion with kubectl delete pod <pod-name> --grace-period=0 --force.
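To make the distinction concrete, here is a simplified sketch of how a client could derive the displayed status from a pod object as returned by `kubectl get pod -o json`. The function `display_status` is hypothetical and ignores most of kubectl's real special cases (container waiting reasons, init containers, and so on); it only shows that Terminating comes from `metadata.deletionTimestamp`, not from `status.phase`.

```python
from typing import Any, Dict


def display_status(pod: Dict[str, Any]) -> str:
    """Simplified version of the STATUS column shown by `kubectl get pods`."""
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})
    if metadata.get("deletionTimestamp"):
        # Deletion has been requested: shown as Terminating,
        # even though status.phase is unchanged (usually still Running).
        return "Terminating"
    # A pod-level reason such as "Evicted" takes precedence over the phase.
    return status.get("reason") or status.get("phase", "Unknown")
```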

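Pod creation process: the API server validates the pod and stores it in etcd, the scheduler binds it to a node, and the kubelet on that node starts its containers through the container runtime.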

Pod deletion logic

When a delete request is issued, the API server sets metadata.deletionTimestamp and metadata.deletionGracePeriodSeconds on the pod. The pod's phase stays Running, but it is now treated as out of service and cleanup begins.

The grace period defaults to the pod's spec.terminationGracePeriodSeconds, which is 30 seconds unless set otherwise; kubectl delete pod --grace-period=<seconds> overrides it.

As soon as the deletion timestamp is set, the endpoints controller removes the pod from the Service's Endpoints, and kube-proxy stops routing new traffic to it.

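The kubelet runs the preStop hook (if one is defined) and then sends the stop signal (SIGTERM by default) to the containers. The time the containers get to stop is whatever remains of the grace period after preStop; containers still running at the end are killed with SIGKILL.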

Once the containers have stopped, the kubelet asks the API server to complete the deletion with a grace period of 0, and the pod object is removed.
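The timing rules in these steps can be sketched as follows. This is an illustrative Python model with hypothetical helper names, under stated assumptions: the grace period comes from kubectl's `--grace-period` (ignored when negative) or else `spec.terminationGracePeriodSeconds`; the API server sets `deletionTimestamp` to the request time plus that grace period; and the kubelet gives the containers whatever grace is left after preStop, modeled here with a floor of 2 seconds to reflect the one-off extension it grants when preStop runs past the grace period.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_GRACE_SECONDS = 30   # default spec.terminationGracePeriodSeconds
MIN_STOP_GRACE_SECONDS = 2   # assumed floor for the post-preStop stop timeout


def effective_grace_period(spec_grace: Optional[int], cli_grace: Optional[int] = None) -> int:
    """Grace period for a delete: --grace-period wins unless it is negative (ignored)."""
    if cli_grace is not None and cli_grace >= 0:
        return cli_grace
    return spec_grace if spec_grace is not None else DEFAULT_GRACE_SECONDS


def deletion_timestamp(requested_at: datetime, grace: int) -> datetime:
    """metadata.deletionTimestamp is the request time plus the grace period."""
    return requested_at + timedelta(seconds=grace)


def shutdown_outcome(grace: int, prestop_seconds: float, exits_after_sigterm: float) -> str:
    """Return "exited" if the container stops on SIGTERM in time, "SIGKILL" otherwise."""
    # The kubelet stops waiting for preStop once the grace period is used up.
    used_by_prestop = min(prestop_seconds, grace)
    # SIGTERM follows; the remaining grace (at least 2 s here) is the stop timeout.
    remaining = max(grace - used_by_prestop, MIN_STOP_GRACE_SECONDS)
    return "exited" if exits_after_sigterm <= remaining else "SIGKILL"
```

With the default 30 seconds and a 5-second preStop hook, the application has 25 seconds after SIGTERM to shut down before it is killed.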

Eviction overview

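Eviction terminates pods on a node that is unhealthy or short of resources. Two mechanisms exist: node-level eviction driven by kube-controller-manager when a node becomes NotReady or unreachable, and pod-level eviction driven by the kubelet when the node is under resource pressure.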

kube‑controller‑manager eviction

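The node controller in kube-controller-manager periodically checks node health. If a node stays NotReady longer than pod-eviction-timeout (default 5 minutes), all pods on that node are evicted. With taint-based eviction, enabled by default since Kubernetes 1.13, the same 5-minute delay comes instead from the 300-second tolerationSeconds that pods receive by default for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints. The eviction rate is controlled by flags such as node-eviction-rate, secondary-node-eviction-rate, unhealthy-zone-threshold, and large-cluster-size-threshold.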
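The rate rules behind these flags can be sketched as a function that returns how many nodes per second the node controller may evict in one zone. It follows the behavior described in the Kubernetes node-controller documentation, with the documented defaults (node-eviction-rate 0.1, secondary-node-eviction-rate 0.01, unhealthy-zone-threshold 0.55, large-cluster-size-threshold 50); the function and parameter names are hypothetical, and finer details of the controller source are not modeled.

```python
def zone_eviction_rate(unhealthy_nodes: int, zone_nodes: int, cluster_nodes: int,
                       all_zones_unhealthy: bool = False,
                       node_eviction_rate: float = 0.1,
                       secondary_node_eviction_rate: float = 0.01,
                       unhealthy_zone_threshold: float = 0.55,
                       large_cluster_size_threshold: int = 50) -> float:
    """Nodes per second whose pods the node controller may evict in one zone."""
    if all_zones_unhealthy:
        # No healthy node anywhere: assume a control-plane connectivity problem.
        return 0.0
    if zone_nodes == 0:
        return 0.0
    if unhealthy_nodes / zone_nodes >= unhealthy_zone_threshold:
        # Zone considered unhealthy: stop in small clusters, slow down in large ones.
        if cluster_nodes <= large_cluster_size_threshold:
            return 0.0
        return secondary_node_eviction_rate
    return node_eviction_rate
```

A rate of 0.1 means at most one node every 10 seconds. Slowing down or stopping evictions when most of a zone looks unhealthy protects workloads when the real problem is the network between the control plane and the nodes rather than the nodes themselves.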

Pods left in Terminating after this eviction can be removed by:

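Deleting the Node object: on cloud providers this happens automatically when the underlying instance is removed; on-premises you must run kubectl delete node <node-name>, after which the pods bound to it are garbage-collected.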

Waiting for the node to recover, allowing kubelet to reconcile the pod state.

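Force-deleting the pod with kubectl delete pod <pod-name> --grace-period=0 --force. This is discouraged for StatefulSet pods: the API object disappears immediately, so a replacement with the same identity can start while the original may still be running on the unreachable node.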
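Before choosing one of these methods, it helps to list the pods that are actually stuck and the nodes they are bound to. The sketch below assumes the input is the parsed JSON of `kubectl get pods -A -o json`. Because `deletionTimestamp` already includes the grace period, a pod that still exists some time after it is most likely waiting on an unreachable node. The 5-minute slack and the function names are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def parse_k8s_time(value: str) -> datetime:
    """Parse an API timestamp such as "2021-10-09T08:00:30Z" (RFC 3339, UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_terminating(pod_list: Dict[str, Any], now: datetime,
                      slack: timedelta = timedelta(minutes=5)) -> List[Tuple[str, str]]:
    """Return (namespace/name, node) for pods still present `slack` after deletionTimestamp."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        # deletionTimestamp = request time + grace period, so this is time overdue.
        if ts and now - parse_k8s_time(ts) > slack:
            node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
            stuck.append((f"{meta.get('namespace', 'default')}/{meta['name']}", node))
    return stuck
```

In practice you would load the output of `kubectl get pods -A -o json` with `json.load` and call `stuck_terminating` with the current UTC time. Several stuck pods sharing one NotReady node point toward deleting or recovering that node rather than force-deleting each pod.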

kubelet eviction

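When a node runs short of resources such as memory or disk, the kubelet evicts pods, ranking them first by whether their usage exceeds their requests, then by priority, then by usage relative to requests, so BestEffort pods and pods using more than they requested tend to go first. It supports soft thresholds, which must be exceeded for a grace period and allow graceful termination, and hard thresholds, which trigger immediate eviction. These are configured with flags such as eviction-soft, eviction-soft-grace-period, eviction-max-pod-grace-period, eviction-pressure-transition-period, eviction-minimum-reclaim, and eviction-hard.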
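The ranking and threshold rules can be sketched as below for the memory.available signal. This is a simplified Python model rather than kubelet code: `PodUsage`, `eviction_order`, and `eviction_decision` are hypothetical names, only memory is considered, and eviction-minimum-reclaim and eviction-pressure-transition-period are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PodUsage:
    name: str
    priority: int
    memory_request: int  # bytes; 0 for BestEffort pods
    memory_usage: int    # bytes currently used


def eviction_order(pods: List[PodUsage]) -> List[str]:
    """Rank pods for eviction under memory pressure, first victim first."""
    def rank(p: PodUsage):
        exceeds = p.memory_usage > p.memory_request
        return (
            not exceeds,                           # pods above their request go first
            p.priority,                            # then lower priority first
            -(p.memory_usage - p.memory_request),  # then the largest overage first
        )
    return [p.name for p in sorted(pods, key=rank)]


def eviction_decision(available: int, hard_min: int, soft_min: int,
                      below_soft_for: float, soft_grace_period: float,
                      pod_grace: int, max_pod_grace: int) -> Tuple[str, Optional[int]]:
    """Return (kind, grace seconds given to the victim) for one memory.available reading."""
    if available < hard_min:
        # Hard threshold: kill immediately, without graceful termination.
        return ("hard", 0)
    if available < soft_min and below_soft_for >= soft_grace_period:
        # Soft threshold held long enough: graceful, capped by eviction-max-pod-grace-period.
        return ("soft", min(pod_grace, max_pod_grace))
    return ("none", None)
```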

Pods evicted by the kubelet remain as pod objects in the Failed phase with reason Evicted (kubectl shows STATUS Evicted) and can be deleted directly.
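Cleaning up these leftover objects is easy to script. The sketch below assumes the parsed JSON of `kubectl get pods -A -o json` as input, selects pods in the Failed phase with reason Evicted, and builds (but does not run) one `kubectl delete pod` command per pod; the function names are illustrative.

```python
from typing import Any, Dict, List, Tuple


def evicted_pods(pod_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (namespace, name) for pods evicted by the kubelet."""
    result = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod["metadata"]
            result.append((meta.get("namespace", "default"), meta["name"]))
    return result


def delete_commands(pod_list: Dict[str, Any]) -> List[List[str]]:
    """Build one kubectl argv per evicted pod; nothing is executed here."""
    return [["kubectl", "delete", "pod", name, "-n", ns]
            for ns, name in evicted_pods(pod_list)]
```

Passing each command list to `subprocess.run` would perform the deletions. If removing every Failed pod is acceptable, `kubectl delete pods -A --field-selector=status.phase=Failed` does it in one step without a script.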

Summary

In practice, pods stuck in Terminating mostly cause outages for Deployments that use the Recreate strategy, because Recreate waits for all old pods to terminate before creating new ones; RollingUpdate Deployments are not affected. For such workloads, prefer the RollingUpdate strategy. The three removal methods listed above can be used to clean up lingering Terminating pods.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
