
Why Are My Kubernetes Pods Getting Evicted? Diagnosing DiskPressure and Log Bloat

This article explains how to investigate large numbers of evicted Kubernetes pods caused by disk pressure, examines the role of SkyWalking log files stored in EmptyDir volumes, and provides both temporary and permanent solutions to free space and prevent future evictions.


Background

Many pods were found in the Evicted state, but without monitoring permissions the author could not check Grafana for anomalies.

The eviction likely occurred when the node’s disk usage exceeded the kubelet’s eviction thresholds; later the space was reclaimed, making the problem appear resolved.
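For reference, the kubelet's hard eviction thresholds can be seen in its KubeletConfiguration; the values below are the upstream defaults, not necessarily this cluster's settings. When any threshold is crossed, the node reports a pressure condition and the kubelet begins evicting pods:

```yaml
# KubeletConfiguration excerpt (commonly /var/lib/kubelet/config.yaml).
# These are the upstream default hard thresholds.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # triggers MemoryPressure
  nodefs.available: "10%"     # triggers DiskPressure
  nodefs.inodesFree: "5%"     # triggers DiskPressure
  imagefs.available: "15%"    # triggers DiskPressure
```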

On each Kubernetes node the kubelet's default root directory is /var/lib/kubelet and the log directory is /var/log. Both reside on the system partition, which also holds EmptyDir volumes, container logs, image layers, and containers' writable layers. Kubernetes accounts for this partition as ephemeral storage, and the kubelet evicts pods when it runs low.
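As a preventive option (not part of the original setup), ephemeral-storage requests and limits can be declared per container, so a pod that over-consumes the system partition is evicted individually rather than pushing the whole node into DiskPressure. A hypothetical container spec excerpt:

```yaml
# Caps how much of the node's system partition (EmptyDir + writable
# layer + logs) this container may consume.
resources:
  requests:
    ephemeral-storage: "1Gi"   # considered at scheduling time
  limits:
    ephemeral-storage: "4Gi"   # exceeding this gets the pod evicted
```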

Large Number of Evicted Pods

<code>kubectl get po -A -o wide | grep -v "Running"
NAMESPACE   NAME   READY   STATUS   RESTARTS   AGE   IP   NODE   ...
nsop   account-service-pre-master-6db67f5cc-5nrgf   0/1   Evicted   0   103m   <none>   node2.pre.ayunw.cn   <none>   <none>
... (additional rows omitted) ...
</code>

Even after eviction, each pod object remains and keeps its assigned IP address, which can exhaust a fixed IP pool. Without filtering, monitoring evicted pods with kubectl get pod becomes difficult.
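One way to keep an eye on them without monitoring access is to summarize the kubectl output per node. A minimal sketch; the awk field positions assume the default columns of kubectl's wide output (field 4 is STATUS, field 8 is NODE):

```shell
#!/bin/sh
# Summarize Evicted pods per node from `kubectl get po -A -o wide`
# output read on stdin.
evicted_per_node() {
    awk '$4 == "Evicted" { count[$8]++ } END { for (n in count) print n, count[n] }'
}

# Usage on a live cluster:
#   kubectl get po -A -o wide | evicted_per_node
```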

Inspecting an Evicted Pod’s Node

Describing an evicted pod shows a DiskPressure warning, indicating disk stress on the node.

<code>kubectl describe po account-service-pre-master-6db67f5cc-5nrgf -n nsop
...
Warning  Evicted  100m  kubelet, node2.pre.ayunw.cn  The node had condition: [DiskPressure].
</code>

Checking Node Disk Usage

<code>df -Th | egrep -v "overlay2|kubernetes|docker"
Filesystem   Type   Size  Used Avail Use% Mounted on
/dev/vda1   ext4    50G  7.9G   39G  17% /
/dev/vdb1   xfs   200G 138G   63G  69% /data
</code>

Disk usage still shows ample free space, suggesting that space was reclaimed after the eviction event.

Inspecting Disk I/O

<code>iostat -xk 1 3
... (output omitted) ...
</code>

No obvious I/O pressure was observed.

Node Logs

<code>tail -500 kubelet.log | grep "Evicted"
tail -500 /var/log/messages | grep "Evicted"
</code>

No eviction‑related entries were found in the kubelet or system logs.

Temporary Fixes for Log Bloat

On the master node, identify the node where the evicted pods were scheduled. On that node, run du -sh under /data/kubernetes/kubelet/pods (this cluster's kubelet root directory) to find the largest pod directories, then delete old rotated log files inside them (e.g., skywalking-api.log.2021_08_12_02_56_35). Finally, delete the affected pod: its EmptyDir volume is removed along with it, and a fresh, empty one is created when the pod is rescheduled.
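The steps above can be sketched as follows. The pods path is this cluster's custom kubelet root directory (the default would be /var/lib/kubelet/pods), and the destructive find/kubectl commands are shown commented out:

```shell
#!/bin/sh
# List the N largest subdirectories of a directory, biggest first (KB).
largest_dirs() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# On the affected node:
largest_dirs /data/kubernetes/kubelet/pods

# Inside a bloated pod directory, remove old rotated SkyWalking logs, e.g.:
#   find <pod-dir> -name 'skywalking-api.log.*' -mtime +3 -delete
#
# Evicted pods remain in phase Failed; deleting them frees their IPs:
#   kubectl delete pods -A --field-selector=status.phase=Failed
```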

Permanent Log Retention Solution

Modify the SkyWalking agent configuration to limit the number of retained history log files, for example by setting logging.max_history_files=3, and rebuild the agent image.

<code># Dockerfile snippet
FROM registry.ayunw.cn/library/alpine:3.12.0
ENV LANG=C.UTF-8 SKYWALKING_AGENT_VERSION=8.6.0
...
COPY agent.config /opt/skywalking/agent/config/
</code>
<code># agent.config snippet
logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:3}
</code>

Rebuild the image and reference it in the Deployment.

<code># deployment.yaml excerpt
initContainers:
- name: init-skywalking-agent
  image: "registry.ayunw.cn/library/skywalking-agent:33-ac402d20"
  command: ['sh','-c','set -ex;mkdir -p /skywalking/agent;cp -r /opt/skywalking/agent/* /skywalking/agent;']
  volumeMounts:
  - name: vol-apm-empty
    mountPath: /skywalking/agent
containers:
- name: demo-hello-pre-master
  image: "registry.ayunw.cn/paas/demo-hello:537-c87b6177"
  volumeMounts:
  - name: vol-apm-empty
    mountPath: /skywalking/agent
volumes:
- name: vol-apm-empty
  emptyDir: {}
</code>

Although SkyWalking appears to be the main cause of the disk pressure, without proper monitoring the root cause cannot be confirmed definitively; additional observability is recommended.

Kubernetes · troubleshooting · SkyWalking · Pod Eviction · DiskPressure · Ephemeral Storage
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
