
19 Common Kubernetes Failures and How to Fix Them

This guide walks through nineteen typical Kubernetes problems—from service access failures and pod initialization errors to Helm installation issues—explaining root causes, providing concise solutions, and including command-line examples to help operators quickly resolve cluster disruptions.


Problem 1: K8s cluster service access failure (certificate not recognized)

Root cause: The certificate is not recognized, typically because it is custom-signed or has expired.

Solution: Renew or replace the certificate.

Problem 2: K8s cluster service access failure (connection refused)

<code>curl: (7) Failed connect to 10.103.22.158:3000; Connection refused</code>
Root cause: Port mapping error; the service is running but not exposed on the correct port.

Solution: Delete the Service and re-expose the deployment on the correct port, for example:

<code>kubectl delete svc nginx-deployment
kubectl expose deployment nginx-deployment --port=3000 --target-port=80 --type=NodePort</code>

Problem 3: K8s cluster service exposure failure

<code>Error from server (AlreadyExists): services "nginx-deployment" already exists</code>
Root cause: A Service with this name already exists, so the port cannot be exposed again under the same name.

Solution: Delete the existing service and remap the port.

Problem 4: External network cannot access the K8s service

Root cause: The service type is ClusterIP, not exposed to the external network.

Solution: Change the Service type to NodePort, which exposes the port on every cluster node.

<code>kubectl edit svc nginx-deployment</code>
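Inside the editor, set spec.type to NodePort. A minimal manifest for such a Service might look like the following sketch (the selector label and port numbers are illustrative, not taken from the article):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment
spec:
  type: NodePort            # exposes the port on every cluster node
  selector:
    app: nginx              # assumed pod label
  ports:
    - port: 3000            # cluster-internal port
      targetPort: 80        # container port
      nodePort: 30080       # optional; auto-assigned from 30000-32767 if omitted
```

After applying, the service is reachable at http://&lt;any-node-ip&gt;:30080.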

Problem 5: Pod status is ErrImagePull

<code>readiness-httpget-pod   0/1     ErrImagePull   0          10s</code>
Root cause: The image cannot be pulled, usually because the name or tag is wrong or the registry is unreachable.

Solution: Replace the image with a valid one.
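To see the exact pull error, check the pod's events; these commands assume a live cluster, and the container name below is a placeholder for whatever the pod spec defines:

```shell
# The Events section shows why the pull failed (bad tag, auth, unreachable registry)
kubectl describe pod readiness-httpget-pod

# Point the container at an image that exists (container name is hypothetical)
kubectl set image pod/readiness-httpget-pod readiness-httpget=nginx:alpine
```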

Problem 6: Init container in pod stays abnormal

<code>NAME        READY   STATUS     RESTARTS   AGE
myapp-pod   0/1     Init:0/2   0          20s</code>
Root cause: Init container has not completed, causing the pod to remain in initializing state.
<code>Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing</code>

Solution: Create the Service the init container is waiting for; once it exists, CoreDNS can resolve its name and the init container completes.

<code>kubectl apply -f myservice.yaml</code>
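The applied myservice.yaml can be as small as the following sketch (port numbers are illustrative); once the Service exists, its DNS name resolves and initialization proceeds:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
    - protocol: TCP
      port: 80          # port the init container's lookup/probe targets
      targetPort: 9376
```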

Problem 7: Pod shows CrashLoopBackOff

Root cause: The container keeps exiting (here, due to a faulty image), so kubelet restarts it with increasing back-off.

Solution: Replace the faulty image.
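The logs of the previous (crashed) run usually name the culprit; these commands need a live cluster, and myapp-pod stands in for the failing pod:

```shell
# Logs from the last terminated container instance
kubectl logs myapp-pod --previous

# Restart count and back-off events
kubectl describe pod myapp-pod
```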

Problem 8: Pod creation fails

<code>readiness-httpget-pod 0/1 Pending 0 0s
... (multiple lines showing CrashLoopBackOff and errors)</code>
Root cause: Image problem prevents container start.

Solution: Replace the image.

Problem 9: Pod ready state never becomes true

<code>readiness-httpget-pod 0/1 Running 0 116s</code>
Root cause: The readiness probe keeps failing because the file or endpoint it checks does not exist in the container.

Solution: Exec into the container and create the resource the probe expects, as defined in the YAML.

Problem 10: Pod creation fails due to YAML errors

Root cause: The YAML file contains full-width (Chinese) characters, which breaks parsing.

Solution: Remove the non-ASCII characters and correct the myregistrykey content.
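Instead of hand-editing the Secret YAML (where stray characters can creep in), the registry credential can be regenerated entirely on the command line; the server address and credentials below are placeholders:

```shell
kubectl delete secret myregistrykey
kubectl create secret docker-registry myregistrykey \
  --docker-server=10.0.0.81:5000 \
  --docker-username=<user> \
  --docker-password=<password>
```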

Problem 11: kube-flannel pod status Init:0/1

Investigation command:

<code>kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7</code>
Root cause: Node kube-slave1 fails to pull the image.

Solution: Restart Docker on the node and manually pull the image; reinstall the plugin on the master if needed.

<code>kubectl create -f kube-flannel.yml; kubectl get nodes</code>

Problem 12: Service status ErrImagePull

Root cause: Incorrect image name.

Solution: Delete the faulty pod and pull the correct image.

<code>kubectl delete pod test-nginx; kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine</code>

Problem 13: Cannot enter specified container

Root cause: Duplicate "containers" field in YAML, so the pod lacks the intended container.

Solution: Remove the extra "containers" field and recreate the pod.
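For reference, a valid spec declares the containers key exactly once, with one list entry per container (names and images below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:                  # a single "containers" key ...
    - name: app                # ... holding one entry per container
      image: nginx:alpine
    - name: sidecar
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
```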

Problem 14: PersistentVolume creation fails

Root cause: Duplicate PV name.

Solution: Change the PV name.

Problem 15: Pod cannot mount PVC

Root cause: AccessModes of PVC do not match any available PV (requires >1G and RWO).

Solution: Adjust accessModes in the YAML or modify the PV.
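A PVC binds only to a PV whose capacity and access modes satisfy the claim. A matching pair might look like this sketch (names, sizes, and the NFS server address are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-2g
spec:
  capacity:
    storage: 2Gi              # satisfies the ">1G" requirement
  accessModes:
    - ReadWriteOnce           # must include the mode the PVC requests
  nfs:
    path: /data/nfs
    server: 10.0.0.11         # placeholder NFS server
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-rwo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```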

Problem 16: Pod cannot access content after mounting the PV

Root cause: NFS volume is empty or permissions are incorrect.

Solution: Create files in the NFS volume and set proper permissions.
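On the NFS server side, the fix amounts to populating the export and loosening permissions; /tmp/nfs-demo below is a stand-in for the real export path (for example /data/nfs):

```shell
# Stand-in for the real NFS export directory
EXPORT_DIR=/tmp/nfs-demo
mkdir -p "$EXPORT_DIR"

# Give the volume some content and make it readable by the pod's user
echo 'hello from nfs' > "$EXPORT_DIR/index.html"
chmod -R 755 "$EXPORT_DIR"
```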

Problem 17: Node status check fails

<code>Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)</code>
Root cause: Heapster service is missing.

Solution: Install the Prometheus monitoring component.

Problem 18: Pod remains in Pending state

Root cause: A pod with the same image is already deployed and occupies the resources the new pod needs, so the scheduler cannot place it on any node.

Solution: Delete the existing pods (freeing the contended resources) before deploying new ones.
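The scheduler records why it skipped every node in the pod's events; this requires a live cluster, and &lt;pending-pod&gt; is a placeholder:

```shell
# The Events section lists messages like "0/3 nodes are available: ..."
kubectl describe pod <pending-pod>
```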

Problem 19: Helm component installation fails

<code># helm install
Error: This command needs 1 argument: chart name
# helm install ./
Error: no Chart.yaml exists in directory "/root/hello-world"</code>
Root cause: The chart metadata file is named incorrectly (for example, chart.yaml instead of Chart.yaml).

Solution: Rename the file to Chart.yaml; the name is case-sensitive.
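The rename itself is a one-liner; /tmp/hello-world below is a stand-in for the article's /root/hello-world chart directory:

```shell
# Stand-in chart directory with a wrongly-cased metadata file
CHART_DIR=/tmp/hello-world
mkdir -p "$CHART_DIR"
printf 'name: hello-world\nversion: 0.1.0\n' > "$CHART_DIR/chart.yaml"

# Helm requires the exact, case-sensitive name "Chart.yaml"
mv "$CHART_DIR/chart.yaml" "$CHART_DIR/Chart.yaml"
```

After the rename, `helm install ./` from inside the chart directory proceeds past the error.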

Tags: cloud-native, kubernetes, troubleshooting, Services, Pods
Written by Efficient Ops

Efficient Ops is a public account maintained by Xiaotianguo and friends, regularly publishing original technical articles focused on operations transformation.
