19 Common Kubernetes Failures and How to Fix Them
This guide walks through nineteen typical Kubernetes problems—from service access failures and pod initialization errors to Helm installation issues—explaining root causes, providing concise solutions, and including command-line examples to help operators quickly resolve cluster disruptions.
Problem 1: K8s cluster service access failure
Root cause: The certificate cannot be recognized, often because it is custom-issued or has expired.
Solution: Update the certificate.
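On a kubeadm-managed cluster (an assumption; file paths differ on other installers), the expiry can be checked and the certificates renewed roughly like this:

```shell
# Check when the apiserver certificate expires (kubeadm default path).
openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt

# Renew all kubeadm-managed certificates in place, then restart the
# kubelet so the control-plane static pods pick up the new files.
kubeadm certs renew all
systemctl restart kubelet
```

On kubeadm versions before 1.19 the renew subcommand lived under `kubeadm alpha certs` instead.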
Problem 2: K8s cluster service access failure (connection refused)
<code>curl: (7) Failed connect to 10.103.22.158:3000; Connection refused</code>
Root cause: Port mapping error; the service is running but not exposed correctly.
Solution: Delete the service (svc) and remap the port.
<code>kubectl delete svc nginx-deployment</code>
Problem 3: K8s cluster service exposure failure
<code>Error from server (AlreadyExists): services "nginx-deployment" already exists</code>
Root cause: A service with this name has already been exposed for the container.
Solution: Delete the existing service and remap the port.
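The delete-and-remap step can be sketched as follows; the deployment name is taken from the article's example, while the port numbers are assumptions to adjust for your workload:

```shell
# Remove the conflicting Service.
kubectl delete svc nginx-deployment

# Re-expose the deployment; --type=NodePort also makes it reachable
# from outside the cluster on every node's IP (see Problem 4).
kubectl expose deployment nginx-deployment --port=3000 --target-port=80 --type=NodePort
kubectl get svc nginx-deployment
```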
Problem 4: External network cannot access K8s service
Root cause: The service type is ClusterIP, not exposed to the external network.
Solution: Change the service type to NodePort, allowing access via any cluster node.
<code>kubectl edit svc nginx-deployment</code>
Problem 5: Pod status is ErrImagePull
<code>readiness-httpget-pod 0/1 ErrImagePull 0 10s</code>
Root cause: The image cannot be pulled (wrong name or tag, or an unreachable registry).
Solution: Replace the image with a valid one.
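Assuming the failing pod is the `readiness-httpget-pod` from the status line above, diagnosing and swapping the image might look like this (the container name and replacement image are assumptions from a typical readiness-probe demo):

```shell
# Find out why the pull failed: wrong name/tag, unreachable registry,
# or missing pull credentials all show up in the pod events.
kubectl describe pod readiness-httpget-pod

# Point the container at a known-good image; a container's image field
# is mutable, so this works even on a bare pod.
kubectl set image pod/readiness-httpget-pod readiness-httpget-container=nginx:alpine
```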
Problem 6: Init container in a pod stays in an abnormal state
<code>NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/2 0 20s</code>
Root cause: An init container has not completed, so the pod remains in the initializing state.
<code>Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing</code>
Solution: Create the Service the init container is waiting for, so that its DNS lookup (served by CoreDNS) succeeds and the init container can finish.
<code>kubectl apply -f myservice.yaml</code>
Problem 7: Pod shows CrashLoopBackOff
Root cause: Image issue causing container restart failures.
Solution: Replace the faulty image.
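Before replacing the image, the crash reason can usually be read from the previous container run; `<pod-name>` below is a placeholder for the crashing pod:

```shell
# Logs from the last failed run (the current container may not have
# started yet, so --previous is usually what you want).
kubectl logs <pod-name> --previous

# Exit code and restart reason from the pod status.
kubectl describe pod <pod-name>
```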
Problem 8: Pod creation fails
<code>readiness-httpget-pod 0/1 Pending 0 0s
... (multiple lines showing CrashLoopBackOff and errors)</code>
Root cause: An image problem prevents the container from starting.
Solution: Replace the image.
Problem 9: Pod ready state never becomes true
<code>readiness-httpget-pod 0/1 Running 0 116s</code>
Root cause: The readiness probe keeps failing because the resource it checks does not exist in the container.
Solution: Enter the container and create the resource the probe expects, as defined in the YAML.
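In the classic httpGet readiness-probe demo, the probe requests a file that does not exist yet, and creating it flips Ready to true. The file path below is an assumption from that demo:

```shell
# Open a shell inside the container.
kubectl exec -it readiness-httpget-pod -- /bin/sh

# Inside the container: create the file the httpGet probe requests.
echo ok > /usr/share/nginx/html/index1.html
exit

# The pod should report 1/1 Ready on the next probe cycle.
kubectl get pod readiness-httpget-pod
```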
Problem 10: Pod creation fails due to YAML errors
Root cause: The YAML file contains Chinese (non-ASCII) characters.
Solution: Remove them and correct the myregistrykey content.
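Non-ASCII characters—full-width punctuation typed through a Chinese input method is a common culprit—can be located with GNU grep. The demo manifest below is hypothetical:

```shell
# Demo manifest with a full-width colon hiding in it.
cat > /tmp/demo-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod：bad
EOF

# -n prints line numbers; -P enables Perl regex, where this character
# class matches anything outside the ASCII range.
grep -nP '[^\x00-\x7F]' /tmp/demo-pod.yaml
```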
Problem 11: kube-flannel pod status Init:0/1
Investigation command:
<code>kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7</code>
Root cause: Node kube-slave1 fails to pull the flannel image.
Solution: Restart Docker on the node and manually pull the image; reinstall the plugin on the master if needed.
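On the affected node, the restart-and-pull step might look like this; the image tag is an assumption, so copy the exact reference from the DaemonSet spec:

```shell
# On kube-slave1: restart the container runtime, then pull the flannel
# image manually so the init container can start.
systemctl restart docker
docker pull quay.io/coreos/flannel:v0.11.0-amd64
```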
<code>kubectl create -f kube-flannel.yml; kubectl get nodes</code>
Problem 12: Service status ErrImagePull
Root cause: Incorrect image name.
Solution: Delete the faulty pod and pull the correct image.
<code>kubectl delete pod test-nginx; kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine</code>
Problem 13: Cannot enter specified container
Root cause: Duplicate "containers" field in YAML, so the pod lacks the intended container.
Solution: Remove the extra "containers" field and recreate the pod.
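Because a YAML mapping may not repeat a key, a duplicate `containers:` block is either rejected by the parser or silently overrides the first one. The corrected shape keeps a single key whose list holds every container; the names and images below are hypothetical:

```shell
cat > /tmp/fixed-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:          # single key...
  - name: app          # ...with every container listed under it
    image: nginx:alpine
  - name: sidecar
    image: busybox
    command: ["sleep", "3600"]
EOF

# Sanity check: 'containers:' must appear exactly once.
grep -c 'containers:' /tmp/fixed-pod.yaml
```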
Problem 14: PersistentVolume creation fails
Root cause: Duplicate PV name.
Solution: Change the PV name.
Problem 15: Pod cannot mount PVC
Root cause: The PVC's accessModes and requested size do not match any available PV (the pod needs more than 1Gi with ReadWriteOnce).
Solution: Adjust accessModes in the YAML or modify the PV.
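A PVC that satisfies the constraints above (ReadWriteOnce, more than 1Gi) could be declared like this; the claim name and size are illustrative:

```shell
cat > /tmp/pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce      # must be among the modes the target PV offers
  resources:
    requests:
      storage: 2Gi     # satisfies the ">1G" requirement
EOF

# kubectl apply -f /tmp/pvc.yaml   # run this against the cluster
grep -E 'ReadWriteOnce|storage:' /tmp/pvc.yaml
```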
Problem 16: Pod can’t access content after using PV
Root cause: NFS volume is empty or permissions are incorrect.
Solution: Create files in the NFS volume and set proper permissions.
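On the NFS server the fix amounts to putting content in the export and opening its permissions; the directory below is a stand-in for the real export path:

```shell
# Using /tmp/nfs-share as a stand-in for the real export directory.
mkdir -p /tmp/nfs-share
echo "hello from nfs" > /tmp/nfs-share/index.html
chmod -R 755 /tmp/nfs-share

# On a real NFS server, re-publish the export afterwards:
# exportfs -ra
```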
Problem 17: Node status check fails
<code>Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)</code>
Root cause: The Heapster service is missing.
Solution: Install the Prometheus monitoring component.
Problem 18: Pod remains in Pending state
Root cause: Same image already deployed, leaving no node to schedule the new pod.
Solution: Delete all existing pods before deploying new ones.
Problem 19: Helm component installation fails
<code># helm install
Error: This command needs 1 argument: chart name
# helm install ./
Error: no Chart.yaml exists in directory "/root/hello-world"</code>
Root cause: The chart metadata file has the wrong name (wrong case).
Solution: Rename the file to Chart.yaml with a capital C; Helm requires this exact name.
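The rename can be simulated in a scratch directory; Helm only recognizes the metadata file when it is spelled `Chart.yaml`:

```shell
# Simulate a chart whose metadata file was saved in lowercase.
mkdir -p /tmp/hello-world
printf 'name: hello-world\nversion: 0.1.0\n' > /tmp/hello-world/chart.yaml

# The fix: rename it to the exact name Helm expects.
mv /tmp/hello-world/chart.yaml /tmp/hello-world/Chart.yaml
ls /tmp/hello-world
```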
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.