19 Common Kubernetes Failures and How to Fix Them
This guide walks through nineteen typical Kubernetes problems—from service access failures and pod initialization errors to Helm installation issues—explaining root causes, providing concise solutions, and including command-line examples to help operators quickly resolve cluster disruptions.
Problem 1: K8s cluster service access failure
Root cause: The certificate cannot be recognized, often because it is custom-issued or has expired.
Solution: Update the certificate.
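On a kubeadm-managed cluster (an assumption; file paths differ on other installers), the expiry can be checked and the certificates renewed roughly like this:

```shell
# Check when the apiserver certificate expires (kubeadm default path).
openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt

# Renew all kubeadm-managed certificates in place, then restart the
# kubelet so the control-plane static pods pick up the new files.
kubeadm certs renew all
systemctl restart kubelet
```

On kubeadm versions before 1.19 the renew subcommand lived under `kubeadm alpha certs` instead.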
Problem 2: K8s cluster service access failure (connection refused)
<code>curl: (7) Failed connect to 10.103.22.158:3000; Connection refused</code>
Root cause: Port mapping error; the service is running but not exposed correctly.
Solution: Delete the service (svc) and remap the port.
<code>kubectl delete svc nginx-deployment</code>
Problem 3: K8s cluster service exposure failure
<code>Error from server (AlreadyExists): services "nginx-deployment" already exists</code>
Root cause: A service with this name has already been exposed for the container.
Solution: Delete the existing service and remap the port.
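The delete-and-remap step can be sketched as follows; the deployment name is taken from the article's example, while the port numbers are assumptions to adjust for your workload:

```shell
# Remove the conflicting Service.
kubectl delete svc nginx-deployment

# Re-expose the deployment; --type=NodePort also makes it reachable
# from outside the cluster on every node's IP (see Problem 4).
kubectl expose deployment nginx-deployment --port=3000 --target-port=80 --type=NodePort
kubectl get svc nginx-deployment
```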
Problem 4: External network cannot access K8s service
Root cause: The service type is ClusterIP, not exposed to the external network.
Solution: Change the service type to NodePort, allowing access via any cluster node.
<code>kubectl edit svc nginx-deployment</code>
Problem 5: Pod status is ErrImagePull
<code>readiness-httpget-pod 0/1 ErrImagePull 0 10s</code>
Root cause: The image cannot be pulled (wrong name or tag, or an unreachable registry).
Solution: Replace the image with a valid one.
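Assuming the failing pod is the `readiness-httpget-pod` from the status line above, diagnosing and swapping the image might look like this (the container name and replacement image are assumptions from a typical readiness-probe demo):

```shell
# Find out why the pull failed: wrong name/tag, unreachable registry,
# or missing pull credentials all show up in the pod events.
kubectl describe pod readiness-httpget-pod

# Point the container at a known-good image; a container's image field
# is mutable, so this works even on a bare pod.
kubectl set image pod/readiness-httpget-pod readiness-httpget-container=nginx:alpine
```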
Problem 6: Init container in a pod stays in an abnormal state
<code>NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/2 0 20s</code>
Root cause: An init container has not completed, so the pod remains in the initializing state.
<code>Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing</code>
Solution: Create the Service the init container is waiting for, so that its DNS lookup (served by CoreDNS) succeeds and the init container can finish.
<code>kubectl apply -f myservice.yaml</code>
Problem 7: Pod shows CrashLoopBackOff
Root cause: Image issue causing container restart failures.
Solution: Replace the faulty image.
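Before replacing the image, the crash reason can usually be read from the previous container run; `<pod-name>` below is a placeholder for the crashing pod:

```shell
# Logs from the last failed run (the current container may not have
# started yet, so --previous is usually what you want).
kubectl logs <pod-name> --previous

# Exit code and restart reason from the pod status.
kubectl describe pod <pod-name>
```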
Problem 8: Pod creation fails
<code>readiness-httpget-pod 0/1 Pending 0 0s
... (multiple lines showing CrashLoopBackOff and errors)</code>
Root cause: An image problem prevents the container from starting.
Solution: Replace the image.
Problem 9: Pod ready state never becomes true
<code>readiness-httpget-pod 0/1 Running 0 116s</code>
Root cause: The readiness probe keeps failing because the resource it checks does not exist in the container.
Solution: Enter the container and create the resource the probe expects, as defined in the YAML.
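In the classic httpGet readiness-probe demo, the probe requests a file that does not exist yet, and creating it flips Ready to true. The file path below is an assumption from that demo:

```shell
# Open a shell inside the container.
kubectl exec -it readiness-httpget-pod -- /bin/sh

# Inside the container: create the file the httpGet probe requests.
echo ok > /usr/share/nginx/html/index1.html
exit

# The pod should report 1/1 Ready on the next probe cycle.
kubectl get pod readiness-httpget-pod
```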
Problem 10: Pod creation fails due to YAML errors
Root cause: The YAML file contains Chinese (non-ASCII) characters.
Solution: Remove them and correct the myregistrykey content.
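Non-ASCII characters—full-width punctuation typed through a Chinese input method is a common culprit—can be located with GNU grep. The demo manifest below is hypothetical:

```shell
# Demo manifest with a full-width colon hiding in it.
cat > /tmp/demo-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod：bad
EOF

# -n prints line numbers; -P enables Perl regex, where this character
# class matches anything outside the ASCII range.
grep -nP '[^\x00-\x7F]' /tmp/demo-pod.yaml
```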
Problem 11: kube-flannel pod status Init:0/1
Investigation command:
<code>kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7</code>
Root cause: Node kube-slave1 fails to pull the flannel image.
Solution: Restart Docker on the node and manually pull the image; reinstall the plugin on the master if needed.
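On the affected node, the restart-and-pull step might look like this; the image tag is an assumption, so copy the exact reference from the DaemonSet spec:

```shell
# On kube-slave1: restart the container runtime, then pull the flannel
# image manually so the init container can start.
systemctl restart docker
docker pull quay.io/coreos/flannel:v0.11.0-amd64
```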
<code>kubectl create -f kube-flannel.yml; kubectl get nodes</code>
Problem 12: Service status ErrImagePull
Root cause: Incorrect image name.
Solution: Delete the faulty pod and pull the correct image.
<code>kubectl delete pod test-nginx; kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine</code>
Problem 13: Cannot enter specified container
Root cause: Duplicate "containers" field in YAML, so the pod lacks the intended container.
Solution: Remove the extra "containers" field and recreate the pod.
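Because a YAML mapping may not repeat a key, a duplicate `containers:` block is either rejected by the parser or silently overrides the first one. The corrected shape keeps a single key whose list holds every container; the names and images below are hypothetical:

```shell
cat > /tmp/fixed-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:          # single key...
  - name: app          # ...with every container listed under it
    image: nginx:alpine
  - name: sidecar
    image: busybox
    command: ["sleep", "3600"]
EOF

# Sanity check: 'containers:' must appear exactly once.
grep -c 'containers:' /tmp/fixed-pod.yaml
```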
Problem 14: PersistentVolume creation fails
Root cause: Duplicate PV name.
Solution: Change the PV name.
Problem 15: Pod cannot mount PVC
Root cause: The PVC's accessModes and requested size do not match any available PV (the pod needs more than 1Gi with ReadWriteOnce).
Solution: Adjust accessModes in the YAML or modify the PV.
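A PVC that satisfies the constraints above (ReadWriteOnce, more than 1Gi) could be declared like this; the claim name and size are illustrative:

```shell
cat > /tmp/pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce      # must be among the modes the target PV offers
  resources:
    requests:
      storage: 2Gi     # satisfies the ">1G" requirement
EOF

# kubectl apply -f /tmp/pvc.yaml   # run this against the cluster
grep -E 'ReadWriteOnce|storage:' /tmp/pvc.yaml
```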
Problem 16: Pod can’t access content after using PV
Root cause: NFS volume is empty or permissions are incorrect.
Solution: Create files in the NFS volume and set proper permissions.
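On the NFS server the fix amounts to putting content in the export and opening its permissions; the directory below is a stand-in for the real export path:

```shell
# Using /tmp/nfs-share as a stand-in for the real export directory.
mkdir -p /tmp/nfs-share
echo "hello from nfs" > /tmp/nfs-share/index.html
chmod -R 755 /tmp/nfs-share

# On a real NFS server, re-publish the export afterwards:
# exportfs -ra
```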
Problem 17: Node status check fails
<code>Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)</code>
Root cause: The Heapster service is missing.
Solution: Install the Prometheus monitoring component.
Problem 18: Pod remains in Pending state
Root cause: Same image already deployed, leaving no node to schedule the new pod.
Solution: Delete all existing pods before deploying new ones.
Problem 19: Helm component installation fails
<code># helm install
Error: This command needs 1 argument: chart name
# helm install ./
Error: no Chart.yaml exists in directory "/root/hello-world"</code>
Root cause: The chart metadata file has the wrong name (wrong case).
Solution: Rename the file to Chart.yaml with a capital C; Helm requires this exact name.
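The rename can be simulated in a scratch directory; Helm only recognizes the metadata file when it is spelled `Chart.yaml`:

```shell
# Simulate a chart whose metadata file was saved in lowercase.
mkdir -p /tmp/hello-world
printf 'name: hello-world\nversion: 0.1.0\n' > /tmp/hello-world/chart.yaml

# The fix: rename it to the exact name Helm expects.
mv /tmp/hello-world/chart.yaml /tmp/hello-world/Chart.yaml
ls /tmp/hello-world
```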
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.