How to Diagnose CrashLoopBackOff in Kubernetes: A Practical Guide
This article explains that CrashLoopBackOff is a symptom, not the root cause, and walks through a production-grade troubleshooting workflow: checking pod status, describing events, examining current and previous logs, and exec-ing into containers. It also covers common failures (OOMKilled, liveness-probe misconfiguration, bad configuration files, database connection issues, image command errors, and disk pressure) and warns against deleting pods prematurely.
CrashLoopBackOff Overview
CrashLoopBackOff indicates that a container repeatedly crashes after starting, and the kubelet, having already restarted it several times, is waiting out an increasing back-off delay before the next restart. The pod itself is created successfully; the failure occurs inside the container.
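The "BackOff" in the name refers to the kubelet's restart delay. As a minimal sketch, assuming the default policy documented for recent Kubernetes releases (a 10-second initial delay that doubles after each crash and is capped at five minutes; newer releases may tune these values), the hypothetical helper below lists the wait before each restart. The reset after ten minutes of healthy running is not modelled.

```python
def restart_delays(crashes: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Return the assumed back-off delay (seconds) before each restart.

    Assumes the default kubelet policy: start at `initial`, double after
    every crash, never exceed `cap`. Hypothetical helper for illustration.
    """
    delays = []
    delay = min(initial, cap)
    for _ in range(crashes):
        delays.append(delay)
        delay = min(delay * 2, cap)  # exponential growth, clamped at the cap
    return delays


print(restart_delays(7))
```

This is why RESTARTS climbs quickly at first and then only about once every five minutes plus the container's own run time.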
Pod created → container starts → application fails → main process exits → kubelet restarts container → repeat → CrashLoopBackOff
Production Investigation Order
Do not delete the pod before gathering evidence; otherwise its logs and events are lost.
1. kubectl get pod -A
2. kubectl describe pod pod-name
3. kubectl logs pod-name
4. kubectl logs --previous pod-name
5. kubectl exec -it pod-name -- sh
Step 1 – Check Pod Status
Run kubectl get pod -A and examine the columns:
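READY: ready containers out of the total (e.g., 0/1 means the only container is not ready)
STATUS: summarized status; for a crashing container it shows the waiting reason, e.g., CrashLoopBackOff (the underlying pod phase usually remains Running)
RESTARTS: number of container restarts; a high and growing value shows repeated crashes
AGE: how long the pod has existed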
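With many namespaces, scanning this table by eye gets slow. A minimal sketch, assuming the JSON output of kubectl get pod -A -o json (it reads status.containerStatuses[].state.waiting.reason and restartCount); the helper names are hypothetical, and init containers (initContainerStatuses) are not checked.

```python
import json
import subprocess


def crashlooping(pod_list: dict, min_restarts: int = 1) -> list[tuple[str, str, str, int]]:
    """Return (namespace, pod, container, restarts) for containers in CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            restarts = cs.get("restartCount", 0)
            if waiting.get("reason") == "CrashLoopBackOff" and restarts >= min_restarts:
                hits.append((meta.get("namespace", ""), meta.get("name", ""),
                             cs.get("name", ""), restarts))
    # Highest restart count first: usually the pod that has been failing longest.
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


def fetch_all_pods() -> dict:
    """Run kubectl (must be on PATH with a working kubeconfig) and parse its JSON."""
    out = subprocess.run(["kubectl", "get", "pod", "-A", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

From a machine with cluster access, crashlooping(fetch_all_pods()) lists the crashing containers with the highest restart counts first.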
Step 2 – Describe Events
Execute kubectl describe pod pod-name and inspect each container's Last State together with the Events section at the bottom. Between them they often reveal the immediate cause, such as OOMKilled, probe failures, or configuration errors.
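kubectl describe renders the container's Last State as text; the same data is available from kubectl get pod pod-name -o json under status.containerStatuses[].lastState.terminated. A minimal sketch, assuming that JSON has been loaded into a Python dict; the helper names are hypothetical. Exit codes above 128 conventionally mean 128 plus a signal number, so 137 is 128 + 9 (SIGKILL), which is the signal the kernel OOM killer sends.

```python
import signal


def describe_exit(exit_code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if exit_code == 0:
        return "exited normally"
    if exit_code > 128:
        try:
            name = signal.Signals(exit_code - 128).name  # e.g. 9 -> SIGKILL on Linux
        except ValueError:
            name = f"signal {exit_code - 128}"  # signal unknown on this platform
        return f"killed by {name}"
    return f"application error (exit code {exit_code})"


def last_terminations(pod: dict) -> list[dict]:
    """Collect lastState.terminated details for every container in one pod."""
    results = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("lastState", {}).get("terminated")
        if term:
            code = term.get("exitCode", 0)
            results.append({
                "container": cs.get("name"),
                "reason": term.get("reason"),
                "exitCode": code,
                "meaning": describe_exit(code),
            })
    return results
```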
Classic Fault 1 – OOMKilled
Last State:   Terminated
  Reason:     OOMKilled
  Exit Code:  137
Example manifest:
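resources:
  limits:
    memory: "512Mi"
# JVM started with -Xmx1024m, so the heap alone needs ~1Gi
Fixes: increase the memory limit (e.g., memory: 2Gi), reduce the JVM heap (e.g., -Xms256m -Xmx512m) while leaving headroom under the limit for non-heap memory such as metaspace and thread stacks, or investigate memory leaks.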
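The mismatch in this manifest can be checked arithmetically before deploying. A minimal sketch that handles only integer Kubernetes quantities (Ki/Mi/Gi/Ti and k/M/G/T) and -Xmx flags with k/m/g suffixes; the 25% non-heap headroom is an assumed rule of thumb, not a JVM or Kubernetes value, and the helper names are hypothetical.

```python
import re

# Binary (Ki/Mi/Gi/Ti) and decimal (k/M/G/T) suffixes used by Kubernetes quantities.
_K8S_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4, "": 1,
}
# JVM -Xmx suffixes are powers of 1024 and case-insensitive.
_JVM_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "": 1}


def k8s_bytes(quantity: str) -> int:
    """Convert an integer Kubernetes memory quantity such as '512Mi' to bytes."""
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if not m:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(m.group(1)) * _K8S_UNITS[m.group(2) or ""]


def jvm_heap_bytes(flag: str) -> int:
    """Convert an -Xmx flag such as '-Xmx1024m' to bytes."""
    m = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag.strip())
    if not m:
        raise ValueError(f"not an -Xmx flag: {flag!r}")
    return int(m.group(1)) * _JVM_UNITS[m.group(2).lower()]


def heap_fits(limit: str, xmx: str, non_heap_headroom: float = 0.25) -> bool:
    """True if the max heap fits in the limit after reserving an assumed non-heap share."""
    budget = k8s_bytes(limit) * (1 - non_heap_headroom)
    return jvm_heap_bytes(xmx) <= budget
```

Under these assumptions heap_fits("512Mi", "-Xmx1024m") is False and heap_fits("2Gi", "-Xmx1024m") is True. An -Xmx512m heap under a 512Mi limit is also flagged, because the heap alone would fill the entire limit.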
Classic Fault 2 – Liveness Probe Failure
Liveness probe failed: Get http://10.244.1.15:8080/healthz: dial tcp 10.244.1.15:8080: connect: connection refused
Typical cause: the probe starts before the application is ready (e.g., initialDelaySeconds: 5 while the app needs about 90 s to start), so the kubelet kills the container after repeated probe failures and the restart loop begins.
Resolution: check the startup logs with kubectl logs pod-name to see how long startup really takes, then adjust initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold so the probe tolerates the real startup time.
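A back-of-the-envelope check of the numbers in this example helps pick values. A minimal sketch, assuming every probe fails until the application is up and ignoring scheduling jitter; the defaults mirror the Kubernetes probe defaults (periodSeconds: 10, failureThreshold: 3, timeoutSeconds: 1), and the function names are hypothetical.

```python
def restart_after(initial_delay: int, period: int = 10,
                  failure_threshold: int = 3, timeout: int = 1) -> int:
    """Approximate seconds after container start at which the kubelet restarts it.

    Assumes every probe fails: the first probe runs at initial_delay, later ones
    every `period` seconds, and the restart follows the failure_threshold-th
    consecutive failure. `timeout` is added once as a worst case for the last probe.
    """
    return initial_delay + (failure_threshold - 1) * period + timeout


def min_initial_delay(startup_seconds: int, period: int = 10,
                      failure_threshold: int = 3) -> int:
    """Smallest initialDelaySeconds whose failure window reaches startup_seconds."""
    return max(0, startup_seconds - (failure_threshold - 1) * period)
```

For the example above, restart_after(5) gives 26 seconds, far short of the 90 seconds the application needs, while min_initial_delay(90) suggests at least 70 seconds with the other defaults unchanged. In practice, add a safety margin on top.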
Classic Fault 3 – Configuration File Errors
Failed to load property source
mapping values are not allowed here
Examples include malformed Spring Boot YAML or Nginx configuration syntax errors, which make the application exit immediately on startup.
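One frequent mistake in hand-edited YAML is tab indentation, which the YAML specification forbids. The sketch below is a narrow lint for that single problem, not a parser: it scans each .yml/.yaml key of a ConfigMap as returned by kubectl get cm config-map-name -o json. Other errors, including most "mapping values are not allowed here" cases, still need a real parser such as PyYAML's yaml.safe_load. The helper names are hypothetical.

```python
def tab_indented_lines(text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


def lint_configmap(cm: dict) -> dict[str, list[int]]:
    """Check every .yml/.yaml key of a ConfigMap dict for tab indentation."""
    problems = {}
    for key, value in cm.get("data", {}).items():
        if key.endswith((".yml", ".yaml")):  # other files (e.g. nginx.conf) are skipped
            lines = tab_indented_lines(value)
            if lines:
                problems[key] = lines
    return problems
```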
Investigation commands:
kubectl get cm
kubectl describe cm config-map-name
kubectl exec -it pod-name -- sh
cat /app/config/application.yml
Classic Fault 4 – Database Connection Failure
Communications link failure
Connection refused
Unknown host mysql-service
Root causes often include the database not running, a wrong Service name, DNS resolution failure, incorrect credentials, or network issues.
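When the image lacks nslookup, telnet, or nc but ships a Python interpreter, the same two checks (name resolution, then a TCP connection) can be run from inside the container. A minimal sketch using only the standard library; the function name is hypothetical, and mysql-service and 3306 are simply the example values from this section.

```python
import socket


def check_dependency(host: str, port: int, timeout: float = 3.0) -> str:
    """Resolve the name (like nslookup), then open a TCP connection (like nc -zv)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS failure for {host}: {exc}"  # e.g. wrong Service name
    address = infos[0][4][0]
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return f"OK: {host} ({address}) accepts connections on {port}"
    except ConnectionRefusedError:
        return f"refused: {host} ({address}) resolves but nothing listens on {port}"
    except OSError as exc:  # includes timeouts, which subclass OSError
        return f"unreachable: {host} ({address}) port {port}: {exc}"


# Example call from inside the failing pod: check_dependency("mysql-service", 3306)
```

A short name such as mysql-service resolves only from pods in the same namespace; from another namespace use mysql-service.<namespace>. A DNS failure points at the Service name or cluster DNS, while "refused" means the name resolves but nothing accepts the connection, for example because the database pod is not running.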
Check commands (nslookup, telnet, and nc are often missing from slim images):
kubectl get svc
kubectl exec -it pod-name -- sh
nslookup mysql-service
telnet mysql-service 3306
nc -zv mysql-service 3306
Classic Fault 5 – Image Startup Command Error
exec: "javaa": executable file not found
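permission denied
The first error is usually a typo in the Dockerfile CMD or ENTRYPOINT, or in the Deployment's command (e.g., "javaa" instead of "java"). The second usually means the entrypoint file lacks the execute bit, which chmod +x in the Dockerfile fixes.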
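When reading the spec, remember that the container's command replaces the image ENTRYPOINT and args replaces CMD. A minimal sketch of the documented combination rules; the image's ENTRYPOINT and CMD must be supplied by you (for example from docker image inspect output), and the helper names are hypothetical.

```python
def effective_argv(container: dict, image_entrypoint: list[str],
                   image_cmd: list[str]) -> list[str]:
    """Combine a container spec with the image defaults (Kubernetes command/args rules)."""
    command = container.get("command")
    args = container.get("args")
    if command:
        # command replaces ENTRYPOINT; image CMD is ignored, args (if any) are appended.
        return list(command) + list(args or [])
    if args:
        # Only args set: the image ENTRYPOINT runs with these args instead of CMD.
        return list(image_entrypoint) + list(args)
    # Neither set: the image ENTRYPOINT and CMD are used unchanged.
    return list(image_entrypoint) + list(image_cmd)


def containers(deployment: dict) -> list[dict]:
    """Container specs from the JSON of `kubectl get deploy app -o json`."""
    return (deployment.get("spec", {}).get("template", {})
            .get("spec", {}).get("containers", []))
```

If the first element of the result is misspelled (such as javaa), fix it wherever that element came from: the Deployment's command or the Dockerfile.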
Verify the deployment spec:
kubectl get deploy app -o yaml
# check command, args, image fields
Classic Fault 6 – Disk Space Exhaustion
Evicted
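The node had condition: [DiskPressure]
Check disk usage and image storage on the affected node (docker system df applies only to Docker-based nodes; crictl works with CRI runtimes such as containerd):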
df -h
docker system df
crictl images
Most Important Command – logs --previous
If the container has already restarted, kubectl logs shows the new instance, which may not have failed yet. The crash output belongs to the previous instance and can be retrieved with kubectl logs --previous pod-name (add -c container-name for multi-container pods). This single command resolves a large share of CrashLoopBackOff investigations.
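In multi-container pods, --previous has to be requested per container, and it fails for containers that have never restarted. A minimal sketch that builds the commands from the pod JSON (kubectl get pod pod-name -o json); the helper name and the 200-line tail are assumptions.

```python
def previous_log_commands(pod: dict, tail: int = 200) -> list[list[str]]:
    """Build `kubectl logs --previous` argv lists for containers that have restarted."""
    meta = pod.get("metadata", {})
    name = meta.get("name", "")
    namespace = meta.get("namespace", "default")
    commands = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # A previous instance only exists after at least one restart.
        if cs.get("restartCount", 0) > 0:
            commands.append([
                "kubectl", "logs", name, "-n", namespace,
                "-c", cs["name"], "--previous", f"--tail={tail}",
            ])
    return commands
```

Each list can be passed to subprocess.run(cmd, capture_output=True, text=True) to fetch the crash output.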
Complete Production Troubleshooting Flow
1. Identify the failing pod: kubectl get pod -A
2. Inspect events: kubectl describe pod pod-name
3. View current logs: kubectl logs pod-name
4. View previous logs: kubectl logs --previous pod-name
5. Enter the container for deeper inspection (this only works while the container is running between restarts): kubectl exec -it pod-name -- sh
Common Mistakes to Avoid
Deleting pods before reading logs loses valuable evidence.
Ignoring --previous logs hides the actual crash information.
Assuming the problem lies in Kubernetes itself. In most cases the root cause is inside the container: application code, configuration, resources, dependencies, network, or probes.
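To make "gather evidence first" a habit, the investigation commands can be captured to files in one go. A minimal sketch, assuming kubectl is on PATH with access to the cluster; the directory layout and helper names are hypothetical. Commands that fail, such as --previous before any restart, simply record their error output.

```python
import pathlib
import subprocess
from datetime import datetime, timezone


def evidence_commands(pod: str, namespace: str) -> dict[str, list[str]]:
    """Investigation commands keyed by the file each one is saved to."""
    base = ["kubectl", "-n", namespace]
    return {
        "describe.txt": base + ["describe", "pod", pod],
        "pod.yaml": base + ["get", "pod", pod, "-o", "yaml"],
        "logs.txt": base + ["logs", pod, "--all-containers"],
        "logs-previous.txt": base + ["logs", pod, "--all-containers", "--previous"],
        "events.txt": base + ["get", "events", "--field-selector",
                              f"involvedObject.name={pod}"],
    }


def snapshot(pod: str, namespace: str = "default", root: str = "evidence") -> pathlib.Path:
    """Save each command's stdout and stderr to a timestamped directory."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    outdir = pathlib.Path(root) / f"{namespace}_{pod}_{stamp}"
    outdir.mkdir(parents=True, exist_ok=True)
    for filename, argv in evidence_commands(pod, namespace).items():
        # check=False: one failing command must not stop the rest from being saved.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        (outdir / filename).write_text(result.stdout + result.stderr)
    return outdir
```

Run snapshot("pod-name", "namespace") before any kubectl delete pod; the returned directory holds the describe output, pod YAML, current and previous logs, and events.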
Key Takeaway
CrashLoopBackOff is only a symptom. The real issue almost always resides inside the container: in the program, its configuration, resource limits, dependencies, network, or health probes, with node-level problems such as disk pressure as the main exception. Follow the ordered steps (describe → logs → previous → exec) to locate the root cause efficiently.
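The signatures quoted in the six classic faults can serve as a first-pass lookup over describe or log output. A minimal sketch using plain substring matching, which is crude: "Connection refused" is left out because it appears in both the probe and database faults, and "permission denied" can have other causes. The table and helper are hypothetical.

```python
# Signatures quoted in the six classic faults, mapped to the fault they suggest.
SIGNATURES = [
    ("OOMKilled", "Fault 1: memory limit exceeded (OOMKilled)"),
    ("Liveness probe failed", "Fault 2: liveness probe misconfiguration"),
    ("Failed to load property source", "Fault 3: configuration file error"),
    ("mapping values are not allowed here", "Fault 3: configuration file error"),
    ("Communications link failure", "Fault 4: database connection failure"),
    ("Unknown host", "Fault 4: database connection failure"),
    ("executable file not found", "Fault 5: image startup command error"),
    ("permission denied", "Fault 5: image startup command error"),
    ("DiskPressure", "Fault 6: node disk space exhaustion"),
]


def classify(text: str) -> list[str]:
    """Return faults whose signatures appear in the text, in table order, without duplicates."""
    found = []
    lowered = text.lower()
    for needle, fault in SIGNATURES:
        if needle.lower() in lowered and fault not in found:
            found.append(fault)
    return found
```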
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Full-Stack DevOps & Kubernetes
Focused on sharing DevOps, Kubernetes, Linux, Docker, Istio, microservices, Spring Cloud, Python, Go, databases, Nginx, Tomcat, cloud computing, and related technologies.
