
How to Fix K8s CrashLoopBackOff: Probe Failures & Resources

Quick Fix Summary

TL;DR

Check pod logs, verify liveness/readiness probe endpoints, and increase resource limits to resolve immediate container crashes.

CrashLoopBackOff indicates a pod's container is repeatedly crashing and Kubernetes is backing off restart attempts. This is most commonly caused by failed health probes or insufficient compute resources.
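
To see which pods are affected, filter the pod list on its status column. The commands below are standard kubectl usage; the pod name and namespace are placeholders.

```bash
# List pods across all namespaces and filter for CrashLoopBackOff
kubectl get pods --all-namespaces | grep -i crashloopbackoff

# Watch the restart count of a specific pod climb over time
kubectl get pod <pod-name> -n <namespace> --watch
```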

Diagnosis & Causes

  • Liveness or readiness probe failing.
  • Insufficient memory (OOMKilled).
  • Application startup time exceeds initialDelaySeconds.
  • Missing dependencies or configuration.
  • Application crashes due to runtime errors.

Recovery Steps

Step 1: Diagnose with kubectl logs and describe

First, gather immediate diagnostic data to see *why* the container is exiting. Check the logs of the most recent crash and inspect the pod's events.

```bash
# Get logs from the last container instance
kubectl logs <pod-name> --previous

# Get detailed pod status and recent events
kubectl describe pod <pod-name>
```
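
If the events are noisy, the container's last termination state usually states directly whether the exit was an application error (non-zero exit code) or an OOM kill. A minimal JSONPath query along these lines can pull it out; the pod name is a placeholder.

```bash
# Print each container's last termination reason and exit code
kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{" (exit "}{.lastState.terminated.exitCode}{")"}{"\n"}{end}'
```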

Step 2: Fix Health Probe Failures

If `describe` shows probe failures, verify your probe configuration matches the application's reality. Adjust timeouts, delays, or endpoints.

```yaml
# Example: Adjust a liveness probe for a slow-starting app
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 45  # Increase if app boots slowly
  periodSeconds: 10
  failureThreshold: 3
```
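
Before relaxing probe settings, it helps to confirm the endpoint actually responds from inside the container. The sketch below assumes the `/health` path and port 8080 from the example above, that the image ships `wget`, and that the container stays up long enough to exec into.

```bash
# Call the health endpoint from inside the running container
kubectl exec <pod-name> -c <container-name> -- wget -qO- http://localhost:8080/health

# Review recent events (including probe failures) recorded for the pod
kubectl get events --field-selector involvedObject.name=<pod-name>
```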

Step 3: Resolve Resource Constraints (OOMKilled)

If the pod's last state shows `OOMKilled` (visible under `Last State: Terminated` in the `kubectl describe pod` output), the container exceeded its memory limit. Increase the memory limit or reduce the application's memory footprint.

```yaml
# Example: Increase memory limits and requests in the pod spec
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"  # Increase this value
    cpu: "500m"
```

Step 4: Debug with an Ephemeral Container or Interactive Shell

For complex issues, run a debug container in the pod's namespace to inspect the filesystem and network or to run commands interactively.

```bash
# Run a busybox debug container in the problematic pod's namespace
kubectl debug -it <pod-name> --image=busybox:latest --target=<container-name>

# Once inside the debug shell, you can inspect, e.g.:
# wget -O- http://localhost:<port>/health
# cat /etc/config/application.properties
```
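
If the container exits too quickly to attach to, another option is to debug a copy of the pod with its command overridden, so you get a stable shell in the same image and configuration. The copy's name below is a placeholder; delete it when you are done.

```bash
# Create a copy of the pod with the container's command replaced by a shell
kubectl debug <pod-name> -it --copy-to=<debug-pod-name> --container=<container-name> -- sh

# Remove the debug copy afterwards
kubectl delete pod <debug-pod-name>
```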

Architect's Pro Tip

"Use `kubectl get events --sort-by='.lastTimestamp' -A` to see cluster-wide events. A 'FailedScheduling' event due to insufficient CPU or memory on nodes often accompanies the same resource pressure that shows up as CrashLoopBackOff."

Frequently Asked Questions

What's the difference between CrashLoopBackOff and ImagePullBackOff?

CrashLoopBackOff means the container image was pulled successfully but the application inside it keeps crashing. ImagePullBackOff means Kubernetes cannot even pull the container image from the registry.

How long does Kubernetes wait between restart attempts in a CrashLoopBackOff?

The backoff delay increases exponentially (10s, 20s, 40s...) up to a cap of 5 minutes. This is to prevent a crashing pod from consuming excessive resources.

Can a misconfigured readiness probe cause CrashLoopBackOff?

No. A failed readiness probe will not restart the container. It only removes the pod from Service endpoints. Only a failed *liveness* probe will cause a container restart, potentially leading to CrashLoopBackOff.
