
How to Fix K8s CrashLoopBackOff: Probe Failures & Resources

Quick Fix Summary

TL;DR

Check pod logs, verify liveness/readiness probe endpoints, and increase resource limits to resolve immediate container crashes.

CrashLoopBackOff indicates a pod's container is repeatedly crashing and Kubernetes is backing off restart attempts. This is most commonly caused by failed health probes or insufficient compute resources.
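
To see which pods are affected, filter the pod list on its status column. The commands below are standard kubectl usage; the pod name and namespace are placeholders.

```bash
# List pods across all namespaces and filter for CrashLoopBackOff
kubectl get pods --all-namespaces | grep -i crashloopbackoff

# Watch the restart count of a specific pod climb over time
kubectl get pod <pod-name> -n <namespace> --watch
```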

Diagnosis & Causes

  • Liveness or readiness probe failing.
  • Insufficient memory (OOMKilled).
  • Application startup time exceeds initialDelaySeconds.
  • Missing dependencies or configuration.
  • Application crashes due to runtime errors.

Recovery Steps

Step 1: Diagnose with kubectl logs and describe

First, gather immediate diagnostic data to see *why* the container is exiting. Check the logs of the most recent crash and inspect the pod's events.

```bash
# Get logs from the last container instance
kubectl logs <pod-name> --previous

# Get detailed pod status and recent events
kubectl describe pod <pod-name>
```
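
If the events are noisy, the container's last termination state usually states directly whether the exit was an application error (non-zero exit code) or an OOM kill. A minimal JSONPath query along these lines can pull it out; the pod name is a placeholder.

```bash
# Print each container's last termination reason and exit code
kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{" (exit "}{.lastState.terminated.exitCode}{")"}{"\n"}{end}'
```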

Step 2: Fix Health Probe Failures

If `describe` shows probe failures, verify your probe configuration matches the application's reality. Adjust timeouts, delays, or endpoints.

```yaml
# Example: Adjust a liveness probe for a slow-starting app
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 45  # Increase if app boots slowly
  periodSeconds: 10
  failureThreshold: 3
```
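
Before relaxing probe settings, it helps to confirm the endpoint actually responds from inside the container. The sketch below assumes the `/health` path and port 8080 from the example above, that the image ships `wget`, and that the container stays up long enough to exec into.

```bash
# Call the health endpoint from inside the running container
kubectl exec <pod-name> -c <container-name> -- wget -qO- http://localhost:8080/health

# Review recent events (including probe failures) recorded for the pod
kubectl get events --field-selector involvedObject.name=<pod-name>
```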

Step 3: Resolve Resource Constraints (OOMKilled)

If the pod's last state shows `OOMKilled` (visible under `Last State: Terminated` in the `kubectl describe pod` output), the container exceeded its memory limit. Increase the memory limit or reduce the application's memory footprint.

```yaml
# Example: Increase memory limits and requests in the pod spec
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"  # Increase this value
    cpu: "500m"
```

Step 4: Debug with an Ephemeral Container or Interactive Shell

For complex issues, run a debug container in the pod's namespace to inspect the filesystem and network or to run commands interactively.

```bash
# Run a busybox debug container in the problematic pod's namespace
kubectl debug -it <pod-name> --image=busybox:latest --target=<container-name>

# Once inside the debug shell, you can inspect, e.g.:
# wget -O- http://localhost:<port>/health
# cat /etc/config/application.properties
```
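
If the container exits too quickly to attach to, another option is to debug a copy of the pod with its command overridden, so you get a stable shell in the same image and configuration. The copy's name below is a placeholder; delete it when you are done.

```bash
# Create a copy of the pod with the container's command replaced by a shell
kubectl debug <pod-name> -it --copy-to=<debug-pod-name> --container=<container-name> -- sh

# Remove the debug copy afterwards
kubectl delete pod <debug-pod-name>
```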

Architect's Pro Tip

"Use `kubectl get events --sort-by='.lastTimestamp' -A` to see cluster-wide events. A 'FailedScheduling' event due to insufficient CPU or memory on nodes often accompanies the same resource pressure that shows up as CrashLoopBackOff."

Frequently Asked Questions

What's the difference between CrashLoopBackOff and ImagePullBackOff?

CrashLoopBackOff means the container image was pulled successfully but the application inside it keeps crashing. ImagePullBackOff means Kubernetes cannot even pull the container image from the registry.

How long does Kubernetes wait between restart attempts in a CrashLoopBackOff?

The backoff delay increases exponentially (10s, 20s, 40s...) up to a cap of 5 minutes. This is to prevent a crashing pod from consuming excessive resources.

Can a misconfigured readiness probe cause CrashLoopBackOff?

No. A failed readiness probe will not restart the container. It only removes the pod from Service endpoints. Only a failed *liveness* probe will cause a container restart, potentially leading to CrashLoopBackOff.
