Solved: Azure Kubernetes Service (AKS) Pod CrashLoopBackOff in Kubernetes 1.31
Quick Fix Summary
TL;DR: Check pod logs, verify resource limits, and ensure application health checks (liveness/readiness probes) are correctly configured.
CrashLoopBackOff indicates a pod is repeatedly crashing and restarting, failing to reach a stable running state. In Kubernetes 1.31, this often stems from application errors, misconfigured probes, or insufficient resources.
Diagnosis & Causes
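When a pod is stuck in this state, `kubectl get pods` shows a growing restart count and an increasing back-off delay between attempts. The listing below is purely illustrative; the pod name, restart count, and age will differ in your cluster:
# Illustrative output - names and timings are placeholders
kubectl get pods -n <namespace>
NAME                      READY   STATUS             RESTARTS      AGE
my-app-7c9d8b6f5d-x2k4q   0/1     CrashLoopBackOff   6 (25s ago)   8m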
Recovery Steps
Step 1: Immediate Diagnostics - Inspect Pod State & Logs
First, gather the pod's status and the logs from its most recent crash to identify the immediate failure.
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
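If the previous logs are empty or inconclusive, the pod's events usually record why each restart happened (failed probes, OOM kills, back-off messages). A small supplementary check, using the same placeholder pod name as above:
# List events for the pod, most recent last
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp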
Step 2: Analyze Container Exit Codes & Resource Limits
Check whether the container was terminated by the system (OOMKilled) and review its requested and allocated resources.
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 -B 5 "Last State\|Terminated\|OOM\|Limits"
kubectl top pod <pod-name> -n <namespace> --containers
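If the Last State shows OOMKilled or exit code 137, raise the memory limit (or fix the leak) rather than only restarting the pod. A minimal sketch of a container resources block; the values are placeholders to be sized from the `kubectl top` output above:
# Example container resources block - values are placeholders
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 512Mi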
Step 3: Validate & Debug Liveness/Readiness Probes
Misconfigured probes are a common cause. Test the probe endpoint manually and adjust timeouts and periods as needed.
# Test the liveness probe endpoint from within the cluster
kubectl exec -it <a-healthy-pod> -n <namespace> -- curl -v http://<pod-ip>:<port>/healthz
# Example probe adjustment in deployment YAML
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
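If the application is healthy but simply slow to start, a startupProbe keeps the liveness probe from firing until the first successful check. A minimal sketch assuming the same /healthz endpoint and port as the liveness probe above:
# Example startup probe for slow-starting applications
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10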
Step 4: Debug Application Startup Locally & Check Dependencies
Simulate the container's entry point locally to catch early runtime errors such as missing files or environment variables.
# Get the pod's image and command
kubectl describe pod <pod-name> -n <namespace> | grep -i "image\|command\|args"
# Attach an ephemeral debug container that shares the target container's process namespace
kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name> -- sh
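To reproduce the startup sequence outside the cluster, you can also run the same image locally and override the entry point. This assumes Docker (or a compatible runtime) is available and that you can pull the image, for example from the Azure Container Registry backing the deployment:
# Open a shell instead of the normal entry point (requires the image to ship a shell),
# then run the original command by hand to see the first error it prints
docker run --rm -it --entrypoint sh <registry>/<image>:<tag>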
Step 5: Apply Fix & Perform a Controlled Rollout
After identifying the root cause, update your deployment manifest and roll it out gradually so a bad fix cannot take down every replica at once.
# Apply the fix to your deployment manifest
kubectl apply -f deployment-fix.yaml
# Monitor the new rollout closely
kubectl rollout status deployment/<deployment-name> -n <namespace>
# Rollback immediately if the new pods crash
kubectl rollout undo deployment/<deployment-name> -n <namespace>
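It also helps to watch the replacement pods directly and keep the revision history at hand for a targeted rollback; the label selector below is an assumption, so substitute whatever labels your deployment actually uses:
# Watch new pods as the rollout progresses (label is a placeholder)
kubectl get pods -n <namespace> -l app=<app-label> -w
# List previous revisions in case you need to roll back to a specific one
kubectl rollout history deployment/<deployment-name> -n <namespace>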
"For .NET or Java apps on AKS, a CrashLoopBackOff with exit code 139 (SIGSEGV) often indicates a memory issue. Enable Docker's `--oom-kill-disable` for debugging, but always set correct memory limits."
Frequently Asked Questions
What's the difference between CrashLoopBackOff and ImagePullBackOff?
CrashLoopBackOff means the container started but then crashed. ImagePullBackOff means Kubernetes cannot retrieve the container image from the registry (e.g., auth error, wrong tag).
How do I prevent CrashLoopBackOff during AKS upgrades to 1.31?
Test your application's compatibility in a non-production cluster first. Pay special attention to any deprecated APIs or changes in security context that may affect your pods.
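If you can query the API server's metrics endpoint, you can also check for recent use of deprecated APIs before the upgrade. This is a rough sketch and assumes your account is allowed to read the raw metrics:
# Each reported series names a deprecated group/version that a client has recently called
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis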
My pod logs show nothing before the crash. What next?
The application may be crashing before it can write anything to stdout. Use `kubectl debug` to attach an ephemeral debug container, or set `terminationMessagePolicy: FallbackToLogsOnError` on the container and check the last termination message shown by `kubectl describe pod`, as in the sketch below.
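A minimal sketch of that container-spec change; the container name and image are placeholders:
# Example container spec with the termination message fallback enabled
containers:
  - name: <container-name>
    image: <registry>/<image>:<tag>
    terminationMessagePolicy: FallbackToLogsOnError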