Kubernetes Troubleshooting Guide: Diagnosing Pod Scheduling Failures
Quick Fix Summary
TL;DR: Check node resources, taints, and affinity rules using `kubectl describe pod` and `kubectl describe node`.
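FailedScheduling is the event the Kubernetes scheduler records when it cannot find a node that satisfies a pod's requirements. The pod object exists but stays in the Pending phase until a suitable node becomes available, so you need to investigate both node conditions and the pod's requirements.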
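Diagnosis & Causes
The most common causes are insufficient allocatable CPU or memory, node taints the pod does not tolerate, nodeSelector or affinity rules that no node satisfies, and PersistentVolumeClaims that cannot be bound or whose volumes are only reachable from other nodes.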
Recovery Steps
Step 1: Get Detailed Scheduling Failure Reason
Run kubectl describe pod and read the Events section. The scheduler's FailedScheduling message states how many nodes were considered and why each group of nodes was rejected.
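On larger clusters the FailedScheduling message becomes one long line, for example "0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient cpu." (illustrative wording). A small POSIX shell helper can split it into one reason per line. This is a sketch: the function name is hypothetical, and it assumes the comma-separated wording used by recent Kubernetes releases, which can change between versions.

```shell
# Hypothetical helper: print one FailedScheduling reason per line.
# Reads text on stdin (e.g. `kubectl describe pod` output) and assumes the
# "0/N nodes are available: <reason>, <reason>." message format.
failed_scheduling_reasons() {
  grep 'nodes are available:' |
    head -n 1 |
    # 1) drop the trailing "preemption: ..." summary added by newer schedulers,
    # 2) keep only the text after "nodes are available: ",
    # 3) remove the final period.
    sed -e 's/ preemption: .*$//' \
        -e 's/^.*nodes are available: //' \
        -e 's/\.[[:space:]]*$//' |
    # Reasons are separated by ", "; print each on its own line.
    awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'
}
```

Pipe `kubectl describe pod <pod-name> -n <namespace>` into `failed_scheduling_reasons`. Only the first matching message is parsed, and the preemption summary is discarded because it describes eviction attempts rather than why the nodes were filtered out.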
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'
Step 2: Check Node Resource Availability
Examine each node's allocatable resources and the CPU and memory already requested by pods on it. `kubectl describe nodes` shows both, under Allocatable and Allocated resources.
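The scheduler's fit check works on requests, not live usage: a pod fits a node only if the requests of the pods already there plus the new pod's request stay within the node's allocatable amount. Comparing those figures by hand means normalizing Kubernetes quantity suffixes first. The POSIX shell and awk sketch below converts CPU to millicores and memory to bytes, then applies the comparison. The function names are hypothetical, only common suffixes are handled (not exponent forms such as 1e3), and the real NodeResourcesFit plugin also checks other resources such as the pod count and ephemeral storage.

```shell
# Hypothetical helpers for comparing Kubernetes resource quantities.

# CPU: "250m" -> 250, "2" -> 2000, "0.5" -> 500 (millicores).
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;  # round
  esac
}

# Memory: binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes -> bytes.
mem_to_bytes() {
  awk -v q="$1" 'BEGIN {
    n = q + 0                      # leading numeric part
    s = q; sub(/^[0-9.]+/, "", s)  # remaining suffix
    f[""] = 1
    f["Ki"] = 1024; f["Mi"] = 1024^2; f["Gi"] = 1024^3; f["Ti"] = 1024^4
    f["k"] = 1000;  f["M"] = 1000^2;  f["G"] = 1000^3;  f["T"] = 1000^4
    if (!(s in f)) { print "unsupported suffix: " s > "/dev/stderr"; exit 1 }
    printf "%.0f\n", n * f[s]
  }'
}

# fits ALLOCATABLE ALREADY_REQUESTED NEW_REQUEST (all in the same unit):
# succeeds when the sum of requests stays within the allocatable amount.
fits() {
  [ $(( $2 + $3 )) -le "$1" ]
}
```

For example, a node with 4 CPUs allocatable and 3500m already requested cannot accept a pod requesting 600m, however idle that node looks in monitoring.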
kubectl describe nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU_ALLOCATABLE:.status.allocatable.cpu,MEM_ALLOCATABLE:.status.allocatable.memory,CPU_CAPACITY:.status.capacity.cpu,MEM_CAPACITY:.status.capacity.memory'
Step 3: Inspect Node Taints and Pod Tolerations
Compare node taints against pod tolerations to identify mismatches.
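The matching rules are easy to misread. A toleration matches a taint when the keys are equal (an empty key with operator Exists matches every taint), the effects are equal (an empty effect matches every effect), and either the operator is Exists or, under the default Equal operator, the values are identical. Only NoSchedule and NoExecute taints block scheduling; PreferNoSchedule is a soft preference. The hypothetical shell function below encodes those rules so you can check one taint/toleration pair at a time; it is a sketch, not the scheduler's own code.

```shell
# Hypothetical helper mirroring the documented taint/toleration matching rules.
# Usage: toleration_matches TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT \
#                           TAINT_KEY TAINT_VALUE TAINT_EFFECT
# Pass "" for fields left unset in the pod spec.
toleration_matches() {
  tol_key=$1 tol_op=${2:-Equal} tol_value=$3 tol_effect=$4
  taint_key=$5 taint_value=$6 taint_effect=$7

  # An empty toleration effect matches every taint effect.
  if [ -n "$tol_effect" ] && [ "$tol_effect" != "$taint_effect" ]; then
    return 1
  fi
  # An empty key is only valid with Exists, and then matches every key.
  if [ -z "$tol_key" ]; then
    [ "$tol_op" = Exists ]
    return
  fi
  [ "$tol_key" = "$taint_key" ] || return 1
  case "$tol_op" in
    Exists) return 0 ;;                          # value is ignored
    Equal)  [ "$tol_value" = "$taint_value" ] ;; # values must be identical
    *)      return 1 ;;                          # unknown operator
  esac
}
```

For instance, a toleration with key dedicated, value gpu, and effect NoExecute does not tolerate the taint dedicated=gpu:NoSchedule, because the effects differ.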
kubectl describe node <node-name> | grep -A 10 Taints
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 Tolerations
Step 4: Verify Node Selectors and Affinity Rules
Check whether the pod's nodeSelector or nodeAffinity rules match the labels of at least one available node.
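A nodeSelector matches a node only if every key=value pair in it appears among the node's labels, since the pairs are combined with AND. The hypothetical helper below applies that rule to the comma-separated label list printed by kubectl get nodes --show-labels. Required nodeAffinity is more expressive (operators such as In, NotIn, and Exists; terms ORed together, expressions within a term ANDed) and is not covered by this sketch.

```shell
# Hypothetical helper: does a node's label list satisfy a nodeSelector?
# Usage: selector_matches NODE_LABELS SELECTOR
# Both arguments are comma-separated key=value lists, e.g. the LABELS column
# of `kubectl get nodes --show-labels` and "disktype=ssd,kubernetes.io/os=linux".
selector_matches() {
  labels=",$1,"          # surround with commas so each pair matches whole
  old_ifs=$IFS
  IFS=,
  for pair in $2; do     # an empty selector matches every node
    case "$labels" in
      *",$pair,"*) ;;    # pair present, keep checking
      *) IFS=$old_ifs; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}
```

In day-to-day use, `kubectl get nodes -l 'disktype=ssd,kubernetes.io/os=linux'` asks the API server the same question directly (disktype=ssd is only an example label); an empty result means no node satisfies the selector.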
kubectl get nodes --show-labels
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.affinity}{"\n"}'
Step 5: Check Persistent Volume Constraints
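Verify that the pod's PersistentVolumeClaims are bound or can be bound with compatible access modes, and that their volumes can be attached on a node the pod is allowed to use (zonal and local volumes restrict which nodes can mount them).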
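For statically provisioned volumes, a PVC can bind to a PV only when, among other checks, the storage class names match, the PV offers every access mode the PVC requests, and the PV's capacity is at least the requested size. The hypothetical helper below encodes just those three checks, using the short access-mode names shown by kubectl get pv (RWO, ROX, RWX, RWOP) and sizes already converted to a common unit. It is a simplified sketch: the real binder also considers label selectors, volume modes, and PV node affinity.

```shell
# Hypothetical helper: could this PV satisfy this PVC?
# Usage: pv_can_bind PV_CLASS PV_MODES PV_SIZE PVC_CLASS PVC_MODES PVC_SIZE
# Modes are comma-separated short names (e.g. "RWO,ROX"); sizes are integers
# in the same unit (e.g. GiB).
pv_can_bind() {
  pv_class=$1 pv_modes=",$2," pv_size=$3
  pvc_class=$4 pvc_modes=$5 pvc_size=$6

  # Storage class names must be identical.
  [ "$pv_class" = "$pvc_class" ] || return 1
  # The PV must be at least as large as the request.
  [ "$pv_size" -ge "$pvc_size" ] || return 1
  # Every requested access mode must be offered by the PV.
  old_ifs=$IFS
  IFS=,
  for mode in $pvc_modes; do
    case "$pv_modes" in
      *",$mode,"*) ;;
      *) IFS=$old_ifs; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}
```

If no PV passes these checks and no provisioner creates one, a PVC using Immediate binding stays Pending, and pods that reference it cannot be scheduled.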
kubectl get pv
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
kubectl describe storageclass
Step 6: Examine Scheduler Logs for Advanced Debugging
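On clusters where the scheduler runs as a static pod in kube-system (for example, kubeadm-based clusters), its logs can add detail about scheduling decisions. Managed services such as EKS, GKE, and AKS generally do not expose the scheduler pod; use the provider's control-plane logging instead.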
kubectl logs -n kube-system -l component=kube-scheduler --tail=100
kubectl logs -n kube-system -l component=kube-scheduler --tail=100 | grep -i "<pod-name>"
Step 7: Test Scheduling with a Minimal Pod
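Start a minimal test pod to check whether the scheduler can place any pod at all. Note that kubectl debug node/<node-name> runs a debugging pod directly on the named node, bypassing the scheduler, so use it to inspect the node itself rather than to test scheduling.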
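To test one constraint rather than the cluster as a whole, give the test pod the same nodeSelector or CPU request as the failing pod. The hypothetical function below prints such a manifest for kubectl apply; the registry.k8s.io/pause:3.9 image is an assumption chosen because it is small, and any image your nodes can pull works.

```shell
# Hypothetical helper: print a minimal Pod manifest for scheduling tests.
# Usage: make_test_pod NAME [KEY=VALUE node selector] [CPU request, e.g. 500m]
make_test_pod() {
  name=$1 selector=$2 cpu=$3
  printf 'apiVersion: v1\nkind: Pod\nmetadata:\n  name: %s\nspec:\n' "$name"
  if [ -n "$selector" ]; then
    # Split "key=value" at the first "=".
    printf '  nodeSelector:\n    %s: "%s"\n' "${selector%%=*}" "${selector#*=}"
  fi
  # pause is a tiny image that just sleeps; swap in any image you prefer.
  printf '  containers:\n  - name: pause\n    image: registry.k8s.io/pause:3.9\n'
  if [ -n "$cpu" ]; then
    printf '    resources:\n      requests:\n        cpu: "%s"\n' "$cpu"
  fi
}
```

For example, `make_test_pod sched-test disktype=ssd 500m | kubectl apply -n <namespace> -f -` creates the pod; if it stays Pending, `kubectl describe pod sched-test -n <namespace>` shows which constraint failed. Delete the pod when you are done.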
kubectl debug node/<node-name> -it --image=busybox
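kubectl run test-pod --image=nginx -n <namespace>
kubectl get pod test-pod -n <namespace> -o wide
kubectl delete pod test-pod -n <namespace>
Architect's Pro Tip
"FailedScheduling with 'pod has unbound immediate PersistentVolumeClaims' means the pod references a PVC that uses Immediate volume binding (the default volumeBindingMode) and is not yet bound, typically because no PV matches it or dynamic provisioning failed. Check `kubectl describe pvc` for provisioning errors. For zonal or local storage, a StorageClass with volumeBindingMode: WaitForFirstConsumer delays binding and provisioning until the pod is scheduled, so the volume is created where the pod can run."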
Frequently Asked Questions
What's the difference between FailedScheduling and Pending status?
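Pending is a pod phase: the cluster has accepted the pod, but at least one of its containers is not yet running. That covers pods waiting for a node as well as scheduled pods that are still pulling images. FailedScheduling is an event the scheduler records when it cannot find a suitable node; such pods stay Pending, and their PodScheduled condition is False with reason Unschedulable.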
How do I temporarily force a pod to schedule on a specific node?
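Set spec.nodeName in the pod manifest and recreate the pod; an existing pod's nodeName cannot be changed with kubectl patch. This bypasses the scheduler entirely (the kubelet still rejects the pod if the node lacks resources), so use it only for debugging. A nodeSelector on the kubernetes.io/hostname label also pins the pod to one node while keeping the scheduler's checks in place.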
Can FailedScheduling be caused by pod security policies?
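Not directly. Admission controllers, including PodSecurityPolicy (removed in Kubernetes 1.25) and its replacement, Pod Security Admission, run in the API server and reject a non-compliant pod before it is created, so the scheduler never sees it. For pods managed by a Deployment or another controller, look for FailedCreate events on the owning ReplicaSet (`kubectl describe rs <replicaset-name> -n <namespace>`) rather than FailedScheduling events on the pod.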