Kubernetes Troubleshooting Guide: Diagnosing Pod Scheduling Failures

Quick Fix Summary

TL;DR

Check node resources, taints, and affinity rules using `kubectl describe pod` and `kubectl describe node`.

FailedScheduling occurs when the Kubernetes scheduler cannot find a suitable node on which to place a pod. The pod stays in Pending until scheduling succeeds, so resolving it requires investigating both node conditions and the pod's own requirements.

Diagnosis & Causes

  • Insufficient CPU or memory resources on all nodes
  • Node taints that pod tolerations don't match
  • Pod node affinity/anti-affinity rules too restrictive
  • No nodes with requested persistent volume access modes
  • Node selector labels don't match any available nodes
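
Before working through the recovery steps, a quick triage pass over these causes is to list the Pending pods and the scheduler events that explain them. A minimal sketch, using the same `<namespace>` placeholder as the rest of this guide:

```bash
# List pods the scheduler has not placed yet
kubectl get pods -n <namespace> --field-selector=status.phase=Pending

# Show the scheduler's stated reason for each failure
kubectl get events -n <namespace> --field-selector=reason=FailedScheduling --sort-by='.lastTimestamp'
```
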
Recovery Steps

Step 1: Get Detailed Scheduling Failure Reason

Use `kubectl describe` to see the exact scheduling failure message from the scheduler.

```bash
kubectl describe pod <pod-name> -n <namespace>
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'
```
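
To pull out only the scheduler's message, the sketch below filters events by reason. A typical message looks something like `0/3 nodes are available: 3 Insufficient cpu.`, though the exact wording and counts vary by cluster and Kubernetes version.

```bash
# Print just the FailedScheduling messages recorded for the pod
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling \
  -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
```
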
Step 2: Check Node Resource Availability

Examine node capacity and allocatable resources to identify resource constraints.

```bash
kubectl describe nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU_ALLOCATABLE:.status.allocatable.cpu,MEM_ALLOCATABLE:.status.allocatable.memory,CPU_CAPACITY:.status.capacity.cpu,MEM_CAPACITY:.status.capacity.memory'
```
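
Allocatable capacity alone does not show how much is already committed. The scheduler compares a pod's resource requests against what remains unrequested on each node, so it also helps to look at the "Allocated resources" section of the node description:

```bash
# Requests and limits already committed on each node;
# compare the request totals against the allocatable values above
kubectl describe nodes | grep -A 8 "Allocated resources"
```

Keep in mind the scheduler works from requests, not live usage, so a node that looks idle in monitoring can still reject pods.
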
Step 3: Inspect Node Taints and Pod Tolerations

Compare node taints against pod tolerations to identify mismatches.

```bash
kubectl describe node <node-name> | grep -A 10 Taints
kubectl describe pod <pod-name> | grep -A 10 Tolerations
```
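
If a taint is the blocker, either remove it from the node or give the pod a matching toleration. The sketch below uses a hypothetical `dedicated=gpu:NoSchedule` taint purely for illustration:

```bash
# Option A: remove the taint from the node (the trailing "-" deletes it)
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-

# Option B: add a matching toleration to the pod spec instead:
#   tolerations:
#   - key: "dedicated"
#     operator: "Equal"
#     value: "gpu"
#     effect: "NoSchedule"
```
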
Step 4: Verify Node Selectors and Affinity Rules

Check whether the pod's nodeSelector or nodeAffinity rules match any available node.

```bash
kubectl get nodes --show-labels
kubectl describe pod <pod-name> | grep -A 20 -B 5 'Node-Selectors\|Affinity'
```
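
When no node carries the label a selector demands, either relax the selector or label a suitable node. `disktype=ssd` below is a hypothetical key/value standing in for whatever label the pod expects:

```bash
# Label a node so it satisfies the pod's nodeSelector (hypothetical label)
kubectl label nodes <node-name> disktype=ssd

# Confirm which nodes now match
kubectl get nodes -l disktype=ssd
```
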
Step 5: Check Persistent Volume Constraints

Verify persistent volume access modes and node availability for volume mounting.

```bash
kubectl get pv
kubectl get pvc -n <namespace>
kubectl describe storageclass
```
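
If a claim shows as Pending, its own events usually say why nothing has bound to it:

```bash
# The Events section at the bottom explains why the claim is still unbound
kubectl describe pvc <pvc-name> -n <namespace>
```
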
Step 6: Examine Scheduler Logs for Advanced Debugging

Access scheduler pod logs for detailed scheduling decision information.

```bash
kubectl logs -n kube-system -l component=kube-scheduler --tail=100
kubectl logs -n kube-system -l component=kube-scheduler --tail=100 | grep -i "<pod-name>"
```
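
The `component=kube-scheduler` label is typical of kubeadm-style clusters; managed distributions may label the scheduler differently or not expose it at all. If the selector returns nothing, look the pod up by name first:

```bash
# Locate the scheduler pod(s) when the label selector matches nothing
kubectl get pods -n kube-system | grep -i scheduler
kubectl logs -n kube-system <scheduler-pod-name> --tail=100
```
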
Step 7: Use kubectl debug to Simulate Scheduling

Create an ephemeral debugging pod on a specific node, or launch a plain test pod, to exercise scheduling constraints directly.

```bash
kubectl debug node/<node-name> -it --image=busybox
kubectl run test-pod --image=nginx --dry-run=client -o yaml | kubectl apply -f -
```
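
To isolate a single suspect constraint, you can also apply a minimal throwaway pod that carries only that constraint and see whether it reproduces the same FailedScheduling event. The pod name, the `disktype: ssd` selector, and the pause image below are all arbitrary placeholders for this sketch:

```bash
# Minimal test pod carrying one suspected constraint; adjust to your case
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sched-test
spec:
  nodeSelector:
    disktype: ssd        # hypothetical label; substitute the selector under test
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
EOF

# Watch whether it schedules, then clean up
kubectl get pod sched-test -o wide
kubectl delete pod sched-test
```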

Architect's Pro Tip

"FailedScheduling with 'pod has unbound immediate PersistentVolumeClaims' means the pod references a PVC that uses Immediate volume binding and has not been bound yet, usually because no matching PersistentVolume exists or dynamic provisioning has not finished. Describe the PVC to find out why it is unbound, and only switch the StorageClass to volumeBindingMode: WaitForFirstConsumer if you deliberately want binding delayed until the pod is scheduled."

Frequently Asked Questions

What's the difference between FailedScheduling and Pending status?

Pending means the API server has accepted the pod but its containers are not yet running. FailedScheduling is a specific event recorded while the pod is Pending, indicating the scheduler could not find a suitable node for it.

How do I temporarily force a pod to schedule on a specific node?

Set `spec.nodeName: <target-node>` directly in the pod manifest and recreate the pod; nodeName cannot be patched onto an existing pod. This bypasses the scheduler entirely and should only be used for debugging.
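
A minimal sketch, assuming a throwaway pod named `pinned-test` and the pause image (both arbitrary choices):

```bash
# Debug-only: pin a pod to a node by name, skipping the scheduler
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pinned-test
spec:
  nodeName: <target-node>   # replace with the exact node name
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
EOF
```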

Can FailedScheduling be caused by pod security policies?

Not directly. PodSecurityPolicy (removed in Kubernetes 1.25 in favor of Pod Security Admission) rejected pods at admission time, before they ever reached the scheduler, so violations surface as creation failures, for example FailedCreate events on the owning ReplicaSet, rather than FailedScheduling. Check the controller's events and the kube-apiserver logs for the rejection reason.
