How to Fix Kubernetes Pod Pending State: The Complete Troubleshooting Guide

If you’ve deployed a pod in Kubernetes and it’s stuck in Pending state, you’re not alone. It’s one of the most common issues developers face when working with Kubernetes, and the good news is that most causes are well-documented and fixable once you know where to look.

In this guide, I’ll walk you through how to fix Kubernetes pod pending state issues, starting from the most common causes and working toward edge cases. I’ve spent years debugging Kubernetes clusters in production, and I’ll share the exact diagnostic commands, real error messages, and proven solutions I use every day.

What Does “Pending” State Actually Mean?

Before we fix anything, let’s understand what’s happening. Pending is a pod phase: the API server has accepted the pod, but at least one of its containers isn’t running yet. In most cases that means the Kubernetes scheduler hasn’t been able to assign the pod to a node, so it’s sitting in a queue waiting for a suitable home. The phase also covers the time a scheduled pod spends pulling its images.

This is different from statuses like CrashLoopBackOff or ImagePullBackOff, where the pod has been scheduled to a node but is failing to run. When kubectl shows a plain Pending status, the pod usually hasn’t made it onto a node yet.

How to Diagnose a Pending Pod

Step 1: Check Pod Status and Events

The very first thing I always do is run kubectl describe on the pending pod. This single command gives you about 80% of what you need to diagnose the problem.

kubectl describe pod <pod-name> -n <namespace>

Scroll down to the Events section at the bottom. This is where Kubernetes tells you exactly why the pod can’t be scheduled. Here’s an example of what you might see:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12m   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

That message is your starting point. The scheduler is telling you precisely what resource is lacking.
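Most FailedScheduling messages map fairly directly onto the root causes covered later in this guide. Here’s a rough sketch of that mapping as a shell helper. The function name classify_pending_reason is hypothetical, and the patterns come only from the example messages shown in this guide, so other Kubernetes versions may word them differently.

```shell
# Hypothetical helper: map a FailedScheduling message to a likely cause.
# The patterns are taken from the example messages in this guide.
classify_pending_reason() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "insufficient-resources" ;;
    *"unbound immediate PersistentVolumeClaims"*)
      echo "unbound-pvc" ;;
    *"didn't match Pod's node affinity/selector"*)
      echo "affinity-or-selector" ;;
    *"untolerated taint"*)
      echo "taint" ;;
    *)
      echo "unknown" ;;
  esac
}

classify_pending_reason "0/3 nodes are available: 3 Insufficient cpu."
```

Anything that falls through to "unknown" is a cue to read the full event text rather than guess.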

Step 2: Get a Quick Pod Overview

kubectl get pods -n <namespace> -o wide

This shows you which pods are running, their node assignments, and their current states. If you see a pattern of multiple pods stuck in Pending, it’s likely a cluster-wide resource issue rather than a pod-specific configuration problem.
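To spot that kind of pattern quickly, you can count Pending pods per namespace. This is a minimal sketch: the function name count_pending_by_namespace is hypothetical, and it assumes the default column layout printed by kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: count Pending pods per namespace.
# Reads the table printed by `kubectl get pods --all-namespaces` on stdin
# and assumes STATUS is the fourth column.
count_pending_by_namespace() {
  awk 'NR > 1 && $4 == "Pending" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces | count_pending_by_namespace
```

If one namespace dominates the output, look at that namespace’s configuration first; if the counts are spread evenly, suspect cluster capacity.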


Root Cause #1: Insufficient CPU or Memory Resources

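This is by far the most common reason pods get stuck in Pending. Your pod requests more CPU or memory than any single node has left unreserved. The scheduler compares the request against each node’s allocatable capacity minus what other pods have already requested, not against live usage.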

How to Identify It

The event message will look something like:

Warning  FailedScheduling  3m  default-scheduler  0/5 nodes are available: 5 Insufficient memory.

Or for CPU:

Warning  FailedScheduling  3m  default-scheduler  0/5 nodes are available: 5 Insufficient cpu.

How to Fix It

Option A: Reduce Your Resource Requests

Review your pod’s resource requests. Many developers set requests too high without realizing it. Here’s a typical example of an over-requested deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api-server
        image: nginx:1.27
        resources:
          requests:
            cpu: "4"        # This is 4 full cores — often unnecessary
            memory: "8Gi"   # Very high for a typical web service
          limits:
            cpu: "8"
            memory: "16Gi"

A more reasonable configuration for most web services:

resources:
  requests:
    cpu: "500m"    # Half a core
    memory: "512Mi"
  limits:
    cpu: "1000m"   # One full core
    memory: "1Gi"
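To see how much difference this makes, multiply the per-pod request by the replica count. The sketch below does that arithmetic for the two configurations above. The helper name cpu_to_millicores is hypothetical, and it only handles the two common CPU forms (cores such as "4" or "0.5", and millicores such as "500m").

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Handles only the two common forms: whole or decimal cores ("4", "0.5")
# and millicores ("500m").
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%.0f\n", cores * 1000 }' ;;
  esac
}

# Total CPU reserved by a Deployment = replicas x per-pod request.
replicas=3
before=$(( replicas * $(cpu_to_millicores 4) ))
after=$(( replicas * $(cpu_to_millicores 500m) ))
echo "before: ${before}m reserved, after: ${after}m reserved"
```

With three replicas, the original manifest reserves 12 full cores across the cluster, while the trimmed version reserves 1.5.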

Option B: Add More Nodes or Enable Cluster Autoscaler

If your resource requests are legitimate, you need more capacity. On a managed Kubernetes service (EKS, GKE, AKS) you can enable the cluster autoscaler. Where it runs inside the cluster as a Deployment, as is typical on EKS, you can inspect it directly:

# Check if cluster autoscaler is running
kubectl get deployment cluster-autoscaler -n kube-system

# View autoscaler logs for scaling decisions
kubectl logs -n kube-system deployment/cluster-autoscaler

For GKE specifically, autoscaling is built into node pools:

# Enable autoscaling on an existing node pool
gcloud container clusters update <cluster-name> \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 10 \
    --zone <zone> \
    --node-pool <node-pool-name>

Option C: Check What’s Consuming Resources

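Before adding nodes, check whether existing workloads are hogging resources unnecessarily. Keep in mind that kubectl top reports actual usage, while the scheduler decides based on requests; the Allocated resources section of kubectl describe node <node-name> shows how much of each node is already reserved.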

# See resource usage across all nodes
kubectl top nodes

# See resource usage per pod
kubectl top pods --all-namespaces --sort-by=cpu

Prevention Tip

Always set resource requests and limits on every container. Use LimitRange to enforce defaults at the namespace level:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: "1"
      memory: "1Gi"
    defaultRequest:
      cpu: "200m"
      memory: "256Mi"
    type: Container

Root Cause #2: No Available Nodes (Cluster is Full)

Sometimes the issue isn’t that your single pod is too big — it’s that your cluster simply has no schedulable nodes.

How to Identify It

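Warning  FailedScheduling  45s  default-scheduler  no nodes available to schedule pods

That message means the scheduler sees zero nodes in the cluster. If nodes exist but are all cordoned, you’ll instead see something like 0/3 nodes are available: 3 node(s) were unschedulable.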

How to Fix It

Check your node status first:

kubectl get nodes

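If you see no nodes listed, or every node shows NotReady or SchedulingDisabled, nothing can be scheduled. This happens when:

  • You’ve cordoned and drained all nodes (they still appear in the list, marked SchedulingDisabled)
  • An autoscaler scaled to zero and can’t scale back up
  • Your cloud provider has quota or billing issues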

Uncordon nodes that are marked unschedulable:

# Check for cordoned nodes
kubectl get nodes -o wide

# Uncordon a node
kubectl uncordon <node-name>

Check node conditions for deeper issues:

kubectl describe node <node-name> | grep -A 10 "Conditions:"

Look for MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable reporting True, or a Ready condition that isn’t True.
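On a larger cluster it helps to filter the node list down to the ones that need attention. Here’s a small sketch; the function name nodes_needing_attention is hypothetical, and it assumes the default kubectl get nodes columns (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: list nodes whose STATUS is anything other than "Ready".
# Reads `kubectl get nodes` output on stdin; cordoned nodes appear as
# "Ready,SchedulingDisabled" and unhealthy ones as "NotReady".
nodes_needing_attention() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes | nodes_needing_attention
```

Each node it prints is a candidate for kubectl uncordon or a closer look with kubectl describe node.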


Root Cause #3: Persistent Volume Claims (PVCs) Not Bound

This one catches a lot of people off guard. Your pod might be pending because it’s waiting for a storage volume that doesn’t exist or can’t be provisioned.

How to Identify It

Warning  FailedScheduling  2m  default-scheduler  pod has unbound immediate PersistentVolumeClaims

Check your PVC status:

kubectl get pvc -n <namespace>
NAME          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-volume   Pending                                      standard       5m

A Pending PVC means the storage hasn’t been provisioned yet. Let’s dig deeper:

kubectl describe pvc <pvc-name> -n <namespace>

Common PVC Issues and Fixes

Issue 1: Missing or Misconfigured StorageClass

# List available storage classes
kubectl get storageclass

# Verify your default storage class
kubectl get storageclass -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'

If there’s no default storage class, create one or specify it explicitly in your PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

Issue 2: Requested Size Unavailable

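Some storage provisioners have minimum or maximum size requirements. For example, AWS EBS st1 and sc1 volumes have a minimum size of 125 GiB, so a smaller request against a storage class using those types can fail to provision and leave the PVC pending. The events in kubectl describe pvc will show the provisioner’s error.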

Issue 3: Zone Mismatch in Multi-AZ Clusters

If your pod is constrained to a specific zone but its volume gets provisioned in a different zone, the pod can’t be scheduled next to its storage. Add volumeBindingMode: WaitForFirstConsumer to your storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner is deprecated
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

This tells Kubernetes to wait until the pod is scheduled before provisioning the volume in the correct zone.


Root Cause #4: Node Selector and Affinity Constraints

Node selectors, node affinity, and pod anti-affinity rules can over-constrain where a pod can run. If no node matches your constraints, the pod stays pending.

How to Identify It

Warning  FailedScheduling  30s  default-scheduler  0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.

How to Fix It

Check your pod’s scheduling constraints:

kubectl get pod <pod-name> -n <namespace> -o json | jq '.spec | {nodeSelector, affinity, tolerations}'

Example of an over-constrained pod:

apiVersion: v1
kind: Pod
metadata:
  name: specialized-worker
spec:
  nodeSelector:
    node-type: gpu-node        # Requires nodes labeled "gpu-node"
    zone: us-east-1a           # AND in a specific zone
    instance-type: g5.12xlarge # AND a specific instance type
  containers:
  - name: worker
    image: worker:2.1.0

This pod will only schedule on a node with ALL three labels. Verify what labels your nodes actually have:

kubectl get nodes --show-labels

Fix by relaxing constraints or labeling nodes appropriately:

# Add a label to a node
kubectl label nodes <node-name> node-type=gpu-node
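Since a nodeSelector only matches when every label is present, it can help to check a node’s label list against the full selector in one go. This is a sketch under simple assumptions: the function name node_matches_selector is hypothetical, and both arguments are comma-separated key=value lists in the same form as the LABELS column from kubectl get nodes --show-labels.

```shell
# Hypothetical helper: succeed only if every required key=value pair
# appears in a node's comma-separated label list.
node_matches_selector() {
  required="$1"   # e.g. "node-type=gpu-node,zone=us-east-1a"
  labels=",$2,"   # pad with commas so each label is matched as a whole
  old_ifs="$IFS"
  IFS=','
  for pair in $required; do
    case "$labels" in
      *",$pair,"*) ;;                   # this label is present
      *) IFS="$old_ifs"; return 1 ;;    # one miss fails the whole selector
    esac
  done
  IFS="$old_ifs"
  return 0
}

node_labels="kubernetes.io/arch=amd64,node-type=gpu-node,zone=us-east-1a"
if node_matches_selector "node-type=gpu-node,zone=us-east-1a" "$node_labels"; then
  echo "node satisfies the selector"
fi
```

Against a live cluster, kubectl get nodes -l node-type=gpu-node,zone=us-east-1a does the same filtering for you; an empty result means no node carries all of the labels.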

Use preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution wherever a constraint is a preference rather than a hard requirement:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node-type
            operator: In
            values:
            - gpu-node

Here the amd64 architecture remains a hard requirement, while the GPU node label becomes a strong preference.


Root Cause #5: Taints and Tolerations Mismatch

Nodes can be “tainted” to repel pods. Unless a pod has a matching toleration, it won’t be scheduled on a tainted node.

How to Identify It

Warning  FailedScheduling  1m  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {dedicated: special}.

How to Fix It

Check node taints:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Or get detailed information:

kubectl describe node <node-name> | grep -i taint

Add a toleration to your pod if it should run on the tainted node:

apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special"
    effect: "NoSchedule"
  containers:
  - name: app
    image: app:3.2.1

Remove a taint from a node if it’s no longer needed:

kubectl taint nodes <node-name> dedicated=special:NoSchedule-

The trailing - is what removes the taint. Without it, you’d be adding one.
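The matching rule between a taint and a toleration is easy to get subtly wrong, so here’s a simplified sketch of it. The function name tolerates_taint is hypothetical, it covers only the Equal operator, and it assumes the taint is written in the key=value:effect form used by kubectl taint.

```shell
# Hypothetical helper: does a toleration with operator "Equal" tolerate a taint?
# Assumes the taint is written as key=value:effect.
# A toleration with an empty effect matches any effect.
tolerates_taint() {
  taint="$1"; tol_key="$2"; tol_value="$3"; tol_effect="$4"
  taint_effect="${taint##*:}"     # text after the last ':'
  key_value="${taint%:*}"         # text before the last ':'
  taint_key="${key_value%%=*}"    # text before the first '='
  taint_value="${key_value#*=}"   # text after the first '='
  [ "$taint_key" = "$tol_key" ] || return 1
  [ "$taint_value" = "$tol_value" ] || return 1
  [ -z "$tol_effect" ] || [ "$taint_effect" = "$tol_effect" ]
}

if tolerates_taint "dedicated=special:NoSchedule" dedicated special NoSchedule; then
  echo "toleration matches"
fi
```

The key, value, and effect all have to line up. A toleration for NoExecute, for example, does nothing for a NoSchedule taint with the same key and value.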

Common Taint Scenarios

Kubernetes automatically applies certain taints:

Taint                                 Meaning
node.kubernetes.io/not-ready          Node is not ready (network issues, kubelet problems)
node.kubernetes.io/unreachable        Node controller can’t reach the node
node.kubernetes.io/memory-pressure    Node is running low on memory
node.kubernetes.io/disk-pressure      Node is running low on disk space
node.kubernetes.io/unschedulable      Node is cordoned

Your pods need appropriate tolerations if you expect them to schedule on nodes with these conditions.


Root Cause #6: Resource Quotas Exhausted

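Namespaces can have resource quotas that limit the total amount of CPU, memory, or object counts. When a namespace hits its quota, the API server rejects new pods outright, so strictly speaking you won’t see a Pending pod at all. Instead, your Deployment shows fewer replicas than you asked for, which is easy to mistake for a scheduling problem.

How to Identify It

Because the pod is never created, the error shows up in the events of the owning ReplicaSet (kubectl describe replicaset <replicaset-name> -n <namespace>) rather than on a pod:

Warning  FailedCreate  20s  replicaset-controller  Error creating: pods "api-server-7b89f6d4c-x2k9m" is forbidden: exceeded quota: compute-quota, requested: cpu=500m,memory=1Gi, used: cpu=9800m,memory=19600Mi, limited: cpu=10000m,memory=20Gi

This message is very explicit: it shows what you requested, what’s already used, and what’s allowed. The pod is rejected because used plus requested would exceed the limit.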

How to Fix It

Check your namespace quotas:

kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>
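The quota check itself is simple arithmetic: a new pod is admitted only if used plus requested stays at or below the hard limit. Here’s a sketch with illustrative numbers (9800m used, 500m requested, 10000m limit). The function name fits_quota is hypothetical, and all values are CPU millicores.

```shell
# Hypothetical helper: would a new request fit under a quota?
# All three arguments are CPU values in millicores.
fits_quota() {
  used="$1"; requested="$2"; limit="$3"
  [ $(( used + requested )) -le "$limit" ]
}

if fits_quota 9800 500 10000; then
  echo "fits"
else
  echo "exceeds quota by $(( 9800 + 500 - 10000 ))m"
fi
```

The same check applies to each resource the quota tracks, so a pod that fits on CPU can still be rejected on memory.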

Option A: Increase the Quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"

Apply it:

kubectl apply -f quota.yaml

Option B: Free Up Resources

Clean up unused or over-provisioned resources in the namespace:

# Find pods using the most resources
kubectl top pods -n <namespace> --sort-by=memory

# Delete unused deployments
kubectl get deployments -n <namespace>
kubectl delete deployment <unused-deployment> -n <namespace>

# Check for completed jobs that haven't been cleaned up
kubectl get jobs -n <namespace>

Root Cause #7: Pod Disruption Budgets and Priority Classes

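In more complex setups, priority classes and pod disruption budgets affect which pods get a place on a busy cluster. The scheduler handles higher-priority pods first and can preempt (evict) lower-priority pods to make room, so if your new pod has a low priority, it might get stuck. Pod disruption budgets don’t block scheduling themselves, but the scheduler tries to respect them when choosing which pods to preempt.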

How to Identify It

Check if priority preemption is happening:

kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Conditions:"

Look for scheduling gates or priority-related messages.

How to Fix It

Check priority classes:

kubectl get priorityclasses
kubectl get priorityclasses -o wide

Define a priority class, then reference it explicitly from your pod:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Priority class for critical workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app   # example name
spec:
  priorityClassName: high-priority   # must match the PriorityClass name above
  containers:
  - name: app
    image: app:1.0.0

Root Cause #8: Image Pull Secrets Missing

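A missing image pull secret usually shows up as ErrImagePull or ImagePullBackOff in the STATUS column, but the pod’s underlying phase is still Pending until its containers start. The scheduler doesn’t check image availability; the failure happens on the node when the kubelet tries to pull the image.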

How to Identify and Fix It

kubectl describe pod <pod-name> -n <namespace> | grep -i "secret\|image"

Ensure your image pull secrets are correctly configured:

spec:
  imagePullSecrets:
  - name: my-registry-secret
  containers:
  - name: app
    image: private-registry.io/app:4.0.0

Root Cause #9: Scheduler Not Running

In rare cases, the Kubernetes scheduler itself might not be functioning properly. If the scheduler isn’t running, no pods will be scheduled, and they’ll all stay pending.

How to Identify It

# Check if the scheduler pod is running
kubectl get pods -n kube-system | grep scheduler

# Check scheduler logs
kubectl logs -n kube-system kube-scheduler-<master-node-name>

How to Fix It

On managed Kubernetes services (EKS, GKE, AKS), the control plane is managed for you, so scheduler issues are rare. If you’re self-managing, restart the scheduler:

# For clusters that run control plane components as systemd services
sudo systemctl restart kube-scheduler

Or if the scheduler is running as a static pod (the default on kubeadm clusters):

sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
# Wait 30 seconds
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/

Root Cause #10: Network Policies Blocking Required Connections

Sometimes a pod is technically scheduled but can’t initialize properly because network policies block it from reaching the API server, registry, or other required services.

How to Identify It

This is less
