How to Fix Kubernetes Pod Pending State: The Complete Troubleshooting Guide
If you’ve deployed a pod in Kubernetes and it’s stuck in Pending state, you’re not alone. It’s one of the most common issues developers face when working with Kubernetes, and the good news is that most causes are well-documented and fixable once you know where to look.
In this guide, I’ll walk you through how to fix Kubernetes pod pending state issues, starting from the most common causes and working toward edge cases. I’ve spent years debugging Kubernetes clusters in production, and I’ll share the exact diagnostic commands, real error messages, and proven solutions I use every day.
What Does “Pending” State Actually Mean?
Before we fix anything, let’s understand what’s happening. Pending is a pod phase: the API server has accepted the pod, but at least one of its containers isn’t running yet. In most cases that is because the Kubernetes scheduler hasn’t been able to assign the pod to a node, so it sits in the scheduling queue waiting for a suitable home.
This is different from statuses like CrashLoopBackOff or ImagePullBackOff, which kubectl shows once a pod has been scheduled to a node but its containers fail to start or keep running. (A pod that is still pulling its image technically reports the Pending phase too, but kubectl surfaces the more specific status.) This guide focuses on pods that haven’t made it onto a node yet.
How to Diagnose a Pending Pod
Step 1: Check Pod Status and Events
The very first thing I always do is run kubectl describe on the pending pod. This single command usually tells you most of what you need to diagnose the problem.
kubectl describe pod <pod-name> -n <namespace>
Scroll down to the Events section at the bottom. This is where Kubernetes tells you exactly why the pod can’t be scheduled. Here’s an example of what you might see:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12m   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
That message is your starting point. The scheduler is telling you precisely what resource is lacking.
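When the describe output is long, it can help to pull only the scheduling events for one pod. The sketch below wraps kubectl get events with a field selector. The function name pod_scheduling_events is a hypothetical helper, not part of kubectl, and it assumes the events are still retained by the cluster.

```shell
# pod_scheduling_events POD [NAMESPACE]
# Prints only the FailedScheduling events recorded for one pod,
# sorted by their last timestamp.
# The function name is illustrative; the kubectl flags are standard.
pod_scheduling_events() {
  local pod="$1" ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod},reason=FailedScheduling" \
    --sort-by=.lastTimestamp
}
```

Call it as pod_scheduling_events <pod-name> <namespace>. An empty result usually means the events have expired or the scheduler never attempted the pod.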
Step 2: Get a Quick Pod Overview
kubectl get pods -n <namespace> -o wide
This shows you which pods are running, their node assignments, and their current states. If you see a pattern of multiple pods stuck in Pending, it’s likely a cluster-wide resource issue rather than a pod-specific configuration problem.
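To see whether the problem is isolated or cluster-wide, a quick count of Pending pods per namespace helps. This is a minimal sketch: count_pending_by_namespace is a hypothetical helper that assumes the default column layout of kubectl get pods --all-namespaces --no-headers, where the fourth column is STATUS.

```shell
# count_pending_by_namespace
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin and
# prints "<namespace> <count>" for pods whose STATUS column is Pending.
count_pending_by_namespace() {
  awk '$4 == "Pending" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | count_pending_by_namespace
```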
Root Cause #1: Insufficient CPU or Memory Resources
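This is by far the most common reason pods get stuck in Pending. The pod requests more CPU or memory than any single node has left unallocated. The scheduler compares your requests against each node’s allocatable capacity minus the requests of pods already placed there, not against live usage.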
How to Identify It
The event message will look something like:
Warning FailedScheduling 3m default-scheduler 0/5 nodes are available: 5 Insufficient memory.
Or for CPU:
Warning FailedScheduling 3m default-scheduler 0/5 nodes are available: 5 Insufficient cpu.
How to Fix It
Option A: Reduce Your Resource Requests
Review your pod’s resource requests. Many developers set requests too high without realizing it. Here’s a typical example of an over-requested deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api-server
        image: nginx:1.27
        resources:
          requests:
            cpu: "4"        # This is 4 full cores, often unnecessary
            memory: "8Gi"   # Very high for a typical web service
          limits:
            cpu: "8"
            memory: "16Gi"
A more reasonable configuration for most web services:
resources:
  requests:
    cpu: "500m"     # Half a core
    memory: "512Mi"
  limits:
    cpu: "1000m"    # One full core
    memory: "1Gi"
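To sanity-check a CPU request against a node, compare it with the node’s allocatable CPU minus what is already requested there (both appear in kubectl describe node, under Allocatable and Allocated resources). The sketch below does that arithmetic in millicores. Both function names are hypothetical helpers, and it only handles plain core counts and the m suffix.

```shell
# cpu_to_millicores QUANTITY
# Converts a Kubernetes CPU quantity such as "500m", "2" or "0.5" to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# cpu_request_fits REQUEST ALLOCATABLE ALREADY_REQUESTED
# Exit status 0 when REQUEST fits into ALLOCATABLE minus ALREADY_REQUESTED.
cpu_request_fits() {
  local req alloc used
  req="$(cpu_to_millicores "$1")"
  alloc="$(cpu_to_millicores "$2")"
  used="$(cpu_to_millicores "$3")"
  [ "$req" -le $((alloc - used)) ]
}
```

For example, with made-up numbers, cpu_request_fits 500m 3920m 3000m succeeds, while cpu_request_fits 4 3920m 0 fails because a 4-core request can never fit on a node with 3920m allocatable.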
Option B: Add More Nodes or Enable Cluster Autoscaler
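If your resource requests are legitimate, you need more capacity. On managed Kubernetes services (EKS, GKE, AKS), node autoscaling can add it for you. If you run the Cluster Autoscaler yourself (the usual setup on EKS), check that it is deployed and read its logs. The deployment name and namespace below are the common defaults and may differ in your cluster: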
# Check if cluster autoscaler is running
kubectl get deployment cluster-autoscaler -n kube-system
# View autoscaler logs for scaling decisions
kubectl logs -n kube-system deployment/cluster-autoscaler
For GKE specifically, autoscaling is built into node pools:
# Enable autoscaling on an existing node pool
gcloud container clusters update <cluster-name> \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --zone <zone> \
  --node-pool <node-pool-name>
Option C: Check What’s Consuming Resources
Before adding nodes, check whether existing workloads are using far less than they request (kubectl top requires metrics-server to be installed):
# See resource usage across all nodes
kubectl top nodes
# See resource usage per pod
kubectl top pods --all-namespaces --sort-by=cpu
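Because the scheduler works from requests rather than live usage, it can also help to total the CPU requests per node. This is a minimal sketch, assuming kubectl’s custom-columns output prints multi-container values comma-separated and prints <none> when a field is unset. The function name sum_cpu_requests_by_node is hypothetical.

```shell
# sum_cpu_requests_by_node
# Reads lines of "<node> <cpu>[,<cpu>...]" on stdin and prints the total CPU
# requests per node in millicores. "<none>" values are skipped.
sum_cpu_requests_by_node() {
  awk '{
         n = split($2, parts, ",")
         for (i = 1; i <= n; i++) {
           q = parts[i]
           if (q == "<none>" || q == "") continue
           if (q ~ /m$/) { sub(/m$/, "", q); total[$1] += q }
           else          { total[$1] += q * 1000 }
         }
       }
       END { for (node in total) printf "%s %dm\n", node, total[node] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,CPU:.spec.containers[*].resources.requests.cpu' \
#     | sum_cpu_requests_by_node
```

Pods that have not been assigned a node yet are grouped under <none>, which gives a rough idea of how much capacity the Pending pods are waiting for.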
Prevention Tip
Always set resource requests and limits on every container. Use LimitRange to enforce defaults at the namespace level:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: "1"
      memory: "1Gi"
    defaultRequest:
      cpu: "200m"
      memory: "256Mi"
    type: Container
Root Cause #2: No Available Nodes (Cluster is Full)
Sometimes the issue isn’t that your single pod is too big — it’s that your cluster simply has no schedulable nodes.
How to Identify It
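Warning FailedScheduling 45s default-scheduler no nodes available to schedule pods
This message means the scheduler sees zero nodes in the cluster. If nodes exist but are all cordoned, you’ll see something like 0/3 nodes are available: 3 node(s) were unschedulable instead.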
How to Fix It
Check your node status first:
kubectl get nodes
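If you see no nodes listed, your cluster is empty. This happens when:
- Every node has been removed or has failed to register with the control plane
- An autoscaler scaled to zero and can’t scale back up
- Your cloud provider has quota or billing issues
If nodes are listed but show SchedulingDisabled, they have been cordoned (for example during a drain). Uncordon the ones that should accept pods again: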
# Check for cordoned nodes
kubectl get nodes -o wide
# Uncordon a node
kubectl uncordon <node-name>
Check node conditions for deeper issues:
kubectl describe node <node-name> | grep -A 10 "Conditions:"
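Look for conditions like DiskPressure, MemoryPressure, PIDPressure, or NetworkUnavailable reporting True, or a Ready condition that is False or Unknown. (Older guides mention OutOfDisk, which current Kubernetes versions no longer report.)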
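On a cluster with many nodes, filtering the node list down to the ones that are not plainly Ready saves some scrolling. A minimal sketch, assuming the default column layout of kubectl get nodes --no-headers, where the second column is STATUS. The function name nodes_needing_attention is hypothetical.

```shell
# nodes_needing_attention
# Reads `kubectl get nodes --no-headers` output on stdin and prints the name
# and STATUS of every node whose status is anything other than plain "Ready"
# (for example "NotReady" or "Ready,SchedulingDisabled").
nodes_needing_attention() {
  awk '$2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | nodes_needing_attention
```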
Root Cause #3: Persistent Volume Claims (PVCs) Not Bound
This one catches a lot of people off guard. Your pod might be pending because it’s waiting for a storage volume that doesn’t exist or can’t be provisioned.
How to Identify It
Warning FailedScheduling 2m default-scheduler pod has unbound immediate PersistentVolumeClaims
Check your PVC status:
kubectl get pvc -n <namespace>
NAME          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-volume   Pending                                      standard       5m
A Pending PVC means the storage hasn’t been provisioned yet. Let’s dig deeper:
kubectl describe pvc <pvc-name> -n <namespace>
Common PVC Issues and Fixes
Issue 1: Missing or Misconfigured StorageClass
# List available storage classes
kubectl get storageclass
# Verify your default storage class
kubectl get storageclass -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'
If there’s no default storage class, create one or specify it explicitly in your PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  storageClassName: fast-ssd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
Issue 2: Requested Size Unavailable
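Some storage provisioners have minimum or maximum size requirements. For example, AWS EBS volumes must be at least 1 GiB. Depending on the provisioner, an out-of-range request is either rounded to a supported size or rejected, and a rejected request leaves the PVC Pending. The events in kubectl describe pvc show which one happened.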
Issue 3: Zone Mismatch in Multi-AZ Clusters
If your pod is constrained to a specific zone but its volume gets provisioned in a different zone, the pod can’t be scheduled next to its storage. Add volumeBindingMode: WaitForFirstConsumer to your storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com   # EBS CSI driver; replaces the legacy in-tree kubernetes.io/aws-ebs
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
This tells Kubernetes to wait until the pod is scheduled before provisioning the volume in the correct zone.
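To check which binding mode your existing storage classes use, you can print it alongside the provisioner. This is a small sketch. The function name storageclass_binding_modes is a hypothetical wrapper, while the fields it prints are part of the StorageClass object.

```shell
# storageclass_binding_modes
# Lists every StorageClass with its provisioner and volume binding mode.
storageclass_binding_modes() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode'
}
```

A class that shows Immediate provisions the volume as soon as the PVC is created, before the scheduler has picked a node, which is how zone mismatches arise.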
Root Cause #4: Node Selector and Affinity Constraints
Node selectors, node affinity, and pod anti-affinity rules can over-constrain where a pod can run. If no node matches your constraints, the pod stays pending.
How to Identify It
Warning FailedScheduling 30s default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
How to Fix It
Check your pod’s scheduling constraints:
kubectl get pod <pod-name> -n <namespace> -o json | jq '.spec | {nodeSelector, affinity, tolerations}'
Example of an over-constrained pod:
apiVersion: v1
kind: Pod
metadata:
  name: specialized-worker
spec:
  nodeSelector:
    node-type: gpu-node          # Requires nodes labeled "gpu-node"
    zone: us-east-1a             # AND in a specific zone
    instance-type: g5.12xlarge   # AND a specific instance type
  containers:
  - name: worker
    image: worker:2.1.0
This pod will only schedule on a node with ALL three labels. Verify what labels your nodes actually have:
kubectl get nodes --show-labels
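Comparing a long label list against a nodeSelector by eye is error-prone. The sketch below reports which required key=value pairs are absent from a node’s label string, in the comma-separated form that --show-labels prints. The function name missing_labels is a hypothetical helper.

```shell
# missing_labels NODE_LABELS REQUIRED...
# NODE_LABELS is a comma-separated "key=value" list, as printed in the LABELS
# column of `kubectl get nodes --show-labels`. Each REQUIRED argument is one
# "key=value" pair from the pod's nodeSelector.
# Prints the required pairs that the node does not carry, space-separated.
missing_labels() {
  local labels=",$1," missing="" want
  shift
  for want in "$@"; do
    case "$labels" in
      *",$want,"*) ;;                                  # exact pair is present
      *) missing="${missing:+$missing }$want" ;;       # pair is absent
    esac
  done
  printf '%s\n' "$missing"
}
```

For instance, missing_labels "node-type=gpu-node,zone=us-east-1a" node-type=gpu-node instance-type=g5.12xlarge prints instance-type=g5.12xlarge, the one selector entry this node cannot satisfy.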
Fix by relaxing constraints or labeling nodes appropriately:
# Add a label to a node
kubectl label nodes <node-name> node-type=gpu-node
Use preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution for anything that is a preference rather than a hard requirement:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node-type
            operator: In
            values:
            - gpu-node
This makes the GPU node a strong preference rather than a hard requirement.
Root Cause #5: Taints and Tolerations Mismatch
Nodes can be “tainted” to repel pods. Unless a pod has a matching toleration, it won’t be scheduled on a tainted node.
How to Identify It
Warning FailedScheduling 1m default-scheduler 0/3 nodes are available: 3 node(s) had untolerated taint {dedicated: special}.
How to Fix It
Check node taints:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Or get detailed information:
kubectl describe node <node-name> | grep -i taint
Add a toleration to your pod if it should run on the tainted node:
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special"
    effect: "NoSchedule"
  containers:
  - name: app
    image: app:3.2.1
Remove a taint from a node if it’s no longer needed:
kubectl taint nodes <node-name> dedicated=special:NoSchedule-
The trailing - is what removes the taint. Without it, you’d be adding one.
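If it’s unclear why a toleration doesn’t take effect, it helps to spell out the matching rule: the key and effect must match, and the value must match too unless the operator is Exists. An empty toleration effect matches any effect, and an empty key with Exists matches any taint. The function below is a hypothetical sketch of that rule for experimenting with values, not code taken from Kubernetes.

```shell
# toleration_matches TAINT_KEY TAINT_VALUE TAINT_EFFECT TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT
# Exit status 0 when the toleration tolerates the taint, 1 otherwise.
toleration_matches() {
  local t_key="$1" t_value="$2" t_effect="$3"
  local key="$4" operator="${5:-Equal}" value="$6" effect="$7"

  # An empty toleration effect matches every taint effect.
  if [ -n "$effect" ] && [ "$effect" != "$t_effect" ]; then
    return 1
  fi

  # An empty key is only meaningful with Exists and matches every taint key.
  if [ -z "$key" ]; then
    [ "$operator" = "Exists" ]
    return
  fi

  [ "$key" = "$t_key" ] || return 1

  case "$operator" in
    Exists) return 0 ;;                     # value is ignored
    Equal)  [ "$value" = "$t_value" ] ;;    # value must match exactly
    *)      return 1 ;;
  esac
}
```

With the taint from the example above, toleration_matches dedicated special NoSchedule dedicated Equal special NoSchedule succeeds, while changing the toleration value to anything else makes it fail.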
Common Taint Scenarios
Kubernetes automatically applies certain taints:
| Taint | Meaning |
|---|---|
| node.kubernetes.io/not-ready | Node is not ready (network issues, kubelet problems) |
| node.kubernetes.io/unreachable | Node controller can’t reach the node |
| node.kubernetes.io/memory-pressure | Node is running low on memory |
| node.kubernetes.io/disk-pressure | Node is running low on disk space |
| node.kubernetes.io/unschedulable | Node is cordoned |
Your pods need appropriate tolerations if you expect them to schedule on nodes with these conditions.
Root Cause #6: Resource Quotas Exhausted
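Namespaces can have resource quotas that cap the total amount of CPU, memory, or object counts. Quotas are enforced at admission time, so when you hit one the API server rejects the new pod outright. Strictly speaking the pod is never created rather than left Pending, but the symptom looks similar: a Deployment that never reaches its desired replica count.
How to Identify It
Because the pod doesn’t exist, the error shows up in the events of the owning ReplicaSet (kubectl describe replicaset <replicaset-name> -n <namespace>) rather than in pod events:
Warning FailedCreate 20s replicaset-controller Error creating: pods "api-server-7b89f6d4c-x2k9m" is forbidden: exceeded quota: compute-quota, requested: requests.cpu=500m, used: requests.cpu=9800m, limited: requests.cpu=10
This message is very explicit: it shows what the pod requested, what the namespace has already used, and the limit it would exceed.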
How to Fix It
Check your namespace quotas:
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>
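Quota rejections are recorded as FailedCreate events on the controller that tried to create the pod (a ReplicaSet, Job, StatefulSet and so on), so listing those events for the namespace is a quick way to confirm the diagnosis. A minimal sketch follows. The function name quota_rejections is hypothetical, and FailedCreate events can have other causes as well, so read the message text.

```shell
# quota_rejections [NAMESPACE]
# Lists FailedCreate events in a namespace, sorted by their last timestamp.
# Look for "exceeded quota" in the MESSAGE column.
quota_rejections() {
  local ns="${1:-default}"
  kubectl get events -n "$ns" \
    --field-selector reason=FailedCreate \
    --sort-by=.lastTimestamp
}
```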
Option A: Increase the Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
Apply it:
kubectl apply -f quota.yaml
Option B: Free Up Resources
Clean up unused or over-provisioned resources in the namespace:
# Find pods using the most resources
kubectl top pods -n <namespace> --sort-by=memory
# Delete unused deployments
kubectl get deployments -n <namespace>
kubectl delete deployment <unused-deployment> -n <namespace>
# Check for completed jobs that haven't been cleaned up
kubectl get jobs -n <namespace>
Root Cause #7: Pod Disruption Budgets and Priority Classes
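In more complex setups, priority classes affect which pods get scheduled first. The scheduler orders its queue by priority and can preempt (evict) lower-priority pods to make room for higher-priority ones, so a low-priority pod on a busy cluster can sit in Pending for a long time. Pod disruption budgets are taken into account during preemption, but only on a best-effort basis.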
How to Identify It
Check if priority preemption is happening:
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Conditions:"
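Look for a PodScheduled condition with status False. Its reason (for example Unschedulable or SchedulingGated) and message tell you whether the pod is waiting on capacity, a scheduling gate, or preemption.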
How to Fix It
Check priority classes:
kubectl get priorityclasses
kubectl get priorityclasses -o wide
Define a priority class for the workloads that need it:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Priority class for critical workloads"
Then reference it in the pod spec:
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: app:1.0.0
Root Cause #8: Image Pull Secrets Missing
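The scheduler never checks image availability, so a missing pull secret doesn’t block scheduling. It surfaces afterwards as ErrImagePull or ImagePullBackOff. It’s still worth checking here because the pod’s phase stays Pending until its containers start, which some dashboards and API clients report simply as “Pending”.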
How to Identify and Fix It
kubectl describe pod <pod-name> -n <namespace> | grep -i "secret\|image"
Ensure your image pull secrets are correctly configured:
spec:
  imagePullSecrets:
  - name: my-registry-secret
  containers:
  - name: app
    image: private-registry.io/app:4.0.0
Root Cause #9: Scheduler Not Running
In rare cases, the Kubernetes scheduler itself might not be functioning properly. If the scheduler isn’t running, no pods will be scheduled, and they’ll all stay pending.
How to Identify It
# Check if the scheduler pod is running
kubectl get pods -n kube-system | grep scheduler
# Check scheduler logs
kubectl logs -n kube-system kube-scheduler-<master-node-name>
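A related case is a pod that asks for a scheduler that isn’t running. If spec.schedulerName names a custom scheduler that doesn’t exist, the default scheduler ignores the pod and it stays Pending without any FailedScheduling events. The sketch below prints the name a pod requests. The function name pod_scheduler_name is a hypothetical helper.

```shell
# pod_scheduler_name POD [NAMESPACE]
# Prints the scheduler a pod asks for. Pods that don't set one get
# "default-scheduler".
pod_scheduler_name() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.schedulerName}'
}
```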
How to Fix It
On managed Kubernetes services (EKS, GKE, AKS), the control plane is managed for you, so scheduler issues are rare and are resolved through the provider. If you’re self-managing, how you restart the scheduler depends on how it runs. If it runs as a systemd service on the control plane node:
sudo systemctl restart kube-scheduler
On kubeadm clusters the scheduler runs as a static pod, so move its manifest out of the manifests directory and back again to make the kubelet recreate it:
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
# Wait about 30 seconds for the kubelet to stop the old pod
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
Root Cause #10: Network Policies Blocking Required Connections
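Sometimes a pod is technically scheduled but can’t initialize properly because network policies block its init containers or startup logic from reaching the API server or other required services. (Image pulls are performed by the node’s container runtime, so network policies don’t affect them.)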
How to Identify It
This is less