This comprehensive guide provides systematic approaches to diagnose and resolve common pod anomalies in Kubernetes environments, covering scheduling issues, image problems, startup failures, readiness concerns, and resource constraints.
Quick reference guide
Use this quick reference to identify and resolve common pod issues based on symptoms:
Symptom | Likely cause | Immediate action
--- | --- | ---
Pending | Scheduling constraints or resource insufficiency | Check node resources and scheduling policies
ImagePullBackOff / ErrImagePull | Image registry connectivity or authentication issues | Verify image pull secrets and registry access
CrashLoopBackOff | Application startup failures or configuration errors | Check container logs and startup commands
Running but not Ready | Health check probe failures | Validate readiness probe configuration
OOMKilled | Insufficient memory allocation or memory leaks | Adjust memory limits and check application memory usage
Systematic diagnostic process
Follow this structured approach to systematically identify and resolve pod anomalies:
Phase 1: Initial assessment
Check pod status using the console or kubectl:
kubectl get pods -n <namespace>
Examine pod events for error messages:
kubectl describe pod <pod-name> -n <namespace>
Review container logs for application-level errors:
kubectl logs <pod-name> -n <namespace> --previous
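If the Events section of kubectl describe is noisy, the namespace events can also be listed and filtered for a single pod. A minimal sketch using standard kubectl field selectors:
# List events for a specific pod, oldest first
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp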
Phase 2: Scheduling problems
Pod stuck in Pending state
When pods remain in the Pending state, investigate scheduling constraints and resource availability.
Error message | Root cause | Resolution steps
--- | --- | ---
no nodes available to schedule pods | No healthy nodes available in cluster | Check node status with kubectl get nodes, resolve NotReady conditions, and uncordon nodes if needed
0/N nodes are available: Insufficient cpu (or Insufficient memory) | Resource requests exceed node capacity | Reduce pod resource requests or add node capacity to the cluster
node(s) didn't match Pod's node affinity/selector | Node labeling mismatch with pod scheduling requirements | Correct the nodeSelector or affinity rules, or label the nodes accordingly
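The constraints behind these messages live in the pod spec itself. A minimal sketch of the fields worth reviewing; the disktype: ssd selector and the resource figures are illustrative only:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  nodeSelector:
    disktype: ssd          # must match labels on at least one schedulable node
  containers:
    - name: app
      image: <registry-address>/<image>:<tag>
      resources:
        requests:
          cpu: 250m        # must fit within the allocatable capacity of some node
          memory: 256Mi
Compare these requests against the Allocatable and Allocated resources sections of kubectl describe node before raising them or adding nodes.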
Resource optimization strategies
Effective resource management prevents scheduling issues and optimizes cluster utilization.
Console-based optimization
Review CPU/Memory request allocation rates
Identify underutilized workloads for rightsizing
Enable Auto Scaling (HPA) for dynamic replica adjustment
Command-line optimization
# View node resource utilization
kubectl top nodes
# Check pod resource requests
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Resources:"
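# Compare actual container usage with the configured requests
# (assumes the metrics-server add-on is installed in the cluster)
kubectl top pods -n <namespace> --containers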
# Enable resource profiling for recommendations
# (Console feature for automated resource optimization)
Phase 3: Image pull failures
Common image pull errors
Error type | Cause | Solution
--- | --- | ---
ErrImagePull | Network connectivity issues to image registry | Test registry connectivity from the node
ImagePullBackOff | Authentication failure or missing credentials | Verify imagePullSecrets configuration
ErrImageNeverPull | Image not found locally with imagePullPolicy: Never | Change the pull policy or ensure the image is present on the node
Image pull troubleshooting steps
Verify image repository URL and tag accuracy
Check imagePullSecrets exist and are correctly referenced
Test registry connectivity from worker nodes:
crictl pull <registry-address>/<image>:<tag>
curl -v https://<registry-address>
Validate registry authentication credentials (a pull-secret sketch follows this list)
Check network policies allowing registry access
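If authentication is the problem, a registry pull secret can be created and referenced from the pod spec. A minimal sketch with placeholder credential values and a hypothetical secret name (my-registry-secret):
# Create a docker-registry secret in the pod's namespace
kubectl create secret docker-registry my-registry-secret \
  --docker-server=<registry-address> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <namespace>
The secret is then referenced in the pod spec:
spec:
  imagePullSecrets:
    - name: my-registry-secret
  containers:
    - name: app
      image: <registry-address>/<image>:<tag>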
Phase 4: Application startup issues
CrashLoopBackOff analysis
Repeated container crashes indicate fundamental application or configuration problems.
Investigation approach
Examine previous container logs for crash reasons (see the exit-code check after this list):
kubectl logs <pod-name> --previous -n <namespace>
Check container startup command and arguments
Verify environment variables and configuration
Validate required dependencies and services
Review application health check configurations
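The kubelet records why the previous container attempt ended, which often pinpoints the crash faster than reading logs alone. A minimal check of the pod status, assuming the first container in the pod:
# Show exit code, reason, and message of the last terminated container
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'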
Common crash causes
Missing or incorrect startup commands (see the command/args sketch after this list)
Configuration file errors or missing files
Insufficient permissions or security constraints
Port conflicts or binding failures
Resource limit violations during startup
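For the first two causes, the effective startup command is whatever the container image defines unless the pod spec overrides it. A minimal sketch of the override fields, with illustrative values:
spec:
  containers:
    - name: app
      image: <registry-address>/<image>:<tag>
      command: ["/bin/sh", "-c"]                                 # replaces the image ENTRYPOINT
      args: ["exec /app/server --config /etc/app/config.yaml"]   # replaces the image CMD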
Runtime issues
OOMKilled state analysis
Out-of-memory conditions cause pod termination and require careful resource management.
Investigation steps
Review OOM kill logs (see the node-level check after this list):
kubectl logs <pod-name> --previous -n <namespace> | grep -i oom
Analyze memory usage patterns and spikes
Check for memory leaks in application code
Review JVM heap settings for Java applications (-Xmx parameter)
Validate memory resource limits against actual usage
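OOM kills are carried out by the node kernel, so the node's kernel log is often more informative than the container log. A minimal check, run directly on the worker node that hosted the pod:
# Look for kernel OOM killer entries with human-readable timestamps
dmesg -T | grep -i -E "out of memory|oom-kill"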
Memory optimization
Increase memory limits if application legitimately requires more RAM
Implement memory profiling to identify usage patterns
Optimize application memory allocation and garbage collection
Consider vertical pod autoscaling for dynamic memory adjustment
Set appropriate memory requests to ensure quality of service (a minimal example follows this list)
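Memory requests and limits are set per container in the pod spec. A minimal sketch with illustrative values; setting requests equal to limits for all resources in all containers gives the pod Guaranteed QoS, which makes it the last candidate for eviction:
spec:
  containers:
    - name: app
      image: <registry-address>/<image>:<tag>
      resources:
        requests:
          memory: 512Mi   # what the scheduler reserves on the node
        limits:
          memory: 512Mi   # exceeding this triggers an OOM kill of the container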
Evicted pod analysis
Pods may be evicted due to node pressure, manual intervention, or resource constraints.
Eviction reason | Trigger condition | Prevention strategy
--- | --- | ---
Node pressure | Insufficient node resources (CPU, memory, disk) | Monitor node resource utilization and scale appropriately
Preemption | Higher priority pod requires resources | Set appropriate pod priorities and resource requests
Node drain or manual eviction | Administrative action or maintenance | Implement proper maintenance windows and notifications
ContainerCreating state troubleshooting
Extended ContainerCreating state indicates issues with container initialization.
Issue | Root cause | Resolution
--- | --- | ---
Volume mount failures | PersistentVolumeClaim binding issues or storage driver problems | Verify PVC status and storage class configuration |
Image pull delays | Large image size or slow network connectivity | Optimize image size and use image pre-pulling strategies |
Security context violations | Insufficient permissions or SELinux/AppArmor restrictions | Review and adjust pod security policies |
Network connectivity issues
Flannel network troubleshooting
Flannel networking issues can cause pod-to-pod communication failures and DNS resolution problems.
Common symptoms
Pods cannot communicate with each other across nodes
DNS resolution fails for cluster services
Services cannot reach pods or external endpoints
Troubleshooting steps
Verify the Flannel DaemonSet pods are running on all nodes:
kubectl get pods -n kube-flannel
Inspect Flannel pod logs for errors:
kubectl logs -n kube-flannel <flannel-pod-name>
Check Flannel network configuration and subnet assignments
Test pod-to-pod connectivity across different nodes (a minimal test is sketched after this list)
Manually upgrade Flannel version if configuration issues persist
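Cross-node pod networking can be exercised with a throwaway test pod that pings another pod's IP directly. A minimal sketch, assuming a pullable busybox image and a <target-pod-ip> belonging to a pod on a different node (the net-test name is arbitrary):
# Find a target pod IP and the node it runs on
kubectl get pods -n <namespace> -o wide
# Start a temporary pod and ping the target pod IP from it
kubectl run net-test --rm -it --image=busybox:1.28 --restart=Never -- ping -c 3 <target-pod-ip>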
Solutions
Restart Flannel pods to refresh network configuration
Reconfigure Flannel subnets if IP conflicts exist
Upgrade Flannel to latest stable version
Verify firewall rules allow Flannel traffic (UDP 8472 by default)
FAQ
Why is the pod running but not working correctly?
Application configuration issues can cause pods to run without functioning properly.
Verify container configurations match expectations
Check for YAML configuration syntax errors
Validate environment variable values and secrets
Confirm service discovery and DNS resolution (see the DNS check after this list)
Test inter-container communication if applicable
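DNS resolution is easiest to confirm from inside the cluster with a short-lived test pod. A minimal sketch, assuming the busybox:1.28 image is available (its nslookup behaves predictably) and the default in-cluster service name:
# Resolve the in-cluster API service; failures here point at CoreDNS or network policy issues
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default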
What does the Completed state mean?
Pods entering Completed state is normal for certain workloads like Jobs and init containers when their processes exit successfully.
Common troubleshooting interfaces
Access these interfaces through the Container Service Management Console for comprehensive pod troubleshooting:
Operation | Console interface
--- | ---
View pod status and basic information | Cluster > Workloads > Pods |
Check pod configurations | Pod details page > Configuration tab |
Review pod events | Pod details page > Events tab |
Access container logs | Pod details page > Logs tab |
Connect to container terminal | Pod details page > Terminal access |
Enable pod diagnostics | Pod details page > Diagnostics tools |