When a pod fails to start, crashes repeatedly, or behaves unexpectedly, use this guide to identify the root cause and resolve the issue.
Diagnostic workflow

Check the pod status. If the status is anything other than Running, see Pod status reference for targeted solutions.
If the pod is Running but not behaving as expected, see Running but not working as expected.
If the pod was terminated due to an out-of-memory (OOM) error, see Troubleshoot OOM errors.
If the issue persists after troubleshooting, submit a ticket.
Pod status reference
| Status | Meaning | Solution |
| --- | --- | --- |
| Pending | Not scheduled to a node | See Pods stuck in Pending |
| Init:N/M | N of M init containers have completed | See Pods stuck in init container states |
| Init:Error | Init container failed | See Pods stuck in init container states |
| Init:CrashLoopBackOff | Init container in a crash loop | See Pods stuck in init container states |
| ImagePullBackOff | Failed to pull a container image | See Pods stuck in ImagePullBackOff |
| CrashLoopBackOff | Application crashing repeatedly | See Pods stuck in CrashLoopBackOff |
| Completed | All containers exited after finishing the startup command | See Pods stuck in Completed |
| Running | The pod works as expected, or is running but not working as expected | See Pods stuck in Running but not working as expected |
| Terminating | Being deleted | See Pods stuck in Terminating |
Diagnostic tools
All diagnostic tools are available in the ACS console. Start by navigating to the pod:
1. Log on to the ACS console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
3. In the upper-left corner of the Pods page, select the namespace to which the pod belongs, then find the pod in the list.
Then use any of the following tools:
View pod details
Click the pod name or click View Details in the Actions column to view information such as the pod name, image, and IP address.
View pod configuration (YAML)
Open the pod details page and click Edit in the upper-right corner to view the YAML file and configuration of the pod.
View pod events
Open the pod details page and click the Events tab in the lower section.
Kubernetes retains events from the previous hour by default. To retain events for a longer period, create and use an Event Center.
View pod logs
Open the pod details page and click the Logs tab in the lower section.
Alibaba Cloud Container Compute Service (ACS) integrates with Simple Log Service. Enable Simple Log Service when creating a cluster to collect log data from standard output and text files. For more information, see Collect application logs by using the environment variables of pods.
View pod monitoring
1. Log on to the ACS console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of the target cluster, then open the Prometheus Monitoring page.
3. On the Prometheus Monitoring page, click the Cluster Overview tab to view CPU usage, memory usage, and network I/O for pods.
Connect to a container terminal
1. Log on to the ACS console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
3. On the Pods page, find the pod and open a terminal session from the Actions column to run commands inside the container.
Run pod diagnostics
On the Pods page, find the pod and click Diagnose in the Actions column. Review the diagnostic result after it completes. For more information, see Work with cluster diagnostics.
Pods stuck in Pending
A Pending pod has not been scheduled to any node. This typically happens when required resources are missing or quota configurations are invalid.
Diagnose the issue:
Check the pod events to identify the scheduling failure reason. Common causes include:
Missing resource dependencies: Some pods depend on specific cluster resources such as ConfigMaps or persistent volume claims (PVCs). For example, a PVC must be bound to a persistent volume (PV) before it can be used in a pod spec.
Invalid quota configurations: The pod's resource requests may exceed the available quota. Check the pod events and audit logs for details.
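As an illustration of the resource-dependency case, the following is a minimal sketch (the PVC name `data-pvc` and pod name `app` are hypothetical): the pod cannot be scheduled until the PVC it references exists and is bound to a PV.

```yaml
# Hypothetical example: the PVC referenced by the pod must exist
# and be Bound before the pod can leave Pending.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc   # must match an existing, bound PVC
```

If the PVC is missing or unbound, the pod events typically report a scheduling failure that names the claim.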
Pods stuck in init container states
These states indicate that one or more init containers failed to complete:
Init:N/M -- The pod has M init containers, and N of them have completed so far. The status shows where startup has stalled.
Init:Error -- An init container exited with an error.
Init:CrashLoopBackOff -- An init container keeps crashing and restarting.
Diagnose the issue:
Check the pod events for errors in the failing init container. See View pod events.
Check the logs of the failing init container for error details. See View pod logs.
Verify that the init container configuration is correct in the pod YAML. See View pod configuration (YAML). For more information about debugging init containers, see Debug init containers.
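The following is a minimal sketch of an init container configuration (the names `app-with-init`, `wait-for-db`, and `db-service` are hypothetical). Init containers run to completion, in order, before the main containers start.

```yaml
# Hypothetical example: the pod stays in Init:0/1 until the init
# container exits successfully. If its command exits nonzero, the
# status becomes Init:Error or Init:CrashLoopBackOff.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      command: ["sh", "-c", "until nslookup db-service; do sleep 2; done"]
  containers:
    - name: app
      image: nginx:1.25
```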
Pods stuck in ImagePullBackOff
The pod is scheduled but cannot pull one or more container images. Check the pod events to identify which image failed.
Diagnose the issue:
Verify the image name and tag. A typo in the image name or tag is the most common cause.
If the image is stored in a private repository, make sure the correct image pull secret is configured. See Use an image stored in an image repository to create an ACS workload.
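For the private-repository case, a minimal sketch looks like the following (the secret name `my-registry-secret` and registry address are hypothetical); the pull secret must exist in the same namespace as the pod.

```yaml
# Hypothetical example: pulling from a private registry requires an
# image pull secret created beforehand in the pod's namespace.
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
    - name: my-registry-secret
  containers:
    - name: app
      image: registry.example.com/team/app:1.0   # verify name and tag
```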
Pods stuck in CrashLoopBackOff
The application inside the pod is crashing. Kubernetes restarts it automatically, but the crash recurs.
Diagnose the issue:
Check the pod events for error messages. See View pod events.
Check the pod logs for application errors. See View pod logs.
Review the health check configuration. Misconfigured liveness, readiness, or startup probes can cause Kubernetes to kill a healthy container. See View pod configuration (YAML). For more information about configuring probes, see Configure Liveness, Readiness and Startup Probes.
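As a sketch of the probe case (all names and values hypothetical): probe settings that are too aggressive can kill a container that is merely slow to start, producing a crash loop even though the application is healthy.

```yaml
# Hypothetical example: give the application time to start before
# the liveness probe can fail it.
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      livenessProbe:
        httpGet:
          path: /healthz
          port: 80
        initialDelaySeconds: 10   # delay before the first check
        periodSeconds: 10
        failureThreshold: 3       # too low a value causes spurious restarts
```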
Pods stuck in Completed
All containers in the pod finished running the startup command and exited. This is expected for batch jobs but not for long-running services.
Diagnose the issue:
Check the startup command in the pod configuration. The containers may have completed their intended command without errors. See View pod configuration (YAML).
Check the pod logs for additional context. See View pod logs.
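A minimal sketch of the distinction (pod name and command hypothetical): a container whose startup command exits leaves the pod in Completed, so a long-running service must run a foreground process that does not exit.

```yaml
# Hypothetical example: this command exits immediately, so the pod
# ends up Completed. Expected for a batch job, not for a service.
apiVersion: v1
kind: Pod
metadata:
  name: one-shot
spec:
  restartPolicy: Never
  containers:
    - name: job
      image: busybox:1.36
      command: ["sh", "-c", "echo done"]
```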
Pods stuck in Running but not working as expected
The pod shows Running status but the application is not functioning correctly. This is often caused by errors in the pod YAML.
Diagnose the issue:
Compare the pod configuration against your expectations. See View pod configuration (YAML).
Check for typos in field names and environment variable keys. Kubernetes silently ignores misspelled field names: the pod starts successfully, but the intended configuration does not apply.
To detect field-level typos, validate the file before deploying:

```shell
kubectl apply --validate -f <your-file>.yaml
```

If a field name is misspelled (for example, `commnd` instead of `command`), the output includes a warning:

```
[<path>] unknown field: commnd
```

Alternatively, export the running pod's YAML and compare it against the original:

```shell
kubectl get pods <pod-name> -o yaml > pod.yaml
```

If the exported YAML is missing fields from the original file, the original may contain misspelled keys.
Check the pod logs for runtime errors. See View pod logs.
Connect to the container terminal to inspect local files and application state. See Connect to a container terminal.
Pods stuck in Terminating
A Terminating pod is being deleted but has not yet stopped. Pods in this state typically resolve on their own after the grace period expires.
If a pod remains in Terminating for an extended period, force-delete it:
```shell
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```

Troubleshoot OOM errors
When a container's memory usage exceeds its configured memory limit, the kernel terminates the container with an out-of-memory (OOM) kill. The terminated container may restart automatically.
Symptoms:
The container restarts unexpectedly.
The Events tab on the pod details page shows the event: pod was OOM killed.
Diagnose the issue:
Check the memory usage graph to identify when spikes occurred. See View pod monitoring.
Determine whether the high memory usage is caused by a memory leak or by legitimate workload demands:
Memory leak: Investigate the application code based on the timing of memory spikes, log entries, and process names.
Normal memory growth: Increase the pod's memory limit. Set the limit so that actual memory usage stays below 80% of the configured limit. For details, see Manage pods.
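Applying the 80% guideline, a minimal sketch (pod name and sizes hypothetical): if the application steadily uses about 1.6 GiB at peak, a 2 Gi limit keeps actual usage below 80% of the configured limit.

```yaml
# Hypothetical example: size the memory limit so observed peak
# usage stays below 80% of it; exceeding the limit triggers an
# OOM kill.
apiVersion: v1
kind: Pod
metadata:
  name: memory-sized-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          memory: "2Gi"
        limits:
          memory: "2Gi"
```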
For more background, see Assign Memory Resources to Containers and Pods.