Container Compute Service: Pod diagnostics

Last Updated: Feb 12, 2025

Container Intelligence Service (CIS) of Container Compute Service (ACS) provides the pod diagnostics feature to help you diagnose pods. This topic describes the pod diagnostic items and the solutions to the pod issues that they detect.

CIS provides a diagnostics system built on expert experience and an AI-assisted diagnostics model trained on large amounts of data. The pod diagnostics feature offers two diagnostic modes, expert mode and AI mode, to help locate the root cause of issues. Pod diagnostics consists of diagnostic items and root cause analysis.

  • Diagnostic items: include pod and cluster component diagnostics.

  • Root cause: locates the root cause of issues and provides suggestions on how to fix the issues. The pod diagnostics feature collects information about clusters, identifies anomalies, and then performs in-depth diagnostics.

Important

When you use the pod diagnostics feature, the system runs a data collection program in the cluster to collect diagnostic data. The collected information includes the system version; the status of workloads, Docker, and kubelet; and key error information in system logs. The data collection program does not collect business information or sensitive data.

Scenarios

The following table describes the scenarios of pod diagnostics and AI-assisted diagnostics.

| Category | Scenario |
| --- | --- |
| Pod diagnostics | Pods are not processed by the scheduler. |
| Pod diagnostics | Pods cannot be scheduled because they do not meet the scheduling constraints. |
| Pod diagnostics | Pods are scheduled but are not processed by the kubelet. |
| Pod diagnostics | Pods are waiting for the volumes to reach the Ready state. |
| Pod diagnostics | Pods are evicted. |
| Pod diagnostics | Sandboxed containers in pods fail to be created. |
| Pod diagnostics | Pods remain in the Terminating state. |
| Pod diagnostics | Out-of-memory (OOM) errors occur in containers in pods. |
| Pod diagnostics | Containers in pods exit unexpectedly. |
| Pod diagnostics | Containers in pods remain in the CrashLoopBackOff state. |
| Pod diagnostics | Containers in pods are not ready. |
| Pod diagnostics | Pods fail to pull container images. |
| Pod diagnostics | Timeout errors occur when pods pull container images. |
| AI-assisted diagnostics | The status of pods is abnormal. |
| AI-assisted diagnostics | OOM errors occur in pods. |
| AI-assisted diagnostics | Containers in pods exit unexpectedly. |
| AI-assisted diagnostics | The configuration of the ConfigMaps or Secrets of pods is invalid. |
| AI-assisted diagnostics | Pods fail to pass health checks. |
| AI-assisted diagnostics | The configuration of the persistent volume claims (PVCs) of pods is invalid. |
| AI-assisted diagnostics | Errors occur when pods pull container images. |
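
Several of the scenarios listed above correspond to fields that Kubernetes itself exposes on a pod object. The following minimal sketch shows how a few of them could be recognized by hand. It assumes that the input has the shape of `kubectl get pod <name> -o json`. The function name `classify_pod` and the returned labels are hypothetical and are not identifiers used by CIS.

```python
from typing import Any, Dict, List

# Waiting reasons that Kubernetes reports when an image cannot be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def classify_pod(pod: Dict[str, Any]) -> List[str]:
    """Return illustrative scenario labels that match a pod object.

    Assumption: `pod` has the shape of `kubectl get pod <name> -o json`.
    The labels are made up for this example and are not CIS identifiers.
    """
    findings: List[str] = []
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})

    # A pod that is being deleted carries a deletion timestamp.
    if metadata.get("deletionTimestamp"):
        findings.append("terminating")

    # Evicted pods report status.reason == "Evicted".
    if status.get("reason") == "Evicted":
        findings.append("evicted")

    # An unschedulable pod has PodScheduled=False with reason Unschedulable.
    for cond in status.get("conditions", []):
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            findings.append("unschedulable")

    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        waiting_reason = state.get("waiting", {}).get("reason")
        terminated_reasons = (
            state.get("terminated", {}).get("reason"),
            cs.get("lastState", {}).get("terminated", {}).get("reason"),
        )
        if waiting_reason == "CrashLoopBackOff":
            findings.append("crash_loop_back_off")
        if waiting_reason in IMAGE_PULL_REASONS:
            findings.append("image_pull_failure")
        if "OOMKilled" in terminated_reasons:
            findings.append("oom_killed")
        if not cs.get("ready", False):
            findings.append("container_not_ready")

    # Remove duplicates from multi-container pods while keeping the order.
    return list(dict.fromkeys(findings))


example_pod = {
    "metadata": {"name": "demo"},
    "status": {
        "conditions": [{"type": "PodScheduled", "status": "True"}],
        "containerStatuses": [{
            "name": "app",
            "ready": False,
            "restartCount": 7,
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
        }],
    },
}
print(classify_pod(example_pod))
```

A manual check like this covers only what is visible on the pod object. Scenarios such as invalid ConfigMaps or PVCs require additional context, which is what the diagnostics feature collects.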

Procedure

The pod diagnostics feature collects information about clusters, identifies anomalies, and then performs in-depth diagnostics. Both expert mode and AI mode help locate the root cause of issues. Diagnostic results are generated in four steps: anomaly identification, data collection, diagnostic item check, and root cause analysis.

  • Anomaly identification: collects basic data, such as pod status and cluster event streams, and analyzes the anomalies based on the collected data.

  • Data collection: collects context-related data based on the results of anomaly identification.

  • Diagnostic item check: checks whether key metrics are normal based on the collected data.

  • Root cause analysis: analyzes the root cause of issues based on the collected data and the check results of diagnostic items.
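
The four steps above can be pictured as a small staged pipeline in which each step consumes the output of the previous one. The following sketch is purely illustrative and does not show how CIS is implemented. All names (`DiagnosticResult`, `diagnose`, `node_pull_failures`, and so on) and the root cause rules are assumptions made for this example, which follows a single image pull anomaly through the stages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiagnosticResult:
    anomalies: List[str] = field(default_factory=list)
    item_checks: Dict[str, bool] = field(default_factory=dict)  # True = passed
    root_cause: str = "not determined"


def identify_anomalies(waiting_reasons: List[str]) -> List[str]:
    """Step 1: derive anomalies from basic data (container waiting reasons)."""
    pull_reasons = {"ErrImagePull", "ImagePullBackOff"}
    return ["image_pull_failure"] if pull_reasons & set(waiting_reasons) else []


def collect_context(anomalies: List[str], node_pull_failures: int) -> Dict[str, int]:
    """Step 2: keep only the context that the detected anomalies call for."""
    if "image_pull_failure" in anomalies:
        return {"other_pull_failures_on_node": node_pull_failures}
    return {}


def check_items(context: Dict[str, int]) -> Dict[str, bool]:
    """Step 3: evaluate diagnostic items against the collected data."""
    failures = context.get("other_pull_failures_on_node", 0)
    return {"container_image_download": failures == 0}


def analyze_root_cause(anomalies: List[str], item_checks: Dict[str, bool]) -> str:
    """Step 4: combine anomalies and item results (hypothetical rules)."""
    if "image_pull_failure" not in anomalies:
        return "not determined"
    if item_checks["container_image_download"]:
        return "pull failure is specific to this pod; check image name and pull credentials"
    return "pull failures affect several pods; check registry reachability"


def diagnose(waiting_reasons: List[str], node_pull_failures: int) -> DiagnosticResult:
    anomalies = identify_anomalies(waiting_reasons)
    context = collect_context(anomalies, node_pull_failures)
    item_checks = check_items(context)
    return DiagnosticResult(anomalies, item_checks,
                            analyze_root_cause(anomalies, item_checks))


result = diagnose(["ImagePullBackOff"], node_pull_failures=3)
print(result.root_cause)
```

Keeping the item checks separate from the root cause rules mirrors the description above: even when no rule matches, the per-item results remain available for manual inspection.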

Diagnostic results

The diagnostic results include the results of root cause analysis and the results of the diagnostic item check. The results of root cause analysis include the detected anomalies, the root cause, and suggestions for fixes. The results of the diagnostic item check include the check result of each diagnostic item. The diagnostic item check helps locate causes that root cause analysis may not identify.

Note

The diagnostic items may vary based on the cluster configuration. The diagnostic items displayed on the diagnostics page take precedence.

Pod diagnostic items

| Category | Description |
| --- | --- |
| Pod | Diagnoses common pod issues, including pod status and image pulling. |
| ClusterComponent | Diagnoses common cluster issues, including the availability of the API server and DNS service. |

Pod

| Diagnostic item | Description | Solution |
| --- | --- | --- |
| Number of container restarts | Indicates the number of times that the containers in the pod have restarted. | Check the status and logs of the pod. For more information, see Pod troubleshooting. |
| Container image download failures | Checks whether other pods on the same node as the current pod also fail to download the container image. | Check the status and logs of the pod. For more information, see Pod troubleshooting. |
| Pod scheduling | Checks whether the pod is scheduled. | Check the status and logs of the pod. For more information, see Pod troubleshooting. |
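
To check the number of container restarts by hand, you can read the `restartCount` field that Kubernetes reports for each container. The following minimal sketch assumes a pod object in the shape of `kubectl get pod <name> -o json`. The helper names and the threshold value are hypothetical and do not reflect the threshold that the diagnostic item uses.

```python
from typing import Any, Dict, List


def restart_counts(pod: Dict[str, Any]) -> Dict[str, int]:
    """Map each container name to its restart count.

    Assumption: `pod` has the shape of `kubectl get pod <name> -o json`.
    Init containers are included because they can restart as well.
    """
    status = pod.get("status", {})
    statuses = (status.get("initContainerStatuses", [])
                + status.get("containerStatuses", []))
    return {cs["name"]: cs.get("restartCount", 0) for cs in statuses}


def containers_over_threshold(pod: Dict[str, Any], threshold: int) -> List[str]:
    """List containers that restarted more than `threshold` times.

    The threshold is an illustrative parameter chosen by the caller.
    """
    return [name for name, count in restart_counts(pod).items()
            if count > threshold]


sample_pod = {
    "status": {
        "initContainerStatuses": [{"name": "init", "restartCount": 0}],
        "containerStatuses": [
            {"name": "app", "restartCount": 12},
            {"name": "sidecar", "restartCount": 1},
        ],
    },
}
print(restart_counts(sample_pod))
print(containers_over_threshold(sample_pod, threshold=5))
```

A high restart count alone does not explain why a container restarts. The pod events and the logs of the previous container instance usually contain the actual reason.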

ClusterComponent

| Diagnostic item | Description | Solution |
| --- | --- | --- |
| API Service availability | Checks whether the API Service of the cluster is available. | Run the `kubectl get apiservice` command to check the availability of the API Service of the cluster. If the API Service is unavailable, run the `kubectl describe apiservice` command to view information about the API Service and identify the cause. |
| Endpoints of the DNS service | Checks the number of CoreDNS endpoints. | Check the status and logs of CoreDNS pods. |
| Cluster IP addresses of CoreDNS pods | Checks whether cluster IP addresses are allocated to CoreDNS pods. If cluster IP addresses are not allocated to CoreDNS pods, service interruptions may occur. | Check the status and logs of CoreDNS pods. |
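
The first two cluster component checks can also be approximated from `kubectl` output. The following sketch is an assumption-laden illustration, not the logic that CIS runs. It expects the JSON produced by `kubectl get apiservice -o json` and by `kubectl get endpoints <name> -n <namespace> -o json`. The helper names are hypothetical, and the Service name used for CoreDNS in the usage note (`kube-dns` in `kube-system`) is the common Kubernetes default, which may differ in your cluster.

```python
import json
import subprocess
from typing import Any, Dict, List


def unavailable_apiservices(apiservice_list: Dict[str, Any]) -> List[str]:
    """Return names of APIService objects whose Available condition is not True.

    Assumption: the input is the output of `kubectl get apiservice -o json`.
    """
    unavailable = []
    for item in apiservice_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        available = any(c.get("type") == "Available" and c.get("status") == "True"
                        for c in conditions)
        if not available:
            unavailable.append(item["metadata"]["name"])
    return unavailable


def ready_endpoint_count(endpoints: Dict[str, Any]) -> int:
    """Count the ready addresses listed in an Endpoints object."""
    subsets = endpoints.get("subsets") or []
    return sum(len(subset.get("addresses") or []) for subset in subsets)


def kubectl_json(*args: str) -> Dict[str, Any]:
    """Run kubectl with `-o json` and parse the result (requires cluster access)."""
    completed = subprocess.run(["kubectl", *args, "-o", "json"],
                               check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)


# Sample data so that the helpers can be tried without a cluster.
sample_apiservices = {"items": [
    {"metadata": {"name": "v1."},
     "status": {"conditions": [{"type": "Available", "status": "True"}]}},
    {"metadata": {"name": "v1beta1.metrics.k8s.io"},
     "status": {"conditions": [{"type": "Available", "status": "False"}]}},
]}
sample_endpoints = {"subsets": [
    {"addresses": [{"ip": "10.0.0.11"}, {"ip": "10.0.0.12"}]},
]}
print(unavailable_apiservices(sample_apiservices))
print(ready_endpoint_count(sample_endpoints))
```

Against a live cluster, the same helpers could be fed with `kubectl_json("get", "apiservice")` and `kubectl_json("get", "endpoints", "kube-dns", "-n", "kube-system")`. An endpoint count of zero means that no CoreDNS pod is ready to serve DNS queries.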