Troubleshoot faulty applications in Container Service for Kubernetes

Last Updated: May 11, 2022

Overview

This topic describes how to troubleshoot faulty applications in Container Service for Kubernetes (ACK).

Troubleshooting method

Application faults may originate in pods, Services, or controllers such as Deployments, StatefulSets, and DaemonSets. You can troubleshoot a faulty application by performing the following operations:

Debug pods

Log on to the ACK console. On the Clusters page, click the name of the cluster in which you want to debug pods. On the management page of the cluster, choose Workloads > Pods in the left-side navigation pane. On the Pods page, check the status of the pod that you want to debug. Debug the pod based on its status:
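
If you prefer the command line, you can also check pod status with kubectl. The following command is a minimal sketch and assumes that kubectl is already configured to access the cluster:

    kubectl get pods --all-namespaces -o wide

Note: The STATUS column shows the state of each pod, for example Pending, Running, or CrashLoopBackOff, and the -o wide option also shows the node to which each pod is scheduled.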

Pod in the Pending state

  1. A pod is stuck in the Pending state if it cannot be scheduled to a node because the required resources are unavailable in the cluster.
  2. On the Pods page, click the name of the pod that you want to debug. On the page that appears, click the Events tab. On this tab, view the event descriptions to find the cause of the scheduling failure. A kubectl-based way to view the same information is sketched after this list. The scheduling failure may have the following causes:
    • Lack of required resources
      In some cases, a pod depends on resources in the cluster, such as ConfigMaps and persistent volume claims (PVCs). If the resources are unavailable, the pod fails to be scheduled. For example, before you specify a PVC for a pod, you must bind the PVC to a persistent volume (PV).
    • Insufficient resources
      1. On the management page of the cluster, choose Nodes > Node in the left-side navigation pane. Check the resource usage on each node in the cluster.
        Note: The scheduler decides based on the resource requests of pods, not on actual usage. If scheduling a pod would cause the total amount of requested resources on a node to exceed the node's allocatable capacity, the scheduler does not place the pod on that node, even if the actual CPU utilization or memory usage on the node is low. This prevents the node from running short of resources during peak hours.
      2. If the CPU and memory resources are exhausted in a cluster, you can try the following methods to resolve this issue:
        • Delete unnecessary pods. For more information, see Manage pods.
        • Adjust the resource requests of pods.
        • Add nodes to the cluster. For more information, see Scale out a node pool.
        • Upgrade the configurations of nodes. For more information, see Add worker nodes.
    • Use of a hostPort
      If you bind a pod to a hostPort, the number of replicas that you specify for the Deployment or replication controller cannot be greater than the number of nodes in the cluster. This is because each node provides only one instance of the specified host port. If the hostPort is already occupied by another application on a node, the pod cannot be scheduled to that node. Therefore, we recommend that you do not use a hostPort. Instead, we recommend that you use a Service to expose a pod. For more information, see Service.
    • Taints and tolerations
      If the event descriptions contain "Taints" or "Tolerations", the scheduling failure of a pod is caused by taints. You can delete taints or configure toleration rules for the pod. For more information, see Manage taints and Configure pod scheduling.
      Note: For more information about taints and tolerations, see Taints and Tolerations.
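
The following commands are a minimal kubectl sketch of the same checks for a pod that is stuck in the Pending state. [$Pod] and [$Namespace] are placeholders:

    kubectl describe pod [$Pod] -n [$Namespace]
    kubectl describe nodes | grep -A 8 "Allocated resources"
    kubectl describe nodes | grep "Taints"

Note: The Events section at the end of the first command's output states why the pod cannot be scheduled, for example insufficient CPU or memory, an unbound PVC, or an untolerated taint. The other two commands show the resource requests that are already allocated on each node and the taints that are configured on each node.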

Pod in the Waiting state

A pod is stuck in the Waiting state if it is scheduled to a node but cannot run on it. You can view the event descriptions of this pod. The most common cause is a failure to pull the container image when the pod is created from a YAML file. Perform the following operations to find the cause (a command sketch follows the list):

  • Check whether the name of the container image that you specify is valid.
  • Check whether the container image is pushed to the image repository.
    On the node to which the pod is scheduled, run the docker pull [$Image] command to check whether the container image can be pulled.
    Note: Replace [$Image] with the name of the container image. If the node uses containerd as the container runtime, run crictl pull [$Image] instead.
  • Check whether the container image is from a private image repository. For more information about how to create an application by pulling a container image from a private image repository, see Create an application from a private image repository.
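
As a quick check, the events of the pod usually contain the exact image pull error. The following command is a minimal sketch; [$Pod] and [$Namespace] are placeholders:

    kubectl describe pod [$Pod] -n [$Namespace]

Note: In the Events section of the output, look for messages such as ErrImagePull or ImagePullBackOff. The message states whether the failure is caused by an invalid image name, a missing image, or missing credentials for a private image repository.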

Pod in the Crash or Unhealthy state

In most cases, a pod is in the Crash or Unhealthy state because an error occurred in the application that runs in its containers. You can view the logs of the containers to find the cause. On the management page of the pod, click the Logs tab and select a container. Find the cause based on the log content. For more information about how to debug a pod in the Crash or Unhealthy state, see the Debug Running Pods topic in the official Kubernetes documentation.
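
If you use kubectl, the following commands are a minimal sketch for viewing container logs; [$Pod], [$Container], and [$Namespace] are placeholders:

    kubectl logs [$Pod] -c [$Container] -n [$Namespace]
    kubectl logs [$Pod] -c [$Container] -n [$Namespace] --previous

Note: The --previous option prints the logs of the last terminated instance of the container, which is useful when the container keeps restarting and the current instance has not produced logs yet.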

Pod in the Running state that does not run as expected

If a pod does not run as expected, the YAML file that you used to deploy the pod may contain an error that was silently ignored when the system created the pod. A typical example is a misspelled key. For example, if you misspell "command" as "commnd", the pod is created, but it does not run the command that you intended. Instead, the default command in the image is run. To resolve this issue, perform the following steps:

  1. Add the --validate option to the kubectl apply -f command. The following code provides an example:
    kubectl apply --validate -f XXX.yaml
    If you misspelled "command" as "commnd" in the YAML file, the following error message is returned:
    XXX] unknown field: commnd
    XXX] this may be a false alarm, see https://gXXXb.XXX/6842
    pods/test
  2. Check whether the pod was created as expected. Run the following command to export the pod configuration from the API server to the XXX.yaml file. Then, compare the XXX.yaml file with the original YAML file that you used to create the pod (a comparison sketch follows this procedure). The file that you obtain from the API server normally contains more lines than the original YAML file, because the API server adds default fields. However, if lines that are present in the original YAML file are missing from the XXX.yaml file, the pod may not have been created as expected.
    kubectl get pods/[$Pod] -o yaml > XXX.yaml
    Note: Replace [$Pod] with the name of the abnormal pod. You can run the kubectl get pods command to obtain the name.
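
To make the comparison easier, you can diff the two files. The following commands are a minimal sketch; original.yaml is a placeholder for the YAML file that you used to create the pod:

    kubectl get pods/[$Pod] -o yaml > XXX.yaml
    diff original.yaml XXX.yaml

Note: Because the API server adds default fields, the diff output contains many added lines. Focus on lines from original.yaml that appear only on the left side of the diff (prefixed with <): these are the lines that the system ignored.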

Debug controllers

Perform the following operations to debug controllers:

  • Most faults related to resources such as Deployments, DaemonSets, StatefulSets, and Jobs are caused by pods. In this case, you must first debug pods. For more information, see Debug pods.
  • View the events and logs of the resource, for example, a Deployment (a kubectl-based alternative is sketched after this list).
    Note: This topic provides the steps that you can perform to view the events and logs of a Deployment. The steps that are used to view the events or logs of a DaemonSet, StatefulSet, or Job are similar to those in this example.
    1. On the management page of the cluster in which the Deployment is created, choose Workloads > Deployments. On the Deployments page, click the name of the Deployment.
    2. On the page that appears, click the Events or Logs tab. Find the cause by checking the exception information in the events and logs.
  • If you debug a StatefulSet, you may encounter special issues. For more information, see Forced Rollback.
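
If you prefer kubectl, the following commands are a minimal sketch for viewing the status and events of a Deployment; [$Deployment] and [$Namespace] are placeholders:

    kubectl describe deployment [$Deployment] -n [$Namespace]
    kubectl rollout status deployment/[$Deployment] -n [$Namespace]
    kubectl get events -n [$Namespace] --sort-by=.lastTimestamp

Note: The describe output shows the conditions and events of the Deployment, the rollout status command reports whether an update is stuck, and the last command lists recent events in the namespace in chronological order.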

Debug Services

A Service provides load balancing across a set of pods. Several common issues may occur on Services. Perform the following steps to debug a Service:

  1. Check the endpoints of the Service.
  2. Connect to the ACK cluster by using kubectl. For more information, see Use kubectl to connect to an ACK cluster.
  3. Run the following command to view the endpoints of the Service:
    kubectl get endpoints [$Service_Name]
    Note: Replace [$Service_Name] with the name of the Service that you want to debug.
    Verify that the number of endpoints is the same as the number of pods that you expect to be members of the Service. For example, if you deploy an application as a Deployment with three replicas, the ENDPOINTS column must contain three endpoints, as in the illustrative output after this list.
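
The following output is only an illustration of what to expect; the Service name, IP addresses, and port are hypothetical:

    NAME         ENDPOINTS                                      AGE
    my-service   172.16.0.10:80,172.16.0.11:80,172.16.0.12:80   5m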

Incomplete endpoints for the Service

If the number of endpoints does not match the number of pods that you expect, list the pods that match the label selector of the Service and check whether they are the pods that you expect to provide the Service. Perform the following steps:

  1. View the label selector in the YAML file that you used to create the Service. If you do not have the file at hand, you can also read the selector from the API server, as sketched after this list.
  2. Run the following command to list the pods that match the label selector. Then, check whether the listed pods are those that you expect to provide the Service.
    kubectl get pods --selector=app=[$App] -n [$Namespace]
    Note:
    • Replace [$App] with the value of the selector that you obtained.
    • Replace [$Namespace] with the namespace to which the Service belongs. If the Service belongs to the default namespace, you do not need to specify the namespace in the command.
  3. If the listed pods are the expected pods but the IP addresses of specific pods are not in the endpoints of the Service, the targetPort that is specified for the Service may be incorrect. The IP address of a pod is added to the endpoints of the Service only when the pod exposes the specified targetPort and passes its readiness checks. Run the following command to check whether the port can be accessed:
    curl [$IP]:[$Port]
    Note:
    • Replace [$IP] with the value of the clusterIP field in the YAML file that you viewed in Step 1.
    • Replace [$Port] with the value of the port field in the YAML file that you viewed in Step 1.
    • To test the targetPort of a specific pod directly, replace [$IP] with the IP address of the pod and [$Port] with the targetPort.
    • The test method varies based on the actual environment.
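
The following commands are a minimal sketch for reading the selector, port, targetPort, and endpoints of the Service from the API server; [$Service_Name] and [$Namespace] are placeholders:

    kubectl describe service [$Service_Name] -n [$Namespace]
    kubectl get service [$Service_Name] -n [$Namespace] -o yaml

Note: In the describe output, the Selector, Port, TargetPort, and Endpoints fields show the values that the preceding steps compare.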

Traffic forwarding

If your client can access the Service and the endpoints match the pods, but the connection is dropped immediately, traffic may not be forwarded to the pods. You can perform the following operations to find the cause:

  • Check whether the pods run as expected.
    For more information, see Debug pods.
  • Check whether the IP addresses of the pods are accessible.
    1. Run the following command to query the IP addresses of the pods:
      kubectl get pods -o wide
    2. Log on to a node and run the ping command to check whether the IP addresses of the pods are accessible.
  • Check whether your application listens on the specified port (a command sketch follows this list).
    If your application listens on port 80, you must specify port 80 as the targetPort of the Service. Log on to a node and run the curl [$IP]:[$Port] command to check whether the port can be accessed.
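
To check which ports the application actually listens on inside the container, the following command is a minimal sketch. It assumes that the container image contains the ss tool; [$Pod], [$Container], and [$Namespace] are placeholders:

    kubectl exec [$Pod] -c [$Container] -n [$Namespace] -- ss -lnt

Note: If ss is not available in the image, try netstat -lnt instead. The output lists the TCP ports on which processes in the container listen. Verify that the targetPort of the Service appears in this list.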

Application scope

  • ACK