Container Service for Kubernetes: Troubleshooting

Last Updated: Mar 26, 2024

This topic answers frequently asked questions about using Container Service for Kubernetes (ACK) clusters and describes the troubleshooting procedure.

ACK cluster exceptions

For frequently asked questions about ACK, see FAQ (earlier version).

Troubleshoot application issues in ACK

  • Pods remain in the Pending state

    Pods that remain in the Pending state cannot be scheduled to nodes. A common cause is that the cluster does not have sufficient resources to run the pods. Run the kubectl describe pod command to view the events and identify the cause.

  • Pods remain in the Waiting state

    If a pod remains in the Waiting state, the pod is scheduled to a node but cannot run as expected. This usually happens because the image, private or public, fails to be pulled or the image address is invalid. For more information, see Pods remain in the Waiting state.

  • Pods keep restarting but remain in the Crashing or Unhealthy state

    If a pod remains in the Crashing or Unhealthy state, the pod is scheduled to a node but fails to start. This issue is typically caused by configuration errors or permission issues. View the container log to check whether the application in the pod reports an error. For more information, see Pods remain in the Crashing or Unhealthy state.

  • Pods remain in the Running state but do not run as normal

    If a pod is in the Running state but does not work as expected, the YAML file that is used for deployment may contain invalid fields. Validate the Deployment of the pod to identify the cause. For more information, see Pods remain in the Running state but do not run as normal.

  • Services cannot run as normal

    If a Service does not work as expected and the network plug-in is not the cause, the issue is probably caused by invalid labels, such as a selector that matches no pods. In this case, check the endpoints of the Service to identify the cause. For more information, see Troubleshoot Services.
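The checks described in the list above can be sketched as two small shell helpers. This is only an illustration: the function names triage_pod and check_service and their arguments are hypothetical, and the default namespace is assumed when none is given. The kubectl subcommands themselves are standard.

```shell
# Hypothetical helpers; the function and variable names are illustrative only.

# Print the events and the most recent logs for one pod.
triage_pod() {
  local pod="$1" ns="${2:-default}"
  # The Events section at the end of the output usually explains Pending
  # and Waiting states (scheduling failures, image pull errors).
  kubectl describe pod "$pod" -n "$ns"
  # Logs of the previous container instance help with restart loops.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
}

# Show which pod IP addresses back a Service. An empty endpoint list
# often means that the Service selector matches no pod labels.
check_service() {
  local svc="$1" ns="${2:-default}"
  kubectl get endpoints "$svc" -n "$ns"
  kubectl get pods -n "$ns" --show-labels
}
```

For example, triage_pod my-pod my-namespace prints the events first and then the last 50 log lines of the previous container instance, if one exists.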

How do I upgrade an ACK cluster?

You can upgrade an ACK cluster by using one of the following methods:

Troubleshooting procedure and common causes

  • Check whether ECS instances can communicate with each other. For more information, see Fail to ping ECS instances.

  • Check whether the security group is properly configured. For more information, see Check security group rules.

    For more information about how to configure ECS security groups, see Configure security groups in different scenarios.

  • Check whether the RAM user is granted the required permissions. For more information, see Grant permissions to a RAM user.

  • Check whether the runtime environment works as expected when you run the docker run command.

  • Check whether kubectl can connect to the cluster when errors occur in the cluster, and whether the kubectl get event command runs as expected. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  • Check whether a pod can access another pod in the Kubernetes cluster. For more information, see Pod fails to access pods on another node.

  • Check whether Services can be used to access applications. For more information, see Use an existing SLB instance to expose an application.

  • Check whether Ingresses can be used to access applications. For more information, see Access Services by using an ALB Ingress.

  • Check whether errors are recorded in the logs of the API server, scheduler, and controller.

  • Check whether errors are recorded in the log of the Docker daemon.

    If the "docker daemon is not running" error message is returned, start the Docker daemon.

    • If you use Windows, run the following commands in cmd.exe to start the Docker daemon:

      cd C:\Program Files\Docker\Docker
      DockerCli.exe -SwitchDaemon
    • If you use Linux, run the following command to start the Docker daemon:

      service docker restart
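Several of the checks in the procedure above can be run in one pass from a node. The following is a minimal sketch, not a complete health check: run_check and check_basics are hypothetical names, and peer_ip stands for the private IP address of another ECS instance that you supply.

```shell
# Run one command quietly and report whether it succeeded.
run_check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK: $label"
  else
    echo "FAIL: $label"
  fi
}

# peer_ip is a placeholder for the private IP address of another ECS instance.
check_basics() {
  local peer_ip="$1"
  run_check "ping $peer_ip" ping -c 3 "$peer_ip"
  run_check "docker daemon reachable" docker info
  run_check "kubectl can reach the API server" kubectl cluster-info
  run_check "kubectl can list events" kubectl get events --all-namespaces
}
```

A FAIL line only shows which step to investigate further. The sketch does not cover security group rules or RAM permissions.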

How do I troubleshoot errors based on log data?

You can run the following commands to view logs and troubleshoot errors.

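  • Run the kubectl describe <resource type> <resource name> command to view events. For example, run kubectl describe pod <pod name> to view the events of a pod.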

  • Run the journalctl -u docker -f command to query the log of Docker.

  • Run the journalctl -u kubelet -f command to query the log of kubelet.

  • Run the docker logs <api server container id> command to query the log of the API server.

    Note

    This command is used to query the log of the API server in ACK dedicated clusters. If you use an ACK managed cluster, see Collect the logs of control plane components in ACK Pro clusters.

  • Run the docker logs <scheduler container id> command to query the log of the scheduler.

  • Run the docker logs <worker proxy container id> command to query the log of the worker proxy.

  • Run the docker logs <master proxy container id> command to query the log of the master proxy.

  • Run the docker logs <controller container id> command for each controller to query the logs of controllers. The controllers are kube-controller, alicloud-monitor-controller, alicloud-disk-controller, and cloud-controller.
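The log commands above need a container ID, which you can look up by name. The following sketch assumes that the components run as Docker containers on the node and that a name pattern such as kube-apiserver matches them. Both the pattern and the function names component_logs and unit_logs are assumptions, so check the output of docker ps for the names used on your nodes.

```shell
# Print the last lines of the log of every running container whose name
# matches a pattern. The pattern is an assumption; verify it with docker ps.
component_logs() {
  local pattern="$1" lines="${2:-100}" id
  for id in $(docker ps -q --filter "name=$pattern"); do
    echo "== container $id =="
    docker logs --tail "$lines" "$id" 2>&1
  done
}

# Recent, non-following view of a systemd unit journal (docker or kubelet).
unit_logs() {
  local unit="$1"
  journalctl -u "$unit" --since "1 hour ago" --no-pager
}
```

For example, component_logs kube-apiserver 50 prints the last 50 log lines of each matching container, and unit_logs kubelet prints the kubelet journal of the past hour instead of following it with -f.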

We recommend that you import the logs to Log Service and analyze them there. For more information, see Getting Started.