
How can I troubleshoot the failure to configure the auto scaling component in a Kubernetes cluster?

Last Updated: Nov 30, 2020

Problem description

A failure may occur when you configure the auto scaling component for a Kubernetes cluster. In this case, the cluster-autoscaler pod repeatedly fails and restarts, and it cannot run as expected.

Check the logs of the cluster-autoscaler pod for the related error message.
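To locate the pod and read its logs from the command line, the following is a minimal sketch. It assumes that kubectl is already configured for the cluster and identifies the pod only by its cluster-autoscaler- name prefix, so the namespace does not need to be known in advance. The function names find_autoscaler_pod and show_autoscaler_logs are hypothetical helpers, not part of any Alibaba Cloud tool.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): locate the cluster-autoscaler pod and
# print its status and recent logs. Assumes kubectl is configured for the cluster.

# find_autoscaler_pod: reads `kubectl get pods -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE) and prints
# "NAMESPACE NAME" for the first pod whose name starts with "cluster-autoscaler-".
find_autoscaler_pod() {
  awk '$2 ~ /^cluster-autoscaler-/ { print $1, $2; exit }'
}

# show_autoscaler_logs: prints the pod's status line and its last 100 log lines.
show_autoscaler_logs() {
  local ns name
  read -r ns name < <(kubectl get pods -A --no-headers | find_autoscaler_pod)
  if [ -z "${name:-}" ]; then
    echo "no cluster-autoscaler pod found" >&2
    return 1
  fi
  kubectl get pod -n "$ns" "$name"
  # --previous shows the logs of the last terminated container, which is useful
  # while the pod keeps restarting; fall back to the current container if none.
  kubectl logs -n "$ns" "$name" --previous --tail=100 \
    || kubectl logs -n "$ns" "$name" --tail=100
}
```

Source the script and run show_autoscaler_logs. The parsing is kept in a separate function so that it can be checked without access to a cluster.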

Causes

This failure typically occurs because the required Resource Access Management (RAM) role or its permissions are not configured correctly. Possible causes include:

  • The RAM role for the target Elastic Compute Service (ECS) instance does not exist.
  • The permission policy of the RAM role is incorrectly configured.
  • The trust policy of the RAM role is incorrectly configured.

Solutions

  1. Check the RAM role of the worker nodes by using one of the following methods:
    • Manually check the details of the RAM role by role name.
      1. Log on to the ECS instance where the cluster-autoscaler pod is deployed.
      2. On the command line, run the following command to obtain the RAM role name:
        curl 100.100.100.200/latest/meta-data/ram/security-credentials/
        The command returns the name of the RAM role that is attached to the instance.
      3. Log on to the RAM console and check the details of the RAM role by role name.
    • Check the RAM role by navigating from the Container Service for Kubernetes console.
      1. Log on to the Container Service for Kubernetes console.
      2. Go to the Cluster Resources tab of the Kubernetes cluster and click the value of the Worker RAM Role field.
      3. The RAM console opens and shows the details of the RAM role.
  2. On the details page of the RAM role, click the Permissions tab, and then click the policy name in the Policy column.
  3. Check whether the Action array in the Statement element of the policy contains the following Auto Scaling (ESS) and ECS permissions. Manually add any required permissions that are missing.
    "ess:Describe*", 
    "ess:CreateScalingRule", 
    "ess:ModifyScalingGroup", 
    "ess:RemoveInstances", 
    "ess:ExecuteScalingRule", 
    "ess:ModifyScalingRule", 
    "ess:DeleteScalingRule", 
    "ecs:DescribeInstanceTypes",
    "ess:DetachInstances"
  4. Return to the details page of the RAM role and click the Trust Policy Management tab.
  5. Verify that the trust policy is configured as required, and manually correct any differences.
  6. Log on to the ECS console, go to the details page of the ECS instance, and check whether the required RAM role is attached to the instance. If it is not, attach the RAM role. For more information, see Step 2 in Use RAM roles to access other Alibaba Cloud services.
  7. Use kubectl to connect to the Kubernetes cluster, and run the following command to delete the cluster-autoscaler pod. Add the -n option if the pod is not in the default namespace of your kubectl context. After a new pod is created, check its status and logs to confirm that the cluster-autoscaler pod runs as expected.
    kubectl delete pod cluster-autoscaler-XXXX
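Step 3 asks you to compare the policy's Action array with the list of required permissions. If you copy the policy document from the RAM console into a local JSON file, a minimal sketch such as the following can report which listed actions are absent. The function name missing_actions is hypothetical, and the check is plain string matching: a broader grant such as "ess:*" is not recognized as covering the individual actions.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): list the required actions from step 3
# that do not appear in a policy document saved locally as JSON.
# Literal string matching only; wildcards such as "ess:*" are NOT expanded.

required_actions=(
  "ess:Describe*"
  "ess:CreateScalingRule"
  "ess:ModifyScalingGroup"
  "ess:RemoveInstances"
  "ess:ExecuteScalingRule"
  "ess:ModifyScalingRule"
  "ess:DeleteScalingRule"
  "ecs:DescribeInstanceTypes"
  "ess:DetachInstances"
)

# missing_actions FILE: prints each required action not found in FILE, one per line.
missing_actions() {
  local policy_file="$1" action
  for action in "${required_actions[@]}"; do
    # -F treats the "*" in "ess:Describe*" literally; the surrounding quotes
    # make the match apply to the whole JSON string value.
    if ! grep -qF "\"${action}\"" "$policy_file"; then
      echo "$action"
    fi
  done
}
```

For example, missing_actions policy.json prints nothing when every listed action is present.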
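Steps 4 and 5 check the trust policy, but this article does not reproduce the expected policy. The following sketch assumes that the role is an ECS instance RAM role, whose trust policy normally allows the ecs.aliyuncs.com service to perform sts:AssumeRole. It performs only a rough text check on a locally saved copy of the policy; trusts_ecs is a hypothetical helper, not a policy evaluator.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper). ASSUMPTION: the role is an ECS instance
# RAM role, whose trust policy normally looks like this:
#   {
#     "Statement": [
#       {
#         "Action": "sts:AssumeRole",
#         "Effect": "Allow",
#         "Principal": { "Service": ["ecs.aliyuncs.com"] }
#       }
#     ],
#     "Version": "1"
#   }
# This is literal string matching, not JSON parsing or policy evaluation.

# trusts_ecs FILE: returns 0 if FILE mentions both "sts:AssumeRole" and the
# "ecs.aliyuncs.com" service principal; otherwise prints what is missing and returns 1.
trusts_ecs() {
  local file="$1" status=0
  if ! grep -qF '"sts:AssumeRole"' "$file"; then
    echo "missing action: sts:AssumeRole"
    status=1
  fi
  if ! grep -qF '"ecs.aliyuncs.com"' "$file"; then
    echo "missing principal: ecs.aliyuncs.com"
    status=1
  fi
  return "$status"
}
```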
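Step 7 can also be scripted so that the pod name does not have to be typed. The sketch below assumes that kubectl is configured for the cluster, that the pod runs in the kube-system namespace (override this with the NAMESPACE variable), and that a controller such as a Deployment recreates the pod after it is deleted. The function names pick_autoscaler_pod and restart_autoscaler are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of step 7 (hypothetical helpers). ASSUMPTIONS: kubectl is
# configured for the cluster, the pod runs in kube-system unless NAMESPACE is
# set, and a controller recreates the pod after it is deleted.

NAMESPACE="${NAMESPACE:-kube-system}"

# pick_autoscaler_pod [EXCLUDE]: reads `kubectl get pods -o name` output on stdin
# and prints the first cluster-autoscaler pod whose name is not EXCLUDE.
pick_autoscaler_pod() {
  grep '^pod/cluster-autoscaler-' | grep -vxF "${1:-}" | head -n 1
}

# restart_autoscaler: deletes the current pod, waits for a replacement, and
# prints the replacement's status and recent logs.
restart_autoscaler() {
  local old new
  old=$(kubectl get pods -n "$NAMESPACE" -o name | pick_autoscaler_pod)
  if [ -z "$old" ]; then
    echo "no cluster-autoscaler pod in namespace $NAMESPACE" >&2
    return 1
  fi
  kubectl delete -n "$NAMESPACE" "$old"
  # Poll for up to about 60 seconds until a pod with a different name appears.
  for _ in $(seq 1 12); do
    new=$(kubectl get pods -n "$NAMESPACE" -o name | pick_autoscaler_pod "$old")
    [ -n "$new" ] && break
    sleep 5
  done
  if [ -z "$new" ]; then
    echo "replacement pod not found yet" >&2
    return 1
  fi
  # A timeout here suggests the RAM configuration is still not correct.
  kubectl wait -n "$NAMESPACE" --for=condition=Ready "$new" --timeout=120s \
    || echo "pod did not become Ready within 120s" >&2
  kubectl get -n "$NAMESPACE" "$new"
  kubectl logs -n "$NAMESPACE" "$new" --tail=50
}
```

Source the script and run restart_autoscaler, or NAMESPACE=<namespace> restart_autoscaler if the pod runs in a different namespace.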

Application scope

  • Dedicated clusters of Container Service for Kubernetes
  • Managed clusters of Container Service for Kubernetes