Container Service for Kubernetes (ACK) provides the auto scaling component (cluster-autoscaler) to automatically scale nodes. Regular instances, GPU-accelerated instances, and preemptible instances can be automatically added to or removed from an ACK cluster to meet your business requirements. The component supports multiple scaling modes, a variety of instance types, and instances that are deployed across zones, which makes it suitable for diverse scenarios.

How it works

The auto scaling model of Kubernetes is different from the traditional scaling model that is based on resource usage thresholds. Developers must understand the differences between the two scaling models before they migrate workloads from traditional data centers or other orchestration systems, such as Swarm, to Kubernetes.

The traditional scaling model is based on resource usage. For example, if a cluster contains three nodes and the CPU utilization or memory usage of the nodes exceeds the scaling threshold, new nodes are added to the cluster. However, you must consider the following issues when you use the traditional scaling model:

In a cluster, hot nodes may have high resource usage while other nodes have low resource usage. If the average resource usage is used as the threshold, auto scaling may be delayed. If the lowest node resource usage is used as the threshold, the newly added nodes may not be used, which wastes resources.

In Kubernetes, a pod is the smallest unit that runs an application on a node. If the resource usage of a pod is high and the node or the cluster is scaled out based on that usage, the workload still cannot be distributed to the newly added nodes unless the number of pods that run the application or the resource limits of the pods are changed.

If a scale-in event is triggered based on a resource usage threshold, pods that request a large amount of resources but have low actual usage may be evicted from the nodes. If the cluster contains many such pods, the remaining schedulable resources may be exhausted and some pods may fail to be scheduled.

How does the auto scaling model of Kubernetes fix these issues? Kubernetes provides a two-layer scaling model that decouples pod scheduling from resource scaling.

In simple terms, pods are scaled based on resource usage, and nodes are scaled based on pod scheduling: when pods enter the Pending state due to insufficient resources, a scale-out event is triggered. After new nodes are added to the cluster, the pending pods are automatically scheduled to the newly added nodes, which balances the load of the application. The following section describes the auto scaling model of Kubernetes in detail:

cluster-autoscaler triggers auto scaling by detecting pending pods. When pods become pending due to insufficient scheduling resources, cluster-autoscaler simulates scheduling to determine which scaling group can provide nodes that can host the pending pods. If a scaling group meets the requirements, nodes are added from the scaling group.
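
The following sketch shows how pending pods are typically produced. It is a minimal, hypothetical Deployment: if no existing node has 2 vCPUs and 4 GiB of memory available, the replicas remain in the Pending state and cluster-autoscaler evaluates the configured scaling groups. The name and image are placeholders.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                   # hypothetical workload name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: nginx:1.25        # placeholder image
          resources:
            requests:              # the scheduler and cluster-autoscaler evaluate requests, not actual usage
              cpu: "2"
              memory: 4Gi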

During the simulation, a scaling group is treated as an abstract node. The instance types in the scaling group define the CPU, memory, and GPU capacities of the abstract node, and the labels and taints that are configured for the scaling group are applied to the abstract node in the same way that labels and taints are applied to a regular node. The simulated scheduler references the abstract node during the simulation. If the pending pods can be scheduled to the abstract node, cluster-autoscaler calculates the number of required nodes and adds nodes from the scaling group.

Only nodes that are added by scale-out events can be removed by scale-in events. Static nodes cannot be managed by cluster-autoscaler. Scale-in events are evaluated on a per-node basis. When the scheduling resource usage of a node falls below the specified scale-in threshold, a scale-in event is triggered on the node. In this case, cluster-autoscaler simulates pod eviction on the node and checks whether all pods can be evicted from the node. cluster-autoscaler does not drain nodes that run specific pods. For example, if non-DaemonSet pods that belong to the kube-system namespace or pods that are protected by a pod disruption budget (PDB) run on a node, cluster-autoscaler skips the node and evaluates other candidate nodes. When a node is drained, all pods on the node are evicted to other nodes, and the drained node no longer serves your workloads.
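
As an illustration of the PDB case, the following hypothetical PodDisruptionBudget limits voluntary evictions for the pods that it selects. If draining a node would reduce the number of available matching pods below minAvailable, cluster-autoscaler skips that node during scale-in. The name and values are examples only.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-app-pdb               # hypothetical name
spec:
  minAvailable: 3                  # at least 3 matching pods must remain available during voluntary evictions
  selector:
    matchLabels:
      app: demo-app                # selects the pods of the hypothetical Deployment above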

Each scaling group is regarded as an abstract node. cluster-autoscaler selects a scaling group for auto scaling based on a policy that is similar to the scheduling policy. cluster-autoscaler first selects the scaling groups that meet the scheduling policy, and then applies other policies, such as affinity settings. If none of the preceding policies is configured, cluster-autoscaler selects a scaling group based on the least-waste policy, which selects the scaling group that leaves the fewest idle resources after the simulation. If both a scaling group of GPU-accelerated nodes and a scaling group of CPU nodes pass the scale-out simulation, the CPU scaling group is prioritized for the scale-out activity.

The result of auto scaling depends on the following factors:

  • Whether the scheduling policy is met

    After a scaling group is configured, you must confirm whether the nodes in the scaling group meet the scheduling policies of the pods that you want to manage. You can check whether the labels of the nodes in the scaling group match the nodeSelector field in the pod configurations, as shown in the example after this list.

  • Whether scaling resources are sufficient

    After the scheduling simulation is complete, a scaling group is selected to provide nodes. However, whether the nodes can be added depends on the stock of the corresponding Elastic Compute Service (ECS) instance types. Therefore, we recommend that you configure multiple instance types and multiple zones for the scaling group to improve the success rate of auto scaling.
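
The following sketch shows the kind of matching that the scale-out simulation performs. A pending pod like this one can be scheduled only to the abstract node of a scaling group whose node label and taint configuration satisfy the nodeSelector and tolerations fields. The label key-value pair, taint, and image are hypothetical.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-task                   # hypothetical pod name
spec:
  nodeSelector:
    workload-type: gpu             # must match a node label that is configured for the scaling group
  tolerations:
    - key: dedicated               # must tolerate the taint that is configured for the scaling group
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: task
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # requests one GPU as an extended resource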

You can use the following methods to accelerate auto scaling:

  • Method 1: Enable the swift mode. After a scaling group completes a scale-out event and a scale-in event, the swift mode is enabled for the scaling group.
  • Method 2: Use a custom image that is created based on Alibaba Cloud Linux 2 (formerly known as Aliyun Linux 2). This accelerates resource delivery at the Infrastructure-as-a-Service (IaaS) layer by 50%.

Precautions

  • For each account, the default CPU quota for pay-as-you-go instances in each region is 50 vCPUs. You can add at most 48 custom route entries to each route table of a virtual private cloud (VPC). To request a quota increase, submit a ticket.
  • The stock of ECS instances may be insufficient for auto scaling if you specify only one ECS instance type for a scaling group. We recommend that you specify multiple ECS instance types with the same specification for a scaling group. This increases the success rate of auto scaling.
  • In swift mode, when a node is shut down and reclaimed, the node stops running and enters the NotReady state. When a scale-out event is triggered, the state of the node changes to Ready.
  • If a node is shut down and reclaimed in swift mode, you are charged only for the disks. This rule does not apply to nodes that use local disks, such as nodes of the ecs.d1ne.2xlarge instance type, for which you are also charged a computing fee. If the stock of ECS instances is sufficient, nodes can be launched within a short period of time.
  • If elastic IP addresses (EIPs) are associated with pods, we recommend that you do not delete the scaling group or remove ECS instances from the scaling group in the ECS console. Otherwise, these EIPs cannot be automatically released.

Step 1: Go to the Configure Auto Scaling page

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, navigate to the Configure Auto Scaling page in one of the following ways:
    • Method 1: Find the cluster that you want to manage and choose More > Auto Scaling in the Actions column.
    • Method 2:
      1. Find the cluster that you want to manage and click Details in the Actions column.
      2. In the left-side navigation pane of the details page, choose Nodes > Node Pools.
      3. In the upper-right corner of the Node Pools page, click Configure Auto Scaling.

Step 2: Perform authorization

You must perform authorization in the following scenarios:

ACK has limited permissions on nodes in the cluster

If ACK has limited permissions on nodes in the cluster, assign the AliyunCSManagedAutoScalerRole Resource Access Management (RAM) role to ACK.
Note You need to perform the authorization only once for each Alibaba Cloud account.
  1. Activate Auto Scaling (ESS).
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
  2. Assign the RAM role.
    1. Click the second hyperlink in the dialog box.
      Note You must log on to the RAM console by using an Alibaba Cloud account.
    2. On the Cloud Resource Access Authorization page, click Confirm Authorization Policy.

ACK has unlimited permissions on nodes in the cluster

  1. Activate ESS.
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the ESS console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the permissions of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note You must log on to the RAM console by using an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      "ess:Describe*", 
      "ess:CreateScalingRule", 
      "ess:ModifyScalingGroup", 
      "ess:RemoveInstances", 
      "ess:ExecuteScalingRule", 
      "ess:ModifyScalingRule", 
      "ess:DeleteScalingRule", 
      "ecs:DescribeInstanceTypes",
      "ess:DetachInstances",
      "vpc:DescribeVSwitches"
      Note Before you add the policy content, add a comma (,) to the end of the last line in the existing Action field.

An auto-scaling node pool in the cluster must be associated with an EIP

If you want to associate an auto-scaling group with an EIP, perform the following steps to grant permissions:

  1. Activate ESS.
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the ESS console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the permissions of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note You must log on to the RAM console by using an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      "ecs:AllocateEipAddress",
      "ecs:AssociateEipAddress",
      "ecs:DescribeEipAddresses",
      "ecs:DescribeInstanceTypes",
      "ecs:DescribeInvocationResults",
      "ecs:DescribeInvocations",
      "ecs:ReleaseEipAddress",
      "ecs:RunCommand",
      "ecs:UnassociateEipAddress",
      "ess:CompleteLifecycleAction",
      "ess:CreateScalingRule",
      "ess:DeleteScalingRule",
      "ess:Describe*",
      "ess:DetachInstances",
      "ess:ExecuteScalingRule",
      "ess:ModifyScalingGroup",
      "ess:ModifyScalingRule",
      "ess:RemoveInstances",
      "vpc:AllocateEipAddress",
      "vpc:AssociateEipAddress",
      "vpc:DescribeEipAddresses",
      "vpc:DescribeVSwitches",
      "vpc:ReleaseEipAddress",
      "vpc:UnassociateEipAddress",
      "vpc:TagResources"
      Note Before you add the policy content, add a comma (,) to the end of the last line in the existing Action field.
    5. On the RAM Roles page, click the name of the worker RAM role. On the details page of the RAM role, click the Trust Policy Management tab and click Edit Trust Policy. In the Edit Trust Policy panel, add oos.aliyuncs.com to the Service field and click OK.

Step 3: Configure auto scaling

  1. On the Configure Auto Scaling page, set the following parameters and click Submit.
    • Cluster: The name of the cluster for which you want to enable auto scaling.
    • Scale-in Threshold: The ratio of the resources requested by the pods on a node to the resource capacity of the node. For a node that is managed by cluster-autoscaler, if the actual ratio is lower than the threshold, the node is removed from the cluster. For example, if the threshold is 50% and the pods on a node request 1.6 of its 8 vCPUs (20%), the node can be removed.
      Note In auto scaling, scale-out events are automatically triggered based on node scheduling. Therefore, you need to set only scale-in parameters.
    • GPU Scale-in Threshold: The scale-in threshold for GPU-accelerated nodes. If the actual ratio is lower than the threshold, one or more GPU-accelerated nodes are removed from the cluster.
    • Defer Scale-in For: The amount of time that the cluster must wait before it scales in. Unit: minutes. Default value: 10.
    • Cooldown: The period during which newly added nodes cannot be removed by scale-in events.
  2. Select an instance type. Supported instance types are regular instances, GPU-accelerated instances, and preemptible instances. Then, click Create.
  3. In the Auto Scaling Group Configuration dialog box, set the following parameters to create a scaling group.
    • Region: The region where you want to deploy the scaling group. The scaling group and the Kubernetes cluster must be deployed in the same region. You cannot change the region after the scaling group is created.
    • VPC: The scaling group and the Kubernetes cluster must be deployed in the same VPC.
    • VSwitch: The vSwitches of the scaling group. You can specify vSwitches that are deployed in different zones. The vSwitches allocate pod CIDR blocks to the scaling group.
  4. Configure worker nodes.
    • Node Type: The types of nodes in the scaling group. The node types must be the same as the types that were selected when the cluster was created.
    • Instance Type: The instance types in the scaling group.
    • Selected Types: The instance types that you have selected. You can select at most 10 instance types.
    • System Disk: The system disk of the scaling group.
    • Mount Data Disk: Specifies whether to mount data disks to the scaling group. By default, no data disk is mounted.
    • Instances: The number of instances contained in the scaling group.
      Note
      • Existing instances in the cluster are excluded.
      • By default, the minimum number of instances is 0. If you specify one or more instances, the system adds the instances to the scaling group. When a scale-out event is triggered, the instances in the scaling group are added to the cluster to which the scaling group is bound.
    • Key Pair: The key pair that is used to log on to the nodes in the scaling group. You can create key pairs in the ECS console.
      Note You can log on to the nodes only by using key pairs.
    • Scaling Mode: You can select Standard or Swift.
    • RDS Whitelist: The ApsaraDB RDS instances that can be accessed by the nodes that are added from the scaling group.
    • Node Label: Labels that are automatically added to the nodes that are added to the cluster by scale-out events.
    • Taints: After you add taints to a node, ACK no longer schedules pods to the node unless the pods tolerate the taints. For an example of how a node added by a scale-out event carries the configured label and taints, see the sketch that follows these steps.
  5. Click OK to create the scaling group.
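
For reference, the following sketch shows what the Node Label and Taints parameters might look like on a node that is added by a scale-out event, assuming a hypothetical label workload-type=gpu and a hypothetical taint dedicated=gpu:NoSchedule are configured for the scaling group. Only pods that select the label and tolerate the taint, such as the pod example earlier in this topic, are scheduled to this node.

apiVersion: v1
kind: Node
metadata:
  name: scaled-out-node            # hypothetical node name
  labels:
    workload-type: gpu             # added by the Node Label parameter
spec:
  taints:
    - key: dedicated               # added by the Taints parameter
      value: gpu
      effect: NoSchedule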

Check the results

  1. On the Auto Scaling page, you can find the newly created scaling group below Regular Instance.
  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  3. In the left-side navigation pane of the details page, choose Workloads > Deployments.
  4. On the Deployments tab, select the kube-system namespace. If the cluster-autoscaler component is displayed, the scaling group is created.

FAQ

  • Why does the auto scaling component fail to add nodes after a scale-out event is triggered?
    Check whether the following situations exist:
    • The instance types in the scaling group cannot fulfill the resource requests of the pods. By default, system components are installed on each node and consume part of the node resources. Therefore, the resources requested by the pods must be less than the resource capacity of the instance type. For an illustration, see the sketch at the end of this topic.
    • The worker RAM role does not have the required permissions. You must configure RAM roles for each Kubernetes cluster that is involved in the scale-out activity.
  • Why does the auto scaling component fail to remove nodes after a scale-in event is triggered?
    Check whether the following situations exist:
    • The ratio of the resources requested by the pods on the node is higher than the configured scale-in threshold.
    • Pods that belong to the kube-system namespace are running on the node.
    • A hard scheduling policy is configured to force the pods to run on the current node. Therefore, the pods cannot be scheduled to other nodes.
    • PodDisruptionBudget is set for the pods on the node and the minimum value of PodDisruptionBudget is reached.

    For more information, see the FAQ of the open source cluster-autoscaler component.

  • How does the system choose a scaling group for a scaling event?

    When pods cannot be scheduled to nodes, the auto scaling component simulates the scheduling of the pods based on the configurations of the scaling groups. The configurations include labels, taints, and instance specifications. If the pods can be scheduled to a scaling group in the simulation, the scaling group is selected for the scale-out activity. If more than one scaling group meets the requirements, the system selects the scaling group that has the fewest idle resources after the simulation.
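
To illustrate the first FAQ item, the following sketch shows a hypothetical pod that can never be scheduled to a node of an 8-vCPU, 16-GiB instance type, because system components and kubelet reservations consume part of the node capacity. In this case, cluster-autoscaler does not scale out the corresponding scaling group. The values and names are examples only; size pod requests with headroom below the instance capacity.

apiVersion: v1
kind: Pod
metadata:
  name: oversized-pod              # hypothetical pod name
spec:
  containers:
    - name: app
      image: nginx:1.25            # placeholder image
      resources:
        requests:
          cpu: "8"                 # equals the full vCPU capacity of the instance type, so the pod never fits
          memory: 16Gi             # equals the full memory capacity of the instance type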