Container Service for Kubernetes (ACK) provides the auto scaling component (cluster-autoscaler) to automatically scale nodes. Regular instances, GPU-accelerated instances, and preemptible instances can be automatically added to or removed from an ACK cluster to meet your business requirements. The component supports multiple scaling modes, various instance types, and instances that are deployed across zones, and is applicable to diverse scenarios.

How auto scaling works

The auto scaling model of Kubernetes is different from the traditional scaling model that is based on resource usage thresholds. Developers must understand the differences between the two scaling models before they migrate workloads from traditional data centers or other orchestration systems to Kubernetes, for example, when they migrate workloads from Swarm clusters to ACK clusters.

The traditional scaling model is based on resource usage. For example, if a cluster contains three nodes and the CPU utilization or memory usage of the nodes exceeds the scaling threshold, new nodes are automatically added to the cluster. However, you must consider the following issues when you use the traditional scaling model:

In a cluster, hot nodes may have high resource usage while other nodes have low resource usage. If the scaling threshold is based on the average resource usage, scale-out activities may be delayed. If the threshold is based on the node with the lowest resource usage, the newly added nodes may remain unused, which wastes resources.

In Kubernetes, a pod is the smallest unit for running an application on a node. When auto scaling is triggered for a cluster or a node in the cluster, pods with high resource usage are not replicated and their resource limits are not changed. As a result, the load cannot be balanced to the newly added nodes.

If scale-in activities are triggered based on resource usage, pods that request large amounts of resources but have low resource usage may be evicted. If the cluster contains a large number of such pods, cluster resources may be exhausted and some pods may fail to be scheduled.

How does the auto scaling model of Kubernetes fix these issues? Kubernetes provides a two-layer scaling model that decouples pod scheduling from resource scaling.

Pods are scaled based on resource usage. When pods cannot be scheduled due to insufficient resources and enter the Pending state, a node scale-out activity is triggered. After new nodes are added to the cluster, the pending pods are automatically scheduled to the newly added nodes, which balances the load of the application. The following information describes the auto scaling model of Kubernetes in detail:

cluster-autoscaler triggers auto scaling by detecting pending pods. When pods enter the Pending state due to insufficient resources, cluster-autoscaler simulates pod scheduling to decide which scaling group can provide nodes to host the pending pods. If a scaling group meets the requirements, nodes from this scaling group are added to the cluster.

A scaling group is treated as a node during the simulation. The instance type of the scaling group determines the CPU, memory, and GPU resources of the node, and the labels and taints of the scaling group are also applied to the node. cluster-autoscaler uses this simulated node to check whether the pending pods can be scheduled. If the pending pods can be scheduled to the node, cluster-autoscaler calculates the number of nodes that need to be added from the scaling group.
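
For example, assume that a scaling group is configured with the node label workload: general and no taints. The label key, the pod name, and the resource values in the following sketch are hypothetical and are used only for illustration. During the simulation, a pending pod similar to this sketch can be placed on the simulated node, so the scaling group qualifies for the scale-out activity:

    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-app                # hypothetical pod used only for illustration
    spec:
      nodeSelector:
        workload: general           # must match a node label of the scaling group
      containers:
      - name: app
        image: nginx:1.25           # placeholder image
        resources:
          requests:
            cpu: "2"                # the simulated node must provide at least this much CPU
            memory: 4Gi             # and this much memory, after system components are accounted for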

Only nodes that are added by scale-out activities can be removed by scale-in activities. Static nodes cannot be managed by cluster-autoscaler. Each node is evaluated separately to determine whether it needs to be removed. If the resource usage of a node drops below the scale-in threshold, a scale-in activity is triggered for the node. In this case, cluster-autoscaler simulates the eviction of all workloads on the node to determine whether the node can be completely drained. cluster-autoscaler does not drain nodes that run specific pods, such as non-DaemonSet pods in the kube-system namespace and pods that are protected by PodDisruptionBudgets (PDBs). A node is drained before it is removed, and the node can be removed only after the pods on it are evicted to other nodes.
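
For example, a PodDisruptionBudget similar to the following sketch can prevent cluster-autoscaler from draining a node if the eviction would reduce the number of available replicas of the application below minAvailable. The names and the replica count are hypothetical, and depending on your Kubernetes version the API version may be policy/v1 or policy/v1beta1:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: demo-app-pdb            # hypothetical name
      namespace: default
    spec:
      minAvailable: 2               # at least two replicas of the application must stay available
      selector:
        matchLabels:
          app: demo-app             # selects the pods that the budget protects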

Each scaling group is regarded as an abstract node. cluster-autoscaler selects a scaling group for auto scaling based on a policy that is similar to the scheduling policy. Scaling groups are first filtered based on the scheduling constraints of the pending pods, and the scaling groups that conform to policies such as affinity settings are then selected. If no scheduling policy or affinity settings are configured, cluster-autoscaler selects a scaling group based on the least-waste policy, which selects the scaling group that has the fewest idle resources after the simulation. If both a scaling group of regular nodes and a scaling group of GPU-accelerated nodes meet the requirements, the scaling group of regular nodes is selected by default.

The result of auto scaling is dependent on the following factors:

  • Whether the scheduling policy is met

    After you configure a scaling group, you must be aware of the pod scheduling policies that the scaling group can satisfy. If you are not sure, you can simulate a scaling activity by comparing the node selectors of the pending pods with the labels of the scaling group.

  • Whether resources are sufficient

    After the scaling simulation is complete, a scaling group is selected. However, the scale-out activity fails if the specified types of Elastic Compute Service (ECS) instances in the scaling group are out of stock. To improve the success rate of auto scaling, we recommend that you configure multiple instance types and multiple zones for the scaling group.

To accelerate auto scaling, you can use the following methods:

  • Method 1: Enable the swift mode. After a scaling group completes a scale-out activity and a scale-in activity, the swift mode takes effect for the scaling group and accelerates subsequent scaling activities.
  • Method 2: Use custom images that are created from the base image of Alibaba Cloud Linux 2 (formerly known as Aliyun Linux 2). This ensures that Infrastructure as a Service (IaaS) resources are delivered 50% faster.

Limits

  • For each account, the default CPU quota for pay-as-you-go instances in each region is 50 vCPUs, and you can add at most 48 custom route entries to each route table of a virtual private cloud (VPC). To increase a quota, submit a ticket.
  • The stock of ECS instances may be insufficient for auto scaling if you specify only one ECS instance type for a scaling group. We recommend that you specify multiple ECS instance types with the same specification for a scaling group. This increases the success rate of auto scaling.
  • In swift mode, when a node is shut down and reclaimed, the node stops running and enters the NotReady state. When a scale-out activity is triggered, the state of the node changes to Ready.
  • If a node is shut down and reclaimed in swift mode, you are charged only for the disks. This rule does not apply to nodes that use local disks, such as ecs.d1ne.2xlarge instances, for which you are also charged a compute fee. If the stock of nodes is sufficient, nodes can be launched within a short period of time.
  • If elastic IP addresses (EIPs) are bound to pods, we recommend that you do not delete the scaling group or the ECS nodes that are added from the scaling group in the ECS console. Otherwise, these EIPs cannot be automatically released.

Step 1: Go to the Configure Auto Scaling page

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, follow the instructions to navigate to the Configure Auto Scaling page.
    You can navigate to the Configure Auto Scaling page in the following ways:
    • Method 1: Find the cluster that you want to manage and choose More > Auto Scaling in the Actions column.
    • Method 2:
      1. Find the cluster that you want to manage and click Details in the Actions column.
      2. In the left-side navigation pane of the details page, choose Nodes > Node Pools.
      3. In the upper-right corner of the Node Pools page, click Configure Auto Scaling.

Step 2: Perform authorization

You must perform authorization in the following scenarios:

ACK has limited permissions on nodes in the cluster

If ACK has limited permissions on nodes in the cluster, assign the AliyunCSManagedAutoScalerRole Resource Access Management (RAM) role to ACK.
Note You need to perform the authorization only once for each Alibaba Cloud account.
  1. Activate Auto Scaling (ESS).
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
  2. Assign the RAM role.
    1. Click the second hyperlink in the dialog box.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Cloud Resource Access Authorization page, click Confirm Authorization Policy.

ACK has unlimited permissions on nodes in the cluster

  1. Activate ESS.
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the ESS console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the permissions of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      "ess:Describe*", 
      "ess:CreateScalingRule", 
      "ess:ModifyScalingGroup", 
      "ess:RemoveInstances", 
      "ess:ExecuteScalingRule", 
      "ess:ModifyScalingRule", 
      "ess:DeleteScalingRule", 
      "ecs:DescribeInstanceTypes",
      "ess:DetachInstances",
      "vpc:DescribeVSwitches"
      Note Before you add the policy content, append a comma (,) to the end of the last line of the existing content in the Action field.

An auto-scaling node pool in the cluster must be associated with an EIP

If you want to associate an auto-scaling group with an EIP, perform the following steps to grant permissions:

  1. Activate ESS.
    1. In the dialog box that appears, click the first hyperlink to log on to the ESS console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the ESS console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize ESS to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the ESS console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the permissions of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      "ecs:AllocateEipAddress",
      "ecs:AssociateEipAddress",
      "ecs:DescribeEipAddresses",
      "ecs:DescribeInstanceTypes",
      "ecs:DescribeInvocationResults",
      "ecs:DescribeInvocations",
      "ecs:ReleaseEipAddress",
      "ecs:RunCommand",
      "ecs:UnassociateEipAddress",
      "ess:CompleteLifecycleAction",
      "ess:CreateScalingRule",
      "ess:DeleteScalingRule",
      "ess:Describe*",
      "ess:DetachInstances",
      "ess:ExecuteScalingRule",
      "ess:ModifyScalingGroup",
      "ess:ModifyScalingRule",
      "ess:RemoveInstances",
      "vpc:AllocateEipAddress",
      "vpc:AssociateEipAddress",
      "vpc:DescribeEipAddresses",
      "vpc:DescribeVSwitches",
      "vpc:ReleaseEipAddress",
      "vpc:UnassociateEipAddress",
      "vpc:TagResources"
      Note Before you add the policy content, append a comma (,) to the end of the last line of the existing content in the Action field.
    5. On the Roles page, click the name of the worker RAM role. On the details page of the RAM role, click the Trust Policy Management tab and click Edit Trust Policy. In the Edit Trust Policy panel, add oos.aliyuncs.com to the Service field and click OK.

Step 3: Configure auto scaling

  1. On the Configure Auto Scaling page, set the following parameters and click Submit.
    • Clusters: The name of the cluster for which you want to enable auto scaling.
    • Allow Scale-in: Specifies whether nodes can be removed in scale-in activities. If you turn off this option, scale-in configurations do not take effect. Proceed with caution.
    • Scale-in Threshold: The ratio of the resources that are requested by pods on a node to the resource capacity of the node in a scaling group that is managed by cluster-autoscaler. A node can be removed from the cluster only if its actual ratio is lower than the threshold.
      Note A scale-out activity is automatically triggered based on pod scheduling. Therefore, you need to set only scale-in parameters.
    • GPU Scale-in Threshold: The scale-in threshold for GPU-accelerated nodes. A GPU-accelerated node can be removed from the Kubernetes cluster only if its actual ratio is lower than the threshold.
    • Defer Scale-in For: The amount of time to wait after the scale-in threshold is met and before the scale-in activity starts. Unit: minutes. Default value: 10.
    • Cooldown: The period during which nodes that are newly added by scale-out activities cannot be removed by scale-in activities.
    • Scan Interval: The interval at which the cluster is evaluated for scaling. Valid values: 15s, 30s, and 60s. Default value: 30s.
    • Node Pool Scaling Policy: The policy that is used to select the node pool to scale. Valid values: least-waste and random.
  2. Select an instance type. Supported instance types are regular instances, GPU-accelerated instances, and preemptible instances. Then, click Create Node Pool.
  3. In the Create Node Pool dialog box, set the node pool parameters.
    For more information, see Create a managed Kubernetes cluster. The following list describes some of the parameters:
    • Region: The region where you want to deploy the scaling group. The scaling group and the Kubernetes cluster must be deployed in the same region. You cannot change the region after the scaling group is created.
    • VPC: The scaling group and the Kubernetes cluster must be deployed in the same VPC.
    • vSwitch: The vSwitches of the scaling group. You can specify vSwitches that are deployed in different zones. The vSwitches allocate pod CIDR blocks to the scaling group.
    • Auto Scaling: Select the node type based on your requirements. You can select Regular Instance, GPU Instance, Shared GPU Instance, or Preemptible Instance. The selected node type must be the same as the node type that you selected when you created the cluster.
    • Instance Type: The instance types in the scaling group.
    • Selected Types: The instance types that you have selected. You can select at most 10 instance types.
    • System Disk: The system disk of the scaling group.
    • Mount Data Disk: Specifies whether to mount data disks to the scaling group. By default, no data disk is mounted.
    • Instances: The number of instances contained in the scaling group.
      Note
      • Existing instances in the cluster are excluded.
      • By default, the minimum number of instances is 0. If you specify one or more instances, the system adds the instances to the scaling group. When a scale-out activity is triggered, the instances in the scaling group are added to the cluster to which the scaling group is bound.
    • Key Pair: The key pair that is used to log on to the nodes in the scaling group. You can create key pairs in the ECS console.
      Note You can log on to the nodes only by using key pairs.
    • RDS Whitelist: The ApsaraDB RDS instances that can be accessed by the nodes in the scaling group after a scaling activity is triggered.
    • Node Label: The labels that are automatically added to the nodes that are added to the cluster by scale-out activities.
    • Scaling Mode: You can select Standard or Swift.
      • Standard: the standard mode. Auto scaling is implemented by creating and releasing ECS instances based on resource requests and usage.
      • Swift: the swift mode. Auto scaling is implemented by creating, stopping, and starting ECS instances. This mode accelerates scaling activities.
    • Taints: After you add taints to a node, pods are no longer scheduled to the node unless they have matching tolerations. For an example, see the sketch that follows these steps.
  4. Click Confirm Order to create the scaling group.
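
If you configure the Node Label and Taints parameters for the scaling group, pods that are expected to run on the newly added nodes need a matching node selector and toleration. The following sketch assumes a hypothetical node label workload: scaling and a hypothetical taint with the key dedicated, the value scaling, and the effect NoSchedule:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tolerating-app          # hypothetical pod used only for illustration
    spec:
      nodeSelector:
        workload: scaling           # matches the Node Label configured for the scaling group
      tolerations:
      - key: dedicated              # matches the taint key configured for the scaling group
        operator: Equal
        value: scaling              # matches the taint value
        effect: NoSchedule
      containers:
      - name: app
        image: nginx:1.25           # placeholder image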

Check the results

  1. On the Auto Scaling page, you can find the newly created scaling group below Regular Instance.
  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  3. In the left-side navigation pane of the details page, choose Workloads > Deployments.
  4. On the Deployments page, select the kube-system namespace. If the cluster-autoscaler component is displayed, the scaling group is created.

FAQ

  • Why does the auto scaling component fail to add nodes after a scale-out activity is triggered?
    Check whether the following situations exist:
    • The instance types in the scaling group cannot fulfill the resource request from pods. By default, system components are installed for each node. Therefore, the requested pod resources must be less than the resource capacity of the instance type.
    • The RAM role does not have the permissions to manage the Kubernetes cluster. You must configure RAM roles for each Kubernetes cluster that is involved in the scale-out activity.
  • Why does the auto scaling component fail to remove nodes after a scale-in activity is triggered?
    Check whether the following situations exist:
    • The ratio of the resources requested by the pods on the node to the resource capacity of the node is higher than the configured scale-in threshold.
    • Pods that belong to the kube-system namespace are running on the node.
    • A scheduling policy forces the pods to run on the current node. Therefore, the pods cannot be scheduled to other nodes.
    • PodDisruptionBudget is set for the pods on the node and the minimum value of PodDisruptionBudget is reached.

    For more information, see the FAQ of the open source cluster-autoscaler component.

  • How does the system choose a scaling group for a scaling activity?

    When pods cannot be scheduled to nodes, the auto scaling component simulates the scheduling of the pods based on the configurations of the scaling groups, including labels, taints, and instance specifications. If a scaling group meets the requirements, it is selected for the scale-out activity. If more than one scaling group meets the requirements, the system selects the scaling group that has the fewest idle resources after the simulation.