Container Service for Kubernetes (ACK) provides the auto scaling component (cluster-autoscaler) to automatically scale nodes. Regular instances, GPU-accelerated instances, and preemptible instances can be automatically added to or removed from an ACK cluster to meet your business requirements. This component supports multiple scaling modes, various instance types, and instances that are deployed across zones. This component is applicable to diverse scenarios.

How auto scaling works

The auto scaling model of Kubernetes is different from the traditional scaling model that is based on resource usage thresholds. Developers must understand the differences between the two scaling models before they migrate workloads from traditional data centers or other orchestration systems, such as Swarm clusters, to Kubernetes.

The traditional scaling model is based on resource usage. For example, if a cluster contains three nodes and the CPU utilization or memory usage of the nodes exceeds the scaling threshold, new nodes are automatically added to the cluster. However, you must consider the following issues when you use the traditional scaling model:

In a cluster, hot nodes may have high resource usage while other nodes have low resource usage. If the average resource usage is used as the scaling threshold, scale-out activities may not be triggered in a timely manner. If the lowest node resource usage is used as the scaling threshold, the newly added nodes may remain idle, which causes a waste of resources.

In Kubernetes, a pod is the smallest unit in which an application runs on the nodes of a cluster. When auto scaling is triggered for a cluster or a node in the cluster, pods with high resource usage are not replicated and the resource limits of these pods are not changed. As a result, the load cannot be shifted to the newly added nodes.

If scale-in activities are triggered based on resource usage, pods that request large amounts of resources but have low resource usage may be evicted. If a cluster contains many such pods, the remaining allocatable resources may be exhausted and some pods may fail to be scheduled.

How does the auto scaling model of Kubernetes fix these issues? Kubernetes provides a two-layer scaling model that decouples pod scheduling from resource scaling.

Pods are scaled based on resource usage. When pods enter the Pending state because resources are insufficient, a node scale-out activity is triggered. After new nodes are added to the cluster, the pending pods are automatically scheduled to the newly added nodes and the application load is balanced. The following section describes the auto scaling model of Kubernetes in detail:

cluster-autoscaler is used to trigger auto scaling by detecting pending pods. When pods enter the Pending state due to insufficient resources, cluster-autoscaler simulates pod scheduling to decide the scaling group that can provide new nodes to accept the pending pods. If a scaling group meets the requirement, nodes from this scaling group are added to the cluster.

A scaling group is treated as a node during the simulation. The instance type of the scaling group specifies the CPU, memory, and GPU resources of the node. The labels and taints of the scaling group are also applied to the node. The node is used to simulate the scheduling of the pending pods. If the pending pods can be scheduled to the node, cluster-autoscaler calculates the number of nodes that are required to be added from the scaling group.
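For example, a workload similar to the following sketch can trigger a scale-out activity. The name, image, and resource values are placeholders, not values from this topic. If the existing nodes cannot provide the requested CPU and memory, some replicas stay in the Pending state, and cluster-autoscaler checks whether the instance types of the configured scaling groups can host these pods before it adds nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-out-demo        # Placeholder name.
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-out-demo
  template:
    metadata:
      labels:
        app: scale-out-demo
    spec:
      containers:
      - name: app
        image: nginx:1.25     # Placeholder image.
        resources:
          requests:
            cpu: "2"          # Each replica requests 2 vCPUs. Replicas that cannot be
            memory: 4Gi       # scheduled enter the Pending state and trigger a scale-out.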

Only nodes added by scale-out activities can be removed in scale-in activities. Static nodes cannot be managed by cluster-autoscaler. Each node is separately evaluated to determine whether the node needs to be removed. If the resource usage of a node drops below the scale-in threshold, a scale-in activity is triggered for the node. In this case, cluster-autoscaler simulates the eviction of all workloads on the node to determine whether the node can be completely drained. cluster-autoscaler does not drain the nodes that contain specific pods, such as non-DaemonSet pods in the kube-system namespace and pods that are controlled by PodDisruptionBudgets (PDBs). A node is drained before it is removed. After pods on the node are evicted to other nodes, the node can be removed.
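For example, a PodDisruptionBudget similar to the following sketch (the names are placeholders) prevents cluster-autoscaler from draining a node if evicting the pods would leave fewer than the specified number of replicas running:

apiVersion: policy/v1          # Use policy/v1beta1 for Kubernetes versions earlier than 1.21.
kind: PodDisruptionBudget
metadata:
  name: web-pdb                # Placeholder name.
spec:
  minAvailable: 2              # At least two pods with the app=web label must keep running.
  selector:
    matchLabels:
      app: web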

Each scaling group is regarded as an abstract node. cluster-autoscaler selects a scaling group for auto scaling based on a policy that is similar to the scheduling policy. Candidate scaling groups are first filtered by the scheduling policy, and then the groups that conform to policies such as affinity settings are selected. If no scheduling policy or affinity settings are configured, cluster-autoscaler selects a scaling group based on the least-waste policy. The least-waste policy selects the scaling group that has the fewest idle resources after simulation. If both a scaling group of regular nodes and a scaling group of GPU-accelerated nodes meet the requirements, the scaling group of regular nodes is selected by default.

The result of auto scaling is dependent on the following factors:

  • Whether the scheduling policy is met

    After you configure a scaling group, you must understand which pod scheduling policies the scaling group can satisfy. If you are not sure, you can simulate a scaling activity by comparing the node selectors of the pending pods with the labels of the scaling group (see the example pod after this list).

  • Whether resources are sufficient

    After the scaling simulation is complete, a scaling group is selected. However, the scaling activity fails if the specified types of Elastic Compute Service (ECS) instances in the scaling group are out of stock. Therefore, you can configure multiple instance types and multiple zones for the scaling group to improve the success rate of auto scaling.

To accelerate auto scaling, you can use the following methods:

  • Method 1: Enable the swift mode. The swift mode takes effect for a scaling group after the scaling group completes a scale-out activity and a scale-in activity.
  • Method 2: Use custom images that are created from the base image of Alibaba Cloud Linux 2 (formerly known as Aliyun Linux 2). This ensures that Infrastructure as a Service (IaaS) resources are delivered 50% faster.
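The following sketch illustrates the scheduling simulation described above. The label key, label value, pod name, and image are placeholders. The pod can be scheduled to nodes from a scaling group only if a matching node label is configured for the node pool that has auto scaling enabled:

apiVersion: v1
kind: Pod
metadata:
  name: selector-demo            # Placeholder name.
spec:
  nodeSelector:
    workload-type: compute       # Must match a node label of the auto scaling node pool.
  containers:
  - name: app
    image: nginx:1.25            # Placeholder image.
    resources:
      requests:
        cpu: "1"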

Considerations

  • For each account, the default CPU quota for pay-as-you-go instances in each region is 50 vCPUs. You can add at most 200 custom route entries to each route table of a virtual private cloud (VPC). To apply for a quota increase, submit a ticket.
  • The stock of ECS instances may be insufficient for auto scaling if you specify only one ECS instance type for a scaling group. We recommend that you specify multiple ECS instance types with the same specification for a scaling group. This increases the success rate of auto scaling.
  • In swift mode, when a node is shut down and reclaimed, the node stops running and enters the NotReady state. When a scale-out activity is triggered, the state of the node changes to Ready.
  • If a node is shut down and reclaimed in swift mode, you are charged only for the disks. This rule does not apply to nodes that use local disks, such as ecs.d1ne.2xlarge instances, for which you are also charged a computing fee. If the stock of nodes is sufficient, nodes can be launched within a short period of time.
  • If elastic IP addresses (EIPs) are bound to pods, we recommend that you do not delete the ECS nodes that are added from the scaling group in the ECS console. Otherwise, these EIPs cannot be automatically released.
  • Auto Scaling can recognize node labels and taints only after they are mapped to scaling group tags. Because the number of tags that can be added to a scaling group is limited, make sure that the total number of ECS tags, taints, and node labels configured for a node pool that has auto scaling enabled is smaller than 12.

Step 1: Go to the Configure Auto Scaling page

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, find the cluster that you want to manage and choose More > Auto Scaling in the Actions column.

Step 2: Perform authorization

You must perform authorization in the following scenarios:

The cluster has limited permissions on nodes

If ACK has limited permissions on nodes in the cluster, assign the AliyunCSManagedAutoScalerRole Resource Access Management (RAM) role to ACK.
Note You need to perform the authorization only once for each Alibaba Cloud account.
  1. Activate Auto Scaling.
    Note The following steps are for reference only. Follow the instructions on the page.
    1. In the dialog box that appears, click the first hyperlink to log on to the Auto Scaling console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the Auto Scaling console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize Auto Scaling to access other cloud resources.
    6. Click Confirm Authorization Policy.
  2. Assign a RAM role to ACK.
    1. Click the second hyperlink in the dialog box.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Cloud Resource Access Authorization page, click Confirm Authorization Policy.

The cluster has unlimited permissions on nodes

  1. Activate Auto Scaling.
    Note The following steps are for reference only. Follow the actual instructions on the page.
    1. In the dialog box that appears, click the first hyperlink to log on to the Auto Scaling console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the Auto Scaling console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize Auto Scaling to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the Auto Scaling console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the policy of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      Note Before you add the content, add a comma (,) to the end of the last line in the existing Action field.
      • ACK managed clusters:
        "ess:Describe*", 
        "ess:CreateScalingRule", 
        "ess:ModifyScalingGroup", 
        "ess:RemoveInstances", 
        "ess:ExecuteScalingRule", 
        "ess:ModifyScalingRule", 
        "ess:DeleteScalingRule", 
        "ecs:DescribeInstanceTypes",
        "ecs:DescribeImages",
        "ess:DetachInstances",
        "vpc:DescribeVSwitches"
      • ACK dedicated clusters:
        "ess:Describe*", 
        "ess:CreateScalingRule", 
        "ess:ModifyScalingGroup", 
        "ess:RemoveInstances", 
        "ess:ExecuteScalingRule", 
        "ess:ModifyScalingRule", 
        "ess:DeleteScalingRule", 
        "ecs:DescribeInstanceTypes",
        "ecs:DescribeImages",
        "ess:DetachInstances",
        "ess:ScaleWithAdjustment",
        "vpc:DescribeVSwitches"

Clusters whose scaling groups must be associated with EIPs

If you want to associate a scaling group with an EIP, perform the following steps to grant permissions:

  1. Activate Auto Scaling.
    Note The following steps are for reference only. Follow the actual instructions on the page.
    1. In the dialog box that appears, click the first hyperlink to log on to the Auto Scaling console.
    2. Click Activate Auto Scaling to go to the Enable Service page.
    3. Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
    4. On the Activated page, click Console to log on to the Auto Scaling console.
    5. Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize Auto Scaling to access other cloud resources.
    6. Click Confirm Authorization Policy.
    If the authorization is successful, you are redirected to the Auto Scaling console. Close the page and modify the permissions of the worker RAM role.
  2. Modify the policy of the worker RAM role.
    1. Click the second hyperlink in the dialog box to go to the RAM Roles page.
      Note This step requires you to log on to the console with an Alibaba Cloud account.
    2. On the Permissions tab, click the name of the policy assigned to the RAM role. The details page of the policy appears.
    3. Click Modify Policy Document. The Modify Policy Document panel appears on the right side of the page.
    4. In the Policy Document section, add the following policy content to the Action field and click OK.
      "ecs:AllocateEipAddress",
      "ecs:AssociateEipAddress",
      "ecs:DescribeEipAddresses",
      "ecs:DescribeInstanceTypes",
      "ecs:DescribeInvocationResults",
      "ecs:DescribeInvocations",
      "ecs:ReleaseEipAddress",
      "ecs:RunCommand",
      "ecs:UnassociateEipAddress",
      "ess:CompleteLifecycleAction",
      "ess:CreateScalingRule",
      "ess:DeleteScalingRule",
      "ess:Describe*",
      "ess:DetachInstances",
      "ess:ExecuteScalingRule",
      "ess:ModifyScalingGroup",
      "ess:ModifyScalingRule",
      "ess:RemoveInstances",
      "vpc:AllocateEipAddress",
      "vpc:AssociateEipAddress",
      "vpc:DescribeEipAddresses",
      "vpc:DescribeVSwitches",
      "vpc:ReleaseEipAddress",
      "vpc:UnassociateEipAddress",
      "vpc:TagResources"
      Note Before you add the policy content, add a comma (,) to the end of the last line in the existing Action field.
    5. On the Roles page, click the name of the worker RAM role. On the details page of the RAM role, click the Trust Policy Management tab and click Edit Trust Policy. In the Edit Trust Policy panel, add oos.aliyuncs.com to the Service field. Then, click OK.

Step 3: Configure auto scaling

  1. On the Configure Auto Scaling page, set the following parameters and click Submit.
    Allow Scale-in: Specify whether to allow the scale-in of nodes. If you turn off this option, scale-in configurations do not take effect. Proceed with caution.
    Scale-in Threshold: For a scaling group that is managed by cluster-autoscaler, set the value to the ratio of the resources requested by pods on a node to the total resources of the node. A node is removed from the cluster only if the actual ratio is lower than the threshold. For example, if the pods on a node that provides 4 vCPUs request a total of 1 vCPU, the ratio is 25%, which is lower than a 50% threshold.
    Note In auto scaling, scale-out activities are automatically triggered based on pod scheduling. Therefore, you need to set only scale-in parameters.
    • For nodes other than GPU-accelerated nodes, a scale-in activity is triggered only if all of the following conditions are met:
      • The ratio of the requested resources per node to the total resources per node in the scaling group managed by cluster-autoscaler is lower than the value of the Scale-in Threshold parameter.
      • The waiting period specified in the Defer Scale-in For parameter ends.
      • The amount of time that the system waits after performing a scale-out activity exceeds the value specified in the Cooldown parameter.
    • For GPU-accelerated nodes, a scale-in activity is triggered only if all of the following conditions are met:
      • The ratio of the requested resources per node to the total resources per node in the scaling group managed by cluster-autoscaler is lower than the value of the GPU Scale-in Threshold parameter.
      • The waiting period specified in the Defer Scale-in For parameter ends.
      • The amount of time that the system waits after performing a scale-out activity exceeds the value specified in the Cooldown parameter.
    GPU Scale-in Threshold: The scale-in threshold for GPU-accelerated nodes. GPU-accelerated nodes can be removed from the Kubernetes cluster only if the actual value is lower than the threshold.
    Defer Scale-in For: The time to wait after the scale-in threshold is reached and before the scale-in activity starts. Unit: minutes. The default value is 10 minutes.
    Cooldown: After the system performs a scale-out activity, the system waits for a cooldown period to end before it can perform scale-in activities. The system cannot perform scale-in activities within the cooldown period but can still check whether the nodes meet the scale-in conditions. After the cooldown period ends, if a node meets the scale-in conditions and the waiting period specified in the Defer Scale-in For parameter ends, the node is removed.

    For example, the Cooldown parameter is set to 10 minutes and the Defer Scale-in For parameter is set to 5 minutes. The system cannot perform scale-in activities within the 10-minute cooldown period after performing a scale-out activity. However, the system can still check whether the nodes meet the scale-in conditions within the cooldown period. When the cooldown period ends, nodes that meet the scale-in conditions are removed after 5 minutes.

    Scan Interval: The interval at which the cluster is evaluated for auto scaling.
    Node Pool Scale-out Policy:
    • least-waste: The default selection policy. If multiple node pools meet the requirement, this policy selects the node pool that will have the least idle resources after the scale-out activity is completed.
    • random: The random selection policy. If multiple node pools meet the requirement, this policy selects a random node pool for the scale-out activity.
    • priority: The priority-based selection policy. If multiple node pools meet the requirement, this policy selects the node pool with the highest priority for the scale-out activity.

    The priorities of node pools are configured in the cluster-autoscaler-priority-expander ConfigMap in the kube-system namespace. When a scale-out activity is triggered, the policy obtains the node pool priority from the ConfigMap by node pool ID and then selects the node pool with the highest priority for the scale-out activity.

    Note
    • The priority must be an integer from 1 to 100.
    • If you use the priority-based selection policy, you must configure the priorities for the node pools that have auto scaling enabled. You can specify only one priority value for a node pool ID.
    • Node pools that meet the requirement but are not configured with priorities in the ConfigMap are not selected for the scale-out activity. Therefore, if you use the priority-based selection policy, you must configure priorities for all node pools that have auto scaling enabled in the cluster-autoscaler-priority-expander ConfigMap.
    The following code block is an example:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-autoscaler-priority-expander
      namespace: kube-system
    data:
      priorities: |-
        10:
          - asg-1largeid     # Set the priority of asg-1largeid to 10. 
          - asg-2largeid     # Set the priority of asg-2largeid to 10. 
        50:
          - asg-3largeid     # Set the priority of asg-3largeid to 50. 

    For example, asg-2largeid and asg-3largeid meet the requirement. In this case, asg-3largeid has a higher priority and is selected for the scale-out activity.

  2. Select an instance type. Supported instance types are regular instances, GPU-accelerated instances, and preemptible instances. Then, click Create Node Pool.
  3. In the Create Node Pool dialog box, set the node pool parameters.
    For more information about the parameters, see Create an ACK managed cluster. The following table describes some of the parameters.
    Region: The region where you want to deploy the scaling group. The scaling group and the Kubernetes cluster must be deployed in the same region. You cannot change the region after the scaling group is created.
    VPC: The scaling group and the Kubernetes cluster must be deployed in the same VPC.
    vSwitch: The vSwitches of the scaling group. You can specify vSwitches of different zones. The vSwitches allocate pod CIDR blocks to the scaling group.
    Auto Scaling: Select the node type based on your requirements. You can select Regular Instance, GPU Instance, Shared GPU Instance, or Preemptible Instance. The selected node type must be the same as the node type that you select when you create the cluster.
    Instance Type: The instance types that are used by the scaling group.
    Selected Types: The instance types that you have selected. You can select at most 10 instance types.
    System Disk: The system disk of the scaling group.
    Mount Data Disk: Specify whether to mount data disks to the scaling group. By default, no data disk is mounted.
    Instances: The number of instances contained in the scaling group.
    Note
    • Existing instances in the cluster are excluded.
    • By default, the minimum number of instances is 0. If you specify one or more instances, the system adds the instances to the scaling group. When a scale-out activity is triggered, the instances in the scaling group are added to the cluster with which the scaling group is associated.
    Operating System: When you enable auto scaling, you can select an image based on Alibaba Cloud Linux, CentOS, Windows, or Windows Core.
    Note If you select an image based on Windows or Windows Core, the system automatically adds the taint { effect: 'NoSchedule', key: 'os', value: 'windows' } to nodes in the scaling group.
    Key Pair: The key pair that is used to log on to the nodes in the scaling group. You can create key pairs in the ECS console.
    Note You can log on to the nodes only by using key pairs.
    RDS Whitelist: The ApsaraDB RDS instances that can be accessed by the nodes in the scaling group after a scaling activity is triggered.
    Node Label: Node labels are automatically added to nodes that are added to the cluster by scale-out activities.
    Scaling Policy:
    • Priority: The system scales the node pool based on the priorities of the vSwitches that you select for the node pool. The vSwitches that you select are displayed in descending order of priority. If Auto Scaling fails to create ECS instances in the zone of the vSwitch with the highest priority, Auto Scaling attempts to create ECS instances in the zone of the vSwitch with a lower priority.
    • Cost Optimization: The system creates instances based on the vCPU unit prices in ascending order. Preemptible instances are preferentially created when multiple preemptible instance types are specified in the scaling configurations. If preemptible instances cannot be created due to reasons such as insufficient stocks, the system attempts to create pay-as-you-go instances.

      If you select Preemptible Instance for the Billing Method parameter, you must set the following parameters:

      • Percentage of Pay-as-you-go Instances: Specify the percentage of pay-as-you-go instances in the node pool. Valid values: 0 to 100.
      • Enable Supplemental Preemptible Instances: After you enable this feature, the system notifies Auto Scaling 5 minutes before it reclaims existing preemptible instances, and Auto Scaling automatically creates the same number of new preemptible instances to replace them.
      • Enable Supplemental Pay-as-you-go Instances: After you enable this feature, Auto Scaling attempts to create pay-as-you-go ECS instances to meet the scaling requirement if Auto Scaling fails to create preemptible instances for reasons such as that the unit price is too high or preemptible instances are out of stock.
    • Distribution Balancing: The even distribution policy takes effect only when you select multiple vSwitches. This policy ensures that ECS instances are evenly distributed among the zones (the vSwitches) of the scaling group. If ECS instances are unevenly distributed across the zones due to reasons such as insufficient stocks, you can perform a rebalancing operation.
    Scaling Mode: You can select Standard or Swift.
    • Standard: the standard mode. Auto scaling is implemented by creating and releasing ECS instances based on resource requests and usage.
    • Swift: the swift mode. Auto scaling is implemented by creating, stopping, and starting ECS instances. This mode accelerates scaling activities.
      Note If a stopped ECS instance fails to be restarted in swift mode, the ECS instance is not released. You can manually release the ECS instance.
    Taints: After you add taints to a node, ACK no longer schedules pods to the node unless the pods tolerate the taints. A toleration example is provided after the following step.
  4. Click Confirm Order to create the scaling group.
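If taints are configured for the node pool, or if you select a Windows image and the os=windows taint is automatically added, pods must tolerate the taints before they can be scheduled to the new nodes. The following sketch shows a toleration for the Windows taint described above; the pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: windows-demo                 # Placeholder name.
spec:
  nodeSelector:
    kubernetes.io/os: windows        # Schedule the pod only to Windows nodes.
  tolerations:
  - key: "os"                        # Tolerate the taint that ACK adds to Windows nodes.
    operator: "Equal"
    value: "windows"
    effect: "NoSchedule"
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019   # Placeholder image.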

Expected result

  1. On the Configure Auto Scaling page, you can find the newly created scaling group below Regular Instance.
  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  3. In the left-side navigation pane of the details page, choose Workloads > Deployments.
  4. On the Deployments page, select the kube-system namespace. If the cluster-autoscaler component is displayed, the scaling group is created.

FAQ

  • Why does the auto scaling component fail to add nodes after a scale-out activity is triggered?
    Check whether the following situations exist:
    • The instance types in the scaling group cannot fulfill the resource requests of the pods. Some of the resources provided by an ECS instance type are reserved or occupied by the system, such as resources reserved for virtualization and for node components. Therefore, the allocatable resources of a node are less than the specifications of the instance type.
    • Cross-zone scale-out activities cannot be triggered for pods that have limits on zones.
    • The RAM role does not have the permissions to manage the Kubernetes cluster. You must configure RAM roles for each Kubernetes cluster that is involved in the scale-out activity. For more information about the authorization, see Step 2: Perform authorization.
  • Why does the auto scaling component fail to remove nodes after a scale-in activity is triggered?
    Check whether the following situations exist:
    • The ratio of the resources requested on each node to the total resources of the node is higher than the specified scale-in threshold.
    • Pods that belong to the kube-system namespace are running on the node.
    • A scheduling policy forces the pods to run on the current node. Therefore, the pods cannot be scheduled to other nodes.
    • PodDisruptionBudget is set for the pods on the node and the minimum value of PodDisruptionBudget is reached.

    For more information, see the FAQ of the open source cluster-autoscaler component.

  • How does the system choose a scaling group for a scaling activity?

    When pods cannot be scheduled to nodes, the auto scaling component simulates the scheduling of the pods based on the configurations of scaling groups. The configurations include labels, taints, and instance specifications. If a scaling group meets the requirements, this scaling group is selected for the scale-out activity. If more than one scaling group meet the requirements, the system selects the scaling group that has the fewest idle resources after simulation.

  • What types of pods can prevent cluster-autoscaler from removing nodes?