Container Service for Kubernetes (ACK) provides the auto scaling component (cluster-autoscaler) to automatically scale nodes. Regular instances, GPU-accelerated instances, and preemptible instances can be automatically added to or removed from an ACK cluster to meet your business requirements. The component supports multiple scaling modes and various instance types, works with instances that are deployed across zones, and is applicable to diverse scenarios.
How auto scaling works
The auto scaling model of Kubernetes is different from the traditional scaling model that is based on resource usage thresholds. Developers must understand the differences between the two models before they migrate workloads from traditional data centers or other orchestration systems to Kubernetes, for example, when they migrate workloads from Swarm clusters to ACK clusters.
The traditional scaling model is based on resource usage. For example, if a cluster contains three nodes and the CPU utilization or memory usage of the nodes exceeds the scaling threshold, new nodes are automatically added to the cluster. However, you must consider the following issues when you use the traditional scaling model:
In a cluster, hot nodes may have high resource usage while other nodes have low resource usage. If the average resource usage is used as the scaling threshold, scale-out activities may not be triggered in time because the hot nodes are averaged out. If the lowest node resource usage is used as the threshold, the newly added nodes may remain idle, which wastes resources.
In Kubernetes, a pod is the smallest unit in which an application runs on the nodes of a cluster. When auto scaling is triggered for a cluster or a node in the cluster, pods with high resource usage are neither replicated nor given higher resource limits. As a result, the application load cannot be balanced to the newly added nodes.
If scale-in activities are triggered based on resource usage, pods that request large amounts of resources but have low resource usage may be evicted. If a Kubernetes cluster contains many such pods, the allocatable resources of the remaining nodes may be exhausted and some pods may fail to be scheduled.
How does the auto scaling model of Kubernetes fix these issues? Kubernetes provides a two-layer scaling model that decouples pod scheduling from resource scaling.
At the pod layer, pods are scaled based on resource usage. At the resource layer, a scale-out activity is triggered when pods enter the Pending state because cluster resources are insufficient. After new nodes are added to the cluster, the pending pods are automatically scheduled to the newly added nodes, and the application load is balanced again. The following section describes the auto scaling model of Kubernetes in detail:
cluster-autoscaler is used to trigger auto scaling by detecting pending pods. When pods enter the Pending state due to insufficient resources, cluster-autoscaler simulates pod scheduling to decide the scaling group that can provide new nodes to accept the pending pods. If a scaling group meets the requirement, nodes from this scaling group are added to the cluster.
A scaling group is treated as a virtual node during the simulation. The instance type of the scaling group specifies the CPU, memory, and GPU resources of this node, and the labels and taints of the scaling group are also applied to it. cluster-autoscaler then simulates scheduling the pending pods to this node. If the pending pods can be scheduled to the node, cluster-autoscaler calculates the number of nodes that must be added from the scaling group.
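As a minimal sketch of this trigger, the following Deployment requests more CPU than the existing nodes can provide, so its pods stay in the Pending state and cluster-autoscaler starts the scheduling simulation described above. All names, the image, and the request values are illustrative only.

```yaml
# Example only: a Deployment whose pods request more CPU than the existing
# nodes can provide. The pods stay in the Pending state, which causes
# cluster-autoscaler to simulate scheduling against the configured scaling
# groups and add nodes from a group whose instance type can fit the requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-out-demo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: scale-out-demo
  template:
    metadata:
      labels:
        app: scale-out-demo
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:
            cpu: "2"        # requests, not actual usage, drive the simulation
            memory: 4Gi
```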
Only nodes added by scale-out activities can be removed in scale-in activities. Static nodes cannot be managed by cluster-autoscaler. Each node is separately evaluated to determine whether the node needs to be removed. If the resource usage of a node drops below the scale-in threshold, a scale-in activity is triggered for the node. In this case, cluster-autoscaler simulates the eviction of all workloads on the node to determine whether the node can be completely drained. cluster-autoscaler does not drain the nodes that contain specific pods, such as non-DaemonSet pods in the kube-system namespace and pods that are controlled by PodDisruptionBudgets (PDBs). A node is drained before it is removed. After pods on the node are evicted to other nodes, the node can be removed.
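The following is a minimal sketch of the PodDisruptionBudget case mentioned above: if evicting one of the selected pods would drop the number of available pods below minAvailable, cluster-autoscaler does not drain the node that runs them, even if the node's resource usage is below the scale-in threshold. The name, selector, and minAvailable value are placeholders.

```yaml
# Example only: a PodDisruptionBudget that can block scale-in.
# If the pods selected here run on an underutilized node and evicting one of
# them would drop the number of available pods below minAvailable,
# cluster-autoscaler does not remove that node.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```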
Each scaling group is regarded as an abstract node. cluster-autoscaler selects a scaling group based on a policy that is similar to the scheduling policy. Scaling groups are first filtered by the scheduling policy, and among the remaining groups, those that conform to policies such as affinity settings are selected. If no scheduling policy or affinity settings are configured, cluster-autoscaler selects a scaling group based on the least-waste policy, which selects the scaling group that has the fewest idle resources after the simulation. For example, if the pending pods request a total of 6 vCPUs, a scaling group whose instance type provides 8 vCPUs is preferred over one that provides 16 vCPUs. If a scaling group of regular nodes and a scaling group of GPU-accelerated nodes both meet the requirements, the scaling group of regular nodes is selected by default.
The result of auto scaling is dependent on the following factors:
- Whether the scheduling policy is met
After you configure a scaling group, make sure that you know which pod scheduling policies the scaling group can satisfy. If you are not sure, you can simulate a scaling activity by comparing the node selectors of the pending pods with the labels of the scaling group (a sketch is provided after this list).
- Whether resources are sufficient
After the scaling simulation is complete, a scaling group is selected. However, the scaling activity fails if the specified types of Elastic Compute Service (ECS) instances in the scaling group are out of stock. Therefore, you can configure multiple instance types and multiple zones for the scaling group to improve the success rate of auto scaling.
To accelerate auto scaling, you can use the following methods:
- Method 1: Enable the swift mode to accelerate auto scaling. After a scaling group experiences a scale-out activity and a scale-in activity, the swift mode is enabled for this scaling group.
- Method 2: Use custom images that are created from the base image of Alibaba Cloud Linux 2 (formerly known as Aliyun Linux 2). This ensures that the resources of Infrastructure as a Service (IaaS) are delivered 50% faster.
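The following sketch shows how the node selector of a pending pod must match a label of the scaling group for the scheduling simulation to succeed. The label key and value are assumed to be configured on the target scaling group and are purely illustrative.

```yaml
# Example only: the nodeSelector below matches a label that is assumed to be
# configured on the target scaling group. During the scheduling simulation,
# cluster-autoscaler applies the scaling group's labels to a virtual node,
# so only scaling groups that carry this label are considered for scale-out.
apiVersion: v1
kind: Pod
metadata:
  name: selector-demo
spec:
  nodeSelector:
    workload-type: batch     # hypothetical label configured on the scaling group
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
```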
Considerations
- For each account, the default CPU quota for pay-as-you-go instances in each region is 50 vCPUs. You can add at most 48 custom route entries to each route table of a virtual private cloud (VPC). To increase a quota, submit a ticket.
- The stock of ECS instances may be insufficient for auto scaling if you specify only one ECS instance type for a scaling group. We recommend that you specify multiple ECS instance types with the same specification for a scaling group. This increases the success rate of auto scaling.
- In swift mode, when a node is shut down and reclaimed, the node stops running and enters the NotReady state. When a scale-out activity is triggered, the state of the node changes to Ready.
- If a node is shut down and reclaimed in swift mode, you are charged only for the disks. This rule does not apply to nodes that use local disks, such as instances of the ecs.d1ne.2xlarge type, for which you are also charged a computing fee. If the stock of instances is sufficient, nodes can be launched within a short period of time.
- If elastic IP addresses (EIPs) are bound to pods, we recommend that you do not delete the scaling group or the ECS nodes that are added from the scaling group in the ECS console. Otherwise, these EIPs cannot be automatically released.
Step 1: Configure auto scaling
Step 2: Perform authorization
You must perform authorization in the following scenarios:
The cluster has limited permissions on nodes
- Activate Auto Scaling.
  - In the dialog box that appears, click the first hyperlink to log on to the Auto Scaling console.
  - Click Activate Auto Scaling to go to the Enable Service page.
  - Select the I agree with Auto Scaling Agreement of Service check box and click Enable Now.
  - On the Activated page, click Console to log on to the Auto Scaling console.
  - Click Go to Authorize to go to the Cloud Resource Access Authorization page. Then, authorize Auto Scaling to access other cloud resources.
  - Click Confirm Authorization Policy.
- Assign a RAM role to ACK.
  - In the dialog box that appears, click the first hyperlink to log on to the Auto Scaling console.
The cluster has unlimited permissions on nodes
The cluster needs to associate an auto scaling node pool with an EIP
If you want to associate a scaling group with an EIP, perform the following steps to grant permissions:
Step 3: Configure auto scaling
Expected result
FAQ
- Why does the auto scaling component fail to add nodes after a scale-out activity is triggered?
Check whether the following situations exist:
- The instance types in the scaling group cannot fulfill the resource requests of the pods. By default, system components are installed on each node. Therefore, the requested pod resources must be less than the resource capacity of the instance type (see the sketch after this list).
- The RAM role does not have the permissions to manage the Kubernetes cluster. You must configure RAM roles for each Kubernetes cluster that is involved in the scale-out activity.
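As an illustration of the first point, assume a scaling group whose instance type provides 4 vCPUs and 8 GiB of memory. Because system components reserve part of each node's capacity, a pod that requests the full capacity can never be scheduled even after a new node is added; the requests must leave headroom. All values below are assumptions.

```yaml
# Example only, assuming a scaling group whose instance type has 4 vCPUs / 8 GiB.
# The commented-out request can never fit because system components reserve part
# of the node's capacity; the smaller request below leaves room for them.
apiVersion: v1
kind: Pod
metadata:
  name: capacity-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        # cpu: "4"         # never schedulable on a 4-vCPU node
        cpu: "3"           # leaves headroom for system components
        memory: 4Gi
```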
- Why does the auto scaling component fail to remove nodes after a scale-in activity is triggered?
Check whether the following situations exist:
- The ratio of resources requested by the pods on the node is higher than the configured scale-in threshold.
- Pods that belong to the kube-system namespace are running on the node.
- A scheduling policy forces the pods to run on the current node, so the pods cannot be scheduled to other nodes (see the sketch below this list).
- A PodDisruptionBudget (PDB) is configured for pods on the node, and evicting a pod would violate the minimum value specified by the PDB.
For more information, see the FAQ of the open source cluster-autoscaler component.
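The following sketch illustrates the scheduling-policy case above: a required node affinity that binds a pod to one specific node prevents the pod from being evicted to another node, so cluster-autoscaler cannot drain that node. The node name is a placeholder.

```yaml
# Example only: a pod with a required node affinity bound to one specific node.
# Because the pod cannot be rescheduled to any other node, cluster-autoscaler
# cannot drain this node, and the scale-in activity skips it.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - cn-hangzhou.10.0.0.1    # placeholder node name
  containers:
  - name: app
    image: nginx:1.25
```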
- How does the system choose a scaling group for a scaling activity?
When pods cannot be scheduled to nodes, the auto scaling component simulates the scheduling of the pods based on the configurations of the scaling groups, including labels, taints, and instance specifications. If a scaling group meets the requirements, it is selected for the scale-out activity. If more than one scaling group meets the requirements, the system selects the scaling group that has the fewest idle resources after the simulation.