Before you begin
To make better use of the node scaling solutions of ACK and choose the solution that best suits your business, we recommend that you read this topic before you enable the node scaling feature.
Before you read this topic, we recommend that you familiarize yourself with the concepts of manual scaling, auto scaling, horizontal scaling, and vertical scaling. For more information, see the Kubernetes official documentation.
How it works
Node scaling in Kubernetes differs from the traditional scaling model that is based on resource utilization thresholds. After you migrate your business from data centers or other orchestration systems to Kubernetes, you typically need to rethink how node scaling is triggered, because of the following issues.
How are scaling thresholds determined?
The resource utilization of hot nodes in a cluster is usually higher than that of other nodes.
If resource scaling is triggered based on the average resource utilization of nodes in the cluster, the high utilization of hot nodes is averaged out by the other nodes. For example, if one node runs at 90% utilization while the other nine nodes run at 20%, the average utilization is only 27%, so a 70% scale-out threshold is never reached. Consequently, resources cannot be scaled out promptly when the resource utilization of hot nodes exceeds the threshold.
If resource scaling is triggered based on the highest resource utilization among nodes, the cluster is scaled out even when most nodes are idle. The resulting resource waste adversely affects the entire cluster.
How are loads reduced after nodes are added?
In a Kubernetes cluster, pods are the smallest deployable units for applications, and pods are deployed on different nodes. When the resource utilization of a pod is high, scaling out the node or the cluster that hosts the pod changes neither the number of pods deployed for the application nor the resource limits of the pods. In this case, the load on the node cannot be shifted to the newly added nodes.
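For example, to let newly added nodes actually take over load, you must also scale the workload itself, such as with a HorizontalPodAutoscaler. The following is a minimal sketch and is not part of the node scaling feature; the Deployment name web-app and the 70% CPU target are hypothetical values for illustration.
```yaml
# Minimal HorizontalPodAutoscaler sketch. The Deployment name "web-app" and
# the 70% CPU utilization target are hypothetical values for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
When the HorizontalPodAutoscaler increases the replica count and the new pods cannot be scheduled on existing nodes, node scaling can then add nodes to host them.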
How is node scaling triggered and performed?
If resource scale-in is triggered based on resource utilization alone, pods that request large amounts of resources but have low actual usage may be evicted, because the nodes that host them appear underutilized. When a cluster contains a large number of such pods, their requests can exhaust the schedulable resources in the cluster. Consequently, some pods become unschedulable.
How are scale-out activities triggered?
The node scaling model listens for pods that fail to be scheduled in order to determine whether a scale-out activity is needed. When pods fail to be scheduled due to insufficient resources, the node scaling model starts a scheduling simulation, selects a node pool that has the auto scaling feature enabled and can provide the required resources to host these pods, and adds nodes from that node pool to the cluster.
Note: The scheduling simulation treats each node pool with the auto scaling feature enabled as an abstracted node. The instance types specified in the configuration of the node pool are abstracted into the CPU capacity, memory capacity, and GPU capacity of the node. In addition, the labels and taints of the node pool are mapped to the labels and taints of the node. The scheduler adds the abstracted node to the schedulable list during the scheduling simulation. When the scheduling conditions are met, the scheduler calculates the required number of nodes and adds nodes in the node pool to the cluster.
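As an illustration, the following Deployment is a minimal sketch of a workload that can trigger a scale-out activity; the name scale-out-demo, the image, and the resource requests are hypothetical. If no existing node can satisfy the pod requests, the pods remain in the Pending state with a FailedScheduling event, which is the signal that the node scaling model listens for.
```yaml
# Hypothetical Deployment used to illustrate a scale-out trigger. Pods that
# cannot fit on existing nodes remain Pending, and the node scaling model
# simulates scheduling them onto a node pool with auto scaling enabled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-out-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-out-demo
  template:
    metadata:
      labels:
        app: scale-out-demo
    spec:
      containers:
        - name: app
          image: nginx:1.25
          resources:
            requests:
              cpu: "2"      # Requests, not actual usage, drive the simulation.
              memory: 4Gi
```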
How are scale-in activities triggered?
The node scaling model scales in nodes only in node pools that have the auto scaling feature enabled. It cannot manage static nodes, such as nodes that do not belong to a node pool with the auto scaling feature enabled. The node scaling model matches each node against the scale-in conditions. When the resource utilization of a node drops below the scale-in threshold, a scale-in activity is triggered. The node scaling model then simulates pod eviction on the node to check whether the node can be fully drained. Nodes that host non-DaemonSet pods in the kube-system namespace or pods protected by a PodDisruptionBudget are skipped, and other candidate nodes are evaluated instead. A node is drained before it is removed: only after the pods on the node are evicted to other nodes is the node removed.
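For example, a PodDisruptionBudget can prevent a node from being drained during scale-in. The following is a minimal sketch; the critical-api label and the minAvailable value are hypothetical.
```yaml
# Minimal PodDisruptionBudget sketch. A node that hosts pods matched by this
# selector is skipped during scale-in if evicting them would reduce the
# number of available replicas below minAvailable.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-api
```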
How can I improve the success rate of auto scaling?
The success rate of auto scaling depends on the following factors:
Whether the scheduling conditions are met
After you create a node pool with the auto scaling feature enabled, confirm the pod scheduling policy that matches the node pool. If you cannot confirm the policy, configure a nodeSelector that matches the label of the node pool and perform a scheduling simulation, as shown in the sketch below.
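The following pod spec is a minimal sketch of such a scheduling simulation. The workload-type: auto-scaling label is a hypothetical example; replace it with the label configured on your node pool with the auto scaling feature enabled.
```yaml
# Minimal scheduling simulation sketch. The "workload-type: auto-scaling"
# label is a hypothetical node pool label; replace it with your own.
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-simulation-demo
spec:
  nodeSelector:
    workload-type: auto-scaling
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
```
If the node pool also configures taints, add matching tolerations to the pod spec so that the scheduling simulation can succeed.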
Whether resources are sufficient
After the scheduling simulation is complete, the system automatically selects a node pool with the auto scaling feature enabled and adds nodes from the node pool to the cluster. However, the inventory of the Elastic Compute Service (ECS) instance types specified in the node pool configuration affects the success rate of the scale-out activity. Therefore, we recommend that you specify multiple instance types across multiple zones to improve the success rate.
How can I accelerate auto scaling?
Method 1: Use the swift mode to accelerate auto scaling. After a node pool with the auto scaling feature enabled warms up by completing a scale-out activity and a scale-in activity, the node pool runs in swift mode. For more information, see Enable node auto scaling.
Method 2: Use a custom image based on Alibaba Cloud Linux 3 to improve the efficiency of resource delivery at the Infrastructure as a Service (IaaS) layer by 50%. For more information, see Create custom images.
Node scaling solution
The node scaling model scales resources at the resource layer. When the size of a cluster cannot meet pod scheduling requirements, this model automatically scales node resources to provide additional scheduling capacity. Node scaling is managed by the cluster-autoscaler component, which periodically polls and maintains the cluster state to identify conditions that meet scaling requirements, thereby automatically adjusting the number of cluster nodes.
Scaling speed and efficiency
A scaling activity takes about 60 seconds in standard mode and about 50 seconds in swift mode.
Auto scaling encounters a performance bottleneck when the duration of a scaling activity reaches 1 minute. The efficiency of auto scaling fluctuates with the number of node pools and the scaling scenario. For example, if the cluster contains more than 100 node pools, the duration of a scaling activity increases to 100 to 150 seconds.
Because the node scaling model uses a polling mechanism and depends on the maintenance of the cluster state, the minimum scaling latency is 5 seconds.
Usage notes
Quotas and limits
You can add up to 200 custom route entries to a route table of a virtual private cloud (VPC). To increase the quota limit, log on to the Quota Center console and submit an application. For more information about the quotas of other resources and how to increase the quota limits, see the Dependent cloud service quotas section of the "Quotas and limits" topic.
We recommend that you properly configure the maximum number of nodes in a node pool with the auto scaling feature enabled. Make sure that the dependent resources and quotas, such as VPC CIDR blocks and vSwitch IP addresses, are sufficient for the specified maximum number of nodes. Otherwise, scale-out activities may fail. For more information about the maximum number of nodes supported by a node pool with the auto scaling feature enabled, see Enable node auto scaling. For more information about how to plan an ACK network, see Plan the network of an ACK cluster.
The node scaling feature does not support subscription nodes. If you want to create a node pool with the auto scaling feature enabled, do not set the billing method of the node pool to subscription. If you want to enable the auto scaling feature for an existing node pool, make sure that the node pool does not have subscription nodes.
Maintenance of dependent resources
If elastic IP addresses (EIPs) are associated with ECS nodes that are added by the node scaling feature, do not delete the ECS instances directly in the ECS console. Otherwise, the associated EIPs cannot be automatically released.
Further reading
If you encounter issues when you use node scaling, refer to FAQs of node scaling for troubleshooting.