Before you begin
To make better use of the node scaling solutions of ACK and choose the solution that best suits your business, we recommend that you read this topic before you enable the node scaling feature.
Before you read this topic, we recommend that you familiarize yourself with the concepts of manual scaling, auto scaling, horizontal scaling, and vertical scaling. For more information, see the Kubernetes official documentation.
How it works
Node scaling in Kubernetes differs from the traditional scaling model that is based on resource utilization thresholds. After you migrate your business from data centers or other orchestration systems to Kubernetes, you typically need to rethink how node scaling is triggered, because of the following issues.
How are scaling thresholds determined?
The resource utilization of hot nodes in a cluster is usually higher than that of other nodes.
If resource scaling is triggered based on the average resource utilization of nodes in the cluster, the high utilization of hot nodes is averaged out by the other nodes. For example, if one node runs at 90% utilization while the other nine nodes run at 20%, the average utilization is only 27%, so a 70% scale-out threshold is never reached. Consequently, resources cannot be scaled out promptly when the resource utilization of hot nodes exceeds the threshold.
If resource scaling is triggered based on the highest resource utilization among nodes, the cluster is scaled out even when most nodes are idle. The resulting resource waste adversely affects the entire cluster.
How are loads reduced after nodes are added?
In a Kubernetes cluster, pods are the smallest deployable units for applications, and pods are deployed on different nodes. When the resource utilization of a pod is high, scaling out the node or the cluster that hosts the pod changes neither the number of pods deployed for the application nor the resource limits of the pods. In this case, the load on the node cannot be shifted to the newly added nodes.
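For example, to let newly added nodes actually take over load, you must also scale the workload itself, such as with a HorizontalPodAutoscaler. The following is a minimal sketch and is not part of the node scaling feature; the Deployment name web-app and the 70% CPU target are hypothetical values for illustration.
```yaml
# Minimal HorizontalPodAutoscaler sketch. The Deployment name "web-app" and
# the 70% CPU utilization target are hypothetical values for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
When the HorizontalPodAutoscaler increases the replica count and the new pods cannot be scheduled on existing nodes, node scaling can then add nodes to host them.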
How is node scaling triggered and performed?
If resource scale-in is triggered based on resource utilization alone, pods that request large amounts of resources but have low actual usage may be evicted, because the nodes that host them appear underutilized. When a cluster contains a large number of such pods, their requests can exhaust the schedulable resources in the cluster. Consequently, some pods become unschedulable.
How are scale-out activities triggered?
The node scaling model listens for pods that fail to be scheduled in order to determine whether a scale-out activity is needed. When pods fail to be scheduled due to insufficient resources, the node scaling model starts a scheduling simulation, selects a node pool that has the auto scaling feature enabled and can provide the required resources to host these pods, and adds nodes from that node pool to the cluster.
Note: The scheduling simulation treats each node pool with the auto scaling feature enabled as an abstracted node. The instance types specified in the configuration of the node pool are abstracted into the CPU capacity, memory capacity, and GPU capacity of the node. In addition, the labels and taints of the node pool are mapped to the labels and taints of the node. The scheduler adds the abstracted node to the schedulable list during the scheduling simulation. When the scheduling conditions are met, the scheduler calculates the required number of nodes and adds nodes in the node pool to the cluster.
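As an illustration, the following Deployment is a minimal sketch of a workload that can trigger a scale-out activity; the name scale-out-demo, the image, and the resource requests are hypothetical. If no existing node can satisfy the pod requests, the pods remain in the Pending state with a FailedScheduling event, which is the signal that the node scaling model listens for.
```yaml
# Hypothetical Deployment used to illustrate a scale-out trigger. Pods that
# cannot fit on existing nodes remain Pending, and the node scaling model
# simulates scheduling them onto a node pool with auto scaling enabled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-out-demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-out-demo
  template:
    metadata:
      labels:
        app: scale-out-demo
    spec:
      containers:
        - name: app
          image: nginx:1.25
          resources:
            requests:
              cpu: "2"      # Requests, not actual usage, drive the simulation.
              memory: 4Gi
```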
How are scale-in activities triggered?
The node scaling model scales in nodes only in node pools that have the auto scaling feature enabled. It cannot manage static nodes, such as nodes that do not belong to a node pool with the auto scaling feature enabled. The node scaling model matches each node against the scale-in conditions. When the resource utilization of a node drops below the scale-in threshold, a scale-in activity is triggered. The node scaling model then simulates pod eviction on the node to check whether the node can be fully drained. Nodes that host non-DaemonSet pods in the kube-system namespace or pods protected by a PodDisruptionBudget are skipped, and other candidate nodes are evaluated instead. A node is drained before it is removed: only after the pods on the node are evicted to other nodes is the node removed.
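For example, a PodDisruptionBudget can prevent a node from being drained during scale-in. The following is a minimal sketch; the critical-api label and the minAvailable value are hypothetical.
```yaml
# Minimal PodDisruptionBudget sketch. A node that hosts pods matched by this
# selector is skipped during scale-in if evicting them would reduce the
# number of available replicas below minAvailable.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-api
```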
How can I improve the success rate of auto scaling?
The success rate of auto scaling depends on the following factors:
Whether the scheduling conditions are met
After you create a node pool with the auto scaling feature enabled, confirm the pod scheduling policy that matches the node pool. If you cannot confirm the policy, configure a nodeSelector that matches the label of the node pool and perform a scheduling simulation, as shown in the sketch below.
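The following pod spec is a minimal sketch of such a scheduling simulation. The workload-type: auto-scaling label is a hypothetical example; replace it with the label configured on your node pool with the auto scaling feature enabled.
```yaml
# Minimal scheduling simulation sketch. The "workload-type: auto-scaling"
# label is a hypothetical node pool label; replace it with your own.
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-simulation-demo
spec:
  nodeSelector:
    workload-type: auto-scaling
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
```
If the node pool also configures taints, add matching tolerations to the pod spec so that the scheduling simulation can succeed.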
Whether resources are sufficient
After the scheduling simulation is complete, the system automatically selects a node pool with the auto scaling feature enabled and adds nodes from the node pool to the cluster. However, the inventory of the Elastic Compute Service (ECS) instance types specified in the node pool configuration affects the success rate of the scale-out activity. Therefore, we recommend that you specify multiple instance types across multiple zones to improve the success rate.
How can I accelerate auto scaling?
Method 1: Use the swift mode to accelerate auto scaling. After a node pool with the auto scaling feature enabled warms up by completing a scale-out activity and a scale-in activity, the node pool runs in swift mode. For more information, see Enable node auto scaling.
Method 2: Use a custom image based on Alibaba Cloud Linux 3 to improve the efficiency of resource delivery at the Infrastructure as a Service (IaaS) layer by 50%. For more information, see Create custom images.
Node scaling solution
The node scaling model scales resources at the resource layer. When the size of a cluster cannot meet pod scheduling requirements, this model automatically scales node resources to provide additional scheduling capacity. Node scaling is managed by the cluster-autoscaler component, which periodically polls and maintains the cluster state to identify conditions that meet scaling requirements, thereby automatically adjusting the number of cluster nodes.
Scaling speed and efficiency
A scaling activity takes about 60 seconds in standard mode and about 50 seconds in swift mode.
Auto scaling encounters a performance bottleneck when the duration of a scaling activity reaches 1 minute. The efficiency of auto scaling fluctuates with the number of node pools and the scaling scenario. For example, if the cluster contains more than 100 node pools, the duration of a scaling activity increases to 100 to 150 seconds.
Because the node scaling model uses a polling mechanism and depends on the maintenance of the cluster state, the minimum scaling latency is 5 seconds.
Usage notes
Quotas and limits
You can add up to 200 custom route entries to a route table of a virtual private cloud (VPC). To increase the quota limit, log on to the Quota Center console and submit an application. For more information about the quotas of other resources and how to increase the quota limits, see the Dependent cloud service quotas section of the "Quotas and limits" topic.
We recommend that you properly configure the maximum number of nodes in a node pool with the auto scaling feature enabled. Make sure that the dependent resources and quotas, such as VPC CIDR blocks and vSwitch IP addresses, are sufficient for the specified maximum number of nodes. Otherwise, scale-out activities may fail. For more information about the maximum number of nodes supported by a node pool with the auto scaling feature enabled, see Enable node auto scaling. For more information about how to plan an ACK network, see Plan the network of an ACK cluster.
The node scaling feature does not support subscription nodes. If you want to create a node pool with the auto scaling feature enabled, do not set the billing method of the node pool to subscription. If you want to enable the auto scaling feature for an existing node pool, make sure that the node pool does not have subscription nodes.
Maintenance of dependent resources
If elastic IP addresses (EIPs) are associated with ECS nodes that are added by the node scaling feature, do not delete the ECS instances directly in the ECS console. Otherwise, the associated EIPs cannot be automatically released.
Further reading
If you encounter issues when you use node scaling, refer to FAQs of node scaling for troubleshooting.