If your cluster's capacity planning cannot meet application pod scheduling requirements, you can use the node scaling feature to automatically scale node resources and increase scheduling capacity. ACK provides two elastic scaling solutions: node auto scaling and node instant scaling. Node instant scaling offers faster scaling, higher delivery efficiency, and a lower barrier to entry than node auto scaling.
Before you begin
This overview helps you understand the node scaling solutions provided by ACK and select the one that best fits your business needs before you enable the node scaling feature.
Before you read this topic, we recommend that you familiarize yourself with scaling concepts such as manual scaling, auto scaling, horizontal scaling, and vertical scaling by reading the official Kubernetes documentation.
How it works
In Kubernetes, node scaling works differently from the traditional model that is based on usage thresholds. Understanding this difference is important when you migrate from traditional data centers or other orchestration systems to a Kubernetes cluster.
The traditional elastic scaling model is based on usage. For example, if the CPU and memory usage of the nodes in a cluster exceed specific thresholds, the system scales out by adding new nodes. However, this model has the following issues.
To address these issues, ACK uses a two-layer elastic model: workload scaling (scheduling layer) and node scaling (resource layer). Workload scaling changes the number of application replicas, which are the scheduling units, based on resource usage. Node scaling then adjusts node resources when the cluster's capacity cannot meet the scheduling requirements of the resulting pods. The following sections describe the technical details.
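The two-layer model can be illustrated with a minimal sketch. It assumes CPU requests are the only resource, uses the replica formula documented for the Kubernetes Horizontal Pod Autoscaler for the workload layer, and uses a simple first-fit placement in place of the real scheduler. The `Node` class and all function names are hypothetical and are not part of any ACK component.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    cpu_allocatable: float      # cores the node can hand out to pods
    cpu_requested: float = 0.0  # cores already claimed by scheduled pods

    def fits(self, request: float) -> bool:
        return self.cpu_allocatable - self.cpu_requested >= request


def desired_replicas(current: int, utilization: float, target: float) -> int:
    """Workload layer: replicas follow resource usage (HPA-style formula)."""
    return math.ceil(current * utilization / target)


def schedule(requests, nodes):
    """First-fit placement (an assumption). Returns requests left pending."""
    pending = []
    for request in requests:
        node = next((n for n in nodes if n.fits(request)), None)
        if node is None:
            pending.append(request)
        else:
            node.cpu_requested += request
    return pending


def nodes_to_add(pending, node_cpu: float) -> int:
    """Resource layer: nodes are added only for pods that failed to schedule."""
    if any(request > node_cpu for request in pending):
        raise ValueError("a pending pod is larger than one new node")
    new_nodes = []
    for request in pending:
        if schedule([request], new_nodes):  # does not fit on nodes added so far
            new_nodes.append(Node(node_cpu, request))
    return len(new_nodes)


# Two 4-core nodes running 4 replicas that each request 1 core.
nodes = [Node(4.0), Node(4.0)]
schedule([1.0] * 4, nodes)

# Usage rises to 100% against a 40% target, so the workload layer asks for more replicas.
replicas = desired_replicas(4, utilization=100, target=40)
pending = schedule([1.0] * (replicas - 4), nodes)
print(replicas, len(pending), nodes_to_add(pending, node_cpu=4.0))  # 10 2 1
```

In this sketch, node usage never triggers a scale-out directly. Nodes are added only when pods remain pending after scheduling.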
Elastic scaling solutions: node auto scaling and node instant scaling
Node scaling is an elastic scaling capability at the resource layer. It automatically scales node resources to increase scheduling capacity when the cluster's capacity cannot meet application pod scheduling requirements. ACK provides two node scaling solutions.
Introduction
Only one elastic scaling component can run in a cluster. The two elastic scaling solutions cannot be used together. To enable the node scaling feature, follow the standard procedure in Enable node auto scaling or Enable node instant scaling.
The elastic scaling performance data provided in this topic are theoretical values based on custom images that are optimized for elastic scaling. The actual performance may vary depending on your business environment. For more information about custom images, see Optimize elastic scaling with custom images.
Solution | Elastic scaling component | Description |
Solution 1: node auto scaling | cluster-autoscaler component | Periodically maintains and checks the cluster status using polling to find conditions that meet scale-out or scale-in requirements, and then automatically scales cluster nodes. |
Solution 2: node instant scaling | Node instant scaling component | An event-driven node scaling controller. It ensures better elastic resource delivery in scenarios such as large-scale clusters (for example, a node pool with auto scaling enabled has more than 100 nodes, or there are more than 20 such node pools) and consecutive scale-out activities. The scaling speed (the time from the first pod scheduling failure to a successful scheduling) is stable at 45s, the success rate can reach 99%, and resource fragmentation is reduced by about 30%. It also offers better extensibility for custom scaling policies. |
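The difference between the polling model and the event-driven model in the table above can be sketched as follows. This is a simplified illustration, not the implementation of either component: it assumes that a polling controller checks the cluster at fixed ticks starting at time 0 and that an event-driven controller reacts after a fixed handling latency. The function names and the sample timestamps are hypothetical.

```python
import math


def polling_delays(failure_times, interval):
    """Seconds each scheduling failure waits for the next poll tick."""
    return [math.ceil(t / interval) * interval - t for t in failure_times]


def event_delays(failure_times, handling_latency):
    """Seconds each scheduling failure waits when the controller reacts to events."""
    return [handling_latency] * len(failure_times)


failures = [1, 7, 12]  # seconds at which pods fail to schedule (sample data)
print(polling_delays(failures, interval=5))        # [4, 3, 3]
print(event_delays(failures, handling_latency=1))  # [1, 1, 1]
```

With polling, the detection delay depends on where a failure falls between two ticks. With events, it depends only on how quickly the controller handles each event.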
Solution comparison
If a node pool in your cluster has automatic elastic scaling enabled and its Scaling Mode is set to Non-swift Mode, node instant scaling is compatible with the semantics and behavior of the node auto scaling component. This allows for a seamless transition for all types of applications. This section describes the optimized features of node instant scaling compared with node auto scaling.
Enhanced feature | Node auto scaling | Node instant scaling |
Scaling speed and efficiency | For a single scaling activity, the scaling speed is approximately 60s in standard mode and 50s in swift mode. | Triggers scaling actions through an event-driven mechanism and uses Alibaba Cloud ContainerOS capabilities to accelerate elastic scaling. The scaling speed is approximately 45±10s. |
Scaling speed and efficiency | When the scaling time reaches 1 minute, the scaling speed encounters a bottleneck. The elastic scaling speed also shows significant jitter at different scales (multiple node pools) and in different scenarios (consecutive scaling). For example, when the number of node pools exceeds 100, the scaling speed decreases to a range of 100s to 150s. | Performance does not significantly degrade as the number of node pools and pods increases. This makes it more suitable for scenarios with high requirements for elastic delivery speed. |
Scaling speed and efficiency | Uses a polling model and is limited by its dependency on cluster state maintenance. The minimum elastic scaling sensitivity is 5s. | Is event-driven and uses a responsive model. The elastic scaling sensitivity is 1s to 3s. |
Resource delivery certainty | The inventory of cloud resources changes frequently. Due to issues such as complex instance type combinations and insufficient inventory, the elastic scaling success rate of node auto scaling is approximately 97%. | Supports an automatic inventory selection policy. It can filter out-of-stock instance types from thousands of Alibaba Cloud instance type combinations based on your configured filter conditions and order. It then selects the most suitable type for scale-out or compensates with a qualified type if the inventory is insufficient. This greatly reduces the burden on O&M engineers to select instance types and increases the delivery success rate to 99%. |
Resource delivery certainty | Supports scaling out the same instance type as configured in the node pool. If multiple types are configured, it selects the smallest instance type that meets the requirements for scale-out. | Supports scaling out different instance types. |
Resource delivery certainty | When resource delivery fails, it retries periodically, which is a reactive approach. | When resource delivery fails, it supports an inventory alert feature to provide advance notice of potential risks associated with instance type combinations. |
Use and O&M threshold | Compared to node auto scaling, node instant scaling has a lower barrier to entry. This is mainly reflected in the following aspects. |
Scheduling policy | In addition to all the scheduling features of node auto scaling, node instant scaling also supports the following features: Node instant scaling supports selecting the optimal Bin Packing and PreBind policies (custom features) based on the pod, which can reduce the scheduling fragmentation rate by up to 30%. |
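The two instance type selection behaviors described under Resource delivery certainty can be contrasted with a minimal sketch. The `InstanceType` class, the type names, and the `in_stock` flag are hypothetical. The sketch assumes that "smallest" means fewest vCPUs and then least memory, and that "configured order" means list order. It does not call any Alibaba Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: int
    memory_gib: int
    in_stock: bool = True  # hypothetical inventory flag


def fits(t: InstanceType, cpu: int, memory_gib: int) -> bool:
    return t.cpu >= cpu and t.memory_gib >= memory_gib


def pick_smallest_fit(configured, cpu, memory_gib):
    """Node auto scaling style: smallest configured type that meets the request."""
    candidates = [t for t in configured if fits(t, cpu, memory_gib)]
    return min(candidates, key=lambda t: (t.cpu, t.memory_gib), default=None)


def pick_with_inventory(ordered, cpu, memory_gib):
    """Node instant scaling style: skip out-of-stock types, keep the configured order."""
    for t in ordered:
        if t.in_stock and fits(t, cpu, memory_gib):
            return t
    return None


types = [
    InstanceType("small", 2, 8, in_stock=False),
    InstanceType("medium", 4, 16),
    InstanceType("large", 8, 32),
]
print(pick_smallest_fit(types, cpu=2, memory_gib=4).name)    # small
print(pick_with_inventory(types, cpu=2, memory_gib=4).name)  # medium
```

In this example, the first strategy picks a type that is out of stock and would have to retry, while the second falls back to the next qualified type in the configured order.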
Limits of node instant scaling
Understanding the limits of node instant scaling is an important part of evaluating the node instant scaling solution.
The swift mode is not supported.
A node pool cannot scale out more than 180 nodes in a single batch.
Disabling scale-in at the cluster level is not currently supported.
Note: To disable scale-in at the node level, see How do I prevent a specific node from being scaled in by node instant scaling?.
Node instant scaling does not support checking the inventory of spot instances. For a node pool where the Billing Method is set to Spot Instance and Use On-Demand Instances To Supplement Spot Instance Capacity is enabled, on-demand instances may be scaled out even when the spot instance inventory is sufficient.
Suggestions on selecting a solution
Based on the preceding Solution comparison and Limits of node instant scaling, you can select the appropriate solution for your needs. If your business has relatively low requirements for scaling speed, resource delivery certainty, and O&M costs, and cannot accept the limits of node instant scaling, node auto scaling may be sufficient. Conversely, if you have the following business requirements, node instant scaling is the recommended solution.
The cluster is large. For example, if an auto-scaling-enabled node pool has more than 100 nodes, or if there are more than 20 such node pools, the scale-out efficiency of node auto scaling decreases significantly as the cluster size grows. In contrast, the performance of node instant scaling fluctuates less.
You have high requirements for resource delivery speed (elastic scaling speed). In a single scaling scenario, the elastic scaling speed of node auto scaling in standard mode is approximately 60s, while for node instant scaling it is approximately 45s.
Business workload batches are unpredictable, and you often need to perform consecutive scale-outs for the same elastic node pool. In consecutive scaling mode, the performance of node auto scaling decreases and shows significant jitter. In contrast, node instant scaling can still achieve a scaling speed of approximately 45s.
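The selection guidance above can be written as a small checklist. The sketch below encodes only the thresholds and limits stated in this topic: more than 100 nodes in a node pool, more than 20 auto-scaling-enabled node pools, the 180-node batch limit, swift mode, and cluster-level scale-in. The `ScalingNeeds` class and its field names are hypothetical. The sketch leaves out the spot instance inventory limit, and it is a planning aid, not an official decision rule.

```python
from dataclasses import dataclass


@dataclass
class ScalingNeeds:
    largest_pool_nodes: int
    autoscaled_pools: int
    needs_fast_delivery: bool = False
    consecutive_scale_outs: bool = False
    needs_swift_mode: bool = False
    largest_batch_nodes: int = 0
    needs_cluster_scale_in_disabled: bool = False


def instant_scaling_blockers(n: ScalingNeeds):
    """Limits of node instant scaling that the workload cannot accept."""
    blockers = []
    if n.needs_swift_mode:
        blockers.append("swift mode is not supported")
    if n.largest_batch_nodes > 180:
        blockers.append("more than 180 nodes in a single batch")
    if n.needs_cluster_scale_in_disabled:
        blockers.append("scale-in cannot be disabled at the cluster level")
    return blockers


def instant_scaling_drivers(n: ScalingNeeds):
    """Requirements for which node instant scaling is recommended."""
    drivers = []
    if n.largest_pool_nodes > 100 or n.autoscaled_pools > 20:
        drivers.append("large cluster")
    if n.needs_fast_delivery:
        drivers.append("high requirements for resource delivery speed")
    if n.consecutive_scale_outs:
        drivers.append("consecutive scale-outs")
    return drivers


def recommend(n: ScalingNeeds) -> str:
    if instant_scaling_blockers(n) or not instant_scaling_drivers(n):
        return "node auto scaling"
    return "node instant scaling"


print(recommend(ScalingNeeds(largest_pool_nodes=150, autoscaled_pools=3)))
# node instant scaling
print(recommend(ScalingNeeds(largest_pool_nodes=150, autoscaled_pools=3, needs_swift_mode=True)))
# node auto scaling
```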
Notes
Quotas and limits
You can add up to 200 custom routes to a route table of a virtual private cloud (VPC). To increase this quota, go to Quota Center and submit a request. For more information about the quotas of other resources and how to increase them, see Quotas for underlying cloud dependencies.
We recommend that you properly configure the maximum number of nodes in an auto-scaling-enabled node pool. Make sure that dependent resources and quotas, such as VPC CIDR blocks and vSwitches, are sufficient for the specified number of nodes. Otherwise, scale-out activities may fail. For more information about how to configure the maximum number of nodes, see Configure the number of instances. For more information about network planning for ACK, see Network planning for an ACK managed cluster.
The node scaling feature does not support subscription nodes. When you create a new auto-scaling-enabled node pool, do not select subscription as the billing method. To enable auto scaling for an existing node pool, make sure that the node pool does not contain any subscription nodes.
The node scaling feature does not currently support Sidecar Containers. Deploy workloads that use Sidecar Containers to a node pool where auto scaling is not enabled.
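The recommendation above to check vSwitch capacity against the maximum number of nodes can be estimated with simple arithmetic. The sketch below uses the Python standard library `ipaddress` module. The number of reserved addresses per vSwitch (`reserved`) and the number of IP addresses each node consumes (`ips_per_node`) are assumptions that depend on your network plugin and configuration, so verify both against your environment. The function names and CIDR blocks are hypothetical.

```python
import ipaddress


def usable_addresses(cidr: str, reserved: int = 4) -> int:
    """Addresses in a vSwitch CIDR block minus an assumed reserved count."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - reserved, 0)


def check_capacity(vswitch_cidrs, max_nodes: int, ips_per_node: int = 1):
    """Return (sufficient, available, required) for a planned maximum node count."""
    available = sum(usable_addresses(cidr) for cidr in vswitch_cidrs)
    required = max_nodes * ips_per_node
    return available >= required, available, required


vswitches = ["192.168.0.0/24", "192.168.1.0/24"]
print(check_capacity(vswitches, max_nodes=300))                  # (True, 504, 300)
print(check_capacity(vswitches, max_nodes=300, ips_per_node=2))  # (False, 504, 600)
```

If the check fails, either lower the maximum number of nodes for the node pool or add vSwitches before you enable auto scaling.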
Maintenance of dependent resources
If you bind EIPs to nodes, do not directly delete the ECS nodes that are scaled out by node scaling in the ECS console. Otherwise, the EIPs cannot be automatically released.
What to do next
Based on the solution introduction and comparison, you can choose to enable node auto scaling or enable node instant scaling.
If you encounter problems while using node scaling, see Node auto scaling FAQ for troubleshooting.