Use ACK GOATScaler to implement node instant scaling - Container Service for Kubernetes

Node auto scaling may not meet your requirements if your cluster is large, if the cluster requires faster resource scaling, or if you want to automatically scale resources across multiple instance types and zones. In these scenarios, we recommend that you use node instant scaling. A cluster is considered large if a node pool that has auto scaling enabled contains more than 100 nodes, or if more than 20 node pools in the cluster have auto scaling enabled. The node instant scaling feature lowers the technical barrier for developers, improves scaling efficiency, and reduces O&M effort.
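
The "large cluster" rule above can be written as a small check. This is only a sketch: the thresholds come from the paragraph above, while the function name and the input shape (a list of node counts, one per node pool that has auto scaling enabled) are assumptions made for illustration.

```python
# Thresholds quoted from the text above; all names here are hypothetical.
LARGE_POOL_NODE_COUNT = 100
LARGE_AUTOSCALED_POOL_COUNT = 20


def is_large_cluster(autoscaled_pool_sizes):
    """Return True if the cluster counts as large.

    autoscaled_pool_sizes: node counts, one entry per node pool
    that has auto scaling enabled.
    """
    has_big_pool = any(size > LARGE_POOL_NODE_COUNT for size in autoscaled_pool_sizes)
    has_many_pools = len(autoscaled_pool_sizes) > LARGE_AUTOSCALED_POOL_COUNT
    return has_big_pool or has_many_pools
```

For example, a cluster with one auto-scaled node pool of 150 nodes is large under this rule, and so is a cluster with 21 small auto-scaled node pools.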

Before you start

To make the most of the node instant scaling feature, we recommend that you read Overview of node scaling and pay attention to the following topics before you start:

How node instant scaling works

Benefits of node instant scaling

Use scenarios of node instant scaling

Usage notes for node instant scaling

Prerequisites

An ACK managed cluster or ACK dedicated cluster that runs Kubernetes 1.26 or later is created. For more information, see Create an ACK managed cluster, Create an ACK dedicated cluster, and Update an ACK cluster.
You are added to the whitelist for using node instant scaling. If you are not in the whitelist, submit a ticket and describe your business scenario in the ticket.

Note

If your node pool has auto scaling enabled and Scaling Mode is not set to Swift, the node instant scaling feature preserves the original semantics and behavior of the node pool configuration, and can be enabled seamlessly for all types of pods. If Scaling Mode is set to Swift, the node pool is incompatible with the node instant scaling feature.

Step 1: Enable node instant scaling

Before you use node instant scaling, you must enable auto scaling on the Node Pools page. When you configure node instant scaling, select ACK GOATScaler as the autoscaler to implement node auto scaling.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage and choose Nodes > Node Pools in the left-side navigation pane.
On the Node Pools page, click Enable next to Configure Auto Scaling.
If this is the first time you use auto scaling, follow the on-screen instructions to activate Auto Scaling and complete authorization. If you have already completed the authorization, skip this step.
View the procedure for activating Auto Scaling and completing authorization
Note
The following procedure for activating Auto Scaling is for reference only. Follow the actual instructions on the page when you activate Auto Scaling.
1. Click the hyperlink next to Auto Scaling in the panel that appears to go to the Auto Scaling page and follow the on-screen instructions to activate Auto Scaling.
2. On the Activated page, click Console to log on to the Auto Scaling console.
3. Click Go to Authorize to go to the Cloud Resource Access Authorization page and authorize Auto Scaling to access other cloud resources. Then, click Confirm Authorization Policy.
  Note
  If you use ACK dedicated clusters, follow the instructions to attach the AliyunCSManagedAutoScalerRolePolicy policy to each cluster.
  If you use an ACK managed cluster, make sure that the addon.aliyuncsmanagedautoscalerrole.token is stored in a Secret in the kube-system namespace. If the token does not exist, submit a ticket.
  If the token does not exist, ACK assumes the worker RAM role to use the relevant capabilities by default. You can refer to How do I attach the AliyunCSManagedAutoScalerRolePolicy policy to the worker RAM role? to grant the required permissions.
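
For ACK managed clusters, the token check in the note above could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and because the note does not say whether addon.aliyuncsmanagedautoscalerrole.token is the name of the Secret or a key inside one, the check accepts either. The input is the JSON printed by `kubectl get secrets -n kube-system -o json`.

```python
import json

TOKEN_NAME = "addon.aliyuncsmanagedautoscalerrole.token"


def has_autoscaler_token(secret_list_json):
    """Return True if a Secret is named TOKEN_NAME or carries it as a data key.

    secret_list_json: output of `kubectl get secrets -n kube-system -o json`.
    Accepting both forms is an assumption; the source does not specify which applies.
    """
    secrets = json.loads(secret_list_json).get("items", [])
    for secret in secrets:
        name = secret.get("metadata", {}).get("name")
        data_keys = (secret.get("data") or {}).keys()
        if name == TOKEN_NAME or TOKEN_NAME in data_keys:
            return True
    return False
```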

In the Configure Auto Scaling panel, set Scaling Component to GOATScaler, configure scaling parameters, and then click OK.

GOATScaler automatically triggers scale-out activities based on the actual scheduling status. You need only to configure scale-in conditions.

Parameter	Description
Scale-in Threshold	The ratio of the resources requested on a node to the resource capacity of the node, for nodes in a node pool that has node instant scaling enabled. A scale-in activity is performed only when both the CPU utilization and the memory utilization of a node are lower than the Scale-in Threshold.
GPU Scale-in Threshold (unavailable in the current version)	The scale-in threshold for GPU-accelerated nodes. A scale-in activity is performed only when the CPU, memory, and GPU utilization of a node are all lower than the GPU Scale-in Threshold.
Defer Scale-in For	The interval between the time when the scale-in threshold is reached and the time when the scale-in activity (which reduces the number of nodes) starts. Unit: minutes. Default value: 10. Important: The autoscaler performs scale-in activities only when Scale-in Threshold is configured and the Defer Scale-in For condition is met.
Cooldown	After the autoscaler performs a scale-out activity, the autoscaler waits a cooldown period before it can perform a scale-in activity. The autoscaler cannot perform scale-in activities within the cooldown period but can still check whether the nodes meet the scale-in conditions. After the cooldown period ends, if a node meets the scale-in conditions and the waiting period specified in the Defer Scale-in For parameter ends, the node is removed. For example, the Cooldown parameter is set to 10 minutes and the Defer Scale-in For parameter is set to 5 minutes. The autoscaler cannot perform scale-in activities within the 10-minute cooldown period after performing a scale-out activity. However, the autoscaler can still check whether the nodes meet the scale-in conditions within the cooldown period. When the cooldown period ends, the nodes that meet the scale-in conditions are removed after 5 minutes.
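
The interaction between Scale-in Threshold, Defer Scale-in For, and Cooldown described in the table above can be sketched as follows. This is one possible reading rather than the actual GOATScaler logic: all names are hypothetical, the threshold is assumed to be a fraction between 0 and 1, and the timing follows the worked example in the Cooldown row (the Defer Scale-in For wait is counted from the end of the cooldown period, or from the moment the node drops below the threshold if that is later).

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    cpu_requested: float  # CPU cores requested by pods on the node
    cpu_capacity: float   # CPU cores the node provides
    mem_requested: float  # bytes requested by pods on the node
    mem_capacity: float   # bytes the node provides


def below_threshold(node, threshold):
    """True only if both CPU and memory request ratios are under the threshold."""
    cpu_ratio = node.cpu_requested / node.cpu_capacity
    mem_ratio = node.mem_requested / node.mem_capacity
    return cpu_ratio < threshold and mem_ratio < threshold


def earliest_removal_minute(below_since, last_scale_out, cooldown, defer):
    """Earliest minute at which a node could be removed (assumed timing model).

    All arguments are minutes on the same clock.
    """
    cooldown_end = last_scale_out + cooldown
    return max(below_since, cooldown_end) + defer
```

With Cooldown set to 10 and Defer Scale-in For set to 5, a node that drops below the threshold 3 minutes after a scale-out activity could be removed no earlier than minute 15 under this model.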

View advanced scale-in settings

Parameter	Description
Pod Termination Timeout	The maximum amount of time to wait for pods on a node to terminate during a scale-in activity. Unit: seconds.
Minimum Number of Replicated Pods	The minimum number of pods that are allowed in each ReplicaSet during node draining.
Evict DaemonSet Pods	After you enable this feature, DaemonSet pods are evicted during a scale-in activity.
Skip Nodes Hosting Kube-system Pods	After you enable this feature, nodes that host pods in the kube-system namespace are skipped during a scale-in activity. Note: This feature does not take effect on mirror pods and DaemonSet pods.

Step 2: Create a node pool that has auto scaling enabled

The node instant scaling feature scales only nodes in node pools that have auto scaling enabled. Therefore, after you configure node instant scaling, you need to configure at least one node pool that has auto scaling enabled. You can create a node pool that has auto scaling enabled or enable auto scaling for an existing node pool. For more information, see Create a node pool and Modify a node pool.

Step 3: (Optional) Verify node instant scaling

After you complete the preceding configuration, you can use the node instant scaling feature. To verify the setup, check that the Node Pools page shows auto scaling as enabled for the node pool and that ACK GOATScaler is installed in the cluster.

Auto scaling is enabled for the node pool

The Node Pools page shows that auto scaling is enabled for the node pool.

ACK GOATScaler is installed

On the Clusters page, click the name of the cluster that you want to manage and choose Operations > Add-ons in the left-side navigation pane.
On the Add-ons page, the ACK GOATScaler component displays Installed.

Introduction to key events related to node instant scaling

The node instant scaling feature emits the following key events. You can use these events to learn the internal status of node instant scaling.

Event name	Event object	Description
ProvisionNode	pod	The node instant scaling feature triggers a scale-out activity for the pod.
ProvisionNodeFailed	pod	The node instant scaling feature fails to trigger a scale-out activity for the pod.
ResetPod	pod	The node instant scaling feature re-adds pods that meet the scale-out conditions and have triggered scale-out activities but are still in the Unschedulable state to the scale-out list.
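
To watch for the events in the table above, you could filter the cluster event list by reason. The sketch below assumes the input is the `items` array from `kubectl get events -A -o json`, where each item is a core/v1 Event with `reason` and `involvedObject` fields. The function name is hypothetical.

```python
# Event reasons listed in the table above.
GOATSCALER_EVENT_REASONS = {"ProvisionNode", "ProvisionNodeFailed", "ResetPod"}


def scaling_events(events):
    """Return (reason, namespace, pod name) for node instant scaling events.

    events: list of dicts shaped like core/v1 Event objects.
    """
    selected = []
    for event in events:
        obj = event.get("involvedObject", {})
        if event.get("reason") in GOATSCALER_EVENT_REASONS and obj.get("kind") == "Pod":
            selected.append((event["reason"], obj.get("namespace"), obj.get("name")))
    return selected
```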

Introduction to node instant scaling labels

The node instant scaling feature maintains the following labels. Do not modify these labels manually. Otherwise, exceptions may occur.

Node labels

Node label	Description
goatscaler.io/managed:true	Identifies nodes that are managed by the node instant scaling feature. The node instant scaling feature periodically checks whether nodes that have this label meet the scale-in conditions.
k8s.aliyun.com: true	Identifies nodes that are managed by the node instant scaling feature. The node instant scaling feature periodically checks whether nodes that have this label meet the scale-in conditions.
goatscaler.io/provision-task-id:{task-id}	Indicates the ID of a scale-out task run by the node instant scaling feature so that you can trace the source of the nodes that are added to the cluster.
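
As a read-only illustration of how the labels above can be used for tracing, the following sketch groups managed nodes by scale-out task ID. It assumes the input is the `items` array from `kubectl get nodes -o json` and that the goatscaler.io/managed label carries the string value "true". The function name is hypothetical.

```python
from collections import defaultdict

MANAGED_LABEL = "goatscaler.io/managed"
TASK_LABEL = "goatscaler.io/provision-task-id"


def nodes_by_provision_task(nodes):
    """Group names of managed nodes by the scale-out task that created them.

    nodes: list of dicts shaped like core/v1 Node objects.
    Managed nodes without a task label are grouped under "unknown".
    """
    groups = defaultdict(list)
    for node in nodes:
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        if labels.get(MANAGED_LABEL) != "true":
            continue
        groups[labels.get(TASK_LABEL, "unknown")].append(meta.get("name"))
    return dict(groups)
```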

Node taints

Node taint	Description
goatscaler.io/node-terminating	Nodes that have this taint are being removed by a scale-in activity of the node instant scaling feature.
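
To see which nodes currently carry this taint, you could inspect `spec.taints` on each node. The sketch below is read-only and assumes the same `kubectl get nodes -o json` input shape. The function name is hypothetical.

```python
TERMINATING_TAINT = "goatscaler.io/node-terminating"


def terminating_nodes(nodes):
    """Return names of nodes that carry the node-terminating taint.

    nodes: list of dicts shaped like core/v1 Node objects.
    """
    names = []
    for node in nodes:
        taints = node.get("spec", {}).get("taints") or []
        if any(taint.get("key") == TERMINATING_TAINT for taint in taints):
            names.append(node["metadata"]["name"])
    return names
```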

Pod annotations

Pod annotation	Description
goatscaler.io/provision-task-id	Indicates the ID of a scale-out task that is created by the node instant scaling feature for the current pod. The node instant scaling feature does not add another node for a pod that has this annotation. Instead, it waits for the node that is already being provisioned to launch.
goatscaler.io/reschedule-deadline	The deadline until which the node instant scaling feature waits for a pod to be scheduled to a node. If a pod is still unschedulable after the deadline passes, the node instant scaling feature re-adds the pod to the scale-out list.
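
The two annotations above can be read together to tell whether a pending pod is still waiting for its node. The sketch below is illustrative only: the source does not document the value format of goatscaler.io/reschedule-deadline, so an RFC 3339 timestamp such as 2024-01-01T00:05:00Z is assumed, and the function name and returned state strings are hypothetical.

```python
from datetime import datetime, timezone

TASK_ANNOTATION = "goatscaler.io/provision-task-id"
DEADLINE_ANNOTATION = "goatscaler.io/reschedule-deadline"


def pod_scaling_state(pod, now):
    """Classify a pod by its node instant scaling annotations.

    pod: dict shaped like a core/v1 Pod object.
    now: timezone-aware datetime.
    Assumes the deadline is an RFC 3339 timestamp (not confirmed by the source).
    """
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if TASK_ANNOTATION not in annotations:
        return "no-task"
    raw = annotations.get(DEADLINE_ANNOTATION)
    if raw is None:
        return "waiting"
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    deadline = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return "deadline-passed" if now > deadline else "waiting"
```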

What to do next

Update ACK GOATScaler

We recommend that you update ACK GOATScaler at your earliest convenience to use the latest features and optimizations. For more information, see Manage components.