This topic describes common issues and solutions when you use the node instant scaling feature.
Known limitations
Feature limitations
The swift mode is not supported.
A node pool cannot scale out more than 180 nodes in a single batch.
Disabling scale-in at the cluster level is not currently supported.
Note: To disable scale-in at the node level, see How do I prevent node instant scaling from removing specific nodes?
Node instant scaling does not support checking the inventory of spot instances. For a node pool where the Billing Method is set to Spot Instance and Use On-Demand Instances To Supplement Spot Instance Capacity is enabled, on-demand instances may be scaled out even when the spot instance inventory is sufficient.
Inaccurate node resource estimation
The underlying system of an ECS instance consumes some resources. This means the available memory of an instance is less than the amount defined in its instance type. For more information, see Why is the memory size of a purchased instance different from the memory size defined in its instance type?. As a result, the schedulable resources of a node estimated by the node instant scaling component may be greater than the actual schedulable resources. The estimation is not 100% accurate. Note the following points when you configure pod requests.
When you configure pod requests, the total requested resources, including CPU, memory, and disk, must be less than the instance type specifications and should not exceed 70% of the node's resources (see the sample pod after this list).
When the node instant scaling component checks whether a node has sufficient resources, it considers only Kubernetes pod resources, such as pending pods and DaemonSet pods. If static pods that are not managed by a DaemonSet exist on the node, you must reserve resources for these pods in advance.
If a pod requests a large amount of resources, for example, more than 70% of a node's resources, you must test and confirm in advance that the pod can be scheduled to a node of the same instance type.
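The following is a minimal sketch of a pod whose requests stay well under 70% of a 4-core 8 GB instance type. The pod name and image are illustrative assumptions, not values from this topic.
# Hypothetical pod whose total requests (2 vCPU, 4 GiB memory, 10 GiB disk)
# leave headroom for system reservations on a 4-core 8 GB node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: request-sizing-example
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
        ephemeral-storage: 10Gi
EOF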
Limited simulatable resource types
The node instant scaling component supports only a limited number of resource types for simulating and determining whether to perform scaling operations. For more information, see What resource types can node instant scaling simulate?.
Scale-out behavior
What resource types can node instant scaling simulate?
The following resource types are supported for simulating and determining scaling behavior.
cpu
memory
ephemeral-storage
aliyun.com/gpu-mem # Only shared GPUs are supported.
nvidia.com/gpu
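As a hedged illustration, a pod that consumes one of the GPU resource types above declares it under resources.limits. The pod name and image below are placeholders.
# Hypothetical pod requesting one whole GPU; node instant scaling can simulate
# the nvidia.com/gpu resource when it selects an instance type.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-request-example
spec:
  containers:
  - name: app
    image: nginx   # placeholder; replace with your GPU workload image
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF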
Does node instant scaling support scaling out nodes of a suitable instance type from a node pool based on pod resource requests?
Yes. For example, suppose you configure two instance types, 4-core 8 GB and 12-core 48 GB, for a node pool with Auto Scaling enabled, and a pod requests 2 CPU cores. When node instant scaling performs a scale-out, it preferentially schedules the pod to a 4-core 8 GB node. If the 4-core 8 GB instance type in the node pool is later changed to 8-core 16 GB, node instant scaling automatically runs the pod on an 8-core 16 GB node.
If a node pool has multiple instance types, how does node instant scaling select one by default?
Based on the instance types configured in the node pool, node instant scaling periodically excludes instance types with insufficient inventory. It then sorts the remaining types by the number of CPU cores and checks each one to see if it meets the resource requests of unschedulable pods. Once an instance type meets the requirements, node instant scaling selects that instance type and does not check the remaining types.
When using node instant scaling, how can I monitor real-time changes in the instance type inventory of a node pool?
Node instant scaling provides health metrics that periodically update the inventory of instance types in a node pool with Auto Scaling enabled. When the inventory status of an instance type changes, node instant scaling sends a Kubernetes event named InstanceInventoryStatusChanged. You can subscribe to this event notification to monitor the inventory health of the node pool, assess its current status, and adjust the instance type configuration in advance. For more information, see View the health status of node instant scaling.
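For example, the following command is a minimal sketch of how to list these events with kubectl. It assumes the event is emitted as a standard Kubernetes event and does not assume a specific namespace.
# List recent InstanceInventoryStatusChanged events across all namespaces,
# sorted by the time they were last seen.
kubectl get events -A --field-selector reason=InstanceInventoryStatusChanged --sort-by=.lastTimestamp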
How can I optimize the node pool configuration to prevent scale-out failures due to insufficient inventory?
Consider the following configuration suggestions to expand the range of available instance types:
Configure multiple optional instance types for the node pool, or use a generalized configuration.
Configure multiple zones for the node pool.
Why does node instant scaling fail to add nodes?
Check for the following scenarios.
The instance types configured for the node pool have insufficient inventory.
The instance types configured for the node pool cannot meet the pod's resource requests. The resource size of an ECS instance type reflects its listed specification, but the following resources are reserved at runtime (a command to inspect a node's allocatable resources follows this list).
During instance creation, some resources are consumed by virtualization and the operating system. For more information, see Why is the memory size of a purchased instance different from the memory size defined in its instance type?.
ACK requires a certain amount of node resources to run Kubernetes components and system processes, such as kubelet, kube-proxy, Terway, and the container runtime. For a detailed description of the reservation policy, see Node resource reservation policy.
System components are installed on nodes by default. As a result, the resources requested by a pod must be less than the full instance specifications.
The authorization described in Enable instant elasticity for nodes is not complete.
The node pool with Auto Scaling enabled contains abnormal nodes from a failed scale-out. To ensure the accuracy of subsequent scaling and system stability, the node instant scaling component does not perform scaling operations until the issues with these abnormal nodes are resolved.
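As a hedged way to see how much of an instance type's listed resources remains schedulable after these reservations, compare a node's capacity with its allocatable resources. The node name is a placeholder.
# Print a node's full capacity followed by its allocatable resources
# (capacity minus virtualization, OS, and Kubernetes reservations).
kubectl get node <node-name> -o jsonpath='{.status.capacity}{"\n"}{.status.allocatable}{"\n"}'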
How do I configure custom resources for a node pool that has node instant scaling enabled?
You can configure ECS tags with the following fixed prefix for a node pool that has node instant scaling enabled. This allows the scaling component to identify the available custom resources in the node pool or the exact values of specified resources.
The version of the node instant scaling component ACK GOATScaler must be v0.2.18 or later. To upgrade the component, see Manage add-ons.
goatscaler.io/node-template/resource/{resource-name}:{resource-size}
Example:
goatscaler.io/node-template/resource/hugepages-1Gi:2Gi
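As a hedged illustration, once the node pool advertises such a custom resource, a pod can request it in the usual way. The pod name and image below are placeholders, and the example assumes the hugepages-1Gi tag shown above.
# Hypothetical pod requesting the custom hugepages-1Gi resource that the
# node pool advertises through the ECS tag above (hugepage requests must
# equal their limits).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-request-example
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: 1Gi
        hugepages-1Gi: 2Gi
      limits:
        memory: 1Gi
        hugepages-1Gi: 2Gi
EOF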
Scale-in behavior
Why does node instant scaling fail to remove nodes?
Consider the following scenarios.
The option to scale in only empty nodes is enabled, but the node being checked is not empty.
The resource request threshold of the pods on the node is higher than the configured scale-in threshold.
Pods from the kube-system namespace are running on the node.
The pods on the node have a mandatory scheduling policy that prevents other nodes from running them.
The pods on the node have a PodDisruptionBudget, and the minimum number of available pods has been reached.
If a new node is added, node instant scaling will not perform a scale-in operation on that node within 10 minutes.
Offline nodes exist. An offline node is a running instance that does not have a corresponding node object. The node instant scaling component supports an automatic cleanup feature in v0.5.3 and later. For earlier versions, you must manually delete these residual instances.
Version v0.5.3 is in phased release. Please submit a ticket to request access. For information about how to upgrade the component, see Components.
On the Node Pools page, click Sync Node Pool, and then click Details. On the Nodes tab, check whether any nodes are in the offline state.
What types of pods can prevent node instant scaling from removing nodes?
If a pod is not created by a native Kubernetes controller, such as a Deployment, ReplicaSet, Job, or StatefulSet, or if the pods on a node cannot be safely terminated or migrated, the node instant scaling component may be prevented from removing the node.
Control scaling behavior using pods
How do I control node instant scaling node scale-in using pods?
You can use the pod annotation goatscaler.io/safe-to-evict to specify whether a pod prevents a node from being scaled in during a node instant scaling scale-in.
To prevent the node from being scaled in, add the annotation "goatscaler.io/safe-to-evict": "false" to the pod.
To allow the node to be scaled in, add the annotation "goatscaler.io/safe-to-evict": "true" to the pod.
Control scaling behavior using nodes
How do I specify which nodes to delete during a node instant scaling scale-in?
You can add the goatscaler.io/force-to-delete:true:NoSchedule taint to the nodes that you want to remove. After you add this taint, node instant scaling directly deletes the nodes without checking the pod status or draining the pods. Use this feature with caution because it may cause service interruptions or data loss.
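A hedged example of adding this taint with kubectl, where the node name is a placeholder:
# Force-delete taint: node instant scaling deletes the node without checking
# pod status or draining pods. Use with caution.
kubectl taint node <node-name> goatscaler.io/force-to-delete=true:NoSchedule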
How do I prevent node instant scaling from removing specific nodes?
You can configure the node annotation "goatscaler.io/scale-down-disabled": "true" for the target node to prevent it from being scaled in by the node instant scaling component. The following is a sample command to add the annotation.
kubectl annotate node <nodename> goatscaler.io/scale-down-disabled=true
Can node instant scaling scale in only empty nodes?
You can configure whether to scale in only empty nodes at the node level or cluster level. If you configure this feature at both levels, the node-level configuration takes precedence.
Node level: Add the label goatscaler.io/scale-down-only-empty:true or goatscaler.io/scale-down-only-empty:false to a node to enable or disable scaling in only empty nodes (see the sample command after this list).
Cluster level: In the Container Service for Kubernetes console, go to the Add-ons page. Find the node instant scaling component ACK GOATScaler and follow the on-screen instructions to set ScaleDownOnlyEmptyNodes to true or false. This enables or disables scaling in only empty nodes.
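A minimal sketch of applying the node-level label with kubectl, where the node name is a placeholder:
# Allow this node to be scaled in only when it is empty;
# set the label to false to lift the restriction.
kubectl label node <node-name> goatscaler.io/scale-down-only-empty=true --overwrite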
About the node instant scaling component
Are there any operations that trigger the automatic update of the node instant scaling component?
No. ACK does not automatically update the node instant scaling component, ACK GOATScaler, except during system maintenance and upgrades. You must manually upgrade the component on the Add-ons page in the Container Service for Kubernetes console.
Role authorization for an ACK managed cluster is complete, but node scaling activities still do not work. Why?
This may be because the secret addon.aliyuncsmanagedautoscalerrole.token does not exist in the kube-system namespace. By default, ACK uses the WorkerRole to implement related features. Follow the procedure below to grant the AliyunCSManagedAutoScalerRolePolicy permission to the WorkerRole for a dedicated cluster.
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, go to the Node Pools page.
On the Node Pools page, click Enable to the right of Node Scaling.
Follow the on-screen instructions to grant permissions for the KubernetesWorkerRole role and the AliyunCSManagedAutoScalerRolePolicy system policy.
Manually restart the cluster-autoscaler deployment (node auto scaling) or the ack-goatscaler deployment (node instant scaling) in the kube-system namespace for the permissions to take effect immediately.
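For example, a hedged sketch using kubectl with the deployment names given above:
# Restart whichever scaling component your cluster uses so that the newly
# granted permissions are picked up immediately.
kubectl -n kube-system rollout restart deployment cluster-autoscaler   # node auto scaling
kubectl -n kube-system rollout restart deployment ack-goatscaler       # node instant scaling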