Creating a GPU node pool with auto scaling enabled lets you dynamically add or remove nodes based on actual resource needs. This on-demand, elastic scheduling improves GPU resource utilization and reduces operational costs.
Prerequisites
- You have created an ACK managed Pro cluster.
Step 1: Create an auto-scaling GPU node pool
To ensure independent scheduling and resource isolation for your GPU workloads, create a dedicated GPU node pool and enable auto scaling. This lets the system dynamically adjust compute resources based on workload changes.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, choose Nodes > Node Pools.
Click Create Node Pool and configure the node pool settings as prompted.
The following are key parameters. For more information about all parameters, see Create and manage node pools.
- Scaling Mode: Select Auto, and then configure the Instances parameter. If the cluster has insufficient resources for pods, ACK automatically scales the number of nodes within the range you specify.
- Instance configuration: Select Specify Instance Type.
  - Architecture: Select GPU-accelerated instance.
  - Instance Type: Select a suitable GPU instance family based on your business requirements, such as ecs.gn7i-c8g1.2xlarge (NVIDIA A10). To improve the success rate of scale-outs, we recommend that you select multiple instance types.
- Taints: To prevent non-target applications from being scheduled to GPU nodes, we recommend that you add a taint to the node pool. For example:
  - Key: scaler
  - Value: gpu
  - Effect: NoSchedule
- Node Labels: Add a unique label to the node pool, such as gpu-spec: NVIDIA-A10. This label ensures that GPU applications are scheduled only to the specified node pool.
-
Step 2: Configure GPU resources and node affinity
To schedule your application to the GPU node pool, you must modify its Deployment configuration to request GPU resources and set node affinity.
- Configure GPU resource requests.
  In the container's resources field, declare the number of GPUs to use.
  ```yaml
  # ...
  spec:
    containers:
    - name: gpu-auto-scaler
      # ...
      resources:
        limits:
          nvidia.com/gpu: 1 # Request 1 GPU
      # ...
  ```
- Configure node affinity.
  Use nodeAffinity to schedule pods to GPU nodes with the specified label.
  ```yaml
  # ...
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: gpu-spec # Match the label key on the node pool
              operator: In
              values:
              - NVIDIA-A10 # Match the label value on the node pool
  # ...
  ```
- Configure tolerations.
  Add a toleration that matches the node pool's taint to ensure that pods can be scheduled to the tainted GPU nodes.
  ```yaml
  # ...
  spec:
    tolerations:
    - key: "scaler" # Match the taint key on the node pool
      operator: "Equal"
      value: "gpu" # Match the taint value on the node pool
      effect: "NoSchedule" # Match the taint effect on the node pool
  # ...
  ```
Step 3: Deploy the application and verify node scaling
This section uses a Deployment to demonstrate dynamic node scaling.
- Create a file named gpu-deployment.yaml.
  This Deployment runs two pod replicas. The pods must be scheduled to GPU nodes that have the gpu-spec=NVIDIA-A10 label, and each pod uses one GPU.
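  A minimal gpu-deployment.yaml sketch that assembles the resource request, node affinity, and toleration snippets from Step 2 might look like the following. The image and the sleep command are placeholders for illustration; substitute your own GPU workload.

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: gpu-auto-scaler
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: gpu-auto-scaler
    template:
      metadata:
        labels:
          app: gpu-auto-scaler
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: gpu-spec          # Match the label key on the node pool
                  operator: In
                  values:
                  - NVIDIA-A10           # Match the label value on the node pool
        tolerations:
        - key: "scaler"                  # Match the taint on the node pool
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
        containers:
        - name: gpu-auto-scaler
          image: registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04  # Placeholder image; use your own
          command: ["sleep", "infinity"]                             # Placeholder command
          resources:
            limits:
              nvidia.com/gpu: 1          # Each pod requests 1 GPU
  ```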
- Deploy the application and verify the initial scale-out.
  - Deploy the application.
    ```shell
    kubectl apply -f gpu-deployment.yaml
    ```
    Because the cluster does not have any GPU nodes that meet the requirements, the pods enter the Pending state and trigger a scale-out of the node pool. It typically takes several minutes for the new GPU nodes to become ready.
  - Check the pod events to confirm that the scale-out was triggered.
    ```shell
    kubectl describe pod <your-pod-name>
    ```
    Expected output:
    ```
    Events:
      Type    Reason            Age    From                Message
      ----    ------            ----   ----                -------
      Normal  TriggeredScaleUp  8m32s  cluster-autoscaler  pod triggered scale-up: [{asg-uf646aomci1pkqya54y7 0->2 (max: 10)}]
      Normal  Scheduled         6m8s   default-scheduler   Successfully assigned default/gpu-auto-scaler-565994fcf9-6nmz2 to cn-shanghai.10.XX.XX.244
      Normal  AllocIPSucceed    6m4s   terway-daemon       Alloc IP 10.XX.XX.245/16 took 4.505870999s
      Normal  Pulling           6m4s   kubelet             Pulling image "registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04"
      Normal  Pulled            6m2s   kubelet             Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04" in 1.687s (1.687s including waiting)
      Normal  Created           6m2s   kubelet             Created container gpu-auto-scaler
      Normal  Started           6m2s   kubelet             Started container gpu-auto-scaler
    ```
  - After the pods are running, check the GPU nodes in the cluster that have the corresponding label.
    ```shell
    kubectl get nodes -l gpu-spec=NVIDIA-A10
    ```
    The output should show two GPU nodes.
    ```
    NAME                       STATUS   ROLES    AGE     VERSION
    cn-shanghai.10.XX.XX.243   Ready    <none>   7m26s   v1.34.1-aliyun.1
    cn-shanghai.10.XX.XX.244   Ready    <none>   7m25s   v1.34.1-aliyun.1
    ```
- Verify auto scale-out.
  - Scale up the number of application replicas to three.
    ```shell
    kubectl scale deployment gpu-auto-scaler --replicas=3
    ```
    Run kubectl get pod. At this point, two pods are running and one new pod is in the Pending state due to insufficient resources. This triggers another scale-out of the node pool.
  - After a few minutes, run kubectl get nodes -l gpu-spec=NVIDIA-A10 again.
    The output shows that the number of nodes with the specified label in the cluster has increased to three.
    ```
    NAME                       STATUS   ROLES    AGE   VERSION
    cn-shanghai.10.XX.XX.243   Ready    <none>   11m   v1.34.1-aliyun.1
    cn-shanghai.10.XX.XX.244   Ready    <none>   11m   v1.34.1-aliyun.1
    cn-shanghai.10.XX.XX.247   Ready    <none>   45s   v1.34.1-aliyun.1
    ```
- Verify auto scale-in.
  - Scale down the number of application replicas to one.
    ```shell
    kubectl scale deployment gpu-auto-scaler --replicas=1
    ```
    The two extra pods are terminated, leaving two GPU nodes idle. After a specified idle duration, the node scaling component automatically removes these nodes to save costs. This behavior is part of node instant scaling.
  - After waiting for the scale-in delay, run kubectl get nodes -l gpu-spec=NVIDIA-A10 again.
    The output should show that the number of nodes with the specified label in the cluster has been reduced to one.
    ```
    NAME                       STATUS   ROLES    AGE   VERSION
    cn-shanghai.10.XX.XX.243   Ready    <none>   31m   v1.34.1-aliyun.1
    ```
Production recommendations
- Cost optimization: GPU-accelerated instances are costly, and scaled-out nodes are billed on a pay-as-you-go basis. We recommend adding spot instances to the node pool to significantly reduce compute costs. In addition, set a reasonable Max. Instances value to prevent unexpected cost overruns during traffic peaks.
- High availability: To prevent scale-out failures caused by insufficient inventory in a single zone or of a single instance type, we recommend that you select vSwitches in multiple zones and specify multiple GPU instance types when you create the node pool.
- Monitoring and alerting: Enable GPU monitoring to track GPU usage, health, and workload performance. This enables rapid issue diagnosis and resource optimization.
FAQ
Pod pending without node pool scale-out
Possible reasons include:
- Affinity configuration error: Check that the labels in the application's nodeAffinity configuration exactly match the node pool's labels.
- Resource request mismatch: Confirm that the number of GPUs requested by the application (nvidia.com/gpu) does not exceed the capacity of a single node in the pool.
- Node scaling component error: Check the logs of the node scaling component for error messages.
- Node pool configuration limit: Check whether the node pool has already reached its configured maximum number of nodes.
Use different GPU types in a cluster
You can create multiple node pools. Configure each node pool with a different GPU instance type and a unique node label, such as gpu-spec: NVIDIA-A10 and gpu-spec: NVIDIA-L20. When you deploy applications, use nodeAffinity to specify the corresponding labels to schedule different applications to different types of GPU nodes.
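As an illustration, only the affinity values change when an application targets a different pool. The following sketch assumes a second node pool labeled gpu-spec: NVIDIA-L20, as described above:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-spec
          operator: In
          values:
          - NVIDIA-L20   # Label assigned to the hypothetical L20 node pool
```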
View GPU devices on a node
After a node pool is created, you can view the GPU devices attached to a node.
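One way to do this from the command line (a sketch; the node name is a placeholder, and the label is the one set in Step 1) is to inspect the nvidia.com/gpu resource that each node reports:

```shell
# Show the GPU capacity and allocatable count on the labeled nodes
kubectl describe nodes -l gpu-spec=NVIDIA-A10 | grep -i "nvidia.com/gpu"

# Or query a single node's allocatable GPU count
kubectl get node <your-node-name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```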