GPU nodes are expensive to keep running around the clock. With Auto Mode enabled on an ACK cluster, managed node pools provision GPU nodes only when workloads need them and release those nodes when they go idle — so you pay only for the GPU time your applications actually use. This is especially cost-effective for workloads with fluctuating demand, such as online inference.
Prerequisites
Before you begin, ensure that you have:
ContainerOS 3.6 or later on your nodes (upgrade ContainerOS if needed)
Step 1: Create a managed node pool with GPU instances
Create a dedicated node pool for GPU workloads so GPU nodes stay isolated from general-purpose workloads. When a GPU workload is submitted, Auto Mode automatically provisions the required GPU nodes. When those nodes go idle and meet the scale-in criteria, Auto Mode releases them automatically.
On the ACK Clusters page, click the name of your cluster. In the left navigation pane, choose Nodes > Node Pools.
Click Create Node Pool and configure the following parameters. For all available options, see Create a node pool.
Managed Node Pool: Select Auto Mode.
vSwitch: Select two or more vSwitches in different zones for high availability. During scaling, nodes are provisioned in the zones of the selected vSwitches based on the Scaling Policy.
Instance-related configurations: Set Instance Configuration Mode to Specify Instance Type, set Architecture to GPU-accelerated, and for Instance Type select one or more GPU instance types, such as ecs.gn7i-c8g1.2xlarge (NVIDIA A10). Selecting multiple instance types improves the success rate of scale-out.
Taints: Add a taint to prevent non-GPU workloads from being scheduled onto these GPU nodes: Key nvidia.com/gpu, Value true, Effect NoSchedule.
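The taint configured above is applied to every node that Auto Mode provisions in this pool. As a sketch, it appears in each node's spec roughly as follows (field layout per the standard Kubernetes Node API; the node name is illustrative):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: example-gpu-node   # illustrative name, not from your cluster
spec:
  taints:
  - key: nvidia.com/gpu    # matches the toleration added in Step 2
    value: "true"
    effect: NoSchedule
```

Only Pods that tolerate this taint, such as the GPU workload deployed in Step 3, can be scheduled onto these nodes.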
Step 2: Configure GPU resource requests and taint tolerations
For the Pod to land on the GPU node pool and trigger auto-provisioning, declare its GPU requirements and taint toleration in the YAML manifest.
GPU resource request — in the container's resources field:
# ...
spec:
  containers:
  - name: gpu-automode
    resources:
      limits:
        nvidia.com/gpu: 1 # Request 1 GPU
# ...

Taint toleration — allows the Pod to be scheduled onto nodes with the nvidia.com/gpu taint:
# ...
spec:
  tolerations:
  - key: "nvidia.com/gpu" # Matches the taint key set on the node pool
    operator: "Equal"
    value: "true" # Matches the taint value
    effect: "NoSchedule" # Matches the taint effect
# ...

Step 3: Deploy the GPU workload and verify autoscaling
The following example uses a Stable Diffusion Web UI application to demonstrate the full deployment and autoscaling flow.
Deploy the workload
Create a file named stable-diffusion.yaml with the following content. The manifest contains two parts:

Deployment: Runs the Stable Diffusion workload. The Pod requests one NVIDIA GPU and declares the corresponding taint toleration.

Service: Exposes the workload through a public IP address via a LoadBalancer Service on port 7860.
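The manifest below uses the address-type annotation to expose the Service over the Internet. If the application should only be reachable inside the VPC, an internal load balancer can be requested instead — a sketch, assuming the intranet value documented for the ACK cloud controller manager (verify against the annotation reference for your CCM version):

```yaml
metadata:
  annotations:
    # Provision an internal (VPC-only) load balancer instead of a public one
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet
```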
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: stable-diffusion
  name: stable-diffusion
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stable-diffusion
  template:
    metadata:
      labels:
        app: stable-diffusion
    spec:
      containers:
      - args:
        - --listen
        command:
        - python3
        - launch.py
        image: yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu
        imagePullPolicy: IfNotPresent
        name: stable-diffusion
        ports:
        - containerPort: 7860
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 7860
        resources:
          limits:
            nvidia.com/gpu: 1 # Request 1 GPU
          requests:
            cpu: "6"
            memory: 12Gi
      # Allow the Pod to be scheduled onto the GPU node pool
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    # Expose the Service over the Internet
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: internet
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
  name: stable-diffusion-svc
  namespace: default
spec:
  externalTrafficPolicy: Local
  ports:
  - port: 7860
    protocol: TCP
    targetPort: 7860
  selector:
    app: stable-diffusion # Routes traffic to Pods with this label
  type: LoadBalancer

Apply the manifest.
kubectl apply -f stable-diffusion.yaml
Verify node auto-provisioning
Right after deployment, the Pod enters a Pending state because no GPU nodes exist yet. Auto Mode detects the unschedulable Pod and provisions one.
Check the Pod status.
kubectl get pod -l app=stable-diffusion

Inspect the Pod events to confirm that provisioning was triggered.

kubectl describe pod -l app=stable-diffusion

In the Events section, a FailedScheduling warning appears first, followed by a ProvisionNode event, indicating that Auto Mode has started provisioning a GPU node.

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  15m    default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 2 Insufficient cpu, 2 Insufficient memory, 2 Insufficient nvidia.com/gpu. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   ProvisionNode     16m    GOATScaler         Provision node asa-2ze2h0f4m5ctpd8kn4f1 in Zone: cn-beijing-k with InstanceType: ecs.gn7i-c8g1.2xlarge, Triggered time 2025-11-19 02:58:01.096
  Normal   AllocIPSucceed    12m    terway-daemon      Alloc IP 10.XX.XX.141/16 took 4.764400743s
  Normal   Pulling           12m    kubelet            Pulling image "yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu"
  Normal   Pulled            3m48s  kubelet            Successfully pulled image "yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-gpu" in 8m47.675s (8m47.675s including waiting). Image size: 11421866941 bytes.
  Normal   Created           3m42s  kubelet            Created container: stable-diffusion
  Normal   Started           3m24s  kubelet            Started container stable-diffusion

Confirm the provisioned node is Ready.

# Store the node name where the Pod is running
NODE_NAME=$(kubectl get pod -l app=stable-diffusion -o jsonpath='{.items[0].spec.nodeName}')

# Print the node name
echo "Stable Diffusion is running on node: $NODE_NAME"

# Verify the node is in Ready state
kubectl get node $NODE_NAME
Access the Stable Diffusion UI
Wait a few minutes for the node to join the cluster and the Pod to start, then access the application.
Get the public IP address of the Service.
kubectl get svc stable-diffusion-svc

Find the EXTERNAL-IP in the output.

NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
stable-diffusion-svc   LoadBalancer   192.XXX.XX.196   8.XXX.XX.68   7860:31302/TCP   18m

Open a browser and navigate to http://<EXTERNAL-IP>:7860. If the Stable Diffusion Web UI loads, the workload is running on the GPU node.
Verify node scale-in
To test automatic scale-in, delete the Deployment so the GPU node goes idle.
Delete the Deployment and the Service.
kubectl delete deployment stable-diffusion
kubectl delete service stable-diffusion-svc

After the Defer Scale-in For period elapses (3 minutes by default in Auto Mode), the autoscaler removes the idle node. Query the node by its stored name to confirm.

kubectl get node $NODE_NAME

The expected output confirms the node has been released.
Error from server (NotFound): nodes "<nodeName>" not found