GPU-HPN virtual nodes in ACS clusters expose GPU attribute labels that let you schedule pods to a specific GPU model or high-performance network zone without modifying your application code.
Prerequisites
Before you begin, ensure that you have:
- Associated the cluster with a GPU-HPN capacity reservation
GPU attribute labels
Every GPU-HPN virtual node carries the following labels.
| Label | Description |
|---|---|
| alibabacloud.com/gpu-model-series | The GPU model used by the virtual node |
| alibabacloud.com/node-series | The type of reserved resource of the virtual node |
| alibabacloud.com/hpn-zone | The zone of the high-performance network |
To view the GPU models supported by your capacity reservation, submit a ticket.
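As a rough sketch of how these labels are consumed, a pod spec fragment can list more than one of them under nodeSelector, in which case a node must match every listed label. The values shown here ("gpu-example" and "hpn-zone-example") are placeholders assumed for illustration, not real label values. Replace them with the values that your virtual nodes actually carry.

```yaml
# Fragment of a pod spec (not a complete manifest).
# Both label values are hypothetical placeholders.
spec:
  nodeSelector:
    alibabacloud.com/gpu-model-series: "gpu-example"   # assumed GPU model value
    alibabacloud.com/hpn-zone: "hpn-zone-example"      # assumed network zone value
```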
Choose a scheduling method
Kubernetes provides two ways to target a pod at a specific node.
| Method | How it works | When to use |
|---|---|---|
| node selector | Hard match — the pod schedules only to nodes with the exact label value | Pinning to a specific GPU model with no fallback |
| node affinity | Supports both hard (required) and soft (preferred) constraints | More flexible rules, such as preferring one GPU model but accepting another |
The following example uses a node selector. For node affinity syntax, see the Kubernetes documentation.
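For comparison with the node selector approach, the "prefer one GPU model but accept another" case from the table could be sketched with node affinity roughly as follows. This is an illustrative fragment that uses the standard Kubernetes nodeAffinity fields. The label values "gpu-example" and "gpu-example-2" are placeholders assumed for this sketch, not real GPU models.

```yaml
# Fragment of a pod spec (not a complete manifest).
# "gpu-example" and "gpu-example-2" are hypothetical label values.
spec:
  affinity:
    nodeAffinity:
      # Hard constraint: the node must use one of the listed GPU models.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - "gpu-example"
            - "gpu-example-2"
      # Soft constraint: among eligible nodes, prefer the first model.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - "gpu-example"
```

The required block narrows the candidate nodes, and the preferred block only ranks them, so the pod can still be scheduled to a "gpu-example-2" node when no "gpu-example" node is available.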
Schedule pods to a specific GPU model
The example runs a TensorFlow MNIST training job on a virtual node with a specific GPU model.
- Create a file named tensorflow-mnist.yaml with the following content.

  ```yaml
  apiVersion: batch/v1
  kind: Job
  metadata:
    name: tensorflow-mnist
  spec:
    parallelism: 1
    template:
      metadata:
        labels:
          app: tensorflow-mnist
      spec:
        nodeSelector:
          alibabacloud.com/gpu-model-series: "gpu-example" # Replace with your GPU model
        containers:
        - name: tensorflow-mnist
          image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
          command:
          - python
          - tensorflow-sample-code/tfjob/docker/mnist/main.py
          - --max_steps=1000
          - --data_dir=tensorflow-sample-code/data
          resources:
            requests:
              cpu: 1
              memory: 1
              nvidia.com/gpu: 1
            limits:
              cpu: 1
              memory: 1
              nvidia.com/gpu: 1
          workingDir: /root
        restartPolicy: Never
  ```

- Deploy the Job.

  ```shell
  kubectl apply -f tensorflow-mnist.yaml
  ```

- Check the pod status.

  ```shell
  kubectl get pod -l app=tensorflow-mnist -o wide
  ```

  Check the NODE column to confirm that the pod is running on the expected virtual node.

  ```
  NAMESPACE   NAME                   READY   STATUS    RESTARTS   AGE    IP       NODE                                    NOMINATED NODE   READINESS GATES
  default     tensorflow-mnist-xxx   0/2     Running   0          4h2m   <none>   cn-shanghai-b.cr-u4ub6c3un2mrjlct2l9c   <none>           <none>
  ```

  The output indicates that the pod runs on the virtual node that uses the gpu-example GPU.