Use attribute labels to schedule pods to specific GPU models on GPU-HPN virtual nodes

GPU-HPN virtual nodes in Container Compute Service (ACS) clusters expose GPU attribute labels. You can use these labels to schedule pods to a specific GPU model or high-performance network (HPN) zone without modifying your application code.

Prerequisites

Before you begin, ensure that you have:

Activated ACS
Associated the cluster with a GPU-HPN capacity reservation

GPU attribute labels

Every GPU-HPN virtual node carries the following labels.

Label	Description
`alibabacloud.com/gpu-model-series`	The GPU model used by the virtual node
`alibabacloud.com/node-series`	The type of reserved resource that backs the virtual node
`alibabacloud.com/hpn-zone`	The high-performance network (HPN) zone of the virtual node

To view the GPU models supported by your capacity reservation, submit a ticket.
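A node selector matches a node only if the node carries every listed key-value pair. Because of this, you can combine these labels to narrow placement further. The following pod spec fragment is a sketch that pins a pod to one GPU model within one HPN zone. Both `gpu-example` and `hpn-zone-example` are hypothetical placeholder values, not real label values. Replace them with the values that apply to your capacity reservation.

```yaml
# Pod spec fragment (sketch). Both label values below are hypothetical placeholders.
spec:
  nodeSelector:
    # The node must carry BOTH labels with exactly these values.
    alibabacloud.com/gpu-model-series: "gpu-example"   # hypothetical GPU model value
    alibabacloud.com/hpn-zone: "hpn-zone-example"      # hypothetical HPN zone value
```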

Choose a scheduling method

Kubernetes provides two ways to target a pod at a specific node.

Method	How it works	When to use
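node selector (`nodeSelector`)	Hard match: the pod is scheduled only to nodes whose labels match every specified key-value pair	Pinning a pod to a specific GPU model with no fallback
node affinity (`affinity.nodeAffinity`)	Supports hard constraints (`requiredDuringSchedulingIgnoredDuringExecution`) and soft constraints (`preferredDuringSchedulingIgnoredDuringExecution`), plus operators such as `In` and `NotIn`	More flexible rules, such as preferring one GPU model but accepting another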

The example in the next section uses a node selector. For the full node affinity syntax, see "Assigning Pods to Nodes" in the Kubernetes documentation.
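The "prefer one GPU model but accept another" case from the table can be expressed with standard Kubernetes node affinity fields. The following pod spec fragment is a sketch. The hard rule allows two GPU models, and the soft rule tells the scheduler to favor one of them. The values `gpu-example` and `gpu-example-2` are hypothetical placeholders. The sketch also assumes that ACS honors preferred node affinity terms on GPU-HPN virtual nodes, which this topic does not state.

```yaml
# Pod spec fragment (sketch). "gpu-example" and "gpu-example-2" are hypothetical GPU model values.
spec:
  affinity:
    nodeAffinity:
      # Hard constraint: only nodes whose GPU model is one of these values are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - "gpu-example"
            - "gpu-example-2"
      # Soft constraint: among eligible nodes, favor gpu-example (weight range is 1-100).
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - "gpu-example"
```

The preferred term only raises the score of matching nodes. If no `gpu-example` node is available, the pod can still be scheduled to a `gpu-example-2` node because the required term permits it.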

Schedule pods to a specific GPU model

The example runs a TensorFlow MNIST training job on a virtual node with a specific GPU model.

Create a file named tensorflow-mnist.yaml with the following content.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      # Schedule the pod only to virtual nodes that carry this GPU model label.
      nodeSelector:
        alibabacloud.com/gpu-model-series: "gpu-example"  # Replace with your GPU model
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          requests:
            cpu: 1
            memory: 1Gi        # A unitless value such as 1 means 1 byte
            nvidia.com/gpu: 1
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never
```

Deploy the Job.
```
kubectl apply -f tensorflow-mnist.yaml
```

Check the pod status.

```
kubectl get pod -l app=tensorflow-mnist -o wide
```

Check the NODE column to confirm the pod is running on the expected virtual node.

```
NAMESPACE   NAME                     READY   STATUS    RESTARTS   AGE    IP       NODE                                   NOMINATED NODE   READINESS GATES
default     tensorflow-mnist-xxx     0/2     Running   0          4h2m   <none>   cn-shanghai-b.cr-u4ub6c3un2mrjlct2l9c  <none>           <none>
```

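Because the Job uses a node selector, the pod can only be scheduled to a node whose `alibabacloud.com/gpu-model-series` label is `gpu-example`. The node shown in the NODE column is therefore a virtual node with the gpu-example GPU. To confirm this, run `kubectl get node <NODE> -L alibabacloud.com/gpu-model-series`. This command shows the label value as an extra column.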