Container Compute Service: Schedule applications to GPU-HPN virtual nodes based on attribute labels

Last Updated: Mar 26, 2026

GPU-HPN virtual nodes in ACS clusters expose GPU attribute labels. You can use these labels to schedule pods to a specific GPU model or high-performance network zone without modifying your application code.

Prerequisites

Before you begin, ensure that you have:

GPU attribute labels

Every GPU-HPN virtual node carries the following labels.

| Label | Description |
| --- | --- |
| alibabacloud.com/gpu-model-series | The GPU model used by the virtual node |
| alibabacloud.com/node-series | The type of reserved resource of the virtual node |
| alibabacloud.com/hpn-zone | The zone of the high-performance network |

To view the GPU models supported by your capacity reservation, submit a ticket.

Choose a scheduling method

Kubernetes provides two label-based ways to constrain a pod to specific nodes.

| Method | How it works | When to use |
| --- | --- | --- |
| node selector | Hard match: the pod schedules only to nodes with the exact label value | Pinning to a specific GPU model with no fallback |
| node affinity | Supports both hard (required) and soft (preferred) constraints | More flexible rules, such as preferring one GPU model but accepting another |

The following example uses a node selector. For node affinity syntax, see the Kubernetes documentation.
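
For orientation, a node affinity rule on the same label might look like the sketch below. It uses the standard Kubernetes nodeAffinity fields: the required rule accepts either of two GPU models, and the preferred rule favors one of them. The values gpu-example and gpu-fallback-example are placeholders, not real GPU model names, and how soft preferences behave on GPU-HPN virtual nodes is an assumption to verify in your cluster.

```yaml
# Fragment of a pod spec (for a Job, this sits under spec.template.spec).
# It would take the place of a nodeSelector block.
affinity:
  nodeAffinity:
    # Hard constraint: the pod schedules only to nodes with one of these GPU models.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: alibabacloud.com/gpu-model-series
          operator: In
          values:
          - gpu-example            # placeholder: preferred GPU model
          - gpu-fallback-example   # placeholder: acceptable fallback model
    # Soft constraint: among matching nodes, favor the preferred model.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: alibabacloud.com/gpu-model-series
          operator: In
          values:
          - gpu-example
```

If the pod must never run on the fallback model, a plain node selector is simpler and expresses the same hard match.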

Schedule pods to a specific GPU model

The example runs a TensorFlow MNIST training job on a virtual node with a specific GPU model.

  1. Create a file named tensorflow-mnist.yaml with the following content.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist
        spec:
          nodeSelector:
            alibabacloud.com/gpu-model-series: "gpu-example"  # Replace with your GPU model
          containers:
          - name: tensorflow-mnist
            image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=1000
            - --data_dir=tensorflow-sample-code/data
            resources:
              requests:
                cpu: 1
                memory: 1Gi  # Assumed value; size to your workload
                nvidia.com/gpu: 1
              limits:
                cpu: 1
                memory: 1Gi  # Assumed value; keep equal to the request
                nvidia.com/gpu: 1
            workingDir: /root
          restartPolicy: Never
  2. Deploy the Job.

    kubectl apply -f tensorflow-mnist.yaml
  3. Check the pod status.

    kubectl get pod -l app=tensorflow-mnist -o wide

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE    IP       NODE                                   NOMINATED NODE   READINESS GATES
    tensorflow-mnist-xxx     0/2     Running   0          4h2m   <none>   cn-shanghai-b.cr-u4ub6c3un2mrjlct2l9c  <none>           <none>

    The NODE column shows the virtual node that the pod was scheduled to. Because the node selector is a hard match, the pod can run only on a virtual node labeled with the gpu-example GPU model.
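
The same node selector can also pin the pod to a network zone. As a minimal sketch, the fragment below adds the alibabacloud.com/hpn-zone label next to the GPU model label. Kubernetes requires every entry in nodeSelector to match, so the pod schedules only to virtual nodes that carry both labels. The value hpn-zone-example is a hypothetical placeholder; use the value that appears on your own virtual nodes.

```yaml
# Fragment of the Job's pod template spec (spec.template.spec).
nodeSelector:
  alibabacloud.com/gpu-model-series: "gpu-example"  # placeholder GPU model
  alibabacloud.com/hpn-zone: "hpn-zone-example"     # hypothetical zone value
```

If no virtual node matches both labels, the pod stays in the Pending state until one becomes available.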