
Container Compute Service:Schedule pods to GPU-HPN virtual nodes based on attribute labels

Last Updated: Mar 25, 2025

When you use GPU-HPN resource reservations to run pods, you can schedule the pods to specific virtual nodes by using the GPU attribute labels of those nodes. This topic describes the attribute labels of GPU-HPN nodes and how to run pods on a specified GPU model.

GPU attribute labels

GPU-HPN virtual nodes in ACS clusters have the following labels.

Label                                Description
-----------------------------------  --------------------------------------------------
alibabacloud.com/gpu-model-series    The GPU model used by the virtual node.
alibabacloud.com/node-series         The type of reserved resource of the virtual node.
alibabacloud.com/hpn-zone            The zone of the high-performance network.

Note

To view the supported GPU models, submit a ticket.
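The labels above can be combined in a single pod spec to pin pods to a GPU model within a specific high-performance network zone. The following is a minimal sketch; the model name gpu-example and the zone name zone-example are placeholders, not real values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: label-selector-example
spec:
  nodeSelector:
    # Both values are placeholders. Replace them with the actual
    # model and HPN zone reported by your virtual nodes.
    alibabacloud.com/gpu-model-series: "gpu-example"
    alibabacloud.com/hpn-zone: "zone-example"
  containers:
  - name: app
    image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
```

Because both labels appear under nodeSelector, the pod is scheduled only to a virtual node that carries both values.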

Schedule pods to a virtual node with specified attributes

You can use Kubernetes node selectors or node affinity rules to control which GPU model the pods are scheduled to.
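Node affinity expresses the same constraint as a node selector with more flexible matching (for example, matching any model in a list). The following is a minimal sketch; the model name gpu-example is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - "gpu-example"   # Placeholder; replace with the actual GPU model.
  containers:
  - name: app
    image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
```

The In operator accepts a list of values, so the same rule can allow scheduling to any of several GPU models.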

The following example uses a node selector to schedule pods to a specified GPU model.

  1. Create a file named tensorflow-mnist.yaml and add the following content to the file.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist
        spec:
          nodeSelector:
            alibabacloud.com/gpu-model-series: "gpu-example" # Schedule the pods of the application to the gpu-example virtual node. The model is for reference only. 
          containers:
          - name: tensorflow-mnist
            image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=1000
            - --data_dir=tensorflow-sample-code/data
            resources:
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: 1
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: 1
            workingDir: /root
          restartPolicy: Never
  2. Run the following command to deploy the tensorflow-mnist application.

    kubectl apply -f tensorflow-mnist.yaml
  3. Run the following command to query the status of the pods.

    kubectl get pod -l app=tensorflow-mnist -o wide

    Expected results:

    NAME                     READY   STATUS      RESTARTS   AGE    IP         NODE                                      NOMINATED NODE   READINESS GATES
    tensorflow-mnist-xxx     2/2     Running     0          4h2m   <none>     cn-shanghai-b.cr-u4ub6c3un2mrjlct2l9c     <none>           <none>

    The output indicates that the pod is running on a virtual node that uses the gpu-example GPU model.