When you run pods on GPU-HPN capacity reservations, you can use the GPU attribute labels of virtual nodes to schedule the pods to a virtual node with a specific GPU model. This topic describes the attribute labels of GPU-HPN virtual nodes and how to run pods on a specified GPU model.
Prerequisites
The cluster is associated with a GPU-HPN capacity reservation.
GPU attribute labels
GPU-HPN virtual nodes in ACS clusters have the following labels.
| Label | Description |
| --- | --- |
| alibabacloud.com/gpu-model-series | The GPU model used by the virtual node. |
| alibabacloud.com/node-series | The type of resource reserved by the virtual node. |
| alibabacloud.com/hpn-zone | The zone of the high-performance network (HPN) to which the virtual node belongs. |
To view the supported GPU models, submit a ticket.
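You can also inspect the attribute labels of the virtual nodes that are already in your cluster. The following command is a sketch that uses the standard `kubectl` `-L` (`--label-columns`) flag to display the value of each attribute label as a column:

```shell
# List nodes and show the GPU-HPN attribute labels as extra columns.
kubectl get nodes \
  -L alibabacloud.com/gpu-model-series \
  -L alibabacloud.com/node-series \
  -L alibabacloud.com/hpn-zone
```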
Schedule pods to a virtual node with a specified attribute
You can use Kubernetes node selectors or node affinity rules to control the GPU model on which pods are scheduled.
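As an alternative to a node selector, a node affinity rule expresses the same constraint and additionally supports operators such as `In` for matching multiple GPU models. The following pod spec snippet is a sketch; `gpu-example` is a placeholder model name.

```yaml
# Sketch: the same constraint expressed with node affinity instead of nodeSelector.
# "gpu-example" is a placeholder; replace it with a GPU model supported in your cluster.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: alibabacloud.com/gpu-model-series
            operator: In
            values:
            - gpu-example
```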
The following example uses a node selector to schedule pods to a specified GPU model.
Create a file named tensorflow-mnist.yaml and add the following content to the file.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      nodeSelector:
        alibabacloud.com/gpu-model-series: "gpu-example" # Schedule the pods of the application to the gpu-example virtual node. The model is for reference only.
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never
```

Run the following command to deploy the tensorflow-mnist application.
```shell
kubectl apply -f tensorflow-mnist.yaml
```

Run the following command to query the status of the pods.
```shell
kubectl get pod -l app=tensorflow-mnist -o wide
```

Expected output:
```
NAMESPACE   NAME                   READY   STATUS    RESTARTS   AGE    IP       NODE                                    NOMINATED NODE   READINESS GATES
default     tensorflow-mnist-xxx   1/1     Running   0          4h2m   <none>   cn-shanghai-b.cr-u4ub6c3un2mrjlct2l9c   <none>           <none>
```

The output indicates that the pods run on the virtual node that uses the gpu-example GPU model.