To use Container Service for Kubernetes (ACK) clusters for GPU computing, you must schedule pods to GPU nodes. To enable flexible and efficient scheduling, you can add labels to the GPU nodes.
- An ACK cluster is created and GPU nodes are added to the cluster. For more information, see Configure a Kubernetes GPU cluster to support GPU scheduling.
- You are connected to a master node of the cluster, so that you can check node labels with kubectl. For more information, see Use kubectl to connect to a cluster.
When ACK deploys nodes that are based on NVIDIA GPUs, it exposes the attributes of these GPUs as node labels. These labels have the following benefits:
- You can use the labels to filter GPU nodes.
- The labels can be used as the conditions for pod scheduling.
- Log on to the Container Service for Kubernetes (ACK) console.
- In the left-side navigation pane, click Clusters.
- On the Clusters page, click the name of a cluster or click Details in the Actions column. The details page of the cluster appears.
- In the left-side navigation pane, click Nodes.
- On the Nodes page, find the GPU node that you want to manage, and select the relevant option in the Actions column for the node.
  Check the labels of the GPU node.
You can also log on to a master node and use the command-line interface (CLI) to view the labels on GPU nodes.
Run the following command to list the nodes:

    kubectl get nodes

Response:

    NAME                                 STATUS    ROLES     AGE       VERSION
    cn-beijing.i-2ze2dy2h9w97v65uuaft    Ready     master    2d        v1.11.2
    cn-beijing.i-2ze8o1a45qdv5q8a7luz    Ready     <none>    2d        v1.11.2   # Compare these nodes with the nodes that are displayed in the console to identify GPU nodes.
    cn-beijing.i-2ze8o1a45qdv5q8a7lv0    Ready     <none>    2d        v1.11.2
    cn-beijing.i-2ze9xylyn11vop7g5bwe    Ready     master    2d        v1.11.2
    cn-beijing.i-2zed5sw8snjniq6mf5e5    Ready     master    2d        v1.11.2
    cn-beijing.i-2zej9s0zijykp9pwf7lu    Ready     <none>    2d        v1.11.2

Select a GPU node and run the following command to query the labels on the node:

    kubectl describe node cn-beijing.i-2ze8o1a45qdv5q8a7luz

Response:

    Name:               cn-beijing.i-2ze8o1a45qdv5q8a7luz
    Roles:              <none>
    Labels:             aliyun.accelerator/nvidia_count=1          # This field is important.
                        aliyun.accelerator/nvidia_mem=12209MiB
                        aliyun.accelerator/nvidia_name=Tesla-M40
                        beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/instance-type=ecs.gn4-c4g1.xlarge
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/region=cn-beijing
                        failure-domain.beta.kubernetes.io/zone=cn-beijing-a
                        kubernetes.io/hostname=cn-beijing.i-2ze8o1a45qdv5q8a7luz
    ......
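In this example, the following labels are added to the GPU node:
- aliyun.accelerator/nvidia_count: The number of GPUs on the node.
- aliyun.accelerator/nvidia_mem: The size of the GPU memory. Unit: MiB.
- aliyun.accelerator/nvidia_name: The name of the NVIDIA graphics card.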
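The output of kubectl describe node is long, so it can help to filter it down to the GPU labels. The following is a minimal sketch rather than an official tool: gpu_labels and gpu_label_value are hypothetical helper names introduced here, and the filter assumes that the GPU labels use the aliyun.accelerator/ prefix as in the sample output and that grep supports the -o option (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helpers (not part of ACK or kubectl).

# gpu_labels: read `kubectl describe node` output on stdin and print only
# the aliyun.accelerator/* labels, one "key=value" per line.
gpu_labels() {
    # -o prints just the matched text, which drops the "Labels:" prefix,
    # the indentation, and anything after the label value.
    grep -o 'aliyun\.accelerator/[A-Za-z0-9_.-]*=[^[:space:]]*'
}

# gpu_label_value KEY: print the value of a single GPU label,
# for example `gpu_label_value nvidia_name`.
gpu_label_value() {
    gpu_labels | sed -n "s|^aliyun\.accelerator/$1=||p"
}

# Usage (requires access to the cluster):
#   kubectl describe node <node-name> | gpu_labels
#   kubectl describe node <node-name> | gpu_label_value nvidia_mem
```

Reading from standard input keeps the helpers independent of the cluster, so they can also be run against saved output.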
GPU nodes of the same type have the same graphics card name, so you can use this label to filter nodes.
Run the following command:

    kubectl get nodes -l aliyun.accelerator/nvidia_name=Tesla-M40

Response:

    NAME                                 STATUS    ROLES     AGE       VERSION
    cn-beijing.i-2ze8o1a45qdv5q8a7luz    Ready     <none>    2d        v1.11.2
    cn-beijing.i-2ze8o1a45qdv5q8a7lv0    Ready     <none>    2d        v1.11.2
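To compare GPU nodes at a glance, the label values can be printed as extra columns with the -L (--label-columns) option of kubectl get. The sketch below wraps this in a hypothetical function, list_gpu_nodes, which is not part of ACK. It assumes that every GPU node carries the aliyun.accelerator/nvidia_name label, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACK or kubectl).

# list_gpu_nodes [GPU_NAME]: list GPU nodes and show their GPU labels as columns.
# With GPU_NAME, only nodes whose nvidia_name label equals GPU_NAME are listed.
# Without it, every node that has the nvidia_name label is listed.
list_gpu_nodes() {
    if [ -n "$1" ]; then
        selector="aliyun.accelerator/nvidia_name=$1"
    else
        # A bare key in a label selector matches nodes where the label exists.
        selector="aliyun.accelerator/nvidia_name"
    fi
    kubectl get nodes -l "$selector" \
        -L aliyun.accelerator/nvidia_name \
        -L aliyun.accelerator/nvidia_count \
        -L aliyun.accelerator/nvidia_mem
}

# Usage (requires access to the cluster):
#   list_gpu_nodes             # all GPU nodes
#   list_gpu_nodes Tesla-M40   # only Tesla-M40 nodes
```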
- Go to the Cluster Information page. In the left-side navigation pane, click Workload. The Deployments tab appears. Click Create from Template.
- Create a TensorFlow application and schedule this application to a GPU node.
  The following YAML template is used in this example. The nodeSelector field restricts the pod to nodes that have the specified GPU label, and the nvidia.com/gpu limit requests one GPU for the container.

      ---
      # Define the tensorflow deployment
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: tf-notebook
        labels:
          app: tf-notebook
      spec:
        replicas: 1
        selector: # Define how the deployment finds the pods it manages
          matchLabels:
            app: tf-notebook
        template: # Define the pod specification
          metadata:
            labels:
              app: tf-notebook
          spec:
            nodeSelector: # This field is important.
              aliyun.accelerator/nvidia_name: Tesla-M40
            containers:
            - name: tf-notebook
              image: tensorflow/tensorflow:1.4.1-gpu-py3
              resources:
                limits:
                  nvidia.com/gpu: 1 # This field is important.
              ports:
              - containerPort: 8888
                hostPort: 8888
              env:
              - name: PASSWORD
                value: mypassw0rdv
- You can also keep specific applications off GPU nodes. The following example shows how to deploy an NGINX pod and schedule the pod based on node affinity. For more information, see the section that describes node affinity in Use an image to create a stateless application.
  The following YAML template is used in this example. The DoesNotExist operator matches only nodes that do not have the aliyun.accelerator/nvidia_name label, so the pod is never scheduled to a GPU node.

      apiVersion: v1
      kind: Pod
      metadata:
        name: not-in-gpu-node
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: aliyun.accelerator/nvidia_name
                  operator: DoesNotExist
        containers:
        - name: not-in-gpu-node
          image: nginx
- Create a TensorFlow application and schedule this application to a GPU node.
- Click the Pods tab.
  On the Pods tab, verify that the pods have been scheduled to the expected nodes. Labels make it easy to schedule pods to specific GPU nodes.
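Pod placement can also be checked from the command line. The sketch below is one possible way to do this, assuming the templates above were deployed unchanged to the current namespace. show_pod_nodes is a hypothetical helper name, and the custom-columns output prints each pod next to the node it runs on.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACK or kubectl).

# show_pod_nodes ARGS...: print the name of each selected pod and its node.
# ARGS are passed to `kubectl get pods`, so they can be a pod name or a
# label selector such as `-l app=tf-notebook`.
show_pod_nodes() {
    kubectl get pods "$@" \
        -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}

# Usage (requires access to the cluster):
#   show_pod_nodes -l app=tf-notebook   # expected on a Tesla-M40 node
#   show_pod_nodes not-in-gpu-node      # expected on a node without GPU labels
```

Comparing the NODE column with the nodes that carry the aliyun.accelerator/nvidia_name label shows whether each pod was placed as intended.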