
Container Service for Kubernetes: Allocate computing power by scheduling shared GPU

Last Updated: Mar 26, 2026

cGPU lets multiple pods share a single physical GPU in ACK Pro clusters by isolating both GPU memory and computing power at the software level. This topic shows how to create a node pool with computing power allocation enabled, verify the configuration, and deploy a workload that uses both resources.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Pro cluster running Kubernetes 1.20 or later. For more information, see Create an ACK managed cluster.

  • A kube-scheduler version that meets the requirement for your cluster version, as listed in the following table. For the full list of features supported by each kube-scheduler version, see kube-scheduler.

    ACK cluster version   Scheduler version
    1.28                  1.28.1-aliyun-5.6-998282b9 or later
    1.26                  v1.26.3-aliyun-4.1-a520c096 or later
    1.24                  1.24.3-ack-2.0 or later
    1.22                  1.22.15-ack-2.0 or later
    1.20                  1.20.4-ack-8.0 or later
  • The GPU sharing component installed with a Helm chart version later than 1.2.0. For more information, see Manage the GPU sharing component.

  • cGPU 1.0.5 or later installed. For more information, see Update the cGPU version on a node.

Limitations

  • Job type mixing is not supported on the same node. GPU sharing supports two types of jobs: jobs that request only GPU memory, and jobs that request both GPU memory and computing power. You cannot run both types on the same node at the same time. This constraint exists because cGPU uses software-level isolation, not hardware-level isolation (such as MIG).

  • Computing power values must be multiples of 5, with a minimum of 5. The scale is 0–100, where 100 represents 100% of a GPU's computing power. For example, a value of 20 means the pod uses 20% of the GPU's computing power. If the specified value is not a multiple of 5, the job cannot be submitted.
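    The multiple-of-5 rule is easy to check before submitting a job. A minimal shell sketch (the function name and this pre-submit check are illustrative, not part of ACK):

    ```shell
    #!/bin/sh
    # Illustrative pre-submit check: a valid computing power request is a
    # multiple of 5 in the range 5-100. Returns 0 if valid, 1 otherwise.
    valid_gpu_core_pct() {
      pct="$1"
      [ "$pct" -ge 5 ] && [ "$pct" -le 100 ] && [ $((pct % 5)) -eq 0 ]
    }

    valid_gpu_core_pct 30 && echo "30 is valid"    # multiple of 5 -> accepted
    valid_gpu_core_pct 32 || echo "32 is invalid"  # not a multiple of 5 -> rejected
    ```

    Running such a check in CI before kubectl apply avoids submitting jobs that the scheduler will reject.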

  • Computing power allocation is not supported by direct node labeling. To enable computing power isolation on existing GPU-accelerated nodes, remove them from the cluster first and then add them to a node pool that supports computing power isolation. Running kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem directly on existing nodes does not work.

  • Supported regions only. Computing power allocation is available in the following regions:

    Region                Region ID
    China (Beijing)       cn-beijing
    China (Shanghai)      cn-shanghai
    China (Hangzhou)      cn-hangzhou
    China (Zhangjiakou)   cn-zhangjiakou
    China (Shenzhen)      cn-shenzhen
    China (Chengdu)       cn-chengdu
    China (Heyuan)        cn-heyuan
    China (Hong Kong)     cn-hongkong
    Indonesia (Jakarta)   ap-southeast-5
    Singapore             ap-southeast-1
    Thailand (Bangkok)    ap-southeast-7
    US (Virginia)         us-east-1
    US (Silicon Valley)   us-west-1
    Japan (Tokyo)         ap-northeast-1
    China East 2 Finance  cn-shanghai-finance-1
  • Clusters created before March 1, 2022 require a manual scheduler update. Clusters created on or after March 1, 2022 automatically use the scheduler version that supports computing power allocation. For older clusters, follow these steps:

    1. Submit a ticket to apply for private preview access to shared GPU scheduling.

    2. If the installed GPU sharing Helm chart version is 1.2.0 or earlier, uninstall it:

       a. Log on to the ACK console. In the left navigation pane, click Clusters.

       b. Click the cluster name. In the left navigation pane, choose Applications > Helm.

       c. On the Helm page, find ack-ai-installer and click Delete in the Actions column. In the Delete dialog box, click OK.

    3. Install the latest GPU sharing component. For more information, see Manage the GPU sharing component.

Step 1: Create a node pool with computing power allocation

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the cluster name. In the left navigation pane, choose Nodes > Node Pools.

  3. On the Node Pools page, click Create Node Pool.

  4. Configure the node pool with the following settings. For all other parameters, see Create and manage a node pool.

    Parameter       Description
    Node Pool Name  A name for the node pool. This topic uses gpu-core as an example.
    Expected Nodes  The initial number of nodes. Set to 0 if you do not want to create nodes immediately.
    ECS Tags        Labels to add to the Elastic Compute Service (ECS) instances in the node pool.
    Node Labels     Labels to add to the nodes. Configure both of the labels below. For more information, see Labels for enabling GPU scheduling policies.

    Under Node Labels, add both of the following labels:

    Key                     Value     Purpose
    ack.node.gpu.schedule   core_mem  Enables both GPU memory isolation and computing power isolation on the node.
    ack.node.gpu.placement  binpack   Uses the binpack algorithm to pack pods onto the fewest GPUs, maximizing GPU utilization.
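Once nodes join the pool, a quick way to confirm that both labels landed on them is to filter nodes by the scheduling label. A sketch assuming kubectl access to the cluster; the custom column names are arbitrary:

```shell
# List nodes with GPU memory + computing power isolation enabled,
# showing each node's placement policy label alongside its name.
kubectl get nodes -l ack.node.gpu.schedule=core_mem \
  -o custom-columns='NAME:.metadata.name,PLACEMENT:.metadata.labels.ack\.node\.gpu\.placement'
```

An empty list means no node in the cluster carries the core_mem label yet.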

Step 2: Verify computing power allocation

Run the following command to check whether computing power allocation is enabled on a node:

kubectl get nodes <NODE_NAME> -o yaml

Look for the aliyun.com/gpu-core.percentage field in the allocatable and capacity sections of the output. Its presence confirms that computing power allocation is active.
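To avoid scanning the full YAML, you can also read the field directly with a JSONPath query (a sketch assuming kubectl access; replace <NODE_NAME> with a real node name):

```shell
# Prints the node's total computing power units (for example, "400" on a
# 4-GPU node), or nothing if computing power allocation is not enabled.
kubectl get node <NODE_NAME> \
  -o jsonpath='{.status.allocatable.aliyun\.com/gpu-core\.percentage}'
```

The dots inside the resource name are escaped with backslashes so JSONPath does not treat them as path separators.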

Expected output for a node with 4 GPUs (each with 15 GiB of memory):

# Irrelevant fields are omitted.
status:
  allocatable:
    # 4 GPUs x 100% = 400 total computing power units
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # 4 GPUs x 15 GiB = 60 GiB total GPU memory
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

Step 3: Deploy a workload with computing power limits

Without computing power allocation, a pod can use 100% of a GPU's computing power and all 15 GiB of its memory. After enabling this feature, you set explicit limits on both resources.

The following example deploys a job that requests 2 GiB of GPU memory and 30% of computing power.

  1. Create a file named cuda-sample.yaml with the following content:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                aliyun.com/gpu-mem: 2                 # GPU memory in GiB
                aliyun.com/gpu-core.percentage: 30    # Percentage of GPU computing power; must be a multiple of 5
            workingDir: /root
          restartPolicy: Never

    Key resource parameters:

    Parameter Value Description
    aliyun.com/gpu-mem 2 GPU memory in GiB. The pod can use up to 2 GiB.
    aliyun.com/gpu-core.percentage 30 Percentage of GPU computing power. Must be a multiple of 5; 30 means 30%.
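    These limits also determine packing: on the example node type, each GPU offers 15 GiB of memory and 100 computing power units, so for this job the compute request, not the memory request, bounds how many copies fit on one GPU. A minimal shell sketch of that arithmetic (the function is illustrative, not part of ACK):

    ```shell
    #!/bin/sh
    # Illustrative: how many identical pods fit on one GPU, given per-GPU
    # capacity and per-pod limits. The binding resource is whichever runs
    # out first.
    pods_per_gpu() {
      gpu_mem=$1; gpu_core=$2; pod_mem=$3; pod_core=$4
      by_mem=$((gpu_mem / pod_mem))     # memory-bound count
      by_core=$((gpu_core / pod_core))  # compute-bound count
      if [ "$by_core" -lt "$by_mem" ]; then echo "$by_core"; else echo "$by_mem"; fi
    }

    # 15 GiB / 2 GiB = 7 by memory; 100 / 30 = 3 by compute -> 3 pods per GPU
    pods_per_gpu 15 100 2 30
    ```

    With the binpack placement label from Step 1, the scheduler packs those copies onto the same GPU before using the next one.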
  2. Deploy the job:

    kubectl apply -f cuda-sample.yaml

    Note: The image is large, so the initial pull may take several minutes.
  3. Verify that the job is running:

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    Running in the STATUS column confirms the pod is active.

  4. Check the GPU memory and computing power used by the pod:

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-----------------------------------------------------------------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output confirms isolation is applied:

    • GPU memory: The pod is limited to 2,154 MiB (approximately 2 GiB), down from the full 15 GiB available before enabling the feature. Current usage is 337 MiB.

    • Computing power: The pod is limited to 30% (GPU-Util: 30%), down from 100%.

    Note that nvidia-smi reports computing power utilization per GPU, not per pod. If n pods (where n is at most 3) each request 30%, the binpack policy schedules them all to the same GPU, and the output shows n x 30% utilization.
  5. Check the pod logs to observe throttling behavior:

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    Log entries appear at a lower rate compared to a pod with unrestricted computing power, which confirms that the 30% computing power limit is in effect.

  6. (Optional) Delete the job after verification:

    kubectl delete job cuda-sample

What's next