You can request GPU memory and computing power for applications in Container Service for Kubernetes (ACK) Pro clusters. This topic describes how to use cGPU to allocate computing power.
Prerequisites
An ACK Pro cluster that runs Kubernetes 1.20.11 is created. For more information, see Create an ACK managed cluster.
The required scheduler version varies based on the Kubernetes version of the cluster. The following table describes the scheduler versions that are required for different Kubernetes versions. For more information about the features of different versions of the scheduler, see kube-scheduler.
| Kubernetes version | Scheduler version |
| --- | --- |
| 1.20 | 1.20.4-ack-8.0 and later |
| 1.22 | 1.22.15-ack-2.0 and later |
| 1.24 | 1.24.3-ack-2.0 and later |
The cGPU component is installed. Make sure that the version of the Helm chart that you install is later than 1.2.0. For more information about how to install the cGPU component, see Install and use ack-ai-installer and the GPU inspection tool.
cGPU 1.0.5 or a later version is installed. For more information about how to update the cGPU version, see Update the cGPU version on a node.
Limits
cGPU supports jobs that request only GPU memory and jobs that request both GPU memory and computing power. However, you cannot deploy both types of jobs on the same node at the same time: a node can run only one of the two job types.
The following limits apply when you request computing power for jobs:
When you allocate the computing power of a GPU, the value that you specify indicates a percentage of the total computing power of the GPU. The maximum value is 100, which indicates 100% of the computing power. For example, a value of 20 indicates 20% of the computing power of the GPU.
The value must be a multiple of 5, and the minimum value is 5. If the value that you specify is not a multiple of 5, the job cannot be submitted.
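The rules above can be checked before you write a value into a job manifest. The following is a minimal sketch, not an ACK tool: a hypothetical helper function that accepts only multiples of 5 in the range 5 to 100.

```shell
#!/bin/bash
# Return 0 (success) if the value is a valid aliyun.com/gpu-core.percentage
# request: a multiple of 5, at least 5, and at most 100.
valid_gpu_core_request() {
  local v="$1"
  [ "$v" -ge 5 ] && [ "$v" -le 100 ] && [ $((v % 5)) -eq 0 ]
}
```

For example, `valid_gpu_core_request 30` succeeds, while `valid_gpu_core_request 31` fails because 31 is not a multiple of 5.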
Only the regions in the following table support the allocation of GPU memory and computing power. Make sure that the region where your cluster resides is included in the table.
| Region | Region ID |
| --- | --- |
| China (Beijing) | cn-beijing |
| China (Shanghai) | cn-shanghai |
| China (Hangzhou) | cn-hangzhou |
| China (Zhangjiakou) | cn-zhangjiakou |
| China (Shenzhen) | cn-shenzhen |
| China (Chengdu) | cn-chengdu |
| China (Heyuan) | cn-heyuan |
| China (Hong Kong) | cn-hongkong |
| Indonesia (Jakarta) | ap-southeast-5 |
| Singapore | ap-southeast-1 |
| US (Virginia) | us-east-1 |
| US (Silicon Valley) | us-west-1 |
| Japan (Tokyo) | ap-northeast-1 |
The scheduler version that supports computing power allocation was released on March 1, 2022. Clusters that were created on March 1, 2022 or later use the latest scheduler version. The version of the scheduler used by clusters that were created before March 1, 2022 is not automatically updated. You must manually update the scheduler version. If your cluster was created before March 1, 2022, perform the following steps:
Submit a ticket to apply to join the private preview for the latest cGPU version.
Uninstall the outdated version of the cGPU component.
If the Helm chart version of the cGPU component that is installed is 1.2.0 or earlier, the version of the cGPU component is outdated and supports only memory sharing. Perform the following steps to uninstall the outdated version:
Log on to the ACK console.
In the left-side navigation pane of the ACK console, click Clusters.
On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
On the cluster details page, choose Applications > Helm in the left-side navigation pane.
On the Helm page, find ack-ai-installer and click Delete in the Actions column.
In the Delete dialog box, click OK.
Install the latest version of the cGPU component. For more information, see Install and use ack-ai-installer and the GPU inspection tool.
Step 1: Create a node pool that supports computing power allocation
Log on to the ACK console.
In the left-side navigation pane of the ACK console, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, choose Nodes > Node Pools.
On the right side of the Node Pools page, click Create Node Pool.
The following table describes some of the parameters. For more information about the parameters, see Configure a node pool.
- Node Pool Name: Enter a name for the node pool. In this example, gpu-core is used.
- Expected Nodes: Specify the initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.
- Operating System: Only CentOS 7.x and Alibaba Cloud Linux 2.x are supported.
- ECS Label: Add labels to the ECS instances.
- Custom Resource Group: Specify the resource group of the nodes to be added to the node pool.
- Node Label: Add labels to the nodes in the node pool. The following configurations are used in this topic. For more information about node labels, see Labels used by ACK to control GPUs.
  - To enable GPU memory isolation and computing power isolation, add a label with Key set to ack.node.gpu.schedule and Value set to core_mem.
  - To use the binpack algorithm to select GPUs for pods, add a label with Key set to ack.node.gpu.placement and Value set to binpack.
Important: If you want to enable computing power isolation for existing GPU-accelerated nodes in the cluster, you must first remove the nodes from the cluster and then add them to a node pool that supports computing power isolation. You cannot directly run the following command to enable computing power isolation for existing GPU-accelerated nodes:
kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem
Step 2: Check whether computing power allocation is enabled for the node pool
Run the following command to check whether computing power allocation is enabled for the nodes in the node pool:
kubectl get nodes <NODE_NAME> -o yaml
Expected output:
# Irrelevant fields are not shown.
status:
  # Irrelevant fields are not shown.
  allocatable:
    # The node has 4 GPUs, which provide 400% of computing power in total. Each GPU provides 100% of computing power.
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has 4 GPUs, which provide 60 GiB of memory in total. Each GPU provides 15 GiB of memory.
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"
The output contains the aliyun.com/gpu-core.percentage field, which indicates that computing power allocation is enabled.
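The per-GPU share can be derived from the allocatable fields shown above. The following sketch hard-codes the example values from this output; on your own node, read the actual values from the output of `kubectl get nodes <NODE_NAME> -o yaml`.

```shell
#!/bin/bash
# Example values taken from the allocatable fields above.
total_core_percentage=400   # aliyun.com/gpu-core.percentage
gpu_count=4                 # aliyun.com/gpu-count
total_mem_gib=60            # aliyun.com/gpu-mem

# Each GPU provides an equal share of the node totals.
echo "Per-GPU computing power: $((total_core_percentage / gpu_count))%"
echo "Per-GPU memory: $((total_mem_gib / gpu_count)) GiB"
```

With the example values, each of the 4 GPUs provides 100% of computing power and 15 GiB of memory, which matches the comments in the output above.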
Step 3: Use the computing power allocation feature
If you do not enable computing power allocation, a pod can use 100% of the computing power of a GPU. In this example, the memory of a GPU is 15 GiB. The following steps show how to create a job that requests both GPU memory and computing power. The job requests 2 GiB of GPU memory and 30% of the computing power of the GPU.
Use the following YAML template to create a job that requests both GPU memory and computing power:
cat > /tmp/cuda-sample.yaml <<-EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: cuda-sample
    spec:
      containers:
      - name: cuda-sample
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:cuda-sample-11.0.3
        command:
        - bandwidthTest
        resources:
          limits:
            # Apply for 2 GiB of GPU memory.
            aliyun.com/gpu-mem: 2
            # Apply for 30% of the computing power of the GPU.
            aliyun.com/gpu-core.percentage: 30
        workingDir: /root
      restartPolicy: Never
EOF
Run the following command to submit the cuda-sample job:
kubectl apply -f /tmp/cuda-sample.yaml
Note: The image used by the job is large in size. Therefore, the image pulling process may be time-consuming.
Run the following command to query the cuda-sample job:
kubectl get po -l app=cuda-sample
Expected output:
NAME                READY   STATUS    RESTARTS   AGE
cuda-sample-m****   1/1     Running   0          15s
In the output, Running is displayed in the STATUS column, which indicates that the job is deployed.
Run the following command to query the amount of GPU memory and computing power used by the pod that is provisioned for the job:
kubectl exec -ti cuda-sample-m**** -- nvidia-smi
Expected output:
Thu Dec 16 02:53:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output indicates the following information:
GPU memory: Before you enable computing power allocation, the pod can use 100% of the memory provided by the GPU. In this example, the total amount of memory provided by the GPU is 15 GiB. You can run the nvidia-smi command on the node to query the total amount of memory provided by the GPU. After you enable computing power allocation, the amount of memory used by the pod is 337 MiB and the total amount of memory that the pod can use is 2,154 MiB (about 2 GiB). This indicates that memory isolation is enabled.
Computing power: Before you enable computing power allocation, the pod can use 100% of the computing power of the GPU. You can set the requested amount to 100 to verify that the pod can use 100% of the computing power. After you enable computing power allocation, the pod uses 30% of the computing power of the GPU. This indicates that computing power isolation is enabled.
Note: Assume that you created n jobs, each of which requests 30% of the computing power, and the value of n is no greater than 3. The jobs are scheduled to one GPU. If you log on to the pods of the jobs and run the nvidia-smi command, the output shows that the pods use n × 30% of the computing power. The output of the nvidia-smi command shows only the computing power utilization per GPU. The command does not show the computing power utilization per job.
Run the following command to print the log of the pod:
kubectl logs cuda-sample-m**** -f
Expected output:
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Tesla V100-SXM2-16GB
 Quick Mode

time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu
The output shows that the pod log is generated at a lower rate after you enable computing power allocation. This is because each pod can use only about 30% of the computing power of the GPU.
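The arithmetic behind the note above can be sketched as follows. This is an illustration only, with the 30% request from this example hard-coded; it is not a command that queries the cluster.

```shell
#!/bin/bash
# Each GPU provides 100% of computing power. Jobs that each request 30%
# share one GPU, so at most 100 / 30 = 3 such jobs fit on a single GPU.
request=30                    # aliyun.com/gpu-core.percentage per job
max_jobs=$((100 / request))   # integer division: 3 jobs per GPU

# With n co-located jobs, nvidia-smi reports the combined utilization
# of the GPU, not the utilization of each individual job.
n=3
echo "Up to ${max_jobs} jobs per GPU"
echo "nvidia-smi reports about $((n * request))% GPU utilization for n=${n}"
```

This matches the example: three 30% jobs on one GPU produce roughly 90% utilization in the nvidia-smi output, and a fourth 30% job cannot be scheduled to that GPU.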
Optional: Delete the cuda-sample job.
After you verify that computing power allocation works as expected, run the following command to delete the job:
kubectl delete job cuda-sample