
Container Service for Kubernetes: Use cGPU to allocate computing power

Last Updated: Oct 16, 2023

You can request GPU memory and computing power for applications in Container Service for Kubernetes (ACK) Pro clusters. This topic describes how to use cGPU to allocate computing power.

Prerequisites

  • An ACK Pro cluster that runs Kubernetes 1.20.11 is created. For more information, see Create an ACK managed cluster.

  • The required scheduler version varies based on the Kubernetes version of the cluster. The following table describes the scheduler versions that are required for different Kubernetes versions. For more information about the features of different versions of the scheduler, see kube-scheduler.

    Kubernetes version    Scheduler version
    1.20                  1.20.4-ack-8.0 and later
    1.22                  1.22.15-ack-2.0 and later
    1.24                  1.24.3-ack-2.0 and later

  • The cGPU component is installed. Make sure that the version of the Helm chart that you install is later than 1.2.0. For more information about how to install the cGPU component, see Install and use ack-ai-installer and the GPU inspection tool.

  • cGPU 1.0.5 or a later version is installed. For more information about how to update the cGPU version, see Update the cGPU version on a node.
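
To confirm that the installed ack-ai-installer Helm chart meets the version requirement above, you can list the Helm releases in the cluster. This is only a quick check, assuming that the component was installed as a Helm release named ack-ai-installer and that you can run helm against the cluster:

# List the Helm releases in all namespaces and filter for the cGPU component.
# The CHART column shows the chart version, which must be later than 1.2.0.
helm list -A | grep ack-ai-installer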

Limits

  • cGPU supports jobs that request only GPU memory and jobs that request both GPU memory and computing power. However, a node can host only one of the two job types at a time: either jobs that request only GPU memory or jobs that request both GPU memory and computing power.

  • The following limits apply when you request computing power for jobs:

    • When you allocate the computing power of a GPU, the value that you specify is a percentage of the computing power of one GPU. The maximum value is 100, which indicates 100% of the computing power of the GPU. For example, a value of 20 indicates 20% of the computing power of the GPU.

    • The value must be a multiple of 5, and the minimum value is 5. If the value that you specify is not a multiple of 5, the job cannot be submitted.

  • The allocation of GPU memory and computing power is supported only in the regions listed in the following table. Make sure that your cluster resides in one of these regions.

    Region                 Region ID
    China (Beijing)        cn-beijing
    China (Shanghai)       cn-shanghai
    China (Hangzhou)       cn-hangzhou
    China (Zhangjiakou)    cn-zhangjiakou
    China (Shenzhen)       cn-shenzhen
    China (Chengdu)        cn-chengdu
    China (Heyuan)         cn-heyuan
    China (Hong Kong)      cn-hongkong
    Indonesia (Jakarta)    ap-southeast-5
    Singapore              ap-southeast-1
    US (Virginia)          us-east-1
    US (Silicon Valley)    us-west-1
    Japan (Tokyo)          ap-northeast-1

  • The scheduler version that supports computing power allocation was released on March 1, 2022. Clusters created on or after March 1, 2022 use this scheduler version. For clusters created before March 1, 2022, the scheduler is not automatically updated and you must update it manually. To do so, perform the following steps:

    1. Submit a ticket to apply to join the private preview for the latest cGPU version.

    2. Uninstall the outdated version of the cGPU component.

      If the Helm chart version of the installed cGPU component is 1.2.0 or earlier, the component is outdated and supports only GPU memory sharing. Perform the following steps to uninstall the outdated version:

      1. Log on to the ACK console.

      2. In the left-side navigation pane of the ACK console, click Clusters.

      3. On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.

      4. On the cluster details page of a cluster, choose Applications > Helm in the left-side navigation pane.

      5. On the Helm page, find ack-ai-installer and click Delete in the Actions column.

      6. In the Delete dialog box, click OK.

    3. Install the latest version of the cGPU component. For more information, see Install and use ack-ai-installer and the GPU inspection tool.
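
If you prefer the command line, the console-based uninstall in step 2 above can also be performed with Helm. This is only a sketch, assuming that the outdated component exists as a Helm release named ack-ai-installer; replace <NAMESPACE> with the namespace in which the release is installed (you can find it with the helm list command shown in the Prerequisites section).

# Uninstall the outdated cGPU component release.
helm uninstall ack-ai-installer -n <NAMESPACE>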

Step 1: Create a node pool that supports computing power allocation

  1. Log on to the ACK console.

  2. In the left-side navigation pane of the ACK console, click Clusters.

  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.

  4. In the left-side navigation pane of the details page, choose Nodes > Node Pools.

  5. On the right side of the Node Pools page, click Create Node Pool.

    The following list describes some of the key parameters. For more information about all parameters, see Configure a node pool.

    • Node Pool Name: Enter a name for the node pool. In this example, gpu-core is used.

    • Expected Nodes: Specify the initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

    • Operating System: Only CentOS 7.x and Alibaba Cloud Linux 2.x are supported.

    • ECS Label: Add labels to the ECS instances.

    • Custom Resource Group: Specify the resource group of the nodes to be added to the node pool.

    • Node Label: Add labels to the nodes in the node pool. The following configurations are used in this topic. For more information about node labels, see Labels used by ACK to control GPUs.

      • To enable GPU memory isolation and computing power isolation, click Add Node Label, and then set Key to ack.node.gpu.schedule and Value to core_mem.

      • To use the binpack algorithm to select GPUs for pods, click Add Node Label, and then set Key to ack.node.gpu.placement and Value to binpack.

    Important

    If you want to enable computing power isolation for existing GPU-accelerated nodes in the cluster, you must first remove the nodes from the cluster and then add them to a node pool that supports computing power isolation. You cannot directly run the kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem command to enable computing power isolation for existing GPU-accelerated nodes.
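
After the node pool is created and nodes are added to it, you can check whether the node labels were applied. The following command is a minimal sketch; it lists the nodes that carry the ack.node.gpu.schedule=core_mem label:

kubectl get nodes -l ack.node.gpu.schedule=core_mem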

Step 2: Check whether computing power allocation is enabled for the node pool

Run the following command to check whether computing power allocation is enabled for the nodes in the node pool:

kubectl get nodes <NODE_NAME> -o yaml

Expected output:

# Irrelevant fields are not shown. 
status:
  # Irrelevant fields are not shown. 
  allocatable:
    # The node has 4 GPUs, which provide 400% of computing power in total. Each GPU provides 100% of computing power. 
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has 4 GPUs, which provide 60 GiB of memory in total. Each GPU provides 15 GiB of memory. 
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

The output contains the aliyun.com/gpu-core.percentage field, which indicates that computing power allocation is enabled.
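
If you only want the GPU-related fields instead of the full node object, a jsonpath query is a compact alternative. This is a sketch that uses the same <NODE_NAME> placeholder as above:

# Print only the allocatable resources of the node, including the cGPU extended resources.
kubectl get node <NODE_NAME> -o jsonpath='{.status.allocatable}'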

Step 3: Use the computing power allocation feature

If you do not enable computing power allocation, a pod can use 100% of the computing power of a GPU. In this example, the memory of a GPU is 15 GiB. The following steps show how to create a job that requests both GPU memory and computing power. The job requests 2 GiB of GPU memory and 30% of the computing power of the GPU.
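
As a rough capacity check, and assuming that each pod is served by a single GPU on a node like the one in Step 2 (100% of computing power and 15 GiB of memory per GPU), at most three such jobs fit on one GPU: 100 ÷ 30 allows three jobs by computing power, while 15 GiB ÷ 2 GiB would allow seven by memory, so computing power is the limiting dimension here.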

  1. Use the following YAML template to create a job that requests both GPU memory and computing power:

    cat > /tmp/cuda-sample.yaml <<-EOF
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:cuda-sample-11.0.3
            command:
            - bandwidthTest
            resources:
              limits:
                # Apply for 2 GiB of GPU memory. 
                aliyun.com/gpu-mem: 2
                # Apply for 30% of the computing power of the GPU. 
                aliyun.com/gpu-core.percentage: 30
            workingDir: /root
          restartPolicy: Never
    EOF
  2. Run the following command to submit the cuda-sample job:

    kubectl apply -f /tmp/cuda-sample.yaml

    Note

    The image used by the job is large, so pulling it may take some time.

  3. Run the following command to query the cuda-sample job:

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    In the output, Running is displayed in the STATUS column, which indicates that the job is deployed and its pod is running.

  4. Run the following command to query the amount of GPU memory and computing power used by the pod that is provisioned for the job:

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output indicates the following information:

    • GPU memory: Before computing power allocation is enabled, a pod can use 100% of the GPU memory. In this example, the GPU provides 15 GiB of memory in total, which you can verify by running the nvidia-smi command on the node. After computing power allocation is enabled, the pod uses 337 MiB of memory and can use at most 2,154 MiB (about 2 GiB). This indicates that memory isolation is in effect.

    • Computing power: Before computing power allocation is enabled, a pod can use 100% of the computing power of the GPU. (You can verify this by setting the requested amount to 100.) After computing power allocation is enabled, the pod uses 30% of the computing power of the GPU. This indicates that computing power isolation is in effect.

    Note

    For example, if you create n jobs (n is no greater than 3) that each request 30% of the computing power and the jobs are scheduled to the same GPU, running the nvidia-smi command in the pods shows that n × 30% of the computing power is used. The nvidia-smi command shows only the computing power utilization per GPU, not per job. For a node-level view of the requested GPU resources, see the sketch after this procedure.

  5. Run the following command to print the log of the pod:

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    The output shows that the log is generated at a lower rate after computing power allocation is enabled, because the pod can use only about 30% of the computing power of the GPU.

  6. Optional: Delete the cuda-sample job.

    After you verify that computing power allocation works as expected, run the following command to delete the job:

    kubectl delete job cuda-sample
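
To see how much GPU memory and computing power the pods on a node request in total, you can inspect the allocated resources of the node. This is a minimal sketch; it assumes that the extended resources appear in the Allocated resources section of the kubectl describe output, and <NODE_NAME> is a placeholder for the node name:

# Show the allocated resources of the node, including the aliyun.com/gpu-mem and
# aliyun.com/gpu-core.percentage requests of the pods that are scheduled on it.
kubectl describe node <NODE_NAME> | grep -A 10 "Allocated resources"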