Container Service for Kubernetes: Work with GPU sharing and scheduling by using ack-co-scheduler

Last Updated: Mar 10, 2025

GPU sharing is a resource management policy that allows multiple tasks or processes to share one GPU. You can use GPU sharing in a registered cluster to avoid the resource waste caused by traditional exclusive GPU scheduling and improve GPU utilization.

Prerequisites

The cloud-native AI suite is activated before you use GPU sharing. For more information, see Overview of the cloud-native AI suite.

Billing

For more information about how the cloud-native AI suite is billed, see Billing of the cloud-native AI suite.

Limits

  • Do not set the CpuPolicy parameter to static for nodes that have GPU sharing enabled.

  • The pods managed by the DaemonSet of the shared GPU component do not have the highest priority by default. As a result, resources on a node may be allocated to pods that have higher priorities, and the node may evict the DaemonSet pods. To prevent this issue, you can modify the DaemonSet of the shared GPU component. For example, you can specify priorityClassName: system-node-critical in the gpushare-device-plugin-ds DaemonSet, which provides GPU memory sharing, to raise the priority of its pods, as shown in the example after this list.
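
The following command is a minimal sketch of how such a change could be applied with kubectl patch. The kube-system namespace is an assumption; use the namespace in which the DaemonSet is actually deployed in your cluster.

    # Raise the priority of the pods managed by the gpushare-device-plugin-ds DaemonSet.
    # The kube-system namespace is an assumption; adjust it to your deployment.
    kubectl patch daemonset gpushare-device-plugin-ds -n kube-system \
      --type merge \
      -p '{"spec":{"template":{"spec":{"priorityClassName":"system-node-critical"}}}}'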

Step 1: Install components

  1. Install the ack-ai-installer component in the registered cluster. This component implements scheduling capabilities such as GPU sharing (including GPU memory isolation) and topology-aware GPU scheduling.

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

    3. On the Helm page, click Create. Search for and install the ack-ai-installer component.

  2. Install the ack-co-scheduler component in the registered cluster. This component allows you to create ResourcePolicy custom resources (CRs) to use the multilevel resource scheduling feature.

    1. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.

    2. On the Add-ons page, search for the ack-co-scheduler component, and click Install at the bottom right of the card.
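
After both components are installed, you can optionally confirm that their workloads are running. The following check is a sketch; the pod name prefixes used in the filter are assumptions and may vary by component version.

    # Optional: list the pods created by the GPU sharing and co-scheduling components.
    # The name prefixes in the filter are assumptions and may differ by version.
    kubectl get pods -n kube-system | grep -E 'gpushare|ack-ai|ack-co-scheduler'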

Step 2: Install and use the GPU inspection tool

  1. Download kubectl-inspect-cgpu. The executable file must be downloaded to a directory included in the PATH environment variable. This section uses /usr/local/bin/ as an example.

    • If you use Linux, run the following command to download kubectl-inspect-cgpu:

      wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-linux -O /usr/local/bin/kubectl-inspect-cgpu
    • If you use macOS, run the following command to download kubectl-inspect-cgpu:

      wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-darwin -O /usr/local/bin/kubectl-inspect-cgpu
  2. Run the following command to grant execute permissions on kubectl-inspect-cgpu:

    chmod +x /usr/local/bin/kubectl-inspect-cgpu
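
  3. (Optional) Run the following command to verify that kubectl detects the new plugin. kubectl discovers executables named kubectl-* in directories on the PATH, so the output should list /usr/local/bin/kubectl-inspect-cgpu:

    kubectl plugin list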

Step 3: Create GPU-accelerated nodes

Create an Elastic GPU Service instance, and install the NVIDIA driver and nvidia-container-runtime on the instance. For more information, see Create and manage a node pool.
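
As an optional sanity check that is not required by this procedure, you can log on to the new node and run the following command to confirm that the NVIDIA driver is installed. The command prints the driver version and the detected GPUs:

    # Run on the GPU-accelerated node after the driver is installed.
    nvidia-smi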

Note
  • Skip this step if you have added GPU-accelerated nodes to the node pool and configured the environment when you created the node pool.

  • For more information about the driver installation script, see Manually update the NVIDIA driver of a node.

  • Nodes that have GPU sharing enabled must have the ack.node.gpu.schedule=share label. You can manually add this label to on-premises nodes, as shown in the example following this note. For cloud nodes, you can instead set the label to ack.node.gpu.schedule=cgpu by using the labeling feature of the node pool to additionally enable GPU memory isolation. For more information, see Labels for enabling GPU scheduling policies.
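
For example, the following command manually adds the GPU sharing label to an on-premises node. The node name <your-node-name> is a placeholder:

    # Enable GPU sharing (without memory isolation) on a node.
    kubectl label node <your-node-name> ack.node.gpu.schedule=share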

Step 4: Work with GPU sharing

  1. Run the following command to query the GPU usage of the cluster:

    kubectl inspect cgpu

    Expected output:

    NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-zhangjiakou.192.168.66.139  192.168.66.139  0/15                   0/15
    ---------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    0/15 (0%)
  2. Create a file named GPUtest.yaml and copy the following content to the file.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gpu-share-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: gpu-share-sample
        spec:
          schedulerName: ack-co-scheduler
          containers:
          - name: gpu-share-sample
            image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                # The unit is GiB. This pod requests a total of 3 GiB of GPU memory.
                aliyun.com/gpu-mem: 3 # Specify the amount of GPU memory to request.
            workingDir: /root
          restartPolicy: Never
  3. Run the following command to deploy the sample application, which uses GPU sharing and requests 3 GiB of GPU memory:

    kubectl apply -f GPUtest.yaml
  4. Run the following command to query the memory usage of the GPU:

    kubectl inspect cgpu

    Expected output:

    NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-zhangjiakou.192.168.66.139  192.168.66.139  3/15                   3/15
    ---------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    3/15 (20%)             

    The output shows that the total GPU memory of the cn-zhangjiakou.192.168.66.139 node is 15 GiB and 3 GiB of GPU memory is allocated.
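
  5. (Optional) Run the following commands to confirm that the sample job is running and view its logs. The app=gpu-share-sample label comes from GPUtest.yaml; run the commands in the namespace in which you applied the file:

    kubectl get pods -l app=gpu-share-sample
    kubectl logs -l app=gpu-share-sample --tail=20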

References

For more information about GPU sharing, see GPU sharing.