
Container Service for Kubernetes:Enable Multi-GPU Sharing with Shared GPU Scheduling

Last Updated: Mar 26, 2026

When a single model training job needs more GPU memory than one physical card provides, multi-GPU sharing lets a Pod draw equal GPU memory allocations from multiple GPU cards at the same time, without exclusively occupying any of them. ACK Pro clusters support multi-GPU sharing with GPU memory isolation, enabling finer-grained resource utilization during model development.

Prerequisites

Before you begin, ensure that you have an ACK Pro cluster with shared GPU scheduling enabled.

Limitations

Multi-GPU sharing supports GPU memory isolation with computing power sharing only. GPU memory isolation with computing power allocation is not supported.

How it works

Multi-GPU sharing lets a single Pod request GPU memory from multiple physical GPU cards simultaneously. Each card contributes an equal share.

  • Single-GPU sharing: A Pod uses a portion of one GPU card's resources.

  • Multi-GPU sharing: A Pod spans multiple GPU cards, with each card contributing the same amount of GPU memory.

Allocation formula: If a Pod requests N GiB of GPU memory from M GPU cards, each card allocates N/M GiB.

For example, a Pod requesting 8 GiB across 2 GPU cards receives 4 GiB from each card.

Constraints:

  • N/M must be an integer.

  • All M GPU cards must be on the same Kubernetes node.
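The allocation rule above can be sketched as a small helper. This is an illustrative calculation only; the function name and error handling are hypothetical and not part of the scheduler:

```python
def per_card_allocation(total_gib: int, card_count: int) -> int:
    """Return the GPU memory (GiB) each card contributes when a Pod
    requests total_gib GiB spread evenly over card_count cards."""
    if card_count <= 0:
        raise ValueError("card_count must be positive")
    if total_gib % card_count != 0:
        # The scheduler requires N/M to be an integer.
        raise ValueError(
            f"{total_gib} GiB cannot be split evenly across {card_count} cards"
        )
    return total_gib // card_count

# A Pod requesting 8 GiB across 2 cards receives 4 GiB from each card.
print(per_card_allocation(8, 2))  # 4
```

A request such as 8 GiB across 3 cards would be rejected, because 8/3 is not an integer.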


Configure multi-GPU sharing

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Jobs.

  3. On the Jobs page, click Create from YAML. Copy the following YAML into the Template area, then click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-multigpu
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-multigpu
            aliyun.com/gpu-count: "2"    # Number of GPU cards to use
        spec:
          containers:
          - name: tensorflow-mnist-multigpu
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 8    # Total GPU memory in GiB across all cards
            workingDir: /root
          restartPolicy: Never

    Key parameters:

    • aliyun.com/gpu-count (Pod label, string): Number of GPU cards to use. Set in metadata.labels. In this example, "2" means the Pod requests GPU memory from 2 cards.

    • aliyun.com/gpu-mem (resource limit, integer): Total GPU memory in GiB to request across all GPU cards. Set in resources.limits. In this example, 8 means 8 GiB in total, so each of the 2 cards provides 4 GiB.
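Because the two parameters must satisfy the N/M integer constraint together, it can help to validate them before submitting the Job. The following Python sketch is illustrative only: the manifest dict mirrors the YAML above, and the helper function is a hypothetical pre-submission check, not an ACK API.

```python
# Minimal pre-submission sanity check (hypothetical helper, not an ACK API):
# confirm the total gpu-mem request divides evenly across gpu-count cards.
job = {
    "spec": {
        "template": {
            "metadata": {"labels": {"aliyun.com/gpu-count": "2"}},
            "spec": {
                "containers": [
                    {"resources": {"limits": {"aliyun.com/gpu-mem": 8}}}
                ]
            },
        }
    }
}

def check_multi_gpu_request(job: dict) -> int:
    """Return the per-card GiB allocation, or raise if the split is uneven."""
    tmpl = job["spec"]["template"]
    cards = int(tmpl["metadata"]["labels"]["aliyun.com/gpu-count"])
    total = sum(
        c["resources"]["limits"].get("aliyun.com/gpu-mem", 0)
        for c in tmpl["spec"]["containers"]
    )
    if total % cards != 0:
        raise ValueError(f"{total} GiB does not split evenly across {cards} cards")
    return total // cards

print(check_multi_gpu_request(job))  # 4 GiB per card for this example
```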

Verify GPU memory isolation

After the Job starts, verify that the Pod can access only its allocated GPU memory.

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Pods.

  2. In the row for the Pod (for example, tensorflow-mnist-multigpu-***), click Actions > Terminal to open a terminal session. Run the following command:

    nvidia-smi

    The expected output is similar to:

    Wed Jun 14 03:24:14 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
    | N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    Confirm the following in the output:

    • Two GPU cards are listed (GPU 0 and GPU 1), matching aliyun.com/gpu-count: "2".

    • Each card shows 4309 MiB of total memory, which is approximately the requested 4 GiB per card rather than the physical 16,160 MiB. This confirms that GPU memory isolation is active.

  3. In the row for the same Pod, click Actions > Logs to view the container logs. Confirm the following output appears twice (once per card):

    totalMemory: 4.21GiB freeMemory: 3.91GiB

    A totalMemory value of approximately 4 GiB per card, rather than the physical 16,160 MiB, confirms that GPU memory isolation is working correctly from the application's perspective.
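The two readings agree once units are converted: nvidia-smi reports the per-card limit in MiB, while TensorFlow logs it in GiB. A quick illustrative conversion:

```python
# Convert the per-card limit reported by nvidia-smi (MiB) to GiB and
# compare it with the totalMemory value printed in the container logs.
reported_mib = 4309                 # per-card total from nvidia-smi
reported_gib = reported_mib / 1024  # 1 GiB = 1024 MiB
print(f"{reported_gib:.2f} GiB")    # 4.21 GiB, matching the log output
```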