Container Service for Kubernetes (ACK) managed Pro clusters support GPU sharing, which lets multiple containers share the compute resources of GPUs while keeping their GPU memory isolated. This topic describes how to configure a multi-GPU sharing policy.
Prerequisites
Introduction to multi-GPU sharing
Multi-GPU sharing is intended for scenarios that require shared compute resources with isolated GPU memory.
During the model development phase, an application may need multiple GPUs without using the full resources of each GPU. Exclusively allocating whole GPUs to the development environment can waste resources. You can use multi-GPU sharing to avoid this issue.
Multi-GPU sharing allows an application to request N GiB of GPU memory distributed across M GPUs, where each GPU provides N/M GiB. The value of N/M must be an integer, and all M GPUs must reside on the same Kubernetes node. For example, if a pod requests 8 GiB of memory across 2 GPUs, each GPU provides 4 GiB.
Single GPU sharing: a pod requests GPU memory that is provided by a single GPU.
Multiple GPU sharing: a pod requests GPU memory that is evenly provided by multiple GPUs, as in the sketch below.
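For illustration, the following minimal pod manifest is a sketch of multiple GPU sharing; the pod name, image, and command are placeholders rather than values from this topic. It requests 8 GiB of GPU memory split across 2 GPUs; removing the aliyun.com/gpu-count label would instead request the full 8 GiB from a single shared GPU.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-multi-demo                       # placeholder name
  labels:
    # Pod label: spread the requested GPU memory evenly across 2 GPUs
    # on the same node. Label values must be strings, so quote the number.
    aliyun.com/gpu-count: "2"
spec:
  containers:
  - name: demo
    image: nvidia/cuda:11.4.2-base-ubuntu20.04     # placeholder image
    command: ["sleep", "infinity"]
    resources:
      limits:
        # Request 8 GiB of GPU memory in total; each of the 2 GPUs provides 4 GiB.
        aliyun.com/gpu-mem: 8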
Configure a multiple GPU sharing policy
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Workloads > Jobs.
On the Jobs page, click Create from YAML. Copy the following content to the Template section and click Create:
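A minimal sketch of such a job manifest is shown below. The container image and training command are placeholders that you must replace with your own TensorFlow MNIST image; the parts that configure multi-GPU sharing are the aliyun.com/gpu-count pod label and the aliyun.com/gpu-mem resource limit described next.
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist-multigpu
spec:
  template:
    metadata:
      labels:
        # Pod label: split the requested GPU memory across 2 GPUs.
        aliyun.com/gpu-count: "2"
    spec:
      containers:
      - name: tensorflow-mnist-multigpu
        image: registry.example.com/tensorflow-mnist:gpu   # placeholder image
        command: ["python", "/app/mnist.py"]               # placeholder command
        resources:
          limits:
            # Resource limit: request 8 GiB of GPU memory in total (4 GiB per GPU).
            aliyun.com/gpu-mem: 8
      restartPolicy: Never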
YAML template description:
The YAML template defines a TensorFlow MNIST job. The job requests 8 GiB of GPU memory provided by 2 GPUs, so each GPU provides 4 GiB of memory.
Add the aliyun.com/gpu-count: 2 pod label to request two GPUs.
Add the aliyun.com/gpu-mem: 8 resource limit to request 8 GiB of GPU memory.
Verify the multiple GPU sharing policy
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Workloads > Pods.
Click Terminal in the Actions column of the pod that you created, such as tensorflow-mnist-multigpu-***, to log on to the pod and run the following command:
nvidia-smi
Expected output:
Wed Jun 14 03:24:14 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
| N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output shows that only two GPUs are visible to the pod and that each GPU provides 4,309 MiB of memory, which corresponds to the 4 GiB that the pod requests from each GPU. The actual memory size of each GPU is 16,160 MiB.
Click Logs in the Actions column of the pod to view its logs. The following information is displayed:
totalMemory: 4.21GiB freeMemory: 3.91GiB
totalMemory: 4.21GiB freeMemory: 3.91GiB
The device information shows that each GPU provides about 4 GiB of memory to the pod, even though the actual memory size of each GPU is 16,160 MiB. This indicates that memory isolation is implemented.