cGPU can be used to isolate GPU resources so that multiple containers can share a single physical GPU. This topic describes how to install and use cGPU on GPU-accelerated instances.
Prerequisites
- Submit a ticket to obtain the download link to the cGPU installation package.
- Make sure that the instance meets the following requirements:
- The instance type is gn6i, gn6v, gn6e, gn5i, gn5, ebmgn6i, or ebmgn6e.
- The operating system is CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, or Alibaba Cloud Linux.
- NVIDIA driver 418.87.01 or later is installed.
- Docker 19.03.5 or later is installed.
Background information
cGPU uses a kernel driver developed by Alibaba Cloud to provide containers with virtual GPU devices. This isolates GPU memory and computing power between containers without compromising performance and ensures that GPU hardware resources are fully used for training and inference. You can use commands to configure the virtual GPU devices in containers.
- cGPU is compatible with standard open source solutions such as Kubernetes and NVIDIA Docker.
- You do not need to re-compile AI applications or replace the Compute Unified Device Architecture (CUDA) library. Reconfiguration is not required after the CUDA and NVIDIA CUDA Deep Neural Network (cuDNN) libraries are upgraded.
- cGPU isolates the memory and computing power of GPUs at the same time.
Install cGPU
Run cGPU
Environment variable | Value type | Description | Example value |
---|---|---|---|
CGPU_DISABLE | Boolean | Specifies whether to enable cGPU. Valid values: false (cGPU is enabled) and true (cGPU is disabled and the default NVIDIA container service is used). | None |
ALIYUN_COM_GPU_MEM_DEV | Integer | Specifies the total memory of each GPU on the GPU-accelerated instance, which depends on the instance type. Unit: GiB. Note: The value of this variable must be an integer. | A GPU-accelerated instance of the ecs.gn6i-c4g1.xlarge instance type is equipped with one NVIDIA® Tesla® T4 graphics card. Run the nvidia-smi command on the instance to check the total memory. In this example, the check result is 15,109 MiB, so the value of this variable is set to 15 (GiB). |
ALIYUN_COM_GPU_MEM_CONTAINER | Integer | Specifies the size of the GPU memory to be used by the container. This variable is used together with ALIYUN_COM_GPU_MEM_DEV. If this variable is not specified or is set to 0, the default NVIDIA container service is used instead of cGPU. | On a GPU with a total memory of 15 GiB, set ALIYUN_COM_GPU_MEM_DEV to 15 and ALIYUN_COM_GPU_MEM_CONTAINER to 1 to allocate 1 GiB of GPU memory to the container. |
ALIYUN_COM_GPU_VISIBLE_DEVICES | Integer or UUID | Specifies the GPUs to be used by the container. | On a GPU-accelerated instance with four GPUs, run the nvidia-smi -L command to query the device numbers and UUIDs of the GPUs. Then, set this variable to the device numbers (for example, 0,1 to allow the container to use the first and second GPUs) or to the UUIDs of the GPUs that the container can use. |
ALIYUN_COM_GPU_SCHD_WEIGHT | Integer | Specifies the weight of the GPU computing power for the container. Valid values: 1 to min(max_inst, 16). | None |
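As an illustration of ALIYUN_COM_GPU_VISIBLE_DEVICES, the following sketch restricts a container to specific GPUs based on the output of nvidia-smi -L. The container name, image, and memory values are placeholder assumptions and not part of this topic's reference configuration.

```bash
# List the GPUs on the instance. Each line looks similar to:
#   GPU 0: Tesla T4 (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
nvidia-smi -L

# Allow the container to use only the first and second GPUs (device numbers 0 and 1).
# Image name and memory split are illustrative assumptions.
docker run -d -t --gpus all --name gpu_test0 \
    -e ALIYUN_COM_GPU_VISIBLE_DEVICES=0,1 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=4 \
    nvcr.io/nvidia/tensorflow:19.10-py3
```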
The following example shows how cGPU is used to enable two containers to share one GPU on a GPU-accelerated instance of the ecs.gn6i-c4g1.xlarge instance type.
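A minimal sketch of such a setup is shown below. The container names, image, and the 6 GiB/8 GiB memory split are illustrative assumptions; the T4 on this instance type provides about 15 GiB of memory in total, and both containers attach to the same physical GPU.

```bash
# Container 1: limited to 6 GiB of the 15 GiB GPU memory (illustrative split).
docker run -d -t --gpus all --name gpu_test1 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=6 \
    nvcr.io/nvidia/tensorflow:19.10-py3

# Container 2: limited to 8 GiB on the same physical GPU.
docker run -d -t --gpus all --name gpu_test2 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=8 \
    nvcr.io/nvidia/tensorflow:19.10-py3

# Inside either container, nvidia-smi should report only the memory granted to it.
docker exec -i gpu_test1 nvidia-smi
```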
View procfs nodes
At runtime, cGPU generates and manages multiple procfs nodes in the /proc/cgpu_km directory. You can view and configure cGPU information based on the procfs nodes. The following section describes how each procfs node is used.
Command | Effect |
---|---|
echo 2 > /proc/cgpu_km/0/policy | Changes the scheduling policy to weight-based preemptive scheduling. |
cat /proc/cgpu_km/0/free_weight | Returns the available weight on the GPU. If free_weight is 0, the weight of the computing power allocated to a new container is 0, and the container cannot obtain GPU computing power or be used to run applications that require GPU computing power. |
cat /proc/cgpu_km/0/$dockerid/weight | Returns the weight allocated to a specific container. |
echo 4 > /proc/cgpu_km/0/$dockerid/weight | Modifies the weight allocated to a specific container. |
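As a concrete sketch, the commands below combine these nodes to inspect and adjust the weight of a running container. The container name gpu_test1 is a placeholder, and the exact layout of the per-container directory may differ across cGPU versions.

```bash
# Check the scheduling policy of GPU 0 and the weight that is still available.
cat /proc/cgpu_km/0/policy
cat /proc/cgpu_km/0/free_weight

# Look up the container ID of the target container (placeholder name gpu_test1).
dockerid=$(docker inspect --format '{{.Id}}' gpu_test1)

# Read the weight currently allocated to the container, then raise it to 4.
cat /proc/cgpu_km/0/${dockerid}/weight
echo 4 > /proc/cgpu_km/0/${dockerid}/weight
```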
Upgrade cGPU
Uninstall cGPU
Examples of computing power scheduling on a GPU
- Fair-share scheduling
Time slices are allocated when the containers are created. cGPU starts scheduling from Slice 1. The scheduling task is submitted to the physical GPU and executed in the container within the time slice (X ms). Then, cGPU moves to the next time slice. Each container obtains the same amount of computing power, which is 1/max_inst. The following figure shows an example.
- Preemptive scheduling
Time slices are allocated when the containers are created. cGPU starts scheduling from Slice 1. However, if a container is not used or does not have GPU-enabled processes, cGPU skips scheduling and moves to the next time slice.
Example:
- A single container Docker 1 is created and allocated with Slice 1. Two TensorFlow processes run in Docker 1. In this case, Docker 1 can obtain the computing power of the entire physical GPU.
- Another container Docker 2 is created and allocated with Slice 2. If no GPU-enabled processes exist in Docker 2, cGPU skips scheduling for Docker 2.
- When a GPU-enabled process runs in Docker 2, both Slice 1 and Slice 2 are scheduled. Each of Docker 1 and Docker 2 obtains up to half of the computing power of the physical GPU. The following figure shows an example.
- Weight-based preemptive scheduling
If ALIYUN_COM_GPU_SCHD_WEIGHT is set to a value greater than 1 during container creation, weight-based preemptive scheduling is used. cGPU divides the computing power of the physical GPU into max_inst portions based on the number of containers (max_inst). If the ALIYUN_COM_GPU_SCHD_WEIGHT value is greater than 1, cGPU combines multiple time slices into one bigger time slice and allocates the time slice to the containers.
Example:
- Docker 1: ALIYUN_COM_GPU_SCHD_WEIGHT=m
- Docker 2: ALIYUN_COM_GPU_SCHD_WEIGHT=n
Scheduling results:
- If only Docker 1 is running, Docker 1 obtains the computing power of the entire physical GPU.
- If both Docker 1 and Docker 2 are running, they obtain the computing power based on a theoretical ratio of m:n. Docker 2 consumes n time slices even if it does not have a GPU-enabled process. This is different from the case in preemptive scheduling.
Note: The running performance of the containers differs when m:n is set to 2:1 and when it is set to 8:4. Within one second, the number of scheduling cycles completed when m:n is set to 2:1 is four times that when m:n is set to 8:4.
Weight-based preemptive scheduling limits the theoretical maximum amount of GPU computing power that a container can obtain. However, for graphics cards such as NVIDIA® V100 that have strong computing power, a computing task can be completed within a single time slice if the memory is not fully used. In this case, if m:n is set to 8:4, the GPU computing power becomes idle during the remaining time slices and the limit becomes invalid. We recommend that you set appropriate weights based on the GPU computing power. Examples:
- If you use NVIDIA® V100 graphics cards, set m:n to 2:1 to reduce the weight and prevent the GPU computing power from becoming idle.
- If you use NVIDIA® Tesla® T4 graphics cards, set m:n to 8:4 to increase the weight and ensure that sufficient GPU computing power is allocated.
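The following sketch shows how such weights might be assigned when the containers are created, combined with the policy node described in the procfs section to switch GPU 0 to weight-based preemptive scheduling. The container names, image, memory values, and the 2:1 split are illustrative assumptions.

```bash
# Switch GPU 0 to weight-based preemptive scheduling (policy 2).
echo 2 > /proc/cgpu_km/0/policy

# Docker 1: weight m = 2 (illustrative).
docker run -d -t --gpus all --name gpu_weight_2 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=6 \
    -e ALIYUN_COM_GPU_SCHD_WEIGHT=2 \
    nvcr.io/nvidia/tensorflow:19.10-py3

# Docker 2: weight n = 1. When both containers run GPU workloads,
# they share the computing power at a theoretical ratio of 2:1.
docker run -d -t --gpus all --name gpu_weight_1 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=6 \
    -e ALIYUN_COM_GPU_SCHD_WEIGHT=1 \
    nvcr.io/nvidia/tensorflow:19.10-py3
```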