You can use cGPU to isolate GPU resources so that multiple containers can share a single physical GPU. This topic describes how to install and use cGPU on GPU-accelerated instances.

Prerequisites

Before you install cGPU, make sure you complete the following preparations:
  • Submit a ticket to obtain the download link to the cGPU installation package.
  • Make sure that GPU-accelerated instances meet the following requirements:
    • The instance type is gn6i, gn6v, gn6e, gn5i, gn5, ebmgn6i, or ebmgn6e.
    • The operating system is CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, or Alibaba Cloud Linux.
    • NVIDIA driver 418.87.01 or later is installed.
    • Docker 19.03.5 or later is installed.
    Note One limitation of using cGPU to isolate GPU resources is that cGPU does not support Unified Virtual Memory (UVM)-related operations. Therefore, you cannot allocate GPU memory by calling the CUDA API cudaMallocManaged(). For more information, see the NVIDIA official documentation. Use other methods to allocate GPU memory, such as calling cudaMalloc().
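You can check whether the instance meets the driver, Docker, and operating system requirements before you install cGPU. The following commands are a minimal check; the output formats may vary based on the driver and Docker versions.
  nvidia-smi --query-gpu=driver_version --format=csv,noheader    # the driver version must be 418.87.01 or later
  docker --version                                               # Docker must be 19.03.5 or later
  grep PRETTY_NAME /etc/os-release                               # the operating system must be a supported version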

Install cGPU

  1. Download and decompress the cGPU installation package.
  2. Run the following command to view the files in the installation package:
    Note The files in the installation package may vary based on the version of the installation package.
    cd cgpu
    ls
    The following output is returned:
    cgpu-container-wrapper  cgpu-km.c  cgpu.o  cgpu-procfs.c  install.sh  Makefile  os-interface.c  README  uninstall.sh  upgrade.sh  version.h
  3. Run the following command to install cGPU:
    sh install.sh
  4. Run the following command to verify whether cGPU is installed:
    lsmod | grep cgpu
    If the output shows the following status information, cGPU is installed:
    cgpu_km                71355  0

Run cGPU

The following list describes the environment variables of cGPU. When you create a container, you can set these environment variables to specify the GPU memory and computing power that the container can obtain by using cGPU.
  • CGPU_DISABLE
    Value type: Boolean.
    Description: Specifies whether to enable cGPU. Valid values:
    • false: enables cGPU.
    • true: disables cGPU and uses the default NVIDIA container service.
    Example: none.
  • ALIYUN_COM_GPU_MEM_DEV
    Value type: Integer.
    Description: Specifies the total memory of each GPU on the GPU-accelerated instance based on the instance type. Unit: GiB. The value must be an integer.
    Example: A GPU-accelerated instance of the ecs.gn6i-c4g1.xlarge instance type is equipped with one NVIDIA® Tesla® T4 graphics card. Run the nvidia-smi command on the instance to view the total memory. In this example, 15,109 MiB is returned, so set this variable to 15.
  • ALIYUN_COM_GPU_MEM_CONTAINER
    Value type: Integer.
    Description: Specifies the size of the GPU memory to be used by the container. Unit: GiB. This variable is used together with ALIYUN_COM_GPU_MEM_DEV. If this variable is not specified or is set to 0, the default NVIDIA container service is used instead of cGPU.
    Example: On a GPU whose total memory is 15 GiB, set ALIYUN_COM_GPU_MEM_DEV to 15 and ALIYUN_COM_GPU_MEM_CONTAINER to 1 to allocate 1 GiB of GPU memory to the container.
  • ALIYUN_COM_GPU_VISIBLE_DEVICES
    Value type: Integer or UUID.
    Description: Specifies the GPUs to be used by the container.
    Example: On a GPU-accelerated instance configured with four GPUs, run the nvidia-smi -L command to view the device numbers and UUIDs of the GPUs. The following sample output is returned:
    GPU 0: Tesla T4 (UUID: GPU-b084ae33-e244-0959-cd97-83****)
    GPU 1: Tesla T4 (UUID: GPU-3eb465ad-407c-4a23-0c5f-bb****)
    GPU 2: Tesla T4 (UUID: GPU-2fce61ea-2424-27ec-a2f1-8b****)
    GPU 3: Tesla T4 (UUID: GPU-22401369-db12-c6ce-fc48-d7****)
    Then, set the environment variable in one of the following ways:
    • Set ALIYUN_COM_GPU_VISIBLE_DEVICES to 0,1 to allocate the first and second GPUs to the container.
    • Set ALIYUN_COM_GPU_VISIBLE_DEVICES to GPU-b084ae33-e244-0959-cd97-83****,GPU-3eb465ad-407c-4a23-0c5f-bb****,GPU-2fce61ea-2424-27ec-a2f1-8b**** to allocate the three GPUs with the specified UUIDs to the container.
  • ALIYUN_COM_GPU_SCHD_WEIGHT
    Value type: Integer.
    Description: Specifies the weight of GPU computing power that is allocated to the container. Valid values: 1 to min(max_inst, 16).
    Example: none.
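For example, the following command is a sketch of how these environment variables can be combined when you create a container on an instance whose GPU has 15 GiB of memory. The container name gpu_test0, the 4 GiB memory size, and the weight of 2 are hypothetical values that you can adjust; the image is the TensorFlow image that is also used in the following example.
  # The container name gpu_test0, the 4 GiB memory size, and the weight of 2 are example values.
  docker run -d -t --gpus all --privileged --name gpu_test0 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 \
    -e ALIYUN_COM_GPU_MEM_CONTAINER=4 \
    -e ALIYUN_COM_GPU_VISIBLE_DEVICES=0 \
    -e ALIYUN_COM_GPU_SCHD_WEIGHT=2 \
    nvcr.io/nvidia/tensorflow:19.10-py3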

The following example shows how to use cGPU to allow two containers to share the GPU of a GPU-accelerated instance of the ecs.gn6i-c4g1.xlarge instance type.

  1. Run the following commands to create containers and specify the memory to be used by the containers:
    docker run -d -t --gpus all --privileged --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name gpu_test1 -v /mnt:/mnt -e ALIYUN_COM_GPU_MEM_CONTAINER=6 -e ALIYUN_COM_GPU_MEM_DEV=15 nvcr.io/nvidia/tensorflow:19.10-py3
    docker run -d -t --gpus all --privileged --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --name gpu_test2 -v /mnt:/mnt -e ALIYUN_COM_GPU_MEM_CONTAINER=8 -e ALIYUN_COM_GPU_MEM_DEV=15 nvcr.io/nvidia/tensorflow:19.10-py3
    Note In the preceding commands, the TensorFlow image nvcr.io/nvidia/tensorflow:19.10-py3 is used. Replace it with your own container image as needed. For more information about how to use the TensorFlow image to build a TensorFlow deep learning framework, see Deploy an NGC environment on instances with GPU capabilities.
    In this example, ALIYUN_COM_GPU_MEM_DEV is set to specify the total memory of the GPU, and ALIYUN_COM_GPU_MEM_CONTAINER is set to specify the memory to be used by each container. Two containers are created:
    • gpu_test1: 6 GiB of GPU memory is allocated.
    • gpu_test2: 8 GiB of GPU memory is allocated.
  2. Access a container.
    In this example, gpu_test1 is used.
    docker exec -it gpu_test1 bash
  3. Run the following command to view GPU information such as memory usage.
    nvidia-smi
    The output shows that the GPU memory of the gpu_test1 container is 6,043 MiB, which corresponds to the 6 GiB that is allocated to the container.
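You can run the same check for the second container without attaching to it. The following command is a sketch that assumes the gpu_test2 container created in step 1 is still running; the reported memory is expected to reflect the 8 GiB that is allocated to that container.
  docker exec gpu_test2 nvidia-smi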

View procfs nodes

At runtime, cGPU generates and manages multiple procfs nodes in the /proc/cgpu_km directory. You can view cGPU information and configure cGPU settings in the procfs nodes. The following section describes how each procfs node is used.

  1. Run the following command to view the information of nodes:
    ls /proc/cgpu_km/
    The following output is returned:
    0  default_memsize  inst_ctl  major  version
    The following list describes these nodes.
    • 0 (read and write): cGPU generates a directory for each GPU on the GPU-accelerated instance and uses numbers such as 0, 1, and 2 as the directory names. In this example, a single GPU is available and the corresponding directory is 0.
    • default_memsize (read and write): The memory size that is allocated to a newly created container if the ALIYUN_COM_GPU_MEM_CONTAINER parameter is not specified.
    • inst_ctl (read and write): The control node.
    • major (read only): The major device number of the cGPU kernel driver.
    • version (read only): The cGPU version.
  2. Run the following command to view the directory of the GPU:
    In this example, GPU 0 is used.
    ls /proc/cgpu_km/0
    The following output is returned:
    012b2edccd7a   0852a381c0cf   free_weight    max_inst    policy
    The following list describes the contents of the directory.
    • Directory of a container (read and write): cGPU generates a directory for each container that runs on the GPU-accelerated instance and uses the container IDs as the directory names. You can run the docker ps command to view the created containers.
    • free_weight (read only): The remaining weight of GPU computing power that can be allocated on the GPU. If free_weight is 0, the weight of GPU computing power allocated to a new container is 0, and the new container cannot obtain GPU computing power or be used to run applications that require GPU computing power.
    • max_inst (read and write): The maximum number of containers. Valid values: 1 to 16.
    • policy (read and write): The computing power scheduling policy. cGPU supports the following policies:
      • 0: fair-share scheduling. Each container occupies a fixed time slice. The proportion of time slices per container is 1/max_inst.
      • 1: preemptive scheduling. Each container occupies as many time slices as possible. The proportion of time slices per container is 1/Number of containers.
      • 2: weight-based preemptive scheduling. When the ALIYUN_COM_GPU_SCHD_WEIGHT value is greater than 1, weight-based preemptive scheduling is used.
      You can change the scheduling policy by modifying the policy value. For more information, see Examples of the computing power scheduling by using cGPU.

  3. Run the following command to view the directory of a container:
    In this example, the 012b2edccd7a container is used.
    ls /proc/cgpu_km/0/012b2edccd7a
    The following output is returned:
    id  meminfo  memsize  weight
    The following list describes the contents of the directory.
    • id (read only): The ID of the container.
    • memsize (read and write): The size of the GPU memory allocated to the container. cGPU sets this value based on the ALIYUN_COM_GPU_MEM_CONTAINER parameter.
    • meminfo (read only): The memory information, including the remaining GPU memory in the container, the ID of the process that is using the GPU, and the GPU memory usage of the process. Example:
      Free: 6730809344
      PID: 19772 Mem: 200278016
    • weight (read and write): The weight of the maximum GPU computing power that the container can obtain. Default value: 1. The sum of the weights of all running containers must be less than or equal to max_inst.
You can run commands on the GPU-accelerated instance to perform operations such as changing the scheduling policy and modifying the weights of containers. The following list provides some sample commands.
  • echo 2 > /proc/cgpu_km/0/policy: changes the scheduling policy to weight-based preemptive scheduling.
  • cat /proc/cgpu_km/0/free_weight: returns the weight that is still available on the GPU. If free_weight is 0, the weight of GPU computing power allocated to a new container is 0, and the new container cannot obtain GPU computing power or be used to run applications that require GPU computing power.
  • cat /proc/cgpu_km/0/$dockerid/weight: returns the weight that is allocated to the specified container.
  • echo 4 > /proc/cgpu_km/0/$dockerid/weight: changes the weight of GPU computing power that the specified container can obtain.
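The following script is a minimal sketch that combines these nodes to print the scheduling configuration of GPU 0. It assumes that cGPU is installed and that the containers you want to inspect are running, so that their directories exist under /proc/cgpu_km/0.
  #!/bin/bash
  # Print the scheduling policy, remaining weight, and maximum number of containers of GPU 0.
  echo "policy:      $(cat /proc/cgpu_km/0/policy)"
  echo "free_weight: $(cat /proc/cgpu_km/0/free_weight)"
  echo "max_inst:    $(cat /proc/cgpu_km/0/max_inst)"
  # Print the weight and allocated memory size of each container directory.
  for dir in /proc/cgpu_km/0/*/; do
      id=$(basename "$dir")
      echo "container ${id}: weight=$(cat "${dir}weight") memsize=$(cat "${dir}memsize")"
  done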

Upgrade cGPU

  1. Stop all running containers.
    docker stop $(docker ps -a | awk '{print $1}' | tail -n +2)
  2. Run the upgrade.sh script to upgrade cGPU to the latest version.
    sh upgrade.sh
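After the script finishes, you can confirm the installed cGPU version through the version node that is described in the View procfs nodes section:
  cat /proc/cgpu_km/version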

Uninstall cGPU

  1. Stop all running containers.
    docker stop $(docker ps -a | awk '{print $1}' | tail -n +2)
  2. Run the uninstall.sh script to uninstall cGPU.
    sh uninstall.sh
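After the script finishes, you can verify that the cGPU kernel module was removed in the same way that the installation was verified. If the following command returns no output, the cgpu_km module is no longer loaded:
  lsmod | grep cgpu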

Examples of the computing power scheduling by using cGPU

When the cgpu_km module is loaded, cGPU sets time slices (X ms each) for each GPU based on the maximum number of containers (max_inst), and uses the time slices to allocate GPU computing power to the containers. In the examples, the time slices are referred to as Slice 1, Slice 2, and Slice N. The following examples demonstrate how GPU computing power is scheduled by using different scheduling policies.
  • Fair-share scheduling
    Time slices are allocated when the containers are created. cGPU starts scheduling from Slice 1. The scheduling task is submitted to the physical GPU and executed in the container within the time slice (X ms). Then, cGPU moves to the next time slice. Each container obtains the same amount of the computing power, which is 1/max_inst. The following figure shows the procedure.
  • Preemptive scheduling

    Time slices are allocated when the containers are created. cGPU starts scheduling from Slice 1. However, if a container is not in use or does not have GPU-enabled processes, cGPU skips scheduling and moves to the next time slice.

    Example:
    1. A container Docker 1 is created and Slice 1 is allocated to this container. Two TensorFlow processes run in Docker 1. In this case, Docker 1 can obtain the computing power of the entire physical GPU.
    2. Another container Docker 2 is created and Slice 2 is allocated to this container. If no GPU-enabled processes exist in Docker 2, cGPU skips scheduling for Docker 2.
    3. When a process in Docker 2 requests GPU resources, both Slice 1 and Slice 2 are scheduled. Docker 1 and Docker 2 each obtain a maximum of half of the computing power of the physical GPU. The following figure shows the procedure.
  • Weight-based preemptive scheduling

    If ALIYUN_COM_GPU_SCHD_WEIGHT is set to a value greater than 1 when you create a container, weight-based preemptive scheduling is used. cGPU divides the computing power of the physical GPU into max_inst portions based on the maximum number of containers (max_inst). When the ALIYUN_COM_GPU_SCHD_WEIGHT value of a container is greater than 1, cGPU combines multiple time slices into one larger time slice and allocates it to the container.

    Examples:
    • Docker 1: ALIYUN_COM_GPU_SCHD_WEIGHT=m
    • Docker 2: ALIYUN_COM_GPU_SCHD_WEIGHT=n
    The following content shows the scheduling results:
    • If only Docker 1 is running, Docker 1 obtains the computing power of the entire physical GPU.
    • If both Docker 1 and Docker 2 are running, they obtain the computing power based on a theoretical ratio of m:n. Docker 2 consumes n time slices even if it does not have a GPU-enabled process. This is different from the case in preemptive scheduling.
      Note The running performance of the containers differs when m:n is set to 2:1 and when m:n is set to 8:4. Within one second, the containers are scheduled in turn four times as often when m:n is set to 2:1 as when m:n is set to 8:4.
    Weight-based preemptive scheduling limits the theoretical maximum amount of GPU computing power that a container can obtain. However, for graphics cards such as NVIDIA® V100 that have strong computing power, a computing task can be completed within a single time slice if the memory is not fully used. In this case, if m:n is set to 8:4, the GPU computing power becomes idle during the remaining time slices and the limit does not take effect. We recommend that you set appropriate weights based on the computing power of the GPU. Examples:
    • If you use NVIDIA® V100 graphics cards, set m:n to 2:1 to reduce the weight and prevent the GPU computing power from becoming idle.
    • If you use NVIDIA® Tesla® T4 graphics cards, set m:n to 8:4 to increase the weight and ensure that sufficient GPU computing power is allocated.
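For reference, the following commands are a sketch of how two containers could be created with m:n set to 8:4 on a GPU-accelerated instance that is equipped with a Tesla T4 GPU. The container names, memory sizes, and weights are example values; they assume that max_inst on the GPU is at least 12, so that the sum of the weights does not exceed max_inst.
  # The container names docker1 and docker2, the memory sizes, and the weights are example values.
  docker run -d -t --gpus all --privileged --name docker1 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 -e ALIYUN_COM_GPU_MEM_CONTAINER=6 \
    -e ALIYUN_COM_GPU_SCHD_WEIGHT=8 nvcr.io/nvidia/tensorflow:19.10-py3
  docker run -d -t --gpus all --privileged --name docker2 \
    -e ALIYUN_COM_GPU_MEM_DEV=15 -e ALIYUN_COM_GPU_MEM_CONTAINER=6 \
    -e ALIYUN_COM_GPU_SCHD_WEIGHT=4 nvcr.io/nvidia/tensorflow:19.10-py3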