This topic describes how to enable graphics processing unit (GPU) sharing among multiple containers in Container Service for Kubernetes.
Container Service for Kubernetes provides the open source GPU sharing and scheduling feature, which allows multiple containers in a cluster to share a single GPU. You can enable this feature for container clusters that are deployed on Alibaba Cloud, Amazon Web Services (AWS), Google Compute Engine (GCE), or in on-premises data centers. Sharing GPUs reduces costs. However, when multiple containers run on one GPU, a stable runtime environment is equally important.
To ensure the stability of containers, you must isolate the resources assigned to each container. When you run multiple containers on one GPU, GPU resources are assigned to containers as required. However, if one container overuses GPU resources, the performance of the other containers may be affected. To solve this problem, many solutions have been developed in the industry. For example, NVIDIA virtual GPU (vGPU), Multi-Process Service (MPS), and vCUDA all provide fine-grained GPU resource management. Alibaba Cloud provides the cGPU solution, which has the following benefits:
- High compatibility: cGPU is compatible with standard open source solutions such as Kubernetes and NVIDIA Docker.
- Ease of use: cGPU adopts a user-friendly design. You do not need to replace the CUDA library of an artificial intelligence (AI) application, recompile the application, or build a new container image.
- Stability: cGPU performs its operations at the underlying layer of NVIDIA GPUs. This is more stable than solutions that intercept API calls to CUDA libraries or private cuDNN APIs, which are difficult to hook reliably.
- Resource isolation: cGPU isolates both the GPU memory and the computing power allocated to each container, so containers do not affect each other.
The cGPU solution and the GPU sharing and scheduling feature work together to provide a cost-effective, reliable, and user-friendly solution for large-scale GPU scheduling and resource isolation. This solution applies throughout the process, from GPU scheduling to container running.
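As a sketch of how a workload consumes a shared GPU, the following pod spec requests a slice of GPU memory instead of a whole GPU. It assumes the GPU sharing scheduler extension exposes the `aliyun.com/gpu-mem` extended resource measured in GiB; the image name is a placeholder.

```yaml
# Sketch: pod that requests 3 GiB of GPU memory on a shared GPU.
# Assumes the GPU share scheduler extender exposes the extended
# resource aliyun.com/gpu-mem, with GiB as the unit.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo
spec:
  containers:
  - name: cuda-app
    image: registry.example.com/cuda-app:latest  # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3  # GiB of GPU memory, not a count of GPUs
```

Because the pod requests GPU memory rather than `nvidia.com/gpu`, the scheduler can place several such pods on the same physical GPU, and cGPU enforces the memory and computing-power limits at run time.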