This topic introduces the cGPU solution provided by Alibaba Cloud, describes the benefits of cGPU Professional Edition, and compares the features and use scenarios of cGPU Basic Edition and cGPU Professional Edition. This helps you better understand and use cGPU.

Background information

Container Service for Kubernetes (ACK) provides the open source cGPU solution that allows you to share one GPU among multiple containers in a Kubernetes cluster. You can enable cGPU for container clusters that are deployed on Alibaba Cloud, Amazon Web Services (AWS), Google Compute Engine (GCE), or in data centers. cGPU reduces GPU costs. However, when multiple containers run on one GPU, container stability cannot be ensured without resource isolation.

To ensure container stability, the GPU resources allocated to each container must be isolated. When multiple containers run on one GPU, GPU resources are allocated to each container as requested. However, a container that consumes excessive GPU resources can degrade the performance of the other containers. The industry offers several solutions to this problem, such as NVIDIA vGPU, Multi-Process Service (MPS), and vCUDA, all of which enable fine-grained GPU sharing.

ACK provides the cGPU solution to meet these requirements. cGPU allows multiple tasks to share one GPU while isolating the GPU memory allocated to each application and partitioning the computing capacity of the GPU.
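As a minimal sketch of how this is consumed in a cluster, the following pod manifest requests a slice of GPU memory instead of an entire device. It assumes that the GPU sharing component is installed and uses the aliyun.com/gpu-mem extended resource, which ACK measures in GiB; verify the resource name and unit against your cluster version. The pod name and image are hypothetical.

    apiVersion: v1
    kind: Pod
    metadata:
      name: inference-worker                             # hypothetical pod name
    spec:
      containers:
      - name: app
        image: registry.example.com/ai/inference:latest  # hypothetical image
        resources:
          limits:
            # Request 3 GiB of GPU memory rather than a whole GPU.
            # With cGPU memory isolation enabled, the container cannot
            # allocate more than this amount.
            aliyun.com/gpu-mem: 3

Pods that request aliyun.com/gpu-mem can be scheduled onto the same physical GPU, which is how one GPU is shared among multiple containers.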

Features

The cGPU solution uses a server kernel driver developed by Alibaba Cloud to make more efficient use of the underlying NVIDIA GPU drivers. cGPU provides the following features:
  • High compatibility: cGPU is compatible with standard open source solutions, such as Kubernetes and NVIDIA Docker.
  • Ease of use: cGPU requires no changes to your workloads. Replacing the Compute Unified Device Architecture (CUDA) library of an AI application does not require recompiling the application or building a new container image.
  • Stability: cGPU performs operations on NVIDIA GPUs at the driver level, which remains stable across versions. Upper-layer interfaces, such as API operations on CUDA libraries and some private API operations on CUDA Deep Neural Network (cuDNN), change frequently and are difficult to hook reliably; cGPU does not depend on them.
  • Resource isolation: cGPU isolates the allocated GPU memory and computing capacity.

cGPU provides a cost-effective, reliable, and user-friendly solution that allows you to enable GPU scheduling and memory isolation.
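To illustrate the memory and computing-capacity isolation described above, the sketch below extends the earlier manifest with a compute limit. It assumes the aliyun.com/gpu-core.percentage extended resource, which ACK documents as a percentage of one GPU's computing capacity, and assumes the node is configured for combined memory and compute isolation; confirm both for your environment.

    apiVersion: v1
    kind: Pod
    metadata:
      name: training-worker                              # hypothetical pod name
    spec:
      containers:
      - name: app
        image: registry.example.com/ai/train:latest      # hypothetical image
        resources:
          limits:
            aliyun.com/gpu-mem: 6                # 6 GiB of GPU memory
            # Cap the container at 30% of one GPU's computing capacity.
            aliyun.com/gpu-core.percentage: 30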

Benefits of cGPU Professional Edition

Benefit: GPU sharing, scheduling, and memory isolation
  • Supports GPU sharing, scheduling, and memory isolation on a one-pod-one-GPU basis. This is commonly used in model inference scenarios.
  • Supports GPU sharing, scheduling, and memory isolation on a one-pod-multi-GPU basis. This is commonly used in distributed model training scenarios.
Benefit: Flexible GPU sharing and memory isolation policies
  • Supports GPU allocation by using the binpack and spread algorithms (see the example after this list):
    • Binpack: The system preferentially shares one GPU among multiple pods. This applies to scenarios that require high GPU utilization.
    • Spread: The system attempts to allocate a dedicated GPU to each pod. This applies to scenarios that require high GPU availability, because the system attempts to avoid placing replicated pods of the same application on the same GPU.
  • Supports GPU sharing without memory isolation. This applies to deep learning scenarios where applications implement isolation at the application layer.
  • Supports GPU sharing and memory isolation across multiple GPUs.
Benefit: Comprehensive monitoring of GPU resources
  • Supports monitoring of both exclusive GPUs and shared GPUs.
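The sharing and allocation policies above are typically selected per node or node pool through node labels. The sketch below follows the label convention that ACK documents for GPU scheduling (ack.node.gpu.schedule for the isolation mode and ack.node.gpu.placement for the allocation algorithm); treat the exact label names and values as assumptions to confirm for your cluster version. In practice, the labels are usually set when you create the node pool rather than by editing Node objects directly.

    apiVersion: v1
    kind: Node
    metadata:
      name: gpu-node-1                  # hypothetical node name
      labels:
        # Enable GPU sharing with memory isolation on this node.
        ack.node.gpu.schedule: cgpu
        # Pack pods onto as few GPUs as possible; set "spread"
        # instead to favor availability over utilization.
        ack.node.gpu.placement: binpack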

Comparison between cGPU Basic Edition and cGPU Professional Edition

Feature | cGPU Professional Edition | cGPU Basic Edition
GPU sharing and scheduling on one GPU | Supported | Supported
GPU sharing and scheduling on multiple GPUs | Supported | Not supported
Memory isolation on one GPU | Supported | Supported
Memory isolation on multiple GPUs | Supported | Not supported
Monitoring and auto scaling of exclusive GPUs and shared GPUs | Supported | Supported
Node pools with flexible policy configurations | Supported. You can configure different GPU policies for each node pool: GPU sharing with or without memory isolation, and GPU allocation by using the binpack or spread algorithm. | Supported. You can enable GPU sharing with or without memory isolation for each node pool.
GPU memory allocation to pods by using algorithms | Supported. You can choose the binpack or spread algorithm to meet your business requirements. | Supported. By default, GPUs are allocated by using the binpack algorithm.
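For the multi-GPU rows above, a single pod can spread its memory request across several shared GPUs. The sketch below assumes the aliyun.com/gpu-count extended resource that ACK uses for one-pod-multi-GPU sharing, and it assumes the requested memory is divided evenly across the requested GPUs; verify both the resource name and these semantics against the ACK documentation for your cluster version.

    apiVersion: v1
    kind: Pod
    metadata:
      name: multi-gpu-training                           # hypothetical pod name
    spec:
      containers:
      - name: app
        image: registry.example.com/ai/train:latest      # hypothetical image
        resources:
          limits:
            # Request 8 GiB of GPU memory in total...
            aliyun.com/gpu-mem: 8
            # ...spread across 2 shared GPUs (assumed semantics:
            # 4 GiB per GPU).
            aliyun.com/gpu-count: 2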

Usage notes

The installation steps and supported editions of cGPU vary based on the cluster type. For more information about the differences between dedicated Kubernetes clusters and professional Kubernetes clusters, see Cluster type.