
Container Service for Kubernetes:cGPU overview

Last Updated:Jan 27, 2025

This topic provides an overview of the cGPU solution offered by Alibaba Cloud, describes the benefits of cGPU Professional Edition, and compares the features and usage scenarios of cGPU Basic Edition and cGPU Professional Edition to help you understand and use cGPU.

Background

Following the release of the open-source GPU sharing scheduler, Alibaba Cloud's Container Service for Kubernetes (ACK) has enabled the use of the GPU sharing scheduling framework on container clusters within both Alibaba Cloud and private data centers. This allows multiple containers to share a single GPU device, significantly reducing costs. However, to ensure stable operation of containers on the GPU, it is crucial to manage resource usage and prevent interference from overuse. The industry has investigated various solutions, such as NVIDIA vGPU, MPS, and vCUDA, for more granular GPU management.

Addressing these requirements, the Alibaba Cloud Container Service team has developed a cGPU solution that permits a single GPU to support multiple tasks while providing memory isolation and segmented GPU computing power for each application on the GPU.

Features and benefits

The cGPU solution from Alibaba Cloud uses a host kernel driver developed by Alibaba Cloud to make more efficient use of the underlying NVIDIA GPU driver. cGPU offers the following features:

  • High Compatibility: cGPU is compatible with standard open-source solutions such as Kubernetes and NVIDIA Docker.

  • Ease of Use: cGPU enhances the user experience. You can replace a Compute Unified Device Architecture (CUDA) library in an AI application without recompiling the application or creating a new container image.

  • Stability: cGPU provides stable operation on NVIDIA GPUs; calls to the open API operations of CUDA libraries and to certain private API operations of CUDA Deep Neural Network (cuDNN) do not affect stability.

  • Resource Isolation: cGPU ensures that the GPU memory and computing power allocated to different containers are isolated from each other.

cGPU delivers a cost-effective, reliable, and user-friendly solution for GPU scheduling and memory isolation.

Benefits:

  • Supports GPU sharing, scheduling, and memory isolation.

    • Facilitates GPU sharing, scheduling, and memory isolation on a one-pod-one-GPU basis, typically used in model inference scenarios.

    • Allows GPU sharing, scheduling, and memory isolation on a one-pod-multi-GPU basis, typically used when code is compiled for distributed model training.

  • Supports flexible GPU sharing and memory isolation policies.

    • Offers GPU allocation using the binpack and spread algorithms.

      • Binpack: Prioritizes sharing one GPU among multiple pods, suitable for scenarios that require high GPU utilization.

      • Spread: Attempts to allocate different GPUs to the replicated pods of an application to improve availability.

    • Permits GPU sharing without memory isolation, applicable to deep learning scenarios in which isolation is implemented at the application layer.

    • Supports GPU sharing across multiple GPUs with memory isolation.

  • Provides comprehensive monitoring of GPU resources, covering both exclusive and shared GPU resources.

Free of Charge

To use cGPU, you must first activate the cloud-native AI suite. The cloud-native AI suite has been available free of charge since 00:00:00 on June 6, 2024.

Instructions

cGPU scheduling is currently supported only in ACK Pro Edition clusters. For guidance on installing and using cGPU, refer to the following topics:

Additionally, you can explore advanced features provided by cGPU:

Related concepts

cGPU scheduling vs exclusive GPU scheduling

cGPU scheduling allows multiple pods to share a single GPU card.

Exclusive GPU scheduling assigns one or more whole GPU cards to a single pod.

Memory isolation

Without GPU isolation modules, cGPU scheduling can only ensure that multiple pods run on a single GPU card but cannot prevent pods from affecting each other. The example below demonstrates memory usage.

Suppose Pod 1 requests 5 GiB of GPU memory and Pod 2 requests 10 GiB. Without an isolation module, Pod 1 could actually consume up to 10 GiB, leaving insufficient memory for Pod 2 and causing it to malfunction; in effect, Pod 1 would occupy 5 GiB of the memory requested by Pod 2. With the isolation module, any attempt by Pod 1 to use more GPU memory than it requested fails, and Pod 1 is terminated.

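To make the example concrete, the following is a minimal sketch of how the two pods could declare their GPU memory requests. It assumes that shared GPU memory is requested through the extended resource name aliyun.com/gpu-mem (in GiB), as used by the GPU sharing component; the pod names and container image are placeholders, and you should verify the exact resource name against the installation topics for your cluster version.

apiVersion: v1
kind: Pod
metadata:
  name: pod-1                                      # "Pod 1" in the example above (hypothetical name)
spec:
  containers:
  - name: app
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 5                      # Pod 1 may use at most 5 GiB of GPU memory
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-2                                      # "Pod 2" in the example above (hypothetical name)
spec:
  containers:
  - name: app
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 10                     # Pod 2 may use at most 10 GiB of GPU memory

With the memory isolation module enabled, a CUDA allocation in pod-1 that would push its usage above 5 GiB fails instead of consuming the memory requested by pod-2.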

GPU scheduling policies: binpack and spread

If a node with the GPU sharing feature enabled has multiple GPUs, you can choose one of the following GPU selection policies:

  • Binpack: The binpack policy is used by default. The scheduler allocates all shareable resources of one GPU to pods before it moves on to the next GPU. This helps prevent GPU fragmentation.

  • Spread: The scheduler attempts to spread pods across different GPUs on the node, so that the failure of a single GPU interrupts as little of your business as possible.

In the following example, a node has two GPUs, each providing 15 GiB of GPU memory. Pod 1 requests 2 GiB of GPU memory and Pod 2 requests 3 GiB. With the binpack policy, both pods are scheduled to the same GPU; with the spread policy, the two pods are scheduled to different GPUs.

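The GPU selection policy is typically configured per node. The following is a minimal sketch, under the assumption that GPU sharing is enabled through the node label ack.node.gpu.schedule and the selection policy is set through the node label ack.node.gpu.placement; both label keys and their values are assumptions drawn from ACK's GPU sharing documentation, so confirm them in the installation topics before use.

apiVersion: v1
kind: Node
metadata:
  name: cn-hangzhou.192.168.0.1       # hypothetical node name
  labels:
    ack.node.gpu.schedule: cgpu       # assumed label: GPU sharing with memory isolation enabled on this node
    ack.node.gpu.placement: spread    # assumed label: GPU selection policy; binpack is the default

In practice, these labels are usually applied when the node pool is created or with kubectl label, rather than by editing the Node object directly.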

Single GPU sharing vs multiple GPU sharing

  • Single GPU Sharing: A pod can request resources from only one GPU.

  • Multiple GPU Sharing: A pod can request resources from multiple GPUs, and the requested resources are evenly distributed across those GPUs (see the sketch after this list).

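The following is a minimal sketch of a pod that shares multiple GPUs. It assumes that the number of shared GPUs is requested through aliyun.com/gpu-count and the total GPU memory through aliyun.com/gpu-mem, with the memory split evenly across the GPUs; these resource names are assumptions based on ACK's GPU sharing documentation, so check the installation topics for the names used by your cluster version.

apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-share-demo                      # hypothetical name
spec:
  containers:
  - name: trainer
    image: registry.example.com/training:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-count: 2                   # assumed resource: number of shared GPUs
        aliyun.com/gpu-mem: 8                     # assumed resource: total GPU memory in GiB, 4 GiB per GPU here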