High GPU Utilization with cGPU

Maximize GPU utilization and reduce costs by partitioning GPU memory and computing power and scheduling them across containers


Alibaba Cloud Container GPU (cGPU) partitions the memory and computing power of a single GPU and manages the partitions transparently as multiple isolated containers. You can enable cGPU for container clusters deployed on Alibaba Cloud, Amazon Web Services (AWS), Google Compute Engine (GCE), or in your own data centers, and customize the GPU sharing and scheduling policies in Alibaba Cloud Container Service for Kubernetes (ACK) to match your AI inference or training tasks, optimizing resource utilization and saving costs.

Solution Highlights

  • Efficient GPU Memory Isolation

    Maximize GPU utilization and prevent memory conflicts by segmenting the memory and computing resources of a GPU and managing them as multiple isolated containers, based on the virtual GPU isolation technology in Alibaba Cloud's host kernel

  • Flexible GPU Resource Sharing and Scheduling

    Allocate the memory and computing resources of GPUs to AI jobs with scheduling strategies for different business scenarios to maximize GPU utilization while maintaining high performance for containers

  • Customizable GPU Auto-Scaling

    Increase GPU resource availability and keep AI tasks stable with automatic GPU scaling: set thresholds on key GPU metrics, such as utilization ratio and memory usage, to trigger scale-ups and scale-downs of GPU instances

  • Real-Time Visualized GPU Monitoring

    Maintain GPU instance health and balance resource distribution based on real-time visualized key GPU metrics, such as memory allocation, memory usage, and GPU temperature, on cluster and node dashboards in Prometheus
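As an illustration of metric-triggered scaling, the workload-level half of the policy described above can be expressed as a Kubernetes HorizontalPodAutoscaler. The manifest below is a minimal sketch, assuming a DCGM-style GPU utilization metric (`DCGM_FI_DEV_GPU_UTIL`) is exposed through a custom-metrics adapter; the metric, deployment name, and thresholds are assumptions for illustration, not taken from this page.

```yaml
# Hedged sketch: scale an inference Deployment on average GPU utilization.
# Assumes a custom-metrics adapter exposes DCGM_FI_DEV_GPU_UTIL per pod.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server          # assumed workload name
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL  # assumed GPU utilization metric
      target:
        type: AverageValue
        averageValue: "70"          # scale out above ~70% average utilization
```

Scaling the GPU nodes themselves would additionally rely on a cluster autoscaler; this fragment only covers replica-level scaling against a GPU metric.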

Improve GPU Utilization with Alibaba Cloud cGPU

Start 1-On-1 Consultation

How It Works

Your Challenge

When multiple tasks, such as AI inference workloads, need to run on one GPU, the GPU is typically occupied by a single task that claims all of its memory while leaving most of its compute resources idle, resulting in low GPU utilization.

Our Solution

  • Alibaba Cloud cGPU enables multiple containers to run on a single GPU, with each application isolated in its own container. A host-kernel driver provides virtual GPUs to the containers, isolating their memory and computing power. Because the GPU resources assigned to each container are isolated, operation conflicts and security risks are prevented. cGPU is compatible with open source Kubernetes and NVIDIA Docker, and you can deploy it on various types of Alibaba Cloud GPU-accelerated instances, or on Alibaba Cloud Container Service for Kubernetes (ACK), to enable GPU sharing.
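In practice, GPU sharing of this kind surfaces in the pod spec as an extended resource: a container requests a slice of GPU memory rather than a whole device. The fragment below is a minimal sketch, assuming the `aliyun.com/gpu-mem` resource name (in GiB) used by ACK's shared GPU scheduling; the image and names are illustrative, so verify the resource name against your cluster's documentation.

```yaml
# Hedged sketch: request a slice of GPU memory instead of a whole GPU.
# Assumes ACK shared GPU scheduling exposes the aliyun.com/gpu-mem
# extended resource (in GiB); image and pod names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  containers:
  - name: inference
    image: registry.example.com/inference:latest  # assumed image
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # request 3 GiB of GPU memory, not a whole GPU
```

With cGPU isolation underneath, several such pods can land on the same physical GPU while each sees only its allotted memory.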

Elastic GPU Service

Powerful parallel computing capabilities based on GPU technology

Learn More

ECS 7th Gen

Fully equipped with TPM chips and delivering up to 40% higher instance computing power

Learn More

ECS Bare Metal Instance

Featuring both the elasticity of a virtual server and the high performance and comprehensive features of a physical server

Learn More

Your Challenge

Current GPU sharing solutions on the market either rely on API modification, which requires application recompilation, or cannot isolate the allocated GPU resources, triggering GPU errors and application crashes. In addition, the default Kubernetes scheduler allocates GPUs only as whole devices, so it cannot schedule multiple AI jobs onto a shared GPU.

Our Solution

  • ACK enables multiple containers to share the resources of one GPU. You can monitor GPU usage in the Application Real-Time Monitoring Service (ARMS) console and configure GPU memory for each pod based on its workload requirements, maximizing resource utilization and saving costs. For AI inference and training tasks in different business scenarios, you can configure AI job scheduling policies to improve task efficiency and GPU utilization. For example, the binpack policy allocates jobs to one node until its resources are exhausted, which avoids data transfer across servers and prevents resource fragmentation; the gang policy allocates resources only when all the subtasks of a job can obtain their required resources, which avoids task failures and resource waste caused by resource deadlock.
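The binpack idea described above — fill one node before spilling onto the next — can be sketched independently of Kubernetes. The snippet below is an illustrative model of the policy, not ACK's scheduler: jobs (sized in GiB of GPU memory) are placed on the most-loaded node that can still fit them, so capacity fills up one node at a time.

```python
def binpack_place(jobs, nodes):
    """Assign each (job, GiB-needed) pair to the tightest node that fits it.

    `nodes` maps node name -> free GPU memory in GiB and is mutated
    as jobs are placed, mimicking capacity being consumed.
    """
    placement = {}
    for job, need in jobs:
        # Candidate nodes that can fit the job, least free memory first,
        # so already-busy nodes fill up before idle ones are touched.
        candidates = sorted(
            (name for name, free in nodes.items() if free >= need),
            key=lambda name: nodes[name],
        )
        if not candidates:
            raise RuntimeError(f"no node can fit job {job!r} ({need} GiB)")
        chosen = candidates[0]
        nodes[chosen] -= need
        placement[job] = chosen
    return placement


jobs = [("a", 4), ("b", 4), ("c", 8)]
nodes = {"node-1": 16, "node-2": 16}
print(binpack_place(jobs, nodes))
```

Running the example packs all three jobs onto `node-1` and leaves `node-2` entirely free, which is exactly the anti-fragmentation behavior the binpack policy is after; a spread policy would instead pick the node with the *most* free memory.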

Alibaba Cloud Container Service for Kubernetes

Ensuring high efficiency for enterprises by running containerized applications on the cloud

Learn More

Application Real-Time Monitoring Service

Building real-time business monitoring on top of comprehensive application monitoring capabilities

Learn More

Security and Compliance

We are committed to providing stable, reliable, secure, and compliant cloud computing infrastructure services across major jurisdictions around the world.
Learn More
  • ISO 27001
  • SOC2 Type II Report
  • C5
  • MLPS 2.0
  • MTCS


Related Resources


AI Acceleration Whitepaper

This whitepaper is a complete guide on the mechanism of the AI infrastructure layer and how it supports AI acceleration


cGPU Technology Improves GPU Usage, Boosts AI Efficiency, and Reduces Costs

Alibaba Cloud launched the cGPU container sharing technology, allowing users to employ containers to schedule underlying GPU resources in a fine-grained manner.


GPU Management and Device Plugin Implementation with Kubernetes

This article introduces the common GPU plugins for Kubernetes and how to use GPUs in Docker and Kubernetes.

Start with Alibaba Cloud Solutions

Learn and experience the power of Alibaba Cloud.

Contact Sales