Alibaba Cloud Container GPU (cGPU) partitions the memory and computing power of a GPU and transparently manages the partitions as multiple isolated containers. You can enable cGPU for container clusters deployed on Alibaba Cloud, Amazon Web Services (AWS), Google Compute Engine (GCE), or in data centers. In Alibaba Cloud Container Service for Kubernetes (ACK), you can then customize GPU sharing and scheduling policies to match the requirements of AI inference or training tasks, optimizing resource utilization and reducing costs.
Efficient GPU Memory Isolation
Maximize GPU utilization and prevent memory conflicts by segmenting the memory and computing resources of a GPU and managing them as multiple containers, based on the virtual GPU isolation technology in the Alibaba Cloud host kernel.
Flexible GPU Resource Sharing and Scheduling
Allocate GPU memory and computing resources to AI jobs with scheduling policies tailored to different business scenarios, maximizing GPU utilization while maintaining high performance for containers.
Customizable GPU Auto-Scaling
Increase GPU resource availability and keep AI tasks stable with automatic GPU scaling: set key GPU metrics, such as utilization and memory usage, as triggers for scaling GPU instances out and in.
Real-Time Visualized GPU Monitoring
Maintain GPU instance health and balance resource distribution with real-time visualized key GPU metrics, such as memory allocation, memory usage, and GPU temperature, on cluster and node dashboards in Prometheus.
How It Works
When multiple tasks, such as AI inference workloads, need to run on one GPU, the GPU is occupied by only one task at a time: that task holds all of the GPU memory while most of the computing resources sit idle, resulting in low GPU utilization.
Alibaba Cloud cGPU enables multiple GPU containers to run on a single GPU, with applications isolated in their respective containers. It is driven by a host kernel module that provides virtual GPUs to containers, isolating both memory and computing power. The GPU resources assigned to each container are isolated from one another to prevent operation conflicts and security risks. cGPU is compatible with open source Kubernetes and NVIDIA Docker, and can be deployed on different types of Alibaba Cloud GPU-accelerated instances or in Alibaba Cloud Container Service for Kubernetes (ACK) to support GPU sharing.
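As a sketch of what GPU sharing looks like in practice, a pod on an ACK cluster with GPU sharing enabled can request a slice of GPU memory through the `aliyun.com/gpu-mem` extended resource rather than a whole GPU. The pod name, image, and memory value below are illustrative, and the exact resource name and unit (GiB) follow ACK's GPU sharing convention:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-demo                # illustrative name
spec:
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest   # hypothetical image
    resources:
      limits:
        # Request 3 GiB of GPU memory instead of an entire GPU;
        # other pods can share the remaining memory of the same card.
        aliyun.com/gpu-mem: 3
```

Because cGPU enforces the memory limit at the kernel level, a container that exceeds its share fails in isolation instead of crashing workloads co-located on the same GPU.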
Elastic GPU Service
Powerful parallel computing capabilities based on GPU technology
ECS 7th Gen
Fully equipped with TPM chips, with instance computing power increased by up to 40%
ECS Bare Metal Instance
Featuring both the elasticity of a virtual server and the high performance and comprehensive features of a physical server
Current GPU sharing solutions on the market either rely on API modification, which requires applications to be recompiled, or fail to isolate the allocated GPU resources, which can trigger GPU errors and application crashes. In addition, the default Kubernetes scheduler is not aware of shared GPU resources when assigning AI jobs to nodes.
ACK enables multiple containers to share the resources of one GPU. You can monitor GPU usage in the Application Real-Time Monitoring Service (ARMS) console and configure the GPU memory for each pod based on the requirements of its container workloads, maximizing resource utilization and saving costs. For AI inference and training tasks in different business scenarios, you can configure AI job scheduling policies to improve task efficiency and GPU utilization. For example, the binpack scheduling policy packs jobs onto one node until its resources are exhausted, avoiding cross-server data transfer and preventing resource fragmentation. The gang scheduling policy allocates resources only when all subtasks of a job can obtain the resources they require, preventing task failures and resource waste caused by resource deadlocks.
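To make the gang scheduling policy concrete, here is a minimal sketch of one worker pod in a distributed training job. It assumes ACK's gang scheduling is declared through pod-group labels, as in the upstream Kubernetes coscheduling convention; the pod name, group name, and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker-0             # illustrative name
  labels:
    # All pods carrying the same pod-group name are scheduled as a unit;
    # none of them starts until at least min-available pods can all
    # obtain their requested resources, avoiding resource deadlock.
    pod-group.scheduling.sigs.k8s.io/name: distributed-training
    pod-group.scheduling.sigs.k8s.io/min-available: "4"
spec:
  containers:
  - name: worker
    image: registry.example.com/trainer:latest    # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

If only three of the four workers could be placed, none would be scheduled, so no GPU sits reserved but idle while the job waits for its remaining subtasks.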
Alibaba Cloud Container Service for Kubernetes
Ensuring high efficiency for enterprises by running containerized applications on the cloud
Application Real-Time Monitoring Service
Building business monitoring capabilities with real-time responses based on comprehensive monitoring capabilities
Security and Compliance
SOC2 Type II Report
AI Acceleration Whitepaper
This whitepaper is a complete guide to the mechanisms of the AI infrastructure layer and how it supports AI acceleration.
cGPU Technology Improves GPU Usage, Boosts AI Efficiency, and Reduces Costs
Alibaba Cloud launched the cGPU container sharing technology, which allows users to schedule underlying GPU resources at a fine-grained level through containers.
GPU Management and Device Plugin Implementation with Kubernetes
This article introduces the common GPU plugins for Kubernetes and how to use GPUs in Docker and Kubernetes.