This topic describes how to enable graphics processing unit (GPU) sharing among multiple containers in Container Service for Kubernetes.
Container Service for Kubernetes provides the open source GPU sharing and scheduling feature, which allows multiple containers in a cluster to share a single GPU. You can enable this feature for container clusters that are deployed on Alibaba Cloud, Amazon Web Services (AWS), Google Compute Engine (GCE), or in on-premises data centers. Sharing GPUs reduces costs. However, when multiple containers run on one GPU, a stable runtime environment is equally important.
To ensure the stability of containers, you must isolate the resources assigned to each container. When you run multiple containers on one GPU, GPU resources are assigned to containers as required. However, if one container overuses GPU resources, the performance of the other containers may be affected. To solve this problem, many solutions have been developed in the industry. For example, NVIDIA virtual GPU (vGPU), Multi-Process Service (MPS), and vCUDA all provide fine-grained GPU resource management. Alibaba Cloud provides the cGPU solution, which has the following benefits:
- High compatibility: cGPU is compatible with standard open source solutions such as Kubernetes and NVIDIA Docker.
- Ease of use: cGPU adopts a user-friendly design. You do not need to replace the CUDA library of an artificial intelligence (AI) application, recompile the application, or build a new container image.
- Stability: cGPU performs its operations at the underlying layer of NVIDIA GPUs. This is more stable than solutions that intercept API calls to CUDA libraries or private cuDNN APIs, which are difficult to hook reliably.
- Resource isolation: cGPU isolates both the GPU memory and the computing power allocated to each container, so containers do not affect each other.
The cGPU solution and the GPU sharing and scheduling feature work together to provide a cost-effective, reliable, and user-friendly solution for large-scale GPU scheduling and resource isolation. This solution applies throughout the process, from GPU scheduling to container running.
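As a sketch of how a workload consumes a shared GPU, the following pod spec requests a slice of GPU memory instead of a whole GPU. It assumes the GPU sharing scheduler extension exposes the `aliyun.com/gpu-mem` extended resource measured in GiB; the image name is a placeholder.

```yaml
# Sketch: pod that requests 3 GiB of GPU memory on a shared GPU.
# Assumes the GPU share scheduler extender exposes the extended
# resource aliyun.com/gpu-mem, with GiB as the unit.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo
spec:
  containers:
  - name: cuda-app
    image: registry.example.com/cuda-app:latest  # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 3  # GiB of GPU memory, not a count of GPUs
```

Because the pod requests GPU memory rather than `nvidia.com/gpu`, the scheduler can place several such pods on the same physical GPU, and cGPU enforces the memory and computing-power limits at run time.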