Container Service for Kubernetes: Overview of topology-aware GPU scheduling

Last Updated: Apr 07, 2024

This topic describes the GPU topology of GPU-accelerated nodes and the benefits of topology-aware GPU scheduling.

GPU topology

The following figure shows the topology of eight Tesla V100 GPUs that communicate with each other over NVLink. Each Tesla V100 GPU provides six NVLink ports, so a direct NVLink connection cannot exist between every pair of the eight GPUs, and at most two NVLinks can be established between any two GPUs. In this example, two NVLinks are established between GPU 0 and GPU 3, two NVLinks between GPU 0 and GPU 4, and one NVLink between GPU 0 and GPU 1. GPU 0 and GPU 6 share no NVLink and instead communicate through Peripheral Component Interconnect Express (PCIe).

(Figure: NVLink and PCIe topology of eight Tesla V100 GPUs)
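
To inspect this topology on an actual node, you can query the NVLink state of each GPU through NVML. The following is a minimal sketch that uses the pynvml Python bindings; it assumes that the NVIDIA driver and the pynvml package are installed on the node.

```python
# Minimal sketch: list the active NVLink connections of every GPU on a node.
# Assumes the NVIDIA driver and the pynvml package are installed.
import pynvml

pynvml.nvmlInit()
try:
    for index in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
            except pynvml.NVMLError:
                continue  # this link index is not supported on this GPU
            if state == pynvml.NVML_FEATURE_ENABLED:
                # The remote PCI address identifies the peer device of this link.
                peer = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(handle, link)
                bus = peer.busId.decode() if isinstance(peer.busId, bytes) else peer.busId
                print(f"GPU {index} NVLink {link} -> peer PCI {bus}")
finally:
    pynvml.nvmlShutdown()
```

On a running node, the nvidia-smi topo -m command prints the same information as a matrix of link types between every pair of GPUs.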

Benefits of topology-aware GPU scheduling

Each NVLink provides a one-way bandwidth of 25 GB/s and a two-way bandwidth of 50 GB/s, whereas a PCIe link provides a bandwidth of 16 GB/s. Consequently, the communication bandwidth available to a training job depends on which GPUs the job is assigned, and you can achieve the optimal GPU acceleration for training jobs by choosing the combination of GPUs with the fastest interconnects.
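
The practical effect of these numbers is easy to quantify. The following sketch compares the one-way bandwidth of the GPU pairs from the example topology above; the per-link figures are the ones given in this section.

```python
# One-way bandwidth per link, in GB/s, as given in this section.
NVLINK_GBPS = 25
PCIE_GBPS = 16

# Number of NVLinks between the example GPU pairs (0 means PCIe only).
nvlinks = {
    ("GPU 0", "GPU 3"): 2,
    ("GPU 0", "GPU 4"): 2,
    ("GPU 0", "GPU 1"): 1,
    ("GPU 0", "GPU 6"): 0,
}

for (a, b), links in nvlinks.items():
    bandwidth = links * NVLINK_GBPS if links else PCIE_GBPS
    path = f"{links} NVLink(s)" if links else "PCIe"
    print(f"{a} <-> {b}: {path}, {bandwidth} GB/s one way")
```

A pair connected by two NVLinks moves data more than three times as fast as a pair that must fall back to PCIe (50 GB/s versus 16 GB/s), which is why the choice of GPU combination matters.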

Kubernetes is unaware of the topology of GPU resources on nodes, so it schedules GPUs without considering how they are interconnected. As a result, the GPU acceleration that a training job receives varies considerably with the scheduling result. Container Service for Kubernetes (ACK) supports topology-aware GPU scheduling based on the Kubernetes scheduling framework. This feature selects a combination of GPUs from GPU-accelerated nodes to achieve the optimal GPU acceleration for training jobs.
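
Conceptually, the scheduler must pick, from the idle GPUs on a node, the combination whose slowest internal link is fastest. The following simplified sketch illustrates that selection idea only; it is not ACK's actual implementation, and the bandwidth matrix and helper names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical one-way bandwidths (GB/s) between GPU pairs in the example
# topology: 2 NVLinks = 50, 1 NVLink = 25; pairs not listed fall back to PCIe.
BANDWIDTH = {(0, 1): 25, (0, 3): 50, (0, 4): 50, (1, 3): 25, (3, 4): 25}
PCIE_GBPS = 16

def pair_bandwidth(a: int, b: int) -> int:
    return BANDWIDTH.get((min(a, b), max(a, b)), PCIE_GBPS)

def best_combination(free_gpus: list[int], count: int) -> tuple[int, ...]:
    """Pick the combination of GPUs whose slowest pairwise link is fastest."""
    def worst_link(combo: tuple[int, ...]) -> int:
        return min(pair_bandwidth(a, b) for a, b in combinations(combo, 2))
    return max(combinations(free_gpus, count), key=worst_link)

# A job that needs two GPUs is placed on a pair joined by two NVLinks.
print(best_combination([0, 1, 3, 4, 6], count=2))  # -> (0, 3)
```

A scheduler that ignores topology might just as easily hand the job GPU 0 and GPU 6, which only share a PCIe link; topology-aware scheduling avoids that worst case.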