Overview of topology-aware GPU scheduling

This topic describes GPU topology and the benefits of topology-aware GPU scheduling.

GPU topology

The following figure shows the topology of eight Tesla V100 GPUs that communicate with each other through NVLink. Each Tesla V100 GPU has six NVLinks. However, not every pair of GPUs is directly connected by NVLink, and a pair of GPUs can be connected by at most two NVLinks. In this example, GPU 0 is connected to GPU 3 by two NVLinks, to GPU 4 by two NVLinks, and to GPU 1 by one NVLink. GPU 0 and GPU 6 have no NVLink between them and communicate through Peripheral Component Interconnect Express (PCIe) instead.

Benefits of topology-aware GPU scheduling

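The one-way and two-way bandwidths of each NVLink are 25 GB/s and 50 GB/s, respectively. The bandwidth of the PCIe link is 16 GB/s. Because these links differ in bandwidth, the combination of GPUs allocated to a training job determines how fast the GPUs can exchange data. Selecting a well-connected combination achieves the optimal GPU acceleration for training jobs.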
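To make the bandwidth figures concrete, the following minimal sketch estimates the one-way bandwidth between GPU 0 and the peers named in the topology example. It assumes that bandwidth adds up linearly across parallel NVLinks and that a pair without NVLinks falls back to a single PCIe link. The names NVLINKS_FROM_GPU0 and one_way_bandwidth are illustrative only and are not part of any ACK or NVIDIA API.

```python
# Bandwidth figures stated in this topic (GB/s).
NVLINK_ONE_WAY_GBPS = 25
PCIE_GBPS = 16

# Number of NVLinks between GPU 0 and each peer, taken from the topology example.
# A count of 0 means the pair communicates through PCIe.
NVLINKS_FROM_GPU0 = {1: 1, 3: 2, 4: 2, 6: 0}


def one_way_bandwidth(nvlink_count):
    """Estimate one-way bandwidth in GB/s for a GPU pair.

    Assumption: parallel NVLinks aggregate linearly, and a pair
    without NVLinks uses a single PCIe link.
    """
    if nvlink_count > 0:
        return nvlink_count * NVLINK_ONE_WAY_GBPS
    return PCIE_GBPS


bandwidth_from_gpu0 = {
    peer: one_way_bandwidth(count) for peer, count in NVLINKS_FROM_GPU0.items()
}

for peer, gbps in sorted(bandwidth_from_gpu0.items()):
    print(f"GPU 0 <-> GPU {peer}: {gbps} GB/s one-way")
```

Under these assumptions, the pair GPU 0 and GPU 3 has roughly three times the one-way bandwidth of the pair GPU 0 and GPU 6, which is why the choice of GPUs matters for communication-heavy training jobs.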

Kubernetes is unaware of the topology of GPU resources on nodes. Therefore, it allocates GPUs without regard to how they are interconnected, which is effectively random. As a result, GPU acceleration for training jobs varies considerably depending on which GPUs are allocated. Container Service for Kubernetes (ACK) supports topology-aware GPU scheduling based on the Kubernetes scheduling framework. This feature selects a combination of GPUs from GPU-accelerated nodes to achieve the optimal GPU acceleration for training jobs.
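The idea of selecting a GPU combination by topology can be illustrated with a small sketch. This topic does not describe the algorithm that ACK uses, so the code below is only an assumption-laden illustration: it searches all combinations exhaustively and ranks them by the slowest pairwise link first and the total pairwise bandwidth second. Only the NVLink counts for GPU 0 come from the topology example. All other pairs are treated as PCIe-only purely for illustration, and the names NVLINK_COUNT, link_bandwidth, score, and pick_gpus are hypothetical.

```python
from itertools import combinations

NVLINK_GBPS = 25  # one-way bandwidth per NVLink, from this topic
PCIE_GBPS = 16    # PCIe bandwidth, from this topic

# NVLink counts per GPU pair. Only these GPU 0 entries come from the topology
# example; every pair not listed is ASSUMED to use PCIe for this illustration.
NVLINK_COUNT = {
    frozenset({0, 1}): 1,
    frozenset({0, 3}): 2,
    frozenset({0, 4}): 2,
}


def link_bandwidth(a, b):
    """One-way bandwidth in GB/s between two GPUs (NVLinks assumed to add up)."""
    count = NVLINK_COUNT.get(frozenset({a, b}), 0)
    return count * NVLINK_GBPS if count else PCIE_GBPS


def score(gpus):
    """Rank a combination by (slowest pairwise link, total pairwise bandwidth)."""
    bandwidths = [link_bandwidth(a, b) for a, b in combinations(gpus, 2)]
    return (min(bandwidths), sum(bandwidths))


def pick_gpus(available, count):
    """Return the best-scoring combination of `count` GPUs from `available`."""
    candidates = sorted(available)
    if count < 2:
        # A single GPU has no peer links, so topology does not matter.
        return tuple(candidates[:count])
    return max(combinations(candidates, count), key=score)


available = [0, 1, 3, 4, 6]
print(pick_gpus(available, 2))
print(pick_gpus(available, 3))
```

An exhaustive search is practical here because a node holds only a handful of GPUs. A topology-unaware allocation could just as easily return GPU 0 and GPU 6, whose only connection is PCIe.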