This topic describes GPU topology and the benefits of topology-aware GPU scheduling.
The following figure shows the topology of eight Tesla V100 GPUs that communicate with each other through NVLinks. Each Tesla V100 GPU has six NVLinks. However, NVLinks cannot be established between every pair of Tesla V100 GPUs, and at most two NVLinks can be established between any two GPUs. In the figure, two NVLinks are established between GPU0 and GPU3, and two more between GPU0 and GPU4. One NVLink is established between GPU0 and GPU1. GPU0 and GPU6 communicate through a PCIe link because no NVLink is established between them.
Benefits of topology-aware GPU scheduling
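The one-way and two-way bandwidths of each NVLink are 25 GB/s and 50 GB/s, respectively. The bandwidth of the PCIe link is 16 GB/s. Because the available bandwidth depends on how the GPUs are linked, different combinations of GPUs deliver different levels of acceleration for a training job, and the optimal acceleration is achieved only with a well-chosen combination.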
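The bandwidth figures above can be turned into a small calculation. The following Python sketch is illustrative only: it uses just the GPU0 links named in this topic, and it assumes that the one-way bandwidth between two GPUs scales linearly with the number of NVLinks between them and falls back to the PCIe figure when there is no NVLink. The names NVLINKS_FROM_GPU0 and one_way_bandwidth are hypothetical and are not part of any ACK or NVIDIA API.

```python
# Bandwidth figures taken from this topic (GB/s).
NVLINK_ONE_WAY_GBPS = 25
PCIE_GBPS = 16

# NVLink counts between GPU0 and the peers named in this topic.
# Peers that the text does not mention are deliberately left out.
NVLINKS_FROM_GPU0 = {1: 1, 3: 2, 4: 2, 6: 0}


def one_way_bandwidth(nvlink_count):
    """Return the assumed one-way bandwidth in GB/s for a GPU pair.

    Assumption: bandwidth adds up linearly across NVLinks, and a pair
    with no NVLink communicates over PCIe.
    """
    if nvlink_count > 0:
        return nvlink_count * NVLINK_ONE_WAY_GBPS
    return PCIE_GBPS


if __name__ == "__main__":
    for peer, links in sorted(NVLINKS_FROM_GPU0.items()):
        print(f"GPU0 <-> GPU{peer}: {one_way_bandwidth(links)} GB/s one-way")
```

Under these assumptions, a job placed on GPU0 and GPU3 has roughly three times the one-way bandwidth of a job placed on GPU0 and GPU6, which is why the choice of GPUs matters.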
Kubernetes is unaware of the topology of GPU resources on nodes. Therefore, Kubernetes allocates GPUs without regard to how they are linked, which is effectively random from the perspective of a training job. As a result, GPU acceleration for training jobs varies considerably depending on which GPUs are allocated. Container Service for Kubernetes (ACK) supports topology-aware GPU scheduling based on the scheduling framework. This feature selects a combination of GPUs from GPU-accelerated nodes to achieve the optimal GPU acceleration for training jobs.
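To make the idea of selecting a combination of GPUs concrete, the following Python sketch compares a topology-aware choice with an arbitrary one. It is not the algorithm that ACK uses, which this topic does not describe. The four-GPU topology in NVLINKS is hypothetical, and the scoring rule (maximize the slowest pairwise link in the selected set) is an assumption chosen for simplicity.

```python
from itertools import combinations

NVLINK_ONE_WAY_GBPS = 25  # per NVLink, from this topic
PCIE_GBPS = 16            # PCIe fallback, from this topic

# Hypothetical NVLink counts for a 4-GPU node, keyed by (low_id, high_id).
NVLINKS = {
    (0, 1): 1, (0, 2): 0, (0, 3): 2,
    (1, 2): 2, (1, 3): 0, (2, 3): 1,
}


def pair_bandwidth(a, b):
    """One-way bandwidth in GB/s between two GPUs (PCIe if no NVLink)."""
    links = NVLINKS.get((min(a, b), max(a, b)), 0)
    return links * NVLINK_ONE_WAY_GBPS if links else PCIE_GBPS


def score(gpus):
    """Bottleneck score: the slowest link among all pairs in the set."""
    return min(
        (pair_bandwidth(a, b) for a, b in combinations(gpus, 2)),
        default=float("inf"),  # a single GPU has no inter-GPU link
    )


def best_combination(free_gpus, count):
    """Return the set of `count` free GPUs with the highest bottleneck score."""
    return max(combinations(sorted(free_gpus), count), key=score)


if __name__ == "__main__":
    print(best_combination([0, 1, 2, 3], 2), score((0, 2)))
```

In this hypothetical topology, a topology-unaware allocation could hand a two-GPU job GPU0 and GPU2, which are limited to the 16 GB/s PCIe link, whereas the topology-aware choice picks a pair connected by two NVLinks.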