This topic describes the components and configurations that are required to enable topology-aware GPU scheduling.

Prerequisites

  • A Container Service for Kubernetes (ACK) Pro cluster is created. When you create the cluster, set Instance Type to Heterogeneous Computing. For more information, see Create a professional managed Kubernetes cluster.
    Notice Only ACK Pro clusters support topology-aware GPU scheduling. If you want to enable topology-aware GPU scheduling for an ACK dedicated cluster, submit a ticket to apply for whitelist permission.
  • A kubectl client is connected to the cluster. For more information, see Connect to ACK clusters by using kubectl.
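    Before you proceed, you can confirm that kubectl is connected to the intended cluster and that the GPU-accelerated nodes are registered. The following commands are a minimal sketch; the node names and counts in your cluster will differ.

    ```shell
    # Show the API server endpoint of the cluster that kubectl is connected to.
    kubectl cluster-info

    # List the nodes and verify that the GPU-accelerated nodes are in the Ready state.
    kubectl get nodes -o wide
    ```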
  • The following table lists the required versions of the system components.
    Component          Required version
    Kubernetes         1.18.8 or later
    Helm               3.0 or later
    NVIDIA driver      418.87.01 or later
    NCCL               2.7 or later
    Docker             19.03.5
    Operating system   CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, or Alibaba Cloud Linux 2
    GPU                NVIDIA V100
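You can check most of these versions from the command line. The following is a sketch: run the first two commands from your kubectl client, and the last two on a GPU-accelerated node.

```shell
# Kubernetes server version (must be 1.18.8 or later).
kubectl version --short

# Helm client version (must be 3.0 or later).
helm version --short

# On a GPU node: NVIDIA driver version (418.87.01 or later) and GPU model (V100).
nvidia-smi --query-gpu=driver_version,name --format=csv

# On a GPU node: Docker engine version.
docker version --format '{{.Server.Version}}'
```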

Procedure

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
  3. On the Marketplace page, click the App Catalog tab. Find and click ack-ai-installer.
  4. On the ack-ai-installer page, click Deploy.
  5. In the Deploy wizard, select a cluster and namespace, and then click Next.
  6. On the Parameters wizard page, set the parameters and click OK.
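After the deployment completes, you can verify that the components installed by ack-ai-installer are running. The namespace and the name filter below are assumptions (system components are typically deployed to kube-system); adjust them to match the namespace you selected in the Deploy wizard and the actual pod names in your cluster.

```shell
# List the pods created by the ack-ai-installer release and confirm that
# they are in the Running state. The "ack-ai" filter is an assumption;
# replace it with the actual component names if they differ.
kubectl get pods -n kube-system | grep ack-ai
```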