This topic describes the components and configurations that are required to activate topology-aware GPU scheduling.

Prerequisites

  • A professional Kubernetes cluster is created. When you create the cluster, set Instance Type to Heterogeneous Computing. For more information, see Create a professional managed Kubernetes cluster.
    Notice: Only professional Kubernetes clusters are supported. To activate topology-aware GPU scheduling for a dedicated Kubernetes cluster, submit a ticket to have the cluster added to the whitelist.
  • You are connected to the cluster by using kubectl. For more information, see Use kubectl to connect to an ACK cluster.
  • The following table lists the required components and versions.
    Component                                         Version
    Kubernetes                                        1.18.8 and later
    Helm                                              3.0 and later
    NVIDIA driver                                     418.87.01 and later
    NVIDIA Collective Communications Library (NCCL)   2.7 and later
    Docker                                            19.03.5
    Operating system                                  CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, and Alibaba Cloud Linux 2
    GPU                                               NVIDIA V100
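Before you start the procedure, you can compare the versions in your environment against the table above. The following Python sketch performs only the comparison. The values in `detected` are illustrative and must be replaced with what your environment reports (for example, from `kubectl version`, `helm version`, and `nvidia-smi`). The key names such as `nvidia-driver` are hypothetical labels chosen for this example. Because the table lists Docker as 19.03.5 without "and later", the sketch treats it as an exact match; the operating system and GPU model are not checked.

```python
from __future__ import annotations

import re

# Minimum versions from the prerequisites table ("and later" entries).
# The key names are hypothetical labels used only in this sketch.
MINIMUM_VERSIONS = {
    "kubernetes": "1.18.8",
    "helm": "3.0",
    "nvidia-driver": "418.87.01",
    "nccl": "2.7",
}
# Docker is listed without "and later", so this sketch requires an exact match.
EXACT_VERSIONS = {"docker": "19.03.5"}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number, e.g. 'v1.18.8-aliyun.1' -> (1, 18, 8)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _padded(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Pad the shorter tuple with zeros so that 2.7 compares equal to 2.7.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def at_least(found: str, minimum: str) -> bool:
    """True if the found version is equal to or newer than the minimum."""
    a, b = _padded(parse_version(found), parse_version(minimum))
    return a >= b


def check(detected: dict[str, str]) -> list[str]:
    """Return one message per component that is missing or does not match the table."""
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        found = detected.get(name)
        if found is None:
            problems.append(f"{name}: not detected")
        elif not at_least(found, minimum):
            problems.append(f"{name}: {found} is older than {minimum}")
    for name, required in EXACT_VERSIONS.items():
        found = detected.get(name)
        if found is None:
            problems.append(f"{name}: not detected")
        else:
            a, b = _padded(parse_version(found), parse_version(required))
            if a != b:
                problems.append(f"{name}: {found}, expected {required}")
    return problems


if __name__ == "__main__":
    # Illustrative values only; replace them with what your environment reports.
    detected = {
        "kubernetes": "v1.18.8-aliyun.1",
        "helm": "v3.2.4",
        "nvidia-driver": "418.87.01",
        "nccl": "2.7.8",
        "docker": "19.03.5",
    }
    for message in check(detected) or ["all checked components match the table"]:
        print(message)
```

Version strings are compared numerically after padding with zeros, so a suffix such as `-aliyun.1` or a leading `v` does not affect the result.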

Procedure

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
  3. On the App Catalog page, enter ack-ai-installer in the Name search box, and then click the component when it appears.
  4. In the Deploy section of the App Catalog - ack-ai-installer page, select the cluster where you want to deploy the component from the Cluster drop-down list and click Create.
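After you click Create, you may want to confirm from the command line that the component was installed. The following Python sketch runs `helm list --all-namespaces -o json` and filters the result. It relies on assumptions that this topic does not confirm: that App Catalog installs ack-ai-installer as a Helm 3 release whose name or chart field contains `ack-ai-installer`, and that `helm` uses the same kubeconfig as kubectl. The helper names `find_releases` and `list_helm_releases` are hypothetical.

```python
from __future__ import annotations

import json
import shutil
import subprocess

# Assumption: the App Catalog release name or chart contains this string.
CHART_NAME = "ack-ai-installer"


def find_releases(releases: list[dict], chart_name: str = CHART_NAME) -> list[dict]:
    """Keep releases whose name or chart field mentions chart_name."""
    return [
        r for r in releases
        if chart_name in r.get("name", "") or chart_name in r.get("chart", "")
    ]


def list_helm_releases() -> list[dict]:
    """Run `helm list --all-namespaces -o json` and parse its JSON array."""
    result = subprocess.run(
        ["helm", "list", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout or "[]")


if __name__ == "__main__":
    if shutil.which("helm") is None:
        print("helm was not found on PATH")
    else:
        try:
            matches = find_releases(list_helm_releases())
        except subprocess.CalledProcessError as err:
            print(f"helm list failed: {err.stderr.strip()}")
        else:
            if not matches:
                print(f"no release matching {CHART_NAME!r} was found")
            for r in matches:
                print(f"{r.get('namespace')}/{r.get('name')}: {r.get('status')} ({r.get('chart')})")
```

The filtering is kept separate from the `helm` call so that it can be reused on saved `helm list -o json` output.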