This topic describes the components and configurations that are required to activate GPU topology-aware scheduling.

Prerequisites

  • Create a professional managed Kubernetes cluster and set its Instance Type to Heterogeneous Computing. For more information, see Create a professional managed Kubernetes cluster.
    Note: Only professional managed Kubernetes clusters are supported. To activate GPU topology-aware scheduling for a dedicated Kubernetes cluster, submit a ticket to have your account added to the whitelist.
  • Use kubectl to connect to the Container Service for Kubernetes (ACK) cluster. For more information, see Use kubectl to connect to an ACK cluster.
  • The following table lists the required components and versions.
    Component                                         Required version
    Kubernetes                                        V1.18.8 and later
    Helm                                              V3.0 and later
    NVIDIA driver                                     V418.87.01 and later
    NVIDIA Collective Communications Library (NCCL)   V2.7 and later
    Docker                                            V19.03.5
    Operating system                                  CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, or Alibaba Cloud Linux 2
    GPU                                               NVIDIA V100
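The version requirements in the table can be checked mechanically once you know what is installed. The following Python sketch compares a set of installed versions against the minimums in the table and reports any mismatch. It assumes that you have already collected the versions yourself (for example from the output of kubectl version, helm version, nvidia-smi, and docker version); the dictionary keys, the helper names parse_version, at_least, and check_prerequisites, and the sample versions are all hypothetical. Because the table lists Docker without "and later", the sketch treats Docker V19.03.5 as an exact requirement, which is an assumption. The operating system and GPU model are not checked.

```python
import re
from typing import Dict, List, Tuple

# Minimum versions copied from the requirements table.
# Assumption: Docker has no "and later" in the table, so it is treated
# as an exact match rather than a minimum.
MINIMUM_VERSIONS: Dict[str, str] = {
    "kubernetes": "1.18.8",
    "helm": "3.0",
    "nvidia_driver": "418.87.01",
    "nccl": "2.7",
}
EXACT_VERSIONS: Dict[str, str] = {"docker": "19.03.5"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the first dotted number in text, e.g. 'v1.18.8-aliyun.1' -> (1, 18, 8)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding with zeros so that '3.0' equals '3.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites(installed: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means every checked component passes."""
    problems: List[str] = []
    for name, minimum in MINIMUM_VERSIONS.items():
        if name not in installed:
            problems.append(f"{name}: not reported")
        elif not at_least(installed[name], minimum):
            problems.append(f"{name}: {installed[name]} is older than {minimum}")
    for name, required in EXACT_VERSIONS.items():
        if name not in installed:
            problems.append(f"{name}: not reported")
        elif parse_version(installed[name]) != parse_version(required):
            problems.append(f"{name}: {installed[name]} differs from {required}")
    return problems


# Hypothetical versions collected from a cluster node.
installed = {
    "kubernetes": "v1.18.8-aliyun.1",
    "helm": "v3.2.4",
    "nvidia_driver": "418.87.01",
    "nccl": "2.7.8",
    "docker": "19.03.5",
}
print(check_prerequisites(installed))  # → []
```

Comparing versions as integer tuples rather than as strings avoids false results such as "418.87.01" sorting before "9.0" in a plain string comparison.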

Procedure

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
  3. On the App Catalog page, select Name from the drop-down list in the upper-right corner of the page, enter ack-ai-installer in the search box, and then click the search icon.
  4. In the Deploy pane of the ack-ai-installer page, select the cluster that you want to manage from the Cluster drop-down list and click Create.