Container Service for Kubernetes (ACK) provides GPU sharing based on cGPU. You can use cGPU to share a single GPU among multiple containers in model inference scenarios, while the cGPU kernel module enforces GPU memory isolation among those containers. This topic describes how to install the resource isolation module and an inspection tool in a dedicated Kubernetes cluster that contains GPU-accelerated nodes, which enables GPU sharing and memory isolation.

Usage notes

Item               Supported version
-----------------  -------------------------------------------------------------------
Kubernetes         1.12.6 and later. Only dedicated Kubernetes clusters are supported.
Helm               3.0 and later
NVIDIA driver      418.87.01 and later
Docker             19.03.5
Operating system   CentOS 7.x, Alibaba Cloud Linux 2.x, Ubuntu 16.04, and Ubuntu 18.04
GPU                Tesla P4, Tesla P100, Tesla T4, and Tesla V100
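
You can verify that a node meets these version requirements from a shell. The following is a minimal sketch; the kubectl and helm commands assume your kubeconfig points at the cluster, and the nvidia-smi and docker commands must run on the GPU-accelerated node itself:

```shell
# Check the Kubernetes server version (must be 1.12.6 or later).
kubectl version --short

# Check the Helm client version (must be 3.0 or later).
helm version --short

# On the GPU-accelerated node: check the NVIDIA driver version
# (must be 418.87.01 or later).
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# On the GPU-accelerated node: check the Docker server version.
docker version --format '{{.Server.Version}}'
```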

Step 1: Add labels to GPU-accelerated nodes

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  4. In the left-side navigation pane of the details page, choose Nodes > Nodes.
  5. On the Nodes page, click Manage Labels and Taints in the upper-right corner.
  6. On the Labels tab of the Manage Labels and Taints page, select the nodes that you want to manage and click Add Label.
  7. In the Add dialog box, set Name and Value.
    Notice
    • To enable cGPU, you must set Name to cgpu and Value to true.
    • If you delete the cgpu label, cGPU is not disabled. To disable cGPU, set Name to cgpu and Value to false.
  8. Click OK.
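
The console steps above can also be performed with kubectl. This is a sketch; `<node-name>` is a placeholder for the name of your GPU-accelerated node:

```shell
# Enable cGPU on a node (equivalent to adding the label in the console).
kubectl label node <node-name> cgpu=true

# To disable cGPU later, overwrite the label with false.
# Note: deleting the label does not disable cGPU.
kubectl label node <node-name> cgpu=false --overwrite
```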

Step 2: Install ack-cgpu on the labeled nodes

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
  3. On the App Catalog page, search for ack-cgpu and click ack-cgpu after it appears.
  4. On the App Catalog - ack-cgpu page, select the cluster that you want to manage in the Deploy section and click Create.
    You do not need to set Namespace or Release Name. The default values are used: the release is installed into the kube-system namespace under the name cgpu.
    Run the following command to check whether ack-cgpu is installed. If output similar to the following is returned, the installation succeeded:
    helm get manifest cgpu -n kube-system | kubectl get -f -
    NAME                                    SECRETS   AGE
    serviceaccount/gpushare-device-plugin   1         39s
    serviceaccount/gpushare-schd-extender   1         39s
    
    NAME                                                           AGE
    clusterrole.rbac.authorization.k8s.io/gpushare-device-plugin   39s
    clusterrole.rbac.authorization.k8s.io/gpushare-schd-extender   39s
    
    NAME                                                                  AGE
    clusterrolebinding.rbac.authorization.k8s.io/gpushare-device-plugin   39s
    clusterrolebinding.rbac.authorization.k8s.io/gpushare-schd-extender   39s
    
    NAME                             TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
    service/gpushare-schd-extender   NodePort   10.6.13.125   <none>        12345:32766/TCP   39s
    
    NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR    AGE
    daemonset.apps/cgpu-installer              4         4         4       4            4           cgpu=true        39s
    daemonset.apps/device-plugin-evict-ds      4         4         4       4            4           cgpu=true        39s
    daemonset.apps/device-plugin-recover-ds    0         0         0       0            0           cgpu=false       39s
    daemonset.apps/gpushare-device-plugin-ds   4         4         4       4            4           cgpu=true        39s
    
    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/gpushare-schd-extender   1/1     1            1           38s
    
    NAME                           COMPLETIONS   DURATION   AGE
    job.batch/gpushare-installer   3/1 of 3      3s         38s
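
After ack-cgpu is installed, a pod requests a slice of GPU memory through the extended resource exposed by the gpushare device plugin. The following is a minimal sketch, assuming the resource name aliyun.com/gpu-mem (in GiB); the pod name and the CUDA image are placeholders you should replace for your workload:

```shell
# Create a pod that requests 3 GiB of memory on a shared GPU.
# aliyun.com/gpu-mem is the extended resource advertised by the
# gpushare device plugin.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cgpu-test
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:latest
    command: ["sleep", "infinity"]
    resources:
      limits:
        aliyun.com/gpu-mem: 3
EOF

# View GPU memory allocation across nodes with the cGPU inspection
# tool mentioned in this topic (a kubectl plugin installed separately
# from the ack-cgpu chart).
kubectl inspect cgpu
```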