Container Service for Kubernetes (ACK) can use the managed Prometheus plugin to monitor graphics processing unit (GPU) resources. The cGPU solution isolates the GPU resources allocated to containers that share one GPU without requiring changes to the existing GPU programs. This topic describes how to monitor GPU memory usage by using the managed Prometheus plugin and how to isolate GPU resources by using cGPU.

Prerequisites

  • A standard dedicated GPU cluster is created, and the Kubernetes version is 1.16 or later.
  • Application Real-Time Monitoring Service (ARMS) is activated. For more information, see Activate ARMS.
  • Your Alibaba Cloud account is authorized to use ARMS Prometheus. You can grant the required permissions in the Resource Access Management (RAM) console.
  • The GPU model is Tesla P4, Tesla P100, Tesla T4, or Tesla V100 (16 GB).

Background information

The rise of artificial intelligence (AI) is fueled by high GPU computing power, large amounts of data, and optimized algorithms. NVIDIA GPUs provide widely used heterogeneous computing techniques that lay the foundation for high-performance deep learning. However, GPUs are expensive, and if each application exclusively uses one GPU, computing resources may be wasted. GPU sharing improves resource utilization, but you must also consider how to achieve the highest query rate at the lowest cost and how to fulfill the service level agreement (SLA) of each application.

Use the managed Prometheus plugin to monitor a GPU that is used by one container

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, click Prometheus Monitoring.
  3. On the Prometheus Monitoring page, select the region where the target cluster is deployed, and click Install in the Actions column of the cluster.
  4. In the Confirm dialog box, click OK.
    It takes about two minutes to install the Prometheus plugin. After the Prometheus plugin is installed, it is displayed in the Installed plugins column.
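    Optionally, you can verify from the CLI that the Prometheus agent pods are running. A minimal sketch, assuming that the agent is deployed in the arms-prom namespace:
    # list the Prometheus agent pods (the namespace name is an assumption)
    kubectl get pods -n arms-prom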
  5. Deploy the following example application by using the command-line interface (CLI). For more information, see Manage applications by using commands. A sample deployment command is shown after the manifest.
    apiVersion: apps/v1
    kind: StatefulSet
    
    metadata:
      name: app-3g-v1
      labels:
        app: app-3g-v1
    spec:
      replicas: 1
      serviceName: "app-3g-v1"
      podManagementPolicy: "Parallel"
      selector: # define how the StatefulSet finds the pods it manages
        matchLabels:
          app: app-3g-v1
      updateStrategy:
        type: RollingUpdate
      template: # define the pod specifications
        metadata:
          labels:
            app: app-3g-v1
        spec:
          containers:
          - name: app-3g-v1
            image: registry.cn-shanghai.aliyuncs.com/tensorflow-samples/cuda-malloc:3G
            resources:
              limits:
                nvidia.com/gpu: 1
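    For example, assuming that you save the preceding manifest as app-3g-v1.yaml, you can deploy it with the following command:
    # create the StatefulSet from the saved manifest (the file name is an assumption)
    kubectl apply -f app-3g-v1.yaml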
    After the application is deployed, run the following command to check the application status. The command output shows that the pod name is app-3g-v1-0.
    kubectl get po
    NAME          READY   STATUS    RESTARTS   AGE
    app-3g-v1-0   1/1     Running   1          2m56s
  6. Click GPU APP in the Actions column of the target cluster.
    The following figure shows that the application uses only 20% of the GPU memory, which means that 80% of the GPU memory is wasted. The total memory of the GPU is about 16 GB, but the memory used by the application stays at about 3.4 GB. Therefore, a large amount of GPU memory is wasted if you allocate one GPU to each application. To improve GPU utilization, you can use cGPU to deploy multiple applications on one GPU.

Use cGPU to enable GPU sharing among multiple containers

  1. Label the nodes on which GPUs are installed.
    1. Log on to the ACK console.
    2. In the left-side navigation pane, choose Cluster > Nodes.
    3. On the Nodes page, select the target cluster and click Manage Labels in the upper-right corner.
    4. On the Manage Labels page, select the target worker nodes and click Add Label.
    5. In the Add dialog box, set the label name to cgpu and the value to true. Then, click OK.
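    Alternatively, you can add the label from the CLI. A minimal sketch, assuming that the node name is the one used later in this topic:
    # label the GPU worker node so that cGPU components are scheduled to it (replace the node name with your own)
    kubectl label node cn-hangzhou.192.168.2.167 cgpu=true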
  2. Install cGPU.
    1. Log on to the ACK console.
    2. In the left-side navigation pane, choose Marketplace > App Catalog. Then, click ack-cgpu on the right.
    3. In the Deploy section on the right, select the target cluster and namespace, and click Create.
    4. Run the following command to query the GPU resources of the cluster.
      # kubectl inspect cgpu
      NAME                       IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
      cn-hangzhou.192.168.2.167  192.168.2.167  0/15                   0/15
      ----------------------------------------------------------------------
      Allocated/Total GPU Memory In Cluster:
      0/15 (0%)
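      You can also check that the node reports the aliyun.com/gpu-mem extended resource. A minimal sketch, assuming the node name from the preceding output:
      # print the allocatable resources of the node, which include aliyun.com/gpu-mem after cGPU is installed
      kubectl get node cn-hangzhou.192.168.2.167 -o jsonpath='{.status.allocatable}'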
  3. Deploy workloads for the shared GPU.
    1. Modify the YAML file that was used to deploy the cGPU application.
      • Modify the replicas parameter from 1 to 2. Before the modification, only one pod can be deployed on the GPU. Now, you can deploy two pods on the GPU.
      • Change the resource type from nvidia.com/gpu to aliyun.com/gpu-mem. The unit of the GPU resource is changed to GiB.
      apiVersion: apps/v1
      kind: StatefulSet
      
      metadata:
        name: app-3g-v1
        labels:
          app: app-3g-v1
      spec:
        replicas: 2
        serviceName: "app-3g-v1"
        podManagementPolicy: "Parallel"
        selector: # define how the StatefulSet finds the pods it manages
          matchLabels:
            app: app-3g-v1
        template: # define the pod specifications
          metadata:
            labels:
              app: app-3g-v1
          spec:
            containers:
            - name: app-3g-v1
              image: registry.cn-shanghai.aliyuncs.com/tensorflow-samples/cuda-malloc:3G
              resources:
                limits:
                  aliyun.com/gpu-mem: 4
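      After you modify the manifest, recreate the workload. A minimal sketch, assuming that the modified manifest is saved as app-3g-v1.yaml:
      # remove the old StatefulSet that requests nvidia.com/gpu, then create the new one
      kubectl delete statefulset app-3g-v1
      kubectl apply -f app-3g-v1.yaml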
    2. Create the new workloads that share the GPU, and check the GPU memory allocation.
      The following command output shows that the two pods run on the same GPU.
      #  kubectl inspect cgpu -d
      
      NAME:       cn-hangzhou.192.168.2.167
      IPADDRESS:  192.168.2.167
      
      NAME         NAMESPACE  GPU0(Allocated)
      app-3g-v1-0  default    4
      app-3g-v1-1  default    4
      Allocated :  8 (53%)
      Total :      15
      --------------------------------------------------------
      
      Allocated/Total GPU Memory In Cluster:  8/15 (53%)
    3. Run the following commands to query the amount of GPU memory that is available to each container.
      The command outputs show that each container can use at most 4,301 MiB of GPU memory.
      kubectl exec -it app-3g-v1-0 nvidia-smi
      Mon Apr 13 01:33:10 2020
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla V100-SXM2...  On   | 00000000:00:07.0 Off |                    0 |
      | N/A   37C    P0    57W / 300W |   3193MiB /  4301MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      +-----------------------------------------------------------------------------+
      
      kubectl exec -it app-3g-v1-1 nvidia-smi
      Mon Apr 13 01:36:07 2020
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla V100-SXM2...  On   | 00000000:00:07.0 Off |                    0 |
      | N/A   38C    P0    57W / 300W |   3193MiB /  4301MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      +-----------------------------------------------------------------------------+
    4. Check the GPU memory usage on the GPU node and in the containers.
      On the node, the total used GPU memory is 6,396 MiB, which equals the sum of the memory used by the two containers. This indicates that cGPU isolates GPU memory by container. If you apply for more GPU memory than the amount allocated to a container, a memory allocation error is returned.
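      To check the usage at the node level, you can run nvidia-smi directly on the GPU node. A minimal sketch, assuming that you have SSH access to the node:
      # run on the GPU node, not inside a container
      nvidia-smi --query-gpu=memory.used,memory.total --format=csv
      The following example shows the allocation error that is returned when a container tries to request more GPU memory than it is allocated: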
      kubectl exec -it app-3g-v1-1 bash
      root@app-3g-v1-1:/# cuda_malloc -size=1024
      cgpu_cuda_malloc starting...
      Detected 1 CUDA Capable device(s)
      
      Device 0: "Tesla V100-SXM2-16GB"
        CUDA Driver Version / Runtime Version          10.1 / 10.1
        Total amount of global memory:                 4301 MBytes (4509925376 bytes)
      Try to malloc 1024 MBytes memory on GPU 0
      CUDA error at cgpu_cuda_malloc.cu:119 code=2(cudaErrorMemoryAllocation) "cudaMalloc( (void**)&dev_c, malloc_size)"
In the ARMS console, you can monitor the GPU resources that are used by nodes and applications.
  • GPU APP: You can view the memory usage and the memory usage percentage of each application.
  • GPU Node: You can view the memory usage of each GPU.

Use the managed Prometheus plugin to monitor a shared GPU

When the amount of memory used by an application exceeds the assigned amount, cGPU ensures that the memory usage of other applications is not affected.

  1. Deploy a new application on the shared GPU.
    The application requests 4 GiB of GPU memory, but it attempts to use 6 GiB.
    apiVersion: apps/v1
    kind: StatefulSet
    
    metadata:
      name: app-6g-v1
      labels:
        app: app-6g-v1
    spec:
      replicas: 1
      serviceName: "app-6g-v1"
      podManagementPolicy: "Parallel"
      selector: # define how the StatefulSet finds the pods it manages
        matchLabels:
          app: app-6g-v1
      template: # define the pod specifications
        metadata:
          labels:
            app: app-6g-v1
        spec:
          containers:
          - name: app-6g-v1
            image: registry.cn-shanghai.aliyuncs.com/tensorflow-samples/cuda-malloc:6G
            resources:
              limits:
                aliyun.com/gpu-mem: 4
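    For example, assuming that you save the manifest as app-6g-v1.yaml, you can deploy it with the following command:
    # create the StatefulSet that overcommits its GPU memory allocation (the file name is an assumption)
    kubectl apply -f app-6g-v1.yaml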
  2. Run the following command to query the pod status.
    The pod that runs the new application is in the CrashLoopBackOff state. The other two pods are still running properly.
    # kubectl get pod
    NAME          READY   STATUS             RESTARTS   AGE
    app-3g-v1-0   1/1     Running            0          7h35m
    app-3g-v1-1   1/1     Running            0          7h35m
    app-6g-v1-0   0/1     CrashLoopBackOff   5          3m15s
  3. Run the following command to check error details in the container logs.
    According to the command output, the error is caused by cudaErrorMemoryAllocation.
    kubectl logs app-6g-v1-0
    CUDA error at cgpu_cuda_malloc.cu:119 code=2(cudaErrorMemoryAllocation) "cudaMalloc( (void**)&dev_c, malloc_size)"
  4. Use the GPU APP component of the managed Prometheus plugin to view the container status.
    The following figure shows that the deployment of the new application does not affect the running of the existing containers.