
Use ARMS Prometheus Monitoring to monitor a GPU-accelerated elastic container instance

Last Updated: Aug 26, 2021

After a serverless Kubernetes (ASK) cluster is connected to Application Real-Time Monitoring Service (ARMS) Prometheus Monitoring, you can use the dashboards predefined in ARMS to monitor performance metrics of the GPU-accelerated elastic container instances in the cluster. This topic describes how to use ARMS Prometheus Monitoring to monitor a GPU-accelerated elastic container instance.

Prerequisites

An ASK cluster is created and connected to ARMS Prometheus Monitoring. For more information, see Connect ASK clusters to ARMS Prometheus Monitoring.

Procedure

  1. Log on to the Container Service console.

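  2. Create a GPU-accelerated elastic container instance.

    The following YAML example creates a pod that uses the ecs.gn6i-c4g1.xlarge GPU-accelerated instance type and requests one GPU: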

    apiVersion: v1
    kind: Pod
    metadata:
      name: cg-gpu-0
      annotations:
        # Specify a GPU-accelerated instance type.
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
    spec:
      containers:
      - image: nginx
        name: cg
        resources:
          limits:
            cpu: 500m
            # Specify the number of GPUs allocated to a container.
            nvidia.com/gpu: '1'
        command: ["bash","-c","sleep 100000"]
      dnsPolicy: ClusterFirst
      restartPolicy: Always
  3. View GPU metrics.

    1. Find the cluster in which you created the GPU-accelerated elastic container instance and click the cluster name.

    2. On the Cluster Information page, click Prometheus Monitoring in the upper-right corner.

    3. Click the GPU APP or GPU Node tab to view the monitoring data.

      After an ASK cluster is connected to ARMS Prometheus Monitoring, you can monitor the GPU-accelerated elastic container instances in the cluster without installing additional plug-ins. ARMS Prometheus Monitoring provides the following predefined dashboards, which are ready to use by default:

      • GPU APP

        In the GPU APP dashboard, you can view GPU monitoring data for a single pod, as shown in the following figure.

        [Figure: GPU monitoring 1]
      • GPU Node

        In the GPU Node dashboard, you can view monitoring data for all GPUs on the node, as shown in the following figure.

        [Figure: GPU monitoring]
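The dashboards are the primary way to view these metrics. If you also want to read a GPU metric from a script, the following Python code is a minimal sketch that sends an instant query to a Prometheus-compatible HTTP API (/api/v1/query) and parses the result. It relies on assumptions that this topic does not confirm: the endpoint address is supplied through a hypothetical PROM_URL environment variable, the metric name DCGM_FI_DEV_GPU_UTIL and its gpu and pod labels follow the NVIDIA DCGM exporter convention and may not match the metrics behind the GPU APP and GPU Node dashboards, and any authentication that your endpoint requires is not handled.

```python
"""Minimal sketch: read per-GPU utilization for the cg-gpu-0 pod from a
Prometheus-compatible HTTP API. Metric, label, and variable names are assumptions."""
import json
import os
import urllib.parse
import urllib.request

# Hypothetical PromQL: the DCGM exporter metric name and the "pod" label are
# assumed, not confirmed for ARMS Prometheus Monitoring.
QUERY = 'DCGM_FI_DEV_GPU_UTIL{pod="cg-gpu-0"}'


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' "gpu" label (assumed) to its sample value as a float."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for series in payload["data"]["result"]:
        gpu = series["metric"].get("gpu", "unknown")
        # An instant-vector sample is [unix_timestamp, "value as a string"].
        values[gpu] = float(series["value"][1])
    return values


def fetch_gpu_utilization(base_url: str, promql: str = QUERY) -> dict:
    """Send the query over HTTP (no authentication) and parse the response."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical environment variable holding the HTTP API address.
    base = os.environ.get("PROM_URL")
    if not base:
        print("Set PROM_URL to the HTTP API address of your Prometheus instance.")
    else:
        for gpu, util in sorted(fetch_gpu_utilization(base).items()):
            print(f"GPU {gpu}: {util:.1f}% utilization")
```

To try it, set PROM_URL and run the script; it prints one line per GPU series that the query returns. If no series come back, the metric or label names are probably different in your environment. Parsing is kept separate from the HTTP call so that it can be checked against a saved response without access to a cluster.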