After a serverless Kubernetes (ASK) cluster is connected to Application Real-Time Monitoring Service (ARMS) Prometheus Monitoring, you can use the dashboards predefined in ARMS to monitor performance metrics of the GPU-accelerated elastic container instances in the cluster. This topic describes how to use ARMS Prometheus Monitoring to monitor a GPU-accelerated elastic container instance.
Prerequisites
An ASK cluster is created and connected to ARMS Prometheus Monitoring. For more information, see Connect ASK clusters to ARMS Prometheus Monitoring.
Procedure
Log on to the Container Service console.
Create a GPU-accelerated elastic container instance.
YAML example:
apiVersion: v1
kind: Pod
metadata:
  name: cg-gpu-0
  annotations:
    # Specify a GPU-accelerated instance type.
    k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
spec:
  containers:
  - image: nginx
    name: cg
    resources:
      limits:
        cpu: 500m
        # Specify the number of GPUs allocated to the container.
        nvidia.com/gpu: '1'
    command: ["bash","-c","sleep 100000"]
  dnsPolicy: ClusterFirst
  restartPolicy: Always
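The following commands show one way to create the pod from the preceding manifest and check its status with kubectl. This sketch assumes that kubectl is configured to access the ASK cluster and that the manifest is saved as a local file; the file name cg-gpu-0.yaml is only an example.
# Create the pod from the manifest (the file name is an example).
kubectl apply -f cg-gpu-0.yaml
# Confirm that the pod is running; inspect the events if it is not.
kubectl get pod cg-gpu-0
kubectl describe pod cg-gpu-0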
View GPU metrics.
Find the cluster that contains the GPU-accelerated elastic container instance you created and click the cluster name.
On the Cluster Information page, click Prometheus Monitoring in the upper-right corner.
On the GPU APP or GPU Node tab, view monitoring data.
After an ASK cluster is connected to ARMS Prometheus Monitoring, you can monitor the GPU-accelerated elastic container instances in the cluster without installing additional plug-ins. ARMS Prometheus Monitoring provides ready-to-use predefined dashboards by default.
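As an optional sanity check before you open the dashboards, you can confirm that the allocated GPU is visible inside the container. This sketch assumes that the NVIDIA driver utilities (such as nvidia-smi) are available in the container of the GPU-accelerated elastic container instance; adjust the pod name if you used a different one.
# List the GPU that is allocated to the container (assumes nvidia-smi is available in the container).
kubectl exec cg-gpu-0 -- nvidia-smi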
GPU APP
In the GPU APP dashboard, you can view monitoring data about the GPUs used by a single pod, as shown in the following figure.
GPU Node
In the GPU Node dashboard, you can view monitoring data about all GPUs on a node, as shown in the following figure.