After a serverless Kubernetes (ASK) cluster is connected to Application Real-Time Monitoring Service (ARMS) Prometheus Monitoring, you can use the dashboards predefined in ARMS to monitor performance metrics of the GPU-accelerated elastic container instances in the cluster. This topic describes how to use ARMS Prometheus Monitoring to monitor a GPU-accelerated elastic container instance.
An ASK cluster is created and connected to ARMS Prometheus Monitoring. For more information, see Connect ASK clusters to ARMS Prometheus Monitoring.
Log on to the Container Service console.
Create a GPU-accelerated elastic container instance.
apiVersion: v1
kind: Pod
metadata:
  name: cg-gpu-0
  annotations:
    # Specify a GPU-accelerated instance type.
    k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
spec:
  containers:
  - image: nginx
    name: cg
    resources:
      limits:
        cpu: 500m
        # Specify the number of GPUs allocated to the container.
        nvidia.com/gpu: '1'
    command: ["bash", "-c", "sleep 100000"]
  dnsPolicy: ClusterFirst
  restartPolicy: Always
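To create the instance, you can save the manifest above to a file and apply it with kubectl. The sketch below saves the manifest and shows the apply and status commands; the file name cg-gpu-0.yaml is illustrative.

```shell
# Save the pod manifest from the previous step (file name is illustrative).
cat > cg-gpu-0.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cg-gpu-0
  annotations:
    # Specify a GPU-accelerated instance type.
    k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
spec:
  containers:
  - image: nginx
    name: cg
    resources:
      limits:
        cpu: 500m
        # Specify the number of GPUs allocated to the container.
        nvidia.com/gpu: '1'
    command: ["bash", "-c", "sleep 100000"]
  dnsPolicy: ClusterFirst
  restartPolicy: Always
EOF

# Create the instance in the ASK cluster, then confirm that the pod
# reaches the Running state before checking the dashboards:
#   kubectl apply -f cg-gpu-0.yaml
#   kubectl get pod cg-gpu-0
```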
View GPU metrics.
Find the cluster to which the created GPU-accelerated elastic container instance belongs and click the cluster name.
On the Cluster Information page, click Prometheus Monitoring in the upper-right corner.
On the GPU APP or GPU Node tab, view monitoring data.
After an ASK cluster is connected to ARMS Prometheus Monitoring, you can monitor the GPU-accelerated elastic container instances in the cluster without the need to install additional plug-ins. By default, ARMS Prometheus Monitoring provides ready-to-use predefined monitoring dashboards.
In the GPU APP dashboard, you can view monitoring data about GPUs on a single pod, as shown in the following figure.
In the GPU Node dashboard, you can view monitoring data about all GPUs on the node, as shown in the following figure.
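The panels in these dashboards are driven by GPU metrics scraped from the cluster. As an illustration only (assuming the standard NVIDIA DCGM exporter metric names, which may differ in your environment), a custom panel for per-pod GPU utilization could use a PromQL query such as:

```promql
# Average GPU utilization (percent) grouped by pod.
# DCGM_FI_DEV_GPU_UTIL is the standard NVIDIA DCGM exporter
# metric name and is assumed here, not confirmed by this topic.
avg by (pod) (DCGM_FI_DEV_GPU_UTIL)
```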