After you enable Managed Service for Prometheus for a Kubernetes cluster, you can use the predefined dashboards to monitor the performance metrics of GPU-accelerated elastic container instances in the Kubernetes cluster. This topic describes how to use Managed Service for Prometheus to monitor a GPU-accelerated elastic container instance.
Prerequisites
An ACK Serverless cluster is created, and Managed Service for Prometheus is enabled for the cluster. For more information, see Enable Managed Service for Prometheus for an ACK Serverless cluster.
Procedure
Log on to the ACK console.
Create a GPU-accelerated elastic container instance.
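The following sample YAML file creates a Deployment with one replica. The pod template requests a GPU-accelerated instance type through an annotation and allocates one GPU to the container.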
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
        alibabacloud.com/eci: "true"
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"  # Specify a GPU-accelerated instance type.
    spec:
      containers:
      - name: bert-container
        image: registry.cn-beijing.aliyuncs.com/eci_open/nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          limits:
            nvidia.com/gpu: 1  # Specify the number of GPUs allocated to a container.
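Before you apply a manifest like the sample above, it can help to confirm that it contains the three fields that matter for GPU scheduling: a selector that matches the pod labels, the k8s.aliyun.com/eci-use-specs annotation, and an nvidia.com/gpu limit. The following is a minimal sketch of such a check, not part of any Alibaba Cloud tooling. It mirrors the sample manifest as a Python dictionary, and the function name find_problems is hypothetical.

```python
# Annotation and resource names are taken from the sample manifest.
ECI_SPEC_ANNOTATION = "k8s.aliyun.com/eci-use-specs"
GPU_RESOURCE = "nvidia.com/gpu"

# The sample Deployment, expressed as a plain dictionary.
MANIFEST = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "gpu-monitor"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "test"}},
        "template": {
            "metadata": {
                "labels": {"app": "test", "alibabacloud.com/eci": "true"},
                "annotations": {ECI_SPEC_ANNOTATION: "ecs.gn6i-c4g1.xlarge"},
            },
            "spec": {
                "containers": [
                    {
                        "name": "bert-container",
                        "image": "registry.cn-beijing.aliyuncs.com/eci_open/nginx:1.14.2",
                        "ports": [{"containerPort": 80}],
                        "resources": {"limits": {GPU_RESOURCE: 1}},
                    }
                ]
            },
        },
    },
}


def find_problems(manifest):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    template = manifest["spec"]["template"]
    labels = template["metadata"].get("labels", {})
    annotations = template["metadata"].get("annotations", {})

    # Every selector label must appear on the pod template with the same value.
    for key, value in manifest["spec"]["selector"]["matchLabels"].items():
        if labels.get(key) != value:
            problems.append(f"selector {key}={value} does not match the pod labels")

    # The instance type annotation must be present and non-empty.
    if not annotations.get(ECI_SPEC_ANNOTATION):
        problems.append(f"missing annotation {ECI_SPEC_ANNOTATION}")

    # At least one container must set a GPU limit.
    gpu_total = sum(
        int(c.get("resources", {}).get("limits", {}).get(GPU_RESOURCE, 0))
        for c in template["spec"]["containers"]
    )
    if gpu_total < 1:
        problems.append(f"no container sets a {GPU_RESOURCE} limit")

    return problems


if __name__ == "__main__":
    print(find_problems(MANIFEST))
```

The check is deliberately narrow: it only looks at the fields that the sample manifest uses and does not validate the manifest against the Kubernetes API schema.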
View GPU metrics.
On the Overview tab of the Cluster Information page, click Prometheus Monitoring in the upper-right corner.
Click the GPU Monitoring tab to view monitoring data.
After Managed Service for Prometheus is enabled for the ACK Serverless cluster, you can monitor GPU-accelerated elastic container instances in the cluster without deploying additional plug-ins. By default, Managed Service for Prometheus provides ready-to-use monitoring dashboards.
For more information, see Panels and Introduction to metrics.
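If you want to read GPU metrics programmatically instead of through the dashboards, one option is the standard Prometheus HTTP API, which serves instant queries at the /api/v1/query path. The following is a minimal sketch that only builds the request URL. It assumes that your Prometheus instance exposes an HTTP API endpoint that you can reach. The base URL, the metric name example_gpu_utilization, and the label used here are placeholders, not values from this topic. Take the real metric names from Introduction to metrics.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url, metric, labels=None):
    """Build a Prometheus instant-query URL for one metric and optional label matchers."""
    labels = labels or {}
    # Turn {"namespace": "default"} into the PromQL matcher namespace="default".
    selector = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    promql = f"{metric}{{{selector}}}" if selector else metric
    # urlencode percent-encodes the braces, quotes, and equals signs in the PromQL.
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


if __name__ == "__main__":
    # Placeholder endpoint and metric name; replace both with real values.
    print(
        build_instant_query_url(
            "https://prometheus.example.com",
            "example_gpu_utilization",
            {"namespace": "default"},
        )
    )
```

Sending the request and handling authentication are left out because they depend on how the endpoint is exposed in your environment.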