This topic describes how to use Application Real-Time Monitoring Service (ARMS) Prometheus to monitor the GPU resources of a Kubernetes cluster.
- An ACK managed cluster or ACK dedicated cluster with GPU-accelerated nodes is created. For more information, see Create an ACK managed cluster with GPU-accelerated nodes or Create an ACK dedicated cluster with GPU-accelerated nodes.
- ARMS Prometheus is installed and enabled for the cluster. For more information, see Enable ARMS Prometheus.
Use ARMS Prometheus to monitor GPU resources
- Log on to the ACK console.
- On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
- In the left-side navigation pane of the details page, choose Prometheus Monitoring.
- On the Prometheus Monitoring page, click the GPU APP tab to view the GPU APP dashboard, or click the GPU Node tab to view the GPU Node dashboard.
- The GPU APP dashboard displays monitoring information about the GPU resources used by each pod.
- The GPU Node dashboard displays monitoring information about the GPU resource usage of each node.
- Use the following YAML template to deploy an application on a GPU-accelerated node and test the monitoring of GPU resources.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-intent-detection
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bert-intent-detection
  template:
    metadata:
      labels:
        app: bert-intent-detection
    spec:
      containers:
      - name: bert-container
        image: registry.cn-beijing.aliyuncs.com/ai-samples/bert-intent-detection:1.0.1
        ports:
        - containerPort: 80
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: bert-intent-detection
  name: bert-intent-detection-svc
spec:
  ports:
  - port: 8500
    targetPort: 80
  selector:
    app: bert-intent-detection
  type: LoadBalancer
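The template above can be applied with kubectl. A minimal sketch, assuming the manifest is saved locally as bert-intent-detection.yaml (the filename is our choice, not specified by this topic):

```shell
# Apply the Deployment and Service defined in the template above
kubectl apply -f bert-intent-detection.yaml

# Wait for the pod to be scheduled onto a GPU-accelerated node and become Ready
kubectl wait --for=condition=Ready pod -l app=bert-intent-detection --timeout=300s

# Confirm which node the pod landed on
kubectl get pod -l app=bert-intent-detection -o wide
```

The pod requests one GPU through the nvidia.com/gpu resource limit, so the scheduler places it only on a node that advertises GPU capacity.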
- On the Prometheus Monitoring page, click the GPU APP tab. On the GPU APP tab, you can view various metrics of the GPU resources used by each pod, including the used GPU memory, GPU memory usage, power consumption, and stability. You can also view the applications deployed on each GPU-accelerated node.
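The same data behind the dashboards can also be queried directly with PromQL. The following queries are a hypothetical sketch that assumes the GPU dashboards are backed by NVIDIA DCGM exporter metrics; the exact metric names exposed in your cluster may differ:

```promql
# Per-GPU framebuffer memory currently in use (MiB), assuming DCGM exporter metrics
DCGM_FI_DEV_FB_USED

# Average GPU utilization per node
avg by (instance) (DCGM_FI_DEV_GPU_UTIL)
```

Running such queries in the Prometheus console is useful for building custom alerts on top of the prebuilt dashboards.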
- Perform stress tests on the application deployed on the GPU-accelerated node and check the changes in the metrics.
- Run the following command to query the Service of an inference task and the IP address of the Service:
kubectl get svc bert-intent-detection-svc
NAME                        TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
bert-intent-detection-svc   LoadBalancer   172.23.5.253   123.56.XX.XX   8500:32451/TCP   14m
- Run the following command to perform stress tests:
hey -z 10m -c 100 "http://123.56.XX.XX:8500/predict?query=music"

The metrics indicate an obvious increase in the GPU memory usage, as shown in the following figure.
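The hey command used above is an open source HTTP load generator. If it is not already installed, one common installation method is through Go (check the project's documentation for prebuilt binaries and other options):

```shell
# Install the hey HTTP load generator (github.com/rakyll/hey)
# Requires a Go toolchain; the binary is placed in $GOPATH/bin or $HOME/go/bin
go install github.com/rakyll/hey@latest
```

The flags in the stress-test command mean: -z 10m runs the test for 10 minutes, and -c 100 sends requests from 100 concurrent workers.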