This topic describes how to use Application Real-Time Monitoring Service (ARMS) Prometheus
to monitor the GPU resources of a Kubernetes cluster.
Prerequisites
The following operations are completed:
- ARMS Prometheus is enabled for the cluster.
- A GPU-accelerated node is added to the cluster.
Use ARMS Prometheus to monitor GPU resources
- Log on to the ACK console.
- On the Clusters page, find the cluster that you want to manage and click the name of the cluster
or click Details in the Actions column. The details page of the cluster appears.
- In the left-side navigation pane of the details page, choose Operations > Prometheus Monitoring.
- On the Prometheus Monitoring page, click the GPU APP tab to view the GPU APP dashboard, or click the GPU Node tab to view the GPU Node dashboard.
- The GPU APP dashboard displays monitoring information about the GPU resources used by each pod.
- The GPU Node dashboard displays monitoring information about the GPU resource usage of each node.
- Use the following YAML template to deploy an application on a GPU-accelerated node and test the monitoring of GPU resources. An example of how to apply the template follows it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-intent-detection
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bert-intent-detection
  template:
    metadata:
      labels:
        app: bert-intent-detection
    spec:
      containers:
      - name: bert-container
        image: registry.cn-beijing.aliyuncs.com/ai-samples/bert-intent-detection:1.0.1
        ports:
        - containerPort: 80
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: bert-intent-detection
  name: bert-intent-detection-svc
spec:
  ports:
  - port: 8500
    targetPort: 80
  selector:
    app: bert-intent-detection
  type: LoadBalancer
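To apply the template, you can save it to a local file and create the resources with kubectl. The file name bert-intent-detection.yaml in the following commands is only an example.
kubectl apply -f bert-intent-detection.yaml
kubectl get pods -l app=bert-intent-detection -o wide
The second command shows whether the pod is scheduled to a GPU-accelerated node and enters the Running state.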
- On the Prometheus Monitoring page, click the GPU APP tab. The GPU APP tab displays metrics about the GPU resources that each pod uses, including the used GPU memory, GPU memory usage, power consumption, and stability. You can also view the applications that are deployed on each GPU-accelerated node.
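To cross-check the dashboard data against the device itself, you can run nvidia-smi inside the pod. This assumes that the container image includes the nvidia-smi binary, which is common for CUDA-based images but not guaranteed.
kubectl exec deploy/bert-intent-detection -- nvidia-smi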

- Perform stress tests on the application that is deployed on the GPU-accelerated node and check how the metrics change.
- Run the following command to query the Service that exposes the inference task and obtain the IP address of the Service:
kubectl get svc bert-intent-detection-svc
Expected output:
NAME                        TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
bert-intent-detection-svc   LoadBalancer   172.23.5.253   123.56.XX.XX   8500:32451/TCP   14m
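Before you start the stress tests, you can send a single request to verify that the Service responds. Replace 123.56.XX.XX with the EXTERNAL-IP value returned by the preceding command. The request path and query string follow the stress test command in the next step.
curl "http://123.56.XX.XX:8500/predict?query=music"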
- Run the following command to perform stress tests (see the note after this step if the hey load testing tool is not installed):
hey -z 10m -c 100 "http://123.56.XX.XX:8500/predict?query=music"
After the stress tests start, the metrics on the GPU APP dashboard show an obvious increase in GPU memory usage.
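In the hey command, -z 10m runs the test for 10 minutes and -c 100 sends requests from 100 concurrent workers. If hey is not installed, one way to install it, assuming that a Go toolchain is available, is:
go install github.com/rakyll/hey@latest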
