CloudMonitor: GPU monitoring

Last Updated: Mar 28, 2024

After you install the CloudMonitor agent on a GPU-accelerated compute-optimized Elastic Compute Service (ECS) instance, CloudMonitor collects GPU metrics from the instance. You can also create alert rules for these metrics. If the value of a metric meets the alert condition that you specify, an alert is triggered and CloudMonitor sends an alert notification. This helps you track the status of the metrics in real time.

Prerequisites

GPU metrics

You can view GPU metrics at the GPU, instance, and application group level. The following table lists the GPU metrics.

| Metric | Unit | MetricName | Dimensions |
| --- | --- | --- | --- |
| (Agent)gpu_decoder_utilization | % | gpu_decoder_utilization | userId, instanceId, and gpuId |
| (Agent)gpu_encoder_utilization | % | gpu_encoder_utilization | userId, instanceId, and gpuId |
| (Agent)gpu_gpu_temperature | °C | gpu_gpu_temperature | userId, instanceId, and gpuId |
| (Agent)gpu_gpu_usedutilization | % | gpu_gpu_usedutilization | userId, instanceId, and gpuId |
| (Agent)gpu_memory_freespace | Byte | gpu_memory_freespace | userId, instanceId, and gpuId |
| (Agent)gpu_memory_freeutilization | % | gpu_memory_freeutilization | userId, instanceId, and gpuId |
| (Agent)gpu_memory_usedspace | Byte | gpu_memory_usedspace | userId, instanceId, and gpuId |
| (Agent)gpu_memory_usedutilization | % | gpu_memory_usedutilization | userId, instanceId, and gpuId |
| (Agent)gpu_power_readings_power_draw | W | gpu_power_readings_power_draw | userId, instanceId, and gpuId |
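The MetricName and Dimensions values listed above are what a programmatic metric query would need to identify one GPU. The following Python sketch only assembles such a parameter set and sends no request. It assumes the `acs_ecs_dashboard` namespace and the parameter names `Namespace`, `MetricName`, `Period`, and `Dimensions` used by CloudMonitor query operations such as DescribeMetricLast. None of these are stated on this page, so check them against the API reference before relying on them. The helper name `build_gpu_metric_query` and the instance ID `i-example` are hypothetical.

```python
import json

# Metric names and units copied from the GPU metrics listed above.
GPU_METRICS = {
    "gpu_decoder_utilization": "%",
    "gpu_encoder_utilization": "%",
    "gpu_gpu_temperature": "°C",
    "gpu_gpu_usedutilization": "%",
    "gpu_memory_freespace": "Byte",
    "gpu_memory_freeutilization": "%",
    "gpu_memory_usedspace": "Byte",
    "gpu_memory_usedutilization": "%",
    "gpu_power_readings_power_draw": "W",
}


def build_gpu_metric_query(metric_name, instance_id, gpu_id,
                           period=60, namespace="acs_ecs_dashboard"):
    """Assemble query parameters for one GPU metric (hypothetical helper).

    The namespace and parameter names are assumptions, not taken from
    this page. No network call is made.
    """
    if metric_name not in GPU_METRICS:
        raise ValueError(f"unknown GPU metric: {metric_name}")
    # One dimension object per monitored GPU; values are sent as strings.
    dimensions = [{"instanceId": instance_id, "gpuId": str(gpu_id)}]
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Period": str(period),
        "Dimensions": json.dumps(dimensions),
    }


params = build_gpu_metric_query("gpu_memory_usedutilization", "i-example", 0)
print(params)
```

Validating the metric name against the listed metrics catches typos before a request is ever built. The userId dimension is left out here on the assumption that it is implied by the credentials of the calling account.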

View GPU metric data in the CloudMonitor console

  1. Log on to the CloudMonitor console.

  2. In the left-side navigation pane, click Cloud Service Monitoring > Host Monitoring.

  3. On the Host Monitoring page, click the host name or click Monitoring Charts in the Actions column of the host.

  4. Click the GPU Monitoring tab.

    On the GPU Monitoring tab, view the monitoring charts for the GPU metrics of the host.

    You can also configure alert rules for specific GPU metrics and view alerts. For more information, see Step 2: Create an alert rule for the host and Step 3: View host alerts.
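To make the idea of an alert condition concrete, the following Python sketch checks whether a series of metric samples meets a threshold condition for several consecutive samples. It is an illustration only and does not reproduce how CloudMonitor evaluates alert rules. The function name `alert_triggered`, the `consecutive` parameter, and the sample temperatures are all hypothetical.

```python
import operator

# Comparison operators an alert condition might use (illustrative set).
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def alert_triggered(values, comparator, threshold, consecutive=3):
    """Return True if the last `consecutive` samples all meet the condition."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    compare = COMPARATORS[comparator]
    recent = values[-consecutive:]
    # Too few samples means the condition cannot have held long enough.
    if len(recent) < consecutive:
        return False
    return all(compare(v, threshold) for v in recent)


# Hypothetical gpu_gpu_temperature samples in °C, oldest first.
temps = [71.0, 78.0, 84.0, 86.0, 88.0]
print(alert_triggered(temps, ">=", 83.0))
```

Requiring several consecutive matching samples is a common way to avoid notifications for a single short spike.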

References