You can query GPU monitoring data either by using the CloudMonitor console or by calling APIs.

Metrics

The metrics for GPU monitoring are based on three dimensions: GPU, instance, and application group.

  • GPU-dimension metrics

    GPU-dimension metrics report monitoring data for each individual GPU. The following table lists the GPU-dimension metrics.

    Metric                         Unit  Description                                   Dimensions
    gpu_memory_freespace           Byte  The free memory of a GPU                      instanceId, gpuId
    gpu_memory_totalspace          Byte  The total memory of a GPU                     instanceId, gpuId
    gpu_memory_usedspace           Byte  The memory in use of a GPU                    instanceId, gpuId
    gpu_gpu_usedutilization        %     The usage of a GPU                            instanceId, gpuId
    gpu_encoder_utilization        %     The usage of an encoder with GPU support      instanceId, gpuId
    gpu_decoder_utilization        %     The usage of a decoder with GPU support       instanceId, gpuId
    gpu_gpu_temperature                  The temperature of a GPU                      instanceId, gpuId
    gpu_power_readings_power_draw  W     The power of a GPU                            instanceId, gpuId
    gpu_memory_freeutilization     %     The percentage of the free memory of a GPU    instanceId, gpuId
    gpu_memory_useutilization      %     The percentage of the memory in use of a GPU  instanceId, gpuId
  • Instance-dimension metrics

    Instance-dimension metrics report the maximum, minimum, or average value across the GPUs of an instance, so that you can query the overall resource usage at the instance level.

    Metric                                  Unit  Description                                              Dimension
    instance_gpu_decoder_utilization        %     GPU decoder usage at the instance level                  instanceId
    instance_gpu_encoder_utilization        %     GPU encoder usage at the instance level                  instanceId
    instance_gpu_gpu_temperature                  GPU temperature at the instance level                    instanceId
    instance_gpu_gpu_usedutilization        %     GPU usage at the instance level                          instanceId
    instance_gpu_memory_freespace           Byte  Free GPU memory at the instance level                    instanceId
    instance_gpu_memory_freeutilization     %     The percentage of free GPU memory at the instance level  instanceId
    instance_gpu_memory_totalspace          Byte  Total GPU memory at the instance level                   instanceId
    instance_gpu_memory_usedspace           Byte  GPU memory in use at the instance level                  instanceId
    instance_gpu_memory_usedutilization     %     GPU memory usage at the instance level                   instanceId
    instance_gpu_power_readings_power_draw  W     GPU power at the instance level                          instanceId
  • Group-dimension metrics

    Group-dimension metrics report the maximum, minimum, or average value across the instances of an application group, so that you can query the overall resource usage at the group level.

    Metric                               Unit  Description                                                       Dimension
    group_gpu_decoder_utilization        %     GPU decoder usage at the application group level                  groupId
    group_gpu_encoder_utilization        %     GPU encoder usage at the application group level                  groupId
    group_gpu_gpu_temperature                  GPU temperature at the application group level                    groupId
    group_gpu_gpu_usedutilization        %     GPU usage at the application group level                          groupId
    group_gpu_memory_freespace           Byte  Free GPU memory at the application group level                    groupId
    group_gpu_memory_freeutilization     %     The percentage of free GPU memory at the application group level  groupId
    group_gpu_memory_totalspace          Byte  Total GPU memory at the application group level                   groupId
    group_gpu_memory_usedspace           Byte  GPU memory in use at the application group level                  groupId
    group_gpu_memory_usedutilization     %     GPU memory usage at the application group level                   groupId
    group_gpu_power_readings_power_draw  W     GPU power at the application group level                          groupId
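
The relationship between the three dimensions can be illustrated with a minimal Python sketch. It rolls hypothetical per-GPU samples up to the instance level and then to the group level. The sample values, the instance IDs, and the summarize helper are illustrative only. In particular, the assumption that the group rollup is computed from each instance's average is ours and is not stated by CloudMonitor.

```python
from collections import defaultdict
from statistics import mean


def summarize(values):
    """Return the maximum, minimum, and average of a non-empty list of samples."""
    if not values:
        raise ValueError("no samples to summarize")
    return {"max": max(values), "min": min(values), "avg": mean(values)}


# Hypothetical gpu_gpu_usedutilization samples (%), keyed by (instanceId, gpuId).
gpu_samples = {
    ("i-aaa", "0"): 40.0,
    ("i-aaa", "1"): 60.0,
    ("i-bbb", "0"): 90.0,
}

# Instance level: summarize the GPUs that belong to each instance.
by_instance = defaultdict(list)
for (instance_id, _gpu_id), value in gpu_samples.items():
    by_instance[instance_id].append(value)
instance_stats = {iid: summarize(vals) for iid, vals in by_instance.items()}

# Group level: summarize the instances of one (hypothetical) application group.
# Assumption: the group rollup uses each instance's average value.
group_stats = summarize([stats["avg"] for stats in instance_stats.values()])

print(instance_stats)
print(group_stats)
```

In this sketch, instance i-aaa has two GPUs, so its instance-level values differ depending on whether you read the maximum, minimum, or average, while i-bbb has a single GPU and all three values coincide.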

Query GPU monitoring data in the console

After you purchase a GPU compute ECS instance, you must install the GPU driver and the CloudMonitor agent on it before you can view and configure GPU monitoring charts or set alarm rules.

View monitoring charts

  1. Log on to the CloudMonitor console.
  2. In the left-side navigation pane, click Host Monitoring.
  3. On the Instances tab page, find the target instance and click the instance name.
  4. Click the GPUMonitor tab to view the GPU monitoring charts.

Configure monitoring charts

  1. Log on to the CloudMonitor console.
  2. In the left-side navigation pane, choose Dashboard > Custom Dashboard.
  3. In the upper-right corner, click Create Dashboard.
  4. In the displayed dialog box, enter a name for the dashboard and click Create.
  5. On the displayed page of the created dashboard, click Add View.
  6. On the Add View page, select the chart type, and then select the metrics.
  7. Click Save.

Set alarm rules

We recommend that you use alarm templates to set alarm rules for new GPU metrics in batches. You can create alarm templates for the GPU metrics and then apply the templates to the related application groups. For more information, see Create an alarm template.

Query GPU monitoring data through APIs

  • For more information about how to call APIs to query GPU monitoring data, see QueryMetricList.
  • Parameter description: Set the Project parameter to acs_ecs_dashboard. For the values of the Metric and Dimensions parameters, see the GPU metrics in the preceding tables.
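
The parameter description above can be sketched in Python as follows. The sketch only assembles the request parameters and does not sign or send a request. The Project value and the dimension names come from this topic. The build_query and dimension_keys helpers, the Action key, the JSON encoding of Dimensions, and the instance ID i-example are assumptions made for illustration. Check the QueryMetricList reference for the exact request format and for other required parameters.

```python
import json

PROJECT = "acs_ecs_dashboard"  # value given in the parameter description

# Dimension keys for each metric prefix, taken from the metric tables.
# The more specific prefixes are listed before the generic "gpu_" prefix.
DIMENSION_KEYS = (
    ("instance_gpu_", ("instanceId",)),
    ("group_gpu_", ("groupId",)),
    ("gpu_", ("instanceId", "gpuId")),
)


def dimension_keys(metric):
    """Return the dimension names that a GPU metric expects."""
    for prefix, keys in DIMENSION_KEYS:
        if metric.startswith(prefix):
            return keys
    raise ValueError(f"not a GPU metric: {metric}")


def build_query(metric, **dimensions):
    """Assemble (but do not send) parameters for a QueryMetricList call."""
    expected = dimension_keys(metric)
    if set(dimensions) != set(expected):
        raise ValueError(
            f"{metric} expects dimensions {expected}, got {sorted(dimensions)}"
        )
    return {
        "Action": "QueryMetricList",  # assumption: API name passed as Action
        "Project": PROJECT,
        "Metric": metric,
        # Assumption: dimensions are passed as a JSON-encoded object.
        "Dimensions": json.dumps(dimensions, sort_keys=True),
    }


params = build_query("gpu_gpu_usedutilization", instanceId="i-example", gpuId="0")
print(params)
```

Deriving the expected dimension names from the metric prefix catches a common mistake early, such as passing gpuId with an instance-level metric, before any request is made.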