This topic describes how to view the monitoring metrics of GPU-accelerated instances and how to configure auto scaling policies for provisioned GPU-accelerated instances based on different resource metrics.

Background

The resource utilization of GPU hardware varies in different scenarios, such as model training, AI inference, and audio and video transcoding. Function Compute can automatically scale provisioned GPU-accelerated instances based on the utilization of GPU resources, including streaming multiprocessors (SMs), GPU memory, the hardware decoder, and the hardware encoder.

In Function Compute, you can configure an auto scaling policy for provisioned instances by using scheduled setting modification or metric-based setting modification. For GPU-accelerated instances, only metric-based setting modification is supported. For more information, see the "Metric-based Setting Modification" section in Configure provisioned instances and auto scaling rules.

View the metrics of GPU-accelerated instances

After a GPU function is executed, you can view the resource usage of its GPU-accelerated instances in the Function Compute console.

  1. Log on to the Function Compute console. In the left-side navigation pane, choose Advanced Features > Monitoring Dashboard.
  2. In the top navigation bar, select a region. In the service list, click the name of the desired service.
  3. In the function list of the monitoring dashboard, click the name of the desired function.
  4. On the Monitoring Dashboard page, click the Instance Metrics tab.
    You can check the resource usage of GPU-accelerated instances by viewing the following metrics: GPU-accelerated Memory Usage (Percentage), GPU-accelerated SM Usage (Percentage), GPU-accelerated Hardware Encoder Usage (Percentage), and GPU-accelerated Hardware Decoder Usage (Percentage).

Configure an auto scaling policy

Metrics for GPU resource usage

Metric                  Description                         Value range
GPUSmUtilization        GPU SM utilization                  [0, 1], that is, 0% to 100%
GPUMemoryUtilization    GPU memory utilization              [0, 1], that is, 0% to 100%
GPUDecoderUtilization   GPU hardware decoder utilization    [0, 1], that is, 0% to 100%
GPUEncoderUtilization   GPU hardware encoder utilization    [0, 1], that is, 0% to 100%
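The exact scaling algorithm that Function Compute runs internally is not documented here, but target-tracking policies of this kind are commonly computed as shown in the following sketch. The formula and the function name are illustrative assumptions, not the product's actual implementation: the desired instance count is scaled proportionally to how far the observed metric is from the target, then clamped to the configured capacity range.

```python
import math

def desired_instances(current, metric_value, metric_target, min_cap, max_cap):
    """Illustrative target-tracking calculation (assumption; Function
    Compute's internal algorithm may differ).

    current       -- current number of provisioned instances
    metric_value  -- observed utilization, in [0, 1]
    metric_target -- configured metricTarget, in (0, 1]
    min_cap/max_cap -- configured minCapacity/maxCapacity
    """
    if current < 1:
        current = 1
    # Scale so that, at the new instance count, utilization would
    # land near the target, then clamp to the capacity range.
    desired = math.ceil(current * metric_value / metric_target)
    return max(min_cap, min(desired, max_cap))

# Example: 10 instances at 80% SM utilization with a 50% target
# suggests scaling out to 16 instances.
print(desired_instances(10, 0.8, 0.5, 1, 20))
```

A very low metricTarget (such as the 0.01 used in the example policy later in this topic) makes the policy scale out aggressively, which is useful for testing that scaling triggers at all.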

Configure an auto scaling policy in the Function Compute console

  1. Log on to the Function Compute console. In the left-side navigation pane, click Services & Functions.
  2. In the top navigation bar, select a region. On the Services page, click the desired service.
  3. On the Functions page, click the name of the desired function.
  4. On the Function Details page, click the Auto Scaling tab, and then click Create Rule.
  5. On the page that appears, configure the following parameters and click Create.
    1. Configure the version and the minimum number of instances. Retain the default values for the other parameters.
      Example: gpu-Scaling-Rule-1
    2. In the Metric-based Setting Modification section, click + Add Configuration and configure the policy.
      Example: gpu-Scaling-Rule-2
    After the configuration is complete, you can choose Metrics > Function Metrics to view changes in the Function Provisioned Instances metric.

Configure an auto scaling policy by using Serverless Devs

Perform the following steps:
  1. Create a project directory.
    mkdir fc-gpu-async-job && cd fc-gpu-async-job
  2. In the project directory, create a file that contains the auto scaling policy, such as gpu-sm-hpa-policy.json.
    Example:
    {
      "target": 1,
      "targetTrackingPolicies": [
        {
          "name": "hpa_gpu_decoder_util",
          "startTime": "2022-09-05T16:00:00.000Z",
          "endTime": "2023-07-06T16:00:00.000Z",
          "metricType": "GPUSmUtilization",
          "metricTarget": 0.01,
          "minCapacity": 1,
          "maxCapacity": 20
        }
      ]
    }
  3. In the project directory, run the following command to add the auto scaling policy to the desired function:
    s cli fc provision put --region ${region} --service-name ${service-name} --function-name ${function-name} --qualifier LATEST --config gpu-sm-hpa-policy.json
  4. Run the following command to view the auto scaling policy:
    s cli fc provision get --region ${region} --service-name ${service-name} --function-name ${function-name} --qualifier LATEST
    If the command runs successfully, output similar to the following is returned:
    [2022-10-08 16:00:12] [INFO] [FC] - Getting provision: zh****.LATEST/zh****
    serviceName:            zh****
    functionName:           zh****
    qualifier:              LATEST
    resource:               164901546557****#zh****#LATEST#zh****
    target:                 1
    current:                1
    scheduledActions:       null
    targetTrackingPolicies:
      -
        name:         hpa_gpu_decoder_util
        startTime:    2022-09-05T16:00:00.000Z
        endTime:      2023-07-06T16:00:00.000Z
        metricType:   GPUSmUtilization
        metricTarget: 0.01
        minCapacity:  1
        maxCapacity:  20
    currentError:
    alwaysAllocateCPU:      true

References

If you want to modify the sensitivity of auto scaling, join the DingTalk group 11721331 to contact Function Compute technical support.