When you use MaxCompute, you can monitor subscription resources and pay-as-you-go job consumption to track the operational status of your resources. This lets you upgrade resources or reschedule jobs promptly. You can also set alert rules. If the status of a resource meets the conditions of an alert rule, Cloud Monitor automatically sends an alert notification so that you stay informed about the operational status of your resources.
Monitoring and alert solutions
MaxCompute supports the following methods for monitoring and alerts:
Configure metrics in Alibaba Cloud Monitor to monitor subscription resources and real-time job consumption.
Note: Log on to the MaxCompute console. On the Overview page, in the Alerts And Risk Warnings section, you can view the number of alerts for each metric.
Use dashboards to observe monitoring charts in real time and track the changes for each metric. For more information, see Dashboard configuration.
You can customize alert rules and add alert contacts. If a metric reaches or exceeds the set threshold, Cloud Monitor automatically sends an alert notification to the specified contacts. Alert notifications can be sent by phone, text message, email, or through a DingTalk Robot. For more information, see Alert rule configuration.
Use the MaxCompute client to monitor the consumption of a single SQL statement. For more information, see Single SQL consumption limit.
Metrics
The following table describes the metric types and specific metrics supported by MaxCompute.
Metric type | Metric categorization | Metric | Description |
MaxCompute-Subscription Compute Quota | level1 | Level 1 Quota CPU Utilization | The CPU utilization of a level 1 quota as a percentage of the total amount (reserved CUs + flexible reserved CUs). Unit: %. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level1 | Level 1 Quota CPU Usage | The total CPU usage of a level 1 quota. Unit: core. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level1 | Level 1 Quota Memory Utilization | The memory utilization of a level 1 quota as a percentage of the total memory (reserved + flexible reserved). Unit: %. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level1 | Level 1 Quota Memory Usage | The memory usage of a level 1 quota. Unit: MB. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level2 | Level 2 Quota CPU Utilization | The CPU utilization of a level 2 quota as a percentage of the total amount (reserved Min CUs + flexible reserved CUs). Unit: %. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level2 | Level 2 Quota CPU Usage | The total CPU usage of a level 2 quota. Unit: core. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level2 | Level 2 Quota Memory Utilization | The memory utilization of a level 2 quota as a percentage of the total memory (reserved Min + flexible reserved). Unit: %. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level2 | Level 2 Quota Memory Usage | The memory usage of a level 2 quota. Unit: MB. Data is collected every minute. |
MaxCompute-Subscription Compute Quota | level2 | Level 2 Quota Waiting Jobs | The number of waiting jobs in a level 2 quota. Unit: count. Data is collected every minute. |
MaxCompute-General | Tunnel | Tunnel Download Traffic_Project Level | A real-time metric for download traffic at the project level. You can set a maximum download traffic in bytes/min. An alert is triggered if the traffic reaches or exceeds this threshold. |
MaxCompute-General | Tunnel | Tunnel Upload Traffic_Project Level | A real-time metric for upload traffic at the project level. You can set a maximum upload traffic in bytes/min. An alert is triggered if the traffic reaches or exceeds this threshold. |
MaxCompute-General | Tunnel | Cumulative Daily Tunnel Download Volume_Project Level | A metric for the cumulative data download volume of a project in a single day. You can set a maximum data volume in MB. An alert is triggered if the volume reaches or exceeds this threshold. |
MaxCompute-General | Tunnel | Cumulative Daily Tunnel Upload Volume_Project Level | A metric for the cumulative data upload volume of a project in a single day. You can set a maximum data volume in MB. An alert is triggered if the volume reaches or exceeds this threshold. |
MaxCompute-General | Tunnel | Current Tunnel Concurrency (Slots)_Project Level | The number of concurrent slots currently used by the selected project. An alert is triggered if the number reaches or exceeds the threshold. |
MaxCompute-General | Tunnel | Current Tunnel Concurrency (Slots)_Tenant Level | The number of concurrent slots currently used by the selected tenant. An alert is triggered if the number reaches or exceeds the threshold. |
MaxCompute-General | Job | Job Runtime | Monitors all jobs in a MaxCompute project. If the runtime of a job, including its waiting time, exceeds the set threshold, the system sends an alert notification to the alert contacts based on the configured alert rule. Important: Jobs with a runtime of less than 1 minute cannot be monitored. |
MaxCompute-General | Job | Job Runtime_SQL Type | Monitors all SQL jobs in a MaxCompute project. If the runtime of an SQL job, including its waiting time, exceeds the set threshold, the system sends an alert notification to the alert contacts based on the configured alert rule. Important: Jobs with a runtime of less than 1 minute cannot be monitored. |
MaxCompute-General | Job | Job Runtime_SQL Type_Submitter | Monitors the runtime of all SQL jobs in a MaxCompute project, including waiting time. If an SQL job's runtime exceeds the set threshold, the system sends an alert notification to the alert contacts based on the configured alert rule. The alert includes the job submitter's information to help the recipient identify the job owner. Important: Jobs with a runtime of less than 1 minute cannot be monitored. |
MaxCompute-General | Cost | Daily Storage API Read Consumption | A metric for the cumulative daily data consumption from Storage API reads at the project level. Unit: GiB. An alert is triggered if the consumption reaches or exceeds the threshold. Note: Each tenant receives a free monthly quota of 1 TB for data reads and writes. Monitoring starts after data consumption exceeds 1 TB. |
MaxCompute-General | Cost | Monthly Storage API Read Consumption | A metric for the cumulative monthly data consumption from Storage API reads at the project level. Unit: GiB. An alert is triggered if the consumption reaches or exceeds the threshold. Note: Each tenant receives a free monthly quota of 1 TB for data reads and writes. Monitoring starts after data consumption exceeds 1 TB. |
MaxCompute-General | Cost | Daily Storage API Write Consumption | A metric for the cumulative daily data consumption from Storage API writes at the project level. Unit: GiB. An alert is triggered if the consumption reaches or exceeds the threshold. Note: Each tenant receives a free monthly quota of 1 TB for data reads and writes. Monitoring starts after data consumption exceeds 1 TB. |
MaxCompute-General | Cost | Monthly Storage API Write Consumption | A metric for the cumulative monthly data consumption from Storage API writes at the project level. Unit: GiB. An alert is triggered if the consumption reaches or exceeds the threshold. Note: Each tenant receives a free monthly quota of 1 TB for data reads and writes. Monitoring starts after data consumption exceeds 1 TB. |
MaxCompute-General | Cost | Daily Consumption Of Pay-As-You-Go Jobs (CNY) | This metric monitors the daily fees of SQL and MapReduce jobs in a project. You can set a maximum daily cost in CNY. An alert is triggered if this threshold is reached or exceeded. |
MaxCompute-General | Cost | Daily Consumption Of Pay-As-You-Go Jobs (USD) | This metric monitors the total daily fees of SQL and MapReduce jobs in a project. You can set a maximum daily cost in USD. An alert is triggered if this threshold is reached or exceeded. |
MaxCompute-General | Cost | Monthly Consumption Of Pay-As-You-Go Jobs (CNY) | This metric monitors the monthly fees of SQL and MapReduce jobs in a project. You can set a maximum monthly fee in CNY. An alert is triggered if this threshold is reached or exceeded. |
MaxCompute-General | Cost | Monthly Consumption Of Pay-As-You-Go Jobs (USD) | This metric monitors the monthly fees of SQL and MapReduce jobs in a project. You can set a maximum monthly fee in USD. An alert is triggered if this threshold is reached or exceeded. |
MaxCompute-General | Storage | Standard Storage Size_Project Level | The Standard storage size of the project. Unit: GB. Data is collected every hour. |
MaxCompute-General | Storage | IA Storage Size_Project Level | The IA storage size of the project. Unit: GB. Data is collected every hour. |
MaxCompute-General | Storage | IA Storage Access Percentage In The Last 30 Days_Project Level | Value: |
MaxCompute-General | Storage | Archive Storage Size_Project Level | The Archive storage size of the project. Unit: GB. Data is collected every hour. |
MaxCompute-General | Storage | Archive Storage Access Percentage In The Last 180 Days_Project Level | Value: |
You can configure a dashboard or an alert rule for a metric. For more information, see Dashboard configuration or Alert rule configuration.
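The utilization metrics in the table are defined as usage divided by the total amount (reserved plus flexible reserved). A minimal Python sketch of that ratio follows. The function name and the sample figures are hypothetical, and the sketch assumes that usage and quota are expressed in the same unit (cores against CUs, or MB against MB); it is an illustration of the definition, not how Cloud Monitor computes the value internally.

```python
def quota_utilization_percent(used: float, reserved: float,
                              flexible_reserved: float = 0.0) -> float:
    """Return usage as a percentage of the total quota.

    Total quota = reserved + flexible reserved, as described in the
    Metrics table. All arguments must use the same unit.
    """
    total = reserved + flexible_reserved
    if total <= 0:
        raise ValueError("total quota must be positive")
    return used * 100.0 / total


# Hypothetical figures: 120 cores in use on a quota with
# 100 reserved CUs and 50 flexible reserved CUs.
print(quota_utilization_percent(120, 100, 50))

# Hypothetical memory figures in MB.
print(quota_utilization_percent(300_000, 400_000, 200_000))
```

For a level 2 quota, the same ratio applies with the reserved Min value in place of the reserved value.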
Dashboard configuration
Log on to the Cloud Monitor console.
In the navigation pane on the left, choose .
On the Custom Dashboard page, click Create Dashboard. In the Create Dashboard dialog box, enter a Dashboard Name, select a Folder, and click OK.
Click the name of the dashboard you just created. On the product page, click Add Visualization Widget.

In the upper-right corner of the page, select a chart type. Options include line chart, column chart, statistics, gauge, meter, pie chart, table, facet chart, flow chart, and histogram.
In the Query Analysis section, set Data Source Plugin to Cloud Service Monitoring. You can then configure parameters such as Metric.
For more information about how to manage monitoring charts, see Manage monitoring charts in a custom dashboard.
Alert rule configuration
You can set alert rules for the metrics that are described in the Metrics section.
The following example shows how to set an alert for a resource group. The alert is triggered when the CU or memory utilization of a MaxCompute subscription quota group exceeds a specific value. For example, assume that a resource group is configured with 150 CUs. One core at full utilization counts as 100%, so the maximum usage for 150 CUs is 15000%. You can set the alert to trigger when usage exceeds 12000%. An alert then indicates that the resource group is approaching full capacity and that additional jobs may be queued. You can upgrade the resource group or reschedule your jobs based on your business needs. The following steps describe how to configure the alert rule for this scenario:
Log on to the Cloud Monitor console.
In the navigation pane on the left, click .
On the Alert Rules page, click Create Alert Rule.
On the Create Alert Rule page, configure the parameters for the alert rule based on your scenario. For more information about the parameters, see Create an alert rule. For more information about how to configure alert contacts, see Create an alert contact or an alert contact group.
The key parameters for the example scenario are described below:
Parameter | Description |
Product | From the drop-down list, select MaxCompute-Subscription Compute Quota. |
Resource Range | From the drop-down list, select Instance. |
Linked Instance | Click Add Instance. On the Add Instance page, select the subscription quota group in the region where your MaxCompute project resides, and then click OK. For more information about quota groups, see Computing resources - Quota management. |
Rule Description | Click . In the panel that appears, configure the following parameters: Rule Name: Set a name for the alert rule. Metric Type: Select Simple Metric. Metric: From the drop-down list, select the corresponding CPU usage. Note: If the added instance is a level 1 quota group, you can select . If the added instance is a level 2 quota group, you can select . You can also monitor the number of waiting jobs. If CPU usage is high and many jobs are waiting for N consecutive epochs, manual resource intervention may be required. |
Click OK to complete the alert rule configuration.
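The threshold arithmetic in the example scenario (150 CUs, a 15000% ceiling, and a 12000% alert threshold) can be checked with a short Python sketch. The function names are hypothetical, and the 80% trigger point is only inferred from 12000 / 15000. The last function illustrates the "N consecutive epochs" idea by scanning a list of per-minute samples; the "reaches or exceeds" comparison is an assumption, because the actual operator depends on how the alert rule is configured in Cloud Monitor.

```python
from typing import Iterable


def usage_ceiling_percent(cus: int) -> int:
    """One core at full utilization counts as 100%."""
    return cus * 100


def alert_threshold_percent(cus: int, trigger_at_percent: int) -> int:
    """Threshold placed at a given percentage of the usage ceiling."""
    return usage_ceiling_percent(cus) * trigger_at_percent // 100


def breached_consecutively(samples: Iterable[float], threshold: float,
                           periods: int) -> bool:
    """True if `periods` samples in a row reach or exceed the threshold."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value >= threshold else 0
        if streak >= periods:
            return True
    return False


ceiling = usage_ceiling_percent(150)          # 15000
threshold = alert_threshold_percent(150, 80)  # 12000
print(ceiling, threshold)

# Hypothetical per-minute CPU usage samples, in percent of one core.
samples = [11000, 12500, 13000, 12100]
print(breached_consecutively(samples, threshold, periods=3))
```

Requiring several consecutive breaches before alerting reduces noise from short usage spikes, at the cost of a later notification.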
References
To set limits and alerts for the consumption of pay-as-you-go computing tasks, see Consumption control.