The resource observation feature allows you to view the monitoring data of various resources, such as data transmission service resources, computing resources, and storage resources, in a specific period of time. You can view metric data in line charts or tables to optimize and adjust execution plans and resource configurations of jobs. This helps improve the execution efficiency and performance of jobs. This topic describes how to view the resource usage of MaxCompute.
Supported regions
The following table describes the regions in which the resource observation feature can be used to observe various resources.
Resource type | Supported region |
Computing resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Ulanqab), China (Chengdu), China (Hong Kong), US (Silicon Valley), US (Virginia), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), UK (London), and Singapore |
Storage resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Zhangjiakou), China (Ulanqab), China (Hong Kong), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), and Singapore |
Data transmission service | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China East 1 Finance, China (Hong Kong), Singapore, Japan (Tokyo), Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia), and SAU (Riyadh - Partner Region) |
Job performance | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), US (Silicon Valley), US (Virginia), Germany (Frankfurt), UK (London), and SAU (Riyadh - Partner Region) |
Permissions
Alibaba Cloud accounts: have full read and operation permissions for resource observation.
RAM users: require RAM permissions. For more information, see RAM permissions.
Computing resources
You can view CU resource usage for subscription and pay-as-you-go quotas.
Procedure
Log on to the MaxCompute console. In the upper-left corner, select a region.
In the navigation pane on the left, click Resource Observation.
On the Resource Observation page, click the Computing Resource tab.
Select a level-1 quota name, a time range, and a time interval.
The time interval is the number of minutes between data points. You can select Automatic, 1, 5, or 15 minutes. To ensure performance, if the time range exceeds 72 hours, you can only select the Automatic interval.
Click the icon to the left of a level-2 quota to view its resource consumption trend chart. You can expand the charts of multiple level-2 quotas at the same time.
View the list of projects associated with each level-2 quota.
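The interval rule in the procedure above can be expressed as a small check. The following is a minimal sketch, not the console's implementation: the function name `allowed_intervals` is hypothetical, and only the 72-hour threshold and the interval options come from this topic.

```python
AUTOMATIC = "Automatic"


def allowed_intervals(time_range_hours: float) -> list:
    """Return the interval options that can be selected for a time range.

    Encodes the documented rule: if the time range exceeds 72 hours,
    only the Automatic interval can be selected.
    """
    if time_range_hours <= 0:
        raise ValueError("time range must be positive")
    if time_range_hours > 72:
        return [AUTOMATIC]
    # Intervals are expressed in minutes between data points.
    return [AUTOMATIC, 1, 5, 15]


print(allowed_intervals(24))  # a 1-day range: all options
print(allowed_intervals(96))  # a 4-day range: Automatic only
```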
Metrics
Metric name | Description |
CPU Resources | The trend of the CPU utilization of the current quota. Click a time point to view the list of job snapshots at that time point. |
Memory Resources | The trend of the memory usage of the current quota. Important: Pay-as-you-go resources come from a shared resource pool and are consumed to run computing jobs. Computing jobs compete for these resources, and the resources that each job can use cannot be specified. If a user continuously requests a large number of resources, MaxCompute limits the resource usage of that user to ensure that other users can use pay-as-you-go computing resources. |
Quotas and associated projects: shows the projects that use a level-2 quota as their default quota.
Storage resources
You can view the total storage usage and the storage usage percentages of different storage types in the current region. You can also view the storage trends of different storage types and the detailed table or partition storage information based on the project and the time range that you select.
Procedure
Log on to the MaxCompute console. In the top navigation bar, select a region.
In the left-side navigation pane, choose Workspace > Resource Observation.
On the Resource Observation page, click the Storage Resource tab to view the total storage usage and storage usage distribution of different storage types on the current day.
Optional. Select a time range and one or more projects to view the Storage Trend. By default, 7d, indicating 7 days, is selected as the time range, and all projects are selected. You can manually select up to 8 projects.
Optional. On the Project Details tab in the Storage Details section, select a date to view the storage usage of each project on the date. The default date is the current day.
Optional. On the Table/Partition Details tab in the Storage Details section, select a date and a project to view the storage usage of tables and partitions in the project on the date. The default date is the current day.
Metrics
Metric name | Description |
Storage Usage on the Current Day | Displays the total storage usage and the storage usage percentage of each storage type in the current region. The data is updated approximately every hour. |
Storage Distribution | Displays the number of projects, tables, and partitions in the current region. The data is updated every day. |
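Storage Trend | Displays the storage usage trends of different storage types for the selected projects and time range. |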
Project Details | Displays the storage usage details of different storage types of projects whose total storage usage values are greater than 0 on a specified date in the current region. You can select a date within the last year. The Project Details tab also compares the total storage usage of the projects on the current day with that on the previous day, 7 days ago, or 30 days ago. |
Table/Partition Details | Displays the storage type and storage size of each table or partition in the selected project, and compares the storage usage on the current day with that on the previous day, 7 days ago, or 30 days ago. |
Data transmission service
You can view the resource usage of a specific data transmission resource group or a specific project. You can also use filter conditions to conduct more in-depth observation and analysis of the resource usage of different tables or request types.
Procedure
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, click Resource Observation.
On the Resource Observation page, click the Data Transmission Service tab.
Select a quota, a project, a time range, and an aggregation algorithm to query the data of each metric.
Usage notes
Data aggregation mechanism
A metric interval adaptive mechanism is used for the monitoring of the data transmission service. This mechanism automatically optimizes the metric density for displaying monitoring data based on the selected time range:
Short cycles (within 3 hours) use the native data granularity (1 minute/point).
Long cycles automatically extend the metric step size to 5 minutes/point (time ranges of up to 12 hours), 30 minutes/point (up to 72 hours), and 60 minutes/point (up to 7 days).
Multiple aggregation strategies are provided for you:
Average Value: reflects the overall trend of the data.
Maximum Value: captures characteristics of abnormal fluctuations.
When the step size exceeds the base granularity (1 minute), the system first aggregates the data within each step by using the selected aggregation algorithm. Therefore, the trends shown in the monitoring chart vary based on the aggregation algorithm, and the longer the time range, the greater the difference. This is expected behavior. Select an aggregation algorithm based on your analysis scenario:
Performance analysis scenarios: Use the Average Value aggregation algorithm.
Troubleshooting scenarios: Use the Maximum Value aggregation algorithm.
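The effect of the step size and the aggregation algorithm can be illustrated with a short sketch. It is not the service's implementation. The function names are hypothetical, and the thresholds assume that each duration listed above is the upper bound of the time range for that step size, which this topic does not state explicitly.

```python
def step_minutes(time_range_hours: float) -> int:
    """Pick the metric step size for a time range (assumed thresholds)."""
    if time_range_hours <= 0:
        raise ValueError("time range must be positive")
    if time_range_hours <= 3:
        return 1    # native granularity
    if time_range_hours <= 12:
        return 5
    if time_range_hours <= 72:
        return 30
    if time_range_hours <= 7 * 24:
        return 60
    raise ValueError("a single query supports at most 7 days")


def aggregate(points: list, step: int, how: str) -> list:
    """Collapse 1-minute samples into buckets of `step` samples each."""
    buckets = [points[i:i + step] for i in range(0, len(points), step)]
    if how == "average":
        return [sum(b) / len(b) for b in buckets]
    if how == "maximum":
        return [max(b) for b in buckets]
    raise ValueError("how must be 'average' or 'maximum'")


# Ten 1-minute slot-usage samples with one short spike.
samples = [10, 10, 10, 10, 90, 10, 10, 10, 10, 10]
print(aggregate(samples, 5, "average"))  # → [26.0, 10.0]
print(aggregate(samples, 5, "maximum"))  # → [90, 10]
```

The spike of 90 slots is flattened to 26 by Average Value but preserved by Maximum Value, which is why the latter suits troubleshooting.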
Filter conditions and limits
When you filter data by time range, a single query supports a maximum selection of 7 days. Due to the metric interval adaptive mechanism, the shorter the selected time range, the more precise the monitoring data will be.
You must select at least a quota or a project. Combined filtering for a quota and a project is also available.
When viewing resource usage data only from the quota dimension: You can specify either an exclusive resource group or a shared resource group when selecting a quota. Since shared resource groups are project-level quotas, when viewing the usage of the shared resource group for a specific project, you also need to specify the project.
When viewing resource usage data only from the project dimension: You need to leave the Select Quota field empty and select the desired project for Choose Project. This way, the total usage for that project is displayed.
After you change the filter conditions, click Query to refresh the monitoring data. The data is not refreshed automatically, which helps avoid accidental queries.
For some table-level monitoring dashboards, you must select a project before viewing the data or applying a filter by table name.
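The filter limits above can be summarized as a pre-check. The following is a minimal sketch for illustration only: `check_query` and its parameters are hypothetical names, and the console performs its own validation.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_RANGE = timedelta(days=7)  # a single query supports at most 7 days


def check_query(quota: Optional[str], project: Optional[str],
                start: datetime, end: datetime) -> list:
    """Return a list of problems with the filter conditions (empty if valid)."""
    problems = []
    # At least one of quota and project must be selected.
    if not quota and not project:
        problems.append("select at least a quota or a project")
    if end <= start:
        problems.append("end time must be later than start time")
    elif end - start > MAX_RANGE:
        problems.append("time range exceeds 7 days")
    return problems


start = datetime(2024, 5, 1)
print(check_query(None, "my_project", start, start + timedelta(days=3)))
print(check_query(None, None, start, start + timedelta(days=8)))
```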
Metrics
Metric | Description |
Request Parallelism | A line chart displays the corresponding slot usage based on filter conditions, including the current usage and the quota usage limit. Unit: slots. |
Throughput | A line chart displays the corresponding throughput based on filter conditions, with the unit shown on the vertical axis, such as B/min or MB/min. |
Table-level request concurrency | Select a value for Usage Type (such as Tunnel Batch Upload) and a value for Table Name (such as testtable) to display a line chart that shows the concurrency of using the Tunnel Batch method to upload data to the table testtable under filter conditions. Unit: slots. |
Table-level IP throughput | Select a value for Usage Type (such as Tunnel Batch Upload) and a value for Table Name (such as testtable) to display a line chart that shows the throughput for each source IP address under filter conditions when the Tunnel Batch method is used to upload data to the table testtable. |
Total Requests and Error Requests | Displays the total number of requests and the count of various error requests based on the filter conditions. |
Total Throughput | Displays a summary of the data volume for different usage methods within the corresponding time range based on the filter conditions, and shows the proportion of different usage methods through a pie chart. |
Slot Average Transmission Rate | Select a value for Usage Type (such as Tunnel Batch Upload) to display the average transmission rate per slot for requests of that type (in this example, uploads that use the Tunnel Batch method) under the filter conditions. |
Job performance
You can view the quantity, CU usage, and running durations of computing jobs and determine whether the job performance meets your expectations.
Procedure
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, click Resource Observation.
On the Resource Observation page, click the Job Performance Observation tab.
Configure the following parameters to filter the jobs that you want to view and to group the metric data in charts by different dimensions.
Parameter
Description
Time Period
Required. The time range (start time and end time) that is used to filter completed jobs.
You can select a preset time range or configure a custom time range.
1d: previous day.
3d: previous 3 days.
7d: previous 7 days.
Custom time range: Click the drop-down list, select a date, and then click Select Time to select a time range.
Note: The default time range is the previous day. The maximum time range is 7 days and the minimum time range is 1 hour. You can search only for jobs in the previous 45 days.
Project Name
The names of the MaxCompute projects that are used to filter completed jobs.
Note: By default, all projects are selected. You can select up to eight projects.
Quota
The computing quotas that are used to filter completed jobs.
Note: By default, all computing quotas are selected. You can select up to eight level-2 quotas. For more information about computing quotas, see Manage quotas for computing resources in the new MaxCompute console.
Group By
Required. The group of data in charts. You can define groups based on dimensions and chart types.
Valid values:
No Group: displays the trends of metrics over time for all jobs within the selected filter range. This is the default value.
Project: displays the metrics of all jobs within the selected filter range by project.
Note: If you select Project, you must specify Project Name in the filter parameters and select up to eight projects.
Quota: displays the metrics of all jobs within the selected filter range by level-2 quota.
Note: If you select Quota, you must specify Quota in the filter parameters and select up to eight level-2 quotas.
Job Type: displays the metrics of all jobs within the selected filter range by job type.
SQL: SQL job.
SQLRT: MaxCompute Query Acceleration (MCQA) SQL job.
LOT: MapReduce job.
CUPID: Spark or Mars job.
Algo_Task: machine learning job.
GRAPH: graph computing job.
Job End Status: displays the metrics of all jobs within the selected filter range based on the status when the job ends.
Success: The job succeeds.
Failed: The job fails.
Canceled: The job is canceled.
Click Search to view the statistics of each metric.
Optional. Select Data Summary to view the statistics of each metric based on the selected time range.
Parameter | Description |
By Hour | Uses one hour as the time granularity and displays the statistics of jobs that are completed in the current hour. This is the default granularity. For example, if the current point in time is 14:00 on May 6, 2024, the statistics of each metric for jobs completed from 14:00 to 15:00 on May 6, 2024 are displayed. |
By Day | Uses one day as the time granularity and displays the statistics of jobs that are completed on the current day. For example, if the current date is May 6, 2024, the statistics of each metric for jobs completed from 00:00 on May 6, 2024 to 00:00 on May 7, 2024 are displayed. |
Select an option for Comparison Period to compare the current statistics with the statistics of the same hour or day in an earlier period. The earlier period is obtained by subtracting the number of days specified in Comparison Period from the current date or hour.
Default value: No Comparison. You can also select Previous 30 Days, Previous 7 Days, or Previous 1 Day. For example, if the current point in time is 14:00 on May 6, 2024 and Previous 30 Days is selected, the data at 14:00 on April 6, 2024 is used to compare with the current data.
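The Data Summary window and its comparison period can be worked through with standard date arithmetic. This is a minimal sketch that reproduces the examples in this step. The function names are hypothetical, and the console may compute the windows differently.

```python
from datetime import datetime, timedelta


def summary_window(now: datetime, granularity: str) -> tuple:
    """Return the (start, end) window that contains `now`."""
    if granularity == "hour":
        start = now.replace(minute=0, second=0, microsecond=0)
        return start, start + timedelta(hours=1)
    if granularity == "day":
        start = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)
    raise ValueError("granularity must be 'hour' or 'day'")


def comparison_window(window: tuple, days_back: int) -> tuple:
    """Shift a window back by the number of days in Comparison Period."""
    start, end = window
    shift = timedelta(days=days_back)
    return start - shift, end - shift


now = datetime(2024, 5, 6, 14, 0)
current = summary_window(now, "hour")       # 2024-05-06 14:00 to 15:00
previous = comparison_window(current, 30)   # 2024-04-06 14:00 to 15:00
print(current[0], "compared with", previous[0])
```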
Metrics
CU Usage Trend (Unit: Core × Hour)
Metric name | Description |
CPU-hour (Unit: Core × Hour) | The number of CPU-hours consumed by completed jobs within the selected filter range. 1 CPU-hour means that 1 CPU core is used for 1 hour. Number of CPU-hours = Number of CPU cores × Duration. |
Memory-hour (Unit: GB × Hour) | The number of memory-hours consumed by completed jobs within the selected filter range. 1 memory-hour means that 1 GB of memory is used for 1 hour. Number of memory-hours = Memory space × Duration. |
Top 10 CPU-Hour Consumption Analysis or Top 10 Memory-Hour Consumption Analysis | Displays the top 10 jobs that consume the most CPU-hours or memory-hours, and the top 10 signatures and ExtNodeIds of jobs ranked by the highest total CPU-hours, the highest average CPU-hours, the highest total memory-hours, and the highest average memory-hours within the selected filter range. |
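The CPU-hour and memory-hour formulas above amount to simple multiplication. The following sketch uses hypothetical function names and assumes that the duration is given in seconds.

```python
def cpu_hours(cores: float, duration_seconds: float) -> float:
    """Number of CPU-hours = number of CPU cores x duration (in hours)."""
    return cores * duration_seconds / 3600


def memory_hours(memory_gb: float, duration_seconds: float) -> float:
    """Number of memory-hours = memory space (GB) x duration (in hours)."""
    return memory_gb * duration_seconds / 3600


# A job that holds 8 cores and 32 GB of memory for 30 minutes:
print(cpu_hours(8, 1800))      # → 4.0
print(memory_hours(32, 1800))  # → 16.0
```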
Job Runtime Period (Unit: seconds)
Metric name | Description |
Average Runtime Duration | The average duration of completed jobs within the selected filter range. |
Maximum Runtime Duration | The maximum duration of completed jobs within the selected filter range. |
Minimum Runtime Duration | The minimum duration of completed jobs within the selected filter range. |
Select a quantile runtime duration | The duration within which a specified quantile of jobs is completed within the selected filter range. The quantile can be the 1st, 5th, 10th, 50th, 90th, 95th, or 99th quantile. For example, for the 99th quantile, this metric indicates the duration within which 99% of jobs are completed. |
Top 10 Job Runtime Analysis | Displays the top 10 jobs that have the longest running durations, and the top 10 signatures and ExtNodeIds of jobs ranked by the longest total running durations and the longest average running durations. |
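The quantile runtime duration can be illustrated with a nearest-rank calculation. This topic does not state which quantile method the console uses, so the method and the function name `quantile_duration` are assumptions for illustration only.

```python
import math


def quantile_duration(durations: list, q: int) -> float:
    """Nearest-rank quantile: the smallest duration within which
    at least q percent of the jobs are completed."""
    if not durations:
        raise ValueError("no completed jobs in the selected range")
    if not 0 < q <= 100:
        raise ValueError("q must be in the range (0, 100]")
    ordered = sorted(durations)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Runtime durations of ten completed jobs, in seconds.
durations = [12, 15, 20, 22, 30, 45, 60, 90, 240, 1800]
print(quantile_duration(durations, 50))  # → 30
print(quantile_duration(durations, 90))  # → 240
print(quantile_duration(durations, 99))  # → 1800
```

In this sample, one slow job dominates the 99th quantile and the maximum, while the 50th quantile stays at 30 seconds.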
Job Count Trend (Unit: counts): displays the number of jobs that are completed within the selected filter range.
Job Scan Amount Trend (Unit: GB): displays the amount of data that is scanned by completed jobs within the selected filter range. The unit is adaptively changed and the actually used unit is displayed in charts.
Trend of Job Scan Amount per CU-Hour (Unit: GB): displays the average amount of data that is scanned by jobs per CU-hour within the selected filter range. The unit is adaptively changed and the actually used unit is displayed in charts. 1 CU contains 1 CPU core and 4 GB of memory. The number of CU-hours is calculated by using MAX(CPU-hours, Roundup(Memory-hours/4)).
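The MAX(CPU-hours, Roundup(Memory-hours/4)) expression can be worked through as follows. This is a sketch under assumptions: the function names are hypothetical, and this topic does not state whether rounding is applied per job or to aggregated totals. The sketch applies it to the values passed in.

```python
import math


def cu_hours(cpu_hours: float, memory_hours: float) -> float:
    """CU-hours = MAX(CPU-hours, Roundup(Memory-hours / 4)).

    1 CU contains 1 CPU core and 4 GB of memory, so memory-hours are
    divided by 4 before they are compared with CPU-hours.
    """
    return max(cpu_hours, math.ceil(memory_hours / 4))


def scan_per_cu_hour(scanned_gb: float, cpu_hours: float,
                     memory_hours: float) -> float:
    """Average amount of data scanned per CU-hour, in GB."""
    cu = cu_hours(cpu_hours, memory_hours)
    if cu == 0:
        raise ValueError("no CU-hours consumed")
    return scanned_gb / cu


print(cu_hours(4.0, 16.0))               # → 4.0 (CPU and memory are balanced)
print(cu_hours(2.0, 18.0))               # → 5 (memory-bound: ceil(18 / 4) = 5)
print(scan_per_cu_hour(200, 2.0, 18.0))  # → 40.0
```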
You can also use the tenant-level Information Schema to collect statistics on the preceding metrics. Note that the Information Schema tasks_history table contains task instances that are generated by all operations, whereas the metric data displayed on the Job Performance Observation tab in the console is obtained only from jobs that consume computing resources. Therefore, the statistical results obtained by using the tenant-level Information Schema may differ from the results displayed on the Job Performance Observation tab.
The following SQL statements show an example.
SET odps.namespace.schema=true;
SELECT  to_char(end_time, 'yyyy-mm-dd hh') AS end_hour, -- The hour in which the job completes.
        -- to_char(end_time, 'yyyy-mm-dd') AS end_day, -- The date on which the job completes. For daily granularity, use this line in place of the preceding line.
        SUM(CAST(cost_cpu/100/3600 AS DECIMAL(18,5))) AS cost_cpuh, -- The CPU-hours.
        SUM(CAST(cost_mem/1024/3600 AS DECIMAL(18,5))) AS cost_memh, -- The memory-hours.
        AVG(datediff(end_time, start_time, 'ss')) AS avg_duration_s, -- The average runtime duration of jobs.
        MIN(datediff(end_time, start_time, 'ss')) AS min_duration_s, -- The minimum runtime duration of jobs.
        MAX(datediff(end_time, start_time, 'ss')) AS max_duration_s -- The maximum runtime duration of jobs.
        -- , status -- Optional grouping column: status for job status, task_catalog for project, or task_type for job type.
FROM    SYSTEM_CATALOG.INFORMATION_SCHEMA.tasks_history
WHERE   ds >= to_char(date_add(getdate(), -7), 'yyyymmdd') -- You can add other filter conditions based on your business requirements.
AND     task_type IN ('SQL','SQLRT','LOT','CUPID','ALgoTask')
GROUP BY to_char(end_time, 'yyyy-mm-dd hh')
        -- to_char(end_time, 'yyyy-mm-dd') -- For daily granularity, use this line in place of the preceding line.
        -- , status -- Optional grouping column: must match the column that is added to the SELECT list.
ORDER BY to_char(end_time, 'yyyy-mm-dd hh') ASC;
-- to_char(end_time, 'yyyy-mm-dd'); -- The date on which the job completes. If daily granularity is used, use this line to replace the preceding line.
FAQ
Question 1:
Problem description: After jobs are grouped by project or quota, some projects or quotas are not displayed in charts.
Possible causes: No jobs are available in the projects or quotas.
Question 2:
Problem description: After a comparison period is selected, the data that corresponds to the comparison period is not available.
Possible causes: The project or quota may not be created or may be deleted within the comparison period. No jobs are available in the project or quota within the comparison period.
References
After you view the data of each metric on the Resource Observation page, you can optimize and adjust the execution plans and resource allocation of jobs based on your business requirements.
You can reconfigure resources. For more information about how to configure quota plans and time plans in a quota resource group, see Configure quotas.
You can configure job priorities. For more information, see Job priority.