Resource observation provides monitoring data for computing resources, storage resources, data transmission resources, and job performance over a specified time period. Use these metrics to evaluate resource consumption patterns, identify optimization opportunities, and adjust execution plans and resource configurations to improve job efficiency.
Scope: Resource observation covers four resource types: computing resources (CU utilization by quota), storage resources (usage by project, table, and partition), data transmission service (Tunnel and Storage API throughput and concurrency), and job performance (job count, CU consumption, and runtime duration).
MaxCompute uses Tunnel as its built-in data transfer mechanism for uploading and downloading data. In this document, metrics under the Data Transmission Service tab primarily track Tunnel-based operations (such as Tunnel Batch Upload and Tunnel Batch Download) and Storage API operations.
Supported regions
| Resource type | Supported regions |
|---|---|
| Computing resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Ulanqab), China (Chengdu), China (Hong Kong), US (Silicon Valley), US (Virginia), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), UK (London), and Singapore |
| Storage resources | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Zhangjiakou), China (Ulanqab), China (Hong Kong), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), and Singapore |
| Data transmission service | China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China East 1 Finance, China (Hong Kong), Singapore, Japan (Tokyo), Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia), SAU (Riyadh - Partner Region) |
| Job performance | China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), US (Silicon Valley), US (Virginia), Germany (Frankfurt), UK (London), and SAU (Riyadh - Partner Region) |
Permissions
Alibaba Cloud accounts: Have full read and operation permissions for Resource Observation.
Resource Access Management (RAM) users: Require RAM permissions. For more information, see RAM permissions.
Computing resources
Monitor Compute Unit (CU) resource usage for subscription and pay-as-you-go quotas.
MaxCompute organizes computing resources into a two-level hierarchy: a level-1 quota represents a top-level resource pool, and each level-1 quota contains one or more level-2 quotas that subdivide resources for finer-grained allocation. For more information, see Manage quotas for computing resources in the new MaxCompute console.
View computing resource usage
Log on to the MaxCompute console. In the upper-left corner, select a region.
In the navigation pane on the left, click Resource Observation.
On the Resource Observation page, click the Computing Resources tab.
Select a level-1 quota name, a time range, and a time interval. The time interval is the number of minutes between data points. Options: Adaptive, 1, 5, or 15 minutes. If the time range exceeds 72 hours, only Adaptive is available.
Click the expand icon to the left of a level-2 quota to view its resource consumption trend graph. You can expand multiple level-2 quota charts at the same time.
View the list of projects associated with each level-2 quota.
Computing resource metrics
| Metric | Description | What to look for |
|---|---|---|
| CPU Resources | Trend of CPU utilization for the current quota. Click a time point to view the job snapshot list for that point. | Sustained high CPU usage indicates the quota may need more capacity. Sudden spikes suggest large or unoptimized jobs. |
| Memory Resources | Trend of memory usage for the current quota. | Consistently high memory usage may require quota resizing or job memory optimization. |
| Quotas and associated projects | Lists the projects that have each level-2 quota set as their default quota. | Verify that projects are assigned to the correct quotas. |
Pay-as-you-go resources come from a shared resource pool. Computing jobs compete for resources, and the resources available to each job cannot be specified. If a user continuously requests a large number of resources, MaxCompute limits that user's resource usage to ensure fair access for other pay-as-you-go users.
Storage resources
Monitor the total storage usage and storage usage percentages of different storage types in the current region. View storage trends over time and drill down into table-level or partition-level storage details by project.
View storage resource usage
Log on to the MaxCompute console. In the top navigation bar, select a region.
In the left-side navigation pane, click Resource Observation.
On the Resource Observation page, click the Storage Resource tab to view the total storage usage and storage usage distribution of different storage types on the current day.
(Optional) Select a time range and one or more projects to view the Storage Trend. By default, 7d (7 days) is selected as the time range, and all projects are selected. You can manually select up to 8 projects.
(Optional) On the Project Details tab in the Storage Details section, select a date to view the storage usage of each project on that date. The default date is the current day.
(Optional) On the Table/Partition Details tab in the Storage Details section, select a date and a project to view the storage usage of tables and partitions in the project on that date. The default date is the current day.
Storage resource metrics
| Metric | Description | Unit | Update frequency |
|---|---|---|---|
| Storage Usage on the Current Day | Total storage usage and the storage usage percentage of each storage type in the current region. | Bytes (adaptive) | Approximately every hour |
| Storage Distribution | The number of projects, tables, and partitions in the current region. | Counts | Every day |
| Storage Trend | Group by storage type: storage usage of all projects or selected projects and the trend of each storage type over time. Group by project: storage trends for the top N projects (8 by default) with the highest total storage usage, or for selected projects. | Bytes (adaptive) | -- |
| Project Details | Storage usage details by storage type for projects whose total storage usage is greater than 0 on a specified date. Select a date within the last year. Includes comparison with the previous day, 7 days ago, or 30 days ago. | Bytes (adaptive) | -- |
| Table/Partition Details | Storage types, storage size, and comparison with the previous day, 7 days ago, or 30 days ago. | Bytes (adaptive) | -- |
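The "Bytes (adaptive)" unit means the console scales the displayed unit to the magnitude of the value. A minimal sketch of that scaling, assuming base-1024 steps (the helper name and thresholds are illustrative, not a MaxCompute API):

```python
def format_bytes(num_bytes: float) -> str:
    """Format a byte count with an adaptive unit, as the console charts do.
    Assumes base-1024 steps; the console's exact rounding is not documented here."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value fits in the current unit (or we run out of units).
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(format_bytes(1536))      # 1.50 KB
print(format_bytes(1024 ** 3)) # 1.00 GB
```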
Data transmission service
Monitor the resource usage of a specific data transmission resource group or project. Use filters to analyze usage by table or request type.
View data transmission metrics
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, click Resource Observation.
On the Resource Observation page, click the Data Transmission Service tab.
Select a quota, a project, a time range, and an aggregation algorithm to query metric data.
Data aggregation
An adaptive interval mechanism automatically adjusts the data point density based on the selected time range:
| Time range | Data point interval |
|---|---|
| 3 hours or less | 1 minute/point (native granularity) |
| 3 to 12 hours | 5 minutes/point |
| 12 to 72 hours | 30 minutes/point |
| 72 hours to 7 days | 60 minutes/point |
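The tiers above reduce to a simple lookup. A sketch with a hypothetical function name, mirroring only the documented thresholds:

```python
def adaptive_interval_minutes(range_hours: float) -> int:
    """Return the data point interval (minutes) for a given query time range."""
    if range_hours <= 3:
        return 1   # native granularity
    if range_hours <= 12:
        return 5
    if range_hours <= 72:
        return 30
    return 60      # up to the 7-day query maximum

print(adaptive_interval_minutes(2))    # 1
print(adaptive_interval_minutes(168))  # 60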
Two aggregation algorithms are available:
| Algorithm | Behavior | Recommended use |
|---|---|---|
| Average Value | Reflects the overall trend. | Performance analysis |
| Maximum Value | Captures peak fluctuations. | Troubleshooting |
When the interval exceeds the native granularity (1 minute), the system pre-processes data using the selected aggregation algorithm. Longer time ranges produce greater differences between Average Value and Maximum Value results. This is expected behavior.
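The divergence between the two algorithms follows directly from pre-aggregation: a short spike survives Maximum Value but is diluted by Average Value, and wider buckets dilute it more. A small illustration (hypothetical helper, not a MaxCompute API):

```python
def downsample(points, bucket_size, agg):
    """Aggregate 1-minute data points into coarser buckets using avg or max."""
    out = []
    for i in range(0, len(points), bucket_size):
        bucket = points[i:i + bucket_size]
        out.append(max(bucket) if agg == "max" else sum(bucket) / len(bucket))
    return out

minute_series = [10, 10, 10, 10, 90]        # one short spike in a 5-minute window
print(downsample(minute_series, 5, "avg"))  # [26.0] -- spike diluted
print(downsample(minute_series, 5, "max"))  # [90]   -- spike preserved
```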
Filter conditions and limits
A single query supports a maximum time range of 7 days. Shorter time ranges yield more precise monitoring data.
Select at least a quota or a project. Combined filtering for both a quota and a project is also supported.
Quota dimension only: Specify either an exclusive resource group or a shared resource group. Since shared resource groups are project-level quotas, viewing the shared resource group for a specific project also requires specifying the project.
Project dimension only: Leave the Select Quota field blank and select the desired project for Choose Project. This displays the total usage for that project.
After changing filter conditions, click Query to refresh the monitoring data.
For table-level monitoring dashboards, select a project before viewing data or filtering by table name.
Data transmission metrics
| Metric | Description | Unit |
|---|---|---|
| Request Parallelism | A line chart showing slot usage based on filter conditions, including current usage and quota usage limit. | Slots (concurrent connections) |
| Throughput | A line chart showing throughput based on filter conditions. | Displayed on the vertical axis (for example, B/min or MB/min) |
| Table-level Request Parallelism | Select a Usage Type (such as Tunnel Batch Upload) and a Table Name (such as testtable) to display a line chart of concurrency for the specified upload or download operation. | Slots |
| Table-level IP Throughput | Select a Usage Type and a Table Name to display a line chart of throughput per source IP address for the specified operation. | Displayed on the vertical axis |
| Total Requests and Error Requests | Total number of requests and the count of various error requests based on filter conditions. Total Requests = all successful requests + error requests. Error Requests = requests with 4XX or 5XX status codes. For more information about status codes, see Overview of the data transmission service. | Counts |
| Total Throughput | Summary of data volume by usage method within the selected time range, with a pie chart showing the proportion of each method. | Bytes (adaptive) |
| Slot Average Transfer Rate | Select a Usage Type (such as Tunnel Batch Upload) to display the average transmission rate per slot for the specified operation. | Displayed on the vertical axis |
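The Total Requests and Error Requests relationship above can be expressed as a small classifier. Only the 4XX/5XX rule comes from the documentation; the helper itself is a hypothetical sketch:

```python
def summarize_requests(status_codes):
    """Count total, error (4XX/5XX), and successful requests from status codes."""
    total = len(status_codes)
    errors = sum(1 for code in status_codes if 400 <= code < 600)
    # Total Requests = successful requests + error requests.
    return {"total": total, "errors": errors, "success": total - errors}

print(summarize_requests([200, 200, 404, 500, 201]))
# {'total': 5, 'errors': 2, 'success': 3}
```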
Job performance
Monitor the count, CU usage, and running durations of computing jobs to evaluate whether job performance meets expectations.
View job performance metrics
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, click Resource Observation.
On the Resource Observation page, click the Job Performance Observation tab.
Configure the following filter and grouping parameters, then click Search to view statistics.
Filter parameters
| Parameter | Description |
|---|---|
| Time Range | Required. The time range used to filter completed jobs. Preset options: 1d (previous day), 3d (previous 3 days), 7d (previous 7 days). To configure a custom time range, click the drop-down list, select a date, and click Select Time. Default: previous day. Maximum: 7 days. Minimum: 1 hour. Only jobs from the previous 45 days can be searched. |
| Project Name | The MaxCompute projects used to filter completed jobs. By default, all projects are selected. Select up to eight projects. |
| Quota | The computing quotas used to filter completed jobs. By default, all computing quotas are selected. Select up to eight level-2 quotas. For more information, see Manage quotas for computing resources in the new MaxCompute console. |
| Group By | Required. Determines how data is grouped in charts. Options: No Group (default, shows trends over time for all jobs), Project (groups by project; requires specifying up to eight projects in Project Name), Quota (groups by level-2 quota; requires specifying up to eight quotas in Quota), Job Type (groups by job type), Job End Status (groups by completion status). |
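The Time Range constraints (1 hour to 7 days, within the previous 45 days) can be checked with a sketch like the following; the function and its signature are illustrative, not part of any MaxCompute API:

```python
from datetime import datetime, timedelta

def validate_time_range(start: datetime, end: datetime, now: datetime) -> bool:
    """Check a job-performance query range: span of 1 hour to 7 days,
    start within the previous 45 days, and not extending into the future."""
    span = end - start
    if span < timedelta(hours=1) or span > timedelta(days=7):
        return False
    if start < now - timedelta(days=45):
        return False
    return end <= now

now = datetime(2024, 5, 6, 14, 0)
print(validate_time_range(now - timedelta(days=1), now, now))   # True
print(validate_time_range(now - timedelta(days=50),
                          now - timedelta(days=49), now))       # False (too old)
```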
Job Type values:
| Value | Description |
|---|---|
| SQL | SQL job |
| SQLRT | MaxCompute Query Acceleration (MCQA) SQL job |
| LOT | MapReduce job |
| CUPID | Spark or Mars job |
| Algo_Task | Machine learning job |
| GRAPH | Graph computing job |
Job End Status values: Success, Failed, Canceled.
Data summary and comparison
(Optional) Select Data Summary to view statistics at a different time granularity.
| Option | Description |
|---|---|
| By Hour | Displays statistics for jobs completed within each hour. This is the default granularity. |
| By Day | Displays statistics for jobs completed within each day. |
(Optional) Select a Comparison Period to compare current data against a historical baseline.
Default: No Comparison. Other options: Previous 30 Days, Previous 7 Days, Previous 1 Day. For example, if the current time is 14:00 on May 6, 2024, and you select Previous 30 Days, data from 14:00 on April 6, 2024 is shown alongside the current data for comparison.
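The comparison baseline is simply the same timestamp shifted back by the selected period, as in the May 6 example above. A minimal sketch (hypothetical helper):

```python
from datetime import datetime, timedelta

def comparison_point(current: datetime, period_days: int) -> datetime:
    """Shift a timestamp back by the comparison period (1, 7, or 30 days)."""
    return current - timedelta(days=period_days)

print(comparison_point(datetime(2024, 5, 6, 14, 0), 30))  # 2024-04-06 14:00:00
```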
Job performance metrics
CU usage trend
| Metric | Description | Unit |
|---|---|---|
| CPU-hour | The number of CPU-hours consumed by completed jobs. 1 CPU-hour means 1 CPU core consumed for 1 hour. Formula: Number of CPU cores x Duration. | Core x Hour |
| Memory-hour | The number of memory-hours consumed by completed jobs. 1 memory-hour means 1 GB of memory consumed for 1 hour. Formula: Memory space x Duration. | GB x Hour |
| Top 10 CPU-Hour Consumption Analysis / Top 10 Memory-Hour Consumption Analysis | The top 10 jobs by CPU-hour or memory-hour consumption, and the top 10 signatures and ExtNodeId values (execution plan node identifiers) ranked by highest total and highest average CPU-hour or memory-hour consumption. | -- |
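The CPU-hour and memory-hour formulas above are straightforward products. A sketch with hypothetical function names:

```python
def cpu_hours(cores: float, duration_hours: float) -> float:
    """CPU-hours = number of CPU cores x duration in hours."""
    return cores * duration_hours

def memory_hours(memory_gb: float, duration_hours: float) -> float:
    """Memory-hours = memory in GB x duration in hours."""
    return memory_gb * duration_hours

# A job using 4 cores and 16 GB of memory for 30 minutes:
print(cpu_hours(4, 0.5))     # 2.0 CPU-hours
print(memory_hours(16, 0.5)) # 8.0 memory-hours
```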
Job runtime period
| Metric | Description | Unit |
|---|---|---|
| Average Runtime Duration | The average duration of completed jobs within the selected filter range. | Seconds |
| Maximum Runtime Duration | The maximum duration of completed jobs within the selected filter range. | Seconds |
| Minimum Runtime Duration | The minimum duration of completed jobs within the selected filter range. | Seconds |
| Quantile runtime duration | The duration at a specified quantile of completed jobs. Available quantiles: 1st, 10th, 50th, 90th, and 99th. For example, the 99th quantile indicates the duration within which 99% of jobs completed. | Seconds |
| Top 10 Job Runtime Analysis | The top 10 jobs with the longest running durations, and the top 10 signatures and ExtNodeId (execution plan node identifiers) ranked by longest total and longest average running duration. | -- |
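A quantile runtime can be computed from a list of job durations. The sketch below uses the nearest-rank method on the sorted list; this is an assumed method, since the console's exact interpolation is not documented here:

```python
def quantile_duration(durations, q):
    """Duration (seconds) at quantile q (1-100) of completed jobs,
    using the nearest-rank method: ceil(q * n / 100)-th smallest value."""
    ordered = sorted(durations)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil without importing math
    return ordered[rank - 1]

runs = [3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
print(quantile_duration(runs, 50))  # 21  -- half the jobs finish within 21 s
print(quantile_duration(runs, 99))  # 233 -- 99% of jobs finish within 233 s
```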
Job count trend
Unit: counts. Displays the number of jobs completed within the selected filter range.
Job input size trend
Unit: GB (adaptive). Displays the amount of data scanned by completed jobs within the selected filter range. The unit adapts automatically and the actual unit is shown in charts.
Trend of job input size per CU-hour
Unit: GB (adaptive). Displays the average amount of data scanned per CU-hour within the selected filter range. The unit adapts automatically and the actual unit is shown in charts.
1 CU contains 1 CPU core and 4 GB of memory. The value is calculated as: MAX(CPU-hours, ROUNDUP(Memory-hours/4)).
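Read literally, the formula above translates to the following sketch, where `math.ceil` stands in for ROUNDUP and the input-per-CU-hour metric divides scanned data by the result (function names are illustrative):

```python
import math

def cu_hours(cpu_h: float, memory_h: float) -> float:
    """CU-hours per the documented formula:
    MAX(CPU-hours, ROUNDUP(Memory-hours / 4)), where 1 CU = 1 core + 4 GB."""
    return max(cpu_h, math.ceil(memory_h / 4))

def input_gb_per_cu_hour(input_gb: float, cpu_h: float, memory_h: float) -> float:
    """Average amount of data scanned per CU-hour."""
    return input_gb / cu_hours(cpu_h, memory_h)

print(cu_hours(2.0, 12.0))                   # 3 (memory-bound: ceil(12/4) = 3 > 2)
print(input_gb_per_cu_hour(30.0, 2.0, 12.0)) # 10.0 GB per CU-hour
```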
Query metrics with Information Schema
Use the tenant-level Information Schema to collect the same statistics programmatically.
The Information Schema tasks_history table contains task instances generated by all operations, whereas the Job Performance Observation tab displays data only from jobs that consume computing resources. As a result, statistics from Information Schema may differ from those shown in the console.

The following SQL example queries job metrics at hourly granularity for the previous 7 days:

```sql
SET odps.namespace.schema=TRUE;
SELECT to_char(end_time, 'yyyy-mm-dd hh'), -- The hour in which the job completed.
       -- to_char(end_time, 'yyyy-mm-dd'), -- For daily granularity, use this line in place of the preceding line.
       SUM(CAST(cost_cpu / 100 / 3600 AS DECIMAL(18, 5))) cost_cpuh, -- The CPU-hours.
       SUM(CAST(cost_mem / 1024 / 3600 AS DECIMAL(18, 5))) cost_memh, -- The memory-hours.
       AVG(datediff(end_time, start_time, 'ss')), -- The average runtime duration of jobs.
       MIN(datediff(end_time, start_time, 'ss')), -- The minimum runtime duration of jobs.
       MAX(datediff(end_time, start_time, 'ss')) -- The maximum runtime duration of jobs.
       -- , status -- Grouping columns: status (job end status), task_catalog (project), task_type (job type).
FROM SYSTEM_CATALOG.INFORMATION_SCHEMA.tasks_history
WHERE ds >= to_char(date_add(getdate(), -7), 'yyyymmdd') -- Add other filter conditions based on your business requirements.
  AND task_type IN ('SQL', 'SQLRT', 'LOT', 'CUPID', 'AlgoTask')
GROUP BY to_char(end_time, 'yyyy-mm-dd hh')
         -- , to_char(end_time, 'yyyy-mm-dd') -- For daily granularity, use this line in place of the preceding line.
         -- , status -- Grouping columns: status (job end status), task_catalog (project), task_type (job type).
ORDER BY to_char(end_time, 'yyyy-mm-dd hh') ASC;
-- For daily granularity, also replace the ORDER BY column with to_char(end_time, 'yyyy-mm-dd').
```

Troubleshooting
Projects or quotas are not displayed in grouped charts
No completed jobs exist in those projects or quotas during the selected time range.
Comparison period data is not available
The project or quota did not exist during the comparison period, or no jobs ran in the project or quota during that time.
Related information
After reviewing resource observation metrics, consider the following optimizations:
Reconfigure resources: Configure quota plans and time plans in a quota resource group. See Configure quotas.
Adjust job priorities: See Job priority.