View job details and resource usage

The MaxCompute Job O&M feature lets you view historical and running jobs. You can use this feature to understand job execution details, analyze resource loads during runtime, and perform job O&M.

Features

The MaxCompute Job O&M feature lets you view and manage historical and running jobs.

For data developers, the Job O&M feature helps you view job details, promptly detect and handle exceptions and issues, and terminate jobs.
For administrators, the Job O&M feature helps you view the resource load of a quota group at a specific time. This lets you efficiently allocate and manage system resources to improve job execution efficiency and performance.

On the Jobs page of the MaxCompute console, you can configure filter conditions to find specific jobs. This lets you view job details and perform analysis. The feature provides the following operations:

Operations

Filter jobs
You can filter jobs based on the parameters described in Parameter description.
Sort jobs
By default, jobs are sorted by end time in descending order, with running jobs listed first. You can perform basic single-column sorting or advanced multi-column sorting.
- Basic single-column sorting: Click the sort button in a column header to sort the data in ascending or descending order.
- Advanced multi-column sorting: To perform a multi-column sort, click the Advanced Sorting button in the upper-right corner of the list. Then, click Add Sort to add multiple columns, specify the sort order for each column as Ascending or Descending, and click OK.
  When advanced sorting is active, you cannot perform basic single-column sorting. To return to basic single-column sorting, click the Advanced Sorting button in the upper-right corner of the list, click Reset, and then click OK.
View job details
In the job list, click LogView in the Actions column of a job to open the LogView page. On this page, you can view the job's running status, execution details, and results.
Terminate jobs
You can perform the Terminate or Batch Terminate operation on jobs that have a Latest Status of Running.
Get job insights
In the job list, click Insights in the Actions column of a job. This lets you view the job overview, its resource consumption, and the resource allocation of the compute quota at a specific time. You can also trigger a job smart diagnosis.
- Currently, smart diagnosis is supported only for SQL jobs.
- For jobs that run for less than 2 minutes or are not SQL, MapReduce, Spark, or Mars jobs, job-level resource consumption data is not available.
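
The ordering rules described under Sort jobs can be illustrated with a minimal Python sketch. It is not a MaxCompute API: the record fields (`id`, `status`, `end_time`, `priority`) are hypothetical names, and treating a missing end time as "still running" is an assumption made for the example.

```python
from datetime import datetime

# Hypothetical job records; the field names are illustrative only.
jobs = [
    {"id": "a", "status": "Success", "end_time": datetime(2024, 1, 1, 10, 0), "priority": 1},
    {"id": "b", "status": "Running", "end_time": None, "priority": 9},
    {"id": "c", "status": "Failed", "end_time": datetime(2024, 1, 1, 11, 0), "priority": 1},
]

def default_order(records):
    """Jobs without an end time first, then finished jobs by end time, newest first."""
    unfinished = [job for job in records if job["end_time"] is None]
    finished = [job for job in records if job["end_time"] is not None]
    finished.sort(key=lambda job: job["end_time"], reverse=True)
    return unfinished + finished

def multi_column_sort(records, columns):
    """Sort by several columns; `columns` is a list of (field, ascending) pairs,
    highest precedence first."""
    result = list(records)
    # Python's sort is stable, so the lowest-precedence column is applied first.
    for field, ascending in reversed(columns):
        result.sort(key=lambda job: job[field], reverse=not ascending)
    return result

print([job["id"] for job in default_order(jobs)])
```

The multi-column helper mirrors the Advanced Sorting dialog: each added column becomes one `(field, ascending)` pair.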

Job statistics chart

A stacked column chart that shows job counts is generated based on the filter results and grouped by time and status. This chart helps you visualize the overall running status of jobs.

If you select a different Time Range, the duration represented by each column in the chart varies:
- If the Time Range is within 24 hours: The minimum time interval for each column is 2 minutes. The number of columns is automatically adjusted, with a maximum of 24.
- If the Time Range is more than 24 hours but no more than 48 hours: The time interval for each column is fixed at 2 hours. The number of columns is automatically adjusted, with a maximum of 24.
- If the Time Range is more than 48 hours but no more than 7 days: The time interval for each column is 6 hours. The number of columns is automatically adjusted, with a maximum of 29.
The job statuses included in the statistics are:
- Running: The snapshot status is Running.
- Finished: The job has succeeded, failed, or been terminated.
Job snapshot data is collected every three minutes. Because of this, snapshot data may not be available for some jobs. The snapshot status for a running job might also be empty.
You can drag the mouse over the chart to select and shorten the time range.
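
The column rules listed above can be written as a small lookup. This is a sketch of the documented values only, not the console's implementation; the function name `chart_column_rule` is hypothetical. For ranges within 24 hours, the returned interval is the documented minimum rather than a fixed width.

```python
from datetime import timedelta

def chart_column_rule(time_range: timedelta):
    """Return (column_interval, max_columns) for a given Time Range.

    For ranges within 24 hours the interval is a lower bound (2 minutes),
    because the console adjusts the number of columns automatically.
    """
    if time_range <= timedelta(hours=24):
        return timedelta(minutes=2), 24
    if time_range <= timedelta(hours=48):
        return timedelta(hours=2), 24
    if time_range <= timedelta(days=7):
        return timedelta(hours=6), 29
    raise ValueError("Time Range cannot exceed 7 days")

print(chart_column_rule(timedelta(hours=30)))
```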

Job list

The job list shows jobs that match your specified filter conditions and provides basic job information. MaxCompute provides a Regular Job List and a Snapshot List to suit different scenarios.

Regular Job List: View information about all jobs within a specific time range.
Snapshot List: View snapshot information for jobs that are running at a specific time. This includes the snapshot status, CPU and memory usage, request volume, and usage percentage at the snapshot time.

For a description of the parameters in the job list, see Parameter description.

Scope

The following job information cannot be collected currently:

Job snapshot data is collected every three minutes. Because of this, snapshot data cannot be collected for jobs that start within the three-minute interval before a collection.
Some MaxCompute jobs initiated through PAI, especially those initiated by RAM users, cannot be collected.
Jobs in Developer Edition projects (to be discontinued) cannot be collected.

Note

Data is processed at a specific frequency. When you view the job list for the current time, some jobs with a Latest Status of Running may have already been completed in LogView. This discrepancy is more likely to occur for jobs with very short runtimes. The status in LogView is the most accurate.

Parameter description

Global job filter parameters

The following parameters affect the display of both the job statistics chart and the job list.

Parameter	Description
Time Range	Filters jobs by a time range (start and end time). This parameter is required. You can select a preset time range or specify a custom time range: expand the time selection panel, select the year, month, and day to query, and then click Select Time. The specified time has two meanings: (1) It filters for jobs that finished within this time range. (2) It filters for jobs that were running at the end time or within the 3 minutes before it. Job snapshot information is available for these jobs. This time is also known as the job snapshot observation time. The default range is the last 1 hour. The maximum time span for a search is 7 days, and the minimum is 2 minutes. You can search for jobs from the last 45 days.
Choose Project	Filters by MaxCompute project name. You can select multiple MaxCompute projects. By default, this is empty.
Select Quota	Filters by quota group. Only subscription quota groups are supported. By default, this is empty. You do not need to configure this parameter when querying pay-as-you-go jobs. For more information about quota groups, see Compute resources - Quota management.
Job Type	Filters by job type. The job types are: SQL: SQL job. SQLRT: Accelerated query SQL job. SQLCost: SQL cost estimation job. LOT: MapReduce job. CUPID: Spark or Mars job. AlgoTask: Machine learning job. Graph: Graph computing job. MaxQA (MCQA2): MaxQA job. MaxFrame: MaxFrame job.
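
The Time Range limits in the table above (default of the last 1 hour, span between 2 minutes and 7 days, jobs from the last 45 days) can be checked before you open the console. The following Python sketch is illustrative only: the function names are hypothetical, and reading "the last 45 days" as a limit on the start time is an assumption.

```python
from datetime import datetime, timedelta

MIN_SPAN = timedelta(minutes=2)
MAX_SPAN = timedelta(days=7)
LOOKBACK = timedelta(days=45)  # assumption: applies to the start of the range

def default_time_range(now):
    """The documented default: the last 1 hour."""
    return now - timedelta(hours=1), now

def check_time_range(start, end, now):
    """Return a list of problems with a requested Time Range (empty if valid)."""
    problems = []
    span = end - start
    if span < MIN_SPAN:
        problems.append("span is shorter than 2 minutes")
    if span > MAX_SPAN:
        problems.append("span is longer than 7 days")
    if start < now - LOOKBACK:
        problems.append("start is more than 45 days in the past")
    return problems

now = datetime(2024, 6, 1, 12, 0)
print(check_time_range(now - timedelta(days=8), now, now))
```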

Job list search parameters

The following parameters are used to perform a secondary search on the results in the job list and affect only the job list. The following figure shows the location of the search controls:

Parameter	Description
Instance ID	Searches by the InstanceID generated for a MaxCompute job. You can enter a specific InstanceID to find a job. By default, this is empty. For more information about InstanceIDs, see View instance information.
Job Owner	Filters by the account that submitted the MaxCompute job. By default, this is empty. Fuzzy query is not supported. The format must be a complete account name, such as `ALIYUN$xxx` or `RAM$xxx`.
ExtNodeId	Searches by the source job ID that ran the MaxCompute job. For example, a node ID from DataWorks. For more information about DataWorks node IDs, see Configure basic properties.
ExtNodeName	Searches by the source job name that ran the MaxCompute job.
Signature	Searches by the signature of an SQL job. This is available only for SQL jobs. Use this signature to find all instances of the same SQL execution. By default, this is empty.
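
Because the Job Owner search does not support fuzzy queries, it can help to check that a value is a complete account name before searching. The sketch below assumes that only the two prefixes shown in the table (`ALIYUN$` and `RAM$`) are in use; other account types may exist, and the function name is hypothetical.

```python
# Assumption: only the prefixes shown in the table are considered valid here.
KNOWN_PREFIXES = {"ALIYUN", "RAM"}

def is_complete_account_name(owner: str) -> bool:
    """True if `owner` looks like PREFIX$name with a known prefix and a non-empty name."""
    prefix, separator, name = owner.partition("$")
    return separator == "$" and prefix in KNOWN_PREFIXES and name != ""

print(is_complete_account_name("RAM$data_team:alice"))
```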

Job list item filter parameters

The following parameters are used to perform a secondary filter on the results in the job list and affect only the job list. To filter, click the filter icon next to the column name.

Parameter	Description
Latest Status	Filters by job running status. By default, no status is selected, which means that jobs in all statuses are shown. The job running statuses are: Running: The job has not finished. Success: The job ran successfully. Failed. Canceled. Submitted: The job has been submitted and is waiting for compute resources. The status shown is the overall status of the job. A job may have multiple concurrent tasks, each with a different sub-status. For details, go to LogView. For more information, see Use LogView 2.0 to view job run information.
Priority	Filters by MaxCompute job priority. The priority of each job is a value from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get compute resources before low-priority jobs. This performs a secondary filter on the job list results and only affects the job list.
Intelligent Diagnostics	Filters by the tags from the job smart diagnosis results. By default, no tags are selected. For a detailed description of smart diagnosis result tags, see Smart diagnosis result tag description.

Regular job list parameter description

Column Name	Description
Instance ID	An instance is generated for each MaxCompute job, and each instance has a corresponding Instance ID. The project, computing quota, and type of the job are also displayed. Click LogView in the Actions column of the Instance ID to go to the LogView page and view the specific progress of the job. For information about how to view LogView, see Use LogView 2.0 to view job run information. Click Insights in the Actions column of the Instance ID to go to the Job Insights page. On this page, you can view the diagnosis results, resource consumption, and similar job information. For more information, see Job Insights.
Latest Status	The latest status of the job.
Job Owner	The Alibaba Cloud account that runs the MaxCompute job. You can find the job owner based on the account information. If a job consumes too many resources and affects other jobs, you can contact the owner to terminate the job. For information about how to terminate a job, see Instance operations.
Priority	Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get compute resources before low-priority jobs. For more information, see Job priority.
Submission Time	The time when the instance was submitted.
Start Running Time	The time when the job received its first computing resource. If the running time of a job is too short or the job does not consume computing resources (for example, a DDL statement), the submission time of the job is used instead. By default, this column is not displayed. You can customize the list options to display it.
Waiting Duration	The duration from the time a job is submitted to the time it starts to run. If the job has no start time, the duration from the submission time to the snapshot time is displayed. By default, this column is not displayed. You can customize the list options to display it.
Execution Duration	The duration from the start time to the end time of a job. By default, this column is not displayed. You can customize the list options to display it.
End Time	The end time of a job.
Total Duration	The total duration from the time a job is submitted to the time the job is terminated.
Total Used CPU Resources	The total CPU consumption throughout the job execution. The unit is `100Cores`.
Total Amount of Used Memory Resources	The total memory consumption throughout the job execution. The unit is `MB*s`.
CPU Utilization Percentage Snapshot	The latest CPU usage percentage collected for the job 1 minute before the end of the time range. It is calculated by using the formula: CPU Cores in Use/(CPU Lower Limit for Reserved CUs + Elastically Reserved Minimum CPU Utilization). This information is not available for pay-as-you-go jobs or jobs for which snapshot information cannot be collected. Sorting is not supported. If you need to sort data, go to the snapshot list.
Memory Usage Percentage Snapshot	The latest memory usage percentage collected for the job 1 minute before the end of the time range. It is calculated by using the formula: Memory Size Occupied/(Minimum Reserved CU Memory + Minimum Elastically Reserved CU Memory). This information is not available for pay-as-you-go jobs or jobs for which snapshot information cannot be collected. Sorting is not supported. If you need to sort data, go to the snapshot list.
Input Size	The amount of input data computed by the job.
Intelligent Diagnostics	The tags generated from the job smart diagnosis results.
ExtPlatformId	The client that initiated the job, such as DataWorks. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
ExtNodeId	The task ID that corresponds to the job initiator, such as the ID of a DataWorks scheduling node. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
ExtNodeName	The task name that corresponds to the job initiator, such as the name of a DataWorks scheduling node. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
ExtNodeOnDuty	The task owner that corresponds to the job initiator, such as the owner of a DataWorks scheduling node. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
Signature	The signature of an SQL job. You can use this signature to search for the instances on which the SQL job runs.
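
The two usage percentage formulas in the table above share the same shape: the amount in use divided by the sum of the reserved minimum and the elastically reserved minimum. The following Python sketch shows the arithmetic only. The function and parameter names are hypothetical, and returning `None` when the denominator is zero is an assumption that stands in for the "not available" case of pay-as-you-go jobs.

```python
def usage_percentage(in_use, reserved_min, elastic_reserved_min):
    """Usage percentage relative to the reserved plus elastically reserved minimum.

    Returns None when there is no reserved capacity to compare against
    (an assumption standing in for the "not available" case).
    """
    denominator = reserved_min + elastic_reserved_min
    if denominator <= 0:
        return None
    return 100.0 * in_use / denominator

# A job using 30 cores in a quota with 50 reserved and 10 elastically reserved cores.
print(usage_percentage(30, 50, 10))
```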

Snapshot list parameter description

Column Name	Description
Instance ID	An instance is generated for each MaxCompute job, and each instance has a corresponding Instance ID. The project, computing quota, and type of the job are also displayed. Click LogView in the Actions column of the Instance ID to go to the LogView page and view the specific progress of the job. For information about how to view LogView, see Use LogView 2.0 to view job run information. Click Insights in the Actions column of the Instance ID to go to the Job Insights page. On this page, you can view the diagnosis results, resource consumption, and similar job information. For more information, see Job Insights.
Snapshot Time	The time when snapshot information is collected for the job.
Snapshot Status	The status of a job at the snapshot time.
Job Owner	The Alibaba Cloud account that runs the MaxCompute job. You can find the job owner based on the account information. If a job consumes too many resources and affects other jobs, you can contact the owner to terminate the job. For information about how to terminate a job, see Instance operations.
Priority	Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get compute resources before low-priority jobs. For more information, see Job priority.
CPU Cores in Use	The number of CPU cores in use by a job at the snapshot time.
Requested CPU Cores	The number of CPU cores requested by a job at the snapshot time.
CPU Satisfaction Rate	The number of CPU cores in use divided by the number of requested CPU cores of a job at the snapshot time.
CPU Utilization Percentage Snapshot	The latest CPU usage percentage collected for the job 1 minute before the end of the time range. It is calculated by using the formula: CPU Cores in Use/(CPU Lower Limit for Reserved CUs + Elastically Reserved Minimum CPU Utilization). This information is not available for pay-as-you-go jobs or jobs without snapshot data.
Memory Size Occupied	The amount of memory in use by a job at the snapshot time. The unit is automatically adjusted for display.
Requested Memory Size	The amount of memory requested by a job at the snapshot time. The unit is automatically adjusted for display.
Memory Satisfaction Rate	The memory size occupied divided by the requested memory size of a job at the snapshot time.
Memory Usage Percentage Snapshot	The memory usage percentage of a job at the observation time, which is calculated by using the following formula: Amount of used memory resources/(Amount of reserved memory resources + Amount of elastically reserved memory resources). This information is not available for pay-as-you-go jobs or jobs without snapshot data.
Submission Time	The time when a job was submitted.
Total Duration	The total duration from the time a job is submitted to the time the job is terminated.
ExtPlatformId	The client that initiated the job, such as DataWorks. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
ExtNodeId	The task ID that corresponds to the job initiator, such as the ID of a DataWorks scheduling node. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
ExtNodeOnDuty	The task owner that corresponds to the job initiator, such as the owner of a DataWorks scheduling node. The requester needs to pass this information when initiating the job. For more information about the conventions for passing information, see General job identification conventions.
Signature	The signature of an SQL job. You can use this signature to search for the instances on which the SQL job runs.
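
The CPU and memory satisfaction rates in the snapshot list are simple ratios of what a job holds to what it requested at the snapshot time. A minimal Python sketch of that ratio follows. The function name is hypothetical, and returning `None` for a job that requested nothing is an assumption.

```python
def satisfaction_rate(in_use, requested):
    """Amount in use divided by amount requested at the snapshot time.

    Returns None when nothing was requested (assumption: the rate is undefined).
    """
    if requested <= 0:
        return None
    return in_use / requested

# A job holding 40 of the 160 CPU cores it requested.
print(satisfaction_rate(40, 160))
```

A low satisfaction rate means the job received only a small share of what it asked for at that moment, which is one sign that it is waiting for quota resources.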

Common O&M scenarios

View the run details of a specific job

Scenario

You need to view the running status of a job initiated by a DataWorks hourly scheduling node or audit a specific MaxCompute job.

Procedure

Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose Observation O&M > Jobs.
On the Jobs page, select a Time Range as required and click Search.
Above the job list, select ExtNodeId or Instance ID and enter the corresponding value.
Click the icon to apply the filter.
In the results list, click LogView in the Actions column of the instance to open the LogView page and view detailed job execution information. For more information about LogView, see Use LogView 2.0 to view job run information.

View job run details for a specific period

Scenario

You are responsible for two projects, Project_1 and Project_2. You need to view the jobs that ran in these projects in the last day and identify which jobs failed so that you can resolve the failures.

Procedure

Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose Observation O&M > Jobs.
Set the Time Range to 1d or set the Time Range from 00:00:00 on the current day to the current time.
In the Choose Project drop-down list, select Project_1 and Project_2, and then click Search.
In the results list, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job run information. For more information about LogView, see Use LogView 2.0 to view job run information.
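
The selection made in this procedure (two projects, the last day, failed jobs only) can be expressed as a filter over job records. The following Python sketch works on hypothetical records with illustrative field names; it does not call any MaxCompute API.

```python
from datetime import datetime, timedelta

def failed_jobs(records, projects, now, window=timedelta(days=1)):
    """Jobs in the given projects that failed and ended within the window before `now`."""
    since = now - window
    return [
        job for job in records
        if job["project"] in projects
        and job["status"] == "Failed"
        and job["end_time"] is not None
        and since <= job["end_time"] <= now
    ]

# Hypothetical sample data.
now = datetime(2024, 6, 2, 12, 0)
records = [
    {"instance_id": "i1", "project": "Project_1", "status": "Failed", "end_time": datetime(2024, 6, 2, 9, 0)},
    {"instance_id": "i2", "project": "Project_2", "status": "Success", "end_time": datetime(2024, 6, 2, 10, 0)},
    {"instance_id": "i3", "project": "Project_3", "status": "Failed", "end_time": datetime(2024, 6, 2, 11, 0)},
    {"instance_id": "i4", "project": "Project_2", "status": "Failed", "end_time": datetime(2024, 5, 30, 11, 0)},
]

print([job["instance_id"] for job in failed_jobs(records, {"Project_1", "Project_2"}, now)])
```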

View the resource usage of jobs for a subscription quota at a specific time

Scenario

The resource usage of the Default Subscription Quota is high, and many jobs are waiting. You need to identify the specific jobs that are consuming the quota resources.

Procedure

Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose Observation O&M > Jobs.
Set the Time Range to 1h or customize the start and end times.
Set the Select Quota parameter to Default Subscription Quota and click Search.
In the results list, check the CPU Utilization Percentage Snapshot and Memory Usage Percentage Snapshot for jobs whose Latest Status is Running. You can check whether the job with the highest usage percentage is consistent with your business requirements and use other job information to determine whether the job is running as expected or needs to be terminated.
In the results list, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job run information. For more information about LogView, see Use LogView 2.0 to view job run information.
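
The check in this procedure amounts to ranking running jobs by their usage percentage snapshots. The following Python sketch shows that ranking on hypothetical rows. The field names (`cpu_pct`, `mem_pct`) are illustrative, and skipping rows without snapshot data is an assumption.

```python
def top_consumers(rows, limit=5):
    """Running jobs with snapshot data, highest CPU usage percentage first.

    Memory usage percentage breaks ties. Rows without snapshot data are skipped.
    """
    running = [
        row for row in rows
        if row["status"] == "Running" and row["cpu_pct"] is not None
    ]
    running.sort(key=lambda row: (row["cpu_pct"], row["mem_pct"] or 0.0), reverse=True)
    return running[:limit]

# Hypothetical sample data.
rows = [
    {"instance_id": "i1", "status": "Running", "cpu_pct": 12.5, "mem_pct": 8.0},
    {"instance_id": "i2", "status": "Running", "cpu_pct": 71.0, "mem_pct": 40.0},
    {"instance_id": "i3", "status": "Success", "cpu_pct": None, "mem_pct": None},
    {"instance_id": "i4", "status": "Running", "cpu_pct": None, "mem_pct": None},
]

print([row["instance_id"] for row in top_consumers(rows, limit=2)])
```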

View the run details of an accelerated query job

Scenario

You need to view the running status and details of accelerated query jobs from the last day.

Procedure

Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose Observation O&M > Jobs.
Set the Time Range to 1d and the Job Type to SQLRT (Query Acceleration), and then click Search.
In the results list, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job run information. For more information about LogView, see Use LogView 2.0 to view job run information.

For jobs that use the accelerated query feature, multiple SQL commands may run in the same session. One session corresponds to one Instance ID. You can view the running status of all SQL commands in the session using the LogView for that Instance ID. Because of this, when you view accelerated query jobs on the Jobs page, note the following:

If the session has not exited (for example, if some SQL commands have finished but others are still running), the job's Latest Status is Running.
If the session expires or exits because the interface was closed, the job's Latest Status is Canceled.

View job resource consumption and the resource allocation of a computing quota at a specific time

Scenario

A job has been running for an extended period without completing, and the cause is difficult to identify in LogView. Alternatively, a completed job may have a longer-than-expected runtime. You need to analyze whether a resource supply issue is the cause.

Procedure

Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose Observation O&M > Jobs.
Select a Time Range and filter by the Select Quota parameter, then click Search.
Click Insights in the Actions column of the required Instance ID to open the Job Insights page.
On the Resource Consumption tab, view the resource consumption of the job throughout its lifecycle.
- The resource consumption chart shows the change trends of used CUs and waiting CUs at the job level over time, along with the change trends of used CUs and waiting CUs at the quota level. If you find that the job's CU usage is low while the quota-level CU usage is high or continuously at its limit, this indicates that the quota resources are insufficient and other jobs are competing for compute resources with the current job.
- Click a time point on the horizontal axis of the resource consumption chart to view the resource allocation of the compute quota at that time. This includes the number of jobs and their priority distribution. Click the color block for a target priority to open the job list and view the details of the corresponding jobs. This helps you identify which specific jobs are competing for compute resources with the current job. Based on your business needs, you can then optimize task execution, adjust job priorities, or manage compute resources. For more information, see Job priority or Compute resources - Quota management.
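
The reasoning in the first bullet (low job-level CU usage while quota-level CU usage is high or at its limit) can be written as a simple heuristic. The following Python sketch is illustrative only. The function name, the 90% threshold, and the rule that "low usage" means more waiting CUs than used CUs are all assumptions, not values defined by MaxCompute.

```python
def looks_quota_constrained(job_used_cu, job_waiting_cu, quota_used_cu, quota_limit_cu,
                            high_watermark=0.9):
    """True when the job is mostly waiting while the quota is at or near its limit.

    Assumptions: "near its limit" means at least `high_watermark` of the quota is
    in use, and "low job usage" means the job waits for more CUs than it holds.
    """
    if quota_limit_cu <= 0:
        return False
    quota_saturated = quota_used_cu / quota_limit_cu >= high_watermark
    job_starved = job_waiting_cu > job_used_cu
    return quota_saturated and job_starved

# The job holds 10 CUs and waits for 40 while the quota uses 95 of 100 CUs.
print(looks_quota_constrained(10, 40, 95, 100))
```

When the heuristic returns True for most points in a job's lifecycle, the second bullet applies: inspect the quota's resource allocation at those times to see which jobs hold the compute resources.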

Next steps

If a job's resource usage percentage is consistently high and many other jobs are waiting, take the following actions:

If the job does not meet business requirements, stop the job.
If the job meets business requirements, the quota resource configuration may not be suitable. In this case, optimize the resource configuration. For more information, see Compute resource configuration optimization.

References

To view job information, check job status, and stop jobs using commands, see Instance operations.