The Operation and Maintenance feature of MaxCompute allows you to view historical jobs and jobs that are running, view job details, and analyze the resource load of a job when the job is running. This helps you manage jobs.
Feature description
The Operation and Maintenance feature of MaxCompute allows you to view and manage historical jobs and jobs that are running in your project.
Data developers can use this feature to view job details, identify job exceptions, and troubleshoot job issues at the earliest opportunity. For example, data developers can terminate one or more jobs in which exceptions occur to handle job issues.
Administrators can use this feature to view the resource load at a specific point in time and allocate and manage system resources in an efficient manner based on the quota group to which the resource belongs. This helps improve job execution efficiency and performance.
You can configure filter conditions to filter jobs on the Jobs page of the MaxCompute console. This helps you query the details of a job and analyze a job. You can perform the following operations on the Jobs page.
Operations
Filter jobs
You can configure filter parameters to query the details of jobs. The following table describes the filter parameters.
Sort jobs
The job filtering results are sorted by job completion time in descending order, with unfinished jobs appearing at the top. Basic single-column sorting and advanced multi-column sorting are supported.
Basic single-column sorting: Sort the column with a sort button in the list in ascending or descending order.
Advanced multi-column sorting: Click the Advanced Sorting button in the upper-right corner of the list, add columns by clicking Add Sort, and specify the sort order such as ascending and descending for each column. Click OK to apply the multi-column sorting.
Note: When advanced sorting conditions are applied, basic single-column sorting cannot be performed. To use basic single-column sorting again, click the Advanced Sorting button in the upper-right corner of the list, click Reset, and then click OK.
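The two sorting modes can be sketched in Python as follows. This is only an illustration of the ordering rules described above: the `Job` record, its field names, and both helper functions are hypothetical and are not part of any MaxCompute API. The console performs the sorting itself.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Job(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    instance_id: str
    priority: int
    end_time: Optional[datetime]  # None means the job has not finished


def default_order(jobs):
    """Unfinished jobs first, then finished jobs by end time, newest first."""
    unfinished = [j for j in jobs if j.end_time is None]
    finished = sorted(
        (j for j in jobs if j.end_time is not None),
        key=lambda j: j.end_time,
        reverse=True,
    )
    return unfinished + finished


def multi_column_order(jobs, columns):
    """Sort by several columns; `columns` is a list of (field, descending).

    Assumes the chosen fields hold comparable, non-None values.
    """
    ordered = list(jobs)
    # list.sort is stable, so apply the least significant column first.
    for field, descending in reversed(columns):
        ordered.sort(key=lambda j: getattr(j, field), reverse=descending)
    return ordered


jobs = [
    Job("inst_a", 1, datetime(2024, 1, 1, 10, 0)),
    Job("inst_b", 9, None),
    Job("inst_c", 1, datetime(2024, 1, 1, 12, 0)),
]
default_ids = [j.instance_id for j in default_order(jobs)]
advanced_ids = [
    j.instance_id
    for j in multi_column_order(jobs, [("priority", False), ("instance_id", True)])
]
```

In this sample, `default_ids` lists the unfinished job first, and `advanced_ids` orders by ascending priority value and then by descending instance ID.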
View job details
To view the details of a job, perform the following steps: In the job list, find the desired job and click LogView in the Actions column to go to the LogView page. On the page that appears, view the status, details, and results of the job.
Terminate jobs
You can terminate one or more jobs that are in the Running state at a time.
Jobs Insight
You can perform insight operations on an individual job to view the job summary, the resource consumption of the job, and the resource allocation of its quota at a specific point in time, and to trigger intelligent diagnostics for the job.
Note: Intelligent diagnostics are available only for SQL jobs.
Jobs with a runtime less than 2 minutes or jobs of types other than SQL, MapReduce, Spark, and Mars do not have job-level resource consumption data.
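The availability rules above can be expressed as two small predicates. This is a minimal sketch, assuming that job types are represented by the strings shown here. The function names and type labels are hypothetical and do not correspond to a MaxCompute API.

```python
from datetime import timedelta

# Job types that the text lists as having job-level resource consumption data.
# The string labels are assumptions made for this sketch.
TYPES_WITH_RESOURCE_DATA = {"SQL", "MapReduce", "Spark", "Mars"}
MIN_RUNTIME = timedelta(minutes=2)


def has_resource_consumption_data(job_type: str, runtime: timedelta) -> bool:
    """True if the job type is supported and the job ran for at least 2 minutes."""
    return job_type in TYPES_WITH_RESOURCE_DATA and runtime >= MIN_RUNTIME


def supports_intelligent_diagnostics(job_type: str) -> bool:
    """Intelligent diagnostics are described as available only for SQL jobs."""
    return job_type == "SQL"
```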
View the chart that displays job statistics
Based on the query results, the chart shows the number of jobs as a stacked column chart grouped by time and job state. This helps you view the overall status of your jobs.
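The data behind such a stacked column chart is a count of jobs per time bucket and job state. The following sketch shows one way to compute it. The hourly bucket size, the choice of timestamp, and the state names are assumptions made for illustration, because the text does not specify them.

```python
from collections import Counter
from datetime import datetime


def count_jobs_by_hour_and_state(jobs):
    """Count jobs per (hour bucket, state).

    `jobs` is an iterable of (timestamp, state) pairs. Each distinct state
    would become one segment of the stacked column for that hour.
    """
    counts = Counter()
    for timestamp, state in jobs:
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(bucket, state)] += 1
    return counts


sample = [
    (datetime(2024, 1, 1, 10, 5), "Success"),
    (datetime(2024, 1, 1, 10, 50), "Failed"),
    (datetime(2024, 1, 1, 10, 59), "Success"),
    (datetime(2024, 1, 1, 11, 1), "Running"),
]
chart_data = count_jobs_by_hour_and_state(sample)
```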
Jobs
Query results are obtained based on filter conditions and provide job information for you to manage jobs. MaxCompute provides Regular Job List and Snapshot List on the Jobs page for you to obtain job information in different scenarios.
Regular Job List: used to view all job information over a period of time.
Snapshot List: used to view snapshot information of jobs that are running at a specific point in time, including the snapshot status, and the following information at the snapshot time: the number of CPU cores in use, the number of requested CPU cores, CPU utilization percentage, memory size occupied, requested memory size, and memory usage percentage.
The following job information cannot be collected:
Snapshot information of some jobs. The system collects snapshot information at an interval of 3 minutes. As a result, the system does not collect snapshots of jobs that are started within 3 minutes before collection.
Information about specific MaxCompute jobs that are created based on the PAI service, especially the jobs that are created by using RAM users.
Information about jobs in the projects of the MaxCompute developer edition. The MaxCompute developer edition will be phased out.
Data is processed at specific intervals. Therefore, a job may appear in the Running state in the query results even though the LogView page shows that the job is complete. In most cases, this occurs when a job runs for a very short period of time. If this occurs, rely on the job state that is displayed on the LogView page.
Regular job list
The following table describes the parameters.
Column name | Description |
Instance ID | The instance ID of the job. Each MaxCompute job runs as an instance and has an instance ID. This column also shows the project, computing quota, and type of the job. |
Latest Status | The latest status of the job. |
Job Owner | The Alibaba Cloud account that is used to run the MaxCompute job. You can find the job owner based on the account information. If a job occupies an excessive number of resources and affects the execution of other jobs, you can contact the job owner to terminate the job. For more information about how to terminate a job, see Instance operations. |
Priority | The priority of the job. Each job has a priority, ranging from 0 to 9. A smaller value indicates a higher priority. High-priority jobs acquire computing resources before low-priority jobs. For more information, see Job priority. |
Submission Time | The time at which the job was submitted. |
Start Running Time | The time when the job received the first batch of computing resources. For the jobs that run for a short period of time or do not consume computing resources such as DDL statements, the job submission time is used instead. By default, the column is not displayed. You can select the relevant option in the Choose Display Fields dialog box to display the column. |
Waiting Duration | The duration from the time the job is submitted to the time the job starts to run. By default, the column is not displayed. You can select the relevant option in the Choose Display Fields dialog box to display the column. |
Execution Duration | The duration from the start running time to the end time of the job. By default, the column is not displayed. You can select the relevant option in the Choose Display Fields dialog box to display the column. |
End Time | The time at which the running of the job was complete. |
Total Duration | The interval from the time when the job was submitted to the time when the job was complete. |
Total Used CPU Resources | The total CPU resources that are used when you run a job. Unit: |
Total Amount of Used Memory Resources | The total memory consumption when you run a job. Unit: |
Scan Size | The amount of input data for computing in the job. |
Intelligent Diagnostics | Tags generated based on the results of intelligent job diagnostics. |
ExtPlantFrom | The client that initiates the job, such as DataWorks. This parameter is passed in by the client when it initiates the job. |
ExtNodeId | The ID of the task on which the job runs, such as the node ID of DataWorks. This parameter is passed in by the client when it initiates the job. |
ExtNodeOnDuty | The Alibaba Cloud account ID of the task owner, such as the account ID of the DataWorks node owner. This parameter is passed in by the client when it initiates the job. |
Signature | The signature of the SQL job. You can use the signature to find the instance that is created each time the same SQL statement is executed. |
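The relationship between the time and duration columns in the table above can be sketched as follows. The function and parameter names are hypothetical. Modeling the documented fallback (submission time used in place of the start running time) as a missing `started` value is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def job_durations(
    submitted: datetime,
    started: Optional[datetime],
    ended: datetime,
) -> Tuple[timedelta, timedelta, timedelta]:
    """Return (waiting, execution, total) durations for a finished job."""
    if started is None:
        # Short jobs and jobs that use no computing resources (such as DDL
        # statements) use the submission time as the start running time.
        started = submitted
    waiting = started - submitted    # Waiting Duration
    execution = ended - started      # Execution Duration
    total = ended - submitted        # Total Duration
    return waiting, execution, total


waiting, execution, total = job_durations(
    submitted=datetime(2024, 1, 1, 10, 0, 0),
    started=datetime(2024, 1, 1, 10, 0, 30),
    ended=datetime(2024, 1, 1, 10, 5, 30),
)
```

With these definitions, the waiting duration plus the execution duration always equals the total duration.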
Snapshot list
The following table describes the parameters.
Column name | Description |
Instance ID | The instance ID of the job. Each MaxCompute job runs as an instance and has an instance ID. This column also shows the project, computing quota, and type of the job. Note: You can find the desired instance and click LogView in the Actions column to go to the LogView page and view the progress of the job. For more information about how to view the job progress on the LogView page, see Use LogView to view job information. You can also click Insights in the Actions column to go to the Job Insights page and view the diagnostic results and resource consumption information of the job, as well as information about similar jobs. For more information, see Job insights. |
Snapshot Time | The time when the job snapshot information is collected. |
Snapshot Status | The status of the job at the snapshot time. |
Job Owner | The Alibaba Cloud account that is used to run the MaxCompute job. You can find the job owner based on the account information. If a job occupies an excessive number of resources and affects the execution of other jobs, you can contact the job owner to terminate the job. For more information about how to terminate a job, see Instance operations. |
Priority | The priority of the job. Each job has a priority, ranging from 0 to 9. A smaller value indicates a higher priority. High-priority jobs acquire computing resources before low-priority jobs. For more information, see Job priority. |
CPU Cores in Use | The number of CPU cores in use for the job at the snapshot time. |
Requested CPU Cores | The number of requested CPU cores of the job at the snapshot time. |
CPU Satisfaction Rate | The CPU satisfaction rate of the job at the snapshot time, which is calculated by using the following formula: CPU Cores in Use/Requested CPU Cores. |
CPU Utilization Percentage Snapshot | The CPU utilization percentage of the job at the observation time, which is calculated by using the following formula: |
Memory Size Occupied | The size of occupied memory space of the job at the snapshot time. The unit is dynamically adjusted based on the memory size. |
Requested Memory Size | The size of requested memory space of the job at the snapshot time. The unit is dynamically adjusted based on the memory size. |
Memory Satisfaction Rate | The memory satisfaction rate of the job at the snapshot time, which is calculated by using the following formula: Memory Size Occupied/Requested Memory Size. |
Memory Usage Percentage Snapshot | The memory usage percentage of the job at the observation time, which is calculated by using the following formula: |
Submission Time | The time at which the job was submitted. |
Total Duration | The interval from the job submission time to the snapshot time. |
ExtPlantFrom | The client that initiates the job, such as DataWorks. This parameter is passed in by the client when it initiates the job. |
ExtNodeId | The ID of the task on which the job runs, such as the node ID of DataWorks. This parameter is passed in by the client when it initiates the job. |
ExtNodeOnDuty | The Alibaba Cloud account ID of the task owner, such as the account ID of the DataWorks node owner. This parameter is passed in by the client when it initiates the job. |
Signature | The signature of the SQL job. You can use the signature to find the instance that is created each time the same SQL statement is executed. |
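The two satisfaction rates in the table above use the same ratio: the amount in use divided by the amount requested. The sketch below applies that formula. The function name is hypothetical, and returning `None` when nothing was requested is an assumption, because the text does not say how the console handles that case.

```python
from typing import Optional


def satisfaction_rate(in_use: float, requested: float) -> Optional[float]:
    """Return in_use / requested, or None if nothing was requested.

    Applies to both CPU (cores in use / requested cores) and memory
    (memory occupied / requested memory, in the same unit).
    """
    if requested <= 0:
        return None
    return in_use / requested


cpu_rate = satisfaction_rate(in_use=50, requested=200)       # CPU cores
memory_rate = satisfaction_rate(in_use=96.0, requested=128.0)  # e.g. GB
```

A rate well below 1 at the snapshot time means that the job received only part of the resources it requested.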
Examples of O&M scenarios
View the details about a specific job
Scenario
You want to view the details of a specific MaxCompute job or a job that is scheduled by a DataWorks node on an hourly basis.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.
Specify the Time Range parameter based on your business requirements.
Click Search.
Select ExtNodeId or Instance ID from the drop-down list below the query results and enter the value of ExtNodeId or Instance ID for your job.
Click the icon to filter the jobs. In the query results, find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.
View the details about a job in a specific time range
Scenario
You want to view the jobs that ran in the Project_1 and Project_2 projects during the last day, identify failed jobs, and troubleshoot errors.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.
On the Jobs page, set the Time Range parameter to 1d, or set a custom time range from the current time of the previous day to the current time of the current day.
Select Project_1 and Project_2 from the Choose Project drop-down list.
In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.
View the resources occupied by a job with a subscription quota at a specific point in time
Scenario
A large number of resources in the quota group named Subscription Default Quota are occupied. As a result, multiple jobs are waiting for the resources of the quota group. You can use the following method to view the jobs that use the quota.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.
Set the Time Range parameter to 1h. Alternatively, specify a custom start time and set the end time to the current time. The end time is the time when you observe the job.
Set the Select Quota parameter to Subscription Default Quota.
Click Search.
In the query results, view the CPU Utilization Percentage Snapshot and Memory Usage Percentage Snapshot parameters of the jobs whose Snapshot Status is Running. Check whether the jobs that have large values for these parameters meet your business requirements. Based on other job information, determine whether each of these jobs runs as expected or needs to be terminated.
Note: In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.
View details of an MCQA job
Scenario
You want to view the status and details of the MCQA jobs that ran in the last day.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.
Set the Time Range parameter to 1d and select SQLRT (Query Acceleration) from the Job Type drop-down list.
Click Search.
In the query results, you can find the desired instance and click LogView in the Actions column to view the details of the job on the LogView page. For more information about LogView, see Use LogView V2.0 to view job information.
Note: For MCQA jobs, multiple SQL statements may be executed in the same session. One session corresponds to one instance ID. You can click an instance ID to view the status of all SQL statements in the session on the LogView page. Take note of the following points when you query a job of this type on the Operation and Maintenance page:
An active session indicates that some SQL statements are still being executed. If a session remains active, the job is in the Running state.
If a session expires or is closed, the job is in the Canceled state.
View the resource consumption of a job and the resource allocation of computing quotas at a specific point in time
Scenario
If a job runs for a long period of time without completing and you cannot locate the cause on the LogView page, or if a completed job ran slowly, you can analyze the job to check whether the issue is caused by insufficient resources.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, click Jobs.
On the Jobs page, specify the Time Range and Select Quota parameters and click Search to filter MaxCompute jobs.
In the obtained results, find the desired job and click Insight in the Actions column to go to the Job Insights page.
On the CU Usage tab, you can view the resource consumption in the lifecycle of the job.
Based on the resource consumption chart, you can view the trend of the number of compute units (CUs) that the job uses and the number of CUs that the job waits for within a specific period of time, along with the trend of the CU metrics at the quota group level. If the job uses only a small number of CUs while the jobs in the quota group together use a large number of CUs, or the usage even continuously reaches the upper limit, the resources in the quota group are insufficient. In this case, other jobs preempt computing resources from the current job.
You can click a time point in the horizontal axis of the resource consumption trend chart to view the resource allocation in the quota group at the point in time. You can view the number of jobs that are using CUs and the number of jobs that wait to use CUs and view the statistics on the priorities of existing jobs. You can click the legend that corresponds to the desired priority to go to the job list and view the details of the jobs. This way, you can identify the jobs that preempt computing resources from the current job. You can adjust job priorities or manage computing resources to optimize job execution based on your business requirements. For more information, see Job priority or Manage quotas in the new MaxCompute console.
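The reasoning in the preceding steps can be approximated with a simple heuristic: a job is likely starved by its quota group when it is waiting for CUs while the quota group runs at or near its upper limit. The sketch below is illustrative only. The `CuSample` record, the function name, and both thresholds (`saturation` and `min_fraction`) are assumptions and are not values used by MaxCompute.

```python
from typing import NamedTuple, Sequence


class CuSample(NamedTuple):
    # One hypothetical point from the resource consumption chart.
    job_used: float     # CUs that the job is using
    job_waiting: float  # CUs that the job is waiting for
    quota_used: float   # CUs used by all jobs in the quota group
    quota_limit: float  # Upper limit of the quota group


def quota_looks_insufficient(
    samples: Sequence[CuSample],
    saturation: float = 0.95,   # assumed "near the upper limit" threshold
    min_fraction: float = 0.5,  # assumed share of samples that must match
) -> bool:
    """True if the job waits for CUs while the quota group is saturated
    in at least `min_fraction` of the samples."""
    if not samples:
        return False
    starved = sum(
        1
        for s in samples
        if s.job_waiting > 0 and s.quota_used >= saturation * s.quota_limit
    )
    return starved / len(samples) >= min_fraction


contended = quota_looks_insufficient([
    CuSample(job_used=2, job_waiting=20, quota_used=98, quota_limit=100),
    CuSample(job_used=3, job_waiting=18, quota_used=100, quota_limit=100),
    CuSample(job_used=20, job_waiting=0, quota_used=60, quota_limit=100),
])
```

If the heuristic flags a job, the priority statistics for the selected point in time show which jobs hold the CUs.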
What to do next
If a job occupies an excessive number of resources and affects the execution of other jobs, perform the following operations:
If the job does not meet your business requirements, you can terminate the job.
If the job meets your business requirements, the resource settings of the quota group are not appropriate. In this case, you must optimize the resource configuration plan. For more information, see Optimization of computing resource configuration.
References
You can run commands to view the details and status of a job and terminate a job. For more information, see Instance operations.