MaxCompute: Monitoring and alerting on job timeout

Last Updated: Mar 26, 2026

Jobs that run too long can delay business operations and signal underlying issues such as resource contention or excessive computational load. MaxCompute integrates with CloudMonitor to let you set threshold-based alert rules on job runtime. When a job exceeds the threshold, CloudMonitor notifies your designated alert contact so you can investigate and act quickly.

Monitoring metrics

MaxCompute provides two metrics for monitoring job runtimes. Both metrics measure total runtime, including wait time.

| Metric | Scope | Best for |
| --- | --- | --- |
| Job runtime | All jobs in a MaxCompute project | Analyst projects where jobs typically run fast. Use this metric to detect resource contention or excessive computational load early. |
| Job runtime_SQL type | All SQL jobs in a MaxCompute project | Production projects. Use this metric to catch SQL job timeouts before they cause business delays. |
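Because both metrics include wait time, a job that spends most of its life queued can still breach a runtime threshold. The following Python sketch illustrates only that arithmetic. CloudMonitor performs the actual evaluation, and the function names, timestamps, and the strict greater-than comparison are assumptions made for illustration, not part of any MaxCompute or CloudMonitor API.

```python
from datetime import datetime, timedelta


def total_runtime(submitted_at: datetime, started_at: datetime,
                  ended_at: datetime) -> timedelta:
    """Return wait time plus execution time (hypothetical helper)."""
    wait_time = started_at - submitted_at      # time spent queued
    execution_time = ended_at - started_at     # time spent running
    return wait_time + execution_time


def exceeds_threshold(runtime: timedelta, threshold: timedelta) -> bool:
    """Assumed rule: alert only when runtime is strictly above the threshold."""
    return runtime > threshold


# Hypothetical job: queued for 20 minutes, then ran for 15 minutes.
submitted = datetime(2026, 3, 26, 9, 0, 0)
started = datetime(2026, 3, 26, 9, 20, 0)
ended = datetime(2026, 3, 26, 9, 35, 0)

runtime = total_runtime(submitted, started, ended)
print(runtime)                                            # 0:35:00
print(exceeds_threshold(runtime, timedelta(minutes=30)))  # True
```

In this example the job executes for only 15 minutes, yet its 35-minute total runtime exceeds a 30-minute threshold because the queue time counts.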

Prerequisites

Before you begin, make sure the following requirements are met.

Permissions

If a Resource Access Management (RAM) user needs to configure monitoring and alerting, grant the following policies to the RAM user in the RAM console. These are required in addition to standard CloudMonitor permissions.

  • AliyunCloudMonitorFullAccess

  • AliyunDataWorksFullAccess

For details, see Grant permissions to a RAM user.
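As a quick pre-check, you can compare the policies already attached to a RAM user against the two policies listed above. The sketch below is a minimal illustration: it does not call the RAM API, the list of attached policies is entered by hand, and the function name is hypothetical.

```python
# The two policies named in this topic.
REQUIRED_POLICIES = {
    "AliyunCloudMonitorFullAccess",
    "AliyunDataWorksFullAccess",
}


def missing_policies(attached):
    """Return the required policies that are not in `attached`, sorted by name."""
    return sorted(REQUIRED_POLICIES - set(attached))


# Hypothetical RAM user that has only the CloudMonitor policy attached.
print(missing_policies(["AliyunCloudMonitorFullAccess"]))
# ['AliyunDataWorksFullAccess']
```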

Supported regions

China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Ulanqab), China (Chengdu), China (Hong Kong), US (Silicon Valley), US (Virginia), Malaysia (Kuala Lumpur), Japan (Tokyo), Germany (Frankfurt), Indonesia (Jakarta), UK (London), and Singapore.

Set up an alert rule

Step 1: Create an alert contact

  1. Log on to the CloudMonitor console.

  2. In the left navigation pane, choose Alerts > Alert Contacts.

  3. On the Alert Contacts page, click the Alert Contacts tab.

  4. Click Create Alert Contact and fill in the required information in the Set Alert Contact window.

For more information, see Create an alert contact or alert contact group.

Step 2: Create an alert rule

  1. In the left navigation pane, choose Alerts > Alert Rules.

  2. On the Alert Rules page, click Create Alert Rule.

  3. In the Create Alert Rule dialog box, set Product to MaxCompute_Common.

  4. Configure the remaining parameters. For details, see Metric description.

Handle an alert

When a job exceeds the configured threshold, the alert contact receives a notification. Follow these steps to investigate and resolve the timeout.

  1. Log on to the MaxCompute console and select your region.

  2. In the left navigation pane, choose Observation O&M > Jobs.

  3. Find the timed-out job using the InstanceID from the alert notification.

  4. (Optional) If the job is still running, decide whether to stop it. For details, see Job O&M.

  5. Investigate based on how the job was submitted:

    | Submission type | Action |
    | --- | --- |
    | Submitted through a DataWorks node (the ExtPlantFrom value for the instance is DataWorks) | Go to the DataWorks Operation Center, view the job details, and handle the timeout. For details, see Manage auto triggered tasks. |
    | Submitted directly (not through a DataWorks node) | On the Job O&M page, in the Instance list area, click LogView in the Actions column to view detailed job information and troubleshoot the timeout. For details, see Using Logview 2.0 to view job runtime information. |
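If you triage many alerts, the routing decision in step 5 can be captured in a small helper. The Python sketch below is illustrative only: it assumes each instance is represented as a dictionary that you have filled in yourself with the InstanceID and ExtPlantFrom values shown in the console. The dictionary shape, the function name, and the sample instance IDs are hypothetical.

```python
def next_step(instance: dict) -> str:
    """Suggest where to investigate a timed-out job, following the step 5 table."""
    if instance.get("ExtPlantFrom") == "DataWorks":
        # Job was submitted through a DataWorks node.
        return "Open the DataWorks Operation Center and view the job details."
    # Any other value, or no value, is treated as a direct submission.
    return "On the Job O&M page, click LogView in the Actions column."


# Hypothetical instances copied from alert notifications.
alerts = [
    {"InstanceID": "example-instance-1", "ExtPlantFrom": "DataWorks"},
    {"InstanceID": "example-instance-2"},
]

for alert in alerts:
    print(alert["InstanceID"], "->", next_step(alert))
```

Treating a missing ExtPlantFrom value as a direct submission is a simplifying assumption. Confirm the submission type in the console before you act on the result.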