When a task instance fails to run or completes later than expected, intelligent diagnosis checks, in sequence, the four conditions that determine whether an instance runs successfully (upstream dependencies, scheduling time, resource availability, and execution) and identifies the condition at which the problem occurred. For failed instances, the built-in AI analysis parses error logs and suggests corrective actions.
Limitations
- Intelligent diagnosis requires DataWorks Professional Edition or higher. If you use a different edition, you can try the feature for free. To access the full feature set, upgrade to Professional Edition. For more information, see Differences among DataWorks editions.
- Intelligent diagnosis is supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), US (Virginia), and UAE (Dubai).
How it works
Intelligent diagnosis examines a task instance from four angles:
- Running details: Checks in sequence whether ancestor instances completed successfully, whether the scheduled time has arrived, whether scheduling resources are available, and whether the instance itself ran without errors.
- General: Shows key timestamps and basic scheduling properties for the instance.
- Impact baseline: Shows which monitored baselines include the task and their current status.
- Historical instance: Shows 15-day trends for run duration, start time, resource wait time, and completion time.
Open intelligent diagnosis
Prerequisites
Before you begin, ensure that you have:
- Access to a DataWorks workspace at Professional Edition or higher
To go to Operation Center:
- Log on to the DataWorks console. In the top navigation bar, select the target region.
- In the left-side navigation pane, choose Data Development and O&M > Operation Center.
- Select the target workspace from the drop-down list and click Go to Operation Center.
Navigate to the diagnosis page
From Operation Center, open the Intelligent Diagnosis page using one of the following methods:
Option 1: From the instance list
- In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Instances.
- On the Instance Perspective tab, find the target instance.
- Click Perform Diagnostics in the Actions column.
Option 2: From the DAG page
- In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Instances.
- On the Instance Perspective tab, find the target instance and click DAG in the Actions column.
- On the DAG page, right-click the instance and select Instance Diagnose.
Option 3: Search by instance ID
In the left-side navigation pane, choose O&M Assistant > Intelligent Diagnosis, then enter the instance ID to find the instance. This page supports search by instance ID only.
Diagnose an instance
Running details tab
The Running Details tab walks through the four conditions an instance must meet before it can run. DataWorks checks them in sequence.
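The sequential logic can be illustrated with a minimal Python sketch. It is not DataWorks code and does not call any DataWorks API: the `InstanceState` class, its fields, and the `first_failed_check` function are hypothetical names chosen for illustration. The sketch only shows the idea that the checks are evaluated in a fixed order and that the first one that does not pass is where the instance is held up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Order of the checks as described in this section.
CHECKS = ("upstream", "timing", "resources", "execution")


@dataclass
class InstanceState:
    """Hypothetical snapshot of one task instance (not a DataWorks type)."""
    ancestors_succeeded: bool   # all ancestor instances completed successfully
    scheduled_time: datetime    # time the instance is scheduled to run
    resources_available: bool   # scheduling resource group has free capacity
    run_succeeded: bool         # the instance itself ran without errors


def first_failed_check(state: InstanceState, now: datetime) -> Optional[str]:
    """Return the name of the first check that does not pass, or None."""
    results = {
        "upstream": state.ancestors_succeeded,
        "timing": now >= state.scheduled_time,
        "resources": state.resources_available,
        "execution": state.run_succeeded,
    }
    # Walk the checks in order and stop at the first one that fails.
    for name in CHECKS:
        if not results[name]:
            return name
    return None
```

For example, an instance whose ancestors have finished and whose scheduled time has passed, but whose resource group is fully occupied, would be reported as blocked at `"resources"`.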
Upstream nodes
Displays the status of ancestor instances. If an ancestor instance has not run successfully, the current instance is blocked.
To diagnose a failed ancestor instance, click Instance Diagnose in the Operation column of that instance.
Tip: If the upstream dependency chain is complex and multiple ancestor instances are incomplete, use the upstream analysis feature on the Upstream Analysis tab of the DAG page to identify the specific ancestor instances blocking execution. Then use intelligent diagnosis on those instances.
Timing check
Checks whether the scheduled time for the instance has arrived. This check runs only after the Upstream Nodes check passes.
When you configure scheduling properties for a task on the DataStudio page, you specify the time at which the scheduling system should run the task. The task may actually start later than this scheduled time, for example because an ancestor task failed.
Resources
Shows resource usage for the scheduling resource group assigned to this instance. If the check fails, the scheduling resources are fully occupied and the instance waits until resources are released.
| Section | Description |
|---|---|
| Scheduling resource information | Resource group name, number of running instances, and number of waiting instances on that resource group |
| Diagnosis Results | Execution status of the current instance |
| Resource Usage Trends | Resource usage per time period for the resource group; for shared resource groups, also shows how long the instance has been waiting |
To reduce resource contention, use serverless resource groups. If you use shared resource groups, note that peak demand occurs from 00:00 to 09:00 every day. Schedule tasks outside this window to reduce wait times.
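As a quick way to audit scheduled times against the peak window, the following Python sketch flags times that fall between 00:00 and 09:00. The function name `in_peak_window` is hypothetical, and treating 09:00 itself as outside the window is an assumption; the source states only the window boundaries.

```python
from datetime import time

# Peak window for shared resource groups, as stated above.
PEAK_START = time(0, 0)
PEAK_END = time(9, 0)  # assumption: the end boundary is exclusive


def in_peak_window(scheduled: time) -> bool:
    """Return True if the scheduled time falls in the 00:00 to 09:00 window."""
    return PEAK_START <= scheduled < PEAK_END
```

A task scheduled at 02:30 would be flagged, while one scheduled at 09:00 or 23:59 would not.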
Execution
Shows run logs, data quality (DQ) monitoring rule details, and node code for the instance. For failed instances, the Intelligent Diagnostics tab analyzes the error logs using large language models (LLMs) and provides suggested fixes.
| Tab | Description |
|---|---|
| Log | Full run log for the instance. For EMR nodes, click the EMR web UI URL to view EMR resource details. Click Intelligent Diagnostics in the lower-right corner to go directly to the AI analysis tab. |
| Intelligent Diagnostics | Analyzes error logs using Tongyi Qianwen, DeepSeek, or DW Knowledge Base. Tongyi Qianwen and DeepSeek parse error logs and generate analysis results with suggested fixes. DW Knowledge Base surfaces relevant knowledge base articles. |
| DQC | Data quality monitoring rule details. If a DQ rule is associated with the task, it is triggered after the task runs. |
| Code details | Code of the node that generated this instance. |
After reviewing the AI analysis, you can take direct action from the Intelligent Diagnostics tab: edit the instance code, rerun the instance, set the instance status to success, change the resource group for scheduling or Data Integration, submit a ticket, or apply for table permissions.
General tab
The General tab shows key timestamps and basic scheduling properties for the current instance. For details on the scheduling properties, see Configure basic properties.
Impact baseline tab
The Impact baseline tab shows which baselines include the task within their monitoring scope and the current status of each baseline. For more information about baselines, see Overview.
Historical instance tab
The Historical instance tab shows 15-day trends and a historical run list for the current node.
Trend charts
The trend charts show the following metrics for the current node over the past 15 days:
| Chart | Description |
|---|---|
| Running time | Run duration trend for the current node |
| Start run time | Start time trend for the current node |
| Time consumption of waiting for scheduling resources | Resource wait time trend for the current node |
| Completed At | Completion time trend for the current node |
Historical instance list
The list shows each instance's start time, completion time, run duration, and resource wait time for the past 15 days. Click Instance Diagnose in the Operation column to open the diagnosis page for any historical instance.
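If you copy values from the historical instance list for offline review, a small Python sketch such as the following can summarize them. Everything here is illustrative: `HistoricalRun` and `summarize` are hypothetical names, the records are assumed to be entered by hand rather than fetched from any DataWorks API, and run duration is assumed to equal completion time minus start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class HistoricalRun:
    """Hypothetical row copied by hand from the historical instance list."""
    start: datetime            # start time of the instance
    completed: datetime        # completion time of the instance
    resource_wait: timedelta   # time spent waiting for scheduling resources


def summarize(runs, now, window_days=15):
    """Summarize runs that started within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    recent = [r for r in runs if r.start >= cutoff]
    if not recent:
        return None
    return {
        "count": len(recent),
        # Assumption: run duration = completion time - start time.
        "avg_run_seconds": mean(
            (r.completed - r.start).total_seconds() for r in recent
        ),
        "max_wait_seconds": max(
            r.resource_wait.total_seconds() for r in recent
        ),
    }
```

A rising average run duration or a large maximum wait in such a summary points to the same trends that the charts on this tab visualize.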