Intelligent diagnosis

The intelligent diagnosis feature allows you to perform end-to-end diagnosis on node instances. If node instances are not run as expected, you can use this feature to identify problems.

Overview

You can use the intelligent diagnosis feature to diagnose and analyze node instances from the following dimensions:

End-to-end diagnosis:
- Check the status of ancestor instances of the current instance: If an ancestor instance of the current instance fails to be run, the current instance is blocked. The intelligent diagnosis feature can help you identify the reason for the failure of the ancestor instance.
- Check whether the scheduling time configured for the current instance has arrived.
  Note
  When you configure scheduling properties on the DataStudio page for the node that generates the current instance, you must specify the time at which the node is scheduled to run. The node may actually start later than this scheduling time, for example, because an ancestor node of the node failed.
- Check the usage of scheduling resources: You can view the resource usage and the list of instances that occupy resources when the current instance is waiting for the resources.
- View running details of the current instance: You can view the run logs of the current instance, details of associated data quality monitoring rules, code details of the node for which the current instance is generated, and suggestions on the current instance based on diagnosis results.
Note
- An instance can be scheduled to run only when all of the following conditions are met: its ancestor instances have run successfully, its scheduling time has arrived, scheduling resources are sufficient, and the instance has not been run. For more information, see What are the conditions that are required for a node to successfully run?
- If some ancestor instances of the current instance are not run and dependencies between the current instance and its ancestor instances are complex, we recommend that you use the ancestor node analysis feature on the Upstream Analysis tab of the DAG page to identify the key ancestor instances that block the running of the current instance. Then, you can use the intelligent diagnosis feature to identify the reason why the ancestor instances are not run. This improves O&M efficiency.
Basic information: You can view the key points in time for the current instance.
Affected baseline: You can view the baseline that contains the node for which the current instance is generated within the monitoring scope and the status of the baseline. For more information about baselines, see Overview.
Status of the historical instances of the current node: You can view the status of the historical instances of the current node within the last 15 days in chart mode or list mode.

Limits

Only users of DataWorks Professional Edition or a more advanced edition can use the intelligent diagnosis feature. If you use another edition, you can try the feature for free. However, we recommend that you upgrade to DataWorks Professional Edition to use more features. For more information about edition upgrades, see Differences among DataWorks editions.

Go to the Intelligent Diagnosis page

Go to the Operation Center page.
Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
On the Operation Center page, use one of the following methods to go to the Intelligent Diagnosis page:
- Method 1: In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Instance, Cycle Task Maintenance > Patch Data, Cycle Task Maintenance > Test Instance, or Manual Task > Manual Instance. Then, use one of the following methods to go to the Intelligent Diagnosis page of a desired instance:
  - Click the status icon of the desired instance to go to the Intelligent Diagnosis page of the instance.
  - In the list of instances, find the desired instance and click To diagnose in the Actions column to go to the Intelligent Diagnosis page of the instance. If the current page is not displayed in list mode, click the icon in the middle of the page to switch to list mode.
  - On the DAG page of the desired instance, right-click the instance and select Instance Diagnose. If the current page is not displayed in DAG mode, you can click DAG in the Actions column of the desired instance to open the DAG of the instance.
  - On the DAG page of the desired instance, click the instance. In the pane that appears in the lower-right corner, click To diagnose next to Node Status.
- Method 2: In the left-side navigation pane, click Intelligent Diagnosis.
  Note
  The intelligent diagnosis feature allows you to search for instances only by instance ID. You can obtain the instance ID on the instance details page.

End-to-end diagnosis

On the End-to-end Diagnostics tab, DataWorks checks the status of ancestor instances of the current instance, the scheduling time configured for the current instance, the usage of scheduling resources, and the status of the current instance in sequence based on the conditions required for running an instance.
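
The order of these checks can be illustrated with a minimal Python sketch. It is not a DataWorks API: the `InstanceSnapshot` class, its fields, the status strings, and the `first_blocking_step` function are hypothetical names introduced only for this illustration. The sketch assumes that each step is evaluated only if the previous step passes, and it returns the name of the first step that blocks the instance.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class InstanceSnapshot:
    """Hypothetical view of an instance; not a DataWorks API object."""
    ancestor_statuses: List[str]  # assumed values such as "SUCCESS" or "FAILED"
    scheduled_time: datetime      # scheduling time configured for the instance
    free_slots: int               # assumed count of free scheduling resource slots


def first_blocking_step(inst: InstanceSnapshot, now: datetime) -> str:
    """Return the first diagnosis step that blocks the instance.

    Returns "Execution" if the first three checks all pass, which means
    the diagnosis moves on to the status of the instance itself.
    """
    # Step 1: every ancestor instance must have run successfully.
    if any(status != "SUCCESS" for status in inst.ancestor_statuses):
        return "Upstream Nodes"
    # Step 2: the scheduling time must have arrived.
    if now < inst.scheduled_time:
        return "Timing Check"
    # Step 3: scheduling resources must be available.
    if inst.free_slots <= 0:
        return "Resources"
    return "Execution"


if __name__ == "__main__":
    snapshot = InstanceSnapshot(
        ancestor_statuses=["SUCCESS", "SUCCESS"],
        scheduled_time=datetime(2024, 1, 1, 11, 0),
        free_slots=3,
    )
    # The ancestors succeeded, but the scheduling time has not arrived yet.
    print(first_blocking_step(snapshot, now=datetime(2024, 1, 1, 10, 0)))  # prints: Timing Check
```

In this sketch, an instance whose ancestor failed is always reported under Upstream Nodes, even if its scheduling time has not arrived, which mirrors the order in which the steps appear on the tab.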

Upstream Nodes
In the Upstream Nodes step on the End-to-end Diagnostics tab of the Intelligent Diagnosis page, you can view the status of ancestor instances of the current instance. If an ancestor instance fails to be run, the current instance is blocked. You can click Instance Diagnose in the Operation column of the ancestor instance to identify the reason for the failure.
Note
If some ancestor instances of the current instance are not run and dependencies between the current instance and its ancestor instances are complex, we recommend that you use the ancestor node analysis feature on the Upstream Analysis tab of the DAG page to identify the key ancestor instances that block the running of the current instance. Then, you can use the intelligent diagnosis feature to identify the reason why the ancestor instances are not run. This improves O&M efficiency.
Timing Check
In the Timing Check step, you can check whether the scheduling time configured for the current instance has arrived. The check is triggered only when the upstream dependency check is successful.

Resources

In the Resources step, you can view the resource usage. If the current instance fails the resource usage check, the scheduling resources that are required to run the current instance are insufficient. In this case, the current instance enters the state of waiting for resources. The current instance can start to run only after the instances that occupy the scheduling resources are complete and the resources are released. Based on the information in the Resources step, you can adjust the scheduling time of the current instance to avoid peak hours.

Section	Description
Scheduling resource information	Allows you to view the name of the resource group for scheduling that is used by the current instance, the number of instances that are running on the resource group for scheduling, and the number of instances that are waiting to be run on the resource group for scheduling. Note: For the shared resource group for scheduling, the peak hours for DataWorks nodes are from 00:00 to 09:00 every day. During this period, resources in the shared resource group for scheduling may be insufficient, and nodes may wait for resources. In this case, you can change the scheduling time of the nodes, or purchase an exclusive resource group for scheduling or a custom resource group for scheduling in the DataWorks console.
Resource Usage Trends	Allows you to view the resource usage of the current resource group for scheduling within each time period and the time consumed by the current instance to wait for resources.
Resource-consuming tasks	Allows you to view the instances that occupy resources in the current resource group for scheduling during the time period in which the current instance is waiting for resources.
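
When you plan a new scheduling time, a quick check against the peak window described in the table above can be sketched as follows. This is only an illustration: the `in_peak_hours` function and the constant names are hypothetical, and the sketch assumes that the window includes 00:00 and excludes 09:00, which the description above does not specify.

```python
from datetime import time

# Peak window for the shared resource group for scheduling, as described above.
PEAK_START = time(0, 0)
PEAK_END = time(9, 0)  # assumption: the end of the window is exclusive


def in_peak_hours(scheduled: time) -> bool:
    """Return True if the scheduling time falls within the daily peak window."""
    return PEAK_START <= scheduled < PEAK_END


if __name__ == "__main__":
    for candidate in (time(2, 30), time(9, 0), time(23, 0)):
        print(candidate.strftime("%H:%M"), in_peak_hours(candidate))
```

A node scheduled at 02:30 falls in the window under these assumptions, whereas nodes scheduled at 09:00 or 23:00 do not.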

Execution

In the Execution step, you can view the run logs of the current instance, details of associated data quality monitoring rules, and code details of the node for which the current instance is generated. For an instance that fails to be run, the intelligent diagnosis feature provides diagnosis results and suggestions based on log information. This helps you identify the cause of the error that occurs on the instance.

Important

The Diagnostic Information and MaxCompute tabs are displayed in the Execution step only for instances that are generated for MaxCompute nodes.

Tab	Description
Diagnostic Information	This tab contains two sections. Log Diagnostics: displays key error information and the error cause, and provides suggestions based on diagnosis results. Diagnostics of Computing Resources: displays a prompt if the current instance waits for resources for a long period of time after the node for which the current instance is generated is committed to a desired compute engine.
Log	On this tab, you can view the running details of the current instance.
MaxCompute	On this tab, you can view MaxCompute jobs and computing resources. Note: An instance that is generated for a MaxCompute node in DataWorks is divided into several MaxCompute task instances, which are run in sequence. When the node for which the current instance is generated meets the required conditions, DataWorks commits the node to a desired compute engine based on the type of the node. If computing resources are insufficient, the node may wait for computing resources and, as a result, run slowly.
DQC	If you associate a data quality monitoring rule with the node for which the current instance is generated, the data quality monitoring rule is triggered after the node is run. You can view details of the data quality monitoring rule on this tab.
Code details	On this tab, you can view the code details of the node for which the current instance is generated.

Basic information

On the General tab, you can view key points in time for the current instance and basic information about the current instance. For more information about scheduling properties that are configured for the node for which the current instance is generated, see Configure basic properties.

Affected baseline

On the Impact baseline tab, you can view the baseline that contains the node for which the current instance is generated within the monitoring scope and the status of the baseline. For more information about baselines, see Overview.

Historical instances

On the Historical instance tab, you can view the following information:

Charts that show the trends of the following metrics for the current node within the last 15 days: Running time, Start run time, and Time consumption of waiting for scheduling resources.
Running details of the historical instances that are generated for the current node, displayed in the Historical instance list. The details include the time when an instance started to run, the time when the instance was completed, the running duration, and the time spent waiting for resources. You can click Instance Diagnose in the Operation column of an instance to go to the diagnosis details page of the instance.