After you develop the code for a node, you can debug the script or a code snippet using features like Run, Run with Parameters, and Quick run. This topic describes how to debug nodes and workflows in DataWorks and view the results.
Prerequisite
A node has been developed. For more information, see DataStudio.
Background
You can choose to debug a single node or debug a workflow. After a run is complete, you can view the results in Operating history, use the ad hoc query feature, and process the query results.
When you debug a node or a workflow, you are not charged for DataWorks scheduling resources, but you are charged for the compute engines used. For more information about compute engine fees, see the billing documentation for the respective engine.
Debug a single node
- Go to the DataStudio page. In the left-side navigation pane, under Data Development or Manually Triggered Workflow, find the target workflow and double-click the node that you want to edit.
- Run the node to debug it.
On the node's editor tab, use Run or Run with Parameters from the toolbar to debug the code logic. You can also use Quick run to debug a code snippet. The following table describes each option.
Note: If you do not have permissions for the data you want to query when you run a node, see Overview of permission management for compute engines and data to learn about permission control for different compute engines in DataWorks.
| Run option | Description | Use case |
| --- | --- | --- |
| Run | Lets you assign values to variables and specify a resource group for testing. These settings are saved for future runs. Note: When you run a newly created node for the first time, a configuration dialog appears, and you must manually assign constant values to the variables in the code. DataWorks saves the values you assign and uses the saved configuration by default the next time you run the node. | Use this option when you need to debug code frequently. |
| Run with Parameters | Each time you run the node, you must assign constant values to variables for the test scenario and specify a resource group. | Use this option when you need to modify variable values in the code or change the resource group used by the task. |
| Quick run | Lets you debug a code snippet in the code area of the node editor. | Use this option when you only need to debug a code snippet in a node. Note: This feature only checks the correctness of a code snippet. To debug the complete code logic, use Run or Run with Parameters. |
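To make the idea of assigning constant values to variables concrete, the following Python sketch substitutes placeholders in a SQL string before it is run. It is an illustration only, not DataWorks code: the `${name}` placeholder form, the `render` helper, the `sales` table, and the variable names `bizdate` and `max_rows` are all assumptions made for this example.

```python
import re

# Hypothetical helper, not part of DataWorks. It assumes variables are
# written as ${name} and replaces each one with an assigned constant.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render(code, values):
    """Replace every ${name} placeholder in code with its assigned constant."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Mirrors the need to assign a value to every variable before a run.
            raise KeyError(f"no value assigned for variable '{name}'")
        return values[name]

    return PLACEHOLDER.sub(substitute, code)


# Assumed example code and values.
sql = "SELECT * FROM sales WHERE ds = '${bizdate}' LIMIT ${max_rows};"
rendered = render(sql, {"bizdate": "20240101", "max_rows": "10"})
print(rendered)
```

In this sketch, an unassigned variable raises an error instead of being left in the code, which is the kind of mistake that assigning values in the run dialog is meant to prevent.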
Debug a workflow
- Go to the DataStudio page. In the left-side navigation pane, under Data Development or Manually Triggered Workflow, double-click the target workflow to open its panel.
- Debug the workflow.
Click the Run button on the workflow panel toolbar to run all tasks in the workflow sequentially based on dependencies. You can also right-click a specific node in the workflow panel and select Run Node and Downstream Nodes to run that node and all its downstream nodes based on dependencies.
The toolbar run and the right-click run differ in their mechanisms:
- Toolbar run: Runs all nodes sequentially from upstream to downstream based on the complete DAG of the workflow. Each node is submitted independently, which is equivalent to running each node individually on its editor tab.
- Right-click Run Node and Downstream Nodes: Starts from the selected node and runs it along with its downstream nodes using the dependency scheduling logic. This method validates the run status and output of upstream nodes, which differs from running a node directly on its editor tab (standalone debug mode).
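The two run scopes can be pictured with a small Python sketch over a made-up DAG. It is a conceptual model only and does not reflect how DataWorks is implemented: the node names, the `dag` dictionary, the status values, and all three helper functions are hypothetical.

```python
from collections import deque

# Hypothetical workflow: each key is a node, each value lists its direct
# downstream nodes.
dag = {
    "ingest": ["clean"],
    "clean": ["aggregate", "export"],
    "aggregate": ["report"],
    "export": [],
    "report": [],
}


def topological_order(graph):
    """Kahn's algorithm: every node appears after all of its upstream nodes."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    ready = deque(node for node in graph if indegree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(graph):
        raise ValueError("cycle detected in workflow")
    return order


def downstream_closure(graph, start):
    """Return the start node plus every node reachable from it."""
    seen = {start}
    stack = [start]
    while stack:
        for child in graph[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def unsuccessful_upstream(graph, node, status):
    """List direct upstream nodes whose recorded status is not 'success'."""
    parents = [p for p, children in graph.items() if node in children]
    return [p for p in parents if status.get(p) != "success"]


# Toolbar-style run: the whole DAG, upstream to downstream.
full_run = topological_order(dag)

# Right-click-style run: the selected node and its downstream nodes only.
selected = downstream_closure(dag, "clean")
partial_run = [node for node in full_run if node in selected]

# An upstream check of the kind the right-click run is described as making.
blocked_by = unsuccessful_upstream(dag, "clean", {"ingest": "failed"})

print(full_run)
print(partial_run)
print(blocked_by)
```

In this model, `partial_run` never includes `ingest`, so the selected node can only succeed if `ingest` already ran successfully and its output still exists.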
If right-clicking to run a node and its downstream nodes produces an error, but the same SQL runs without errors on the node editor tab, the difference comes from the run mechanism: the right-click run applies the dependency scheduling logic and validates the run status and output data of upstream nodes, whereas the direct run executes only the SQL code of the current node. Troubleshoot as follows:
- Check the run status of upstream nodes. The right-click run validates upstream dependencies. If an upstream node has not run or has failed, the current node reports an error. Check the run status icons of upstream nodes in the workflow panel to confirm that all upstream nodes have run successfully.
- Check the dependency configuration between nodes. Verify that the input and output parameter passing between nodes is correct and that the scheduling parameter assignments match expectations. On the schedule settings page of the node, check whether the upstream output table matches the input table of the current node.
- Right-click the failed node, select View Run Logs, and locate the specific error information. The error in the logs usually contains the specific cause (such as table not found, insufficient permissions, or syntax error).
- Check whether there are comment syntax issues in the SQL. Some engines parse comments differently in scheduling mode than in direct run mode, which may cause syntax errors. To troubleshoot, remove block comments (/* */) from the SQL first and keep only line comments (--) to see whether the error persists.
- Check whether the output data of upstream nodes is ready. The right-click run depends on data tables produced by upstream nodes as input. If an upstream node has run successfully but its output data is not yet ready or has been cleaned up, the current node reports an error because it cannot read the input table.
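For the comment check in the list above, removing block comments by hand is error-prone in long scripts. The following Python sketch produces a temporary copy of a SQL script with the block comments removed. It is a naive, regex-based illustration, not a DataWorks feature: the `strip_block_comments` helper and the sample SQL are assumptions, and the pattern does not recognize `/*` inside string literals.

```python
import re

# Matches /* ... */ across multiple lines, non-greedily.
# Caveat: this also removes anything else written in /* ... */ form,
# including optimizer hints, and is not aware of string literals.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(sql):
    """Return sql with block comments removed; line comments (--) are kept."""
    return BLOCK_COMMENT.sub("", sql)


# Assumed sample script for illustration.
sql = """/* daily load
   owner: data team */
SELECT id, amount -- keep line comments
FROM orders /* temp filter */ WHERE amount > 0;
"""

cleaned = strip_block_comments(sql)
print(cleaned)
```

Use the cleaned copy only to test whether the error persists. If the script runs once the block comments are gone, restore them one at a time to find the one that triggers the problem.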
Note:
- When nodes in a manually triggered workflow have variables with the same name and the variables can be assigned uniformly, you can define workflow parameters on the workflow panel and assign values to the parameters. Then, run the workflow to view the assignment results and the execution status of the manual tasks.
- Only some node types support workflow parameters. Refer to the actual interface for details.
- After the run is complete, you can right-click a specific node in the workflow panel to view its runtime log.
View operating history
You can go to the Operating history page in Data Studio to view all task run records of the current account from the past three days.
After a task is run in Data Studio, it is submitted to the corresponding engine for execution. Even if you accidentally close the task during execution, the task continues to run. You can go to the Operating history page to view runtime logs or stop tasks that have been submitted for execution.
Create an ad hoc query file
If you only need to query data and related SQL code in Data Studio (the development environment) to verify whether the actual results match the expected values or to check code correctness, without deploying the data or SQL code to the production environment, you can use ad hoc query files.
If you do not have permissions for the data you want to query when you run a node, see Overview of permission management for compute engines and data to learn about permission control for different compute engines in DataWorks.
Process query results
After SQL code is executed successfully, you can perform the following operations on the query results.
| Operation | Description | Reference |
| --- | --- | --- |
| Analyze data | Synchronize query results to a spreadsheet for richer analysis operations. | |
| Share data | Synchronize query results to a spreadsheet, and then use the data sharing feature of the spreadsheet to share the data with specified users. | |
| Download data | Download query results to your local machine as a spreadsheet. By default, up to 10,000 rows of data are displayed. | Tenant Administrator, Tenant Security Administrator, and RAM users who are assigned the Workspace Administrator role can go to Data Upload & Download settings to control the number of rows displayed in query results, the number of rows that can be downloaded, and whether downloads are allowed. For details about authorization, see Grant access to members. The download feature is available only for DataWorks Standard Edition, Professional Edition, and Enterprise Edition. To use this feature, upgrade DataWorks to the corresponding edition. For more information, see DataWorks editions. |