DataWorks batch synchronization nodes do not automatically add scheduling dependencies by parsing code. In a workflow that contains a batch synchronization node, if a downstream node depends on a table that the batch synchronization node creates, you must manually add this output table to the node's outputs. This allows the automatic parsing feature to find the correct upstream batch synchronization node when a downstream node queries the table.
Common mistakes
If you do not manually add the output table of a batch synchronization node to its outputs, the automatic parsing feature cannot find the node. When you submit an SQL node that references the output table of this batch synchronization node, the following error message appears.
This error occurs because the upstream dependency, which is automatically parsed from the downstream node, cannot be matched to the corresponding upstream batch synchronization node. For a detailed explanation, see Detailed explanation of the cause. To avoid this error, configure scheduling dependencies for workflows that contain batch synchronization nodes using one of the following two methods:
Method 1: Manually add the output table as a node output
To avoid this error, you must ensure that the upstream dependency parsed from a downstream node is added to the outputs of the upstream node. Therefore, after you configure the workflow, go to the Scheduling Configuration page for the batch synchronization node and manually add the output table as a node output. The following figure shows an example.
Method 2: Keep the node name and output table name the same
This method works based on the following logic:
When you create a batch synchronization node, an output is automatically generated for the node. The output name follows the `projectname.nodename` naming convention for the Outputs of this node section.
When an SQL node references the output table of a batch synchronization node, an Ancestor Node Dependency is created with the naming convention `projectname.tablename`.
To prevent errors, the name of the upstream dependency in the SQL node must match the name of an output of this node for the batch synchronization node.
Therefore, you can keep the node name (`nodename`) and the output table name (`tablename`) the same. This ensures that no error occurs when you submit the node.
The `projectname.nodename` output of this node is generated when you create the node. If you rename the node afterward, the name of this automatically generated output does not change. This method therefore works only if the node name matches the output table name at the time you create the batch synchronization node. Renaming the node later to match the output table name does not resolve the issue described in this topic.
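The naming logic behind Method 2 can be illustrated with a small Python model. This is only a sketch for reasoning about the rule: the `SyncNode` class and its `rename` method are hypothetical names invented for this example and are not part of any DataWorks API. The output with the _out suffix is left out because its exact format is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class SyncNode:
    """Hypothetical stand-in for a batch synchronization node."""
    project: str
    name: str
    outputs: set = field(init=False)

    def __post_init__(self):
        # The projectname.nodename output is fixed at creation time.
        self.outputs = {f"{self.project}.{self.name}"}

    def rename(self, new_name):
        # Renaming changes the node name only; the generated output stays as it was.
        self.name = new_name


# Node created with the same name as its output table: the names line up.
named_at_creation = SyncNode("doctest", "table_1")
matches_at_creation = "doctest.table_1" in named_at_creation.outputs

# Node created as user_1 and renamed later: the output is still doctest.user_1.
renamed_later = SyncNode("doctest", "user_1")
renamed_later.rename("table_1")
matches_after_rename = "doctest.table_1" in renamed_later.outputs
```

In this model, `matches_at_creation` is true and `matches_after_rename` is false, which mirrors why the node must be named after the table when it is created.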
Detailed explanation of the cause
The following figure shows the typical steps to create nodes and configure dependencies for a workflow that contains a batch synchronization node.
Step | Details | Scheduling dependency configuration |
1 | Create each node according to the workflow plan. For this example, create a virtual node, a batch synchronization node, and an ODPS node. | After you create a node in DataWorks, DataWorks automatically generates two outputs of this node for it. One output name has the _out suffix. The other follows the `projectname.nodename` naming convention. For example, after you create the `user_1` batch synchronization node in the figure, the node has two outputs:
|
2 | Connect the nodes to define the upstream and downstream dependencies based on the logical execution order of the workflow. | After you connect the nodes on the workflow page, DataWorks automatically configures the dependencies. For example, the `user_1` batch synchronization node is upstream of the `sql_1` ODPS node. DataWorks automatically adds the |
3 | Develop the task code for each node. | When you develop the code, DataWorks automatically parses it. Based on the input and output commands, DataWorks adds an output of this node or an upstream dependency. For example, the `sql_1` ODPS node needs to use data from the |
After you complete these steps, submitting the node fails unless you have manually added the output table of the batch synchronization node as an output of this node. Batch synchronization nodes are not parsed automatically, so their output tables are never added as outputs for you. The error message indicates that the output name of the dependent parent node does not exist.
This error occurs for the following reasons:
Batch synchronization nodes do not support automatic parsing. Therefore, the output table `table_1` is not automatically added as an output of this node for the batch synchronization node. As a result, the `user_1` node does not have an output named `doctest.table_1`.
Automatic parsing adds an upstream dependency to the downstream node `sql_1`. This dependency is named `doctest.table_1`, following the `projectname.tablename` naming convention for Dependent ancestor nodes. However, because `doctest.table_1` is not an output of `user_1`, this dependency in `sql_1` cannot be matched to the node ID of `user_1`.
When you submit the `sql_1` node, the system detects the `doctest.table_1` upstream dependency. Because this dependency is not associated with a node ID, the system cannot find the corresponding upstream node and reports an error that the parent node output name does not exist.
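The matching failure described above can be reproduced with a toy Python model. This is a minimal sketch of the idea, not how DataWorks is implemented: the `resolve_parents` function and the dictionary layout are assumptions made for illustration, and only the `projectname.nodename` output of `user_1` is modeled.

```python
def resolve_parents(dependencies, node_outputs):
    """Match each parsed dependency name to the node that declares it as an output."""
    owner = {out: node for node, outs in node_outputs.items() for out in outs}
    resolved, missing = {}, []
    for dep in dependencies:
        if dep in owner:
            resolved[dep] = owner[dep]
        else:
            missing.append(dep)  # no node declares this name as an output
    return resolved, missing


# Outputs declared by each node (hypothetical layout for this sketch).
node_outputs = {"user_1": {"doctest.user_1"}}

# Upstream dependency that automatic parsing adds to sql_1.
sql_1_dependencies = ["doctest.table_1"]

# Before the fix: doctest.table_1 has no owner, so submission would fail.
resolved_before, missing_before = resolve_parents(sql_1_dependencies, node_outputs)

# Method 1: manually add the output table as an output of user_1.
node_outputs["user_1"].add("doctest.table_1")
resolved_after, missing_after = resolve_parents(sql_1_dependencies, node_outputs)
```

Before the manual addition, `missing_before` contains `doctest.table_1`, which corresponds to the parent node output name that does not exist. After the addition, the dependency resolves to `user_1`.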