The automatic parsing feature cannot add scheduling dependencies for batch synchronization nodes in a workflow. If a node depends on a table generated by an ancestor batch synchronization node, you must manually add the table to the output of that batch synchronization node. Then, when the descendant node queries the table data, the automatic parsing feature can locate the batch synchronization node.
Method 1: Manually add the table generated by a batch synchronization node to its output
Method 2: Keep the name of a batch synchronization node the same as that of its generated table
- When you create a batch synchronization node, the system automatically generates an output named in the projectname.nodename format for the node.
- When an SQL node uses the table generated by the batch synchronization node, the system automatically generates a dependent ancestor node named in the projectname.tablename format for the SQL node.
- To prevent errors, you must make sure that the name of the dependent ancestor node is the same as that of the output of the batch synchronization node.
The system generates the output named in the projectname.nodename format only when the node is created. If you change the name of the node after the node is created, the name of the output does not change. Therefore, this method works only at the time you create the batch synchronization node. If you later rename the node or the table generated by the batch synchronization node to make the names consistent, the error persists.
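The naming rule behind Method 2 can be sketched as follows. This is a minimal illustration, not a DataWorks API: the `SyncNode` class, its `rename` method, and the `parsed_dependency` helper are hypothetical names, and the project name `doctest` is borrowed from the example later in this topic.

```python
from dataclasses import dataclass, field


@dataclass
class SyncNode:
    """Hypothetical model of a batch synchronization node (not a DataWorks API)."""
    project: str
    name: str
    outputs: set = field(default_factory=set)

    def __post_init__(self):
        # The projectname.nodename output is generated once, at creation time.
        self.outputs.add(f"{self.project}.{self.name}")

    def rename(self, new_name: str) -> None:
        # Renaming changes only the node name; the generated output is kept.
        self.name = new_name


def parsed_dependency(project: str, table: str) -> str:
    # Name of the dependent ancestor node that automatic parsing derives
    # from a table used in SQL code (projectname.tablename).
    return f"{project}.{table}"


# Node created with the same name as its table: the names match.
good = SyncNode("doctest", "table_1")
assert parsed_dependency("doctest", "table_1") in good.outputs

# Node created as user_1 and renamed afterwards: the output is unchanged,
# so the parsed dependency still has no matching output.
late = SyncNode("doctest", "user_1")
late.rename("table_1")
assert parsed_dependency("doctest", "table_1") not in late.outputs
```

The second case shows why renaming after creation does not help: the match is made against the output name, not the current node name.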
|Step No.||Detailed step||Configured scheduling dependency|
|1||Create nodes in the workflow based on the planning of the workflow.
In the preceding figure, virtual nodes, batch synchronization nodes, and MaxCompute nodes are created.
|After the nodes are created in DataWorks, the system automatically generates two outputs for each node. One is named in the
For example, the preceding figure shows that after the batch synchronization node user_1 is created, the system automatically generates the following outputs for the node:
|2||Connect the nodes by drawing lines based on the planning of the workflow to determine the dependency relationships of the nodes.||After the nodes are connected, the system automatically adds dependency configurations for each ancestor node based on the connections.
For example, after you connect the nodes, the MaxCompute node sql_1 in the preceding figure becomes a descendant node of the batch synchronization node user_1. In this case, the system automatically configures an output named
|3||Develop task code for each node.||When you develop task code for each node, the system automatically parses some input and output commands in the code and adds the output or dependent ancestor node for each node.
For example, the MaxCompute node sql_1 needs to use the data in the
- The batch synchronization node user_1 does not support automatic parsing. Therefore, the table_1 table generated by user_1 is not automatically added to the output of user_1. This indicates that user_1 does not have an output named
- The system automatically adds a dependent ancestor node named in the projectname.tablename format for the descendant node sql_1. In this example, the name of the dependent ancestor node is doctest.table_1. However, the system does not add doctest.table_1 to the output of user_1. Therefore, the system cannot find the ID of user_1.
- When you commit sql_1, the system detects that sql_1 has an upstream dependency doctest.table_1. However, the system cannot associate the upstream dependency with the ID of an ancestor node and reports an error indicating that the output name of the dependent ancestor node of sql_1 does not exist.
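
The commit failure in this example, and the effect of Method 1, can be approximated with a small lookup. This is a sketch under stated assumptions rather than the scheduler's actual logic: `resolve_ancestors`, `DependencyError`, and the dictionary layout are hypothetical, and user_1 is given only the output `doctest.user_1`, which follows the projectname.nodename format.

```python
class DependencyError(Exception):
    """Raised when a parsed dependency matches no node output (hypothetical)."""


def resolve_ancestors(dependencies, outputs_by_node):
    """Map each dependency name to the ID of the node that lists it as an output."""
    owner = {}
    for node_id, outputs in outputs_by_node.items():
        for output_name in outputs:
            owner[output_name] = node_id

    resolved = {}
    for dep in dependencies:
        if dep not in owner:
            raise DependencyError(
                f"output name {dep!r} of the dependent ancestor node does not exist"
            )
        resolved[dep] = owner[dep]
    return resolved


# Assumed state after step 3: user_1 has only its generated output, while
# automatic parsing has added doctest.table_1 as a dependency of sql_1.
outputs_by_node = {"user_1": {"doctest.user_1"}}
sql_1_dependencies = ["doctest.table_1"]

try:
    resolve_ancestors(sql_1_dependencies, outputs_by_node)
    commit_failed = False
except DependencyError:
    commit_failed = True

# Method 1: manually add the generated table to the output of user_1.
outputs_by_node["user_1"].add("doctest.table_1")
resolved = resolve_ancestors(sql_1_dependencies, outputs_by_node)
```

After the table is added to the output of user_1, the dependency doctest.table_1 resolves to user_1 and the lookup no longer fails.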