DataWorks: Configure a merge node

Last Updated: Oct 09, 2023

This topic explains what merge nodes are, how to create a merge node, and how to define its merging logic. It also walks through a sample merge node to show how to configure and run one.

Background information

Merge nodes are a type of logical control node in DataStudio. A merge node merges the results of its ancestor nodes. Its purpose is to simplify the scheduling of nodes that depend on the output of the child nodes of a branch node.

You cannot change the status of a merge node. A merge node merges the results of multiple child nodes of a branch node and sets its own status to Successful. To ensure that a node that depends on the output of the child nodes of a branch node is scheduled properly, configure that node to depend on a merge node.

For example, Branch Node C has two mutually exclusive branches, C1 and C2. The two branches use different logic to write data to the same MaxCompute table. Assume that Node B depends on the output of this MaxCompute table. To ensure that Node B runs as expected, use Merge Node J to merge the results of branches C1 and C2, and then configure Merge Node J as the parent node of Node B. If Node B depends directly on branches C1 and C2, one of the branches is never run, because only one branch meets the branch condition each time Branch Node C is run. In this case, Node B and its descendant nodes cannot be triggered as scheduled.
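The scenario above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not the DataWorks scheduler: the `Status` enum, `can_trigger`, and `merge_node_status` are hypothetical names, and the rule that a node is triggered only when every parent is Successful is an assumption drawn from the description above.

```python
from enum import Enum


class Status(Enum):
    SUCCESSFUL = "Successful"
    BRANCH_NOT_RUNNING = "Branch Not Running"


def can_trigger(parent_statuses):
    """Assumed rule: a node is triggered only if every parent is Successful."""
    return all(status is Status.SUCCESSFUL for status in parent_statuses)


def merge_node_status(branch_statuses):
    """Hypothetical Merge Node J: reports Successful once every merged branch
    has finished, whether the branch ran or was skipped."""
    finished = {Status.SUCCESSFUL, Status.BRANCH_NOT_RUNNING}
    if all(status in finished for status in branch_statuses):
        return Status.SUCCESSFUL
    return None  # at least one branch has not finished yet


# In this run only C1 meets the branch condition, so C2 is skipped.
c1, c2 = Status.SUCCESSFUL, Status.BRANCH_NOT_RUNNING

# Node B depends directly on C1 and C2.
direct = can_trigger([c1, c2])

# Node B depends on Merge Node J, which merges C1 and C2.
via_merge = can_trigger([merge_node_status([c1, c2])])

print("B triggered with direct dependencies:", direct)
print("B triggered through Merge Node J:", via_merge)
```

In this model, the direct dependency blocks Node B in every run because one branch is always skipped, while the dependency on Merge Node J does not.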

Limits

You can use merge nodes only in DataWorks Standard Edition or a more advanced edition. For information about how to purchase the DataWorks service and upgrade the DataWorks edition, see Differences among DataWorks editions.

Create a merge node

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. In the Scheduled Workflow pane, move the pointer over the Create icon and choose Create Node > General > Merge Node.

  3. In the Create Node dialog box, configure the Name and Path parameters.

  4. Click Confirm.

Define the merging logic

After a merge node is created, the node configuration tab appears. Define the merging logic for the node.

Figure: Running logic of the merge node

  1. Add the branch node whose branches need to be merged. The branch node becomes an ancestor node of the merge node.

    In the Merged Branch drop-down list, search for the branch node by node name, node ID, or node output and click the Add icon.

    Note

    If you need to merge branches of multiple branch nodes, you must repeat this step to add the branch nodes one at a time.

  2. In the MERGE Condition section, configure merge conditions for the branch nodes.

    Configure a merge logic condition and the accepted states for each branch node.

    • The following merge logic conditions are supported:

      • AND: The node status specified in the Result section takes effect only if all the ancestor branch nodes are run and in the specified state.

      • OR: The node status specified in the Result section takes effect if all the ancestor branch nodes are run and at least one ancestor branch node is in the specified state.

    • You can specify the following states for the branch nodes:

      • Successful

      • Failed

      • Branch Not Running

  3. In the Result section, specify the status of the merge node.

    Note

    You can set the status of the merge node only to Successful.

The preceding figure shows a merge node with the following configurations:

  • Branch Nodes A and B are added as the ancestor nodes of the merge node.

  • The Successful, Branch Not Running, and Failed states are specified for Node A. In this case, Node A needs only to be run, regardless of the result.

  • The Successful and Branch Not Running states are specified for Node B. In this case, Node B needs to be run and the result must not be Failed.

  • The merge logic condition is set to AND.

Therefore, the Successful status of the merge node takes effect if Nodes A and B are run and the result of Node B is not Failed.
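The AND and OR conditions, applied to the configuration just described, can be expressed as a short Python sketch. Everything here is illustrative: `State`, `merge_takes_effect`, and the dictionary layout are hypothetical names rather than a DataWorks API, and using `None` to mark an ancestor that has not finished yet is an assumption of this sketch.

```python
from enum import Enum


class State(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"
    BRANCH_NOT_RUNNING = "Branch Not Running"


def merge_takes_effect(logic, accepted, actual):
    """Return True if the merge node's Successful status takes effect.

    logic:    "AND" or "OR"
    accepted: ancestor name -> set of states specified for that ancestor
    actual:   ancestor name -> final state, or None if not finished (assumed)
    """
    if logic not in ("AND", "OR"):
        raise ValueError(f"unsupported merge logic: {logic!r}")
    # Both conditions first require every ancestor to have finished.
    if any(actual.get(name) is None for name in accepted):
        return False
    matches = [actual[name] in states for name, states in accepted.items()]
    return all(matches) if logic == "AND" else any(matches)


# Configuration from the example: any state for A, anything but Failed for B.
accepted = {
    "A": {State.SUCCESSFUL, State.BRANCH_NOT_RUNNING, State.FAILED},
    "B": {State.SUCCESSFUL, State.BRANCH_NOT_RUNNING},
}

a_failed_b_ok = merge_takes_effect(
    "AND", accepted, {"A": State.FAILED, "B": State.SUCCESSFUL})
b_failed = merge_takes_effect(
    "AND", accepted, {"A": State.SUCCESSFUL, "B": State.FAILED})
b_failed_or = merge_takes_effect(
    "OR", accepted, {"A": State.SUCCESSFUL, "B": State.FAILED})

print(a_failed_b_ok, b_failed, b_failed_or)
```

With AND, a Failed result on Node B blocks the merge node. With OR, the same run still takes effect, because Node A is in one of its specified states.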

On the node configuration tab, click the Properties tab in the right-side navigation pane. Configure the scheduling properties of the merge node. For more information, see Configure basic properties.

Sample merge node

You can associate child nodes with different outputs of a branch node to define the branches under different conditions. For example, in the workflow that is shown in the following figure, the branches Branch_1 and Branch_2 are defined as the child nodes of the branch node.

Figure: Sample workflow

Branch_1 depends on the output that is named autotest.fenzhi121902_1.

Branch_2 depends on the output that is named autotest.fenzhi121902_2.

Run nodes

The condition of Branch_1 is met. The child node of this branch is run. You can select the branch and view the running details of the child node on the Runtime Log tab.

The condition of Branch_2 is not met. The child node of this branch is skipped. You can select the branch and view relevant information on the Runtime Log tab.

The child node of the merge node is run as expected.