DataWorks provides for-each nodes. You can use a for-each node to loop through the output of an assignment node, and you can customize the workflow inside the for-each node. This topic provides an example of how to configure and use a for-each node. In this example, the for-each node loops through the output of an assignment node twice. In each loop, the number of the current loop is printed to the runtime log, which you can view in Operation Center.

Prerequisites

DataWorks Standard Edition or a more advanced edition is activated.

Background information

In DataWorks, a for-each node loops through the output of an assignment node. Before you use a for-each node, you must configure it as a descendant node of an assignment node. After the assignment node passes its output to the for-each node, the for-each node traverses the output in loops. When you create a for-each node, three inner nodes are created: start, sql, and end. You can customize the workflow in the for-each node and use built-in variables to obtain the output of the assignment node. For more information, see Composition and application logic.

Procedure

Before you use a for-each node, you must configure the for-each node as a descendant node of an assignment node. The following figure shows the configuration procedure.

[Figure: Configuration procedure of a for-each node]
  1. Configure node dependencies.

    Configure an assignment node as an ancestor node of a for-each node. For more information, see Create and configure a workflow.

  2. Configure inputs for the for-each node.

    In the Parameters section of the Properties tab of the for-each node, set the value source of the loopDataArray parameter under Input Parameters to the outputs parameter of the assignment node. For more information, see Configure an assignment node.

  3. Configure the inner nodes of the for-each node.

    Customize the workflow in the for-each node based on your business requirements. Then, use built-in variables in the inner nodes so that the inner nodes can obtain the output of the assignment node in each loop. For more information about the built-in variables, see Built-in variables. For more information about how to configure a for-each node, see Configure a for-each node.

Create and configure a workflow

To create a workflow that contains an assignment node as the ancestor node and a for-each node as the descendant node, perform the following steps:

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a workflow.
    1. Move the pointer over the Create icon and select Workflow.
    2. In the Create Workflow dialog box, specify Workflow Name and Description.
      Notice The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Create.
  3. Create a for-each node.
    1. Move the pointer over the Create icon and choose General > for-each.
      Alternatively, find the required workflow in the Scheduled Workflow pane, click the workflow name, right-click General, and then choose Create > for-each.
    2. In the Create Node dialog box, specify Node Name and Location.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
  4. Create an assignment node. For more information, see Configure an assignment node.
    1. On the configuration tab of the created workflow, drag Assignment Node in the General section to the canvas on the right side.
    2. In the Create Node dialog box, specify Node Name and Location. The default value of the Location parameter is the path of the current workflow.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
  5. Drag a directed line to configure the assignment node as an ancestor node of the for-each node.
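The Notice items in the steps above state the same naming rule for workflows and nodes: 1 to 128 characters that are letters, digits, underscores (_), or periods (.). If you generate names in scripts, a local pre-check can catch invalid names early. The following Bash sketch encodes only the rule as stated in this topic; is_valid_dataworks_name is a hypothetical helper, not a DataWorks API, and the console remains the authority on which names it accepts.

```shell
#!/usr/bin/env bash
# Local pre-check for workflow and node names. NOT a DataWorks API.
# It encodes only the rule stated in this topic: 1 to 128 characters,
# each of which is a letter, a digit, an underscore (_), or a period (.).

# Hypothetical helper: returns 0 if the name follows the rule, 1 otherwise.
is_valid_dataworks_name() {
  local name=$1
  # Bracket expression: ASCII letters, digits, underscore, and a literal period.
  local pattern='^[A-Za-z0-9_.]{1,128}$'
  [[ $name =~ $pattern ]]
}

# Try a few candidate names, including one with a space and an empty name.
for candidate in foreach_demo assign.v1 "my node" ""; do
  if is_valid_dataworks_name "$candidate"; then
    echo "valid:   '$candidate'"
  else
    echo "invalid: '$candidate'"
  fi
done
```

Keeping the pattern in a variable and leaving it unquoted on the right side of =~ makes Bash treat it as a regular expression rather than a literal string.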

Configure an assignment node

  1. On the configuration tab of the created workflow, double-click the name of the assignment node that you created. The configuration tab of the assignment node appears.
  2. Select SHELL from the Language drop-down list.
  3. Enter the following statement in the code editor:
    echo 'this is name,ok';
  4. In the right-side navigation pane, click the Properties tab. In the Parameters section, find the outputs parameter under Output Parameters. The outputs parameter is the default output parameter of the assignment node.
  5. Click the Save icon in the top toolbar to save the assignment node.
  6. Commit the assignment node.
    Notice You must specify Rerun and Parent Nodes on the Properties tab before you commit the assignment node.
    1. Click the Commit icon in the top toolbar.
    2. In the Commit Node dialog box, specify Change description.
    3. Click OK.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit the node. For more information, see Deploy nodes.
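The echo statement in step 3 prints a single line: this is name,ok. The for-each node in this example runs two loops over that output. The following Bash sketch imitates, locally, how that line could turn into two data entries so that you can predict the number of loops before you run the workflow. It assumes that a Shell assignment node uses the last line of its output and splits that line on commas into a one-dimensional array; split_outputs is a hypothetical helper, not part of DataWorks.

```shell
#!/usr/bin/env bash
# Local imitation only, NOT DataWorks code. Assumption: a Shell assignment
# node uses the last line of its standard output and splits that line on
# commas to build the one-dimensional array passed through outputs.

# Hypothetical helper: prints one array entry per line.
split_outputs() {
  local last_line
  last_line=$(printf '%s\n' "$1" | tail -n 1)
  local -a entries
  # Split on commas only, so spaces inside an entry are preserved.
  IFS=',' read -r -a entries <<< "$last_line"
  printf '%s\n' "${entries[@]}"
}

# Same statement as the assignment node in this topic.
assignment_output=$(echo 'this is name,ok')
split_outputs "$assignment_output"
```

Under this assumption, the sketch prints two entries, this is name and ok, which matches the two loops that the for-each node runs in this example.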

Configure a for-each node

  1. Double-click the for-each node that you created. By default, the start, sql, and end nodes are displayed on the configuration tab of the for-each node.
  2. Delete the sql node.
    You can use a node other than an SQL node in the workflow of the for-each node.
    • If you want to use an ODPS SQL node, skip this step.
    • If you want to use a node other than an SQL node, delete the sql node first. In this example, a Shell node is used.
    1. Right-click the sql node and select Delete Node.
    2. In the Delete message, click OK.
  3. Create and configure a Shell node.
    You can create other types of nodes by using the same method. If you want to use the default sql node, skip this step.
    1. Drag Shell in the General section to the configuration tab of the for-each node.
    2. In the Create Node dialog box, specify Node Name.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Create.
    4. Drag directed lines to configure the start node as an ancestor node of the Shell node and the end node as a descendant node of the Shell node.
    5. Double-click the Shell node. The configuration tab of this node appears.
    6. Enter the following code:
      echo ${dag.loopTimes}  # Display the current number of loops.
      Note
      • The start and end nodes in the workflow of the for-each node have fixed logic and cannot be modified.
      • After you modify the code of the Shell node, save your changes. DataWorks does not remind you to save when you commit the node. If you do not save the changes, the committed code is not updated to the latest version.
      A for-each node supports the following built-in variables:
      • ${dag.foreach.current}: the data entry that is processed in the current loop.
      • ${dag.loopDataArray}: the output of the assignment node.
      • ${dag.offset}: the offset of the current loop relative to the first loop. The value starts from 0.
      • ${dag.loopTimes}: the sequence number of the current loop, starting from 1. The value is equal to ${dag.offset} plus 1.
      For more information about the variables, see Built-in variables and Examples of variable values.
  4. Configure the scheduling properties of the for-each node.
    1. On the configuration tab of the for-each node, click the Properties tab in the right-side navigation pane.
    2. Find the loopDataArray parameter below Input Parameters in the Parameters section and click Change in the Actions column. The loopDataArray parameter is the default input parameter of the for-each node.
    3. Select the outputs parameter of the assignment node from the drop-down list in the Value Source column.
      Note After you configure the assignment node as an ancestor node of the for-each node, you must specify the input parameter for the for-each node on the Properties tab. If you do not specify the input parameter, an error occurs when you commit the for-each node.
    4. Click Save.
  5. Click the Save icon in the top toolbar to save the for-each node.
  6. Commit the for-each node.
    Notice You must specify Rerun and Parent Nodes on the Properties tab before you commit the node.
    1. Click the Commit icon in the top toolbar.
    2. In the Commit dialog box, select the inner nodes that you want to commit and enter your comments in the Description field.
    3. Click Commit.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the for-each node after you commit the for-each node. For more information, see Deploy nodes.
  7. Test the for-each node and view the result.
    1. On the configuration tab of the for-each node, click Operation Center in the upper-right corner.
    2. In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Cycle Task.
    3. On the page that appears, click the name of the for-each node in the node list. In the directed acyclic graph (DAG) on the right side, right-click the assignment node and choose Run > Current and Descendant Nodes Retroactively. In the Patch Data dialog box, select the assignment node and for-each node and click OK.
    4. On the Patch Data page, wait until the data backfill instance finishes running. Then, click DAG in the Actions column of the instance.
    5. In the DAG that appears, right-click the assignment node and select View Runtime Log to view the operational logs of the node.
    6. On the Patch Data page, right-click the for-each node in the DAG and select View Internal Nodes.
    7. On the page that appears, click Loop 1 in the middle pane, right-click the Shell node in the DAG, and then select View Runtime Log.
      On the page that appears, view the operational logs of the Shell node in the first loop. The log contains 1, which is the value of ${dag.loopTimes} in the first loop.
    8. Use the same method to view the operational logs of the Shell node in the second loop. The log contains 2.
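To relate the built-in variables described in step 3 to the logs that you view in Operation Center, the following Bash sketch imitates the loop locally. It is a hypothetical simulation, not how DataWorks schedules inner nodes: the function run_inner_shell_node stands in for the Shell node, and the shell variables loop_data_array, offset, loop_times, and current stand in for ${dag.loopDataArray}, ${dag.offset}, ${dag.loopTimes}, and ${dag.foreach.current}. The two data entries assume that the output of the assignment node is split on commas.

```shell
#!/usr/bin/env bash
# Hypothetical local simulation of a for-each node, NOT DataWorks code.
# Shell variables stand in for DataWorks built-in variables:
#   loop_data_array -> ${dag.loopDataArray}
#   offset          -> ${dag.offset}        (starts from 0)
#   loop_times      -> ${dag.loopTimes}     (offset + 1, starts from 1)
#   current         -> ${dag.foreach.current}

# Assumed output of the assignment node after comma splitting.
loop_data_array=("this is name" "ok")

# Hypothetical stand-in for the inner Shell node: prints what its log would show.
run_inner_shell_node() {
  local loop_times=$1
  echo "$loop_times"   # Mirrors: echo ${dag.loopTimes}
}

simulate_foreach() {
  local offset loop_times current
  # "${!array[@]}" expands to the array indices 0, 1, ..., which play the role of the offset.
  for offset in "${!loop_data_array[@]}"; do
    loop_times=$((offset + 1))
    current=${loop_data_array[offset]}
    echo "Loop $loop_times (offset=$offset, current='$current'):"
    run_inner_shell_node "$loop_times"
  done
}

simulate_foreach
```

Only the lines that contain 1 and 2 correspond to what the Shell node writes to its runtime log in each loop; the Loop lines are labels added by the simulation.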