This topic describes how to use a for-each node to repeat a loop twice and display the loop count.

Background information

The number of loops that a for-each node runs equals the number of elements in the one-dimensional array that the parent node outputs.
  • You can use for-each nodes only in DataWorks Standard Edition or higher.
  • A for-each node can repeat a loop a maximum of 128 times. If the loop count exceeds this limit, an error occurs.
  • If the logic in a for-each node requires conditional judgment while traversing results, you can use a branch node. The branch node must be used together with a merge node.
  • You cannot run a for-each node in DataStudio. You must commit and deploy the node in DataStudio, and then run it together with its upstream assignment node in Operation Center.
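The loop-count rule and the 128-loop limit can be illustrated outside DataWorks with a small POSIX shell sketch. This is not DataWorks code: count_elements, check_loop_count, and MAX_LOOPS are hypothetical names, and the sketch assumes that the parent output is a comma-separated string.

```shell
#!/bin/sh
# Illustrative sketch only; it mimics the loop-count rule locally.
# Assumption: the parent node's output is a comma-separated string.
MAX_LOOPS=128  # limit stated in this topic

# Hypothetical helper: print the number of comma-separated elements.
count_elements() {
  # The subshell keeps the IFS and noglob changes local.
  ( set -f; IFS=','; set -- $1; echo "$#" )
}

# Hypothetical helper: print the loop count, or fail if it exceeds the limit.
check_loop_count() {
  n=$(count_elements "$1")
  if [ "$n" -gt "$MAX_LOOPS" ]; then
    echo "error: $n loops exceed the limit of $MAX_LOOPS" >&2
    return 1
  fi
  echo "$n"
}

check_loop_count 'this is name,ok'  # prints 2
```

With the sample output used later in this topic, the sketch reports two elements, which matches the two loops that the example runs.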

Create and configure a workflow

To create a workflow that contains an assignment node as the parent node and a for-each node as the child node, perform the following steps:

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a workflow.
    1. On the Data Development tab, move the pointer over the Create icon and click Business process.
    2. In the New business process dialog box, set the Business Name and Description parameters.
      Notice: The workflow name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click New.
  3. Create a for-each node.
    1. On the Data Development tab, move the pointer over the Create icon and choose Universal > for-each.
      Alternatively, you can click the workflow you just created in the Business process section, right-click General, and then choose New > for-each.
    2. In the New node dialog box, set the Node name and Destination folder parameters.
      Notice: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click Submit.
  4. Create an assignment node.
    1. On the configuration tab of the created workflow, drag Assignment node under Universal to the canvas on the right.
    2. In the New node dialog box, set the Node name and Destination folder parameters. By default, the assignment node is placed in the current workflow.
      Notice: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click Submit.
  5. Drag a directed line to configure the assignment node as the parent node of the for-each node.
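The naming rule that applies to workflows and nodes can be checked before you type a name into the dialog box. The following POSIX shell sketch is an illustration only: is_valid_name is a hypothetical helper, and it assumes that "letters" means ASCII letters, which this topic does not state.

```shell
#!/bin/sh
# Illustrative sketch of the naming rule described in this topic.
# Assumption: "letters" means ASCII letters A-Z and a-z.

# Hypothetical helper: succeed if the name is 1 to 128 characters long
# and contains only letters, digits, underscores, and periods.
is_valid_name() {
  name=$1
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 128 ] || return 1
  case $name in
    *[!A-Za-z0-9_.]*) return 1 ;;  # contains a character outside the allowed set
  esac
  return 0
}

for candidate in 'foreach_demo.v1' 'bad name!'; do
  if is_valid_name "$candidate"; then
    echo "valid: $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The check does not model case-insensitivity; it only covers the length and character rules.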

Configure the assignment node

  1. On the canvas of the created workflow, double-click the assignment node. The configuration tab of the assignment node appears.
  2. Select SHELL from the Please select an assignment language drop-down list.
  3. Enter the following statement in the code editor:
    echo 'this is name,ok';
  4. Click Scheduling configuration in the right-side navigation pane. On the Scheduling configuration tab, view the information about the outputs parameter under Output parameters of this node in the Node context section. The outputs parameter is the default output parameter of the assignment node.
  5. Click the Save icon in the toolbar to save the assignment node.
  6. Commit the assignment node.
    Notice: You must set the Rerun attribute and Dependent upstream node parameters on the Scheduling configuration tab before you can commit the node.
    1. Click the Submit icon in the toolbar.
    2. In the Submit New Version dialog box, enter your comments in the Change description field.
    3. Click OK.
    In a workspace in standard mode, you must click Publish in the upper-right corner after you commit the assignment node. For more information, see Deploy a node.
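To see why the statement in this example leads to two loops, the following POSIX shell sketch splits the echoed string into elements. It is a local illustration, not DataWorks behavior reproduced exactly: split_outputs is a hypothetical helper, and the sketch assumes that the assignment node splits its Shell output on commas to build the one-dimensional array passed through the outputs parameter.

```shell
#!/bin/sh
# Illustrative sketch only; run it locally, not in an assignment node.
# Assumption: the echoed output is split on commas into a one-dimensional array.

# Same statement as in the assignment node of this example.
output=$(echo 'this is name,ok')

# Hypothetical helper: print each comma-separated element in brackets.
split_outputs() {
  # The subshell keeps the IFS and noglob changes local.
  ( set -f; IFS=','; for item in $1; do printf '[%s]\n' "$item"; done )
}

split_outputs "$output"
# prints:
# [this is name]
# [ok]
```

Spaces inside an element are preserved, so the string yields two elements rather than four.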

Configure the for-each node

  1. On the canvas of the created workflow, double-click the for-each node. The configuration tab of the for-each node appears. By default, the for-each node consists of the start, sql, and end nodes.
  2. Delete the sql node.
    You can configure a specific type of node as the second node of the for-each node.
    • If you need to configure an ODPS SQL node as the second node, skip this step.
    • If you need to configure another type of node as the second node, delete the sql node first. In this example, configure a Shell node as the second node.
    1. Right-click the sql node and select Delete node.
    2. In the Delete message, click Confirm.
  3. Create and configure a Shell node.
    This step guides you through Shell node creation. You can use the same method to create other types of nodes. If you need to use the default sql node, skip this step.
    1. On the configuration tab of the for-each node, drag Shell under Universal to the canvas on the right.
    2. In the New node dialog box, set the Node name parameter.
      Notice: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). It is not case-sensitive.
    3. Click Submit.
    4. On the configuration tab of the for-each node, drag directed lines to configure the Shell node as the child node of the start node and as the parent node of the end node.
    5. Double-click the Shell node. The configuration tab of the Shell node appears.
    6. Enter the following code in the code editor:
      echo ${dag.loopTimes}  # Display the loop count.
      Note
      • The start and end nodes of the for-each node have fixed logic and cannot be edited.
      • After you modify the code of the Shell node, save the modification. You are not prompted to save unsaved changes when you commit the node, and unsaved changes are not included in the committed version.
      A for-each node supports the following environment variables:
      • ${dag.foreach.current}: the current data entry.
      • ${dag.loopDataArray}: the input dataset.
      • ${dag.offset}: the zero-based offset of the current loop, which equals the loop count minus 1.
      • ${dag.loopTimes}: the current loop count, which starts at 1 and equals the value of ${dag.offset} plus 1.
      // Compare the code of the Shell node with an ordinary Java for loop.
      String[] data = {};  // Equivalent to ${dag.loopDataArray}.
      // i is equivalent to ${dag.offset}, and i + 1 is equivalent to ${dag.loopTimes}.
      for (int i = 0; i < data.length; i++) {
        System.out.println(data[i]);  // data[i] is equivalent to ${dag.foreach.current}.
      }
  4. Configure the for-each node.
    1. On the configuration tab of the for-each node, click Scheduling configuration in the right-side navigation pane.
    2. Find the loopDataArray parameter under Input parameters of this node in the Node context section and click Edit in the Operation column. The loopDataArray parameter is the default input parameter of the for-each node.
    3. Select the outputs parameter of the parent node from the drop-down list in the Value source column.
      Note: After you configure a parent node for the for-each node, you must manually specify the input parameter for the for-each node on the Scheduling configuration tab. If you do not specify the input parameter, an error occurs when you commit the for-each node.
    4. Click Save.
  5. Click the Save icon in the toolbar to save the for-each node.
  6. Commit the for-each node.
    Notice: You must set the Rerun attribute and Dependent upstream node parameters on the Scheduling configuration tab before you can commit the node.
    1. Click the Submit icon in the toolbar.
    2. In the Submit dialog box, select the nodes to commit and enter your comments in the Remarks field.
    3. Click Submit.
    In a workspace in standard mode, you must click Publish in the upper-right corner after you commit the for-each node. For more information, see Deploy a node.
  7. Test the for-each node and view the running result.
    1. On the configuration tab of the for-each node, click Operation & Maintenance (O & M) in the upper-right corner to go to Operation Center.
    2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
    3. On the page that appears, click the name of the for-each node or assignment node that you created in the node list. In the directed acyclic graph (DAG) on the right, right-click the assignment node and choose Run > Current and Descendant Nodes Retroactively. In the Patch Data dialog box, select the assignment node and for-each node and click OK.
    4. On the Patch Data page, wait until the retroactive data generation instance finishes running, and then click DAG in the Actions column of the instance.
    5. In the DAG that appears, right-click the assignment node and select View Runtime Log to view its operational logs.
    6. On the Patch Data page, right-click the for-each node in the DAG and select View Internal Nodes.
    7. On the page that appears, click Loop 1 in the middle pane, right-click the Shell node in the right pane, and then select View Runtime Log.
      On the page that appears, view the operational logs of the Shell node for the first loop. The logged loop count is 1.
    8. Use the same method to view the operational logs of the Shell node for the second loop. The logged loop count is 2.
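The relationship between the loop variables and the two log entries can be reproduced locally with a POSIX shell sketch. This is a simulation under stated assumptions, not the way DataWorks runs the node: run_foreach and the variable names loop_data_array, offset, loop_times, and current are hypothetical stand-ins for the ${dag.*} variables, and comma splitting of the input is assumed.

```shell
#!/bin/sh
# Illustrative local simulation; DataWorks substitutes the ${dag.*}
# variables itself when the for-each node runs.
loop_data_array='this is name,ok'  # stands in for ${dag.loopDataArray}

# Hypothetical helper: iterate over the comma-separated elements and
# print the values that the loop variables would hold in each loop.
run_foreach() {
  (
    set -f; IFS=','
    offset=0                       # stands in for ${dag.offset}
    for current in $1; do          # current stands in for ${dag.foreach.current}
      loop_times=$((offset + 1))   # stands in for ${dag.loopTimes}
      printf 'loopTimes=%s offset=%s current=%s\n' "$loop_times" "$offset" "$current"
      offset=$((offset + 1))
    done
  )
}

run_foreach "$loop_data_array"
# prints:
# loopTimes=1 offset=0 current=this is name
# loopTimes=2 offset=1 current=ok
```

The loopTimes values 1 and 2 correspond to the loop counts logged by the Shell node in the first and second loops.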