DataWorks provides do-while nodes. You can rearrange the workflow inside a do-while node, write the logic that you want to execute in a loop in the node, and then configure an end node to determine whether to exit from looping. You can also use a do-while node together with an assignment node to traverse the output of the assignment node in loops. This topic provides examples on how to configure a do-while node in simple and complex scenarios.

Prerequisites

DataWorks Standard Edition or a more advanced edition is activated.

Background information

In DataWorks, a do-while node is a special type of node that contains inner nodes. After you create a do-while node, three inner nodes are created: start, sql, and end. The start node marks the start of a loop. The sql node runs the loop task. The end node marks the end of a loop and controls how many loops are run. The three inner nodes are organized as a workflow to traverse data in loops. You can customize the sql node and use the built-in variables provided by the do-while node to write the code of the end node. For more information about logic principles, see Logic principles. You can plan the workflow inside your do-while node based on your business requirements. For more information about how to configure a do-while node, see the procedures described in the following sections.

Limits and usage notes

  • Support for do-while nodes
    • You can use do-while nodes only in DataWorks Standard Edition or a more advanced edition.
    • A do-while node can run a maximum of 128 loops. If the number of loops determined by the end node exceeds 128, an error is returned.
  • Internal nodes
    • When you customize a do-while node, you can delete the dependencies between the internal nodes and rearrange the internal workflow of the do-while node. However, you must use the start node and the end node as the start and end nodes of the internal workflow of the do-while node.
    • When the internal nodes of a do-while node use a branch node to perform logical judgments or traverse results, a merge node also needs to be used.
    • You cannot add comments when you develop the code of the end node of a do-while node.
  • Test and running
    • If the workspace is in standard mode, you cannot directly test and run a do-while node in DataStudio.

      To test the do-while node and view the result, you must commit the do-while node to Operation Center and run the do-while node in Operation Center. If you use the value passed by an assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.

    • When you view the operational logs of a do-while node in Operation Center, right-click the do-while node and select View Internal Nodes to view the operational logs of the internal nodes.
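The rule that the start and end nodes must remain the entry and exit of the internal workflow can be illustrated with a small local check. This is a minimal sketch, not a DataWorks API: the function name check_inner_workflow and the node and edge representation are hypothetical, and treating any other node without an ancestor or descendant as a problem is an interpretation of the rule above.

```python
def check_inner_workflow(nodes, edges):
    """Return a list of problems found in a do-while inner workflow.

    nodes: iterable of node names (hypothetical local representation).
    edges: iterable of (ancestor, descendant) pairs.
    """
    nodes = set(nodes)
    edges = list(edges)
    problems = []

    # The start and end nodes must both be present.
    for required in ("start", "end"):
        if required not in nodes:
            problems.append("missing %s node" % required)

    has_ancestor = {child for _, child in edges}
    has_descendant = {parent for parent, _ in edges}

    # start must be the entry point and end must be the exit point.
    if "start" in has_ancestor:
        problems.append("start node must not have an ancestor")
    if "end" in has_descendant:
        problems.append("end node must not have a descendant")

    # Assumption: every other node sits between start and end.
    for node in sorted(nodes - {"start", "end"}):
        if node not in has_ancestor:
            problems.append("%s has no ancestor" % node)
        if node not in has_descendant:
            problems.append("%s has no descendant" % node)
    return problems


# A Shell node wired between start and end passes the check.
print(check_inner_workflow(
    ["start", "shell", "end"],
    [("start", "shell"), ("shell", "end")],
))
```

An empty list means that the sketch found no problems. A node that is left unconnected after the dependencies are rearranged is reported by name.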

Procedure

  1. Configure node dependencies.

    Configure an assignment node as an ancestor node of a do-while node.

  2. Configure inputs for the do-while node.

    In the Input and Output Parameters section of the Properties tab for the do-while node, add the outputs parameter of the assignment node to Input Parameters.

  3. Configure the inner nodes of the do-while node.

    Customize the workflow inside the do-while node based on your business requirements. Then, configure built-in variables for the inner nodes of the do-while node to obtain and traverse the output of the assignment node in loops.

Create a do-while node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the Create icon and choose General > do-while.
    Alternatively, you can click the name of the workflow in which you want to create a do-while node, right-click General, and then choose Create > do-while.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.

Simple example of using a do-while node

This section describes how to use a do-while node to traverse the output of an assignment node in five loops and display the current number of loops each time a loop is run.

  1. Double-click the name of the do-while node. The configuration tab of the node appears.
    By default, the do-while node consists of the start, sql, and end nodes.
    • The start node marks the start of a loop and does not run business code.
    • The sql node is a sample business processing node provided by DataWorks. You can replace the sql node based on your business requirements. For example, you can replace this node with a Shell node named Display loop count.
    • The end node marks the end of a loop and determines whether to start the next loop. The end node defines the condition for exiting from looping for the do-while node.
  2. Delete the sql node.
    1. Right-click the sql node and select Delete Node.
    2. In the Delete message, click OK.
  3. Create and configure a loop task node. In this example, a Shell node is used.
    1. Choose General > Shell and drag Shell to the canvas on the right.
    2. In the Create Node dialog box, enter a name in the Node Name field.
      Notice The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
    4. On the canvas of the do-while node, drag lines to configure the Shell node as the descendant node of the start node and the ancestor node of the end node.
    5. Double-click the Shell node. The configuration tab of the Shell node appears.
    6. Enter the following code in the code editor:
      echo ${dag.loopTimes}  # Display the current number of loops.
      • The ${dag.loopTimes} variable is a reserved variable of the system. This variable specifies the current number of loops, and the value of this variable starts from 1. All inner nodes of the do-while node can reference this variable. For more information about built-in variables, see Built-in variables and Examples of variable values.
      • After you modify the code of the Shell node, save the modification. No message that reminds you to save the modification will appear when you commit the node. If you do not save the modification, the code cannot be updated to the latest version in time.
  4. Configure the end node to control the number of loops that can be run.
    1. Double-click the end node. The configuration tab of the node appears.
    2. Select Python from the Language drop-down list.
    3. Enter the following code to define the condition for exiting from looping for the do-while node:
      if ${dag.loopTimes}<5:
          print True;
      else:
          print False;
      • As in the Shell node, ${dag.loopTimes} is the reserved system variable that specifies the current number of loops, starting from 1. For more information about built-in variables, see Built-in variables and Examples of variable values.
      • In the code, the value of the ${dag.loopTimes} variable is compared with 5 to limit the number of loops that can be run. The value is 1 for the first loop and increases by 1 each time, so it is 2 for the second loop and 5 for the fifth loop. The do-while node exits after the fifth loop, when ${dag.loopTimes}<5 evaluates to False.
  5. On the configuration tab of the do-while node, click the Properties tab on the right-side navigation pane to configure scheduling properties for the node. For more information, see Configure basic properties.
  6. Click the Save icon in the top toolbar.
  7. Commit the do-while node.
    Notice You can commit the do-while node only after you configure the Rerun and Parent Nodes parameters.
    1. Click the Commit icon in the top toolbar.
    2. In the Commit dialog box, select the nodes that you want to commit and enter your comments in the Description field.
    3. Click Commit.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the do-while node after you commit it. For more information, see Deploy nodes.
  8. Test the node and view the result.
    Note If the workspace that you use is in standard mode, you cannot directly perform a test to run a do-while node in DataStudio.

    To perform a test to run the do-while node and view the result, you must commit the do-while node to Operation Center and run the do-while node in Operation Center. If you use the value passed by an assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.

    1. On the node configuration tab, click Operation Center in the upper-right corner to go to Operation Center.
    2. In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Cycle Task.
    3. On the Cycle Task page, find the do-while node and click DAG in the Actions column to open the directed acyclic graph (DAG) of the do-while node. In the DAG of the do-while node, right-click the assignment node and choose Run > Current and Descendent Nodes Retroactively. In the Patch Data dialog box, configure the parameters and click OK.
    4. Refresh the Patch Data page. After the data backfill instance is run, click DAG in the Actions column of the instance.
    5. Right-click the do-while node and select View Internal Nodes.
      The internal workflow of the do-while node is divided into three parts:
      • The left pane of the view displays the rerun history of the do-while node. A record is generated each time a do-while node instance is run.
      • The middle pane of the view displays a loop record list that shows all existing loops of the do-while node and the status of each loop.
      • The right pane of the view displays the details about each loop. You can click a record in the loop record list to view the details of each instance in the loop.
    6. On the inner node page, click Loop 3 on the left, right-click the Shell node, and then select View Runtime Log.
    The preceding example shows that a do-while node works based on the following application logic:
    1. The system starts a loop from the start node.
    2. Other nodes inside the do-while node run in sequence based on the dependencies configured for them.
    3. The system executes the conditional statement defined in the code of the end node for exiting from looping.
    4. The system records the number of loops that are run, and the next loop starts if the conditional statement returns True in the logs of the end node.
    5. The entire looping process ends if the conditional statement returns False in the logs of the end node.
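The application logic above can be simulated locally. The following is a minimal sketch in plain Python, not DataWorks code: run_do_while, body, and end_condition are hypothetical names, and the exact point at which the 128-loop limit raises an error is an assumption.

```python
MAX_LOOPS = 128  # loop limit stated in this topic


def run_do_while(body, end_condition, max_loops=MAX_LOOPS):
    """Run body, then evaluate end_condition; repeat while it returns True.

    Returns the number of loops that were run.
    """
    loop_times = 1  # mirrors ${dag.loopTimes}, which starts from 1
    while True:
        body(loop_times)                   # inner nodes run first
        if not end_condition(loop_times):  # end node returned False
            return loop_times
        if loop_times >= max_loops:
            # Assumption: asking for a loop beyond the limit is an error.
            raise RuntimeError(
                "end node requested more than %d loops" % max_loops
            )
        loop_times += 1


# Same condition as the end node in this example: ${dag.loopTimes}<5
printed = []
total = run_do_while(printed.append, lambda n: n < 5)
print(printed, total)
```

Because the body runs before the condition is evaluated, the loop always runs at least once, and the condition n < 5 yields five loops rather than four.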

Complex example of using a do-while node

In addition to the preceding simple scenario, you may encounter complex scenarios in which each data entry is processed in sequence by using a loop. You can use a do-while node to process data in these scenarios. Before you use a do-while node to process data in these scenarios, make sure that the following conditions are met:
  • Another node is deployed and configured as the ancestor node of the do-while node. The node can pass its output to the do-while node. You can use an assignment node as the ancestor node.
  • The output of the assignment node is configured as the input of the do-while node. This way, the do-while node can obtain the output of the assignment node.
  • The inner nodes of the do-while node can reference each data entry. The built-in variable ${dag.offset} is used to reference the input parameters configured for the do-while node.
The following example shows how to configure a do-while node in a complex scenario. The preceding figure shows the following information:
  • The output of the assignment node is a two-dimensional array. The two-dimensional array is passed to the do-while node.
    Sample values of the two-dimensional array:
    +---------------+----------------+--------------------+--------+
    | uid           | region         | age_range          | zodiac |
    +---------------+----------------+--------------------+--------+
    | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
    | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
    +---------------+----------------+--------------------+--------+
  • The inner nodes of the do-while node use variables to obtain and print the loop parameters, offsets, and parameter values of the input from the ancestor assignment node.
  1. Create and configure an assignment node.
    Key points:
    • Value assignment code and input and output parameters: Select the language of the assignment node and write the code of the assignment parameter. The system generates output parameters for the output of the assignment node based on specific rules.
      Note The output of the assignment node is used as the input of the do-while node.
    • Node dependencies: You can create an assignment node in the workflow and drag a line to configure the assignment node as the ancestor node of the do-while node.
    For more information, see Configure an assignment node.
  2. Configure the output of the assignment node as the input of the do-while node.
    On the configuration tab of the do-while node, click the Properties tab on the right-side navigation pane. In the Input and Output Parameters section, click Create. Set the Parameter Name parameter to input and the Value Source parameter to the output parameter of the ancestor assignment node.
    Note The input and output parameters are configured for the assignment node and the do-while node, not for the inner nodes of the do-while node.
  3. Configure the inner loop task node of the do-while node.

    Double-click the name of the do-while node. The node configuration tab appears. Then, define the workflow inside the do-while node.

    By default, the do-while node consists of three nodes: start, sql, and end. In this example, you must delete the sql node, create a Shell node, and then write code for the Shell node to print the loop parameters. Take note of the following key points:
    • Node dependencies: After you delete the sql node and create a Shell node, you must drag lines to establish dependencies between the inner nodes.
    • Loop task code: When you write code for the Shell node, you can use built-in variables to print various loop parameters. For more information about the built-in variables available for a do-while node, see Built-in variables. You can refer to the following sample code to write code for the Shell node:
      echo '${dag.input}';
      echo 'Obtain the row data of the current loop:'${dag.input[${dag.offset}]};
      
      echo 'Obtain the offset:'${dag.offset};
      
      echo 'Obtain the number of loops that are run:'${dag.loopTimes};
      
      echo 'Obtain the length of the dataset passed by the ancestor assignment node _odpssql:'${dag.input.length};
      
      echo 'If you want to select data in a specific row and a specific column in the output of the assignment node, select the value based on a two-dimensional array:'${dag.input[0][1]};
  4. Define the loop exit condition for the end node.
    You can use the built-in variables supported by the do-while node for loop control. In this example, the values of the dag.loopTimes and dag.input.length variables are compared. The dag.loopTimes variable specifies the number of loops that are run, and the dag.input.length variable specifies the length of the dataset passed by the ancestor assignment node. If the value of the dag.loopTimes variable is less than the value of the dag.input.length variable, the end node returns True and the next loop starts. Otherwise, the end node returns False, and the entire looping process ends. In this example, the following code is used:
    if ${dag.loopTimes}<${dag.input.length}:
        print True;
    else:
        print False;
  5. Run the do-while node and view the result.
    Go to Operation Center, find the do-while node, and then open the DAG of the node. In the DAG, right-click the node name and choose Run > Current and Descendent Nodes Retroactively. In the Nodes section of the Patch Data dialog box, select the assignment node and the do-while node. After the data backfill instances are run, you can view the result in the logs.
    Note
    • If you use the value passed by an assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.
    • To view the operational logs of a do-while node in Operation Center, perform the following steps: find the do-while node and open the DAG of the node. In the DAG, right-click the node name and select View Internal Nodes to view the operational logs of the inner nodes.
    • View the output of the assignment node.
    • View the result that is returned after the end node is run for the first time
    • View the result that is returned after the end node is run for the second time
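The traversal in this example can be sketched locally in plain Python. This is a hypothetical simulation, not DataWorks code: the rows list copies the sample values from this topic, the function name traverse is made up for illustration, and the sketch assumes that the array holds only the two data rows without the header.

```python
rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]


def traverse(dag_input):
    """Yield (loop_times, offset, row) as the inner nodes would see them."""
    loop_times = 1
    while True:
        offset = loop_times - 1  # plays the role of ${dag.offset}
        yield loop_times, offset, dag_input[offset]
        # Same test as the end node: ${dag.loopTimes}<${dag.input.length}
        if not loop_times < len(dag_input):
            break
        loop_times += 1


for loop_times, offset, row in traverse(rows):
    print(loop_times, offset, row[0])
```

With two rows, the end condition is True after the first loop and False after the second, so each row is visited exactly once. Because the body runs before the condition is checked, this sketch raises an IndexError for an empty list.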

Summary

  • Comparison between a do-while node and the while, For Each, and do-while loop statements:
    • A do-while node runs based on a workflow that starts a loop before evaluation. This node functions the same way as the do-while statement. A do-while node can use the built-in variable ${dag.offset} and input and output parameters to achieve the feature of the For Each statement.
    • A do-while node cannot achieve the feature of the while statement because a do-while node runs a loop before evaluation.
  • Work procedure of a do-while node:
    1. The system runs a loop from the start node and runs other nodes based on the dependencies configured for them.
    2. After the system runs the code that is defined for the end node in a loop, one of the following situations occurs:
      • The next loop starts if the end node returns True.
      • The entire looping process ends if the end node returns False.
  • Input and output parameters: The inner nodes of the do-while node use a variable in the ${dag.<parameter name>} format, such as ${dag.input}, to reference the input and output parameters configured for the do-while node.
  • Built-in variables: DataWorks provides the following built-in variables for the inner nodes of the do-while node:
    • dag.loopTimes: the number of loops that are run. The value of this variable starts from 1.
    • dag.offset: the offset of the current loop relative to the first loop, which equals the value of dag.loopTimes minus 1. The value of this variable starts from 0.
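To show how these variables relate to each other, the following sketch replaces a few ${dag.*} references in a line of inner-node code with concrete values. It is a local illustration only: the function name render is hypothetical, the substitution rules are simplified, and rendering a whole row as a comma-separated string is an assumption rather than documented behavior.

```python
import re


def render(code, loop_times, dag_input):
    """Replace a few ${dag.*} placeholders in code with concrete values."""
    offset = loop_times - 1  # dag.offset is dag.loopTimes minus 1

    # Resolve scalar variables first so that a nested reference such as
    # ${dag.input[${dag.offset}]} becomes ${dag.input[0]}.
    code = code.replace("${dag.loopTimes}", str(loop_times))
    code = code.replace("${dag.offset}", str(offset))
    code = code.replace("${dag.input.length}", str(len(dag_input)))

    def lookup(match):
        value = dag_input
        # Follow each [index] in turn: [0] picks a row, [0][1] picks a cell.
        for index in re.findall(r"\[(\d+)\]", match.group(1)):
            value = value[int(index)]
        # Assumption: a whole row is rendered as comma-separated text.
        return ",".join(value) if isinstance(value, list) else str(value)

    return re.sub(r"\$\{dag\.input((?:\[\d+\])+)\}", lookup, code)


rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]
print(render("if ${dag.loopTimes}<${dag.input.length}:", 1, rows))
print(render("echo ${dag.input[${dag.offset}]}", 2, rows))
```

In the first loop, the end-node condition is rendered as a comparison of 1 with 2. In the second loop, the offset is 1, so the nested reference resolves to the second row.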