DataWorks provides for-each nodes. You can use a for-each node to traverse the output of an assignment node in loops. You can also customize the workflow in a for-each node. This topic describes the composition and application logic of a for-each node.
Usage notes
Item | References |
---|---|
Learn the use scenarios of for-each nodes. | Scenario. Note: A for-each node can be used only to traverse the result set of an assignment node in loops. |
Learn the limits and precautions of for-each nodes, such as the upper limit for the number of loops, the method to test a for-each node, and the method to view logs of a for-each node. | Limits and Precautions |
Learn that you can configure an inner workflow for a for-each node based on your business requirements. When you configure an inner workflow for a for-each node, make sure that the inner workflow starts with the start node and ends with the end node. | Composition and workflow orchestration of a for-each node |
Learn that the number of loops for a for-each node is determined by the output of an assignment node. | Number of loops |
Learn that the built-in variables provided by a for-each node can be used to obtain the related values from the result set of an assignment node in each loop. | Built-in variables |
Learn samples of variable values and the number of loops for a for-each node. | Sample variable values and sample number of loops for a for-each node if a Shell or ODPS SQL node is used as an assignment node for the for-each node |
Scenario
A for-each node in DataWorks is used in loop traversal scenarios and must be used with an assignment node. An assignment node must be configured as the ancestor node of a for-each node. After the assignment node passes its output to the for-each node, the for-each node traverses the output in loops. For information about assignment nodes, see Configure an assignment node.
Limits
- Only DataWorks Standard Edition and more advanced editions support for-each nodes. For more information, see Differences among DataWorks editions.
- The maximum number of loops for a for-each node is 128. The actual number of loops for a for-each node is determined by the result set passed by an assignment node.
- Parallel execution is not supported. A loop can start only if the previous loop ends.
Precautions
Dimension | Item | Description |
---|---|---|
Dependencies | Dependency settings | A for-each node needs to traverse the value passed by an assignment node in loops. Therefore, an assignment node must be configured as the ancestor node of a for-each node. The for-each node must depend on the assignment node. For information about assignment nodes, see Configure an assignment node. |
Traversal support | Upper limit for the number of loops | The maximum number of loops for a for-each node is 128. If the number of loops for a for-each node exceeds 128, an error is reported. The actual number of loops for a for-each node is determined by the output of an assignment node. |
Traversal support | Number of loops | The number of loops for a for-each node is determined by the result set passed by an assignment node. |
Inner nodes | Workflow orchestration | You can configure the inner workflow of a for-each node based on your business requirements. The inner workflow must start with the start node and end with the end node. |
Inner nodes | Value acquisition | The built-in variables provided by a for-each node can be used to obtain a specific value passed by the assignment node that is configured as the ancestor node of the for-each node. |
Debugging | Node debugging | |
Debugging | Log viewing | To view the operational logs of a for-each node in Operation Center, find the for-each node on the Cycle Task page and open the directed acyclic graph (DAG) of the node. In the DAG, right-click the node name and select View Internal Nodes to view the operational logs of the inner nodes. |
Composition and workflow orchestration of a for-each node

- sql node
DataWorks automatically creates an inner SQL task node named sql. You can delete the default sql node and configure an inner loop task node based on your business requirements.
- If you use an SQL node as an inner loop task node, double-click the sql node and edit the code of the node on the node configuration tab.
- If your loop task is complicated, you can create more inner nodes in the inner workflow for the for-each node to process the loop task and connect the inner nodes based on your business requirements.
Note: When you customize the inner loop task nodes of a for-each node, you can delete the dependencies between the existing inner nodes and configure an inner workflow based on your business requirements. Make sure that the inner workflow starts with the start node and ends with the end node.
- start and end nodes
The start node marks the startup of a loop, and the end node marks the end of a loop. The two nodes are not used to process a loop task.
Note: The number of loops for a for-each node is determined by the output of the assignment node that is configured as the ancestor node of the for-each node, not by the end node of the for-each node.
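The rule that an inner workflow starts with the start node and ends with the end node can be pictured as a simple graph check. The following is a minimal sketch, not a DataWorks API: the function name `validate_inner_workflow` and the dictionary-based workflow representation are hypothetical, and the sketch assumes that "starts with start" means start is the only node without a predecessor and "ends with end" means end is the only node without a successor.

```python
def validate_inner_workflow(edges):
    """Check a hypothetical inner workflow description.

    `edges` maps each node name to the list of nodes that run after it.
    Returns True if 'start' is the only node without a predecessor and
    'end' is the only node without a successor.
    """
    has_predecessor = {n for targets in edges.values() for n in targets}
    nodes = set(edges) | has_predecessor
    if "start" not in nodes or "end" not in nodes:
        return False
    sources = nodes - has_predecessor               # nodes nothing points to
    sinks = {n for n in nodes if not edges.get(n)}  # nodes pointing nowhere
    return sources == {"start"} and sinks == {"end"}


# Default layout described above: start -> sql -> end.
default_flow = {"start": ["sql"], "sql": ["end"]}
# A customized layout with two inner nodes between start and end.
custom_flow = {"start": ["extract", "audit"], "extract": ["end"], "audit": ["end"]}
# A layout in which the sql node is no longer connected to the end node.
broken_flow = {"start": ["sql"], "sql": [], "end": []}

print(validate_inner_workflow(default_flow))  # True
print(validate_inner_workflow(custom_flow))   # True
print(validate_inner_workflow(broken_flow))   # False
```

The check is intentionally simple and does not detect cycles; it only illustrates the start and end constraint.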
Number of loops
- If the assignment node that you configure as the ancestor node of the for-each node uses Shell or Python, the output of the assignment node is a one-dimensional array whose elements are separated by commas (,). The number of loops for the for-each node is equal to the number of elements in the one-dimensional array.
For example, if the assignment node uses Shell or Python (Python 2) and its output is the one-dimensional array 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01, the for-each node traverses the output of the assignment node in five loops.
- If the assignment node that you configure as the ancestor node of the for-each node uses ODPS SQL, the output of the assignment node is a two-dimensional array. The number of loops for the for-each node is equal to the number of rows in the two-dimensional array.
For example, if ODPS SQL is used by the assignment node, the output of the assignment node is a two-dimensional array:
+----------------------------------------------+
| uid | region | age_range | zodiac |
+----------------------------------------------+
| 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
| 0016359814159 | Unknown | 30 to 40 years old | Cancer |
+----------------------------------------------+
The output indicates that the for-each node traverses the output of the assignment node in two loops.
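The two counting rules can be expressed in a few lines. The following Python sketch only models the rules described in this topic; `count_loops` and `MAX_LOOPS` are hypothetical names, and the sketch assumes that a Shell or Python output arrives as a comma-separated string and that an ODPS SQL output arrives as a list of rows.

```python
MAX_LOOPS = 128  # upper limit stated in the Limits section of this topic


def count_loops(output):
    """Return the number of loops for a modelled assignment node output.

    `output` is a comma-separated string (one-dimensional array) or a
    list of rows (two-dimensional array). Hypothetical helper.
    """
    if isinstance(output, str):
        # One-dimensional array: one loop per comma-separated element.
        loops = len(output.split(","))
    else:
        # Two-dimensional array: one loop per row.
        loops = len(output)
    if loops > MAX_LOOPS:
        raise ValueError(f"{loops} loops exceeds the limit of {MAX_LOOPS}")
    return loops


shell_output = "2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01"
sql_output = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]

print(count_loops(shell_output))  # 5
print(count_loops(sql_output))    # 2
```

Raising an error above 128 loops mirrors the behavior described in the Precautions table; the exact error that DataWorks reports is not modelled here.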
Built-in variables
You can use the built-in variables provided by a for-each node to obtain the result set passed by the assignment node that is configured as the ancestor node of the for-each node. If the inner workflow for the for-each node contains an assignment node, you can obtain the output of that assignment node by using the default method. For more information about the default method, see Configure an assignment node.
Built-in variable | Description | Compare with the for loop |
---|---|---|
${dag.loopDataArray} | Obtain the dataset of an assignment node. | Equivalent to the array that the for loop iterates over. |
${dag.foreach.current} | Obtain the current data entry. | Equivalent to the element that is read in the current iteration of the for loop. |
${dag.offset} | Obtain the offset between the current loop and the first loop. The value is 0 in the first loop. | Equivalent to the zero-based index of the for loop. |
${dag.loopTimes} | Obtain the number of the current loop. The value is 1 in the first loop. | - |
Other variable | Description |
---|---|
${dag.foreach.current[n]} | If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of a specific column for the current data entry when the for-each node traverses the output of the assignment node in loops. |
${dag.loopDataArray[i][j]} | If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of Row i and Column j in the dataset of the assignment node. |
${dag.foreach.current[n]} | If the output of the assignment node that is configured as the ancestor node of a for-each node is a one-dimensional array, the variable is used to obtain the data of a specific column. |
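The relationship between these variables and an ordinary for loop can be sketched in Python. This is only a model of the semantics described above, not DataWorks code: the generator `loop_contexts` is a hypothetical name, and the sketch assumes zero-based indexes for the bracketed variables, as the values in Sample 2 of this topic suggest.

```python
def loop_contexts(loop_data_array):
    """Yield the values that the built-in variables take in each loop."""
    for offset, current in enumerate(loop_data_array):
        yield {
            "dag.loopDataArray": loop_data_array,  # the whole dataset
            "dag.foreach.current": current,        # the current data entry
            "dag.offset": offset,                  # 0 in the first loop
            "dag.loopTimes": offset + 1,           # 1 in the first loop
        }


# Two-dimensional output, modelled as a list of rows.
rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]

contexts = list(loop_contexts(rows))
for ctx in contexts:
    current = ctx["dag.foreach.current"]
    # current[0] plays the role of ${dag.foreach.current[0]}.
    print(ctx["dag.offset"], ctx["dag.loopTimes"], current[0])

# rows[1][0] plays the role of ${dag.loopDataArray[1][0]}.
print(rows[1][0])
```

Loops run one after another in this model, which matches the limit that parallel execution is not supported.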
Examples of variable values
Sample 1: A Shell node is used as an assignment node
- Output of the assignment node
A Shell node is used as an assignment node, and the last output of the assignment node is 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01.
- Values of the variables for a for-each node
Note: The output of the assignment node is a one-dimensional array, and the five elements in the array are separated by commas (,). Therefore, the number of loops for the for-each node is 5.
Built-in variable | Value for the first loop | Value for the second loop |
---|---|---|
${dag.loopDataArray} | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 |
${dag.foreach.current} | 2021-03-28 | 2021-03-29 |
${dag.offset} | 0 | 1 |
${dag.loopTimes} | 1 | 2 |
${dag.foreach.current[3]} | 2021-03-30 | 2021-03-30 |
Sample 2: An ODPS SQL node is used as an assignment node
- Output of the assignment node
An ODPS SQL node is used as an assignment node, and the last SELECT statement returns the following two rows of data:
+----------------------------------------------+
| uid | region | age_range | zodiac |
+----------------------------------------------+
| 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
| 0016359814159 | Unknown | 30 to 40 years old | Cancer |
+----------------------------------------------+
- Values of the variables for a for-each node
Note: The output of the assignment node is a two-dimensional array that contains two rows of data. Therefore, the number of loops for the for-each node is 2.
Built-in variable | Value for the first loop | Value for the second loop |
---|---|---|
${dag.loopDataArray} | The complete two-row result set returned by the assignment node | The complete two-row result set returned by the assignment node |
${dag.foreach.current} | 0016359810821,Hubei Province,30 to 40 years old,Cancer | 0016359814159,Unknown,30 to 40 years old,Cancer |
${dag.offset} | 0 | 1 |
${dag.loopTimes} | 1 | 2 |
${dag.foreach.current[0]} | 0016359810821 | 0016359814159 |
${dag.loopDataArray[1][0]} | 0016359814159 | 0016359814159 |
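To make the values above more concrete, the following Python sketch shows how placeholders in the code of an inner node could resolve in each loop. It is an illustration under stated assumptions, not a description of how DataWorks resolves variables internally: the `render` function, the regular expression, and the `echo` template are all hypothetical, and indexes are assumed to be zero-based as in this sample.

```python
import re

# Matches ${dag.<name>} optionally followed by [n] or [i][j].
PLACEHOLDER = re.compile(r"\$\{(dag\.[A-Za-z.]+)((?:\[\d+\])*)\}")


def render(template, context):
    """Replace ${dag.*} placeholders in `template` with values from `context`."""
    def substitute(match):
        value = context[match.group(1)]
        # Apply each [n] index in order, for example [1][0] -> value[1][0].
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            value = value[int(index)]
        if isinstance(value, list):
            # A whole row is shown as comma-separated text.
            value = ",".join(str(item) for item in value)
        return str(value)
    return PLACEHOLDER.sub(substitute, template)


rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]
# Hypothetical inner node code that uses two of the variables.
template = "echo uid=${dag.foreach.current[0]} loop=${dag.loopTimes}"

rendered = []
for offset, row in enumerate(rows):
    context = {
        "dag.loopDataArray": rows,
        "dag.foreach.current": row,
        "dag.offset": offset,
        "dag.loopTimes": offset + 1,
    }
    rendered.append(render(template, context))

print(rendered[0])  # echo uid=0016359810821 loop=1
print(rendered[1])  # echo uid=0016359814159 loop=2
```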