DataWorks allows you to configure cross-cycle scheduling dependencies for nodes. With a cross-cycle scheduling dependency, the instance generated for a node in the current cycle depends on the instances generated for one or more specific nodes in the previous cycle, and can start to run only after those instances run successfully. Configure cross-cycle scheduling dependencies in cases such as the following: the instance generated for a node in the current cycle depends on the data that an instance of another node generated on the previous day, or the instance generated for a node scheduled by hour or minute depends on the instance generated for the same node in the previous cycle. This topic describes the types of cross-cycle scheduling dependencies and how to configure them for a node.
Precautions
When you configure cross-cycle scheduling dependencies, take note of the items described in the following table.
Item | Description | References |
Display of cross-cycle scheduling dependencies | Cross-cycle scheduling dependencies are presented as dashed lines in the directed acyclic graph (DAG) of a node. | |
Whether same-cycle scheduling dependencies are still required after cross-cycle scheduling dependencies are configured | After you configure scheduling dependencies for a node, the node can start to run only after all its ancestor nodes run successfully. By default, the automatic parsing feature for same-cycle scheduling dependencies is enabled. If you configure cross-cycle scheduling dependencies for a node, check whether the node still requires same-cycle scheduling dependencies. If it does not, delete the automatically generated same-cycle scheduling dependencies so that they do not affect the running of the node. | Select a scheduling dependency type (same-cycle scheduling dependency) |
Complex scenarios in which cross-cycle scheduling dependencies are required | In some complex scenarios, same-cycle scheduling dependencies alone may not meet your business requirements. In this case, you can configure cross-cycle scheduling dependencies. For example, if a node scheduled by day depends on a node scheduled by hour, the instance generated for the node scheduled by day depends on all instances generated for the node scheduled by hour on the current day by default. If you configure a self-dependency for the node scheduled by hour, the instance generated for the node scheduled by day depends on only one of those instances. | Principles and samples of scheduling configurations in complex dependency scenarios |
Preview of scheduling dependencies of a node | To prevent an auto triggered node in the production environment from being delayed due to the scheduling dependencies that do not meet expectations, we recommend that you preview the scheduling dependencies of the node before you deploy the node to the production environment. This ensures that the instances generated for the auto triggered node can run as expected. | |
Node deployment | After you configure cross-cycle scheduling dependencies for a node, you must deploy the node and its ancestor nodes to the production environment. After the deployment is complete, you can view the cross-cycle scheduling dependencies in Operation Center in the production environment. | |
Entry point
Go to the configuration tab of the desired node in Data Studio. Click the Properties tab in the right-side navigation pane. In the Scheduling Dependencies section of the tab that appears, click Add Dependency, set the Dependency Type parameter to Cross-cycle Dependency, search for a node by node name, output name, or scheduling task ID, and then add the node as an ancestor node of the current node based on your business requirements.
Cross-cycle scheduling dependency types
Dependency type | Description | Scenario |
Dependency on the instance generated for the current node in the previous cycle | The instance generated for a node in the current cycle can start to run only after the instance generated for the same node in the previous cycle is successfully run. | The instance generated for a node in the current cycle depends on the latest business data of the instance generated for the same node in the previous cycle. |
Dependency on the instances generated for the level-1 descendant nodes of the current node in the previous cycle | The instance generated for the current node in the current cycle can start to run only after the instances generated for the level-1 descendant nodes of the current node in the previous cycle run successfully. | The instance generated for the current node in the current cycle depends on whether the output table data of the current node in the previous cycle has been cleansed by the instances generated for the level-1 descendant nodes of the current node in the previous cycle. |
Dependency on the instances generated for one or more specified nodes in the previous cycle | The instance generated for the current node in the current cycle can start to run only after the instances generated for one or more specified nodes in the previous cycle run successfully. | Based on the business logic, the instance generated for the current node in the current cycle depends on the output table data of the instances generated for one or more other nodes in the previous cycle, but the code of the current node does not use that data. |
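The three dependency types in the preceding table can be summarized in a short model. The following Python sketch is illustrative only and does not call any DataWorks API. The names `CrossCycleType`, `previous_cycle_upstreams`, and `can_start` are hypothetical, and the sketch assumes that each node is identified by its name and that the set of nodes whose previous-cycle instances succeeded is already known.

```python
from enum import Enum


class CrossCycleType(Enum):
    """Hypothetical labels for the three dependency types in the table."""
    SELF = "current node"
    LEVEL1_DESCENDANTS = "level-1 descendant nodes"
    SPECIFIED = "specified nodes"


def previous_cycle_upstreams(node, dep_type, children=None, specified=None):
    """Return the nodes whose previous-cycle instances must succeed first.

    children:  assumed mapping of node name -> level-1 descendant node names
    specified: assumed list of node names chosen for the SPECIFIED type
    """
    if dep_type is CrossCycleType.SELF:
        return [node]
    if dep_type is CrossCycleType.LEVEL1_DESCENDANTS:
        return sorted((children or {}).get(node, []))
    if dep_type is CrossCycleType.SPECIFIED:
        return sorted(specified or [])
    raise ValueError(f"unknown dependency type: {dep_type!r}")


def can_start(node, dep_type, succeeded_last_cycle, **kwargs):
    """True if every required previous-cycle instance ran successfully."""
    required = previous_cycle_upstreams(node, dep_type, **kwargs)
    return all(name in succeeded_last_cycle for name in required)


# Node C has level-1 descendant nodes A and B; only A succeeded last cycle.
children = {"C": ["A", "B"]}
print(can_start("C", CrossCycleType.LEVEL1_DESCENDANTS, {"A"}, children=children))
# prints False, because the previous-cycle instance of B has not succeeded
```

The sketch covers only the cross-cycle part of the check. Same-cycle scheduling dependencies, if they are retained, must be satisfied as well.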
Dependency on the instance generated for the current node in the previous cycle
The instance generated for a node in the current cycle depends on the latest business data of the instance generated for the same node in the previous cycle. Check the following dependencies:
The instance generated for a node scheduled by hour in the current cycle depends on the instance generated for the same node in the previous cycle.
The instance generated for a node scheduled by day in the current cycle depends on the instance generated for the same node in the previous cycle.
Configure scheduling dependencies:
Go to the configuration tab of Node A in Data Studio. Click the Properties tab in the right-side navigation pane. Click Scheduling Dependencies on the tab that appears.
In the Node Dependencies section, click Add Dependency. In the form that appears, configure the following settings:
Set the Dependency Type parameter to Cross-cycle Self Dependency. Click Add to add the instance generated for Node A in the previous cycle as the ancestor dependency of the current node.
Click Save to save the scheduling dependency configurations.
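After the self-dependency is configured, the running result of an instance in one scheduling cycle affects the instance generated for the same node in the next scheduling cycle. This applies to both the node scheduled by hour and the node scheduled by day: an instance can start to run only after the instance in the previous cycle runs successfully.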
If the node scheduled by day depends on the node scheduled by hour or minute, the time when the instance generated for the node scheduled by day starts to run is affected by whether the node scheduled by hour or minute is configured with the self-dependency.
Node scheduled by hour or minute not configured with the self-dependency
If the node scheduled by hour or minute is not configured with the self-dependency, the instance generated for the node scheduled by day depends on all instances generated for the node scheduled by hour or minute on the current day. In this case, the node scheduled by day aggregates and processes all table data of all instances generated for the node scheduled by hour or minute on the current day.
Node scheduled by hour or minute configured with the self-dependency
If the node scheduled by hour or minute is configured with the self-dependency, the instance generated for the node scheduled by day depends only on one specific instance generated for the node scheduled by hour or minute. Based on the principle of scheduling time proximity, this is the instance whose scheduling time is closest to the scheduling time of the instance generated for the node scheduled by day.
For more information, see Appendix 1: Complex dependency scenarios.
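The difference between the two cases can be sketched as a small function that resolves which instances of the hourly node a daily instance waits for. This is an illustrative model, not DataWorks code. The function name `hourly_upstreams` is hypothetical, and the sketch assumes that "closest" means the smallest absolute difference between scheduling times, with ties resolved in favor of the earlier instance. The source does not specify the tie-breaking rule.

```python
from datetime import datetime, timedelta


def hourly_upstreams(daily_time, hourly_times, hourly_self_dependent):
    """Return the scheduling times of the hourly instances that the daily
    instance scheduled at daily_time depends on."""
    same_day = sorted(t for t in hourly_times if t.date() == daily_time.date())
    if not hourly_self_dependent:
        # No self-dependency: depend on all instances of the current day.
        return same_day
    if not same_day:
        return []
    # Self-dependency: depend only on the instance with the closest
    # scheduling time (assumed: absolute difference, earlier one on a tie).
    closest = min(same_day, key=lambda t: (abs(t - daily_time), t))
    return [closest]


day = datetime(2024, 1, 1)
hourly = [day + timedelta(hours=h) for h in range(24)]   # 00:00 to 23:00
daily = day + timedelta(hours=2, minutes=20)              # 02:20

print(len(hourly_upstreams(daily, hourly, hourly_self_dependent=False)))  # prints 24
print(hourly_upstreams(daily, hourly, hourly_self_dependent=True))
# prints [datetime.datetime(2024, 1, 1, 2, 0)]
```

In the second call, the daily instance waits for a single hourly instance instead of all 24, which changes the time at which the daily instance can start.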
Dependency on the instances generated for the level-1 descendant nodes of the current node in the previous cycle
If you configure this type of scheduling dependency for a node, the instance generated for the node in the current cycle can start to run only after the instances generated for the level-1 descendant nodes of the node in the previous cycle are successfully run.
Configure scheduling dependencies:
Go to the configuration tab of Node C in Data Studio. Click the Properties tab in the right-side navigation pane. Click Scheduling Dependencies on the tab that appears.
In the Node Dependencies section, click Add Dependency. In the form that appears, configure the following settings:
Set the Dependency Type parameter to Cross-cycle Dependency on Level-1 Descendant Node. The system automatically searches for the nodes on which the current node needs to depend based on the latest code in the configuration tab of the current node. In this example, select Nodes A and B, which are the level-1 descendant nodes of Node C, as the ancestor nodes of the current node.
Click Save to save the scheduling dependency configurations.
Flowchart description:
Description of Instance C1:
Data processing: Instance C1 processes data in the output tables of Instances A and B in the T-2 partition.
Data output: Instance C1 generates data in Table C1 in the T-1 partition.
Descriptions of Instances A1 and B1:
Data processing: Instances A1 and B1 process data in the output table of Instance C1 on the T-1 day.
Data output: Instances A1 and B1 generate data in Tables A1 and B1 on the T-1 day.
Description of Instance C2:
Data processing: Instance C2 processes data in the output tables of Instances A1 and B1 in the T-1 partition.
Data output: Instance C2 generates data in Table C2 in the T partition.
Descriptions of Instances A2 and B2:
Data processing: Instances A2 and B2 process data in the output table of Instance C2 on the T day.
Data output: Instances A2 and B2 generate data in Tables A2 and B2 on the T day.
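The running order that the flowchart describes can be reproduced with a topological sort over the instances. The following Python sketch is a simplified model and not part of DataWorks. The helper names `build_instance_graph` and `run_waves` are hypothetical, and the sketch assumes that the first cycle has no earlier instances to wait for, although in the flowchart Instance C1 reads data that earlier instances of Nodes A and B generated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_instance_graph(cycles):
    """Map each instance to the set of instances it must wait for.

    Node C has level-1 descendant nodes A and B (same-cycle dependency),
    and C depends on the previous-cycle instances of A and B (cross-cycle).
    """
    deps = {}
    for n in range(1, cycles + 1):
        c = f"C{n}"
        # Cross-cycle: C(n) waits for A(n-1) and B(n-1), if they exist.
        deps[c] = {f"A{n - 1}", f"B{n - 1}"} if n > 1 else set()
        # Same-cycle: A(n) and B(n) wait for C(n).
        deps[f"A{n}"] = {c}
        deps[f"B{n}"] = {c}
    return deps


def run_waves(deps):
    """Group instances into waves that can run once earlier waves succeed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # treat every instance in the wave as successful
    return waves


print(run_waves(build_instance_graph(2)))
# prints [['C1'], ['A1', 'B1'], ['C2'], ['A2', 'B2']]
```

The output matches the sequence in the flowchart description: Instance C2 cannot start until both Instance A1 and Instance B1 have finished processing the output of Instance C1.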
Dependency on the instances generated for one or more specified nodes in the previous cycle
If you configure this type of scheduling dependency for a node, the instance generated for the node in the current cycle can start to run only after the instances generated for one or more specified nodes in the previous cycle are successfully run.
Configure scheduling dependencies:
Go to the configuration tab of Node B in Data Studio. Click the Properties tab in the right-side navigation pane. Click Scheduling Dependencies on the tab that appears.
In the Node Dependencies section, click Add Dependency. In the form that appears, configure the following settings:
Set the Dependency Type parameter to Cross-cycle Dependency.
Ancestor Object: Set the parameter to Name and select Node D, which does not belong to the workflow formed by Nodes A, B, and C.
Select Node D as the ancestor node of Node B.
Flowchart description: Node C has two descendant nodes, Node A and Node B. The instance generated for Node B in the current cycle depends on the instance generated for Node D in the previous cycle. In this example, the current cycle is T and the previous cycle is T-1, so the instance generated for Node B in cycle T can start to run only after the instance generated for Node D in cycle T-1 runs successfully.
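The example above can be expressed as a check that lists the instances that still block a node in a given cycle. The following Python sketch is illustrative only. The names `SAME_CYCLE`, `CROSS_CYCLE`, and `blocking_instances` are hypothetical, and the sketch assumes that all nodes are scheduled by day (so the previous cycle is the previous calendar day) and that Node B keeps its same-cycle dependency on Node C.

```python
from datetime import date, timedelta

# Assumed same-cycle parents: Nodes A and B are descendant nodes of Node C.
SAME_CYCLE = {"A": ["C"], "B": ["C"]}
# Cross-cycle parents: Node B depends on the previous-cycle instance of Node D.
CROSS_CYCLE = {"B": ["D"]}


def blocking_instances(node, cycle, succeeded):
    """Return the (node, cycle) instances that have not yet succeeded and
    therefore prevent the instance of `node` in `cycle` from starting."""
    previous = cycle - timedelta(days=1)  # assumed: daily scheduling
    required = [(parent, cycle) for parent in SAME_CYCLE.get(node, [])]
    required += [(parent, previous) for parent in CROSS_CYCLE.get(node, [])]
    return [instance for instance in required if instance not in succeeded]


t = date(2024, 1, 2)                 # current cycle T
succeeded = {("C", t)}               # only the instance of Node C in T succeeded

print(blocking_instances("A", t, succeeded))
# prints []
print(blocking_instances("B", t, succeeded))
# prints [('D', datetime.date(2024, 1, 1))]
```

Node A can start as soon as the instance of Node C in cycle T succeeds, whereas Node B also waits for the instance of Node D in cycle T-1.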
Preview scheduling dependencies
After you configure scheduling dependencies for a node, you can preview the scheduling dependencies. For more information, see Subsequent steps: Check whether the scheduling dependencies meet your expectations.