Scheduling dependencies in DataWorks define the relationships between ancestor and descendant auto triggered nodes. Nodes are scheduled to run in the order that these dependencies define: a descendant node starts to run only after its ancestor nodes finish running. This ensures that valid business data is generated as early as possible. This topic describes how to configure scheduling dependencies for a node and how to avoid data exceptions caused by incorrectly configured scheduling dependencies. We recommend that you read this topic before you configure scheduling dependencies for a node.

Note
  • Scheduling time of a node: the time at which you want the node to be scheduled to run.
  • Scheduling dependencies of a node: the nodes on which the node depends. In most cases, the node has data dependencies on these nodes.
Therefore, the time at which a node actually runs is determined both by its own scheduling time and by the time at which its ancestor nodes finish running, which in turn depends on the scheduling time of the ancestor nodes. If the ancestor nodes have not finished running, the node cannot start to run at its scheduling time, even if its scheduling time is earlier than that of its ancestor nodes. For more information, see Impacts of dependencies between nodes on the running of the nodes.

Scheduling dependency configuration guide

Scheduling dependencies between nodes in DataWorks ensure that descendant nodes obtain valid data from ancestor nodes. Configuring a scheduling dependency therefore implies that a strong lineage exists between the tables generated by the ancestor and descendant nodes. Depending on your business requirements, you can configure scheduling dependencies for a node either based on the lineage between the tables generated by the node and its ancestor nodes, or based on your business requirements alone.

Configure scheduling dependencies for a node based on business requirements instead of the lineage between tables generated by the node and its ancestor nodes

You can configure scheduling dependencies for a node based on your business requirements in the following scenarios:
  • Scenario 1: No strong lineage exists between the node and its ancestor nodes. For example, the node does not depend on the data in a specific partition of the table generated by an ancestor node, but only on the data in the partition that has the largest partition key value.
  • Scenario 2: The node depends on data in a table that is not generated by an auto triggered node. For example, the node depends on data in a table that is uploaded from your on-premises machine.
You can use the following methods to configure scheduling dependencies for a node based on business requirements:
  • Configure the root node of a workspace as the ancestor node:

    You can configure the root node of a workspace as the ancestor node of a node in the following scenarios: a data synchronization node depends on data in other business databases, or an SQL node processes data in a table generated by a real-time synchronization node.

  • Configure a zero load node as the ancestor node:

    If a workspace contains a large number of workflows or complex workflows, you can use a zero load node to manage them. Configure the zero load node as the ancestor node of the nodes that you want to manage in a centralized manner. This makes the data forwarding paths in the workspace clearer. For example, you can use the zero load node to configure the scheduling time of the nodes in a workflow, or to schedule or freeze the nodes in a workflow in a centralized manner.

Configure scheduling dependencies for a node based on the lineage between tables generated by the node and its ancestor nodes

For information about precautions for configuring scheduling dependencies, see Precautions.
The following list describes the steps for configuring scheduling dependencies based on table lineage. For each step, it gives the goal, a description, and, where available, the related reference.
  • Steps 1 and 2: Check whether a strong lineage exists between the tables generated by the ancestor and descendant nodes.
    After you configure scheduling dependencies for a node, the node can start to run only after all of its ancestor nodes have run successfully. Otherwise, data quality errors may occur when the node obtains data from its ancestor nodes. The successful running of all ancestor nodes is therefore one of the conditions for running the node.
    To ensure that the node runs at its scheduling time, check whether a strong lineage exists between the tables generated by the node and its ancestor nodes, and decide based on that lineage whether to configure the scheduling dependencies.
    Reference: Judgement of strong lineages, and impacts of configuration of scheduling dependencies for a node based on the lineage between tables generated by the node and its ancestor nodes
  • Step 3: Check whether the table on which the node depends is generated by an auto triggered node.
    DataWorks determines whether the data of an auto triggered node has been generated based on the status of the node: if the node is in the Successful state, the data has been generated. For tables that are not generated by auto triggered nodes, DataWorks cannot determine the status of the nodes that generate them. Therefore, scheduling dependencies cannot be configured based on such tables.
    Reference: Scenarios in which configuration of scheduling dependencies is not supported (tables whose data is not generated based on periodical scheduling)
  • Steps 4, 5, and 6: Decide how to configure scheduling dependencies for the node based on the lineage between the tables generated by the node and its ancestor nodes.
    Decide whether to configure same-cycle or previous-cycle scheduling dependencies between the node and its ancestor nodes based on the following conditions: whether the node depends on the data that the ancestor nodes generate on the current day or on the previous day, and, if the node is scheduled by hour or minute, whether the instance generated for the node in the current cycle depends on the instance generated for the same node in the previous cycle.
  • Step 7: Preview scheduling dependencies.
    You can use the scheduling dependency preview feature to check whether the configured scheduling dependencies meet your business requirements.
    An ancestor node and a descendant node may have different numbers of scheduling cycles, and the dependencies between their instances may differ from one scheduling cycle to another. We therefore recommend that you preview the dependencies between ancestor and descendant instances before you deploy the nodes that generate them, especially when the ancestor and descendant nodes have different numbers of scheduling cycles or different scheduling times.
  • Step 8: Confirm the impacts of modifications to the scheduling dependencies.
    When you commit the node, you can use the code parsing result comparison feature to check whether the modifications you made to the scheduling dependencies meet your business requirements.
  • Step 9: Check whether the scheduling dependencies of the node in the production environment meet your business requirements.
    After you deploy the node, we recommend that you go to the Cycle Task page to check the scheduling dependencies of the node in the production environment.
    In the directed acyclic graph (DAG) of the node, solid lines indicate same-cycle scheduling dependencies and dashed lines indicate previous-cycle scheduling dependencies.

Judgement of strong lineages, and impacts of configuration of scheduling dependencies for a node based on the lineage between tables generated by the node and its ancestor nodes

If you configure scheduling dependencies for a node based on the lineage between the tables generated by the node and its ancestor nodes, the system treats that lineage as strong: the data in the table generated by the node depends on the data in the tables generated by its ancestor nodes. Therefore, before you configure scheduling dependencies based on table lineage, check whether a strong lineage actually exists. To do so, ask whether the node would produce incorrect data if its ancestor nodes failed to generate their data before the node reads it.

After you configure the scheduling dependencies for the node, the node can start to run only when both of the following conditions are met:
  • The scheduling time of the node arrives.
  • All ancestor nodes of the node have run successfully. This ensures that the data of the ancestor nodes has been generated.

    Therefore, the scheduling time of the ancestor nodes determines the earliest time at which the node can actually run. Even if the scheduling time of the node is earlier than that of its ancestor nodes, the node starts to run only after its ancestor nodes have run successfully.
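The two start conditions above can be sketched in Python as follows. This is a simplified model, not the DataWorks scheduler: the `AncestorInstance` class, its status strings, and both functions are hypothetical, and actual scheduling can also depend on factors this sketch ignores, such as available scheduling resources.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class AncestorInstance:
    """Hypothetical view of one ancestor instance in the current cycle."""
    name: str
    status: str                      # hypothetical labels: "SUCCESS", "RUNNING", "FAILED"
    finished_at: Optional[datetime]  # None until the instance finishes


def can_start(now: datetime, scheduled_at: datetime,
              ancestors: List[AncestorInstance]) -> bool:
    """Condition 1: the scheduling time has arrived.
    Condition 2: every ancestor instance has run successfully."""
    time_reached = now >= scheduled_at
    ancestors_succeeded = all(a.status == "SUCCESS" for a in ancestors)
    return time_reached and ancestors_succeeded


def earliest_start(scheduled_at: datetime,
                   ancestors: List[AncestorInstance]) -> Optional[datetime]:
    """Earliest actual start time: the later of the node's own scheduling time
    and the finish time of its last ancestor. Returns None while any ancestor
    has not yet succeeded, because the start time cannot be known yet."""
    if any(a.status != "SUCCESS" or a.finished_at is None for a in ancestors):
        return None
    return max([scheduled_at] + [a.finished_at for a in ancestors])
```

For example, a node scheduled at 01:00 whose only ancestor finishes successfully at 03:20 can start no earlier than 03:20, which is the rule that the ancestor nodes determine the earliest actual running time.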

Scenarios in which configuration of scheduling dependencies is not supported (tables whose data is not generated based on periodical scheduling)

Scheduling dependencies between auto triggered nodes in DataWorks ensure that the tables generated by the nodes are updated at specific points in time and that descendant auto triggered nodes obtain valid data from ancestor auto triggered nodes. Because DataWorks tracks whether data is ready only through the status of auto triggered nodes, it cannot monitor tables that are not generated by auto triggered nodes in DataWorks. Tables whose data is not generated based on periodical scheduling in DataWorks include but are not limited to the following tables:
  • Tables generated by real-time synchronization nodes
  • Tables uploaded from on-premises machines to DataWorks
  • Dimension tables
  • Tables generated by manually triggered nodes
  • Tables whose data is periodically updated but that are not generated by auto triggered nodes in DataWorks
For nodes that depend on tables whose data is not generated based on periodical scheduling in DataWorks, you can configure scheduling dependencies based on your business requirements. For more information, see Scheduling dependency configuration guide.

Decide how to configure scheduling dependencies for a node based on the lineage between tables generated by the node and its ancestor nodes

In DataWorks, the lineage between tables is expressed through the scheduling dependencies of the nodes that generate the tables. After you confirm that a strong lineage exists between the tables generated by a node and its ancestor nodes, we recommend that you determine the table lineage from the code of the node and select a scheduling dependency type accordingly. When the node is scheduled, the scheduling parameters in the node code determine the specific ancestor instance on which the current node instance depends.

To confirm the table lineage and select a scheduling dependency type, you also need to configure scheduling parameters for the node. For information about how to configure scheduling parameters, see Supported formats of scheduling parameters. At the scheduling time of each instance, the scheduling parameters are replaced with specific values based on the data timestamp and scheduling time of the node and the value formats of the parameters. As a result, each instance queries and generates the partition data that corresponds to its own cycle.
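The following Python sketch illustrates how parameter replacement lets each instance read and write the partitions of its own cycle. It is a hypothetical, simplified resolver, not the DataWorks implementation. It assumes that `${yyyymmdd}` is formatted from the data timestamp, which for a daily node is one day before the scheduling date; that `$[yyyymmdd]` and `$[hh24]` are formatted from the scheduling time; and that a `+N` or `-N` suffix on `yyyymmdd` shifts the date by N days. See Supported formats of scheduling parameters for the authoritative rules.

```python
import re
from datetime import datetime, timedelta

# Hypothetical, simplified resolver. Assumptions (not the DataWorks implementation):
#   ${yyyymmdd} / ${yyyymmdd-N}: data timestamp (scheduling date minus one day), shifted by N days
#   $[yyyymmdd] / $[yyyymmdd-N]: scheduling time, shifted by N days
#   $[hh24]: hour of the scheduling time
_PARAM = re.compile(
    r"\$\{yyyymmdd(?P<dt_off>[+-]\d+)?\}"
    r"|\$\[yyyymmdd(?P<st_off>[+-]\d+)?\]"
    r"|\$\[hh24\]"
)


def resolve(code: str, scheduled_at: datetime) -> str:
    """Replace the supported parameter expressions in node code for one instance."""
    data_timestamp = scheduled_at - timedelta(days=1)

    def replace(match: re.Match) -> str:
        text = match.group(0)
        if text == "$[hh24]":
            return scheduled_at.strftime("%H")
        if text.startswith("${"):
            base, offset = data_timestamp, match.group("dt_off")
        else:
            base, offset = scheduled_at, match.group("st_off")
        return (base + timedelta(days=int(offset or 0))).strftime("%Y%m%d")

    return _PARAM.sub(replace, code)


sql = "INSERT OVERWRITE TABLE dws_orders PARTITION (ds='${yyyymmdd}') SELECT * FROM ods_orders WHERE ds='${yyyymmdd}';"
print(resolve(sql, datetime(2024, 1, 2, 1, 0)))
```

Under these assumptions, the instance scheduled at 2024-01-02 01:00 reads and writes partition `ds='20240101'`, while the instance scheduled one day later reads and writes `ds='20240102'`, so the same code processes a different partition in each cycle.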

To configure scheduling dependencies for a node based on the lineage between tables generated by the node and its ancestor nodes, perform the following operations:
  1. Confirm the lineage between the tables generated by the node and its ancestor nodes.
    To ensure that the data in the table generated by the node meets your business requirements, you must know which business data the tables generated by the node and its ancestor nodes contain on the current day. In other words, you must make sure that the data that the node reads on the current day is the data that the ancestor nodes generate in their tables on the current day.
    • If the node is scheduled by hour or minute, you must know which partition data each instance of the node generates in its table.
    • If you cannot view the scheduling parameter configurations of the ancestor nodes, for example, because the ancestor nodes are in another workspace, see Confirm the lineage of a table for information about how to confirm the table lineage.
  2. Select a scheduling dependency type based on the table lineage.
    The following list describes the scheduling dependency types that you can select based on the lineage between tables.
    • Same-cycle scheduling dependencies: a descendant node depends on the data in the table generated by an ancestor node on the current day.
    • Previous-cycle scheduling dependencies:
      • A descendant node depends on the data in the table generated by an ancestor node on the previous day.
      • Special dependency scenarios for nodes scheduled by hour or minute:
        • The instance generated for a node scheduled by hour or minute in the current cycle depends on the instance generated for the same node in the previous cycle. For more information, see Dependency on the instance generated for the current node in the previous cycle.
        • Node A scheduled by hour depends on Node B scheduled by hour, and the two nodes have the same scheduling time. You can configure cross-cycle scheduling dependencies for Node A so that the instance generated for Node A at 02:00 depends on the instance generated for Node B at 01:00. The same logic applies when a node scheduled by minute depends on another node scheduled by minute.
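The hourly Node A and Node B example above can be sketched in Python. The sketch assumes that both nodes are scheduled at minute 0 of every hour and that a previous-cycle dependency points to the ancestor instance exactly one cycle earlier; the function names are hypothetical, and the preview feature in DataWorks remains the authoritative way to check the actual result.

```python
from datetime import date, datetime, time, timedelta
from typing import Dict, List

CYCLE = timedelta(hours=1)  # assumption: both nodes run once per hour, at minute 0


def hourly_instances(day: date) -> List[datetime]:
    """Scheduling times of an hourly node on one day: 00:00, 01:00, ..., 23:00."""
    return [datetime.combine(day, time(hour)) for hour in range(24)]


def ancestor_instance(descendant_time: datetime, dependency_type: str) -> datetime:
    """Scheduling time of the Node B instance that a Node A instance depends on."""
    if dependency_type == "same-cycle":
        return descendant_time
    if dependency_type == "previous-cycle":
        return descendant_time - CYCLE
    raise ValueError(f"unknown dependency type: {dependency_type}")


def dependency_map(day: date, dependency_type: str) -> Dict[datetime, datetime]:
    """Map every Node A instance of the day to the Node B instance it waits for."""
    return {t: ancestor_instance(t, dependency_type) for t in hourly_instances(day)}
```

Under this assumption, with previous-cycle dependencies the 02:00 instance of Node A waits for the 01:00 instance of Node B, and the 00:00 instance waits for the 23:00 instance of the previous day.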

Configure scheduling dependencies for a node

This section describes how to configure the same-cycle and previous-cycle scheduling dependencies for a node in DataWorks after you confirm the lineage between tables generated by the node and its ancestor nodes and select a scheduling dependency type based on the table lineage. For more information, see Configure same-cycle scheduling dependencies and Configure cross-cycle scheduling dependencies.

Configure same-cycle scheduling dependencies

The instance generated for a node in a scheduling cycle depends on the data in the table generated by an instance of another node in the same scheduling cycle. In other words, the output of the ancestor node is the input of the descendant node, which forms a same-cycle scheduling dependency. You can use the following methods to configure same-cycle scheduling dependencies:
  • Draw lines between nodes on the configuration tab of a workflow: DataWorks automatically appends _out to the name of the ancestor node to form the output on which the descendant node depends.
  • Use the automatic parsing feature: this feature parses the table lineage from the node code, which lets you quickly configure scheduling dependencies between nodes based on the table lineage.
  • Manually add ancestor nodes in the Dependencies section: in most cases, you use this method to modify the scheduling dependencies of a node when the scheduling dependencies obtained by the automatic parsing feature do not meet your business requirements.
You can configure same-cycle scheduling dependencies for nodes that belong to different workflows and for nodes that belong to different workspaces in the same region based on the preceding principles. For more information, see Scenario 3: Configure scheduling dependencies for nodes across workflows or workspaces.
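The idea behind the automatic parsing feature can be illustrated with a deliberately naive Python sketch: find the tables that each node writes (`INSERT OVERWRITE TABLE` or `INSERT INTO TABLE`) and reads (`FROM` or `JOIN`), and make each node depend on the nodes that produce its input tables. This is a hypothetical regular-expression approximation, not the parser that DataWorks uses, and it ignores subqueries, comments, and many other SQL forms. Input tables that no node in the set produces, such as uploaded tables, yield no dependency, which mirrors the restriction on tables that are not generated by auto triggered nodes.

```python
import re
from typing import Dict, List, Set, Tuple

# Naive, hypothetical approximation of table-lineage parsing.
_OUTPUT = re.compile(r"\binsert\s+(?:overwrite|into)\s+table\s+([\w.]+)", re.IGNORECASE)
_INPUT = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def parse_lineage(sql: str) -> Tuple[Set[str], Set[str]]:
    """Return (input tables, output tables) found in one node's SQL."""
    outputs = set(_OUTPUT.findall(sql))
    inputs = set(_INPUT.findall(sql)) - outputs
    return inputs, outputs


def infer_ancestors(node_sql: Dict[str, str]) -> Dict[str, List[str]]:
    """Make each node depend on the nodes whose output tables it reads."""
    lineage = {node: parse_lineage(sql) for node, sql in node_sql.items()}
    producer = {table: node
                for node, (_, outputs) in lineage.items()
                for table in outputs}
    return {
        node: sorted({producer[t] for t in inputs
                      if t in producer and producer[t] != node})
        for node, (inputs, _) in lineage.items()
    }
```

For example, if one node writes `ods_orders` and another node reads `ods_orders` and joins an uploaded `dim_user` table, the sketch makes the second node depend only on the first.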

Configure cross-cycle scheduling dependencies

You can configure cross-cycle scheduling dependencies for a node if the instance generated for the node in the current cycle needs to depend on the data in a table generated by an instance of the same node or another node in the previous cycle. You can configure the following types of cross-cycle scheduling dependencies:
  • Dependency on the instance generated for the same node in the previous cycle: the instance generated for the node in the current cycle depends on the latest business data from the instance generated for the same node in the previous cycle.
  • Dependency on the instances generated for the descendant nodes of the node in the previous cycle: the instance generated for the node in the current cycle depends on whether the instances generated for its descendant nodes in the previous cycle have cleansed the output table data that the node generated in the previous cycle.
  • Dependency on the instances generated for other nodes in the previous cycle: in terms of business logic, the instance generated for the node in the current cycle depends on the output table data of the instances generated for one or more other nodes in the previous cycle, but the node code does not use that data.
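The three cross-cycle dependency types above differ only in which previous-cycle instances the current instance waits for. The Python sketch below models that under stated assumptions: cycles are numbered with hypothetical integer indexes, `descendants` is a hypothetical map from a node to its descendant nodes, and the mode names `"self"`, `"descendants"`, and `"other"` are labels invented for this example rather than DataWorks settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class InstanceRef:
    node: str
    cycle: int  # hypothetical cycle index; cycle - 1 is the previous cycle


def previous_cycle_ancestors(node: str, cycle: int, mode: str,
                             descendants: Dict[str, List[str]],
                             other_nodes: Sequence[str] = ()) -> List[InstanceRef]:
    """Previous-cycle instances that the instance of `node` in `cycle` waits for."""
    previous = cycle - 1
    if mode == "self":
        # The same node in the previous cycle, for example when this cycle builds on last cycle's result.
        return [InstanceRef(node, previous)]
    if mode == "descendants":
        # The node's descendants in the previous cycle, which must have cleansed last cycle's output.
        return [InstanceRef(child, previous) for child in descendants.get(node, [])]
    if mode == "other":
        # Business-logic dependency on other nodes whose data the node code does not read.
        return [InstanceRef(other, previous) for other in other_nodes]
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the mode as an explicit argument makes it easy to see that all three types resolve to instances from cycle `cycle - 1`; only the set of nodes changes.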

Check whether the scheduling dependencies meet your business requirements

After you configure the scheduling dependencies for the node, you can use the following methods to check whether the scheduling dependencies meet your business requirements.

Preview scheduling dependencies of a node when you configure the scheduling dependencies

You can use the preview feature to check whether the current scheduling dependencies of a node meet your business requirements.

DataWorks allows you to configure scheduling dependencies between nodes that are scheduled by minute, hour, day, week, month, or year. The number of scheduling cycles of a node varies based on the scheduling frequency of the node.

An instance is generated for a node in each scheduling cycle, so the dependencies between ancestor and descendant instances vary based on the scheduling frequencies of the ancestor and descendant nodes. This method is useful when a node scheduled by day depends on a node scheduled by hour, when a node scheduled by hour depends on a node scheduled by minute, or when you configure cross-cycle scheduling dependencies. It helps ensure that nodes are scheduled as expected and prevents unexpected scheduling dependencies from delaying the running of nodes. For information about how to configure scheduling dependencies in complex dependency scenarios, see Principles and samples of scheduling configurations in complex dependency scenarios.
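The following Python sketch mimics what a preview might show for a few simple combinations of scheduling frequencies. It encodes assumed rules rather than the DataWorks implementation: a daily instance depends on all instances of an hourly ancestor on the same day, an hourly instance depends on the single instance of a daily ancestor, and two hourly nodes with identical schedules pair instances that have the same scheduling time. Other combinations are deliberately left unhandled; for those, use the preview feature and Principles and samples of scheduling configurations in complex dependency scenarios.

```python
from datetime import datetime
from typing import List


def preview_dependencies(descendant_freq: str, descendant_time: datetime,
                         ancestor_freq: str,
                         ancestor_times: List[datetime]) -> List[datetime]:
    """Return the same-day ancestor instances (by scheduling time) that one
    descendant instance depends on. Frequencies are "day" or "hour"."""
    same_day = [t for t in ancestor_times if t.date() == descendant_time.date()]
    if descendant_freq == "day" and ancestor_freq == "hour":
        return same_day  # assumed rule: all hourly instances of the day
    if descendant_freq == "hour" and ancestor_freq == "day":
        return same_day  # assumed rule: the single daily instance of the day
    if descendant_freq == ancestor_freq == "hour":
        # assumed rule: identical hourly schedules pair instances at the same time
        return [t for t in same_day if t == descendant_time]
    raise NotImplementedError("combination not covered by this sketch")
```

A daily descendant of an hourly ancestor therefore waits for 24 instances under these assumptions, which is exactly the kind of difference that previewing before deployment helps you catch.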

Compare code parsing results when you commit a node

You can use the code parsing result comparison feature to confirm whether the modifications you made to the current scheduling dependencies of a node meet your business requirements, and to confirm the impacts of the modifications on data in the production environment.

If the automatic parsing feature is enabled and you modify the scheduling dependencies that it generated for a node, you must confirm the modifications when you commit the node. This ensures that the modifications do not affect the data that the node generates in the production environment.
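Conceptually, the comparison at commit time is a set difference between the ancestors found by automatic parsing and the ancestors you actually configured. The Python sketch below shows that idea only; the function names and dictionary keys are hypothetical and do not reflect the DataWorks interface.

```python
from typing import Dict, Set


def compare_dependencies(parsed: Set[str], configured: Set[str]) -> Dict[str, Set[str]]:
    """Compare ancestors found by automatic parsing with the ancestors actually configured."""
    return {
        "added_manually": configured - parsed,    # configured, but not derived from the code
        "removed_manually": parsed - configured,  # derived from the code, but removed by hand
        "unchanged": parsed & configured,
    }


def needs_confirmation(diff: Dict[str, Set[str]]) -> bool:
    """Any manual addition or removal is a change to review before committing."""
    return bool(diff["added_manually"] or diff["removed_manually"])
```

A non-empty `removed_manually` set is the case to look at most carefully, because it means the code reads a table whose producing node the current node no longer waits for.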

View the details of a node on the Cycle Task page after you deploy the node

You can use this method to check in Operation Center whether the scheduling dependencies of a node in the production environment meet your business requirements after you deploy the node.
  • Confirm scheduling dependencies of a node in the production environment

    In a workspace in standard mode, the scheduling dependencies of a node in the development and production environments can be different. To change the scheduling dependencies of a node in the production environment, you must modify them on the DataStudio page and then deploy the node for the changes to take effect.

    After you deploy the node, you can go to the Cycle Task page in Operation Center and show the ancestor and descendant nodes of the current node to view its scheduling dependencies.
    Important: The Cycle Task page shows the latest status of nodes in the production environment. However, whether instances are added or removed depends on the mode in which instances take effect. For more information, see Precautions.
  • Confirm the data of a node in the production environment
    After you confirm that the scheduling dependencies of a node are correct, check whether the partitions generated in the tables of the ancestor nodes are the partitions on which the current node depends. This prevents the node from reading data other than the data it actually depends on.
    Note: If the node deployment procedure is subject to process control, we recommend that you go to the Cycle Task page in Operation Center in the production environment after you deploy a node. On this page, you can view the scheduling dependencies and related properties of the node. If the configurations do not meet your requirements, check whether the deployment procedure is blocked. For more information, see Deploy nodes.

Precautions

You must take note of the following items when you configure scheduling dependencies for a node:
  • Items related to node uniqueness
    • A node can have different scheduling dependency configurations in the development and production environments but the node must be unique.

    • Before you undeploy a node, you must remove the dependencies of all its descendant nodes on the node in both the development and production environments.

      Due to node uniqueness, before you undeploy a node in DataWorks, you must remove the dependencies of all descendant nodes on the node, configure another node as the ancestor node of each descendant node, and then commit and deploy the changes. This ensures that the descendant nodes can obtain valid data and run as expected. Make sure that these dependencies are removed in both the development and production environments before you undeploy the node.

  • Items related to the mode in which instances take effect
For information about frequently asked questions about scheduling dependencies, see Scheduling dependencies.