Dataphin runs the nodes in a business flow in order, based on each node's scheduling dependency configuration, so that business data is produced correctly and on time. This topic explains how to configure scheduling dependencies and describes the main configuration principles.
Background information
A scheduling dependency is the relationship between an upstream node and a downstream node. In Dataphin, a downstream task node starts running only after its upstream task node has completed successfully. Configuring scheduling dependencies ensures that a task retrieves the correct data at runtime, and prevents the downstream node from reading an upstream table before its data is ready.
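The ordering rule above can be sketched in Python. This is an illustration only, not Dataphin's scheduler: the node names, the upstream_of mapping, and the run_node callback are hypothetical, and the standard-library graphlib module stands in for the real scheduling engine.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(upstream_of, run_node):
    """Run each node only after all of its upstream nodes have succeeded.

    upstream_of maps a node name to the set of its upstream node names.
    run_node(name) returns True on success. Both are hypothetical stand-ins.
    Returns (succeeded, not_completed): nodes that ran successfully, and
    nodes that failed or were never started because an upstream node did
    not succeed.
    """
    succeeded, not_completed = [], []
    ok = set()
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(upstream_of).static_order():
        upstream = upstream_of.get(node, set())
        if all(u in ok for u in upstream) and run_node(node):
            ok.add(node)
            succeeded.append(node)
        else:
            not_completed.append(node)
    return succeeded, not_completed


# Hypothetical three-node flow hanging off the tenant's root node.
graph = {
    "ods_order": {"virtual_root_node"},
    "dwd_order": {"ods_order"},
    "ads_report": {"dwd_order"},
}
succeeded, not_completed = run_in_dependency_order(graph, lambda name: True)
```

If run_node fails for ods_order in this sketch, dwd_order and ads_report are never started, which mirrors the rule that a downstream node waits for a successful upstream run.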
Procedure
On the Dataphin home page, in the top menu bar, select Development > Data Development.
In the top menu bar of the Development page, select a project. In Dev-Prod mode, you must also select the environment.
In the left-side navigation pane, select Data Processing > Script Task.
In the computing task list, click the target computing task to open its tab.
Click Attribute in the right sidebar to open the Attribute panel, and configure the parameters in the Schedule Dependency area.
Upstream Dependency
Automatic Parsing
For SQL task types, click Automatic Parsing. Dataphin will automatically parse and identify upstream tasks and output tables from the task code. After parsing, all identified dependency tables are added to the upstream dependency list, where you can view details or edit and delete entries.
Note: By default, all tasks that output an automatically parsed input table are used as upstream dependencies.
The default dependency cycle for all parsed dependency tables is this cycle.
If the code references project variables or does not specify a project, the system resolves the dependency to the production project name for schedule stability. For example, if the development project name is onedata_dev:
Code specifying select * from s_order results in a scheduling parsing dependency of onedata.s_order.
Code specifying select * from ${onedata}.s_order results in a scheduling parsing dependency of onedata.s_order.
Code specifying select * from onedata.s_order results in a scheduling parsing dependency of onedata.s_order.
Code specifying select * from onedata_dev.s_order results in a scheduling parsing dependency of onedata_dev.s_order.
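The resolution rule in the four examples above can be expressed as a small Python function. This is a minimal sketch, not Dataphin's parser: resolve_dependency and its regular expression are hypothetical, the function takes an already extracted table reference rather than full SQL, and the production project name is passed in explicitly.

```python
import re

# Optional project qualifier followed by a table name, for example
# "s_order", "onedata.s_order", or "${onedata}.s_order".
_TABLE_REF = re.compile(r"^(?:(\$\{(\w+)\}|\w+)\.)?(\w+)$")


def resolve_dependency(table_ref, prod_project):
    """Return the project-qualified table name used as the dependency.

    Assumption: a reference with no project, or with a ${...} project
    variable, resolves to prod_project; an explicit project name is kept
    exactly as written.
    """
    match = _TABLE_REF.match(table_ref.strip())
    if match is None:
        raise ValueError(f"unsupported table reference: {table_ref!r}")
    qualifier, variable, table = match.groups()
    if qualifier is None or variable is not None:
        return f"{prod_project}.{table}"
    return f"{qualifier}.{table}"


for ref in ("s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"):
    print(ref, "->", resolve_dependency(ref, "onedata"))
```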
Add Root Node
If the task has no corresponding upstream dependency, click Add Root Node to use the root node as the upstream dependency of the current task.
Note: Each tenant or enterprise has a virtual root node, named virtual_root_node, created during initialization.
Add This Node's Previous Cycle
This option indicates that the task scheduling depends on the successful completion of this node's previous cycle (the previous day or previous n hours).
Add Dependency
If Automatic Parsing fails to parse the scheduling dependency, or if the upstream dependency configuration that it generates does not match the actual scenario, click +Add Dependency to add the node's upstream dependency manually.
Important: When you add dependencies, the dependency cycle and dependency policy for physical nodes and logical table nodes use system-recommended settings by default. To modify them, click the icon in the Actions column of the dependency list to edit the dependency cycle and dependency policy of a single dependency.
Dependency cycle: The time range for the scheduled start time of the upstream task instance, typically the same day, from 00:00 to 24:00.
Dependency policy: Specifies the policy when multiple instances exist within a dependency cycle. If only one instance is present, any policy can be set. To accommodate potential changes in upstream task scheduling, only relative path policies are supported.
For the default policy on cross-cycle dependency, see Appendix 2: Default Policy for Cross-Cycle Dependency.
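The interaction between the dependency cycle and the dependency policy can be sketched as two steps: first keep the upstream instances whose scheduled start time falls in the cycle window, then pick from them according to the policy. The function names below are hypothetical, and only the first instance, last instance, and all instances policies are modeled; the nearest instance forward and nearest instance backward policies are left out because this topic does not define their exact semantics.

```python
from datetime import datetime, timedelta


def instances_in_cycle(scheduled_times, day_start):
    """Keep upstream instances scheduled in [day_start, day_start + 24h)."""
    day_end = day_start + timedelta(days=1)
    return sorted(t for t in scheduled_times if day_start <= t < day_end)


def apply_policy(instances, policy):
    """Select the upstream instances the current instance depends on."""
    if not instances:
        return []
    if policy == "first instance":
        return instances[:1]
    if policy == "last instance":
        return instances[-1:]
    if policy == "all instances":
        return list(instances)
    raise ValueError(f"policy not modeled in this sketch: {policy!r}")


# Hypothetical upstream task that runs every 12 hours.
day = datetime(2024, 1, 2)
times = [datetime(2024, 1, 1, 12), datetime(2024, 1, 2, 0),
         datetime(2024, 1, 2, 12), datetime(2024, 1, 3, 0)]
window = instances_in_cycle(times, day)
print(apply_policy(window, "last instance"))
```

When the window contains a single instance, all three modeled policies return the same result, which is why any policy can be set in that case.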
Add Physical Node Dependency
Click Add Dependency, and select Physical Node.
In the Add Dependency - Physical Node dialog box, select one or more nodes. Filter targets by project, node type, node name, or output table name.
Click OK.
Add Logical Table Node
Click Add Dependency, and select Logical Table Node.
In the Add Dependency - Logical Table Node dialog box, select one or more nodes. Filter targets by logical table type, associated section, or logical table name.
(Optional) In the node list, click the icon in the Dependency Field column of the target node to view the table fields owned by the logical table.
Click OK.
This Node's Output
The system automatically generates output names for created nodes. To add multiple output names, click Automatically Generate Output Names.
Important: The system uses output names to build the scheduling dependency graph. Output names are generated automatically, and manual intervention is not recommended.
Click OK to finalize the scheduling dependency configuration.
Dependency cycle and dependency policy preview
Click Attribute of the target offline computing task, and locate the Schedule Dependency area in the Attribute panel.
In the Schedule Dependency area, click the icon in the Actions column of the Upstream Dependency list for the target dependency.
In the Edit Dependency dialog box, view details such as the node name, dependency cycle, dependency policy, and node dependency cycle preview.
Dependency cycle: The time window that contains the scheduled start time of the upstream task instance, typically the same day, that is, [00:00, 24:00).
Dependency policy: Specifies the policy when multiple instances exist within a dependency cycle. If only one instance is present, any policy can be set. To accommodate potential changes in upstream task scheduling, only relative path policies are supported.
Node Dependency Cycle Preview: View the list of current node instances and the list of instances for the selected upstream node for the specified data timestamp.
Block
Description
① Instance List of the Selected Upstream Node
Data Timestamp: Determined by the dependency cycle and the selected current node data timestamp.
If the dependency cycle is this cycle (same day), the data timestamp matches the selected current node data timestamp.
If the dependency cycle is previous cycle (previous day), the data timestamp is the selected current node data timestamp minus 1 day.
If the dependency cycle is previous N days, the data timestamp is the selected current node data timestamp minus N days.
If the dependency cycle is last 24 hours and the instances span two data timestamps, the data timestamp is displayed as {yyyy-MM-dd ~ yyyy-MM-dd}.
Instance List: Displays the total number of instances of the selected upstream node for the data timestamp.
If the total number of instances for the data timestamp is five or fewer, the instance list displays all instances.
If the total number of instances for the data timestamp is more than five, you can click Expand All to view all instances.
If the upstream instance (left-side list) that the currently selected instance of the current node (right-side list) depends on is the first or last instance in the left-side list, the list displays the first instance and the last instance.
If that upstream instance is neither the first nor the last instance, the list displays the first instance, the instance that the right-side selected instance depends on, and the last instance.
The instance list displays instances in the format Instance n ({instance timed scheduling time}), with n incrementing from 1.
② Instance List of the Current Node
Displays the total number of instances of the current node for the selected data timestamp.
If the total number of instances for the selected data timestamp is five or fewer, the instance list displays all instances. If the total is more than five, the list shows only the first instance and the last instance, and you can click Expand All to view all instances. The first instance (Instance 1) is selected by default, and you can click another instance to switch the selection.
The instance list displays instances in the format Instance n ({instance timed scheduling time}), with n incrementing from 1.
③ Connection Line of the Right-Side Selected Instance Relying on the Left-Side Instance
When the dependency policy is first instance, last instance, nearest instance forward, or nearest instance backward, a single connection line links one instance on the left side (selected upstream node instance list) to the selected instance on the right side (current node's instance list).
When the dependency policy is all instances, all instances on the left side (selected upstream node instance list) are selected, and the connection lines show that the selected instance on the right side (current node's instance list) depends on every one of them.
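The preview rules above for the upstream data timestamp and for the collapsed upstream instance list can be sketched as follows. Both function names are hypothetical, the last 24 hours cycle is not modeled because it can display a date range rather than a single date, and the collapsed-list helper assumes a policy that selects exactly one upstream instance.

```python
from datetime import date, timedelta


def upstream_data_timestamp(current, cycle, n=1):
    """Data timestamp shown for the upstream list, given the current
    node's selected data timestamp and the dependency cycle."""
    if cycle == "this cycle":
        return current
    if cycle == "previous cycle":
        return current - timedelta(days=1)
    if cycle == "previous n days":
        return current - timedelta(days=n)
    raise ValueError(f"cycle not modeled in this sketch: {cycle!r}")


def collapsed_upstream_instances(total, depended_on):
    """1-based instance numbers shown before Expand All is clicked.

    total is the number of upstream instances for the data timestamp;
    depended_on is the number of the upstream instance that the selected
    current-node instance depends on.
    """
    if total <= 5:
        return list(range(1, total + 1))
    # First and last are always shown; the set removes the duplicate
    # when the depended-on instance is itself the first or the last.
    return sorted({1, depended_on, total})


print(upstream_data_timestamp(date(2024, 3, 1), "previous cycle"))
print(collapsed_upstream_instances(8, 5))
```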
Appendix 1: initial default dependency cycle and dependency policy
This node's schedule cycle | Upstream node's schedule cycle | Is the upstream node self-dependent | Default dependency cycle | Default dependency policy |
Day/Week/Month | Day | Yes/No | This cycle (same day) | Last instance |
Day/Week/Month | Hour/Minute | No | This cycle (same day) | All instances |
Day/Week/Month | Hour/Minute | Yes | This cycle (same day) | Last instance |
Month/Week/Day/Hour/Minute | Month/Week | Yes | This cycle (same day) | Last instance |
Month/Week/Day/Hour/Minute | Month/Week | No | This cycle (same day) | Last instance |
Hour/Minute | Day | Yes/No | This cycle (same day) | Last instance |
Hour/Minute | Hour/Minute | Yes/No | This cycle (same day) | Last instance |
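Read row by row, the Appendix 1 table reduces to one special case: the default policy is all instances only when this node runs by day, week, or month, the upstream node runs by hour or minute, and the upstream node is not self-dependent. The lookup below is a sketch of that reading; default_dependency and the lowercase cycle names are hypothetical.

```python
COARSE = {"day", "week", "month"}
FINE = {"hour", "minute"}


def default_dependency(this_cycle, upstream_cycle, upstream_self_dependent):
    """Return (default dependency cycle, default dependency policy)
    as listed in the Appendix 1 table."""
    for cycle in (this_cycle, upstream_cycle):
        if cycle not in COARSE | FINE:
            raise ValueError(f"unknown schedule cycle: {cycle!r}")
    if (this_cycle in COARSE and upstream_cycle in FINE
            and not upstream_self_dependent):
        policy = "all instances"
    else:
        policy = "last instance"
    return ("this cycle (same day)", policy)


print(default_dependency("day", "hour", False))
print(default_dependency("day", "hour", True))
```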
Appendix 2: default policy for cross-cycle dependency
In the table below, - indicates not involved.
This node's schedule cycle | Upstream node | Upstream node's schedule cycle | Is the upstream node self-dependent | Default dependency cycle |
Month | This node (self-dependent) | - | - | Previous cycle (previous day) |
Week | This node (self-dependent) | - | - | Previous cycle (previous day) |
Day | This node (self-dependent) | - | - | Previous cycle (previous day) |
Hour | This node (self-dependent) | - | - | Last 24 hours |
Minute | This node (self-dependent) | - | - | Last 24 hours |
Day/Week/Month | Not this node | Day | - | This cycle (same day) |
Day/Week/Month | Not this node | Hour/Minute | No | This cycle (same day) |
Day/Week/Month | Not this node | Hour/Minute | Yes | This cycle (same day) |
Month/Week/Day/Hour/Minute | Not this node | Month/Week | Yes | This cycle (same day) |
Month/Week/Day/Hour/Minute | Not this node | Month | No | This cycle (same day) |
Month/Week/Day/Hour/Minute | Not this node | Week | No | This cycle (same day) |
Hour/Minute | Not this node | Day | - | This cycle (same day) |
Hour/Minute | Not this node | Hour/Minute | - | This cycle (same day) |
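The Appendix 2 table can be summarized the same way: only self-dependence changes the default dependency cycle, and every dependency on a different node defaults to this cycle. The function below is a sketch of that summary; default_cross_cycle and the lowercase cycle names are hypothetical.

```python
def default_cross_cycle(this_cycle, upstream_is_this_node):
    """Return the default dependency cycle listed in the Appendix 2 table."""
    if this_cycle not in {"month", "week", "day", "hour", "minute"}:
        raise ValueError(f"unknown schedule cycle: {this_cycle!r}")
    if not upstream_is_this_node:
        return "this cycle (same day)"
    if this_cycle in {"hour", "minute"}:
        return "last 24 hours"
    return "previous cycle (previous day)"


print(default_cross_cycle("hour", True))
print(default_cross_cycle("week", True))
```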