Dataphin: Configure offline task schedule dependency

Last Updated: Mar 24, 2025

Dataphin runs the nodes of a business flow in order, based on the scheduling dependency configured for each node, so that business data is produced correctly and on time. This topic explains how to configure scheduling dependencies and describes the main configuration principles.

Background information

A scheduling dependency is the relationship between an upstream node and a downstream node. In Dataphin, a downstream task node starts running only after its upstream task node has completed successfully. Configuring scheduling dependencies ensures that each task reads the correct data at run time, and prevents a downstream node from reading a table before the upstream node has finished producing its data.
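The rule above can be illustrated with a minimal Python sketch. It is not Dataphin code: the node names (other than virtual_root_node), the status strings, and the helper functions are hypothetical and exist only to show that a node becomes runnable once every upstream node has succeeded.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nodes: each key maps to the set of upstream nodes it depends on.
UPSTREAM = {
    "ods_order": {"virtual_root_node"},
    "dwd_order": {"ods_order"},
    "ads_order_report": {"dwd_order"},
}


def can_start(node, upstream, status):
    """A node may start only if every upstream node has succeeded."""
    # "SUCCESS" is an assumed status label, not a documented Dataphin value.
    return all(status.get(up) == "SUCCESS" for up in upstream.get(node, ()))


def run_order(upstream):
    """Return an order in which each node follows all of its upstream nodes."""
    return list(TopologicalSorter(upstream).static_order())


status = {
    "virtual_root_node": "SUCCESS",
    "ods_order": "SUCCESS",
    "dwd_order": "RUNNING",
}
print(run_order(UPSTREAM))
print(can_start("dwd_order", UPSTREAM, status))         # upstream ods_order succeeded
print(can_start("ads_order_report", UPSTREAM, status))  # upstream dwd_order still running
```

In this sketch, ads_order_report stays blocked while dwd_order is still running, which is the situation that scheduling dependencies are meant to enforce.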

Procedure

  1. On the Dataphin home page, in the top menu bar, select Development > Data Development.

  2. In the top menu bar of the Development page, select a project. In Dev-Prod mode, you must also select the environment.

  3. In the left-side navigation pane, select Data Processing > Script Task.

  4. In the computing task list, click the target computing task to open its tab.

  5. Click Attribute in the right sidebar to open the Attribute panel, and configure the parameters in the Schedule Dependency area.

    1. Upstream Dependency

      • Automatic Parsing

        • For SQL task types, click Automatic Parsing. Dataphin will automatically parse and identify upstream tasks and output tables from the task code. After parsing, all identified dependency tables are added to the upstream dependency list, where you can view details or edit and delete entries.

        Note
        • By default, all tasks that output an automatically parsed input table are used as upstream dependencies.

        • The default dependency cycle for all parsed dependency tables is this cycle.

        • If the code references a project variable or does not specify a project, the system resolves the reference to the production project name to keep scheduling stable. For example, if the development project is named onedata_dev:

          • The code select * from s_order is parsed as a dependency on onedata.s_order.

          • The code select * from ${onedata}.s_order is parsed as a dependency on onedata.s_order.

          • The code select * from onedata.s_order is parsed as a dependency on onedata.s_order.

          • The code select * from onedata_dev.s_order is parsed as a dependency on onedata_dev.s_order.

      • Add Root Node

        If the task has no upstream dependency, click Add Root Node to use the root node as the upstream dependency of the current task.

        Note

        Each tenant or enterprise has a virtual root node, named virtual_root_node, created during initialization.

      • Add This Node's Previous Cycle

        This option indicates that the task scheduling depends on the successful completion of this node's previous cycle (the previous day or previous n hours).

      • Add Dependency

        If Automatic Parsing fails to identify the scheduling dependencies, or if the upstream dependencies it generates do not match the actual situation, click +Add Dependency to add the node's upstream dependencies manually.

        Important
        • When you add dependencies, the dependency cycle and dependency policy for physical nodes and logical table nodes use the system-recommended settings by default. To modify them, click the icon in the dependency list to edit the dependency cycle and dependency policy of a single dependency.

          • Dependency cycle: The time range for the scheduled start time of the upstream task instance, typically the same day, from 00:00 to 24:00.

          • Dependency policy: Specifies the policy when multiple instances exist within a dependency cycle. If only one instance is present, any policy can be set. To accommodate potential changes in upstream task scheduling, only relative path policies are supported.

        • For the default policy on cross-cycle dependency, see Appendix 2: Default Policy for Cross-Cycle Dependency.

        • Add Physical Node Dependency

          1. Click Add Dependency, and select Script Task.

          2. In the Add Dependency - Physical Node dialog box, select one or more nodes. Filter targets by project, node type, node name, or output table name.

          3. Click OK.

        • Add Logical Table Node

          1. Click Add Dependency, and select Logical Table Node.

          2. In the Add Dependency - Logical Table Node dialog box, select one or more nodes. Filter targets by logical table type, associated section, or logical table name.

          3. (Optional) In the node list, click the icon in the Dependency Field column of the target node to view the fields of the logical table.

          4. Click OK.

    2. This Node's Output

      The system automatically generates output names for created nodes. To add multiple output names, click Automatically Generate Output Names.

      Important

      The system uses output names to build the scheduling dependency graph. Output names are generated automatically, and manual intervention is not recommended.

  6. Click OK to finalize the scheduling dependency configuration.
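The project-name resolution described in the Automatic Parsing note of step 5 can be sketched as follows. This is an illustrative Python approximation, not the parser Dataphin uses: the function name, the regular expression for project variables, and the explicit prod_project argument are assumptions made for the example.

```python
import re


def resolve_dependency(table_ref: str, prod_project: str) -> str:
    """Resolve a table reference from task code to '<project>.<table>'."""
    # No project prefix: fall back to the production project name.
    if "." not in table_ref:
        return f"{prod_project}.{table_ref}"

    project, table = table_ref.split(".", 1)

    # A project variable such as ${onedata} also resolves to production.
    if re.fullmatch(r"\$\{\w+\}", project):
        return f"{prod_project}.{table}"

    # An explicit project name (production or development) is kept as written.
    return f"{project}.{table}"


for ref in ["s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"]:
    print(ref, "->", resolve_dependency(ref, prod_project="onedata"))
```

The four inputs mirror the four examples in the note: only the reference that explicitly names onedata_dev keeps the development project.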

Dependency cycle and dependency policy preview

  1. Open the Attribute panel of the target offline computing task, and locate the Schedule Dependency area.

  2. In the Schedule Dependency area, find the target dependency in the Upstream Dependency list and click the icon in its Actions column.

  3. In the Edit Dependency dialog box, view details such as the node name, dependency cycle, dependency policy, and node dependency cycle preview.

    • Dependency cycle: The time range for the scheduled start time of the upstream task instance, typically the same day, that is, [00:00, 24:00).

    • Dependency policy: Specifies the policy when multiple instances exist within a dependency cycle. If only one instance is present, any policy can be set. To accommodate potential changes in upstream task scheduling, only relative path policies are supported.

    • Node Dependency Cycle Preview: View the list of current node instances and the list of instances for the selected upstream node for the specified data timestamp.

      The preview contains the following blocks:

      ① Instance List of the Selected Upstream Node

      • Data Timestamp: Determined by the dependency cycle and the selected current node data timestamp.

        • If the dependency cycle is this cycle (same day), the data timestamp matches the selected current node data timestamp.

        • If the dependency cycle is previous cycle (previous day), the data timestamp is the selected current node data timestamp minus 1 day.

        • If the dependency cycle is previous N days, the data timestamp is the selected current node data timestamp minus N days.

        • If the dependency cycle is last 24 hours and the instances span two data timestamps, the data timestamp is displayed as {yyyy-MM-dd ~ yyyy-MM-dd}.

      • Instance List: Displays the total number of instances of the selected upstream node for the data timestamp.

        • If the total number of instances for the data timestamp is five or fewer, the instance list displays all instances.

        • If the total number of instances for the data timestamp is more than five, the list is collapsed, and you can click Expand All to view all instances. The collapsed list shows the following:

          • If the upstream instance (left-side list) that the currently selected instance of the current node (right-side list) depends on is the first or last instance in the left-side list, the list displays the first instance and the last instance.

          • If that upstream instance is neither the first nor the last instance in the left-side list, the list displays the first instance, the instance that the selected right-side instance depends on, and the last instance.

        • The instance list displays instances in the format Instance n ({instance timed scheduling time}), with n incrementing from 1.

      ② Instance List of the Current Node

      Displays the total number of instances of the current node for the selected data timestamp.

      If the total number of instances for the selected data timestamp is five or fewer, the instance list displays all instances. If the total is more than five, the list shows only the first instance and the last instance, and you can click Expand All to view all instances. The first instance (Instance 1) is selected by default, and you can click another instance to switch the selection.

      The instance list displays instances in the format Instance n ({instance timed scheduling time}), with n incrementing from 1.

      ③ Connection Line Between the Selected Right-Side Instance and the Left-Side Instances It Depends On

      • When the dependency policy is first instance, last instance, nearest instance forward, or nearest instance backward, a single line connects one instance on the left side (selected upstream node instance list) to the selected instance on the right side (current node's instance list).

      • When the dependency policy is all instances, every instance on the left side (selected upstream node instance list) is selected, and the connection lines show that the selected instance on the right side (current node's instance list) depends on all of them.
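The data timestamp rules and the simpler dependency policies described in this preview can be sketched as follows. This is a hedged Python illustration, not Dataphin's implementation: the cycle and policy identifiers are made-up labels, and the nearest instance forward and nearest instance backward policies are left out because this topic does not define how they choose an instance.

```python
from datetime import date, timedelta


def upstream_data_timestamp(current: date, cycle: str, n: int = 1) -> date:
    """Data timestamp of the upstream instances for a given dependency cycle."""
    if cycle == "this_cycle":        # same day
        return current
    if cycle == "previous_cycle":    # previous day
        return current - timedelta(days=1)
    if cycle == "previous_n_days":   # N days earlier
        return current - timedelta(days=n)
    raise ValueError(f"unsupported dependency cycle: {cycle}")


def select_instances(instances: list, policy: str) -> list:
    """Pick the upstream instances that the selected instance depends on."""
    if not instances:
        return []
    if policy == "first_instance":
        return instances[:1]
    if policy == "last_instance":
        return instances[-1:]
    if policy == "all_instances":
        return list(instances)
    raise ValueError(f"unsupported dependency policy: {policy}")


# Hypothetical upstream instances, identified by their scheduled times.
hourly = ["00:00", "06:00", "12:00", "18:00"]
print(upstream_data_timestamp(date(2025, 3, 24), "previous_n_days", n=3))
print(select_instances(hourly, "last_instance"))
print(select_instances(hourly, "all_instances"))
```

With all_instances, every upstream instance is returned, which corresponds to the case where connection lines are drawn from all left-side instances to the selected right-side instance.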

Appendix 1: initial default dependency cycle and dependency policy

| This node's schedule cycle | Upstream node's schedule cycle | Is the upstream node self-dependent | Default dependency cycle | Default dependency policy |
| --- | --- | --- | --- | --- |
| Day/Week/Month | Day | Yes/No | This cycle (same day) | Last instance |
| Day/Week/Month | Hour/Minute | No | This cycle (same day) | All instances |
| Day/Week/Month | Hour/Minute | Yes | This cycle (same day) | Last instance |
| Month/Week/Day/Hour/Minute | Month/Week | Yes | This cycle (same day) | Last instance |
| Month/Week/Day/Hour/Minute | Month/Week | No | This cycle (same day) | Last instance |
| Hour/Minute | Day | Yes/No | This cycle (same day) | Last instance |
| Hour/Minute | Hour/Minute | Yes/No | This cycle (same day) | Last instance |
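The defaults in Appendix 1 reduce to a single exception, which the following Python sketch makes explicit. The function name and the lowercase cycle labels are illustrative assumptions. The logic is only a condensed reading of the table above, not a Dataphin API.

```python
COARSE = {"day", "week", "month"}
FINE = {"hour", "minute"}


def default_dependency(this_cycle: str, upstream_cycle: str,
                       upstream_self_dependent: bool) -> tuple:
    """Return (default dependency cycle, default dependency policy)."""
    valid = COARSE | FINE
    if this_cycle not in valid or upstream_cycle not in valid:
        raise ValueError("unknown schedule cycle")

    # Only a day/week/month node that depends on an hour/minute upstream
    # node without self-dependency defaults to all instances.
    if (this_cycle in COARSE and upstream_cycle in FINE
            and not upstream_self_dependent):
        policy = "All instances"
    else:
        policy = "Last instance"

    # Every row of the table uses the same default dependency cycle.
    return ("This cycle (same day)", policy)


print(default_dependency("day", "hour", upstream_self_dependent=False))
print(default_dependency("day", "hour", upstream_self_dependent=True))
print(default_dependency("hour", "day", upstream_self_dependent=False))
```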

Appendix 2: default policy for cross-cycle dependency

In the table below, - indicates not involved.

| This node's schedule cycle | Upstream node | Upstream node's schedule cycle | Is the upstream node self-dependent | Default dependency cycle |
| --- | --- | --- | --- | --- |
| Month | This node (self-dependent) | - | - | Previous cycle (previous day) |
| Week | This node (self-dependent) | - | - | Previous cycle (previous day) |
| Day | This node (self-dependent) | - | - | Previous cycle (previous day) |
| Hour | This node (self-dependent) | - | - | Last 24 hours |
| Minute | This node (self-dependent) | - | - | Last 24 hours |
| Day/Week/Month | Not this node | Day | - | This cycle (same day) |
| Day/Week/Month | Not this node | Hour/Minute | No | This cycle (same day) |
| Day/Week/Month | Not this node | Hour/Minute | Yes | This cycle (same day) |
| Month/Week/Day/Hour/Minute | Not this node | Month/Week | Yes | This cycle (same day) |
| Month/Week/Day/Hour/Minute | Not this node | Month | No | This cycle (same day) |
| Month/Week/Day/Hour/Minute | Not this node | Week | No | This cycle (same day) |
| Hour/Minute | Not this node | Day | - | This cycle (same day) |
| Hour/Minute | Not this node | Hour/Minute | - | This cycle (same day) |
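As a compact reading of Appendix 2, the default dependency cycle depends only on whether the upstream node is the node itself and, if so, on the node's own schedule cycle. The Python sketch below encodes that reading. The function name and the lowercase cycle labels are hypothetical and used only for illustration.

```python
def default_cross_cycle(this_cycle: str, depends_on_itself: bool) -> str:
    """Default dependency cycle, condensed from the Appendix 2 table."""
    if this_cycle not in {"month", "week", "day", "hour", "minute"}:
        raise ValueError("unknown schedule cycle")

    # Dependencies on other nodes always default to the same day.
    if not depends_on_itself:
        return "This cycle (same day)"

    # Self-dependency: month/week/day nodes look back one day,
    # hour/minute nodes look back over the last 24 hours.
    if this_cycle in {"month", "week", "day"}:
        return "Previous cycle (previous day)"
    return "Last 24 hours"


print(default_cross_cycle("day", depends_on_itself=True))
print(default_cross_cycle("hour", depends_on_itself=True))
print(default_cross_cycle("hour", depends_on_itself=False))
```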