Dataphin: Configure offline pipeline scheduling dependencies

Last Updated: Jan 21, 2025

Dataphin runs the nodes of a business process in order, based on each node's scheduling dependency configuration, so that business data is delivered correctly and on time. This topic describes how to configure scheduling dependencies for offline pipelines.

Procedure

  1. On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.

  2. On the Integration page, select Project from the top menu bar.

  3. In the left-side navigation pane, go to Integration > Batch Pipeline. Click the desired task name in the Batch Pipeline list.

  4. In the task tab, open the Attribute panel by clicking Attribute on the right.

  5. In the Schedule Dependency area, configure the Schedule Dependency parameters of the integration task.

    1. Upstream Dependency

      • Automatic Parsing

        To configure the upstream dependency of the integration task, select Automatic Parsing. Dataphin will automatically identify and retrieve upstream tasks and output tables related to the integration task. Once parsed, all identified dependency tables will be included in the upstream dependency list, where you can review details or edit and remove entries.

        Note
        • By default, if an automatically parsed input table is produced by multiple tasks, all of those tasks are set as upstream dependencies.

        • The dependency cycle for all identified dependency tables is set to This Cycle by default.

      • Add Root Node

        If the task has no upstream dependency, click Add Root Node to set the root node as the current task's upstream dependency.

        Note

        During initialization, each tenant or enterprise is assigned a virtual root node named virtual_root_node.

      • Add Previous Cycle of This Node

        With this option, the node's task is scheduled only after the node's previous cycle, such as the previous day or the previous n hours, has completed successfully.

      • Add Dependency

        If Automatic Parsing cannot parse the scheduling dependencies, or the upstream dependencies it generates do not match your actual scenario, click + Add Dependency to add the node's Upstream Dependency manually.

        Important

        When you add dependencies, the Dependency Cycle and Dependency Policy for physical and logical table nodes default to the system-recommended settings. To change them, click the icon in the dependency list and edit the Dependency Cycle and Dependency Policy of the individual dependency.

        • Dependency Cycle: Defines the scheduled runtime (start time) range for the upstream task instance, typically the current day, from 00:00 to 24:00.

        • Dependency Policy: Specifies the policy when multiple instances occur within a dependency cycle. If only one instance exists, any policy setting is acceptable. To accommodate potential scheduling changes in upstream tasks, only the relative path policy is supported.

        For the default policy on cross-cycle dependencies, see Appendix: Default Policy for Cross-Cycle Dependencies.

        • Add Physical Node Dependency

          | Area | Description |
          | --- | --- |
          | Search and Filter Area | Use the search and filter area to narrow down the Physical Table Node based on criteria such as This Project, Project, Node Type, and by entering a Node Name or Output Table Name. |
          | Node List | The node list displays available Physical Nodes for dependency, allowing you to select as needed. |

        • Add Dependency - Logical Table Node

          | Area | Description |
          | --- | --- |
          | Search and Filter Area | The search and filter area allows you to filter the Logical Table Node based on Logical Table Type, Belonging Section, or by entering the Logical Table Name. |
          | Node List | The node list presents Logical Table Nodes available for dependency, allowing you to select as necessary. |

          To depend on specific fields within a logical table, click the Dependent Fields column in the node list to view and select the fields available in the logical table.

    2. This Node Output

      The system automatically assigns output names for created nodes. To add multiple output names, select Auto Generate Output Name.

      Important

      The system constructs the scheduling dependency graph using the output name. The output name is generated automatically, and manual adjustments are generally discouraged.

  6. To finalize the scheduling dependency configuration, click OK.
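
The procedure above notes that the system builds the scheduling dependency graph from output names, that every task producing a parsed input table becomes an upstream dependency, and that a task without upstream dependencies can be attached to virtual_root_node. The following Python sketch illustrates that idea only. It is not Dataphin's implementation: the node dictionary layout, the function names, and the example node and table names are all hypothetical.

```python
from collections import defaultdict, deque

# The root node name comes from the note in this topic.
# The data layout and function names below are hypothetical.
VIRTUAL_ROOT = "virtual_root_node"


def build_edges(nodes):
    """Map each upstream node to the set of nodes that depend on it.

    `nodes` maps a node name to {"outputs": [...], "upstream": [...]},
    where "upstream" lists the output names the node depends on.
    """
    # Index output names by producer; one name may have several producers.
    producers = defaultdict(list)
    for name, spec in nodes.items():
        for output in spec["outputs"]:
            producers[output].append(name)

    edges = defaultdict(set)
    for name, spec in nodes.items():
        if not spec["upstream"]:
            # No upstream dependency: attach the node to the root node.
            edges[VIRTUAL_ROOT].add(name)
        for output in spec["upstream"]:
            if output not in producers:
                raise KeyError(f"no node produces output {output!r}")
            # Every producer of the output becomes an upstream dependency.
            for producer in producers[output]:
                edges[producer].add(name)
    return edges


def run_order(nodes):
    """Return one valid run order using Kahn's topological sort."""
    edges = build_edges(nodes)
    indegree = {name: 0 for name in nodes}
    for downstream in edges.values():
        for name in downstream:
            indegree[name] += 1

    queue = deque([VIRTUAL_ROOT])
    order = []
    while queue:
        current = queue.popleft()
        if current != VIRTUAL_ROOT:
            order.append(current)
        for name in sorted(edges.get(current, ())):
            indegree[name] -= 1
            if indegree[name] == 0:
                queue.append(name)
    if len(order) != len(nodes):
        raise ValueError("cyclic scheduling dependency detected")
    return order


example = {
    "load_orders": {"outputs": ["proj.ods_orders"], "upstream": []},
    "load_users": {"outputs": ["proj.ods_users"], "upstream": []},
    "join_orders": {
        "outputs": ["proj.dwd_orders"],
        "upstream": ["proj.ods_orders", "proj.ods_users"],
    },
}
print(run_order(example))  # ['load_orders', 'load_users', 'join_orders']
```

Because the edges are derived entirely from output names, renaming an output by hand would silently detach downstream nodes in a model like this one, which is consistent with the advice not to adjust generated output names.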

Appendix: Default Policy for Cross-Cycle Dependencies

| This Node Scheduling Cycle | Upstream Node | Upstream Node Scheduling Cycle | Is Upstream Node Self-Dependency | Default Dependency Cycle |
| --- | --- | --- | --- | --- |
| Month | This Node (Self-Dependency) | - | | Previous Cycle (1 Day Before) |
| Week | This Node (Self-Dependency) | - | | Previous Cycle (1 Day Before) |
| Day | This Node (Self-Dependency) | - | | Previous Cycle (1 Day Before) |
| Hour | This Node (Self-Dependency) | - | | Last 24 Hours |
| Minute | This Node (Self-Dependency) | - | | Last 24 Hours |
| Day/Week/Month | Non-This Node | Day | | This Cycle (Today) |
| Day/Week/Month | Non-This Node | Hour/Minute | No | This Cycle (Today) |
| Day/Week/Month | Non-This Node | Hour/Minute | Yes | This Cycle (Today) |
| Month/Week/Day/Hour/Minute | Non-This Node | Month/Week | Yes | This Cycle (Today) |
| Month/Week/Day/Hour/Minute | Non-This Node | Month | No | This Cycle (Today) |
| Month/Week/Day/Hour/Minute | Non-This Node | Week | No | This Cycle (Today) |
| Hour/Minute | Non-This Node | Day | | This Cycle (Today) |
| Hour/Minute | Non-This Node | Hour/Minute | | This Cycle (Today) |
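
The defaults in this appendix reduce to a small lookup, sketched below in Python. The function names and signatures are hypothetical and this is not a Dataphin API. In the appendix as listed, every Non-This Node combination defaults to This Cycle (Today), so the upstream node's own self-dependency setting does not change the result. The second helper assumes that This Cycle (Today) corresponds to the range described under Dependency Cycle, the current day from 00:00 to 24:00, and models it as a half-open interval.

```python
from datetime import datetime, timedelta

CYCLES = ("Month", "Week", "Day", "Hour", "Minute")


def default_dependency_cycle(node_cycle, upstream_is_this_node, upstream_cycle=None):
    """Return the default dependency cycle listed in the appendix.

    Hypothetical helper: the arguments mirror the appendix columns.
    """
    if node_cycle not in CYCLES:
        raise ValueError(f"unknown scheduling cycle: {node_cycle!r}")
    if upstream_is_this_node:
        # Self-dependency rows: only this node's own cycle matters.
        if node_cycle in ("Month", "Week", "Day"):
            return "Previous Cycle (1 Day Before)"
        return "Last 24 Hours"
    if upstream_cycle not in CYCLES:
        raise ValueError(f"unknown upstream cycle: {upstream_cycle!r}")
    # Every Non-This Node row in the appendix has the same default.
    return "This Cycle (Today)"


def this_cycle_window(scheduled_at):
    """Return [00:00, 24:00) of the day on which the instance is scheduled.

    Assumption: This Cycle (Today) means the current-day range described
    under Dependency Cycle.
    """
    start = scheduled_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


print(default_dependency_cycle("Hour", True))           # Last 24 Hours
print(default_dependency_cycle("Day", False, "Hour"))   # This Cycle (Today)
print(this_cycle_window(datetime(2025, 1, 21, 9, 30)))
```

An upstream instance would then satisfy a This Cycle dependency when its scheduled start time falls inside the returned window. If several instances fall inside it, the Dependency Policy decides which of them are required.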