
Dataphin: Offline mode dependency configuration

Last Updated: Jan 21, 2025

In Dataphin, you configure scheduling dependencies for each node so that business processes run in order and business data is delivered effectively and on time. This topic describes how to configure offline mode dependencies for stream-batch integrated tasks.

Background information

Scheduling dependencies are the upstream and downstream relationships between nodes. In Dataphin, a downstream task node commences execution only after the upstream task node has completed successfully. Configuring these dependencies ensures that tasks receive the correct data at runtime. Dataphin determines the availability of the latest upstream table data based on the running status of the upstream node, allowing the downstream node to retrieve the data. This mechanism prevents the downstream node from attempting to retrieve data before the upstream table data is ready.
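
The gating rule described above can be illustrated with a minimal Python sketch. It is not Dataphin's scheduler: the function name runnable_nodes, the node names, and the status strings "pending" and "success" are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def runnable_nodes(upstream: Dict[str, List[str]], status: Dict[str, str]) -> Set[str]:
    """Return the nodes that have not run yet and whose upstream nodes all succeeded."""
    ready = set()
    for node, parents in upstream.items():
        # Skip nodes that already ran (or failed); a missing status means "pending".
        if status.get(node, "pending") != "pending":
            continue
        # A node is released only when every upstream node reports success.
        if all(status.get(parent) == "success" for parent in parents):
            ready.add(node)
    return ready


# Hypothetical three-node chain: ods_order -> dwd_order -> ads_report.
upstream = {
    "ods_order": [],
    "dwd_order": ["ods_order"],
    "ads_report": ["dwd_order"],
}
status = {"ods_order": "success"}
print(sorted(runnable_nodes(upstream, status)))  # ['dwd_order']
```

In this sketch, ads_report stays blocked until dwd_order succeeds, which mirrors how a downstream node waits for its upstream table data to be ready.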

Procedure

  1. Access the Offline Mode configuration panel by referring to Offline Mode Configuration Entry.

  2. In the Dependency section of the offline mode configuration panel, set the Dependency parameters.

    Parameter

    Description

    Start Parsing

    If the node's task type is SQL, you can click Start Parsing. The system then parses the table names referenced in the code and looks for any table name that matches an existing output name. The node that owns the matching output name becomes an upstream dependency of the current node.

    If the code references a project variable or does not specify a project, the system defaults to the production project name to keep scheduling stable. For example, if the development project is named onedata_dev:

    • Code specifying select * from s_order results in a dependency of onedata.s_order.

    • Code with select * from ${onedata}.s_order also results in a dependency of onedata.s_order.

    • Code specifying select * from onedata.s_order results in a dependency of onedata.s_order.

    • Code specifying select * from onedata_dev.s_order results in a dependency of onedata_dev.s_order.
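
The four cases above can be summarized in a short Python sketch. It only mirrors the documented outcomes and is not Dataphin's actual parser: the function name resolve_dependency and the prod_project parameter are hypothetical, and the sketch assumes a project variable always takes the form ${name}.

```python
import re


def resolve_dependency(table_ref: str, prod_project: str) -> str:
    """Map a table reference found in SQL code to the output name it depends on."""
    table_ref = table_ref.strip()
    # No project prefix: default to the production project.
    if "." not in table_ref:
        return f"{prod_project}.{table_ref}"
    project, table = table_ref.split(".", 1)
    # Project variable such as ${onedata}: also default to the production project.
    if re.fullmatch(r"\$\{\w+\}", project):
        return f"{prod_project}.{table}"
    # Explicit project name (production or development): keep it unchanged.
    return f"{project}.{table}"


# Development project onedata_dev, production project onedata.
for ref in ["s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"]:
    print(ref, "->", resolve_dependency(ref, prod_project="onedata"))
```

Only an explicitly written development project name produces a dependency on the development output name; every other form resolves to the production project.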

    Upstream Dependency

    To add an upstream node that the current node's scheduling depends on, perform the following steps:

    1. Click Manually Add Upstream.

    2. In the New Upstream Dependency dialog box, search for dependency nodes by:

      • Entering the output name keyword of the dependent node.

      • Entering virtual to find virtual nodes (each tenant or enterprise has a root node upon initialization).

      Note: The output name of a node is globally unique and case-insensitive.

    3. Click Confirm Addition.

    You can also click the delete icon in the Actions column to remove an added dependency node.

    Current Node

    To set the output name of the current node, which allows other nodes to establish dependencies, follow these steps:

    1. Click Manually Add Output.

    2. In the Add Current Node Output dialog box, enter the output name. Adhere to a consistent naming convention, typically project name.table name, which is case-insensitive. This convention helps identify the table produced by this node and facilitates the selection of scheduling dependencies by other nodes.

      For example, for a development project named onedata_dev, the recommended output name is onedata.s_order. Setting the output name to onedata_dev.s_order means only code specifying select * from onedata_dev.s_order can parse the upstream dependency node.

    3. Click Confirm Addition.

    For existing output names on the current node, you can do the following:

    • To delete an added output name, click the delete icon in the Actions column.

    • If the node has been submitted or published and has downstream dependencies (with submitted tasks), click the corresponding icon in the Actions column to view the dependent downstream nodes.
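
To show why the output name matters for dependency matching, the following Python sketch models output names as a case-insensitive, globally unique lookup. The class OutputNameRegistry, its methods, and the node IDs are hypothetical. The sketch also assumes that "globally unique" means one output name belongs to at most one node, which may differ from how Dataphin enforces it.

```python
from typing import Dict, Optional


class OutputNameRegistry:
    """Hypothetical lookup from output name to the node that produces it."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # normalized output name -> node ID

    def register(self, output_name: str, node_id: str) -> None:
        # Output names are case-insensitive, so compare them in lowercase.
        key = output_name.strip().lower()
        existing = self._owner.get(key)
        if existing is not None and existing != node_id:
            raise ValueError(f"output name {output_name!r} already belongs to {existing}")
        self._owner[key] = node_id

    def find_upstream(self, table_name: str) -> Optional[str]:
        """Return the node whose output name matches the parsed table name, if any."""
        return self._owner.get(table_name.strip().lower())


registry = OutputNameRegistry()
registry.register("onedata.s_order", "node_a")
print(registry.find_upstream("ONEDATA.S_ORDER"))      # node_a
print(registry.find_upstream("onedata_dev.s_order"))  # None
```

Because the node registered onedata.s_order, code that parses to onedata_dev.s_order finds no upstream node, which is why the production-style output name is recommended.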

  3. Complete the offline mode dependency configuration by clicking Confirm.