Offline mode dependency configuration - Dataphin - Alibaba Cloud Documentation Center

In Dataphin, you configure scheduling dependencies for each node so that business processes run in the correct order and business data is delivered accurately and on time. This topic describes how to configure offline mode dependencies for stream-batch integrated tasks.

Background information

Scheduling dependencies are the upstream and downstream relationships between nodes. In Dataphin, a downstream task node commences execution only after the upstream task node has completed successfully. Configuring these dependencies ensures that tasks receive the correct data at runtime. Dataphin determines the availability of the latest upstream table data based on the running status of the upstream node, allowing the downstream node to retrieve the data. This mechanism prevents the downstream node from attempting to retrieve data before the upstream table data is ready.
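
The gating rule described above can be illustrated with a minimal Python sketch. It is not Dataphin's scheduler code: the `Status` enum, the `can_run` function, and the node names are hypothetical and exist only to show that a node becomes runnable once every one of its upstream nodes has completed successfully.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical run states, used only for this illustration.
    WAITING = "waiting"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def can_run(node, upstream, status):
    """Return True only if every upstream node of `node` has succeeded.

    `upstream` maps a node name to the list of nodes it depends on.
    `status` maps a node name to its latest run Status; a node with no
    recorded status is treated as not yet succeeded.
    """
    return all(status.get(up) is Status.SUCCEEDED for up in upstream.get(node, []))


# Hypothetical dependency graph: "report" depends on "orders" and "users".
upstream = {"report": ["orders", "users"], "orders": [], "users": []}
status = {"orders": Status.SUCCEEDED, "users": Status.RUNNING}

print(can_run("report", upstream, status))  # "users" is still running
status["users"] = Status.SUCCEEDED
print(can_run("report", upstream, status))  # all upstream nodes succeeded
```

In this sketch a node with no upstream dependencies is always runnable, because the check over an empty list is trivially satisfied.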

Procedure

Open the offline mode configuration panel. For details, see Offline Mode Configuration Entry.

In the Dependency section of the offline mode configuration panel, set the Dependency parameters.

Parameter	Description
Start Parsing	If the node's task type is SQL, you can click Start Parsing. The system parses the tables referenced in the code and looks for table names that match an output name. The node that owns a matching output name becomes an upstream dependency of the current node. If the code references a project variable or does not specify a project, the system resolves the table to the production project name to keep scheduling stable. For example, if the development project is named `onedata_dev`: (1) `select * from s_order` results in the dependency `onedata.s_order`; (2) `select * from ${onedata}.s_order` results in `onedata.s_order`; (3) `select * from onedata.s_order` results in `onedata.s_order`; (4) `select * from onedata_dev.s_order` results in `onedata_dev.s_order`.
Upstream Dependency	To add an upstream node that scheduling of the current node depends on: (1) Click Manually Add Upstream. (2) In the New Upstream Dependency dialog box, search for the node by entering a keyword from its output name, or enter virtual to find virtual nodes (each tenant or enterprise has a root node upon initialization). Note: The output name of a node is globally unique and case-insensitive. (3) Click Confirm Addition. To delete an added dependency node, click the icon in the Actions column.
Current Node	To set the output name of the current node so that other nodes can depend on it: (1) Click Manually Add Output. (2) In the Add Current Node Output dialog box, enter the output name. Use a consistent naming convention, typically `project name.table name`; the name is case-insensitive. This convention identifies the table that the node produces and makes it easier for other nodes to select scheduling dependencies. For example, for a development project named `onedata_dev`, the recommended output name is `onedata.s_order`. If you set the output name to `onedata_dev.s_order`, only code that specifies `select * from onedata_dev.s_order` is parsed to this node as an upstream dependency. (3) Click Confirm Addition. For an existing output name on the current node, you can click the icons in the Actions column to delete the output name or, if the node has been submitted or published and has downstream dependencies (with submitted tasks), to view the dependent downstream nodes.

Complete the offline mode dependency configuration by clicking Confirm.
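
The project-resolution rule described for Start Parsing, together with the case-insensitive output name matching, can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataphin's parser: the regular expression only handles simple `from <table>` references, and the names `resolve_dependencies`, `find_upstream`, `prod_project`, and the node names are hypothetical.

```python
import re

# Simplified pattern (an assumption): matches "from [project.]table", where
# the project may be a literal name or a variable such as ${onedata}.
TABLE_REF = re.compile(r"\bfrom\s+(?:(\$\{\w+\}|\w+)\.)?(\w+)", re.IGNORECASE)


def resolve_dependencies(sql, prod_project):
    """Return the output names that the tables referenced in `sql` resolve to."""
    deps = []
    for project, table in TABLE_REF.findall(sql):
        # No project, or a project variable: fall back to the production project.
        if not project or project.startswith("${"):
            project = prod_project
        # Output names are case-insensitive, so normalize to lowercase.
        deps.append(f"{project}.{table}".lower())
    return deps


def find_upstream(deps, outputs):
    """Map each resolved dependency to the node that declares it as an output.

    `outputs` maps an output name to the (hypothetical) node that owns it.
    """
    registry = {name.lower(): node for name, node in outputs.items()}
    return {dep: registry[dep] for dep in deps if dep in registry}


# Hypothetical registry: one node declares the recommended output name.
outputs = {"onedata.s_order": "load_orders"}

for sql in (
    "select * from s_order",
    "select * from ${onedata}.s_order",
    "select * from onedata.s_order",
    "select * from onedata_dev.s_order",
):
    deps = resolve_dependencies(sql, prod_project="onedata")
    print(sql, "->", deps, find_upstream(deps, outputs))
```

In this sketch the first three statements resolve to `onedata.s_order` and find the node that declares that output, while the last resolves to `onedata_dev.s_order` and finds nothing. This mirrors why the production-style output name is recommended for the current node.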