In Dataphin, you configure scheduling dependencies for each node so that business processes run in order and business data is delivered effectively and on time. This topic describes how to configure offline mode dependencies for stream-batch integrated tasks.
Background information
Scheduling dependencies are the upstream and downstream relationships between nodes. In Dataphin, a downstream task node commences execution only after the upstream task node has completed successfully. Configuring these dependencies ensures that tasks receive the correct data at runtime. Dataphin determines the availability of the latest upstream table data based on the running status of the upstream node, allowing the downstream node to retrieve the data. This mechanism prevents the downstream node from attempting to retrieve data before the upstream table data is ready.
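The gating rule described above can be illustrated with a minimal Python sketch. It is not Dataphin's scheduler: the node names, the status strings, and the helper functions are hypothetical and serve only to show that a downstream node becomes eligible to run once every upstream node has succeeded.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: each key maps to the set of upstream nodes it depends on.
dependencies = {
    "load_orders": set(),
    "clean_orders": {"load_orders"},
    "order_report": {"clean_orders"},
}


def run_order(deps):
    """Return an order in which every node comes after all of its upstream nodes."""
    return list(TopologicalSorter(deps).static_order())


def can_start(node, deps, status):
    """A node may start only when every upstream node has finished successfully."""
    # "SUCCESS" is an assumed status label, not a documented Dataphin value.
    return all(status.get(upstream) == "SUCCESS" for upstream in deps[node])
```

For example, `can_start("order_report", dependencies, {"clean_orders": "RUNNING"})` returns `False`, because the upstream node has not yet completed successfully.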
Procedure
Access the Offline Mode configuration panel by referring to Offline Mode Configuration Entry.
In the Dependency section of the offline mode configuration panel, set the Dependency parameters.
Parameter
Description
Start Parsing
If the node's task type is SQL, you can click Start parsing. This action prompts the system to parse the code's tables and identify any table name that corresponds with an output name. The node associated with this output name then becomes the upstream dependency for the current node.
If the code references project variables or lacks a specific project, the system defaults to the production project name to ensure scheduling stability. For instance, if the development project name is onedata_dev:
Code specifying select * from s_order results in a dependency of onedata.s_order.
Code with select * from ${onedata}.s_order also results in a dependency of onedata.s_order.
Code specifying select * from onedata.s_order results in a dependency of onedata.s_order.
Code specifying select * from onedata_dev.s_order results in a dependency of onedata_dev.s_order.
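The four cases above can be summarized as a small Python sketch. It is only an illustration of the naming rule, not the parser Dataphin uses: the function name resolve_dependency, the prod_project parameter, the lowercasing, and the assumption that any ${...} project variable maps to the production project are all hypothetical.

```python
import re


def resolve_dependency(table_ref, prod_project="onedata"):
    """Map a table reference found in SQL code to the dependency output name."""
    # Lowercasing is an assumption made here to keep comparisons simple.
    ref = table_ref.strip().lower()
    if "." not in ref:
        # No project specified: default to the production project.
        return f"{prod_project}.{ref}"
    project, table = ref.split(".", 1)
    if re.fullmatch(r"\$\{\w+\}", project):
        # Project variable such as ${onedata}: default to the production project.
        return f"{prod_project}.{table}"
    # Explicit project name: keep it exactly as referenced.
    return f"{project}.{table}"
```

Under these assumptions, only an explicit onedata_dev prefix yields a dependency on the development project; the other three forms all resolve to onedata.s_order.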
Upstream Dependency
To add an upstream node that the current node's scheduling depends on, perform the following steps:
Click Manually Add Upstream.
In the New Upstream Dependency dialog box, search for dependency nodes by:
Entering the output name keyword of the dependent node.
Entering virtual to find virtual nodes (each tenant or enterprise has a root node upon initialization).
Note: The output name of the node is globally unique and case-insensitive.
Click Confirm Addition.
You can also click the delete icon in the Actions column to remove an added dependency node.
Current Node
To set the output name of the current node, which allows other nodes to establish dependencies, follow these steps:
Click Manually Add Output.
In the Add Current Node Output dialog box, enter the output name. Adhere to a consistent naming convention, typically project name.table name, which is case-insensitive. This convention helps identify the table produced by this node and facilitates the selection of scheduling dependencies by other nodes.
For example, for a development project named onedata_dev, the recommended output name is onedata.s_order. Setting the output name to onedata_dev.s_order means only code specifying select * from onedata_dev.s_order can parse the upstream dependency node.
Click Confirm Addition.
For existing output names on the current node, you can do the following:
To delete an added output name, click the delete icon in the Actions column.
If the node has been submitted or published and has downstream dependencies (with submitted tasks), click the corresponding icon in the Actions column to view the dependent downstream nodes.
Complete the offline mode dependency configuration by clicking Confirm.
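To make the output-name rules from this procedure concrete, the following Python sketch models a toy registry in which output names are globally unique and case-insensitive. The OutputRegistry class, its methods, and the node names are hypothetical and do not correspond to any Dataphin API. The sketch also shows why the recommended output name onedata.s_order matters: a lookup by a different name finds no upstream node.

```python
class OutputRegistry:
    """Toy registry mapping output names to the nodes that produce them."""

    def __init__(self):
        self._owners = {}

    def add_output(self, node, output_name):
        """Register an output name for a node, enforcing global uniqueness."""
        key = output_name.lower()  # output names are case-insensitive
        owner = self._owners.get(key)
        if owner is not None and owner != node:
            raise ValueError(
                f"output name {output_name!r} is already used by {owner!r}"
            )
        self._owners[key] = node

    def find_upstream(self, output_name):
        """Return the node that produces this output name, or None."""
        return self._owners.get(output_name.lower())


registry = OutputRegistry()
registry.add_output("node_orders", "onedata.s_order")  # hypothetical node name
```

With this registry, `registry.find_upstream("ONEDATA.S_ORDER")` returns `"node_orders"`, while `registry.find_upstream("onedata_dev.s_order")` returns `None` because no node declares that output name.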