
Dataphin: Configure offline pipeline task properties

Last Updated: Jan 21, 2025

To run offline integration tasks periodically, you must define their scheduling properties, including the scheduling cycle, dependencies, and parameters. This topic describes how to configure these properties so that offline tasks are scheduled.

Notes

  • Scheduling can be configured only for offline integration tasks whose scheduling type is recurring task node.

  • Dependencies are semantic links between nodes: the status of an upstream node determines whether its downstream nodes can run.

  • A dependent node runs only when two conditions are both met: its upstream nodes have finished running, and its own predefined scheduling time has been reached.

  • Scheduling configurations submitted before the preset scheduling time take effect at that scheduled time. Dependencies set after the preset scheduling time generate instances only after a one-day delay.

  • A task's scheduling configuration only defines how the task is scheduled. The task must be deployed to the production environment before it is scheduled according to this configuration.

  • The scheduling time specifies only the intended execution time of the task. However, the actual execution time depends on the execution status of upstream dependencies. For detailed information on task execution conditions, see instance running diagnostics.
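The two-part dependency rule in the notes above can be illustrated with a minimal Python sketch. It assumes a simplified model in which "finished running" means the upstream instance ended with a hypothetical `SUCCESS` status; the `Instance` class, its status values, and `can_start` are illustrative names, not Dataphin APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Instance:
    """Hypothetical model of one scheduled instance of a node."""
    name: str
    scheduled_time: datetime
    upstream: list["Instance"] = field(default_factory=list)
    status: str = "WAITING"  # assumed states: WAITING, RUNNING, SUCCESS, FAILED


def can_start(instance: Instance, now: datetime) -> bool:
    """Apply both rules: every upstream instance finished successfully,
    and the instance's own scheduled time has been reached."""
    upstream_done = all(u.status == "SUCCESS" for u in instance.upstream)
    time_reached = now >= instance.scheduled_time
    return upstream_done and time_reached


# Usage: "load" depends on "extract", which has already succeeded.
extract = Instance("extract", datetime(2025, 1, 21, 0, 30), status="SUCCESS")
load = Instance("load", datetime(2025, 1, 21, 2, 0), [extract])
print(can_start(load, datetime(2025, 1, 21, 1, 0)))  # False: 02:00 not reached yet
print(can_start(load, datetime(2025, 1, 21, 2, 0)))  # True
```

This also shows why the scheduled time is only the earliest possible start: if `extract` were still running at 02:00, `load` would keep waiting.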
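The note about when configuration changes take effect can be read, for a daily task, as a simple date calculation. The following Python sketch is a simplified interpretation, assuming that a change submitted strictly before the scheduled time applies to that day's run and that one submitted at or after it applies from the next day. `first_effective_date` is a hypothetical helper, not part of Dataphin.

```python
from datetime import date, datetime, time, timedelta


def first_effective_date(submitted_at: datetime, scheduled_time: time) -> date:
    """Return the date of the first daily instance that uses a change
    submitted at `submitted_at`, for a task scheduled at `scheduled_time`.

    Assumption: strictly before the scheduled time -> same day;
    at or after it -> one-day delay.
    """
    submit_day = submitted_at.date()
    if submitted_at.time() < scheduled_time:
        return submit_day
    return submit_day + timedelta(days=1)


print(first_effective_date(datetime(2025, 1, 21, 1, 0), time(2, 0)))  # 2025-01-21
print(first_effective_date(datetime(2025, 1, 21, 3, 0), time(2, 0)))  # 2025-01-22
```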

Open the offline integration task properties

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the Integration page, select a project.

  3. In the left-side navigation pane, select Integration > Batch Pipeline. Then, in the Batch Pipeline list, click the desired task name.

  4. In the task tab, click Attribute on the right to open the Attribute panel.

Configure offline task properties

In the Attribute panel, configure the basic information and the scheduling-related properties described in the following table.

Configuration Item

Description

Basic Information

Includes task name, ID, node type, development owner, operations owner, and description.

  • Task Name: The name given to the task upon creation.

  • Node ID: The unique identifier assigned to the node upon submission.

  • Development Owner: Defaults to the current user. You can select any project member instead.

    Note

    The development owner cannot be set in the production environment; the configuration from the development environment will be used.

  • Operations Owner: Defaults to the node's creator. Alternatively, select a project member as the operations owner.

Configure offline pipeline scheduling

Determines how the task is scheduled to recur in the production environment.

  • Scheduling Type: Specifies the running status of the task's instances in the production environment.

  • Priority: The task's priority level. The default priority for new tasks is derived from Management Center > Development Platform > Node Task Related Settings > Default Priority.

    Note

    Priority cannot be changed when you publish to the production environment or submit in the Basic environment. To change it, adjust it in production operations, which always shows the most recent value.

  • Scheduling Cycle: Sets the task's execution frequency.

  • Conditional Scheduling: Defines the conditions under which the task is scheduled. You can set multiple conditions, which the system evaluates in order. As soon as one condition is met, its schedule is used and the remaining conditions are not evaluated. If no condition is met, the default schedule is used.

  • Parameter Configuration: Defines the parameters for node scheduling. Dataphin supports both built-in and custom parameters, enabling dynamic parameter assignment during task scheduling.

    Note

    If the node code references a variable, you must assign it a value here. If the code uses no variables, no parameters need to be defined.
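The first-match evaluation described for Conditional Scheduling can be sketched in Python as follows. The `ConditionalSchedule` class, the condition functions, and the free-form schedule strings are hypothetical; they illustrate the evaluation order only and do not reflect how Dataphin stores conditions or schedules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass
class ConditionalSchedule:
    """Hypothetical rule: when `condition` holds for the business date, use `schedule`."""
    description: str
    condition: Callable[[date], bool]
    schedule: str  # free-form label here; not Dataphin's actual schedule format


def resolve_schedule(rules: list[ConditionalSchedule], default: str, biz_date: date) -> str:
    """Evaluate rules in order and return the schedule of the first match;
    fall back to the default schedule when no condition is met."""
    for rule in rules:
        if rule.condition(biz_date):
            return rule.schedule  # first match wins; later rules are skipped
    return default


rules = [
    ConditionalSchedule("last day of month", lambda d: (d + timedelta(days=1)).day == 1, "daily at 04:00"),
    ConditionalSchedule("weekend", lambda d: d.weekday() >= 5, "daily at 06:00"),
]
# 2025-05-31 is both a Saturday and a month end; the first rule wins.
print(resolve_schedule(rules, "daily at 02:00", date(2025, 5, 31)))  # daily at 04:00
```

Because evaluation stops at the first match, the order of the conditions matters whenever several can be true on the same date.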
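The rule that every variable in the node code must be assigned a value can be illustrated with a small substitution sketch. The `${name}` placeholder syntax, the `render` helper, and the parameter names `source_table` and `bizdate` are assumptions for illustration; consult the Dataphin parameter reference for the actual built-in parameters and syntax.

```python
import re

# Assumed placeholder syntax ${name}, used here only for illustration.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render(code: str, params: dict[str, str]) -> str:
    """Substitute configured parameter values into node code.

    Raises KeyError if the code references a variable with no assigned value,
    mirroring the rule that every variable in the node code needs a value.
    """
    missing = sorted(set(PLACEHOLDER.findall(code)) - set(params))
    if missing:
        raise KeyError(f"unassigned variables: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: params[m.group(1)], code)


# Hypothetical values; "bizdate" stands in for a built-in date parameter.
sql = "SELECT * FROM ${source_table} WHERE ds = '${bizdate}'"
print(render(sql, {"source_table": "ods_orders", "bizdate": "20250120"}))
# SELECT * FROM ods_orders WHERE ds = '20250120'
```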

Configure offline pipeline scheduling dependencies

Defines the task's upstream and downstream dependencies so that nodes run in order: descendant nodes run only after their ancestor nodes complete, which helps ensure that valid business data is produced on time. Dependencies can be configured automatically or manually.
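Running descendants only after their ancestors amounts to executing nodes in a topological order of the dependency graph. The Python sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical pipeline; the node names are invented, and Dataphin's scheduler is not claimed to work this way internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of upstream nodes it depends on.
dependencies = {
    "ods_orders": set(),
    "ods_users": set(),
    "dwd_order_detail": {"ods_orders", "ods_users"},
    "ads_daily_sales": {"dwd_order_detail"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every node comes after all of its
    upstream nodes; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))
```

The two `ods_` nodes have no dependency on each other, so either may come first; a real scheduler could also run them in parallel.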

Offline pipeline task running configuration

Specifies the task's timeout and its rerun policy on failure. The timeout keeps long-running tasks from wasting resources, and automatic reruns improve task reliability.
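How a timeout and a rerun policy interact can be shown with a small simulation. This Python sketch replays hypothetical attempt outcomes instead of running real tasks, and assumes that an attempt exceeding the timeout counts as a failure and that the task is rerun at most `max_reruns` times. `Attempt` and `run_with_policy` are illustrative names, not Dataphin settings.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """Outcome of one hypothetical run: whether it succeeded and how long it took."""
    succeeded: bool
    minutes: float


def run_with_policy(attempts: list[Attempt], timeout_minutes: float, max_reruns: int) -> tuple[bool, int]:
    """Replay attempts under a timeout and rerun-on-failure policy.

    An attempt longer than the timeout is treated as failed (a real scheduler
    would terminate it). After a failure the task is rerun until it succeeds
    or `max_reruns` reruns have been used. Returns (succeeded, runs_used).
    """
    runs = 0
    for attempt in attempts[: max_reruns + 1]:  # first run plus allowed reruns
        runs += 1
        if attempt.succeeded and attempt.minutes <= timeout_minutes:
            return True, runs
    return False, runs


print(run_with_policy([Attempt(False, 10), Attempt(True, 12)], timeout_minutes=60, max_reruns=2))  # (True, 2)
print(run_with_policy([Attempt(True, 90)], timeout_minutes=60, max_reruns=0))  # (False, 1)
```

The second call shows why a timeout matters: a run that would eventually succeed after 90 minutes is still counted as failed, freeing resources for other tasks.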