Dataphin: Configure scheduling for offline pipelines

Last Updated: Jan 21, 2025

Scheduling configuration defines the rules for scheduling offline pipeline tasks of the recurring type. This topic explains how to set up the recurring offline pipeline scheduling configuration.

Background information

  • Dependencies are semantic connections between nodes, where the status of upstream nodes influences the running status of downstream nodes.

  • A dependent node runs only when two conditions hold: its upstream nodes have completed, and its own scheduled time has arrived.

  • A scheduling configuration submitted before the scheduled time takes effect from that time onward. Dependencies configured after the scheduled time has passed generate instances only after a one-day delay.
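
The two dependency rules above can be sketched as a simple readiness check. This is only an illustration, not Dataphin's implementation: the `Instance` class, the `is_ready` function, and the status strings are hypothetical, and the sketch assumes an upstream instance counts as completed only when it finished successfully.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Instance:
    """Hypothetical stand-in for one scheduled run of a node."""
    name: str
    scheduled_time: datetime
    status: str = "waiting"  # assumed values: "waiting", "success", "failed"
    upstream: list["Instance"] = field(default_factory=list)


def is_ready(instance: Instance, now: datetime) -> bool:
    """An instance may start only when both scheduling rules hold."""
    # Rule 1: every upstream instance has completed (assumed: successfully).
    upstream_done = all(u.status == "success" for u in instance.upstream)
    # Rule 2: the instance's own scheduled time has arrived.
    time_reached = now >= instance.scheduled_time
    return upstream_done and time_reached


extract = Instance("extract", datetime(2025, 1, 21, 0, 0), status="success")
load = Instance("load", datetime(2025, 1, 21, 2, 0), upstream=[extract])

# Upstream is done, but the scheduled time has not been reached yet.
early = is_ready(load, datetime(2025, 1, 21, 1, 0))
# Both rules are satisfied.
on_time = is_ready(load, datetime(2025, 1, 21, 2, 0))
```

In this sketch `early` is `False` and `on_time` is `True`: a finished upstream node alone is not enough, because the downstream node still waits for its own scheduled time.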

Procedure

  1. On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.

  2. On the Integration page, select Project from the top menu bar.

  3. In the left-side navigation pane, go to Integration > Batch Pipeline. Click the desired task name in the Batch Pipeline list.

  4. In the task tab, open the Attribute panel by clicking Attribute on the right.

  5. Select the Schedule Configuration tab and set the relevant parameters in the Schedule Configuration section.

    Parameter

    Description

    Schedule Type

    Supports three scheduling types: normal scheduling, dry-run scheduling, and skip execution.

    • Normal scheduling: Configures scheduling based on the scheduling period and executes normally (i.e., real data is run). This option is usually selected by default for tasks.

    • Dry-run scheduling: Configures scheduling based on the scheduling period but executes as a dry-run (i.e., no real data is run). When this task is scheduled, it directly returns success without actually executing the task. This type of scheduling is typically used when a node does not need to be executed for a certain period but should not block the execution of its downstream nodes.

    • Skip execution: When skip execution is selected, scheduling is initiated based on the time configured in the scheduling period, but the node status is set to paused (i.e., no real data is run). When this task is scheduled, the system directly returns a failure response, and the descendant nodes cannot be run. This type of scheduling is typically used when a task does not need to be executed temporarily but will be used later.
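
The difference between the three scheduling types can be summarized in a small sketch. It is a hedged illustration of the behavior described above, not Dataphin code: `ScheduleType`, `run_instance`, and the status strings are hypothetical names, and `task` stands for any callable that does the real work and returns `True` on success.

```python
from enum import Enum


class ScheduleType(Enum):
    NORMAL = "normal"
    DRY_RUN = "dry_run"
    SKIP = "skip"


def run_instance(schedule_type, task):
    """Return (status, downstream_can_run) for one scheduled instance."""
    if schedule_type is ScheduleType.NORMAL:
        # Normal scheduling: the task really runs on real data.
        ok = task()
        return ("success" if ok else "failed", ok)
    if schedule_type is ScheduleType.DRY_RUN:
        # Dry-run: report success without executing, so downstream is not blocked.
        return ("success", True)
    # Skip execution: the instance is paused and downstream nodes cannot run.
    return ("paused", False)


calls = []

def real_task():
    calls.append("ran")
    return True

dry = run_instance(ScheduleType.DRY_RUN, real_task)
skipped = run_instance(ScheduleType.SKIP, real_task)
normal = run_instance(ScheduleType.NORMAL, real_task)
```

Only the normal run calls `real_task`. The dry run and the skipped run never touch the data, but they differ in whether downstream nodes are allowed to proceed.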

    Priority

    The priority of the current offline pipeline task can be set to Medium, Low, or Lowest.

    When a new offline pipeline task is created, the default value of this priority is derived from the default priority in Management Center > Development Platform Settings > Node Task Settings.

    Note

    After the task is published to the production environment (or submitted, in the Basic environment), the priority can no longer be changed while editing the task. It must be changed in operations and maintenance for the production environment. From then on, the priority shown is the latest value in the production environment.

    Recurrence

    The scheduling period determines how often the task's code runs in the production scheduling system. After a task is successfully submitted, scheduling starts the next day according to the task's time attribute configuration. The system automatically generates recurring instances, and each instance runs according to the results of its upstream dependency instances and its scheduled time.

    Scheduling period options include Day, Week, Month, Hour, and Minute:

    • Day scheduling: The task runs automatically once a day. For a newly created recurring task, the default is to run once daily at 00:00. Click the icon to specify a different time as needed.

    • Week scheduling: The task runs automatically on the specified days of the week at the specified time. Click the icon to specify the time as needed.

      Note

      To ensure the normal operation of downstream instances during non-scheduled times, the system generates instances and directly sets them to success without executing any logic or occupying resources.

    • Month scheduling: The task runs automatically on the specified days of the month at the specified time. Click the icon to specify the time as needed.

      Note

      To ensure the normal operation of downstream instances during non-scheduled times, the system generates instances daily and directly sets them to success without executing any logic or occupying resources.

    • Hour scheduling: Supports Time Range, Whole Hour, Custom Time Range, and Custom Time Points scheduling.

      • Time Range scheduling: The task starts automatically at the set interval within a specified time range each day. Click the corresponding icons to set the Start Time and End Time of the time range and to select Interval n Hours. For example, if the time range is set to 00:00–23:00 and the interval is set to 1 hour, the task is scheduled every hour within the 00:00–23:00 time range each day.

      • Whole Hour scheduling: Select Whole Hour time from the drop-down list box. The scheduling system generates and runs instances for the task at the selected whole hour time.

      • Custom Time Points scheduling: The scheduling system generates and runs instances for the task at the selected custom time points. Select custom time points from the drop-down list box. You can click +add Custom Time Points to add multiple time points.

      • Custom Time Range scheduling: The task starts automatically at the set interval within one or more time ranges each day. Click the icon to set the Start Time and End Time of a time range, and enter Interval n Minutes (n is an integer between 5 and 360) in the interval field. You can click +add Custom Time Range to add up to 10 time ranges. The time ranges cannot overlap. For example, if Time Range 1 is set to 00:00–03:00 with an interval of 30 minutes, the task is scheduled every 30 minutes within the 00:00–03:00 time range each day.

    • Minute scheduling: Supports Daily or Hourly.

      • Daily: The task runs at the specified interval within the specified time range each day.

      • Hourly: The task runs at the specified interval within every hour, from the specified start time to the end of that hour.
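
To make the interval-based options concrete, the following sketch expands a time range into trigger times and checks the stated limits for custom time ranges (at most 10 ranges, an integer interval of 5 to 360 minutes, no overlap). It is an illustration under assumptions, not Dataphin's logic: the function names are hypothetical, the end time is assumed to be inclusive, and two ranges that share an endpoint are assumed to overlap.

```python
def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def trigger_times(start: str, end: str, interval_minutes: int) -> list[str]:
    """List trigger times from start to end (assumed inclusive) at a fixed interval."""
    current, stop = to_minutes(start), to_minutes(end)
    times = []
    while current <= stop:
        times.append(f"{current // 60:02d}:{current % 60:02d}")
        current += interval_minutes
    return times


def validate_custom_ranges(ranges) -> list[str]:
    """Check (start, end, interval_minutes) tuples against the documented limits."""
    errors = []
    if len(ranges) > 10:
        errors.append("at most 10 time ranges are allowed")
    for start, end, interval in ranges:
        if not (isinstance(interval, int) and 5 <= interval <= 360):
            errors.append(f"{start}-{end}: interval must be an integer from 5 to 360")
        if to_minutes(start) > to_minutes(end):
            errors.append(f"{start}-{end}: start time is after end time")
    # Sort by start time, then compare each range with the one before it.
    ordered = sorted(ranges, key=lambda r: to_minutes(r[0]))
    for prev, cur in zip(ordered, ordered[1:]):
        if to_minutes(cur[0]) <= to_minutes(prev[1]):
            errors.append(f"{prev[0]}-{prev[1]} overlaps {cur[0]}-{cur[1]}")
    return errors


half_hourly = trigger_times("00:00", "03:00", 30)
hourly = trigger_times("00:00", "23:00", 60)
problems = validate_custom_ranges([("00:00", "03:00", 30), ("02:00", "04:00", 30)])
```

Under these assumptions, the 00:00–03:00 range with a 30-minute interval yields seven trigger times, the 00:00–23:00 range with a 1-hour interval yields 24, and `problems` reports one overlap between the two custom ranges.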

    Scheduling Plan

    Click Preview. The scheduling plan displays all scheduling instances and their scheduling types for each day of a specific month based on the configured scheduling period and conditions. The preview date type can be selected as Data Timestamp or Run Date (Scheduling Date).

    If the instances on a single day have different scheduling types, the types are distinguished by color, and the name of each scheduling type is displayed together with its number of instances. For example, in the figure below, on the 4th of a specific month, there are 44 normal scheduling instances, 2 paused instances, and 12 dry-run instances for the current scheduling task.

    Hover the mouse over the scheduling type module for a specific day to view the detailed scheduling instance list for the current scheduling task on that day. The list includes the scheduling type, scheduling conditions, and condition names.
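
The per-day summary shown in the scheduling plan amounts to counting instances by scheduling type for each day. The sketch below reproduces the example counts for the 4th of a month. It is purely illustrative: `summarize_plan`, the type labels, and the chosen year and month are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def summarize_plan(instances):
    """Group (day, schedule_type) pairs into per-day counts by type."""
    plan = defaultdict(Counter)
    for day, schedule_type in instances:
        plan[day][schedule_type] += 1
    return dict(plan)


day = date(2025, 1, 4)  # arbitrary example date
instances = (
    [(day, "normal")] * 44
    + [(day, "paused")] * 2
    + [(day, "dry-run")] * 12
)
summary = summarize_plan(instances)
total = sum(summary[day].values())
```

Here `summary[day]` holds 44 normal, 2 paused, and 12 dry-run instances, 58 in total, which matches the kind of breakdown the preview calendar displays for a single day.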

    Conditional Scheduling

    After you enable conditional scheduling, you can set up to 10 scheduling conditions. The system evaluates the conditions from top to bottom. As soon as one condition is met, its scheduling action is executed and all remaining conditions are skipped. If no condition is met, the default scheduling configuration applies.

    Important

    Conditional scheduling is only effective when the scheduling type is Normal Scheduling.
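
The evaluation order can be sketched as a first-match rule list. This is a hedged illustration rather than Dataphin's actual logic: `Condition`, `resolve_schedule_type`, and the string labels are hypothetical, a `schedule_type` of `None` stands for the Follow Scheduling Attributes option, and the `enabled` flag stands for the Effective Status setting described in this section.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

MAX_CONDITIONS = 10


@dataclass
class Condition:
    name: str
    matches: Callable[[date], bool]
    schedule_type: Optional[str]  # None means: follow the scheduling attributes
    enabled: bool = True


def resolve_schedule_type(conditions, default_type, run_date):
    """Return the scheduling type that applies to run_date."""
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError("at most 10 scheduling conditions are allowed")
    # Conditional scheduling only takes effect for normal scheduling.
    if default_type != "normal":
        return default_type
    for condition in conditions:  # evaluated top to bottom
        if not condition.enabled:
            continue  # disabled conditions are ignored
        if condition.matches(run_date):
            # First match wins; all later conditions are skipped.
            if condition.schedule_type is None:
                return default_type
            return condition.schedule_type
    return default_type  # no condition matched


conditions = [
    Condition("weekend", lambda d: d.weekday() >= 5, "dry_run"),
    Condition("first of month", lambda d: d.day == 1, "skip"),
]

saturday_first = resolve_schedule_type(conditions, "normal", date(2025, 2, 1))
wednesday_first = resolve_schedule_type(conditions, "normal", date(2025, 1, 1))
plain_monday = resolve_schedule_type(conditions, "normal", date(2025, 1, 6))
```

February 1, 2025 is both a Saturday and the first of the month, so the order of the list decides the outcome: the weekend condition comes first and the result is a dry run. January 1, 2025 (a Wednesday) matches only the second condition, and January 6, 2025 matches neither, so the default applies.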

    1. Click +add Scheduling Condition.

    2. In the Edit Conditional Scheduling dialog box, configure the relevant conditional scheduling information.

      • Condition Name: Supports any characters, with a maximum length of 32 characters.

      • Effective Status: Enabled by default. When disabled, the conditional scheduling is ignored during scheduling.

      • Meet the Following Conditions: The rule for condition evaluation. When the condition evaluates to true, scheduling is performed based on the Execute Scheduling configuration. For configuration details, see Description of Conditional Scheduling Rules.

      • Execute Scheduling: Supports Custom and Follow Scheduling Attributes:

        • Custom: When the condition evaluates to true, scheduling is performed based on the configured Scheduling Type.

        • Follow Scheduling Attributes: Uses the scheduling strategy from the scheduling attributes, which is equivalent to the scheduling settings applied when conditional scheduling is disabled.

      • Scheduling Type: For configuration details, see Scheduling Type.

    3. Click OK.

      After completing the conditional scheduling settings, click Preview Scheduling Plan to view the dates hit by conditional scheduling in the calendar.

      Important

      • After modifying the conditional scheduling settings and submitting them to the production environment, the changes take effect immediately for instances in the Not Running status at the time of publication. However, they do not affect instances in the Waiting for Execution Time status.

      • When cross-node parameter evaluation types are used in conditional scheduling, possible parameter values must be provided for previewing.

  6. To finalize the offline pipeline scheduling configuration, click OK.