
DataWorks: Scheduling time

Last Updated: Mar 19, 2025

The scheduling frequency of a node determines how often the node is automatically run by the scheduling system. It defines the interval at which the code logic of the node is actually executed in the production environment. DataWorks generates instances for the node based on its scheduling frequency, one instance per scheduling cycle, and the node is run as these instances.

Precautions

  • The scheduling frequency of a node is independent of the scheduling frequencies of its ancestor nodes.

    The interval at which a node is scheduled depends only on the scheduling frequency of the node itself, not on the scheduling frequencies of its ancestor nodes.

  • DataWorks allows you to configure scheduling dependencies between nodes whose scheduling frequencies are different.

    DataWorks generates instances for an auto triggered node based on its scheduling frequency, one instance per scheduling cycle. For example, a node scheduled by hour generates as many instances every day as it has scheduling cycles in a day. The node is run as these instances. In essence, dependencies between auto triggered nodes are dependencies between the instances generated for the nodes. Therefore, the number of instances generated for ancestor and descendant nodes, and the dependencies between these instances, vary based on the scheduling frequencies of the nodes. For information about how to configure scheduling dependencies between nodes whose scheduling frequencies are different, see Principles and samples of scheduling configurations in complex dependency scenarios.

  • Dry-run instances are generated for a node on the days when the node is not scheduled to run.

    For a node that is not scheduled to run every day, such as a node scheduled by week or month, DataWorks generates dry-run instances on the days when the node is not scheduled to run. When the scheduling time of the node arrives on these days, the dry-run instances return success without running the code of the node. This way, if a node scheduled by day depends on a node scheduled by week or month, the node scheduled by day can still run every day as expected: the node scheduled by week or month is dry run, and the node scheduled by day is run as scheduled.

  • The actual time when a node is run is affected by multiple factors.

    You can specify the time at which a node is scheduled, but the actual time when the node runs is affected by multiple factors, such as the scheduling time of its ancestor nodes, the resources required to run the node, and the conditions for running the node.
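The following sketch illustrates how instances are classified for a node that is not scheduled to run every day. It is a simplified model, not DataWorks code: the helper names are hypothetical, and it assumes a node scheduled by week that runs only on Mondays. One instance is generated per day, and only the instances on scheduled days actually run the code of the node.

```python
from datetime import date, timedelta

MONDAY = 0  # Python's date.weekday(): Monday == 0 ... Sunday == 6


def instance_type(day: date, scheduled_weekdays: set[int]) -> str:
    """Classify the instance generated for `day` (hypothetical helper)."""
    # On a scheduled day the node's code runs; on any other day the
    # instance is dry run and immediately reports success.
    return "normal" if day.weekday() in scheduled_weekdays else "dry-run"


def week_of_instances(start: date, scheduled_weekdays: set[int]) -> list[tuple[date, str]]:
    """One instance per day for seven consecutive days starting at `start`."""
    days = [start + timedelta(days=i) for i in range(7)]
    return [(d, instance_type(d, scheduled_weekdays)) for d in days]


if __name__ == "__main__":
    for d, kind in week_of_instances(date(2025, 3, 17), {MONDAY}):
        print(d.isoformat(), kind)
```

March 17, 2025 is a Monday, so the first instance is a normal instance and the following six are dry-run instances. A descendant node scheduled by day therefore always finds a successful ancestor instance.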

Configuration of scheduling time for auto triggered nodes

  • If you want the start node in a workflow to run at a specific point in time and the other nodes in the workflow to run after their ancestor nodes finish running, you need to configure the scheduling time only for the start node. When the scheduling time arrives, the nodes in the workflow run in sequence based on their scheduling dependencies. For information about how scheduling dependencies affect the running of nodes, see Impacts of dependencies between nodes on the running of the nodes.

  • If you want specific nodes in a workflow to start running at different points in time, you must configure the scheduling time for each of these nodes separately.

Sample scenarios

In the workflows shown in the following figures, Node A is the start node, Node B depends on Node A, and Node C depends on Node B.

  • If you want the workflow to start running at 03:00, you only need to set the scheduling time of Node A to 03:00. Although Node B and Node C keep the default scheduling time (00:00 in this example), each of them can start to run only after its ancestor node finishes running.

  • If you want Node A to run at 03:00, Node B at 05:00, and Node C at 06:00, you must configure the scheduling time for each node separately.

  • If you want Node A to run at 03:00 and Node B at 05:00 while Node C keeps its default scheduling time, Node C can start to run only after Node B finishes running. In this case, Node C actually starts to run later than 05:00.
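The three scenarios follow one rule: a node starts when its scheduling time has arrived and its ancestor node has finished, whichever happens later. The sketch below estimates start times under that rule. The function name and the run durations are hypothetical, and the sketch ignores other factors, such as waiting for resources, that can delay a node further.

```python
from datetime import datetime, timedelta


def estimate_start_times(chain, durations, day):
    """Estimate start times for a linear chain of nodes, such as A -> B -> C.

    chain: (node_name, "HH:MM" scheduling time) pairs in dependency order.
    durations: hypothetical run time of each node, as a timedelta.
    day: the date on which the instances are scheduled, as a datetime.
    Waiting for resources and other delays are ignored.
    """
    starts = {}
    ancestor_finish = None
    for name, hhmm in chain:
        hour, minute = map(int, hhmm.split(":"))
        scheduled = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
        # A node needs both its scheduling time and its ancestor's success,
        # so it starts at whichever of the two happens later.
        if ancestor_finish is None:
            start = scheduled
        else:
            start = max(scheduled, ancestor_finish)
        starts[name] = start
        ancestor_finish = start + durations[name]
    return starts


if __name__ == "__main__":
    runs = {"A": timedelta(minutes=90), "B": timedelta(minutes=40), "C": timedelta(minutes=20)}
    # Third scenario: A at 03:00, B at 05:00, C keeps a 00:00 scheduling time.
    chain = [("A", "03:00"), ("B", "05:00"), ("C", "00:00")]
    for name, start in estimate_start_times(chain, runs, datetime(2025, 3, 19)).items():
        print(name, start.strftime("%H:%M"))
```

With these assumed durations, Node A finishes at 04:30, Node B waits for its own 05:00 scheduling time, and Node C starts at 05:40, which is later than 05:00 as described in the third scenario.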

Scheduling frequencies

DataWorks supports the following scheduling frequencies: minute, hour, day, week, month, and year. The following sections describe the scheduling configuration and the running details of nodes for each scheduling frequency.

Minute

Limits

The minimum interval for running a node that is scheduled by minute is 5 minutes.

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Scenario

    The following figure shows how to configure a node that is scheduled to run every 30 minutes in the time period from 00:00 to 23:59 every day.

    Note

    The cron expression is automatically generated based on the scheduling time that you select and cannot be changed.


Scheduling details

The following figure shows the scheduling time of instances generated for a node scheduled to run every 30 minutes and the replacement results of the scheduling parameters configured for the node.
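As a rough illustration, the scheduling times of the instances generated in one day can be listed as follows. This is a hypothetical helper, not a DataWorks API. It assumes that the interval is given in minutes and that both ends of the time period are included, and it enforces the 5-minute minimum described in Limits.

```python
MIN_INTERVAL_MINUTES = 5  # from the Limits section above


def to_minutes(hhmm: str) -> int:
    """Convert an "HH:MM" string to minutes after 00:00."""
    hour, minute = map(int, hhmm.split(":"))
    return hour * 60 + minute


def minute_schedule(interval: int, start: str = "00:00", end: str = "23:59") -> list[str]:
    """Scheduling times of the instances generated in one day (hypothetical helper).

    Both the start time and the end time of the period are included.
    """
    if interval < MIN_INTERVAL_MINUTES:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_MINUTES} minutes")
    return [
        f"{t // 60:02d}:{t % 60:02d}"
        for t in range(to_minutes(start), to_minutes(end) + 1, interval)
    ]


if __name__ == "__main__":
    times = minute_schedule(30)
    print(len(times), times[:3], times[-1])
```

For a node scheduled every 30 minutes from 00:00 to 23:59, the sketch yields 48 instances, the first at 00:00 and the last at 23:30.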

Note

For more information about dependency scenarios for nodes scheduled by minute, see Dependencies for nodes scheduled by minute.

Hour

Precautions

  • The time period within which a node is scheduled to run is a closed interval: both the start time and the end time are included. For example, if a node is scheduled to run once every hour from 00:00 to 03:00, the scheduling system generates four instances for the node every day and schedules them to run in sequence at 00:00, 01:00, 02:00, and 03:00.

  • You can schedule a node to run at a specified interval within a specific period every day. You can also schedule the node to run at specified points in time on the hour every day.

  • The actual time at which a node is run may be different from the scheduling time of the node due to reasons such as insufficient resources.
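The closed interval can be illustrated with a small sketch. The helper below is hypothetical and models only the option of running at a specified interval within a specific period. It lists every scheduling time from the start time through the end time, inclusive.

```python
def hourly_schedule(interval_hours: int, start: str = "00:00", end: str = "23:59") -> list[str]:
    """Scheduling times within the closed interval [start, end] (hypothetical helper)."""
    def to_minutes(hhmm: str) -> int:
        hour, minute = map(int, hhmm.split(":"))
        return hour * 60 + minute

    # `+ 1` makes the end time inclusive, matching the closed interval
    # described above.
    return [
        f"{t // 60:02d}:{t % 60:02d}"
        for t in range(to_minutes(start), to_minutes(end) + 1, interval_hours * 60)
    ]


if __name__ == "__main__":
    print(hourly_schedule(1, "00:00", "03:00"))  # four instances
    print(hourly_schedule(6))
```

For example, `hourly_schedule(1, "00:00", "03:00")` returns 00:00, 01:00, 02:00, and 03:00, and `hourly_schedule(6)` returns 00:00, 06:00, 12:00, and 18:00, which matches the every-6-hours configuration described in this section.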

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Configuration details

    The following figure shows how to configure a node that is scheduled to run every 6 hours in the time period from 00:00 to 23:59 every day.

    Note

    The cron expression is automatically generated based on the scheduling time that you select and cannot be changed.


Scheduling details

The scheduling system generates four instances for the node every day and schedules the instances to run in sequence at 00:00, 06:00, 12:00, and 18:00.

Note

For more information about dependency scenarios for nodes scheduled by hour, see Dependencies for nodes scheduled by hour.

Day

If a node is scheduled by day, the node is automatically run once at a specified point in time every day. After you create an auto triggered node that is scheduled by day, the default scheduling time of the node is randomly generated in the time period from 00:00 to 00:30. You can change the scheduling time of the node based on your business requirements. For example, you can set the scheduling time of the node to 13:00.

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Configuration details

    • An import node, an analytics node, and an export node are scheduled by day.

    • The nodes are scheduled to run at 13:00.

    • The analytics node depends on the import node, and the export node depends on the analytics node.

    The following figure shows how to configure the nodes to be scheduled to run at 13:00 every day.

    Note

    The cron expression is automatically generated based on the scheduling time that you select and cannot be changed.


Scheduling details

The scheduling system automatically generates and runs instances for the nodes. The following figure shows the processing time of business data.

Note
  • The following prerequisites must be met before a node is run:

    • The ancestor node of the node is successfully run.

    • The scheduling time of the node arrives.

    Both prerequisites must be met, and they can be met in either order.

    The default scheduling time of a node that is scheduled by day is randomly generated in the time period from 00:00 to 00:30.

  • For more information about dependency scenarios for nodes scheduled by day, see Dependencies for tasks scheduled by day.

Week

Precautions

  • To ensure that the descendant nodes of an auto triggered node that is scheduled by week can run as expected on days when the node is not scheduled to run, the system generates dry-run instances for the node on those days.

    Note

    If a node is dry run, the system does not actually run the node but directly reports that the node has run successfully.

    • The running duration of the node is 0 seconds, and no run logs are generated for the node.

    • The node does not occupy scheduling resources.

    • The node does not block the running of its descendant nodes that are scheduled by minute, hour, or day.

  • A scheduling frequency is configured for each node individually. Whether a node runs on a given day depends on the scheduling frequency configured for the node and is unrelated to the scheduling frequencies of its ancestor nodes. However, the scheduling time of the ancestor nodes affects the actual time when the node runs.

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Configuration details

    If the node is scheduled to run at a specified point in time every Monday and Friday, the scheduling system runs the instances that are generated for the node every Monday and Friday. The scheduling system does not run the instances that are generated for the node every Tuesday, Wednesday, Thursday, Saturday, or Sunday and directly returns a success response for the instances. The following figure shows the details about the configurations.

    Note

    The cron expression is automatically generated based on the scheduling time that you select and cannot be changed.


Scheduling details

The scheduling system automatically generates and runs instances for the node.

Note
  • If you use the data backfill feature to backfill data for a node that is scheduled by week, make sure that the data timestamp you specify is one day before the date on which the node is scheduled to run in the current scheduling cycle. The data timestamp of a node is calculated based on the following formula: Data timestamp = Date of the scheduling time of the node - 1 day.

    Examples:

    • If you want to backfill data for a node that is scheduled to run every Monday, set the data timestamp to the Sunday before that Monday.

    • If you set the data timestamp to a date other than a Sunday, the data backfill instance generated for the node is dry run.

  • For more information about dependency scenarios, see Principles and samples of scheduling configurations in complex dependency scenarios.
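The backfill formula can be checked with a short sketch before you submit a backfill. The helper names are hypothetical. The sketch only applies the rule "Data timestamp = Date of the scheduling time of the node - 1 day" to predict whether a backfill instance of a node scheduled by week actually runs or is dry run.

```python
from datetime import date, timedelta

MONDAY = 0  # Python's date.weekday(): Monday == 0 ... Sunday == 6


def data_timestamp_for(run_date: date) -> date:
    """Data timestamp = date of the scheduling time - 1 day."""
    return run_date - timedelta(days=1)


def backfill_instance_type(data_timestamp: date, scheduled_weekdays: set[int]) -> str:
    """Predict whether a backfill instance of a node scheduled by week runs.

    The instance corresponds to the day after the data timestamp. If that
    day is not a scheduled weekday, the instance is dry run.
    """
    run_date = data_timestamp + timedelta(days=1)
    return "normal" if run_date.weekday() in scheduled_weekdays else "dry-run"


if __name__ == "__main__":
    monday = date(2025, 3, 17)
    ts = data_timestamp_for(monday)
    print(ts, backfill_instance_type(ts, {MONDAY}))
    print(monday, backfill_instance_type(monday, {MONDAY}))
```

For a node scheduled every Monday, a data timestamp of Sunday, March 16, 2025 yields a normal instance, while a data timestamp of Monday, March 17, 2025 yields a dry-run instance.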

Month

Precautions

  • To ensure that the descendant nodes of an auto triggered node that is scheduled by month can run as expected on days when the node is not scheduled to run, the system generates dry-run instances for the node on those days.

    Note

    If a node is dry run, the system does not actually run the node but directly reports that the node has run successfully.

    • The running duration of the node is 0 seconds, and no run logs are generated for the node.

    • The node does not occupy scheduling resources.

    • The node does not block the running of its descendant nodes that are scheduled by minute, hour, or day.

  • A scheduling frequency is configured for each node individually. Whether a node runs on a given day depends on the scheduling frequency configured for the node and is unrelated to the scheduling frequencies of its ancestor nodes. However, the scheduling time of the ancestor nodes affects the actual time when the node runs.

  • You can set the Custom Time parameter to Last Day of Each Month. This way, the node is run on the last day of every month.

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Configuration details

    If the node is scheduled to run on the last day of every month, the scheduling system runs the instances that are generated for the node on the last day of every month. On each of the other days of the month, the scheduling system generates dry-run instances for the node and directly returns a success response for them without running their code. The following figure shows the details about the configurations.

    Note

    The cron expression is automatically generated based on the scheduling time that you select and cannot be changed.


Scheduling details

The scheduling system automatically generates and runs instances for the node.

Note
  • If you use the data backfill feature to backfill data for a node that is scheduled by month, make sure that the data timestamp you specify is one day before the date on which the node is scheduled to run in the current scheduling cycle. The data timestamp of a node is calculated based on the following formula: Data timestamp = Date of the scheduling time of the node - 1 day.

    Examples:

    • If you want to backfill data for a node that is scheduled to run on the first day of every month, set the data timestamp to the last day of the previous month.

    • If you want to backfill data for a node that is scheduled to run on the last day of every month, set the data timestamp to the day before the last day of the current month.

    • If you set the data timestamp to a date that is not one day before the date on which the node is scheduled to run, the data backfill instance generated for the node is dry run.

  • For more information about dependency scenarios, see Principles and samples of scheduling configurations in complex dependency scenarios.
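The sketch below applies the same formula to a node scheduled by month. It uses Python's `calendar.monthrange` to find the last day of a month, which varies from 28 to 31 days. The helper names and the "first"/"last" option are hypothetical and do not correspond to DataWorks parameters.

```python
import calendar
from datetime import date, timedelta


def last_day_of_month(year: int, month: int) -> date:
    # calendar.monthrange returns (weekday of day 1, number of days in month).
    return date(year, month, calendar.monthrange(year, month)[1])


def monthly_data_timestamp(year: int, month: int, run_on: str) -> date:
    """Data timestamp to use when backfilling a node scheduled by month.

    run_on: "first" for the first day of the month, "last" for the last day.
    Data timestamp = date of the scheduling time - 1 day.
    """
    if run_on == "first":
        run_date = date(year, month, 1)
    elif run_on == "last":
        run_date = last_day_of_month(year, month)
    else:
        raise ValueError("run_on must be 'first' or 'last'")
    return run_date - timedelta(days=1)


if __name__ == "__main__":
    print(monthly_data_timestamp(2025, 3, "first"))  # last day of February 2025
    print(monthly_data_timestamp(2025, 2, "last"))
    print(monthly_data_timestamp(2024, 2, "last"))   # leap year
```

For a node that runs on the first day of March 2025, the data timestamp is February 28, 2025. For a node that runs on the last day of February, the data timestamp is February 27 in 2025 and February 28 in the leap year 2024.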

Year

Precautions

To ensure that the descendant nodes of a node that is scheduled by year can run as expected on days when the node is not scheduled to run, the system generates dry-run auto triggered instances for the node on those days.

Note

If a node is dry run, the system does not actually run the node but directly reports that the node has run successfully.

  • The running duration of the node is 0 seconds, and no run logs are generated for the node.

  • The node does not occupy scheduling resources.

  • The node does not block the running of its descendant nodes that are scheduled by minute, hour, or day.

Configuration example

  • Configuration method

    Go to the configuration tab of a node on the Data Studio page. Click the Properties tab in the right-side navigation pane. In the Scheduling Time section of the Properties tab, configure the scheduling cycle for the node.

  • Configuration details

    If the node is scheduled to run at a specified point in time on the first and last days of January, April, July, and October every year, the scheduling system runs the instances that are generated on these days every year. The scheduling system generates dry-run instances for the node on each of the other days in the year and directly returns a success response for the instances. The code of the instances is not run. The following figure shows the details about the configurations.


Scheduling details

The scheduling system automatically generates and runs instances for the node.
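For the configuration described above, the days on which the code of the node actually runs can be listed with a short sketch. The helper names are hypothetical, and the sketch assumes that the node runs on the first and last days of January, April, July, and October. Every other instance in the year is dry run.

```python
import calendar
from datetime import date

QUARTER_START_MONTHS = (1, 4, 7, 10)  # January, April, July, October


def yearly_run_dates(year: int, months=QUARTER_START_MONTHS) -> list[date]:
    """First and last day of each listed month (hypothetical helper)."""
    dates = []
    for month in months:
        dates.append(date(year, month, 1))
        # calendar.monthrange(...)[1] is the number of days in the month.
        dates.append(date(year, month, calendar.monthrange(year, month)[1]))
    return dates


def yearly_instance_type(day: date, months=QUARTER_START_MONTHS) -> str:
    """'normal' on a scheduled day, 'dry-run' on every other day."""
    return "normal" if day in yearly_run_dates(day.year, months) else "dry-run"


if __name__ == "__main__":
    print([d.isoformat() for d in yearly_run_dates(2025)])
    print(yearly_instance_type(date(2025, 4, 30)), yearly_instance_type(date(2025, 5, 1)))
```

In 2025, the sketch yields eight run dates, from January 1 through October 31. On each of the remaining 357 days, the generated instance is dry run.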

Note

For more information about dependency scenarios, see Principles and samples of scheduling configurations in complex dependency scenarios.