This section describes how to set the scheduling time of nodes, including the scheduling cycle and dependencies. You can also specify whether a node depends on the instance of the last cycle.
Instance creation modes
- Next Day: If you select this option, instances are generated in full mode. (Nodes published before 23:30 create instances the next day, while nodes published after 23:30 create instances the third day.)
- Immediately After Publishing: If you select this option, instances are immediately generated after nodes are published.
- Normal: If you select this option, a node is scheduled and run normally based on the following scheduling cycle configuration. It is the default option for nodes.
- Zero-load: If you select this option, a node is scheduled based on the following scheduling cycle configuration. However, once this node is scheduled, the system does not actually run the node but directly returns a success response.
- Error Retry: If you select this check box, a node is rerun when it encounters an error. By default, a node can be automatically rerun up to three times at 2-minute intervals.
- Pause Scheduling: If you select this check box, a node is scheduled based on the following scheduling cycle configuration. However, once this node is scheduled, the system does not actually run the node but directly returns a failure response. It is used when a node is suspended but will be run later.
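The default Error Retry behavior described above (up to three automatic reruns, 2 minutes apart) can be sketched as follows. This is only an illustration of the policy, not DataWorks code: `run_with_retry`, `task`, and the injectable `sleep` parameter are hypothetical names introduced here.

```python
import time


def run_with_retry(task, max_reruns=3, interval_seconds=120, sleep=time.sleep):
    """Run task(); on error, rerun it up to max_reruns times.

    The defaults mirror the documented defaults (three reruns, 2 minutes
    apart). The function itself is a hypothetical sketch.
    """
    reruns = 0
    while True:
        try:
            return task()
        except Exception:
            if reruns >= max_reruns:
                # All automatic reruns are used up: report the failure.
                raise
            reruns += 1
            sleep(interval_seconds)
```

Passing a custom `sleep` makes the policy easy to try out without waiting, for example `run_with_retry(job, sleep=lambda seconds: None)`.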
In DataWorks, after a node is submitted, the underlying scheduling system generates an instance every day from the next day based on the scheduling time of the node, and runs the instances based on the running results and time points of its ancestor instances. Nodes that are submitted after 23:30 create instances from the third day.
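As a rough illustration of the cutoff rule above, the following sketch computes the first day on which instances are generated for a submitted node. The function name is hypothetical, and treating a submission at exactly 23:30 as "before the cutoff" is an assumption, because the text does not cover that boundary.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(23, 30)  # cutoff time stated in the text


def first_instance_date(submitted_at: datetime) -> date:
    """Return the first day on which an instance is generated.

    Assumption: a submission at exactly 23:30 counts as "before" the cutoff.
    """
    # "Next day" is one day ahead; "third day" is two days ahead.
    days_ahead = 2 if submitted_at.time() > CUTOFF else 1
    return submitted_at.date() + timedelta(days=days_ahead)
```

For example, a node submitted at 10:00 on August 8, 2018 would get its first instance on August 9, while one submitted at 23:45 the same day would get it on August 10.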
If you schedule a node to run on every Monday, the node is run only on Mondays. On the other days, once this node is scheduled, the system does not actually run the node but directly returns a success response. Therefore, you need to set the business date to one day earlier than the runtime for weekly scheduled nodes during testing or data patching.
For a node that runs cyclically, the priority of its dependency is higher than that of its scheduling time. That is, when the scheduling time is reached, the node instance is not run immediately but first checks whether all the ancestor instances have been run.
- If not all the ancestor instances have been run but the scheduling time is reached, the instance remains in the Not Running status.
- If all the ancestor instances have been run but the scheduling time is not reached, the instance enters the Waiting for Scheduled Time status.
- If all the ancestor instances have been run and the scheduling time is reached, the instance enters the Waiting for Resource status.
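The three cases above can be summarized as a small decision function. This is a sketch of the rule only, with a hypothetical function name. The case in which neither condition is met is not listed in the text, so it is assumed here to be Not Running, because dependencies take priority over the scheduling time.

```python
def instance_status(ancestors_done: bool, time_reached: bool) -> str:
    """Map the two scheduling conditions to the instance status."""
    if not ancestors_done:
        # Dependencies are checked first, whether or not the time is reached.
        # (The "neither condition met" case is an assumption.)
        return "Not Running"
    if not time_reached:
        return "Waiting for Scheduled Time"
    return "Waiting for Resource"
```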
Dependency on the last-cycle instance
- Last-cycle instance: indicates the instance generated on the previous calendar day. Assume that the current day is August 8, 2018. The instance generated on August 7, 2018 is called the last-cycle instance.
- Dependency on the last-cycle instance: indicates that a node depends on the last-cycle instance of its parent node. Assume that you have configured daily scheduling nodes A and B. If you want the instance of node B to be run only after that of node A generated on the last day is run, you can configure a cross-cycle dependency. That is, you can configure node B to make it depend on the last-cycle instance of node A.
The following figure shows how to configure the dependency on the last-cycle instance.
- Level 1 Child Node: indicates that the current node depends on the last-cycle instances of its descendant nodes. For example, node A has descendant nodes B, C, and D. If you select this option, node A depends on the last-cycle instances of nodes B, C, and D.
- Current Node: indicates that the current node depends on its own last-cycle instance.
- Customize: indicates that the current node depends on the last-cycle instance of a specified node. You need to enter the ID of the specified node. If multiple nodes exist, separate their IDs with commas (,), for example, 12345,23456.
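If you prepare the Customize value in a script before pasting it into the console, a quick validity check can catch malformed input. The helper below is hypothetical and assumes that node IDs are plain decimal numbers, as in the example 12345,23456.

```python
from typing import List


def parse_node_ids(raw: str) -> List[int]:
    """Split a comma-separated ID string such as "12345,23456".

    Assumption: node IDs are non-negative decimal integers.
    """
    ids = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or repeated commas
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a numeric node ID: {part!r}")
        ids.append(int(part))
    return ids
```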
Scheduling by day
Nodes scheduled by day are run automatically once every day. When you create a periodic node, the node is set to run at 00:00 every day by default. You can specify another runtime as needed. For example, you can set the runtime to 13:00 every day, as shown in the following figure.
- If Specify Time is cleared, the scheduled date of the daily node is the date of the current day in the YYYY-MM-DD format and the scheduling time of the node is randomly generated between 00:00 and 00:30.
- If Specify Time is selected, the scheduling time of the daily node is a specified time of the current day in the YYYY-MM-DD HH:MM format. A scheduled node can be run only after its ancestor node is run and the scheduling time is reached. The node cannot be run if either one of the conditions is not met. The conditions are in no particular order.
Import, statistical processing, and export nodes are all daily nodes with a runtime of 13:00, as shown in the preceding figure. Statistical processing nodes depend on import nodes, and export nodes depend on statistical processing nodes. The following figure shows the configuration of their dependencies. (When configuring the dependencies for a statistical processing node, set its ancestor node to an import node.)
Scheduling by week
As shown in the preceding figure, the system runs instances every Monday and Friday. On Tuesdays, Wednesdays, Thursdays, Saturdays, and Sundays, it still generates instances but returns success responses without running them.
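The Monday and Friday example can be expressed as a small check that tells whether an instance actually runs on a given day or only returns success. The function name and the returned labels are hypothetical and are used here only for illustration.

```python
from datetime import date

# date.weekday() numbers Monday as 0 and Sunday as 6.
RUN_DAYS = frozenset({0, 4})  # Monday and Friday, as in the example


def weekly_action(day: date, run_days=RUN_DAYS) -> str:
    """Return "run" on configured weekdays, otherwise "dry-run success"."""
    return "run" if day.weekday() in run_days else "dry-run success"
```

For example, `weekly_action(date(2018, 8, 6))` (a Monday) returns `"run"`, while `weekly_action(date(2018, 8, 8))` (a Wednesday) returns `"dry-run success"`.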
Scheduling by month
Nodes scheduled by month are automatically run at specific time points of specific days each month. On the other days, the system still generates instances to ensure the proper running of descendant instances. However, once a node is scheduled, the system does not actually run any logic or consume any resources but directly returns a success response.
As shown in the preceding figure, the system runs instances on the first day of each month. On the remaining days of the month, it returns success responses without running instances.
Scheduling by hour
Nodes scheduled by hour are run once every N hours in a specific period every day, for example, once every hour from 01:00 to 04:00 every day.
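To make the hourly pattern concrete, the sketch below lists the run times for a given window and interval. The function name is hypothetical, and treating the end of the window as inclusive is an assumption.

```python
from datetime import time
from typing import List


def hourly_run_times(start_hour: int, end_hour: int, every_n_hours: int) -> List[time]:
    """List run times from start_hour to end_hour, every N hours.

    Assumption: end_hour itself is included in the window.
    """
    if every_n_hours < 1:
        raise ValueError("interval must be at least 1 hour")
    return [time(hour, 0) for hour in range(start_hour, end_hour + 1, every_n_hours)]
```

Under that assumption, `hourly_run_times(1, 4, 1)` yields 01:00, 02:00, 03:00, and 04:00.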
Scheduling by minute
Nodes scheduled by minute are run once every N minutes in a specific period every day, as shown in the following figure.
Currently, the minimum interval for a node scheduled by minute is 5 minutes. The CRON expression is automatically generated based on the preceding configuration and cannot be manually modified.
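The following sketch shows how a by-minute configuration expands into run times and how the 5-minute minimum could be checked up front. It does not reproduce the CRON expression that DataWorks generates. The function name is hypothetical, and the window end is assumed to be inclusive.

```python
from datetime import datetime, time, timedelta
from typing import List

MIN_INTERVAL_MINUTES = 5  # minimum interval stated in the text


def minute_run_times(start: time, end: time, interval_minutes: int) -> List[time]:
    """List run times between start and end (inclusive), every N minutes."""
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError("interval must be at least 5 minutes")
    day = datetime(2000, 1, 1).date()  # arbitrary date, used only for arithmetic
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    times = []
    while current <= last:
        times.append(current.time())
        current += timedelta(minutes=interval_minutes)
    return times
```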
Q: Node A is scheduled by hour, and its descendant node B is scheduled by day. Is it feasible that node B is automatically run every day after all instances of node A are executed?
A: A node can depend on any other node, and there are no limits on the scheduling type of the node. Therefore, a node scheduled by day can depend on a node scheduled by hour. To enable node B to be automatically run every day after all 24 instances of node A are run, do not specify the daily runtime for node B. Then, configure node A as an ancestor of node B. For more information, see the Dependencies section.
Q: Node A is run once every hour, and node B is run once every day. How do I configure the scheduling time for the two nodes so that the instance of node B is run after the first instance of node A is run every day?
A: For node A, select Depend on Last Interval and set Dependent Node to Current Node. For node B, set Recurrence to Day, select Specify Time, and set Run At to 00:00.
Q: Node A is run every Monday and node B depends on node A. How do I enable node B to be run every Monday?
A: Configure the scheduling time of node B to be the same as that of node A. That is, you need to set Recurrence to Week and Run Every to Monday.
Q: How are the instances of a node affected when the node is deleted?
A: When a node is deleted after running for a period, its instances remain because the scheduling system still generates one or more instances for the node based on the scheduling time. Therefore, when the instances are initiated after the node is deleted, an error message appears because the required code is unavailable, as shown in the following figure.
Q: Can I enable a node to process monthly data on the last day of each month?
A: No. Currently, the system does not support setting the execution date to the last day of each month. If you enable a node to run on the thirty-first day of each month, the scheduling system runs a node instance in each month that has 31 days and returns a success response without running the node instance in any other month.
We recommend that you configure a node to process the data of the past month on the first day of each month.
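To see why scheduling on the thirty-first day skips some months, the sketch below lists the months of a year in which a given day of the month exists. It uses only the Python standard library, and the function name is hypothetical.

```python
import calendar
from typing import List


def months_that_run(year: int, day_of_month: int) -> List[int]:
    """Return the months (1-12) of a year that contain day_of_month."""
    return [
        month
        for month in range(1, 13)
        # monthrange() returns (weekday of the first day, number of days).
        if calendar.monthrange(year, month)[1] >= day_of_month
    ]
```

For 2018, `months_that_run(2018, 31)` returns `[1, 3, 5, 7, 8, 10, 12]`, so a node scheduled on the thirty-first day would run in only seven months. `months_that_run(2018, 1)` returns all twelve months, which is why the first day of the month is the safer choice.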