DataWorks provides powerful scheduling capabilities, including time-based and dependency-based task triggering, and runs tens of millions of tasks accurately and on time each day based on DAG relationships. It supports scheduling by minute, hour, day, week, and month. For more information, see Scheduling configuration.
This section uses the write_result task created in Create a data sync job as an example and sets its scheduling period to weekly to explain the scheduling configuration and task O&M functions of DataWorks.
Select Data Development > Task Development. The task development list is displayed on the left side of the page.
Double-click any synchronization task that you want to configure, for example, the write_result task.
Click Scheduling Configuration to configure the Scheduling attribute of the task. See the following figure.
The configuration parameters are described as follows.
Scheduling status: When this parameter is selected, the task is paused.
Error retry: When this parameter is selected, error retry is enabled.
Start date: The date on which the task takes effect, which can be set based on actual needs.
Scheduling period: The interval at which the task runs, which can be set to month, week, day, hour, or minute. For example, a task can be scheduled to run weekly.
Specific time: The time at which the task runs within the scheduling period. For example, you can set the task to run at 02:00 every Tuesday.
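To make the interaction between the start date, scheduling period, and specific time concrete, the following is a minimal sketch in Python of how the next run time of a weekly task could be computed. It is not DataWorks code: the function name next_weekly_run and its parameters are hypothetical and exist only for illustration.

```python
from datetime import date, datetime, time, timedelta

def next_weekly_run(now: datetime, weekday: int, run_at: time, start_date: date) -> datetime:
    """Return the first weekly run time that is not earlier than `now`
    and not earlier than the start date.

    weekday follows datetime.weekday(): Monday is 0, Tuesday is 1, and so on.
    This is an illustrative helper, not part of DataWorks.
    """
    # The task cannot run before its start date takes effect.
    earliest = max(now, datetime.combine(start_date, time.min))
    # Start from the configured time on the earliest possible day.
    candidate = datetime.combine(earliest.date(), run_at)
    # Move forward to the configured weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
    # If that moment has already passed, the next run is one week later.
    if candidate < earliest:
        candidate += timedelta(days=7)
    return candidate

if __name__ == "__main__":
    # Weekly task at 02:00 every Tuesday, effective from 2024-01-01.
    now = datetime(2024, 1, 3, 10, 0)  # a Wednesday
    print(next_weekly_run(now, weekday=1, run_at=time(2, 0), start_date=date(2024, 1, 1)))
```

In this sketch, the start date only sets a lower bound on the run time, while the weekday and time of day determine the actual trigger point.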
After configuring the scheduling attribute of a task, you can configure its dependency attribute. See the following figure.
You can configure an upstream dependency for a task. In this way, even if the scheduled time of an instance of the current task is reached, the task can run only after the instance of its upstream task is completed.
The configuration in the preceding figure indicates that instances of the current task are triggered only after the instance of the upstream task write_result has finished. To configure an upstream task for write_result, enter work in the upstream task.
If no upstream task is configured, the current task is triggered by the project by default. In the scheduling system, the default upstream task of the current task is therefore project_start, a root task that is created for each project.
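The dependency rule described above can be sketched as a small Python model: an instance runs only when its scheduled time has been reached and every upstream instance has finished. The Instance class and the can_run function are hypothetical names used for illustration and do not reflect the internal implementation of the scheduling system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Instance:
    """A simplified task instance (illustrative model, not a DataWorks type)."""
    task: str
    scheduled_at: datetime
    upstream: List["Instance"] = field(default_factory=list)
    finished: bool = False

def can_run(instance: Instance, now: datetime) -> bool:
    """An instance may run only if both conditions hold:
    its scheduled time has been reached, and all upstream instances have finished.
    """
    time_reached = now >= instance.scheduled_at
    upstream_done = all(u.finished for u in instance.upstream)
    return time_reached and upstream_done

if __name__ == "__main__":
    scheduled = datetime(2024, 1, 9, 2, 0)
    # With no upstream configured, the root task project_start is the default upstream.
    root = Instance("project_start", scheduled)
    job = Instance("write_result", scheduled, upstream=[root])

    print(can_run(job, datetime(2024, 1, 9, 3, 0)))  # time reached, upstream not finished
    root.finished = True
    print(can_run(job, datetime(2024, 1, 9, 3, 0)))  # time reached, upstream finished
```

The model keeps the two conditions separate, which mirrors the text: reaching the scheduled time alone is not enough to start the instance.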
Save the synchronization task write_result, and click Submit to submit it to the scheduling system. See the following figure.
After a task is submitted to the scheduling system, the system automatically generates an instance for the task at each time point according to the scheduling attribute configuration. Periodic runs begin on the second day, not on the day of submission.
Note: If a task is submitted after 23:30, the scheduling system generates instances for the task and runs it periodically starting from the third day.
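The submission cutoff can be illustrated with a short Python sketch. It assumes that the submission day counts as the first day, so "the second day" is the day after submission and "the third day" is two days after submission. It also assumes that a submission at exactly 23:30 is treated as on time, which the text does not specify. The function name first_instance_date is hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Cutoff stated in the note above; submissions after this time start one day later.
CUTOFF = time(23, 30)

def first_instance_date(submitted_at: datetime) -> date:
    """Return the first date on which periodic instances are generated.

    Assumptions (not confirmed by the documentation):
    - the submission day is day 1, so "second day" means the next calendar day;
    - a submission at exactly 23:30 is not considered late.
    """
    days_to_wait = 2 if submitted_at.time() > CUTOFF else 1
    return submitted_at.date() + timedelta(days=days_to_wait)

if __name__ == "__main__":
    print(first_instance_date(datetime(2024, 1, 1, 10, 0)))   # submitted before the cutoff
    print(first_instance_date(datetime(2024, 1, 1, 23, 45)))  # submitted after the cutoff
```

For a weekly task, this date is only the earliest day on which instances can appear; the task still runs only on its configured weekday and time.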
Now you know how to set the scheduling attribute and dependency of a synchronization task. The next tutorial shows you how to perform periodic O&M for submitted tasks and how to view logs for troubleshooting. For more information, see Perform periodic O&M and view log troubleshooting results.