This topic describes how to configure recurrence and dependencies for a node in DataWorks. The synchronization node write_result, which is scheduled weekly, is used as an example.

Prerequisites

The synchronization node write_result is created. For more information, see Create a synchronization node.

Background information

DataWorks provides a scheduling engine that triggers nodes based on their recurrence and dependencies. Based on directed acyclic graphs (DAGs), DataWorks ensures that tens of millions of nodes run accurately and on time each day. In the DataWorks console, you can set the recurrence to minutely, hourly, daily, weekly, or monthly. For more information, see Configure time properties.

Configure recurrence for the synchronization node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. Find the workflow to which the synchronization node write_result belongs and double-click the synchronization node.
  3. On the node configuration tab, click Properties in the right-side navigation pane.
    Note In a manually triggered workflow, all nodes must be triggered manually and cannot be automatically scheduled by DataWorks.
  4. In the Schedule section of the Properties tab, set the parameters as required.
    Configure scheduling properties
    Parameter: Description
    Instance Generation Mode: The time to generate the first instance. Valid values: Next Day and Immediately After Deployment. For more information, see Configure immediate instance generation for a node.
    Recurrence: The mode in which the node is run. Valid values:
    • Normal: The node is run and generates data based on the setting of the Scheduling Cycle parameter.
    • Skip Execution: The node is scheduled based on the recurrence and the scheduled time that you specify. However, the status of the node becomes Freeze and no data is generated for this node.
    • Dry Run: The node is run based on the setting of the Scheduling Cycle parameter. However, the node performs a dry run and no data is generated.
    Scheduling Cycle: The recurrence of the node. Valid values: Minute, Hour, Day, Week, Month, and Year. In this example, set this parameter to Week, the Run Every parameter to Monday and Tuesday, and the Run At parameter to 00:00. In this case, the synchronization node is scheduled to run at 00:00 every Monday and Tuesday.
    Cron Expression: The cron expression that corresponds to the scheduling time you specified. This value cannot be changed.
    Timeout Definition: The timeout period. If the running duration of the node exceeds the value of the Timeout Definition parameter, the node automatically stops and does not rerun.
    • The timeout period applies to auto triggered node instances, data backfill instances, and test node instances.
    • The default timeout period ranges from 72 hours to 168 hours. The system automatically adjusts the default timeout period for a node based on system load.
    Note You can specify a custom timeout period after you select Instances of Custom Nodes. For nodes scheduled by exclusive resource groups for scheduling, the valid timeout period is 1 to 72 hours. For nodes scheduled by shared resource groups for scheduling, the valid timeout period is 1 to 168 hours.
    Rerun: Specifies whether to allow the node to be rerun. Valid values: Allow Regardless of Running Status, Allow upon Failure Only, and Disallow Regardless of Running Status.
    Auto Rerun upon Error: Specifies whether to automatically rerun the node when an error occurs. This parameter appears only if you set the Rerun parameter to Allow Regardless of Running Status or Allow upon Failure Only. If you set the Rerun parameter to Disallow Regardless of Running Status, this parameter does not appear and the node is not rerun when an error occurs.
    Number of Reruns: The maximum number of reruns allowed. This parameter appears only if you select Auto Rerun upon Error.
    Rerun Interval: The interval at which the node is rerun after an error occurs. This parameter appears only if you select Auto Rerun upon Error. Valid values: 1 to 30. Default value: 30. Unit: minutes.
    Validity Period: The validity period of the node. Specify the start and end dates of the validity period as required.
    For more information about the time properties, see Configure time properties.
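
To make the weekly setting in this example concrete, the following minimal Python sketch lists the next scheduled times for a node with Run Every set to Monday and Tuesday and Run At set to 00:00. It does not call DataWorks. The names RUN_WEEKDAYS and next_runs are hypothetical and exist only for illustration, and the sketch ignores time zones and the Validity Period setting.

```python
from datetime import datetime, timedelta

# Hypothetical settings that mirror the example in this topic:
# Run Every = Monday and Tuesday, Run At = 00:00.
RUN_WEEKDAYS = {0, 1}  # datetime.weekday(): Monday is 0, Tuesday is 1
RUN_HOUR, RUN_MINUTE = 0, 0


def next_runs(after: datetime, count: int) -> list[datetime]:
    """Return the next `count` scheduled times strictly after `after`."""
    runs = []
    # Start from the scheduled time of day on the same calendar date.
    day = after.replace(hour=RUN_HOUR, minute=RUN_MINUTE, second=0, microsecond=0)
    while len(runs) < count:
        if day.weekday() in RUN_WEEKDAYS and day > after:
            runs.append(day)
        day += timedelta(days=1)
    return runs


if __name__ == "__main__":
    # 2024-01-03 is a Wednesday, so the first run falls on the following Monday.
    for run in next_runs(datetime(2024, 1, 3, 9, 30), 4):
        print(run.strftime("%A %Y-%m-%d %H:%M"))
```

The sketch only illustrates how a weekly recurrence maps to calendar dates. The actual instances are generated by the scheduling system after the node is committed.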

Configure dependencies for the synchronization node

After you configure recurrence for the synchronization node write_result, you can continue to configure dependencies for the synchronization node.

You can configure the parent node on which the synchronization node depends. After that, the scheduling system triggers the synchronization node only after the instance of the parent node is run.

For example, the instance of the synchronization node is not triggered until the instance of its parent node insert_data is run.

Note Node dependencies ensure that a synchronization node runs on valid input. After node dependencies are configured, the synchronization node extracts and processes data only after its parent node is run and generates valid data. The synchronization node can cleanse the data generated by the parent node or deliver the data cleansed by the parent node to data sources. For more information about the logic of scheduling dependencies, see Logic of same-cycle scheduling dependencies.

By default, the scheduling system creates a root node for each workspace. The root node is named in the format of Workspace name_root. If no parent node is configured for the synchronization node, the synchronization node depends on the root node.
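
The dependency rules described in this section can be sketched in a few lines of Python. The sketch is an illustration only and is not how DataWorks is implemented: the workspace name demo_workspace and the helpers with_root_default and run_order are hypothetical. It attaches every node without a configured parent to the workspace root node and then derives an order in which each node runs only after its parent.

```python
from graphlib import TopologicalSorter

WORKSPACE = "demo_workspace"   # hypothetical workspace name
ROOT = f"{WORKSPACE}_root"     # root node named in the format "Workspace name_root"

# Node -> configured parent nodes. An empty set means no parent was configured.
configured_parents = {
    "insert_data": set(),
    "write_result": {"insert_data"},
}


def with_root_default(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Make nodes without a configured parent depend on the root node."""
    return {node: (deps or {ROOT}) for node, deps in parents.items()}


def run_order(parents: dict[str, set[str]]) -> list[str]:
    """Return an order in which every node follows all of its parents."""
    return list(TopologicalSorter(with_root_default(parents)).static_order())


if __name__ == "__main__":
    print(run_order(configured_parents))
```

In this sketch, insert_data has no configured parent and therefore depends on the root node, and write_result is ordered after insert_data.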

Commit and deploy the node

  1. On the configuration tab of the write_result node, click the Save icon in the toolbar.
  2. Commit the node.
    Notice You can commit the node only after you set the Rerun and Parent Nodes parameters.
    1. Click the Submit icon in the top toolbar.
    2. In the Commit Node dialog box, enter your comments in the Change description field.
    3. Click OK.
    If you use a workspace in standard mode, the node is committed to the development environment after you click OK. If you want to deploy the node to the production environment, click Deploy in the upper-right corner of the toolbar. For more information, see Deploy nodes.
    A node must be committed to the scheduling system so that the scheduling system can automatically generate and run instances for the node. The scheduling system runs these instances at the specified time from the next day based on the recurrence settings.
    Note If you commit a node after 23:30, the scheduling system automatically generates and runs instances for the node from the third day.
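
The timing rule above can be expressed as a small Python function. This is a sketch under stated assumptions, not DataWorks behavior verified against the product: it assumes that "the third day" means the commit date plus two days, that a commit at exactly 23:30 still counts as on time, and that instances are generated in Next Day mode. The names CUTOFF and first_instance_date are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Commits later than this time of day miss the next-day instance generation.
CUTOFF = time(23, 30)


def first_instance_date(committed_at: datetime) -> date:
    """Return the first date on which instances are generated for a node.

    Assumption: "third day" is interpreted as the commit date plus two days.
    """
    days = 2 if committed_at.time() > CUTOFF else 1
    return committed_at.date() + timedelta(days=days)


if __name__ == "__main__":
    print(first_instance_date(datetime(2024, 1, 8, 10, 0)))   # committed before 23:30
    print(first_instance_date(datetime(2024, 1, 8, 23, 45)))  # committed after 23:30
```

The returned date is only the first day on which instances can exist. Whether the node actually runs on that day still depends on the recurrence settings, for example Monday and Tuesday in this topic.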

What to do next

You have now learned how to configure recurrence and dependencies for a synchronization node. In the next tutorial, you learn how to perform O&M operations on the committed node and troubleshoot errors based on the operational logs. For more information, see Run a node and troubleshoot errors.