DataWorks: Configure time properties

Last Updated: Jan 19, 2024

Time properties determine how a node is scheduled to run in the production environment after you commit and deploy the node. In the Schedule section of the Properties tab, you can configure the time-related parameters of the node, such as Instance Generation Mode, Scheduling Cycle, Run At, Rerun, and Timeout definition. This topic describes how to configure time properties for a node.

Background information

To configure time properties for a node, go to the DataStudio page. In the Business Flow section of the Scheduled Workflow pane, double-click the name of the node to open its configuration tab. Click Properties in the right-side navigation pane of the configuration tab, and then configure the time properties in the Schedule section of the Properties tab.

[Figure: Time properties]

Note

You can configure scheduling properties for a single node by configuring the parameters that are shown in the preceding figure. You can also use the batch operation feature to modify scheduling properties of multiple nodes at the same time. For example, you can use the feature to modify the scheduling time or the resource groups for scheduling of multiple nodes at the same time.

The following list describes the parameters that are related to the time properties of a node.

  • Instance Generation Mode: The mode in which instances generated for the node take effect in the production environment.

  • Recurrence: The mode in which the node is run in the production environment.

  • Scheduling Calendar: The scheduling dates and the scheduling type of the node.

  • Scheduling Cycle: The scheduling frequency of the node. This parameter determines the number of instances generated for the node and the time at which the instances are run in the production environment.

  • Timeout definition: The timeout period for the node. If the period of time for which the node is run exceeds the specified timeout period, the node fails.

  • Rerun: Specifies whether to rerun the node and the conditions in which the node can be rerun. When you configure this parameter, make sure that the data idempotence of the node is not affected.

  • Validity Period: The period of time during which the node is automatically scheduled to run. The node is not automatically run in the period of time that falls out of the specified time range.

Usage notes

The time properties of a node define only the time at which you expect the node to be scheduled. Whether the node is run and the actual time at which it is run depend on multiple factors, including but not limited to the following:

  • Control of the scheduling switch

    The node can be automatically scheduled based on its scheduling properties only if Periodic scheduling is turned on for the workspace to which the node belongs. You can turn on the switch on the Scheduling Settings tab of the Settings page in DataStudio for a workspace. For more information, see Configure scheduling settings.

  • Impacts of scheduling dependencies of the node on the node execution time

    The scheduling time that you specify for a node applies only to that node. The node can start to run only after the scheduling time of its ancestor nodes arrives and the ancestor nodes are successfully run. This also holds if the scheduling time of the node is earlier than that of its ancestor nodes: the node still waits for the ancestor nodes. For more information, see Impacts of dependencies between nodes on the running of the nodes.

  • Impacts of resource groups required to run the node on the node execution time

    The running of the node depends not only on the scheduling time and the status of its ancestor nodes, but also on the resource groups that are required to run the node. If the resources are insufficient at the scheduling time of the node, the running of the node is affected. For more information, see Node execution mechanisms.

  • Impacts of environments

    Only nodes that are deployed to the production environment can be automatically scheduled to run. If you want nodes to be periodically scheduled, you must deploy the nodes to the production environment. Nodes in the development environment cannot be automatically scheduled.

  • Ways in which the node is run

    In DataWorks, an auto triggered node generates instances based on the scheduling frequency and the number of scheduling cycles of the node. For example, the number of instances generated for a node scheduled by hour every day is the same as the number of scheduling cycles of the node every day. The node is run as an instance.

  • Scheduling time

    During peak hours such as early mornings, all nodes (including the root node of the current workspace) for which the scheduling time is set to 00:00 will be scheduled to run within the time range from 00:00 to 00:05.
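The way these factors combine can be summarized in a small model. The following Python sketch is illustrative only and is not a DataWorks API: the function `earliest_start` and all of its parameters are hypothetical names. It assumes that a node starts as soon as its own scheduling time has arrived, every ancestor has finished successfully, and resources are available, and that it cannot be scheduled at all if periodic scheduling is off or the node is not deployed to the production environment.

```python
from datetime import datetime
from typing import Optional, Sequence


def earliest_start(
    scheduled_at: datetime,
    ancestor_finish_times: Sequence[Optional[datetime]],
    resources_free_at: datetime,
    periodic_scheduling_on: bool = True,
    deployed_to_production: bool = True,
) -> Optional[datetime]:
    """Return the earliest moment the node could start, or None if it cannot run.

    A None entry in ancestor_finish_times means that the ancestor has not
    succeeded, so the node stays blocked.
    """
    # Nodes are auto scheduled only if the workspace switch is on and the
    # node is deployed to the production environment.
    if not (periodic_scheduling_on and deployed_to_production):
        return None
    if any(t is None for t in ancestor_finish_times):
        return None
    # The node waits for the latest of: its own scheduling time, the last
    # ancestor to finish, and the moment resources become available.
    return max([scheduled_at, resources_free_at, *ancestor_finish_times])
```

For example, under these assumptions a node scheduled at 00:00 whose only ancestor finishes at 01:30 cannot start before 01:30, even though its own scheduling time arrived earlier.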

Modes in which instances take effect

After you commit and deploy an auto triggered node to the scheduling system in the production environment, instances that can be automatically scheduled are generated for the node based on the value of the Instance Generation Mode parameter. Regardless of the value of the Instance Generation Mode parameter, you can view the latest scheduling dependencies of the node on the Cycle Task page in Operation Center.

The time at which the generated instances take effect and the time at which the scheduling dependencies of the node are updated are determined by the value of the Instance Generation Mode parameter. The following table describes the valid values of the Instance Generation Mode parameter.

Value of the Instance Generation Mode parameter

Description

Next Day

Instances generated for a node are automatically scheduled on the next day after you deploy the node to the production environment. You can view the status of the instances on the Cycle Instance page in Operation Center.

If you want to run the node on the day when you deploy the node, you can use the data backfill feature for the node. If you select the previous day as the data timestamp when you configure settings related to data backfill for the node, data backfill instances generated for the node are run in the same manner as the instances that are scheduled to run on the current day.

Immediately After Deployment

Instances generated for a node are automatically scheduled on the day when you deploy the node to the production environment. You can view the status of the instances on the Cycle Instance page in Operation Center.

Note

Instances generated for a node can run normally only if the scheduling time of the node is later than the time at which the node is deployed. If the scheduling time is earlier than the deployment time, the instances are dry run and generate no data. To ensure that the instances run normally, make sure that the scheduling time of the node is at least 10 minutes later than the time at which you deploy the node.

  • If you select this instance generation mode when you create a node, whether the node generates data or is dry run on the current day is related to the scheduling time and deployment time of the node.

  • If you select this instance generation mode when you change the scheduling frequency of a node that is deployed to the production environment, DataWorks replaces the generated instances that are still waiting for their scheduling time with instances that are based on the latest scheduling configurations. However, expired instances are not deleted. In this case, the scheduling dependencies of the node on the current day may be complex.
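The timing rule for the Immediately After Deployment mode can be expressed as a small check. This is a minimal sketch, not DataWorks code: the function name `same_day_outcome` and the three result labels are hypothetical. The 10-minute margin comes from the note above; labeling the window between 0 and 10 minutes as "at_risk" is an assumption of this sketch, because the text only states that 10 minutes is needed to be sure.

```python
from datetime import datetime, timedelta

# Margin recommended in the note above.
SAFETY_MARGIN = timedelta(minutes=10)


def same_day_outcome(scheduled_at: datetime, deployed_at: datetime) -> str:
    """Classify an instance that is generated on the deployment day."""
    if scheduled_at <= deployed_at:
        # The scheduling time has already passed: dry run, no data.
        return "dry_run"
    if scheduled_at - deployed_at < SAFETY_MARGIN:
        # Later than the deployment, but inside the 10-minute margin.
        return "at_risk"
    return "normal"
```

A node deployed at 10:00 with a scheduling time of 09:00 is classified as a dry run, while a scheduling time of 10:10 or later is classified as normal.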

Scheduling types

The following table describes the valid values of the Recurrence parameter.

Value

Description

Scenario

Normal

If you set the Recurrence parameter to Normal, the node is run and generates data based on the settings of the Scheduling Cycle and Run At parameters.

After the node is run as expected, the descendant nodes of the node are also triggered and run. By default, the Recurrence parameter is set to Normal.

You want a node and the instances that are generated for this node to be run as expected.

Skip Execution

If you set the Recurrence parameter to Skip Execution, the node is scheduled based on the settings of the Scheduling Cycle and Run At parameters. However, the status of the node is set to Freeze and the node generates no data.

When the node is scheduled, the system directly returns a failure response and the descendant nodes cannot be run.

Note

The following icon is displayed next to the name of a node that is frozen: [Icon: Pause].

You want to freeze a node and the instances generated for the node. In this case, the current node and its descendant nodes cannot be run.

If you do not need to run a workflow within a specified period of time, you can freeze the root node of the workflow in that period of time based on your business requirements. You can also unfreeze the root node to resume the workflow based on your business requirements. For information about how to unfreeze a node, see Node freezing and unfreezing.

Dry Run

If you set the Recurrence parameter to Dry Run, the node is scheduled based on the settings of the Scheduling Cycle and Run At parameters. However, the node performs a dry run and generates no data.

When the node is scheduled, the scheduling system returns a success response. However, the running duration is 0 seconds, and no run logs are generated. The dry-run node does not affect the execution of its descendant nodes and occupies no resources.

You want to suspend a node in a period of time and want the descendant nodes of the node to be run as expected.
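The differences between the three Recurrence values can be captured in a lookup table. The following Python sketch only restates the table above in code form; the names `Recurrence`, `Behavior`, `BEHAVIOR`, and `pick_recurrence` are hypothetical and do not correspond to any DataWorks API.

```python
from dataclasses import dataclass
from enum import Enum


class Recurrence(Enum):
    NORMAL = "Normal"
    SKIP_EXECUTION = "Skip Execution"
    DRY_RUN = "Dry Run"


@dataclass(frozen=True)
class Behavior:
    generates_data: bool
    reports_success: bool
    descendants_can_run: bool


# Behavior of a scheduled instance for each Recurrence value.
# For NORMAL, success assumes that the node code itself runs without errors.
BEHAVIOR = {
    Recurrence.NORMAL: Behavior(True, True, True),
    Recurrence.SKIP_EXECUTION: Behavior(False, False, False),
    Recurrence.DRY_RUN: Behavior(False, True, True),
}


def pick_recurrence(pause_node: bool, keep_descendants_running: bool) -> Recurrence:
    """Choose a Recurrence value for the scenarios listed in the table."""
    if not pause_node:
        return Recurrence.NORMAL
    if keep_descendants_running:
        return Recurrence.DRY_RUN
    return Recurrence.SKIP_EXECUTION
```

The helper makes the practical choice explicit: to pause a node without blocking its descendant nodes, use Dry Run; to pause the node together with everything downstream, use Skip Execution.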

Scheduling calendar

The scheduling calendar feature is used to define the scheduling dates and the scheduling type of a node. Valid values for the Scheduling Calendar parameter in the Schedule section of the Properties tab in DataStudio:

  • Default Calendar: the calendar that is provided by DataWorks and is suitable for common scenarios.

  • Customize Calendar: the calendar that is configured by users and is suitable for industries and scenarios that require flexible scheduling dates, such as the financial industry. To configure a scheduling calendar for a node, you need to specify the items such as the workspaces to which the scheduling calendar can be applied, the validity period of the scheduling calendar, and the scheduling type of the node on a specific date. For more information, see Configure a scheduling calendar.

You can schedule the node at the specified point in time based on the selected scheduling calendar and other scheduling settings such as the scheduling type and scheduling frequency.

Scheduling frequency

The scheduling frequency of a node determines how often the code logic of the node is actually executed by the scheduling system in the production environment. The node generates instances based on the scheduling frequency and the number of scheduling cycles of the node, and the node is run as an instance. For example, the number of instances generated every day for a node scheduled by hour is the same as the number of scheduling cycles of the node every day. The following table describes the valid values of the Scheduling Cycle parameter.

Important
  • The scheduling frequency of a node is unrelated to the scheduling frequencies of the ancestor nodes of the node.

    The interval at which the node is scheduled is related to the scheduling frequency of the node and is unrelated to the scheduling frequencies of the ancestor nodes of the node.

  • DataWorks allows you to configure scheduling dependencies between nodes whose scheduling frequencies are different.

    In DataWorks, an auto triggered node generates instances based on the scheduling frequency and the number of scheduling cycles of the node. For example, the number of instances generated for a node scheduled by hour every day is the same as the number of scheduling cycles of the node every day. The node is run as an instance. In essence, dependencies between auto triggered nodes are dependencies between instances that are generated for the nodes. The number of instances generated for ancestor and descendant auto triggered nodes and dependencies between the instances vary based on the scheduling frequencies of the ancestor and descendant nodes. For information about scheduling dependencies between nodes whose scheduling frequencies are different, see Principles and samples of scheduling configurations in complex dependency scenarios.

  • Dry-run instances are generated for a node on the days when the node is not scheduled to run.

    For a node that is not scheduled to run every day, such as a node scheduled by week or month, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results when the scheduling time of the node arrives on these days. This way, if a node scheduled by day depends on the node scheduled by week or month, the node scheduled by day can be run as expected. In this case, the node scheduled by week or month is dry run, but the node scheduled by day is run as scheduled.

  • Node execution time

    You can specify the time at which you want to schedule a node. The actual time at which the node is run is affected by multiple factors, such as the scheduling time of the ancestor nodes of the node, the resources required to run the node, and the conditions for running the node. For more information, see Node execution conditions.

Value of the Scheduling Cycle parameter

Description

Sample configuration in a typical scenario

Minute

If a node is scheduled by minute, the node is automatically run once every N minutes within a specific period of time every day. The minimum interval for running a node that is scheduled by minute is 5 minutes.

The node is run once every 30 minutes. [Figure: Scheduling by minute]

Hour

If a node is scheduled by hour, the node is automatically run once every N hours within a specific period of time every day.

The node is run once every hour. [Figure: Scheduling by hour]

Day

If a node is scheduled by day, the node is automatically run at a specified point in time every day. If you create an auto triggered node that is scheduled by day, the node is scheduled to run at a point in time in the period from 00:00 to 00:30 every day by default. You can change the scheduling time of the node based on your business requirements.

The node is run at 00:00 every day. [Figure: Scheduling by day example]

Week

If a node is scheduled by week, the node is automatically run at a specified point in time on specific days every week.

Important

For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data.

The node is run at 12:00 every Friday. [Figure: Scheduling by week example]

Month

If a node is scheduled by month, the node is automatically run at a specified point in time on specific days every month.

Important

For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data.

The node is run at a specified point in time on the last day of every month. [Figure: Scheduling by month example]

Year

If a node is scheduled by year, the node is automatically run at a specified point in time on specific days every year.

Important

For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data.

The node is run at a specified point in time on the last day of the first month in every quarter. [Figure: Scheduling by quarter example]
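For the Minute and Hour values, the number of instances generated per day follows directly from the time window and the interval. The following Python sketch enumerates the scheduling times of one day. It is an illustration under stated assumptions, not DataWorks code: the function name `daily_run_times` is hypothetical, the window is assumed to be inclusive at both ends, and the 5-minute lower bound is taken from the Minute row above.

```python
from datetime import date, datetime, time, timedelta
from typing import List

MIN_INTERVAL_MINUTES = 5  # minimum interval for scheduling by minute


def daily_run_times(start: time, end: time, interval_minutes: int) -> List[time]:
    """List the scheduling times within one day for a fixed interval."""
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError("interval must be at least 5 minutes")
    day = date(2000, 1, 1)  # arbitrary date, used only for time arithmetic
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    times = []
    while current <= last:
        times.append(current.time())
        current += timedelta(minutes=interval_minutes)
    return times
```

With a window from 00:00 to 23:59, an interval of 30 minutes yields 48 scheduling times per day and an interval of 60 minutes yields 24, which matches the statement that the number of instances equals the number of scheduling cycles per day.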

Timeout period

You can use the Timeout definition parameter to specify a timeout period for a node. If the period of time for which the node is run exceeds the specified timeout period, the node fails. Take note of the following items when you use this parameter:

  • The timeout period applies to auto triggered node instances, data backfill instances, and test instances.

  • The default timeout period ranges from 72 hours to 168 hours. The system automatically adjusts the default timeout period for a node based on system loads.

  • You can customize a timeout period, but it cannot exceed 168 hours.

Note

For exclusive resource groups for scheduling that you purchased before January 7, 2021, submit a ticket to contact the technical support staff to update the resource groups. The Timeout definition parameter is available only after you update the resource groups.
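The timeout rules can be sketched as two small helpers. This is a hedged illustration rather than the scheduler's implementation: `custom_timeout` and `has_timed_out` are hypothetical names, and rejecting values of zero or less is an assumption of this sketch. Only the 168-hour upper limit comes from the text above.

```python
from datetime import datetime, timedelta

MAX_TIMEOUT_HOURS = 168  # upper limit for a custom timeout period


def custom_timeout(hours: float) -> timedelta:
    """Validate a custom timeout period and return it as a timedelta."""
    if not 0 < hours <= MAX_TIMEOUT_HOURS:
        raise ValueError(f"timeout must be greater than 0 and at most {MAX_TIMEOUT_HOURS} hours")
    return timedelta(hours=hours)


def has_timed_out(started_at: datetime, now: datetime, timeout: timedelta) -> bool:
    """Return True once the running duration exceeds the timeout period."""
    return now - started_at > timeout
```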

Rerun properties

In the Schedule section, you can configure the conditions, interval, and number of times for rerunning a node.

Note
  • When you configure rerun properties for a node, make sure that the data idempotence of the node is not affected based on your business requirements. This helps prevent data quality issues after a failed node is rerun. For example, when you create and develop an ODPS SQL node, you can replace the INSERT INTO statement with the INSERT OVERWRITE statement.

  • You can go to the Scheduling Settings tab of the Settings page in DataStudio to configure default scheduling settings for nodes to be created. For more information, see Configure scheduling settings.
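The effect of a rerun on idempotence can be shown with a toy in-memory table. The following Python sketch is an analogy and does not execute ODPS SQL: `insert_into` mimics the append behavior of INSERT INTO, `insert_overwrite` mimics the replace behavior of INSERT OVERWRITE, and the partition name `ds=20240118` is a made-up example.

```python
from typing import Dict, List

Table = Dict[str, List[int]]  # partition name -> rows


def insert_into(table: Table, partition: str, rows: List[int]) -> None:
    """Append rows: a rerun adds the same rows a second time."""
    table.setdefault(partition, []).extend(rows)


def insert_overwrite(table: Table, partition: str, rows: List[int]) -> None:
    """Replace the partition: a rerun leaves the same result."""
    table[partition] = list(rows)


appended: Table = {}
overwritten: Table = {}
for _ in range(2):  # the first run plus one rerun
    insert_into(appended, "ds=20240118", [1, 2, 3])
    insert_overwrite(overwritten, "ds=20240118", [1, 2, 3])

print(appended)     # {'ds=20240118': [1, 2, 3, 1, 2, 3]}
print(overwritten)  # {'ds=20240118': [1, 2, 3]}
```

After one rerun, the append-style write holds duplicate rows, while the overwrite-style write holds the same data as after the first run. This is why an overwrite-style statement is the safer choice for nodes that may be rerun.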

  • Rerun

    The following list describes the valid values of the Rerun parameter and the scenario for each value.

    Note

    You can click Specify Default Value next to the Rerun parameter to go to the Scheduling Settings tab.

    • Allow Regardless of Running Status: If the data idempotence of a node is not affected after the node is rerun multiple times, you can set the Rerun parameter to this value.

    • Allow upon Failure Only: If the rerun of a failed node does not affect the data idempotence but the rerun of a successful node does, you can set the Rerun parameter to this value.

    • Disallow Regardless of Running Status: If the data idempotence of a node cannot be ensured after the node is rerun, you can set the Rerun parameter to this value.

    Note
    • If you set the Rerun parameter to Disallow Regardless of Running Status, the system does not automatically rerun the node after the system recovers from an exception.

    • The Auto Rerun upon Error parameter is not displayed if you set the Rerun parameter to Disallow Regardless of Running Status.

  • Auto Rerun upon Error

    The following list describes the parameters you must configure if you allow automatic reruns after an error occurs.

    • Number of Reruns: The default number of times that an auto triggered node is rerun after it fails to run as scheduled. Valid values: 1 to 10. The value 1 indicates that the node is rerun once after it fails to run as expected. The value 10 indicates that the node is rerun ten times after it fails to run as expected. You can configure this parameter based on your business requirements.

    • Rerun Interval: The interval at which a node is rerun after it fails to run as scheduled. You can configure this parameter based on your requirements. Valid values: 1 to 30. Default value: 30. Unit: minutes.

    Note
    • The Auto Rerun upon Error parameter is not displayed if you set the Rerun parameter to Disallow Regardless of Running Status. In this case, the node is not allowed to rerun after it fails to run as scheduled.

    • You can set the default number of reruns and default rerun interval for the nodes in a workspace on the Scheduling Settings tab. For more information, see Configure scheduling settings.

    • The automatic rerun feature does not take effect if a node fails because the timeout period is exceeded.
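The rerun rules in this section can be combined into one decision function. The following Python sketch is a simplified model under stated assumptions, not the scheduler's actual logic: `RerunPolicy`, `validate_auto_rerun`, and `should_auto_rerun` are hypothetical names, and the model assumes that automatic reruns are available for both Allow values because the Auto Rerun upon Error parameter is hidden only for Disallow Regardless of Running Status.

```python
from enum import Enum


class RerunPolicy(Enum):
    ALLOW_ALWAYS = "Allow Regardless of Running Status"
    ALLOW_ON_FAILURE = "Allow upon Failure Only"
    DISALLOW = "Disallow Regardless of Running Status"


def validate_auto_rerun(number_of_reruns: int, interval_minutes: int) -> None:
    """Check the value ranges stated for the two parameters."""
    if not 1 <= number_of_reruns <= 10:
        raise ValueError("Number of Reruns must be between 1 and 10")
    if not 1 <= interval_minutes <= 30:
        raise ValueError("Rerun Interval must be between 1 and 30 minutes")


def should_auto_rerun(
    policy: RerunPolicy,
    failed_by_timeout: bool,
    reruns_done: int,
    number_of_reruns: int,
) -> bool:
    """Decide whether a failed instance gets another automatic rerun."""
    if policy is RerunPolicy.DISALLOW:
        return False  # automatic reruns are not available for this value
    if failed_by_timeout:
        return False  # a timeout failure does not trigger an automatic rerun
    return reruns_done < number_of_reruns
```

With Number of Reruns set to 3, a node that fails because of an ordinary error is rerun up to three times, while a node that fails because its timeout period is exceeded is not rerun at all.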

Validity period

You can specify a validity period during which a node is automatically run as scheduled. The node is not automatically run in the period of time that falls out of the specified time range. Nodes whose validity period expires are expired nodes. You can view the number of expired nodes on the Overview page of Operation Center and undeploy the nodes based on your requirements.
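As a minimal sketch, the validity period acts as a date filter on automatic scheduling. The function names `is_in_validity_period` and `is_expired` are hypothetical, and treating both bounds as inclusive is an assumption of this sketch.

```python
from datetime import date


def is_in_validity_period(day: date, start: date, end: date) -> bool:
    """Return True if the node is automatically scheduled on the given day."""
    return start <= day <= end


def is_expired(today: date, end: date) -> bool:
    """Return True if the validity period of the node has ended."""
    return today > end
```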

Appendix: Description of the dry-run property

For a node that is scheduled by week, month, or year, the scheduling system still schedules the node at its scheduling time every day. On the days that are not specified for running the node, the node performs a dry run and generates no data. A dry run has the following effects:

  • The scheduling system directly returns a success response, and the running duration is 0 seconds.

  • No run logs are generated.

  • The running of descendant nodes is not affected.

  • No resources are occupied.
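The following Python sketch illustrates which daily instances of a weekly or monthly node do real work and which are dry runs. It is a model of the behavior described above, not DataWorks code: the function names and the result labels "normal" and "dry_run" are hypothetical.

```python
import calendar
from datetime import date
from typing import Set

FRIDAY = 4  # date.weekday() counts Monday as 0


def weekly_instance_kind(day: date, run_weekdays: Set[int]) -> str:
    """Return "normal" on the scheduled weekdays and "dry_run" on all other days."""
    return "normal" if day.weekday() in run_weekdays else "dry_run"


def month_end_instance_kind(day: date) -> str:
    """Same idea for a node that is scheduled on the last day of every month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return "normal" if day.day == last_day else "dry_run"


# One instance per day for the week that starts on Monday, 2024-01-15.
week = [date(2024, 1, d) for d in range(15, 22)]
kinds = [weekly_instance_kind(d, {FRIDAY}) for d in week]
```

For a node scheduled every Friday, only the instance of 2024-01-19 in `kinds` is "normal". The other six instances are dry runs that return success, so a descendant node scheduled by day is not blocked on those days.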