Nodes and workflows in your project often need to run on a recurring schedule. To enable this, you must configure scheduling properties, such as the scheduling period, scheduling dependencies, and scheduling parameters, in the scheduling configuration panel of each node or workflow. This topic describes how to configure these scheduling properties.
Prerequisites
You have created a node. In DataWorks, you create nodes to define tasks. Different engine tasks are represented by different node types. You can choose the appropriate node type for your business needs. For more information, see Develop nodes.
The scheduling switch for the workspace is enabled. Tasks in a DataWorks workspace can run automatically based on their configurations only after you turn on the Enable Periodic Scheduling switch for the workspace. To do so, go to the Scheduling Settings page of the workspace. For more information, see System settings.
Important
These configurations take effect only after the task is published to the production environment.
The scheduling time only defines the expected execution time of a task. The actual execution time also depends on the status of its ancestor nodes. For more information about task execution conditions, see Diagnose task runs.
DataWorks supports dependencies between various types of tasks. Before you configure dependencies, we recommend that you read Principles and examples for configuring scheduling in complex dependency scenarios to understand the default dependency behaviors in DataWorks for complex scenarios.
In DataWorks, a scheduled task generates corresponding recurring instances based on its scheduling type and period. For example, an hourly task generates a specific number of hourly instances each day. These instances then run the task automatically.
When you use scheduling parameters, the scheduled time of each run and your parameter expressions determine the parameter values passed to the code. For more information about how scheduling parameters are configured and replaced, see Sources and expressions of scheduling parameters.
A workflow includes the workflow node itself and its internal nodes, creating complex dependencies. This topic describes only the scheduling and dependency configuration for individual nodes. For detailed information about workflow scheduling dependencies, see Orchestrate recurring workflows.
Go to the scheduling configuration page
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose in the Actions column.
Go to the scheduling configuration page.
In the DataStudio interface, find the target node and open its editor page.
Click Scheduling Configuration in the right-side navigation pane of the node editor page.
Configure node scheduling properties
On the scheduling configuration page of the node, you need to configure the node's Scheduling Parameters, Scheduling Policy, Scheduling Time, Scheduling Dependencies, and Node Output Parameters.
Scheduling parameters (optional)
If you have defined variables in the node's code, you must assign values to them here.
Scheduling parameters are automatically replaced with specific values based on the business date of the scheduled task and the format of the parameter expressions. This enables dynamic parameter replacement at runtime.
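As a hedged illustration of this substitution (not DataWorks' internal implementation), the following Python sketch resolves a bizdate-style expression, assuming the business date is the day before the scheduled run date in yyyymmdd format; `resolve_bizdate` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

def resolve_bizdate(scheduled_time: datetime) -> str:
    # Assumption: the business date is one day before the scheduled
    # run date, rendered as yyyymmdd.
    return (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")

# An instance scheduled for 2024-06-02 01:00 resolves to business date 20240601.
print(resolve_bizdate(datetime(2024, 6, 2, 1, 0)))
```

Each recurring instance receives its own scheduled time, so the same expression yields a different concrete value for every run.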
Configure scheduling parameters
You can define scheduling parameters in the following two ways.
Method | Description |
Add Parameter | You can configure multiple scheduling parameters for a single task. To add more parameters, click Add Parameter. |
Load parameters from code | This feature automatically identifies the variable names defined in the task's code and adds them as scheduling parameters for scheduled runs. Note: Typically, variables are defined in the code in a fixed format; the variable definition format for PyODPS nodes and general Shell nodes differs from that of other node types. For details on the scheduling parameter formats for different node types, see Examples of scheduling parameter configuration for different node types. |
Supported formats for scheduling parameters
For more information, see Sources and expressions of scheduling parameters.
Verify scheduling parameters in production
To prevent task failures caused by incorrect scheduling parameters, we recommend that you go to the Auto Triggered Task page in Operation Center after publishing a task to check its scheduling parameter configuration in the production environment. For more information about how to view auto triggered tasks, see Manage Auto Triggered Tasks.
Scheduling policy
The scheduling policy defines the instance generation mode, scheduling type, computing resources, and resource group for an Auto Triggered Task.
Parameter | Description |
Instance generation mode | After a node is published to the production scheduling system, the platform generates automated recurring instances based on the configured instance generation mode. |
Scheduling type | |
Timeout period | If you set a timeout period, the task is automatically terminated when its runtime exceeds this duration. |
Rerun property | Specifies whether and when the node can be rerun. You must specify a rerun property. |
Automatic Rerun Upon Failure | When enabled, if a task fails (excluding manual termination), the scheduling system automatically triggers a rerun based on the configured number of retries and the retry interval. |
Computing resource | Configure the engine resources required to run the task. To create new resources, go to the computing resource management page. |
Computing quota | For MaxCompute SQL and MaxCompute Script nodes, you can configure the computing quota required to run the task. Quotas provide the computing resources (CPU and memory) for computing jobs. |
Schedule resource group | Configure the schedule resource group used to run the task. Select a resource group as needed. |
Maximum parallel instances | Limits the maximum number of parallel instances for a single task to provide concurrency control and resource protection. By default, the number of parallel instances is not limited. When the limit is enabled, you can set the number of parallel instances. |
Dataset | Click |
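The Automatic Rerun Upon Failure behavior described in the table can be sketched as follows. This is an illustrative model rather than the scheduler's actual code, and `max_reruns` and `interval_s` are hypothetical names for the retry count and retry interval settings:

```python
import time

def run_with_retries(task, max_reruns=3, interval_s=0):
    # One initial attempt plus up to max_reruns automatic reruns on failure.
    for attempt in range(max_reruns + 1):
        try:
            return task()
        except Exception:
            if attempt == max_reruns:
                raise  # Retries exhausted: the instance is marked as failed.
            time.sleep(interval_s)  # Wait for the retry interval before rerunning.

# A task that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))
```

Note that, as stated above, a manually terminated task would not be retried in the real scheduler; this sketch only models the failure path.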
Scheduling time
Use scheduling time to configure the period, time, and other information for the automated execution of a scheduled task.
For nodes in a workflow, the Scheduling Time and related parameters are set in the Scheduling Configuration of the workflow page. For nodes that are not in a workflow, the Scheduling Time is set in the Scheduling Configuration of each individual node.
Important
Task scheduling frequency is independent of ancestor task periods
The frequency at which a task is scheduled depends on its own defined scheduling period, not that of its ancestor tasks.
DataWorks supports dependencies between tasks with different scheduling periods
In DataWorks, a scheduled task generates corresponding recurring instances based on its scheduling type and period (for example, an hourly task generates a number of hourly instances each day) and runs through these instances. The dependencies set between recurring tasks are essentially dependencies between the instances they generate. The number of recurring instances and their dependency relationships vary for ancestor and descendant tasks with different scheduling types. For more information on dependencies between tasks with different scheduling periods, see Choose a scheduling dependency method (cross-cycle dependencies).
Tasks perform dry runs outside their scheduled time
Non-daily tasks (such as weekly or monthly) perform a dry-run and return a success status on non-scheduled days. This allows any daily descendant tasks to run normally on their own schedule.
Task execution time
This section only configures the expected scheduling time for a task. The actual execution time depends on multiple factors, such as the completion time of ancestor tasks, resource availability, and the actual run conditions of the task. For more information, see Task run conditions.
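The instance-generation behavior described above can be modeled with a short sketch (illustrative only; `generate_instances` is a hypothetical helper, and real instances also carry scheduled times and dependency links):

```python
from datetime import datetime, timedelta

def generate_instances(day: datetime, interval_hours: int):
    # One recurring instance per interval across the scheduling day.
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=h) for h in range(0, 24, interval_hours)]

# An hourly task yields 24 instances for the day; a 6-hourly task yields 4.
hourly = generate_instances(datetime(2024, 6, 1), 1)
print(len(hourly))
```

Dependencies between an hourly ancestor and a daily descendant are then dependencies between these generated instances, which is why cross-period dependency rules matter.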
Configure scheduling time
Parameter | Description |
Scheduling period | The scheduling period defines how often a task runs automatically. It determines how frequently the code logic within a node is executed in the production environment. A scheduled task generates recurring instances based on its scheduling type and period (for example, an hourly task generates a number of hourly instances each day) and runs automatically through these instances. Important: Weekly, monthly, and yearly tasks still generate instances daily outside of their scheduled run times. These instances show a success status but actually perform a dry run and do not execute the task code. |
Effective date | A scheduled node is effective and runs automatically within its effective date range. Tasks that are past their effective date are no longer scheduled automatically. These are considered expired tasks. You can view the number of expired tasks on the O&M dashboard and take actions such as decommissioning them. |
Cron expression | This expression is automatically generated based on the time property settings and does not need to be configured. |
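To make the cron row concrete, here is a hedged sketch that builds a six-field cron string (seconds, minutes, hours, day, month, week; the field layout is an assumption based on common Quartz-style expressions, and `daily_cron` is a hypothetical helper) for a daily run:

```python
def daily_cron(hour: int, minute: int) -> str:
    # Six-field cron: seconds minutes hours day-of-month month day-of-week.
    # '?' leaves the day-of-week field unspecified for a date-based schedule.
    return f"00 {minute:02d} {hour:02d} * * ?"

# A task scheduled daily at 01:30.
print(daily_cron(1, 30))
```

In DataWorks the expression is derived automatically from the time settings, so you would only read it, not author it by hand.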
Scheduling dependencies
In DataWorks, scheduling dependencies define the ancestor-descendant relationships between nodes. A descendant node runs only after all its ancestor nodes have run successfully. This structure prevents a descendant node from accessing data before its ancestor nodes have finished generating it, thus avoiding data consistency issues.
Important
After node dependencies are configured, one of the conditions for a descendant node to run is, by default, that all of its ancestor nodes have run successfully. Otherwise, the current task may retrieve incomplete data, causing data quality issues.
The actual run time of a task depends not only on its own scheduled time (the expected execution time in a scheduling scenario) but also on the completion time of its ancestor tasks. A descendant task does not run even if its scheduled time is earlier than that of an ancestor task if the ancestor task has not completed its run. For more information about task run conditions, see Diagnose task runs.
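The run condition described above can be sketched as a simple readiness check over ancestor instances (an illustrative model, not the scheduler's implementation; `ready_to_run` and the node names are hypothetical):

```python
def ready_to_run(node, deps, succeeded):
    # A descendant may start only when every ancestor has run successfully.
    return all(ancestor in succeeded for ancestor in deps.get(node, ()))

deps = {"descendant": ["ancestor_a", "ancestor_b"]}

# Only one ancestor finished: the descendant must keep waiting,
# even if its own scheduled time has already passed.
print(ready_to_run("descendant", deps, {"ancestor_a"}))
# Both ancestors finished: the descendant becomes eligible to run.
print(ready_to_run("descendant", deps, {"ancestor_a", "ancestor_b"}))
```

This is why a descendant's actual start time can be later than its scheduled time: readiness, not the clock alone, gates execution.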
Configure scheduling dependencies
Task dependencies in DataWorks are ultimately designed to ensure that descendant nodes retrieve data correctly, which in practice means they rely on the data lineage between ancestor and descendant tables. You can choose whether to configure scheduling dependencies based on table lineage according to your business needs. The process for configuring node scheduling dependencies is as follows.
A dependency implies a strong data lineage relationship, meaning the descendant node's output relies on the ancestor node's output. Before configuring a dependency, confirm that this relationship is required. Ask: "Will the task fail or produce incorrect results if its ancestor's data is not ready?" If yes, a strong dependency exists.
Step | Description |
① | To avoid unexpected execution times for the current task, first assess whether a strong dependency exists between tables to determine if you need to configure scheduling dependencies based on data lineage. |
② | Confirm whether the data comes from a table produced by an auto triggered task. For tables that are not produced by the DataWorks recurring scheduler, DataWorks cannot monitor data production through task run status, so scheduling dependencies cannot be configured for those tables. Such tables include, for example, locally uploaded table data. |
③④ | Depending on whether the task needs yesterday's or today's data from the ancestor, or whether an hourly or minute-level task needs to depend on its own previous instance, choose to depend on the same cycle or the previous cycle of the ancestor. Note: For details on configuring dependency scenarios based on data lineage, see Choose a scheduling dependency method (same-cycle dependencies). |
⑤⑥⑦ | After configuring the dependencies and publishing to the production environment, you can check the task's dependency relationships in Auto Triggered Task in Operation Center to verify that they are correct. |
Configure custom node dependencies
If there is no strong data lineage dependency between tasks on DataWorks (for example, the task does not strongly depend on a specific partition of an ancestor but only retrieves the latest partition at the current time), or if the dependent data is not from a table produced by an Auto Triggered Task (for example, locally uploaded table data), you can customize the node's dependencies. The custom dependency configurations are as follows:
Depend on the root node of the workspace
For scenarios such as a synchronization task where the source data comes from another business database, or an SQL task processing table data produced by a real-time synchronization task, you can directly choose to mount the dependency on the workspace root node.
Depend on a zero load node
When a workspace contains many or complex business processes, you can use a zero load node to manage them. By mounting dependencies of nodes that require unified control onto a specific zero load node, you can make the data flow path within the workspace clearer. For example, you can control the overall scheduling time or enable/disable scheduling (freeze) for an entire business process.
Node output parameters
You can pass a value from an ancestor node to a descendant node. To do this, define an output parameter in the ancestor node, then create an input parameter in the descendant node that references it.
Important
A node's output parameter can only be used as an input parameter of a descendant node: in the descendant node's scheduling parameters section, add a parameter and bind it to the ancestor's output parameter. Some nodes cannot directly pass query results to descendant nodes. If you need to pass query results from an ancestor node to a descendant node, use an assignment node. For more information, see Assignment nodes.
Nodes that support output parameters are: EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, and MySQL nodes.
Configure node output parameters
The value of a Node Output Parameter can be a Constant or a Variable.
After defining the output parameters and submitting the current node, you can Bind The Output Parameter Of The Ancestor Node as an input parameter for the descendant node when configuring its scheduling parameters.
Parameter name: The name of the defined output parameter.
Parameter value: The value of the output parameter. Value types include constants and variables:
A constant is a fixed string.
Variables include system-supported global variables, built-in scheduling parameters, and custom parameters.
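The binding of an ancestor's output parameters to a descendant's input parameters can be modeled as a simple mapping (illustrative sketch; `bind_inputs` and all parameter names are hypothetical):

```python
def bind_inputs(ancestor_outputs: dict, bindings: dict) -> dict:
    # Resolve each descendant input parameter from the ancestor
    # output parameter it is bound to.
    return {inp: ancestor_outputs[out] for inp, out in bindings.items()}

# Output parameters defined on the ancestor node: one constant, one variable value.
outputs = {"region": "cn-hangzhou", "bizdate": "20240601"}

# The descendant binds its input 'src_region' to the ancestor's 'region'.
inputs = bind_inputs(outputs, {"src_region": "region"})
print(inputs)
```

The actual wiring happens in the scheduling configuration UI; this sketch only shows the name-to-value resolution that results from the binding.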
Configure a linked role for a node
DataWorks linked roles let you assign a preset RAM role to a specific task node. When the task runs, it dynamically obtains temporary access credentials for the role through Alibaba Cloud Security Token Service (STS). This allows your code to access other cloud resources without needing to include a permanent AccessKey (AK) in plain text.
Resource group restrictions: Only nodes that run on a Serverless resource group are supported.
Node type restrictions: Only Python, Shell, Notebook, PyODPS 2, and PyODPS 3 nodes are supported.
1. Configure a linked role in a DataWorks node
On the right side of the node editing page, find and click Run Configuration.
In the scheduling settings panel, switch to the Linked Role tab.
From the RAM Role drop-down list, select the RAM role that you prepared.
Important: If the drop-down list is empty or you cannot find the required role, see Configure a linked role to access other cloud services by using STS to complete the RAM role configuration.
After the configuration is complete, submit the node. This configuration takes effect only for debug runs.
2. Obtain and use temporary credentials in your code
After you configure the linked role, DataWorks injects the obtained temporary credentials into the runtime environment when the task runs. You can obtain them in your code in the following two ways.
Method 1: Read environment variables (recommended for Shell and Python)
The system automatically sets the following three environment variables. You can read them directly in your code.
LINKED_ROLE_ACCESS_KEY_ID: The temporary AccessKey ID.
LINKED_ROLE_ACCESS_KEY_SECRET: The temporary AccessKey secret.
LINKED_ROLE_SECURITY_TOKEN: The temporary security token.
Code sample (Python):
For this case, you must select a custom Python image with oss2 installed for the runtime environment. For more information, see Custom images.
import os
import oss2
# 1. Obtain temporary credentials from environment variables.
access_key_id = os.environ.get('LINKED_ROLE_ACCESS_KEY_ID')
access_key_secret = os.environ.get('LINKED_ROLE_ACCESS_KEY_SECRET')
security_token = os.environ.get('LINKED_ROLE_SECURITY_TOKEN')
# Check if the credentials were obtained.
if not all([access_key_id, access_key_secret, security_token]):
    raise Exception("Failed to get linked role credentials from environment variables.")
# 2. Use the temporary credentials to initialize the OSS client.
# Assume that you have granted the role permissions to access 'your-bucket-name'.
auth = oss2.StsAuth(access_key_id, access_key_secret, security_token)
bucket = oss2.Bucket(auth, 'http://oss-<regionID>-internal.aliyuncs.com', 'your-bucket-name')
# 3. Use the client to access OSS resources.
try:
    # List objects in the bucket.
    for obj in oss2.ObjectIterator(bucket):
        print('object name: ' + obj.key)
    print("Successfully accessed OSS with linked role.")
except oss2.exceptions.OssError as e:
    print(f"Error accessing OSS: {e}")

Code sample (Shell):
#!/bin/bash
access_key_id=${LINKED_ROLE_ACCESS_KEY_ID}
access_key_secret=${LINKED_ROLE_ACCESS_KEY_SECRET}
security_token=${LINKED_ROLE_SECURITY_TOKEN}
# To access OSS, replace regionID, bucket_name, and file_name with your actual information.
echo "ID: "$access_key_id
echo "token: "$security_token
ls -al /home/admin/usertools/tools/
# This example shows how to use ossutil to download a file from a specified OSS path to the local test_dw.py file and then print the file content.
/home/admin/usertools/tools/ossutil64 cp --access-key-id $access_key_id --access-key-secret $access_key_secret --sts-token $security_token --endpoint http://oss-<regionID>-internal.aliyuncs.com oss://<bucket_name>/<file_name> test_dw.py
echo "************************ Success ************************, printing result"
cat test_dw.py

Method 2: Use Credentials Client (recommended for Python)
Code sample (Python):
For this case, you must select a custom Python image with oss2 and alibabacloud_credentials installed for the runtime environment. For more information, see Custom images.
from alibabacloud_credentials.client import Client as CredentialClient
import oss2
# 1. Use the SDK to automatically obtain credentials.
# It automatically searches for credential information such as LINKED_ROLE_* in environment variables.
cred_client = CredentialClient()
credential = cred_client.get_credential()
access_key_id = credential.get_access_key_id()
access_key_secret = credential.get_access_key_secret()
security_token = credential.get_security_token()
if not all([access_key_id, access_key_secret, security_token]):
    raise Exception("Failed to get linked role credentials via SDK.")
# 2. Use the credentials to initialize the OSS client.
auth = oss2.StsAuth(access_key_id, access_key_secret, security_token)
bucket = oss2.Bucket(auth, 'http://oss-cn-hangzhou.aliyuncs.com', 'your-bucket-name')
# 3. Access OSS.
print("Listing objects in bucket...")
for obj in oss2.ObjectIterator(bucket):
    print(' - ' + obj.key)
print("Successfully accessed OSS with linked role via SDK.")

3. Run and verify the task
Shell and Python: When the task runs, it uses the specified RAM role to access other cloud services.
PyODPS: When accessing other cloud services such as OSS, the task uses the identity of the RAM role that you set. However, when accessing MaxCompute data, it still automatically uses the access identity configured for the computing resources at the project level.
Configure scheduling properties
After you finish debugging the node, you must synchronize the linked role setting from the Run Configuration to the Scheduling Configuration. After the node is published, the task runs as this role.
If you configure a custom image in the Run Configuration, you must also synchronize this setting to the scheduling settings.
View the execution role in Operation Center
After the task runs, view the details of the task instance in Operation Center to confirm that the specified role was used.
Go to .
Find the instance of the node that you ran and click it to go to the details page.
In the Properties section of the instance details page, view the Execution Identity field. This field displays the Alibaba Cloud Resource Name (ARN) of the linked role that was actually used for this run.
An ARN is a unique resource identifier. For more information, see Basic elements of a policy.
References
Scheduling parameters: Scheduling parameter format reference.
Scheduling policies:
Scheduling time: Scheduling time reference.
Scheduling dependencies:
Node output parameters: Configure and use node context parameters.
Other references: Impact of daylight saving time changes on scheduled task execution.