To run offline tasks on a recurring schedule, you must configure their scheduling properties, including the scheduling cycle, dependencies, and parameters. This topic describes offline task properties and the scheduling process.
Precautions
- Scheduling configuration is supported only for offline computing tasks whose scheduling type is set to auto triggered task.
- A dependency defines the execution order between two nodes. The status of an upstream node affects the running status of its downstream nodes.
- For nodes with dependencies, the scheduling rule is as follows: a downstream node can be scheduled only after its upstream node has finished running. The system then determines whether to run the downstream node based on its configured scheduled time.
- Scheduling configurations submitted before the scheduled time take effect at the next scheduled run. If you configure dependencies after the scheduled time, the corresponding instances are not generated until the next day.
- The scheduling configuration defines the runtime properties of a task. You must publish the task to the production environment before it can be scheduled based on this configuration.
- The scheduled time defines the expected running time for the task. The actual running time depends on the execution status of upstream tasks. For more information about task execution conditions, see Instance running diagnostics.
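The dependency rule above can be sketched in Python. This is an illustrative model only; the function name, status values, and logic are simplified assumptions, not Dataphin's actual scheduler:

```python
from datetime import datetime

def can_run(scheduled_time, upstream_statuses, now=None):
    """Return True when a downstream instance is eligible to run:
    every upstream instance succeeded AND the scheduled time arrived."""
    now = now or datetime.now()
    upstream_done = all(status == "SUCCEEDED" for status in upstream_statuses)
    return upstream_done and now >= scheduled_time

# Example: both upstream instances succeeded, but the scheduled time is
# still in the future, so the downstream instance keeps waiting.
print(can_run(datetime(2099, 1, 1), ["SUCCEEDED", "SUCCEEDED"]))  # False
```

The point to take away is that both conditions must hold: a successful upstream run alone does not start the downstream node, and reaching the scheduled time alone does not either.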
Access offline task properties
On the Dataphin home page, choose Develop > Data Development from the top menu bar.
On the Develop page, select the target project from the top menu bar.
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the name of the target task.
On the task tab, click Property on the right to open the Property panel.
Configure offline task properties
On the offline task properties page, configure the basic information and scheduling properties for the task as described in the following table.
| Configuration item | Description |
| --- | --- |
| Basic information | Includes the task name, ID, node type, developer owner, O&M owner, and description. |
| Resource configuration | The CPU and memory resources allocated to run the current task. Note: This configuration is supported only for computing tasks of the following types: Python, Shell, Spark on MaxCompute, Spark on Yarn, MapReduce on MaxCompute, and MapReduce on Yarn. |
| Python third-party packages | Select the Python third-party packages to import. |
| Scheduling parameters | Defines the parameters used for node scheduling. Dataphin provides built-in parameters and supports custom parameters, which lets you dynamically assign parameter values during scheduling. Note: If you define variables in the node code, you must assign values to them here. If no variables are defined, you do not need to configure this item. |
| Scheduling cycle | Defines how the task is scheduled on a recurring basis in the production environment. |
| Scheduling dependencies | Defines the upstream and downstream dependencies for the task. Dependencies ensure that nodes are scheduled in an orderly manner: a downstream node starts only after its upstream node runs successfully, which guarantees the timely output of valid business data. You can use automatic parsing to quickly set node dependencies, or add them manually. |
| Execution configuration | Defines the task timeout and the rerun policy for failed tasks. This prevents resource waste caused by long-running tasks and improves the reliability of task execution. |
| Resource group | Select the resource group for the computing task. The task uses resources from this group for scheduling. |
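As an illustration of how a scheduling parameter reaches node code, a Python node might read the value that the scheduler assigns at run time. The convention sketched below (the parameter delivered as a command-line argument, and the `get_bizdate` helper) is an assumption for illustration, not a documented Dataphin interface:

```python
import sys

def get_bizdate(argv):
    """Hypothetical helper: returns the business-date value that the
    scheduler is assumed to pass as the first command-line argument."""
    return argv[1] if len(argv) > 1 else "unknown"

if __name__ == "__main__":
    # e.g. the scheduler might invoke: python task.py 20240101
    print(f"Processing partition for business date: {get_bizdate(sys.argv)}")
```

Whatever the delivery mechanism, the rule from the table applies: every variable referenced in the node code must have a value assigned in the scheduling parameters, or the task cannot resolve it at run time.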
What to do next
After you configure the task properties, submit and publish the task to the production environment. You can then perform O&M operations on the task in the production environment. For more information, see Operation Center.