To run offline tasks on a recurring schedule, you must configure their scheduling properties, including the scheduling cycle, dependencies, and parameters. This topic describes offline task properties and the scheduling process.
Precautions
- Scheduling configuration is supported only for offline computing tasks whose scheduling type is set to auto triggered task.
- A dependency defines the execution order between two nodes. The status of an upstream node affects the running status of its downstream nodes.
- For nodes with dependencies, the scheduling rule is as follows: a downstream node can be scheduled only after its upstream node has finished running. The system then determines whether to run the downstream node based on its configured scheduled time.
- Scheduling configurations submitted before the scheduled time take effect at the next scheduled run. If you configure dependencies after the scheduled time, the corresponding instances are not generated until the next day.
- The scheduling configuration defines the runtime properties of a task. You must publish the task to the production environment before it can be scheduled based on this configuration.
- The scheduled time defines the expected running time for the task. The actual running time depends on the execution status of upstream tasks. For more information about task execution conditions, see Instance running diagnostics.
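The dependency rule above can be sketched in Python. This is an illustrative model only; the function name, status values, and logic are simplified assumptions, not Dataphin's actual scheduler:

```python
from datetime import datetime

def can_run(scheduled_time, upstream_statuses, now=None):
    """Return True when a downstream instance is eligible to run:
    every upstream instance succeeded AND the scheduled time arrived."""
    now = now or datetime.now()
    upstream_done = all(status == "SUCCEEDED" for status in upstream_statuses)
    return upstream_done and now >= scheduled_time

# Example: both upstream instances succeeded, but the scheduled time is
# still in the future, so the downstream instance keeps waiting.
print(can_run(datetime(2099, 1, 1), ["SUCCEEDED", "SUCCEEDED"]))  # False
```

The point to take away is that both conditions must hold: a successful upstream run alone does not start the downstream node, and reaching the scheduled time alone does not either.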
Access offline task properties
On the Dataphin home page, choose Develop > Data Development from the top menu bar.
On the Develop page, select the target project from the top menu bar.
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the name of the target task.
On the task tab, click Property on the right to open the Property panel.
Configure offline task properties
On the offline task properties page, configure the basic information and scheduling properties for the task as described in the following table.
| Configuration item | Description |
| --- | --- |
| Basic information | Includes the task name, ID, node type, developer owner, O&M owner, and description. |
| Resource configuration | The CPU and memory resources allocated to run the current task. Note: This configuration is supported only for computing tasks of the following types: Python, Shell, Spark on MaxCompute, Spark on Yarn, MapReduce on MaxCompute, and MapReduce on Yarn. |
| Python third-party packages | Select the Python third-party packages to import. |
| Scheduling parameters | Defines the parameters used for node scheduling. Dataphin provides built-in parameters and supports custom parameters, which lets you dynamically assign parameter values during scheduling. Note: If you define variables in the node code, you must assign values to them here. If no variables are defined, you do not need to configure this item. |
| Scheduling cycle | Defines how the task is scheduled on a recurring basis in the production environment. |
| Scheduling dependencies | Defines the upstream and downstream dependencies for the task. Dependencies ensure that nodes are scheduled in an orderly manner: a downstream node starts only after its upstream node runs successfully, which guarantees the timely output of valid business data. You can use automatic parsing to quickly set node dependencies, or add them manually. |
| Execution configuration | Defines the task timeout and the rerun policy for failed tasks. This prevents resource waste caused by long-running tasks and improves the reliability of task execution. |
| Resource group | Select the resource group for the computing task. The task uses resources from this group for scheduling. |
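As an illustration of how a scheduling parameter reaches node code, a Python node might read the value that the scheduler assigns at run time. The convention sketched below (the parameter delivered as a command-line argument, and the `get_bizdate` helper) is an assumption for illustration, not a documented Dataphin interface:

```python
import sys

def get_bizdate(argv):
    """Hypothetical helper: returns the business-date value that the
    scheduler is assumed to pass as the first command-line argument."""
    return argv[1] if len(argv) > 1 else "unknown"

if __name__ == "__main__":
    # e.g. the scheduler might invoke: python task.py 20240101
    print(f"Processing partition for business date: {get_bizdate(sys.argv)}")
```

Whatever the delivery mechanism, the rule from the table applies: every variable referenced in the node code must have a value assigned in the scheduling parameters, or the task cannot resolve it at run time.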
What to do next
After you configure the task properties, submit and publish the task to the production environment. You can then perform O&M operations on the task in the production environment. For more information, see Operation Center.