
Dataphin: Managing integration and computing tasks

Last Updated: Jul 07, 2025

The Integration and Computing Tasks page includes computing tasks, sync tasks, and integration tasks. Each task corresponds to a scheduling node. This topic describes how to view and manage integration and computing tasks from a node perspective.

Access the Integration and Computing Tasks page

  1. In the top navigation bar of the Dataphin homepage, choose Development > O&M Center.

  2. In the left-side navigation pane, choose O&M Center > Recurring Task.

  3. In the top navigation bar, select the production or development environment.

  4. On the Recurring Task page, click the Integration and Computing Tasks tab.

Operations supported on the Integration and Computing Tasks list

After integration and computing tasks are submitted to the O&M Center for scheduling, they appear in the Recurring Task > Integration and Computing Tasks list. The list displays each task object, scheduling cycle, priority, owner, related baseline, project, HTTP path, scheduling resource group, last update time, and the supported operations.

  • Task Object: The recurring script task submitted to the O&M Center. The script name and script ID are displayed, along with the task's scheduling method. For more information, see Description of scheduling methods.

  • Recurrence: The time at which the task recurs, shown in the scheduling time zone.

  • Priority: The priority of the task. If the baseline feature is enabled, a baseline task takes the highest priority among all of its baselines, overriding the originally configured task priority.

  • Related Baseline: The baseline to which the task belongs as an end node, along with related baselines whose protection scope includes the task as an ancestor node.

    Note

    If the baseline feature is not enabled, this field is not displayed.

  • Project: The project to which the task belongs, displayed in the format Project English name (Project Chinese name).

  • HTTP Path: For DATABRICKS_SQL tasks, displays the HTTP path of the production or development environment, depending on the environment you selected.

    Note

    Only DATABRICKS_SQL tasks display this field. Other task types display -.

  • Resource Group: The name of the scheduling resource group used by the instance when the task runs.

    If the custom resource group specified for the task is not active, the project default resource group is used instead. If the project default resource group is also not active, the tenant default resource group is used. The order of precedence is Custom resource group > Project default resource group > Tenant default resource group (see the sketch after this list).

    Note

    After you change the project default resource group, the value displayed here may update with a delay, but execution uses the modified resource group.

    Tenant default resource group: Does not belong to any project; each tenant has exactly one. When a task specifies no custom resource group and the project specifies no project default resource group, the tenant default resource group is used for scheduling. This applies only to tasks that require exclusive resources (SQL tasks, virtual tasks, and similar types are excluded).
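
The fallback described above can be modeled with a minimal sketch. The class and function names below are hypothetical; Dataphin does not expose this resolution logic as a public API.

    from dataclasses import dataclass
    from typing import Optional

    # Hypothetical model of the documented fallback order:
    # custom > project default > tenant default.
    @dataclass
    class ResourceGroup:
        name: str
        active: bool

    def resolve_resource_group(
        custom: Optional[ResourceGroup],
        project_default: Optional[ResourceGroup],
        tenant_default: ResourceGroup,
    ) -> ResourceGroup:
        """Return the scheduling resource group an instance would use.
        An inactive group falls through to the next level."""
        if custom is not None and custom.active:
            return custom
        if project_default is not None and project_default.active:
            return project_default
        return tenant_default  # a tenant always has exactly one default group

    # The custom group is inactive, so the project default is used.
    chosen = resolve_resource_group(
        ResourceGroup("custom-rg", active=False),
        ResourceGroup("project-default-rg", active=True),
        ResourceGroup("tenant-default-rg", active=True),
    )
    print(chosen.name)  # -> project-default-rg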

The following operations are supported on the Integration and Computing Tasks list.

DAG Graph

Click the DAG icon to view the DAG graph of the integration and computing task. For more information, see Operations supported on Integration and Computing Tasks DAG nodes.

View Recurring Instances

View the recurring instances generated by the task. You can also manage and operate on these instances.

Edit Development Node

Go to the editing page of the task in the Dev project to edit the task.

Note

This is only applicable to the Dev-Prod development mode.

Edit Node

Go to the editing page of the task to edit the task.

Note

This is only applicable to the Basic mode.

View Production Node

Go to the Prod project to view the task configuration.

Note

This feature is not supported for tasks in Basic mode, or for tasks in Dev-Prod mode that have not been published to the production environment.

View Node Code

View the code written for the integration and computing task node.

View Data Backfill Instances

View and manage data backfill instances generated by data backfill operations.

Data Backfill

The data backfill feature refreshes data for recurring tasks over specified historical data timestamps. After a recurring task is developed, submitted, and published, it runs periodically according to its scheduling configuration. If you want to run a recurring task for a specific time period or refresh data for a historical time range, use the data backfill feature, as illustrated in the sketch below. For details on performing data backfill operations on integration and computing task nodes, see Appendix: Data backfill for recurring tasks.
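
To make "refresh a historical time range" concrete, the following sketch enumerates the daily data timestamps a backfill over a date range would cover. It is illustrative only; the actual backfill is performed through the console as described above.

    from datetime import date, timedelta

    def backfill_data_timestamps(start: date, end: date) -> list[date]:
        """Enumerate the daily data timestamps covered by a backfill
        over [start, end]: one instance per day for a daily task."""
        days = (end - start).days
        return [start + timedelta(days=i) for i in range(days + 1)]

    # Backfilling June 1-3, 2025 would generate three instances:
    for ts in backfill_data_timestamps(date(2025, 6, 1), date(2025, 6, 3)):
        print(ts.isoformat())  # 2025-06-01, 2025-06-02, 2025-06-03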

Modify Owner

Modify the owner of the task.

Note

This is only applicable to Basic mode and the Prod environment in Dev-Prod mode. The Dev environment does not support modification.

Modify Priority

Modify the priority of the task. Among all tasks that meet the scheduling conditions, those with higher priority run first.

Note
  • This is only applicable to Basic mode and the Prod environment in Dev-Prod mode. The Dev environment does not support modification.

  • If the baseline feature is enabled, task priority can only be configured as Lowest, Low, or Medium. Higher priorities need to be configured through baselines.

  • The priority of baseline tasks cannot be modified directly; it is determined by the baseline priority. Adjust it through the baseline instead.

  • If your compute engine type is MaxCompute, the correspondence between Dataphin task priorities and MaxCompute job priorities is Lowest (9), Low (7), Medium (5), High (3), Highest (1), as modeled in the sketch after this note. For more information about MaxCompute job priorities, see MaxCompute job priorities.

  • Priority settings for Spark SQL tasks take effect only when the HDFS of the Hadoop compute source is configured with separate task priority queues.
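
A minimal sketch of the priority mapping above. The function name is hypothetical; it only restates the documented correspondence and the baseline restriction.

    # Dataphin task priority -> MaxCompute job priority
    # (lower number = higher MaxCompute priority).
    DATAPHIN_TO_MAXCOMPUTE_PRIORITY = {
        "Lowest": 9,
        "Low": 7,
        "Medium": 5,
        "High": 3,
        "Highest": 1,
    }

    def maxcompute_priority(dataphin_priority: str, baseline_enabled: bool) -> int:
        """Return the MaxCompute job priority for a Dataphin task priority.
        With the baseline feature enabled, only Lowest/Low/Medium can be set
        directly on a task; High/Highest must come from baselines."""
        if baseline_enabled and dataphin_priority in ("High", "Highest"):
            raise ValueError(f"{dataphin_priority} must be configured through a baseline")
        return DATAPHIN_TO_MAXCOMPUTE_PRIORITY[dataphin_priority]

    print(maxcompute_priority("Medium", baseline_enabled=True))  # -> 5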

Pause

Set the current task node to the paused scheduling state. Pausing suits scenarios where a task and its downstream tasks temporarily do not need to run but will be used again later, for example, while you temporarily adjust calculation logic and want to avoid affecting downstream data.

Resume

Set a paused node to the normal scheduling state.

Configure Monitoring and Alerting

Configure monitoring rules for task execution. For details, see Overview of offline task monitoring.

Note

This is only applicable to Prod and Basic projects.

Modify HTTP Path

Modify the production environment HTTP path for the corresponding task. You can select from all HTTP paths configured for the cluster corresponding to the production project.

Note

This operation is only supported in the production environment.

Modify Resource Group

Modify the scheduling resource group used by instances generated from the task when they run.

Note
  • If the selected tasks span multiple projects, the target resource group list shows only the scheduling resource groups authorized for all of those projects. We recommend filtering to a single project first and then performing batch settings.

  • The modification affects only newly generated instances, not instances that have already been generated. To change the resource group used by existing instances, do so in the instance list.

Operations supported on Integration and Computing Tasks DAG nodes

The DAG graph shows the upstream and downstream dependencies of task nodes, and you can operate on and manage upstream and downstream nodes. By default, the DAG graph displays the Main node (the selected node) and the first layer of upstream and downstream nodes. When you select an integration and computing task node, you can perform related operations on that task.

Dataphin supports operations and management of cross-project nodes. To perform operations on cross-project script nodes, you need to have view and operation permissions for the project to which the task belongs.

  • Operations supported on the DAG graph

    Expand Parent Nodes, Expand Child Nodes: Expand dependency nodes at different levels of the Main node in the DAG graph (modeled in the sketch after this list).

  • Operations supported on DAG nodes

    The operations supported on Integration and Computing Tasks DAG nodes are the same as those supported on the Integration and Computing Tasks list. For more information, see Operations supported on the Integration and Computing Tasks list.
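
A minimal sketch of the layer-by-layer expansion, assuming a hypothetical in-memory adjacency representation of the dependency DAG (the console builds this view for you):

    from collections import defaultdict

    # Hypothetical dependency edges: (upstream, downstream).
    edges = [
        ("ods_orders", "dwd_orders"),
        ("dwd_orders", "dws_orders_daily"),
        ("dim_users", "dws_orders_daily"),
    ]

    parents, children = defaultdict(set), defaultdict(set)
    for up, down in edges:
        parents[down].add(up)
        children[up].add(down)

    def expand(node: str, direction: str, layers: int = 1) -> set[str]:
        """Expand `layers` levels of parent or child nodes of the Main node,
        mirroring Expand Parent Nodes / Expand Child Nodes in the DAG graph."""
        adjacency = parents if direction == "parents" else children
        frontier, seen = {node}, set()
        for _ in range(layers):
            frontier = {n for f in frontier for n in adjacency[f]} - seen
            seen |= frontier
        return seen

    # By default the DAG shows the Main node plus one layer in each direction:
    print(expand("dws_orders_daily", "parents"))  # {'dwd_orders', 'dim_users'}
    print(expand("dwd_orders", "children"))       # {'dws_orders_daily'}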

Batch operations supported on Integration and Computing Tasks

The following batch operations are supported for recurring integration and computing tasks:

Pause

Pause all selected tasks. Paused tasks still generate instances normally, but those instances and their downstream dependent instances are not scheduled (see the sketch after this entry).
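
A minimal sketch of these pause semantics, using a hypothetical downstream map: instances of the paused tasks are skipped, and so is everything downstream of them.

    # Hypothetical dependency map: task -> its direct downstream tasks.
    downstream = {
        "ods_orders": ["dwd_orders"],
        "dwd_orders": ["dws_orders_daily"],
        "dws_orders_daily": [],
    }

    def unscheduled_tasks(paused: set[str]) -> set[str]:
        """Return every task whose instances will not be scheduled:
        the paused tasks themselves plus everything downstream of them."""
        blocked, stack = set(), list(paused)
        while stack:
            task = stack.pop()
            if task in blocked:
                continue
            blocked.add(task)
            stack.extend(downstream.get(task, []))
        return blocked

    print(unscheduled_tasks({"dwd_orders"}))  # {'dwd_orders', 'dws_orders_daily'}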

Resume

Resume scheduling for selected tasks.

Modify Owner

Batch modify the owners of recurring integration and computing tasks.

Note

This is only applicable to Basic mode and the Prod environment in Dev-Prod mode. The Dev environment does not support modification.

Modify Priority

Batch modify the priorities of recurring integration and computing tasks.

Note
  • This is only applicable to Basic mode and the Prod environment in Dev-Prod mode. The Dev environment does not support modification.

  • If the baseline feature is enabled, task priority can only be configured as Lowest, Low, or Medium. Higher priorities should be configured through baselines.

  • The priority of baseline tasks cannot be modified directly; it is determined by the baseline priority. Adjust it through the baseline instead.

  • If your compute engine type is MaxCompute, the correspondence between Dataphin task priorities and MaxCompute job priorities is Lowest (9), Low (7), Medium (5), High (3), Highest (1). For more information about MaxCompute job priorities, see MaxCompute job priorities.

  • Priority settings for Spark SQL tasks take effect only when the HDFS of the Hadoop compute source is configured with separate task priority queues.

Modify HTTP Path

Modify the production environment HTTP path for multiple DATABRICKS_SQL tasks. If the selected DATABRICKS_SQL tasks belong to different Databricks clusters, you can specify an HTTP path for each cluster separately (see the sketch after this entry). You can select from all HTTP paths configured for the corresponding cluster.

Note

This operation is only supported when DATABRICKS_SQL tasks are selected in the production environment.
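
A minimal sketch of the per-cluster grouping, with hypothetical task records and HTTP paths (in the console, cluster membership and the available paths come from the production project configuration):

    from collections import defaultdict

    tasks = [
        {"id": "t1", "type": "DATABRICKS_SQL", "cluster": "cluster-a"},
        {"id": "t2", "type": "DATABRICKS_SQL", "cluster": "cluster-b"},
        {"id": "t3", "type": "DATABRICKS_SQL", "cluster": "cluster-a"},
    ]
    http_path_per_cluster = {
        "cluster-a": "/sql/1.0/warehouses/abc",
        "cluster-b": "/sql/1.0/warehouses/def",
    }

    # Group the selected tasks by cluster, then assign one HTTP path per
    # group, mirroring how the batch dialog sets paths per cluster.
    by_cluster = defaultdict(list)
    for task in tasks:
        by_cluster[task["cluster"]].append(task["id"])

    for cluster, task_ids in by_cluster.items():
        print(f"{cluster}: set HTTP path {http_path_per_cluster[cluster]} for {task_ids}")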

Modify Resource Group

Batch modify the scheduling resource groups used by instances generated from multiple tasks when they run.

Note
  • If the selected tasks span multiple projects, the target resource group list shows only the scheduling resource groups authorized for all of those projects. We recommend filtering to a single project first and then performing batch settings.

  • The modification affects only newly generated instances, not instances that have already been generated. To change the resource group used by existing instances, do so in the instance list.