Integration and computing task instances are generated when scheduled, auto-triggered integration and computing tasks run. You can perform operations management (O&M) on these instances. Supported operations include viewing operational logs, rerunning the current instance, forcing a rerun of the current instance, forcing a rerun of descendant nodes, and viewing node code. This topic describes how to view and manage integration and computing task instances.
Prerequisites
You can view Gantt charts only after you purchase the Artificial Intelligence for IT Operations value-added service and enable the module for the current tenant.
Accessing the integration and computing task instances page
In the top menu bar of the Dataphin home page, choose Develop > O&M.
In the navigation pane on the left, choose Instance O&M > Recurring Instance.
In the top menu bar, select the production or development environment.
On the Recurring Instance page, click the Integration and Computing Task tab.
Operations supported in the integration and computing task instance list
After auto-triggered integration and computing tasks generate instances, the instances are displayed in a list on the Integration and Computing Task tab. This list displays the instance object, instance ID, status, schedule cycle, data timestamp, scheduled run time, start time, end time, duration, retries/auto-retries, priority, owner, project, related baseline instances, HTTP path, schedule resource group, tags, and supported operations.
Instance Object: A recurring instance object is generated when an auto-triggered task runs. This column displays the name and ID of the instance object and identifies the schedule type of the task. Click the sort icon next to the column name to sort by object name in ascending or descending order. For more information, see Description of recurring instance markers.
Status: The current status of the instance. Possible values are Succeeded, Failed, Running, Waiting for Schedule Time, Throttled, Waiting for Schedule Resources, and Not Run. For more information about the status icons and their details, see Description of recurring instance statuses.
Start Time: The time when the instance starts running. Click the sort icon next to the column name to sort by start time in ascending or descending order.
Note: The start time of a logical table node is the time when the earliest internal materialization node of the instance object starts running.
End Time: The time when the instance stops running. Click the sort icon next to the column name to sort by end time in ascending or descending order.
Note: The end time of a logical table node is the time when the latest internal materialization node of the instance object stops running.
Retries/Auto-retries: The number of manual retries and automatic retries. Retries = Runs - 1.
Duration: The total time that the instance runs. Click the sort icon next to the column name to sort by duration in ascending or descending order.
Note: The duration of a logical table node is the time difference between the start time of the earliest internal materialization node and the end time of the latest internal materialization node.
Priority: The priority level of the instance.
Note: If the baseline feature is enabled, the priority of a baseline task is the highest priority among all its baselines. This overrides the original priority configured for the task.
Project: The project to which the task belongs, displayed in the Project English Name (Project Chinese Name) format.
Related Baseline Instances: The baseline that the node guarantees, and any related baselines that have this node as an ancestor node.
Note: If the baseline feature is disabled, this field is not displayed.
HTTP path: Based on the selected production or development environment, this column displays the production or development HTTP path of the Databricks SQL instance.
Note: This column is displayed only for Databricks SQL instances. For other task types, a hyphen (-) is displayed.
Resource Group: The name of the schedule resource group that the instance uses at runtime.
If the custom resource group specified for the task is not active, the project's default resource group is used. If the project's default resource group is also not active, the tenant's default resource group is used. The priority order is: Custom resource group > Project default resource group > Tenant default resource group.
Note: When you change the project's default resource group, the change may not be immediately reflected in the UI. However, the modified resource group is used for the next run.
Tenant default resource group: This resource group does not belong to any project. Each tenant has only one default resource group. It is used to schedule a task if the task does not have a specified custom resource group or if the project does not have a specified project default resource group. This applies only to exclusive resource tasks and excludes task types such as SQL and virtual tasks.
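The fallback order described above (custom resource group, then project default, then tenant default, skipping any group that is not active) can be sketched as a small selection function. The function and data shapes below are illustrative assumptions, not Dataphin's actual API:

```python
def resolve_resource_group(task_group, project_default, tenant_default):
    """Pick the schedule resource group for a run.

    Fallback order: custom group on the task -> project default ->
    tenant default. A group is usable only if it exists and is active.
    Illustrative sketch only; not Dataphin's implementation.
    """
    for group in (task_group, project_default, tenant_default):
        if group is not None and group.get("active"):
            return group["name"]
    raise RuntimeError("no active schedule resource group available")


# Example: the task's custom group is inactive, so the project default wins.
custom = {"name": "rg_custom", "active": False}
project = {"name": "rg_project_default", "active": True}
tenant = {"name": "rg_tenant_default", "active": True}
print(resolve_resource_group(custom, project, tenant))  # rg_project_default
```

If the project default were also inactive, the same call would fall through to the tenant default, mirroring the priority order in the UI.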
The following table describes the operations supported in the integration and computing task instance list.
Operation | Description |
DAG | Click the DAG icon to view the directed acyclic graph of the instance, including its ancestor and descendant nodes. |
View Operational Log | Click the View Operational Log icon to view the operational logs of the instance. |
Rerun | Click the Rerun icon to rerun the current instance. If your business scenario requires a rerun, you can perform a forced rerun. |
View Gantt Chart | Click the View Gantt Chart icon to view the Gantt chart of the instance. For more information about the Gantt chart, see View the Gantt chart of a critical path. |
Download Ancestor and Descendant Nodes | Downloads a list of the upstream and downstream nodes of the current node. The list includes all columns, including columns that are not displayed in the list. Click Download Ancestor And Descendant Nodes. In the Download Ancestor And Descendant Nodes dialog box, select the levels for the ancestor and descendant nodes. You can select from Layer 1 to Layer 10 or Unlimited Layers. The default for both is Layer 1. After you select the layers, click OK to download the Excel file, which is named |
View Node Code | Click the View Node Code icon to view the code of the node. Logical Code: The task code that you write. Physical Code: The compiled code that can run on the Flink engine. |
Recurring Task | Click the Recurring Task icon to go to the recurring task that generates the instance. |
Edit Development Node | Click the Edit Development Node icon to edit the corresponding development node. Note: You can edit development nodes only for integration and computing task instances in Dev-Prod mode projects. |
View Production Node | Click the View Production Node icon to view the corresponding production node. Note: You can view production nodes only for integration and computing task instances in Dev-Prod mode projects. |
Edit Node | Click the Edit Node icon to edit the node. Note: You can edit nodes only for integration and computing task instances in Basic mode projects. |
Rerun Downstream | Click the Rerun Downstream icon to rerun the descendant instances of the current instance. To rerun the entire dependency chain, we recommend that you force a rerun of the downstream instances. For more information, see Force a rerun of downstream instances. The rerun downstream operation is often used in the following scenarios:
|
Set To Success & Resume | Click the Set To Success & Resume icon to manually set the status of a failed or not run instance to Succeeded so that it can participate in scheduling. |
Stop | Click the Stop icon to stop the instance. Note: You cannot stop instances that are in the Succeeded, Failed, or Not Run state. You can stop instances in any other state. The stop operation is often used in the following scenarios:
|
Forced Rerun | Click the Forced Rerun icon to force a rerun of the current instance.
Important: A forced rerun does not check whether all upstream instances have run successfully or whether the scheduled run time of the current instance has been reached. This can lead to run failures or data quality issues. Before you proceed, make sure that the operation does not affect downstream data. |
Remove Upstream Dependencies | Click the Remove Upstream Dependencies icon to remove upstream dependencies of the instance. Important: You must keep at least one upstream instance. |
Pause | Click the Pause icon to pause the instance. Note
|
Resume | Click the Resume icon to resume a paused instance. |
Modify HTTP Path | Modify the production environment HTTP path of the task. You can select any HTTP path configured for the cluster that corresponds to the production project. Note: This operation is supported only when you select a Databricks SQL task in the production environment. |
Modify Schedule Resource Group | Click the Modify Schedule Resource Group icon to modify the schedule resource group that the instance uses at runtime. Note
|
Modify Priority | Click the Modify Priority icon to modify the priority of the instance. You can select Highest, High, Medium, Low, or Lowest. |
Operations supported for DAG nodes of integration and computing task instances
The Directed Acyclic Graph (DAG) shows the upstream and downstream dependencies of instance nodes. You can also perform O&M on upstream and downstream instance nodes. By default, the DAG displays the main node (the selected node) and its immediate ancestor and descendant nodes. You can select an integration or computing task instance node to perform O&M operations on the instance.
Dataphin supports O&M for instance nodes across different projects. To perform O&M operations on a cross-project instance node, you must have the required view and operation permissions for the project where the instance resides.
Operations supported in the DAG
Operation
Description
Expand Parent Nodes
Expand the ancestor nodes of the main node at different levels in the DAG.
Expand Child Nodes
Expand the descendant nodes of the main node at different levels in the DAG.
View Task
Go to the DAG of the task node that generates the current instance node. You can view the task node details and information about its upstream and downstream nodes, and perform O&M on the task node. For more information, see Auto triggered tasks.
View Operation Logs
View the logs of operations performed on the instance.
Operations supported for DAG nodes
Hover over a DAG node to view its name, type, schedule cycle, owner, and description. The operations supported for DAG nodes are the same as those supported in the instance list. For more information, see Operations supported in the integration and computing task instance list.
Batch operations for integration and computing task instances
The following table describes the batch operations supported for auto-triggered integration and computing tasks.
Operation | Description |
Rerun | Select multiple instances and rerun them in a batch. |
Stop | Select multiple instances and stop them in a batch. |
Set To Success & Resume | Select multiple instances to manually set the status of failed or not run instances to Succeeded in a batch. This allows them to participate in scheduling. |
Pause | Pause multiple recurring instances in a batch. |
Resume | Resume paused recurring instances in a batch. |
Modify HTTP Path | Modify the production environment HTTP path for multiple Databricks SQL instances. If the selected Databricks SQL instances belong to different Databricks clusters, specify an HTTP path for each cluster. You can select any HTTP path configured for the corresponding cluster. Note: This operation is supported only when you select Databricks SQL instances in the production environment. |
Modify Schedule Resource Group | Modify the schedule resource group that instances use at runtime. Note
|
Modify Priority | Modify the priority of the selected instances in a batch. You can select Highest, High, Medium, Low, or Lowest. |
Download All | Download the data of all recurring instances, including integration, computing, and modeling task instances, to your computer. The downloaded file is in the .xlsx format. The table contains the following information: instance object, instance ID, status, schedule cycle, data timestamp, priority, owner, project (if a logical aggregate table belongs to multiple projects, the project names are separated by commas (`, `)), scheduled run time, start time, end time, duration, retries/auto-retries, related baseline instances (if an instance is associated with multiple baselines, the baseline names are separated by commas (`, `)), and schedule resource group (this field is empty for modeling task instances). |
Rerun downstream
In the Rerun Downstream dialog box, configure the parameters.
Note: You cannot rerun descendant nodes that are in the Waiting or Running state. To rerun the entire dependency chain, we recommend that you force a rerun of the downstream instances. For more information, see Force a rerun of downstream instances.
Parameter
Description
Start Node Run Mode
Define the run mode of the start node. You can select Dry-run or Normal Run.
Dry-run: The status of a dry-run instance is Succeeded (Normal). The operational log is empty, no duration is recorded, and no data is processed.
Normal Run: The instance is scheduled as normal.
Downstream Rerun Scope
Select the scope of descendant nodes to rerun.
All Failed Instances: The list of descendant nodes is not displayed. The system automatically selects all descendant instances that have failed and reruns them.
Custom: If you want to specify the descendant instances to rerun, select this option. You can search for nodes by name or ID, or filter them by status, owner, or project.
Click OK.
After you rerun the downstream nodes, the data of the descendant instances is updated.
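The All Failed Instances scope described above can be pictured as a traversal that collects every failed descendant of the start node. The sketch below runs over a toy dependency graph under assumed data shapes; it is illustrative only, not Dataphin's internals:

```python
from collections import deque


def failed_descendants(start, children, status):
    """Collect all failed descendants of `start` via breadth-first traversal.

    `children` maps a node to its direct downstream nodes; `status` maps a
    node to its run status. Illustrative sketch only.
    """
    seen, failed = {start}, []
    queue = deque(children.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a node reachable via multiple paths is visited once
        seen.add(node)
        if status[node] == "Failed":
            failed.append(node)
        queue.extend(children.get(node, []))
    return failed


# D depends on both B and C, but is selected only once.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
status = {"A": "Succeeded", "B": "Failed", "C": "Succeeded", "D": "Failed"}
print(failed_descendants("A", children, status))  # ['B', 'D']
```

The Custom scope corresponds to replacing the automatic status check with a user-supplied selection over the same set of descendants.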
Force a rerun of downstream instances
In the Force Rerun Downstream dialog box, configure the rerun parameters.
Parameter
Description
Start Node Run Mode
Define the run mode of the start node. You can select Dry-run or Normal Run.
Dry-run: The status of a dry-run instance is Succeeded (Normal). The operational log is empty, no duration is recorded, and no data is processed.
Normal Run: The instance is scheduled as normal.
Downstream Forced Rerun Scope
Select the scope of descendant nodes to force a rerun.
All Instances: Select all descendant instance nodes of the start node.
Custom: If you want to specify the descendant instances to rerun, select this option. You can search for nodes by name or ID, or filter them by status, owner, or project.
Click OK.