DataWorks: Backfill data for an auto triggered node and view data backfill instances generated for the node

Last Updated: Aug 15, 2023

You can backfill data for an auto triggered node over a historical or future time range to write the data to time-based partitions. If the node code uses scheduling parameters, the parameters are automatically replaced with values derived from the data timestamp that you specify for the backfill. The data for each data timestamp is then written to the partitions that the node code specifies, so the partitions that are written depend on the logic and content of the node code. This topic describes how to backfill data for an auto triggered node and how to manage the data backfill instances generated for the node.
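To make the parameter replacement concrete, the following Python sketch renders a piece of node code once for each data timestamp in a backfill range. The `${bizdate}` placeholder, the table names, and the yyyymmdd format are assumptions chosen for illustration. The sketch only mimics the effect of the replacement and is not how DataWorks implements it.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical node code. ${bizdate} stands in for a scheduling parameter
# that resolves to the data timestamp in yyyymmdd format (an assumption).
NODE_CODE = Template(
    "INSERT OVERWRITE TABLE sales_daily PARTITION (ds='${bizdate}') "
    "SELECT * FROM sales_src WHERE dt = '${bizdate}';"
)


def render_backfill(start: date, end: date) -> list[str]:
    """Render the node code once for each data timestamp in [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    statements = []
    day = start
    while day <= end:
        statements.append(NODE_CODE.substitute(bizdate=day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    for stmt in render_backfill(date(2021, 1, 11), date(2021, 1, 13)):
        print(stmt)
```

Each rendered statement targets a different `ds` partition, which illustrates why the partitions that a backfill writes are determined by the node code rather than by the backfill settings alone.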

Background information

After an auto triggered node is developed, committed, and deployed to the scheduling system, the scheduling system runs the node based on the scheduling configurations of the node. If you want to run the auto triggered node in a specified time range, you can backfill data for the node. For information about how to backfill data for an auto triggered node, see Backfill data for the node in this topic. The following table describes the supported data backfill modes.

Data backfill mode

Description

Backfill Data for Current Node

This mode allows you to backfill data for the current node.

Current and Descendent Nodes Retroactively

This mode allows you to backfill data for the current node and its descendant nodes in one operation. We recommend this mode if the current node has a small number of descendant nodes. In this mode, you can specify the descendant nodes for which you want to backfill data.

Backfill Data for Massive Nodes

This mode allows you to backfill data for the current node and its descendant nodes in one operation. We recommend this mode if the current node has a large number of descendant nodes. In this mode, you can filter the descendant nodes for which you want to backfill data by workspace. You can configure a whitelist to backfill data for nodes that are not in the selected workspaces, and a blacklist to exclude nodes that are in the selected workspaces.

Advanced Mode

This mode allows you to backfill data for multiple nodes in one operation. The selected nodes do not need to have dependencies on each other. You can select the nodes for which you want to backfill data in the directed acyclic graph (DAG) of an auto triggered node or in the node list on the Cycle Task page.

  • In the DAG, you can use the node aggregation feature to group nodes by workspace, owner, or priority. This way, you can backfill data for multiple nodes at a time by specifying a node group. For more information about a DAG, see Appendix: Use the features provided in a DAG.

  • You can also select nodes in the node list on the Cycle Task page. You can filter nodes based on specific conditions and select the nodes for which you want to backfill data.
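The node aggregation described above amounts to grouping nodes by a single attribute and then selecting a whole group. The following Python sketch shows that idea. The `Node` fields and sample values are hypothetical and do not reflect the DataWorks data model.

```python
from collections import defaultdict
from dataclasses import dataclass

AGGREGATION_KEYS = ("workspace", "owner", "priority")


@dataclass(frozen=True)
class Node:
    # Hypothetical node record used only for this illustration.
    node_id: int
    workspace: str
    owner: str
    priority: int


def aggregate(nodes: list[Node], key: str) -> dict:
    """Group node IDs by workspace, owner, or priority."""
    if key not in AGGREGATION_KEYS:
        raise ValueError(f"unsupported aggregation key: {key}")
    groups: dict = defaultdict(list)
    for node in nodes:
        groups[getattr(node, key)].append(node.node_id)
    return dict(groups)


nodes = [
    Node(101, "ws_a", "alice", 1),
    Node(102, "ws_a", "bob", 3),
    Node(201, "ws_b", "alice", 3),
]
# Selecting one group selects every node in it for the backfill.
selected = aggregate(nodes, "owner")["alice"]
print(selected)  # [101, 201]
```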

Limits

  • You can use the advanced mode only in workspaces that reside in the China (Shenzhen) and UAE (Dubai) regions.

  • Data backfill instances cannot be manually deleted. The system automatically deletes a data backfill instance when its validity period, which is approximately 30 days, elapses. If you no longer need a data backfill instance, you can freeze it.

  • Instances that run on the shared resource group for scheduling are retained for one month (30 days), and logs for the instances are retained for one week (7 days).

  • Instances that run on exclusive resource groups for scheduling are retained for one month (30 days), and logs for the instances are also retained for one month (30 days).

  • When the run logs generated for completed auto triggered node instances exceed 3 MB in size, the system clears the excess logs on a daily basis.
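The retention limits above can be turned into estimated cleanup dates. This Python sketch is only an approximation: the text describes the instance validity period as roughly 30 days, and the resource group labels `shared` and `exclusive` are names chosen for this example.

```python
from datetime import date, timedelta

# Retention periods in days, taken from the limits above.
# The 30-day instance period is approximate according to the text.
RETENTION_DAYS = {
    "shared": {"instance": 30, "log": 7},
    "exclusive": {"instance": 30, "log": 30},
}


def estimated_cleanup(run_date: date, resource_group: str) -> dict[str, date]:
    """Estimate when an instance and its logs stop being available."""
    periods = RETENTION_DAYS[resource_group]
    return {kind: run_date + timedelta(days=days) for kind, days in periods.items()}


print(estimated_cleanup(date(2023, 8, 1), "shared"))
```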

Precautions

  • When DataWorks backfills data for a node over a specified time range and an instance of the node fails on one day in that range, the data backfill instance of the node for that day is also set to failed, and DataWorks does not run the instances of the node for the following day. DataWorks runs the instances of a node for a given day only after all instances of the node for the previous day have run successfully.

  • If you backfill data for a specific day for a node that is scheduled by hour or minute, whether the instances of the node for that day, including its data backfill instances, run in parallel depends on whether self-dependency is configured for the node.

  • If both an auto triggered node instance and a data backfill instance are running for a node, you must stop the data backfill instance to ensure that the auto triggered node instance can be run as expected.

  • If you backfill data for many instances or run a large number of data backfill instances in parallel, scheduling resources may become insufficient. Configure data backfill based on your business requirements and the resources available to you.

  • To prevent data backfill instances from occupying large amounts of resources and affecting the running of auto triggered node instances, the following priority rules apply to data backfill instances:

    • If the data timestamp is the previous day, the priority of the data backfill instance is determined by the priority of the baseline to which the node belongs.

    • If the data timestamp is the day before the previous day, the priority of the node is downgraded based on the following rules:

      • If the priority of the node is 7 or 8, it is downgraded to 3.

      • If the priority of the node is 3 or 5, it is downgraded to 2.

      • If the priority of the node is 1, it remains unchanged.
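Two of the precautions above are mechanical enough to sketch in code: the priority rules and the day-by-day gating after a failure. The following Python code is an illustration, not DataWorks behavior. It assumes that the node belongs to a baseline, applies the downgrade to any data timestamp two or more days in the past (the text only names the day before the previous day), leaves priorities outside the listed values unchanged, and represents each instance outcome as a plain boolean. The status strings are hypothetical labels.

```python
from datetime import date

# Downgrade table for older data timestamps, taken from the rules above.
DOWNGRADE = {8: 3, 7: 3, 5: 2, 3: 2, 1: 1}


def backfill_priority(node_priority: int, baseline_priority: int,
                      data_timestamp: date, today: date) -> int:
    """Return the priority of a data backfill instance."""
    days_back = (today - data_timestamp).days
    if days_back == 1:
        # Previous day: the baseline that the node belongs to decides.
        return baseline_priority
    if days_back >= 2:
        # Assumption: the downgrade also applies to dates older than two days.
        # Assumption: priorities not listed in the rules stay unchanged.
        return DOWNGRADE.get(node_priority, node_priority)
    # Current or future data timestamps are not covered by the rules.
    return node_priority


def run_days_in_order(results_by_day: dict[date, list[bool]]) -> dict[date, str]:
    """Run a day only if every instance of the previous day succeeded."""
    status = {}
    blocked = False
    for day in sorted(results_by_day):
        if blocked:
            status[day] = "not run"
        elif all(results_by_day[day]):
            status[day] = "succeeded"
        else:
            # One failed instance marks the whole day as failed.
            status[day] = "failed"
            blocked = True
    return status


print(backfill_priority(7, 5, date(2023, 8, 13), date(2023, 8, 15)))  # 3
```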

Go to the Patch Data page

  1. Go to the Operation Center page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

  2. In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Cycle Task.

  3. Backfill data for the desired node.

    1. Open the DAG of the desired node.

      You can use one of the following methods to open the DAG of the desired node:

      • Method 1: Click the name of the desired node in the node list to open the DAG of the node.

      • Method 2: Click the Show icon to show the node list. Click DAG in the Actions column of the desired node to open the DAG of the node.

    2. In the DAG, right-click the desired node. In the shortcut menu that appears, move the pointer over Run and select a data backfill mode.


Backfill data for the node

After you select a data backfill mode, configure the parameters in the Backfill Data dialog box and click OK.

Backfill data for the current node

The following table describes the parameters required for this mode.

Parameter

Description

Data Backfill Instance Name

DataWorks automatically generates a data backfill instance name. You can modify the name based on your business requirements.

Node

The name of the node for which you want to backfill data.

Data Timestamp

The data timestamp of the data backfill instance. A data timestamp specifies a specific date.

  • If you want to backfill data for the node for multiple non-consecutive time ranges, click Add Multiple Data Timestamp Ranges to specify multiple data timestamps.

  • If the data timestamp that you specify for a data backfill instance is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. The system immediately runs the data backfill instance after the data timestamp passes.

    For example, if the current date is August 24, 2021, the data timestamp of a data backfill instance is September 17, 2021, and you select Run Retroactive Instances Scheduled to Run after the Current Time, the system runs the data backfill instance on September 18, 2021.

Note

We recommend that you do not set this parameter to a long time range. Otherwise, data backfill instances may be delayed due to insufficient resources.

Concurrency

Specifies whether to run multiple data backfill instances in parallel.

  • If you set Concurrency to No, the data backfill instances are run in sequence based on the data timestamps.

  • If you set Concurrency to Yes, a specified number of data backfill instances are generated based on the data timestamps and run in parallel. Data backfill instances with different data timestamps are run at the same time.

    Note

    If you backfill data for a specific day for a node that is scheduled by hour or minute, whether the instances of the node for that day, including its data backfill instances, run in parallel depends on whether self-dependency is configured for the node.

Number of data backfill instances run in parallel

The number of data backfill instances that are generated and run in parallel during data backfill.

Note

You must configure the number of data backfill instances that are run in parallel if you set Concurrency to Yes.

You can set the number of data backfill instances that are run in parallel to an integer from 2 to 10. The following rules apply when multiple data backfill instances are run in parallel:

  • If the number of data timestamps is less than the number of data backfill instances that are run in parallel, the data backfill instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 4. In this case, a data backfill instance is generated for each of the three data timestamps. The three data backfill instances are run in parallel.

  • If the number of data timestamps is greater than the number of data backfill instances that are run in parallel, some data timestamps are processed in sequence and others in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 2. In this case, two data backfill instances are generated and run in parallel. One of them covers two data timestamps and runs them one after the other.

Alert for Data Backfill

Specifies whether to enable the alerting feature for data backfill.

  • Yes: An alert is generated for data backfill if the trigger condition is met.

  • No: The alerting feature is disabled for data backfill.

Trigger Condition

The trigger condition of an alert for data backfill. Valid values:

  • Alert on Failure or Success: An alert is generated when data backfill succeeds or fails.

  • Alert on Success: An alert is generated when data backfill succeeds.

  • Alert on Failure: An alert is generated when data backfill fails.

Note

This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

Alert Notification Method

The notification method for an alert. Alerts are sent only to the user who initiates the data backfill. Valid values: Text Message and Email, Text Message, Email.

Note
  • This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

  • You can click Inspection contact information to check whether the mobile phone number or email address of the alert recipient is registered. If it is not, configure the alert recipient as described in Configure and view alert contacts.

Order

The sequence based on which data backfill instances are run. Valid values: Ascending by Business Date and Descending by Business Date.

Resource Group for Scheduling

Specifies whether to select another resource group for scheduling to run a data backfill instance. If you use another resource group for scheduling to run a data backfill instance, the data backfill instance does not need to compete for resources with auto triggered node instances.

  • If you set this parameter to Yes, a drop-down list appears, and you can select a resource group for scheduling to run the data backfill instance.

  • If you set this parameter to No, the resource group for scheduling that is configured for the current node is used to run the data backfill instance.

Execution Period

The period of time during which a data backfill instance is run.

  • If you set this parameter to Yes, a time picker appears, in which you can select the cycle on which the data backfill instance runs and the point in time at which it starts to run.

  • If you set this parameter to No, the data backfill instance usually runs immediately. However, if Data Timestamp is set to the current date or a later date and Run Retroactive Instances Scheduled to Run after the Current Time is not selected, the data backfill instance runs as scheduled.
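The parallelism rules in this table can be sketched as a planning function. In this Python example, each inner list is one data backfill instance whose data timestamps run in sequence, and the Order setting is applied before the split. The text does not say which data timestamps share an instance, so the contiguous split used here is an assumption, as is the function name.

```python
from datetime import date, timedelta


def plan_parallel_backfill(start: date, end: date, parallelism: int,
                           descending: bool = False) -> list[list[date]]:
    """Split the data timestamps in [start, end] across parallel instances."""
    if not 2 <= parallelism <= 10:
        raise ValueError("parallelism must be an integer from 2 to 10")
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    if descending:  # Descending by Business Date
        days.reverse()
    # Never create more instances than there are data timestamps.
    count = min(parallelism, len(days))
    # Assumption: contiguous chunks; earlier instances take one extra date.
    base, extra = divmod(len(days), count)
    plan, pos = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        plan.append(days[pos:pos + size])
        pos += size
    return plan


# January 11 to 13 with parallelism 2: one instance covers two dates.
for instance in plan_parallel_backfill(date(2021, 1, 11), date(2021, 1, 13), 2):
    print([d.isoformat() for d in instance])
```

With a parallelism of 4, the same range yields three single-date instances, which matches the first rule in the table.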

Backfill data for the current node and its descendant nodes

The following table describes the parameters required for this mode.

Parameter

Description

Data Backfill Instance Name

DataWorks automatically generates a data backfill instance name. You can modify the name based on your business requirements.

Data Timestamp

The data timestamp of the data backfill instance. A data timestamp specifies a specific date.

  • If you want to backfill data for the node for multiple non-consecutive time ranges, click Add Multiple Data Timestamp Ranges to specify multiple data timestamps.

  • If the data timestamp that you specify for a data backfill instance is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. The system immediately runs the data backfill instance after the data timestamp passes.

    For example, if the current date is August 24, 2021, the data timestamp of a data backfill instance is September 17, 2021, and you select Run Retroactive Instances Scheduled to Run after the Current Time, the system runs the data backfill instance on September 18, 2021.

Note

We recommend that you do not set this parameter to a long time range. Otherwise, data backfill instances may be delayed due to insufficient resources.

Concurrency

Specifies whether to run multiple data backfill instances in parallel.

  • If you set Concurrency to No, the data backfill instances are run in sequence based on the data timestamps.

  • If you set Concurrency to Yes, a specified number of data backfill instances are generated based on the data timestamps and run in parallel. Data backfill instances with different data timestamps are run at the same time.

    Note

    If you backfill data for a specific day for a node that is scheduled by hour or minute, whether the instances of the node for that day, including its data backfill instances, run in parallel depends on whether self-dependency is configured for the node.

Number of data backfill instances run in parallel

The number of data backfill instances that are generated and run in parallel during data backfill.

Note

You must configure the number of data backfill instances that are run in parallel if you set Concurrency to Yes.

You can set the number of data backfill instances that are run in parallel to an integer from 2 to 10. The following rules apply when multiple data backfill instances are run in parallel:

  • If the number of data timestamps is less than the number of data backfill instances that are run in parallel, the data backfill instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 4. In this case, a data backfill instance is generated for each of the three data timestamps. The three data backfill instances are run in parallel.

  • If the number of data timestamps is greater than the number of data backfill instances that are run in parallel, some data timestamps are processed in sequence and others in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 2. In this case, two data backfill instances are generated and run in parallel. One of them covers two data timestamps and runs them one after the other.

Alert for Data Backfill

Specifies whether to enable the alerting feature for data backfill.

  • Yes: An alert is generated for data backfill if the trigger condition is met.

  • No: The alerting feature is disabled for data backfill.

Trigger Condition

The trigger condition of an alert for data backfill. Valid values:

  • Alert on Failure or Success: An alert is generated when data backfill succeeds or fails.

  • Alert on Success: An alert is generated when data backfill succeeds.

  • Alert on Failure: An alert is generated when data backfill fails.

Note

This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

Alert Notification Method

The notification method for an alert. Alerts are sent only to the user who initiates the data backfill. Valid values: Text Message and Email, Text Message, Email.

Note
  • This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

  • You can click Inspection contact information to check whether the mobile phone number or email address of the alert recipient is registered. If it is not, configure the alert recipient as described in Configure and view alert contacts.

Order

The sequence based on which data backfill instances are run. Valid values: Ascending by Business Date and Descending by Business Date.

Resource Group for Scheduling

Specifies whether to select another resource group for scheduling to run a data backfill instance. If you use another resource group for scheduling to run a data backfill instance, the data backfill instance does not need to compete for resources with auto triggered node instances.

  • If you set this parameter to Yes, a drop-down list appears, and you can select a resource group for scheduling to run the data backfill instance.

  • If you set this parameter to No, the resource group for scheduling that is configured for the current node is used to run the data backfill instance.

Execution Period

The period of time during which a data backfill instance is run.

  • If you set this parameter to Yes, a time picker appears, in which you can select the cycle on which the data backfill instance runs and the point in time at which it starts to run.

  • If you set this parameter to No, the data backfill instance usually runs immediately. However, if Data Timestamp is set to the current date or a later date and Run Retroactive Instances Scheduled to Run after the Current Time is not selected, the data backfill instance runs as scheduled.

Nodes

You can filter nodes by name and level and select the nodes for which you want to backfill data.

Note
  • A fuzzy search is supported when you search for the desired node by node name. After you enter a keyword, all nodes whose names contain the keyword are displayed in the table below the search box.

  • The search scope includes the current node and its descendant nodes of all levels. You can select the current node and some or all of its descendant nodes.
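The node filter in this mode searches the current node and all of its descendants. The following Python sketch models that search over a hypothetical adjacency list. The case-sensitive substring match and the use of the shortest distance from the current node as a node's level are assumptions; DataWorks may define levels differently.

```python
from collections import deque
from typing import Optional


def levels_from(graph: dict[str, list[str]], root: str) -> dict[str, int]:
    """Breadth-first walk from root; the current node is level 0."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in levels:
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


def search_nodes(graph: dict[str, list[str]], root: str, keyword: str = "",
                 max_level: Optional[int] = None) -> list[str]:
    """Fuzzy (substring) search by name, optionally limited by level."""
    return sorted(
        name
        for name, level in levels_from(graph, root).items()
        if keyword in name and (max_level is None or level <= max_level)
    )


# Hypothetical lineage in which ods_orders is the current node.
graph = {
    "ods_orders": ["dwd_orders", "dwd_users"],
    "dwd_orders": ["ads_report"],
    "dwd_users": ["ads_report"],
}
print(search_nodes(graph, "ods_orders", keyword="dwd"))
```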

Backfill data for a large number of nodes

The following table describes the parameters required for this mode.

Parameter

Description

Data Backfill Instance Name

DataWorks automatically generates a data backfill instance name. You can modify the name based on your business requirements.

Data Timestamp

The data timestamp of the data backfill instance. A data timestamp specifies a specific date.

  • If you want to backfill data for the node for multiple non-consecutive time ranges, click Add Multiple Data Timestamp Ranges to specify multiple data timestamps.

  • If the data timestamp that you specify for a data backfill instance is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. The system immediately runs the data backfill instance after the data timestamp passes.

    For example, if the current date is August 24, 2021, the data timestamp of a data backfill instance is September 17, 2021, and you select Run Retroactive Instances Scheduled to Run after the Current Time, the system runs the data backfill instance on September 18, 2021.

Note

We recommend that you do not set this parameter to a long time range. Otherwise, data backfill instances may be delayed due to insufficient resources.

Alert for Data Backfill

Specifies whether to enable the alerting feature for data backfill.

  • Yes: An alert is generated for data backfill if the trigger condition is met.

  • No: The alerting feature is disabled for data backfill.

Trigger Condition

The trigger condition of an alert for data backfill. Valid values:

  • Alert on Failure or Success: An alert is generated when data backfill succeeds or fails.

  • Alert on Success: An alert is generated when data backfill succeeds.

  • Alert on Failure: An alert is generated when data backfill fails.

Note

This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

Alert Notification Method

The notification method for an alert. Alerts are sent only to the user who initiates the data backfill. Valid values: Text Message and Email, Text Message, Email.

Note
  • This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

  • You can click Inspection contact information to check whether the mobile phone number or email address of the alert recipient is registered. If it is not, configure the alert recipient as described in Configure and view alert contacts.

Order

The sequence based on which data backfill instances are run. Valid values: Ascending by Business Date and Descending by Business Date.

Resource Group for Scheduling

Specifies whether to select another resource group for scheduling to run a data backfill instance. If you use another resource group for scheduling to run a data backfill instance, the data backfill instance does not need to compete for resources with auto triggered node instances.

  • If you set this parameter to Yes, a drop-down list appears, and you can select a resource group for scheduling to run the data backfill instance.

  • If you set this parameter to No, the resource group for scheduling that is configured for the current node is used to run the data backfill instance.

Execution Period

The period of time during which a data backfill instance is run.

  • If you set this parameter to Yes, a time picker appears, in which you can select the cycle on which the data backfill instance runs and the point in time at which it starts to run.

  • If you set this parameter to No, the data backfill instance usually runs immediately. However, if Data Timestamp is set to the current date or a later date and Run Retroactive Instances Scheduled to Run after the Current Time is not selected, the data backfill instance runs as scheduled.

Select Nodes Requiring Data Backfill by Workspace

You can select workspaces in the Available Workspaces section and add them to the Selected Workspaces section. This way, you can backfill data for the desired nodes in the selected workspaces.

Note
  • A fuzzy search is supported when you search for the desired workspace by keyword. After you enter a keyword, all workspaces whose names contain the keyword are displayed in both sections.

  • You can select only workspaces that reside in the current region.

  • You can configure a whitelist to backfill data for some nodes that are not in the selected workspaces. You can also configure a blacklist to avoid backfilling data for some nodes that are in the selected workspaces.

  • You can specify whether to backfill data for the current node.

    • If you select Current Node, the system backfills data for the current node and its descendant nodes.

    • If you clear Current Node, the current node performs a dry run, and the system backfills data for the descendant nodes of the current node.

    For information about dry-run instances, see Dry-run instances.

Node Whitelist

Select nodes that are not in the selected workspaces but for which you want to backfill data.

Note

You can search for nodes only by node ID.

Node Blacklist

Select nodes in the selected workspaces for which you do not want to backfill data.

Note

You can search for nodes only by node ID.
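The workspace selection, whitelist, blacklist, and Current Node option together determine one set of nodes. The following Python sketch shows one way to compute that set. It assumes that whitelisted nodes must still be descendants of the current node and that the blacklist is applied last; the text states neither rule, and the function name and node IDs are hypothetical.

```python
def resolve_backfill_nodes(current_id: int, descendants: dict[int, str],
                           selected_workspaces: set[str], whitelist: set[int],
                           blacklist: set[int], include_current: bool = True):
    """Return (IDs to backfill, IDs to dry-run).

    descendants maps each descendant node ID to its workspace name.
    """
    chosen = {nid for nid, ws in descendants.items() if ws in selected_workspaces}
    # Assumption: whitelisted IDs count only if they are descendants.
    chosen |= whitelist.intersection(descendants)
    # Assumption: the blacklist wins over workspace selection and whitelist.
    chosen -= blacklist
    dry_run = set()
    if include_current:
        chosen.add(current_id)
    else:
        # Clearing Current Node makes the current node perform a dry run.
        dry_run.add(current_id)
    return chosen, dry_run


descendants = {101: "ws_a", 102: "ws_a", 201: "ws_b", 202: "ws_b"}
backfill, dry = resolve_backfill_nodes(
    100, descendants, {"ws_a"}, whitelist={201}, blacklist={102},
    include_current=False,
)
print(sorted(backfill), sorted(dry))
```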

Backfill data in advanced mode

In advanced mode, you can use the node aggregation feature provided by the DAG of an auto triggered node to group nodes by workspace, owner, or priority. The selected nodes do not need to have dependencies on each other. To backfill data in advanced mode, perform the following steps:

  1. Select the nodes for which you want to backfill data.

    • In the DAG of an auto triggered node, you can click the Not Aggregate, Aggregate By Workspace, Aggregate By Owner, or Aggregate By Priority icon in the area marked with 1 to use the node aggregation feature. This way, you can group nodes by workspace, owner, or priority. You can select the check box in the upper-right corner of a group to select all the nodes in the group in the area marked with 2. For more information about the node aggregation feature of a DAG, see Appendix: Use the features provided in a DAG.

    • You can also select nodes in the node list on the Cycle Task page. You can search for the desired nodes based on different conditions such as the node name, node type, owner, and resource group for scheduling in the area marked with 3. You can select the auto triggered nodes for which you want to backfill data in the area marked with 4 and click Add in the lower part of the page.

      Note

      This way, the system generates data backfill instances for all the selected auto triggered nodes. If you want to generate data backfill instances for a specific auto triggered node, click the name of the node in the node list to open the DAG of the node. In the DAG, right-click the node and select a data backfill mode to backfill data for the node based on your business requirements.

  2. View the selected nodes.

    After the nodes for which you want to backfill data are selected, you can view the selected nodes in the Run dialog box in the area marked with 5. You can also perform the following operations:

    • Click the Locate icon next to the name of a node to open the DAG of the node. You can re-select the nodes for which you want to backfill data.

    • Click the Delete icon next to the name of a node to remove the node.

  3. In the Run dialog box in the area marked with 5, click Configure to configure the parameters for data backfill. The following table describes the parameters required for this mode.

    Parameter

    Description

    Data Backfill Instance Name

    DataWorks automatically generates a data backfill instance name. You can modify the name based on your business requirements.

    Selected Nodes

    The number of nodes for which you want to backfill data. You can click Change to change the nodes for which you want to backfill data.

    Data Timestamp

    The data timestamp of the data backfill instance. A data timestamp specifies a specific date.

    • If you want to backfill data for the node for multiple non-consecutive time ranges, click Add Multiple Data Timestamp Ranges to specify multiple data timestamps.

    • If the data timestamp that you specify for a data backfill instance is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. The system immediately runs the data backfill instance after the data timestamp passes.

      For example, if the current date is August 24, 2021, the data timestamp of a data backfill instance is September 17, 2021, and you select Run Retroactive Instances Scheduled to Run after the Current Time, the system runs the data backfill instance on September 18, 2021.

    Note

    We recommend that you do not set this parameter to a long time range. Otherwise, data backfill instances may be delayed due to insufficient resources.

    Concurrency

    Specifies whether to run multiple data backfill instances in parallel.

    • If you set Concurrency to No, the data backfill instances are run in sequence based on the data timestamps.

    • If you set Concurrency to Yes, a specified number of data backfill instances are generated based on the data timestamps and run in parallel. Data backfill instances with different data timestamps are run at the same time.

      Note

      If you backfill data for a specific day for a node that is scheduled by hour or minute, whether the instances of the node for that day, including its data backfill instances, run in parallel depends on whether self-dependency is configured for the node.

    Number of data backfill instances run in parallel

    The number of data backfill instances that are generated and run in parallel during data backfill.

    Note

    You must configure the number of data backfill instances that are run in parallel if you set Concurrency to Yes.

    You can set the number of data backfill instances that are run in parallel to an integer from 2 to 10. The following rules apply when multiple data backfill instances are run in parallel:

    • If the number of data timestamps is less than the number of data backfill instances that are run in parallel, the data backfill instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 4. In this case, a data backfill instance is generated for each of the three data timestamps. The three data backfill instances are run in parallel.

    • If the number of data timestamps is greater than the number of data backfill instances that are run in parallel, some data timestamps are processed in sequence and others in parallel. For example, the data timestamps are from January 11 to January 13, and you set the number of data backfill instances that are run in parallel to 2. In this case, two data backfill instances are generated and run in parallel. One of them covers two data timestamps and runs them one after the other.

    Alert for Data Backfill

    Specifies whether to enable the alerting feature for data backfill.

    • Yes: An alert is generated for data backfill if the trigger condition is met.

    • No: The alerting feature is disabled for data backfill.

    Trigger Condition

    The trigger condition of an alert for data backfill. Valid values:

    • Alert on Failure or Success: An alert is generated when data backfill succeeds or fails.

    • Alert on Success: An alert is generated when data backfill succeeds.

    • Alert on Failure: An alert is generated when data backfill fails.

    Note

    This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

    Alert Notification Method

    The notification method for an alert. Alerts are sent only to the user who initiates the data backfill. Valid values: Text Message and Email, Text Message, Email.

    Note
    • This parameter is required only if you select Yes for the Alert for Data Backfill parameter.

    • You can click Inspection contact information to check whether the mobile phone number or email address of the alert recipient is registered. If it is not, configure the alert recipient as described in Configure and view alert contacts.

    Order

    The order in which data backfill instances are run. Valid values: Ascending by Business Date and Descending by Business Date.

    Resource Group for Scheduling

    Specifies whether to run data backfill instances on a different resource group for scheduling. If you use a different resource group for scheduling, data backfill instances do not compete for resources with the instances of auto triggered nodes.

    • If you set this parameter to Yes, a drop-down list appears, and you can select a resource group for scheduling to run the data backfill instance.

    • If you set this parameter to No, the resource group for scheduling that is configured for the current node is used to run the data backfill instance.

    Execution Period

    The period of time during which a data backfill instance is run.

    • If you set this parameter to Yes, a time picker appears, in which you can select the period during which the data backfill instance can run and the point in time at which it starts to run.

    • If you set this parameter to No, the data backfill instance runs immediately in most cases. The exception is when you set Data Timestamp to the current date or a later date and do not select Run Retroactive Instances Scheduled to Run after the Current Time: in that case, the data backfill instance runs as scheduled.
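The two parallelism rules described for Number of data backfill instances run in parallel, together with the Order parameter, can be illustrated with a small planning function. This is a minimal Python sketch, not DataWorks code: the function name `plan_parallel_backfill` and its parameters are hypothetical, daily data timestamps are assumed, and the round-robin assignment of data timestamps to instances is an assumption, because this topic does not state which data timestamps share an instance.

```python
from datetime import date, timedelta


def plan_parallel_backfill(start: date, end: date, parallelism: int,
                           ascending: bool = True) -> list[list[date]]:
    """Assign the daily data timestamps in [start, end] to parallel instances.

    Each inner list is one data backfill instance. The instances run in
    parallel; each instance processes its own data timestamps one after another.
    Hypothetical helper for illustration only.
    """
    if not 2 <= parallelism <= 10:
        raise ValueError("parallelism must be an integer from 2 to 10")
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Order: Ascending by Business Date or Descending by Business Date.
    if not ascending:
        days.reverse()
    # Never create more instances than there are data timestamps.
    slots = min(parallelism, len(days))
    plan: list[list[date]] = [[] for _ in range(slots)]
    # Round-robin assignment is an assumption of this sketch.
    for i, day in enumerate(days):
        plan[i % slots].append(day)
    return plan
```

For the example in this topic (January 11 to January 13), a parallelism of 4 yields three single-day instances, and a parallelism of 2 yields one instance covering January 11 and January 13 and another covering January 12, so one instance runs a second time.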
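The interaction between Alert for Data Backfill and Trigger Condition can be summarized as a small decision function. This Python sketch is illustrative only: `TriggerCondition` and `should_alert` are hypothetical names, and the sketch decides only whether an alert is raised; it does not model the notification method or send anything.

```python
from enum import Enum


class TriggerCondition(Enum):
    # Values mirror the Trigger Condition options described above.
    FAILURE_OR_SUCCESS = "Alert on Failure or Success"
    SUCCESS = "Alert on Success"
    FAILURE = "Alert on Failure"


def should_alert(alert_enabled: bool, condition: TriggerCondition,
                 succeeded: bool) -> bool:
    """Decide whether a finished data backfill raises an alert (illustrative)."""
    if not alert_enabled:
        # Alert for Data Backfill is set to No, so Trigger Condition is ignored.
        return False
    if condition is TriggerCondition.FAILURE_OR_SUCCESS:
        return True
    if condition is TriggerCondition.SUCCESS:
        return succeeded
    # TriggerCondition.FAILURE
    return not succeeded
```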
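The Execution Period rules can be read as a decision about when a data backfill instance starts. The following Python sketch is a simplified reading of that description, not the scheduler's actual logic: `start_mode` is a hypothetical function, and the `run_future_instances_now` flag stands in for the Run Retroactive Instances Scheduled to Run after the Current Time option. Because the description says "in most cases", real behavior can differ from this sketch.

```python
from datetime import date


def start_mode(execution_period_enabled: bool, data_timestamp: date,
               today: date, run_future_instances_now: bool) -> str:
    """Describe when a data backfill instance starts (simplified, hypothetical)."""
    if execution_period_enabled:
        # Execution Period = Yes: the instance runs in the selected period.
        return "within the configured execution period"
    if data_timestamp >= today and not run_future_instances_now:
        # Current or future data timestamp without the "run now" option.
        return "as scheduled"
    return "immediately"
```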

Manage data backfill instances

After you configure the preceding settings, data backfill instances are generated. You can then go to the Patch Data page in Operation Center to view the details and status of data backfill instances, and to stop or rerun them.

Area 1

In this area, you can specify filter conditions to search for a data backfill instance. You can also terminate multiple running data backfill instances at a time.

For example, you can search for a data backfill instance by node name, node ID, or one or more of the following conditions: Retroactive Instance Name, Created By, Creation Date, Status, Data Timestamp, My Nodes, and Initiated by Me.

Note
  • You can click Show Search Options if you want to specify more filter conditions such as Node Type, Scheduling Resource Group, and Engine Instance.

  • A fuzzy search is supported when you search for the desired node by node name. After you enter a keyword, all nodes whose names contain the keyword are displayed.

Area 2

In this area, you can view the following information about a data backfill instance:

  • Node Name: the name of the data backfill instance. Click the Show icon next to the name to view information about the instance in the area marked with 3, such as the date on which the data backfill instance is run and details about the nodes for which it is generated.

  • Check Status: the check status of the data backfill instance.

  • Running Status: the status of the data backfill instance. The data backfill instance can be in one of the following states: running, not running, waiting for resources, exception, or stopped.

  • Created By: the Alibaba Cloud account that created the data backfill instance.

  • Creation Date: the date when the data backfill instance is generated.

  • Nodes: the number of nodes for which the data backfill instance is generated.

  • Data Timestamp: the data timestamp of the data backfill instance, which is the business date of the data that is backfilled.

In this area, you can also perform the following operations on data backfill instances:

  • Batch Terminate Data Backfill Nodes: You can click this button to terminate multiple data backfill instances at a time. In the dialog box that appears, select the data backfill instances that are running or waiting to be run, and terminate them. After the instances are terminated, their status is set to failed.

    Note
    • Data backfill instances cannot be manually deleted. The system deletes data backfill instances after their validity period elapses. The validity period of data backfill instances is approximately 30 days. If you do not need to use a data backfill instance, you can freeze it.

    • You cannot stop data backfill instances that have failed, are in the Not Running state, or have succeeded.

  • Batch Rerun: You can rerun multiple data backfill instances at a time.

    Note

    Only failed data backfill instances can be rerun in batches.

  • Reuse: You can reuse an existing data backfill instance to quickly specify the nodes for which you want to backfill data.

    Note

    Data backfill instances that are generated in Backfill Data for Massive Nodes mode cannot be reused.

Area 3

In this area, you can view the following information about each node for which the data backfill instance is generated:

  • Name: the name of the node for which the data backfill instance is generated. You can click the node name to open the DAG of the node and view the details about the node.

  • Workspace: the workspace to which the node belongs.

  • Owner: the owner of the workspace to which the node belongs.

  • Schedule: the scheduling time of the node.

  • Start run time: the time when the node starts to run.

  • End Time: the time when the node finishes running.

  • Runtime: the time consumed to run the node.

In this area, you can also perform the following operations on a node:

  • Stop: If the node is running or waiting to be run, you can stop the node. After the node is stopped, its status is set to failed.

    Note

    You cannot stop nodes that have failed, are in the Not Running state, or have succeeded.

  • Rerun: You can rerun the node.

    Note

    You can rerun only nodes that have failed or succeeded.

  • Rerun Descendant Nodes: You can rerun the descendant nodes of the node.

  • Set Status to Successful: You can set the status of the node to successful.

  • Freeze: You can freeze the node to pause the scheduling of the node.

    Note

    You cannot freeze a node that is waiting for resources, waiting for the scheduling time, or running. A node is considered to be running while its code is being executed or while its data quality is being checked.

  • Unfreeze: If the node is frozen, you can unfreeze the node to resume the scheduling of the node.

  • View Lineage: You can view the lineage of the node.

Area 4

You can select multiple nodes in the area marked with 3 and click Stop or Rerun in the area marked with 4 to stop or rerun the selected nodes at a time.
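The search behavior described for the area marked with 1 can be sketched as a filter over data backfill records. In this Python sketch, `BackfillRecord` and `search` are hypothetical, only a few of the filter conditions are modeled, and combining the conditions with a logical AND is an assumption. The node name condition uses a substring match to mirror the fuzzy search described in the note; whether the console's match is case-sensitive is not stated, so names are compared as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackfillRecord:
    # Hypothetical subset of the fields shown on the Patch Data page.
    node_name: str
    node_id: int
    status: str
    created_by: str


def search(records: list[BackfillRecord], name_keyword: Optional[str] = None,
           status: Optional[str] = None,
           created_by: Optional[str] = None) -> list[BackfillRecord]:
    """Return the records that match every condition that is set."""
    result = []
    for r in records:
        # Fuzzy search: the node name only needs to contain the keyword.
        if name_keyword is not None and name_keyword not in r.node_name:
            continue
        if status is not None and r.status != status:
            continue
        if created_by is not None and r.created_by != created_by:
            continue
        result.append(r)
    return result
```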
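The restrictions on stopping, rerunning, and freezing described for the areas marked with 2 and 3 can be collected into a single eligibility check. The following Python sketch is an illustration, not a DataWorks API: the `Status` names are hypothetical, treating "waiting to be run" as the two waiting states (waiting for the scheduling time and waiting for resources) is an assumption, and so is excluding nodes that are already frozen from Freeze.

```python
from enum import Enum, auto


class Status(Enum):
    # Hypothetical names for the statuses mentioned in this topic.
    SUCCEEDED = auto()
    NOT_RUNNING = auto()
    FAILED = auto()
    RUNNING = auto()
    WAITING_FOR_TIME = auto()
    WAITING_FOR_RESOURCES = auto()
    FROZEN = auto()


# Stop / Batch Terminate: only running or waiting instances.
STOPPABLE = {Status.RUNNING, Status.WAITING_FOR_TIME, Status.WAITING_FOR_RESOURCES}
# Rerun on a single node: failed or succeeded.
RERUNNABLE_NODE = {Status.FAILED, Status.SUCCEEDED}
# Batch Rerun: failed only.
RERUNNABLE_BATCH = {Status.FAILED}
# Freeze is blocked while waiting or running; FROZEN is excluded by assumption.
NOT_FREEZABLE = {Status.WAITING_FOR_RESOURCES, Status.WAITING_FOR_TIME,
                 Status.RUNNING, Status.FROZEN}


def allowed_operations(status: Status) -> set[str]:
    """List the operations that this sketch permits for a given status."""
    ops: set[str] = set()
    if status in STOPPABLE:
        ops.add("stop")
    if status in RERUNNABLE_NODE:
        ops.add("rerun")
    if status in RERUNNABLE_BATCH:
        ops.add("batch_rerun")
    if status not in NOT_FREEZABLE:
        ops.add("freeze")
    if status is Status.FROZEN:
        ops.add("unfreeze")
    return ops


def stop(status: Status) -> Status:
    """Stopping a running or waiting instance sets its status to failed."""
    if status not in STOPPABLE:
        raise ValueError(f"cannot stop an instance in state {status.name}")
    return Status.FAILED
```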

Instance status

An instance can be in one of the following states:

  • Succeeded

  • Not Running

  • Run Failed

  • Running

  • Waiting Time

  • Frozen

FAQ