In DataWorks, you can generate retroactive data for an auto triggered node to run the node in a specified date range. You can stop, rerun, and unfreeze the generated retroactive instances on the Patch Data page. This topic describes how to generate retroactive data and manage retroactive instances for an auto triggered node.

Background information

After an auto triggered node is developed, committed, and deployed to the scheduling system, the scheduling system runs the node as scheduled. If you want to run the auto triggered node in a specified date range, you can generate retroactive data for the node. For more information, see Generate retroactive data. You can select the following modes to generate retroactive data:
  • Current Node Retroactively: This mode is used to generate retroactive data for the current node.
  • Current and Descendent Nodes Retroactively: This mode is used to generate retroactive data for the current and descendant nodes at a time. We recommend that you use this mode when the number of the descendant nodes is small. In this mode, you can generate retroactive data for some of the descendant nodes.
  • Mass Nodes Retroactively: This mode is used to generate retroactive data for the current and descendant nodes at a time. We recommend that you use this mode when the number of the descendant nodes is large. In this mode, you can filter descendant nodes by workspace. You can set a whitelist to generate retroactive instances for the nodes that are not in the selected workspaces. You can also set a blacklist to prevent the generation of retroactive instances for the nodes that are included in the selected workspaces.
  • Advanced Mode: This mode is used to generate retroactive data for multiple nodes at a time. The nodes that you select do not need to depend on each other. You can select the nodes for which you want to generate retroactive data in the directed acyclic graph (DAG) or in the node list on the Cycle Task page.
    • In the DAG, you can use the node aggregation feature to group nodes by workspace, owner, or priority. This way, you can generate retroactive data for the node group.
    • You can also select nodes in the node list on the Cycle Task page. You can filter nodes based on specific conditions and select the nodes for which you want to generate retroactive data.
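
The workspace filter, whitelist, and blacklist of the Mass Nodes Retroactively mode can be pictured as simple set operations. The following Python sketch is only an illustrative model of the behavior described above, not DataWorks code or a DataWorks API. The Node class, the select_mass_nodes function, and the workspace names are hypothetical, and the sketch assumes that the blacklist takes precedence if a node appears in both lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a descendant node."""
    node_id: int
    workspace: str


def select_mass_nodes(descendants, selected_workspaces,
                      whitelist_ids=(), blacklist_ids=()):
    """Return the IDs of descendant nodes that would get retroactive instances."""
    selected = set()
    for node in descendants:
        in_workspace = node.workspace in selected_workspaces
        # Whitelist: include nodes that are NOT in the selected workspaces.
        if in_workspace or node.node_id in whitelist_ids:
            selected.add(node.node_id)
    # Blacklist: exclude nodes even though their workspace is selected.
    # Assumption: the blacklist wins if a node is in both lists.
    return selected - set(blacklist_ids)


descendants = [Node(1, "ws_a"), Node(2, "ws_a"), Node(3, "ws_b"), Node(4, "ws_c")]
chosen = select_mass_nodes(descendants, {"ws_a"},
                           whitelist_ids={3}, blacklist_ids={2})
print(sorted(chosen))  # [1, 3]
```

In this example, node 3 is added through the whitelist although its workspace is not selected, and node 2 is dropped through the blacklist although its workspace is selected.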

Limits

  • You can use the advanced mode only in workspaces in the China (Shenzhen) and UAE (Dubai) regions.
  • You can stop multiple retroactive instances at a time, but you cannot delete them at a time. A retroactive instance is automatically deleted about 30 days after it expires.

Considerations

  • When DataWorks generates retroactive data for a node for a specific time range, if one instance of the node fails on a day within the time range, the retroactive instance for that day is also set to failed. DataWorks will not run the instances of this node for the next day. To sum up, DataWorks runs the instances of a node on a day only when all its instances on the previous day are successful.
  • For a self-dependent auto triggered node, the first retroactive instance depends on the last-cycle instance of the node on the previous day. If that last-cycle instance exists but has not been run, the retroactive instance cannot be run. If no last-cycle instance exists on the previous day, the retroactive instance is directly run.
  • If both an auto triggered node instance and a retroactive instance are running for a node, you must stop the retroactive instance to ensure that the auto triggered node instance can be run as expected.
  • A large number of retroactive instances or concurrent instances may lead to insufficient resources for the recurring schedule. Make sure that the number of instances is appropriate based on your business requirements.
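
The day-by-day rule in the first consideration can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not the DataWorks scheduler: the run_backfill function and the run_day callback are hypothetical, the year 2021 is arbitrary, and the sketch covers only the non-parallel case in which data timestamps are processed in ascending order.

```python
from datetime import date, timedelta


def run_backfill(start, end, run_day):
    """Process one data timestamp at a time and stop at the first failure.

    run_day is a hypothetical callback that returns True only if every
    instance of the node for that data timestamp succeeds.
    """
    results = {}
    day = start
    while day <= end:
        ok = run_day(day)
        results[day] = "succeeded" if ok else "failed"
        if not ok:
            break  # Instances for later days are not run.
        day += timedelta(days=1)
    return results


# Simulate a failure on January 12.
outcome = run_backfill(
    date(2021, 1, 11),
    date(2021, 1, 13),
    run_day=lambda d: d != date(2021, 1, 12),
)
for day, state in outcome.items():
    print(day.isoformat(), state)
```

In this simulation, January 11 succeeds, January 12 fails, and January 13 never appears in the results because the instances for that day are not run.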

Generate retroactive data

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the DataStudio page, click the Cycle Task icon in the upper-left corner and choose All Products > Operation Center.
  3. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
  4. Generate retroactive data for the required nodes.
    1. Click the name of an auto triggered node in the node list to open the DAG.
      You can also click the Show icon to show the node list. Find the required node and click DAG in the Actions column to open the DAG.
    2. Right-click the node in the DAG. In the shortcut menu that appears, move the pointer over Run and select a mode for generating retroactive data. In the dialog box that appears, configure the parameters.
    Note You can also perform this step on the Cycle Task page. Click the Show icon to show the auto triggered node list. Find the required auto triggered node, click Patch Data in the Actions column, and then select a mode for generating retroactive data.
    Figure: Modes for generating retroactive data
    The following tables describe the parameters that you must set when you select different modes for generating retroactive data:
    • Generate retroactive data in Current Node Retroactively mode.
      Figure: Generate retroactive data for the current node
      The following table describes the parameters.
      Parameter Description
      Retroactive Instance Name

      DataWorks automatically generates a retroactive instance name. You can change the name as needed.

      Node

      The name of the node for which you want to generate retroactive data.

      Data Timestamp
      The data timestamp range of the retroactive instances. A data timestamp is a date-based timestamp.
      • If you want to generate retroactive data for the node in multiple non-consecutive date ranges, click Add multi-segment business date.
      • If the start date of the timestamp is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. When the start date passes, the system automatically runs the retroactive instance.

        For example, if the current date is August 24, 2021 and the start date of the timestamp is September 17, 2021, the system runs the retroactive instance on September 18, 2021.

      Note We recommend that you do not set this parameter to a long range. Otherwise, the retroactive instance may be delayed due to insufficient resources.
      Parallelism
      Specifies whether to run multiple retroactive instances in parallel.
      • If you do not select Parallelism, the retroactive instances are run in sequence based on the data timestamps.
      • If you select Parallelism, a specific number of retroactive instances are generated based on the data timestamps and run in parallel. The number of retroactive instances is specified by the Number of Concurrent Nodes parameter. Instances with different data timestamps can be run at the same time.
      Number of Concurrent Nodes
      The number of retroactive instances that are created and run in parallel during the generation of retroactive data.
      Note This parameter is required if Parallelism is selected.
      You can set the Number of Concurrent Nodes parameter to an integer from 2 to 10. The following rules are applied when multiple retroactive instances are run in parallel:
      • If the number of data timestamps is smaller than the number of parallel instances, the retroactive instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 4. In this case, a retroactive instance is generated for each of the three data timestamps. These three retroactive instances are run in parallel.
      • If the number of data timestamps is larger than the number of parallel instances, the system runs some instances in sequence and some in parallel based on the data timestamps. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 2. In this case, two retroactive instances are generated. They run in parallel once, and then one of them runs a second time to cover the remaining data timestamp.
      Order

      Valid values: Ascending by Business Date and Descending by Business Date. You can generate retroactive data in an ascending or descending order of data timestamps.

    • Select Current and Descendent Nodes Retroactively.
      Figure: Generate retroactive data for the current node and its descendant nodes
      The following table describes the parameters.
      Parameter Description
      Retroactive Instance Name

      DataWorks automatically generates a retroactive instance name. You can change the name as needed.

      Data Timestamp
      The data timestamp range of the retroactive instances. A data timestamp is a date-based timestamp.
      • If you want to generate retroactive data for the node in multiple non-consecutive date ranges, click Add multi-segment business date.
      • If the start date of the timestamp is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. When the start date passes, the system automatically runs the retroactive instance.

        For example, if the current date is August 24, 2021 and the start date of the timestamp is September 17, 2021, the system runs the retroactive instance on September 18, 2021.

      Note We recommend that you do not set this parameter to a long range. Otherwise, the retroactive instance may be delayed due to insufficient resources.
      Parallelism
      Specifies whether to run multiple retroactive instances in parallel.
      • If you do not select Parallelism, the retroactive instances are run in sequence based on the data timestamps.
      • If you select Parallelism, a specific number of retroactive instances are generated based on the data timestamps and run in parallel. The number of retroactive instances is specified by the Number of Concurrent Nodes parameter. Instances with different data timestamps can be run at the same time.
      Number of Concurrent Nodes
      You can set the Number of Concurrent Nodes parameter to an integer from 2 to 10. The following rules are applied when multiple retroactive instances are run in parallel:
      • If the number of data timestamps is smaller than the number of parallel instances, the retroactive instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 4. In this case, a retroactive instance is generated for each of the three data timestamps. These three retroactive instances are run in parallel.
      • If the number of data timestamps is larger than the number of parallel instances, the system runs some instances in sequence and some in parallel based on the data timestamps. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 2. In this case, two retroactive instances are generated. They run in parallel once, and then one of them runs a second time to cover the remaining data timestamp.
      Order

      Valid values: Ascending by Business Date and Descending by Business Date. You can generate retroactive data in an ascending or descending order of data timestamps.

      Nodes You can filter nodes by name and level and select the nodes for which you want to generate retroactive data.
      Note
      • You can search nodes by name in fuzzy match mode. When you enter a keyword, all the nodes whose name contains the keyword appear in the table below the search box.
      • The search scope includes the current node and its descendant nodes of all levels. You can select the current node and some or all of its descendant nodes.
    • Select Mass Nodes Retroactively.
      Figure: Generate retroactive data for a large number of nodes
      The following table describes the parameters.
      Parameter Description
      Retroactive Instance Name

      DataWorks automatically generates a retroactive instance name. You can change the name as needed.

      Data Timestamp
      The data timestamp range of the retroactive instances. A data timestamp is a date-based timestamp.
      • If you want to generate retroactive data for the node in multiple non-consecutive date ranges, click Add multi-segment business date.
      • If the start date of the timestamp is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. When the start date passes, the system automatically runs the retroactive instance.

        For example, if the current date is August 24, 2021 and the start date of the timestamp is September 17, 2021, the system runs the retroactive instance on September 18, 2021.

      Note We recommend that you do not set this parameter to a long range. Otherwise, the retroactive instance may be delayed due to insufficient resources.
      Order

      Valid values: Ascending by Business Date and Descending by Business Date. You can generate retroactive data in an ascending or descending order of data timestamps.

      Select Nodes Requiring Data Backfill by Workspace You can select workspaces in the Available Workspaces section and add them to the Selected Workspaces section. This way, you can generate retroactive data for all the nodes in the workspaces that you select.
      Note
      • You can search workspaces by name in fuzzy match mode. When you enter a keyword, the workspaces whose name contains the keyword appear in the sections.
      • You can select workspaces only in the current region.
      • You can set a whitelist to generate retroactive data for the nodes that are not in the selected workspaces. You can also set a blacklist to prevent the generation of retroactive data for the nodes that are included in the selected workspaces.
      • You can specify whether to generate retroactive data for the current node.
        • If you select Current Node, retroactive instances are generated for the current node and its descendant nodes.
        • If you clear Current Node, a dry-run instance is generated for the current node and retroactive instances are generated for its descendant nodes.
      Node Whitelist You can select the nodes that are not in the selected workspaces and generate retroactive data for the nodes.
      Note You can search for nodes only by node ID.
      Node Blacklist You can select the nodes in the selected workspaces for which you do not want to generate retroactive data.
      Note You can search for nodes only by node ID.
    • Select Advanced Mode.
      In advanced mode, you can use the node aggregation feature of the DAG to group nodes by workspace, owner, or priority. You can generate retroactive data for nodes that have no dependencies with each other.
      Figure: Generate retroactive data in advanced mode
      The following steps describe how to generate retroactive data in this mode:
      1. Select the nodes for which you want to generate retroactive data.
        • In the DAG of the auto triggered node, you can use the node aggregation feature by clicking the Not Aggregate, Aggregate By Workspace, Aggregate By Owner, or Aggregate By Priority icon in Section 1. This groups nodes by workspace, owner, or priority. To select all the nodes in a group in Section 2, select the check box in the upper-right corner of the group.
        • You can also select nodes in the node list on the Cycle Task page. You can filter nodes based on different conditions such as the node name, node type, owner, and scheduling resource group dimension in Section 3. You can select the auto triggered node for which you want to generate retroactive data in Section 4. Click Add at the bottom of the node list.
          Note This way, the system generates retroactive data for the auto triggered node and all its descendant nodes. If you want to generate retroactive data for only some of the descendant nodes, click the name of the auto triggered node to enter the DAG and select the descendant nodes for which you want to generate retroactive data.
      2. View the selected nodes.
        After the nodes for which you want to generate retroactive data are selected, you can view the selected nodes in the Run panel in Section 5. You can also perform the following operations:
        • Click the Locate icon after the name of a node to open the DAG of the node. You can re-select the nodes for which you want to generate retroactive data.
        • Click the Delete icon after the name of a node to remove the node.
      3. In the Run panel in Section 5, click Configure to set the parameters for the generation of retroactive data.
        Figure: Advanced mode
        The following table describes the parameters.
        Parameter Description
        Retroactive Instance Name

        DataWorks automatically generates a retroactive instance name. You can change the name as needed.

        Selected Nodes The number of nodes for which you want to generate retroactive data. You can click Change to change the nodes for which you want to generate retroactive data.
        Data Timestamp
        The data timestamp range of the retroactive instances. A data timestamp is a date-based timestamp.
        • If you want to generate retroactive data for the node in multiple non-consecutive date ranges, click Add multi-segment business date.
        • If the start date of the timestamp is later than the current date, you can select Run Retroactive Instances Scheduled to Run after the Current Time. When the start date passes, the system automatically runs the retroactive instance.

          For example, if the current date is August 24, 2021 and the start date of the timestamp is September 17, 2021, the system runs the retroactive instance on September 18, 2021.

        Note We recommend that you do not set this parameter to a long range. Otherwise, the retroactive instance may be delayed due to insufficient resources.
        Parallelism
        Specifies whether to run multiple retroactive instances in parallel.
        • If you do not select Parallelism, the retroactive instances are run in sequence based on the data timestamps.
        • If you select Parallelism, a specific number of retroactive instances are generated based on the data timestamps and run in parallel. The number of retroactive instances is specified by the Number of Concurrent Nodes parameter. Instances with different data timestamps can be run at the same time.
        Number of Concurrent Nodes
        You can set the Number of Concurrent Nodes parameter to an integer from 2 to 10. The following rules are applied when multiple retroactive instances are run in parallel:
        • If the number of data timestamps is smaller than the number of parallel instances, the retroactive instances are run in parallel. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 4. In this case, a retroactive instance is generated for each of the three data timestamps. These three retroactive instances are run in parallel.
        • If the number of data timestamps is larger than the number of parallel instances, the system runs some instances in sequence and some in parallel based on the data timestamps. For example, the data timestamps are from January 11 to January 13, and you set the Number of Concurrent Nodes parameter to 2. In this case, two retroactive instances are generated. They run in parallel once, and then one of them runs a second time to cover the remaining data timestamp.
        Order

        Valid values: Ascending by Business Date and Descending by Business Date. You can generate retroactive data in an ascending or descending order of data timestamps.

  5. Click OK to start to generate retroactive data.
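
The Parallelism and Number of Concurrent Nodes rules in the preceding parameter tables can be illustrated with a short Python sketch. The sketch only models how many retroactive instances are generated and how many data timestamps each one covers. The split_into_groups function is hypothetical, the year 2021 is arbitrary, and the assumption that data timestamps are divided into contiguous, evenly sized groups is ours. This topic does not describe how DataWorks actually assigns data timestamps to instances.

```python
from datetime import date, timedelta


def split_into_groups(start, end, concurrency):
    """Split a data timestamp range into groups, one per retroactive instance.

    Groups run in parallel with each other; the data timestamps inside
    one group run in sequence. Contiguous, even splitting is an assumption.
    """
    if not 2 <= concurrency <= 10:
        raise ValueError("Number of Concurrent Nodes must be an integer from 2 to 10")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Never create more instances than there are data timestamps.
    group_count = min(concurrency, len(days))
    base, extra = divmod(len(days), group_count)
    groups, index = [], 0
    for g in range(group_count):
        size = base + (1 if g < extra else 0)
        groups.append(days[index:index + size])
        index += size
    return groups


jan11, jan13 = date(2021, 1, 11), date(2021, 1, 13)
print([len(g) for g in split_into_groups(jan11, jan13, 4)])  # [1, 1, 1]
print([len(g) for g in split_into_groups(jan11, jan13, 2)])  # [2, 1]
```

With a concurrency of 4, each of the three data timestamps gets its own instance and all three run in parallel. With a concurrency of 2, two instances are generated and one of them covers two data timestamps, which matches the examples in the parameter tables.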

Manage retroactive instances

In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Patch Data. You can view the details and status of retroactive instances, and stop or rerun the instances. For more information about how to go to Operation Center, see the steps described in the "Generate retroactive data" section. The following table describes the operations that you can perform in each section of the following figure.
Figure: Manage retroactive instances
Section Description
1 In this section, you can specify filter conditions to search for specific retroactive instances.

You can set filter conditions such as the node name, node ID, retroactive instance name, creator, creation time, status, and data timestamp. You can also filter retroactive instances based on the nodes that you own or initiate.

Note
  • You must click Show Search Options if you want to set more filter conditions such as Node Type, Scheduling Resource Group, and Engine Instance.
  • You can search for nodes by node name in fuzzy match mode. When you enter a keyword, all the nodes whose name contains the keyword appear in the table below.
2
In this section, you can view details of retroactive instances, including:
  • Node Name: the name of the retroactive instance. Click the Show icon before the name of the retroactive instance to check information in Section 3, such as the date when the retroactive instance is run and details of the nodes for which the instance is generated.
  • Check Status: the check status of the retroactive instance.
  • Running status: the status of the retroactive instance. The retroactive instance can be in one of the following states: running, not running, waiting for resources, exception, or stopping.
  • Created by: the Alibaba Cloud account that is used to create the retroactive instance.
  • Creation Date: the date when the retroactive instance is created.
  • Nodes: the number of nodes for which the instance is generated.
  • Data Timestamp: the date when the instance is run.
In this section, you can also perform the following operations on retroactive instances:
  • Stop: You can stop multiple retroactive instances that are running or waiting for resources at a time. This way, the instances are set to failed.
    Note
    • You can stop multiple retroactive instances at a time, but you cannot delete them at a time. A retroactive instance is automatically deleted about 30 days after it expires.
    • You cannot stop instances that are not running, have succeeded, or have failed.
  • Batch Rerun: You can rerun multiple retroactive instances at a time.
    Note Only instances that are set to failed can be rerun in a batch.
  • Reuse: You can reuse a group of nodes for which the retroactive instance is generated. This facilitates your selection of nodes for which you want to generate a retroactive instance.
3
In this section, you can view the details of the nodes for which the retroactive instance is generated. The details include:
  • Name: the name of the node. Click the name. Then, you can view the DAG and details of the node.
  • Owner: the owner of the workspace to which the node belongs.
  • Schedule: the time scheduled to run the node.
  • Start run time: the time when the node starts to run.
  • End time: the time when the node stops running.
  • Runtime: the amount of time consumed to run the node.
In this section, you can also perform the following operations on nodes:
  • Stop: You can stop the nodes that are running or waiting for resources. This way, the nodes are set to failed.
    Note You cannot stop the nodes that are not running, succeeded, or failed.
  • Rerun: You can rerun nodes.
    Note You can rerun only nodes that are successful or failed.
  • More > Rerun Descendant Nodes: Rerun the descendant nodes of a node.
  • More > Set Status to Successful: Set the status of a node to succeeded.
  • More > Freeze: Freeze a node and pause the scheduling of the node.
  • More > Unfreeze: Resume the scheduling of a frozen node.
  • More > View Lineage: View the lineage of a node.
4 You can select multiple nodes in Section 3 and click Stop or Rerun in Section 4. This way, you can stop or rerun multiple nodes at a time.
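
The stop and rerun rules in the preceding table can be summarized as checks on the state of an instance or node. The following Python sketch restates those rules and uses the six states listed in the Instance states section. The State enum and the three helper functions are hypothetical names used only for illustration. They are not part of any DataWorks API.

```python
from enum import Enum


class State(Enum):
    SUCCEEDED = "Succeeded"
    NOT_RUNNING = "Not running"
    FAILED = "Failed"
    RUNNING = "Running"
    WAITING = "Waiting for resources"
    FROZEN = "Frozen"


def can_stop(state):
    """Stop applies only to items that are running or waiting for resources."""
    return state in {State.RUNNING, State.WAITING}


def can_rerun(state):
    """Rerun in Section 3 applies only to successful or failed nodes."""
    return state in {State.SUCCEEDED, State.FAILED}


def can_batch_rerun(state):
    """Batch Rerun in Section 2 applies only to failed instances."""
    return state is State.FAILED


for state in State:
    print(f"{state.value}: stop={can_stop(state)}, "
          f"rerun={can_rerun(state)}, batch_rerun={can_batch_rerun(state)}")
```

A check of this kind can help you decide which instances to select before you click Stop or Batch Rerun, because the operation takes effect only for instances in an eligible state.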

Instance states

No. State
1 Succeeded
2 Not running
3 Failed
4 Running
5 Waiting for resources
6 Frozen