Retroactive instances are generated when DataWorks generates retroactive data for auto triggered nodes. You can manage retroactive instances. For example, you can view the running status of retroactive instances and stop, rerun, or unfreeze retroactive instances.

After an auto triggered node is developed, committed, and deployed to the scheduling system, the system runs the node as scheduled. If you want to perform computing on historical data in a time period, you can generate retroactive data for the node. The generated retroactive instance is run based on the specified data timestamp.

Limits

  • When DataWorks generates retroactive data for a node over a time range and an instance of the node fails on one day in that range, the retroactive instance for that day is set to Failed, and DataWorks does not run the instances of the node for the next day. In other words, DataWorks runs the instances of a node for a day only after all of the node's instances for the previous day have run successfully.
  • For a self-dependent auto triggered node, if the first instance for which retroactive data is generated has a last-cycle instance on the previous day that has not run, the retroactive instance cannot run. If that first instance has no last-cycle instance on the previous day, the retroactive instance runs directly.
  • DataWorks generates alerts only for auto triggered node instances that fail.
  • If an auto triggered node instance is running for a node, the retroactive and test instances of the node can be run only after the running of the auto triggered node instance is complete.
  • If both an auto triggered node instance and a retroactive instance are running for a node, you must stop the retroactive instance to ensure that the auto triggered node instance can be normally run.
  • You can stop multiple retroactive instances at a time based on your business requirements. However, you cannot delete multiple retroactive instances at a time. A retroactive instance is automatically deleted about 30 days after it expires.

  • Directed acyclic graphs (DAGs) have the following limits:
    • Only users of the DataWorks Professional Edition or a more advanced edition can use the node aggregation, upstream analysis, and downstream analysis features provided by DAGs.
    • Users of the DataWorks Basic Edition or Standard Edition can use the node aggregation, upstream analysis, and downstream analysis features provided by DAGs free of charge on a trial basis until May 31, 2021. From June 1, 2021, they must upgrade to the DataWorks Professional Edition before they can use these features. For more information about DataWorks editions, see DataWorks advanced editions.
    • The node aggregation, upstream analysis, and downstream analysis features of DAGs are available only for auto triggered nodes that are deployed in the China (Shenzhen) region.
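
The day-by-day rule in the first limit can be illustrated with a minimal Python sketch. It only simulates the behavior described above and does not call any DataWorks API. The function name run_backfill and the succeeds callback are hypothetical, and the state names are borrowed from the instance states listed at the end of this topic.

```python
from datetime import date, timedelta


def run_backfill(start, end, succeeds):
    """Simulate a day-by-day backfill over [start, end].

    `succeeds` is a hypothetical callback that takes a data timestamp
    (a date) and returns True if the instance for that day runs successfully.
    """
    results = {}
    blocked = False
    day = start
    while day <= end:
        if blocked:
            # A previous day failed, so this day's instance is never started.
            results[day] = "Not running"
        elif succeeds(day):
            results[day] = "Run successfully"
        else:
            results[day] = "Run failed"
            blocked = True
        day += timedelta(days=1)
    return results


# Example: the instance for January 12 fails, so January 13 is not run.
outcome = run_backfill(
    date(2021, 1, 11),
    date(2021, 1, 13),
    succeeds=lambda d: d != date(2021, 1, 12),
)
for day, state in outcome.items():
    print(day, state)
```

In this sketch, a failure on one day blocks every later day in the range, which mirrors the limit that a day runs only after the previous day succeeds.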

Patch Data

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Find the required workspace and click Data Analytics.
  4. Click the icon in the upper-left corner and choose All Products > Operation Center.
  5. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
  6. On the Cycle Task page, click the rightward arrow in the middle of the page to show the node list. Find the required node, click Patch Data, and then select a mode for generating retroactive data.
    You can also right-click the node in the directed acyclic graph (DAG), move the pointer over Run, and then select a mode for generating retroactive data.

Generate retroactive data for the current node

  1. Find the required node and choose Patch Data > Current Node Retroactively.
  2. In the Patch Data dialog box, configure the parameters.
    Parameter | Description
    Retroactive Instance Name | DataWorks automatically generates a retroactive instance name for your node. You can modify the name.
    Data Timestamp | The data timestamp of the retroactive instance.
    Node | The name of the node for which you want to generate retroactive data. You cannot change the name.
    Parallelism | Specifies whether to generate multiple retroactive instances at a time.
    • If you do not select Parallelism, only one retroactive instance is generated. The retroactive instance is run multiple times in sequence based on the data timestamp.
    • If you select Parallelism, you can specify up to 10 retroactive instances to generate retroactive data for the node at the same time.
      The retroactive instances are run in parallel based on the data timestamp.
      • If the number of days in the data timestamp range is less than the number of parallel groups, one retroactive instance is generated for each day, and all the instances are run in parallel. For example, the data timestamp is from January 11 to January 13, and you set Number of Concurrent Nodes to 4. In this case, three retroactive instances are generated, one for each day within the data timestamp range, and they are run in parallel.
      • If the number of days in the data timestamp range is greater than the number of parallel groups, some instances are run multiple times in sequence while the others are run in parallel with them. For example, the data timestamp is from January 11 to January 13, and you set Number of Concurrent Nodes to 2. In this case, two retroactive instances are generated. They are run in parallel once, and then one of them is run a second time to cover the remaining day.
  3. Click OK.
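
The two Parallelism cases described in step 2 can be worked through with a short Python sketch. This is only an illustration under stated assumptions: the function plan_parallel_backfill is hypothetical, and the way days are assigned to instances (contiguous, evenly sized chunks) is an assumption, because this topic states only how many instances are generated and that some of them run more than once.

```python
from datetime import date, timedelta

MAX_PARALLEL_GROUPS = 10  # the dialog box allows no more than 10


def plan_parallel_backfill(start, end, parallel_groups):
    """Return one list of data timestamps per retroactive instance.

    Each inner list is run in sequence by one instance; the instances
    themselves run in parallel. The chunking strategy is an assumption.
    """
    if not 1 <= parallel_groups <= MAX_PARALLEL_GROUPS:
        raise ValueError("parallel_groups must be between 1 and 10")
    if end < start:
        raise ValueError("end must not be earlier than start")

    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Never generate more instances than there are days to process.
    instance_count = min(len(days), parallel_groups)
    base, extra = divmod(len(days), instance_count)

    plan, index = [], 0
    for i in range(instance_count):
        size = base + (1 if i < extra else 0)
        plan.append(days[index:index + size])
        index += size
    return plan


# January 11 to January 13 with 4 groups: three instances, one day each.
print([len(chunk) for chunk in plan_parallel_backfill(date(2021, 1, 11), date(2021, 1, 13), 4)])
# January 11 to January 13 with 2 groups: two instances, one of which runs twice.
print([len(chunk) for chunk in plan_parallel_backfill(date(2021, 1, 11), date(2021, 1, 13), 2)])
```

Passing 1 as parallel_groups corresponds to leaving Parallelism unselected: a single instance processes every data timestamp in sequence.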

Generate retroactive data for the current node and its descendant nodes

  1. Find the required node and choose Patch Data > Current and Descendent Nodes Retroactively.
  2. In the Patch Data dialog box, configure the parameters.
    Parameter | Description
    Retroactive Instance Name | DataWorks automatically generates a retroactive instance name for your node. You can modify the name.
    Data Timestamp | The data timestamp of the retroactive instance.
    Parallelism | Specifies whether to generate multiple retroactive instances at a time.
    • If you do not select Parallelism, only one retroactive instance is generated.
    • If you select Parallelism, you can specify no more than 10 retroactive instances to generate retroactive data at the same time for the node.
    Nodes | You can specify the Node Name and Node Type parameters to filter and select nodes for which you want to generate retroactive data.
  3. Click OK.

Generate retroactive data for a large number of nodes

  1. Find the required node and choose Patch Data > Mass Nodes Retroactively.
  2. In the Patch Data dialog box, configure the parameters.
    Parameter | Description
    Retroactive Instance Name | DataWorks automatically generates a retroactive instance name for your node. You can modify the name.
    Data Timestamp | The data timestamp of the retroactive instance.
    Note: We recommend that you do not set this parameter to a long range. Otherwise, the retroactive instance may be delayed due to insufficient resources.
    Parallelism | Specifies whether to generate multiple retroactive instances at a time.
    • If you do not select Parallelism, only one retroactive instance is generated.
    • If you select Parallelism, you can specify no more than 10 retroactive instances to generate retroactive data at the same time for the node.
    Nodes | Specifies whether to include the current node.
    • If you select Current Node, retroactive instances are generated for the current node and its descendant nodes.
    • If you clear Current Node, a dry-run instance is generated for the current node and retroactive instances are generated for its descendant nodes.
    Workspaces | You can select workspaces in the Available Workspaces section and add them to the Selected Workspaces section. Fuzzy match is supported when you search for workspaces in the Available Workspaces section.
    Node Whitelist | The nodes that are outside the selected workspaces but for which you want to generate retroactive data.
    Note: You can search for nodes only by node ID.
    Node Blacklist | The nodes that are inside the selected workspaces but for which you do not want to generate retroactive data.
    Note: You can search for nodes only by node ID.
  3. Click OK.
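
The interaction of the Workspaces, Node Whitelist, and Node Blacklist parameters can be read as simple set operations. The following Python sketch shows one such reading. The function select_backfill_nodes and the sample node IDs are hypothetical, and the behavior when a node appears in both lists is not covered in this topic, so the sketch assumes that the whitelist is applied last.

```python
def select_backfill_nodes(nodes_by_workspace, selected_workspaces,
                          whitelist=(), blacklist=()):
    """Return the set of node IDs that would receive retroactive data.

    nodes_by_workspace: dict that maps a workspace name to its node IDs.
    whitelist: node IDs outside the selected workspaces to include.
    blacklist: node IDs inside the selected workspaces to exclude.
    """
    selected = set()
    for workspace in selected_workspaces:
        selected |= set(nodes_by_workspace.get(workspace, ()))
    selected -= set(blacklist)   # drop excluded nodes of selected workspaces
    selected |= set(whitelist)   # add nodes from other workspaces (assumed to win)
    return selected


# Hypothetical workspaces and node IDs.
nodes = {
    "ws_sales": [1001, 1002, 1003],
    "ws_marketing": [2001, 2002],
    "ws_finance": [3001],
}
print(sorted(select_backfill_nodes(
    nodes,
    selected_workspaces=["ws_sales", "ws_marketing"],
    whitelist=[3001],
    blacklist=[1002],
)))
```

Both lists are keyed by node ID in the sketch because the dialog box lets you search for whitelist and blacklist nodes only by node ID.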

Generate retroactive data for a specific node in a node group

Workflows that are created in DataWorks V1.0 are automatically converted to node groups in Operation Center of DataWorks V2.0. To generate retroactive data for a specific node in a node group, perform the following steps:
  1. In the left-side navigation pane of the Operation Center page, choose Cycle Task Maintenance > Cycle Task. On the Cycle Task page, find the required node and click DAG in the Actions column to open the DAG of the node.
  2. Right-click Node Group and select View Internal Nodes.
  3. On the page that appears, select the topmost ancestor node of the node for which you want to generate retroactive data. Then, click the Copy icon next to Node ID in the lower-right corner.
  4. Return to the Cycle Task page and enter the copied node ID to search for the node.
  5. Open the DAG of the node that is found, right-click the node, and then choose Run > Current and Descendent Nodes Retroactively.
  6. Select the specific node for which you want to generate retroactive data in the node group.
Note You can search for an inner node based on its node group, but you cannot search for a node group based on an inner node.

Instances

Figure: Instances
Operation | Description
Filter | You can search for the required instances by specifying the filter conditions in the section marked with 1 in the preceding figure. You can search for instances by node name or node ID. You can also specify the following conditions to search for your desired instance: Retroactive Instance Name, Node Type, Owner, Run At, Data Timestamp, Region, Engine Type, Engine Instance, Baseline, and My Nodes.
Note: By default, the data timestamp is set to the day before the current day.
DAG | Allows you to open the DAG of the current instance to view the running results of the instances.
Stop | Allows you to stop the instance. You can stop an instance only if it is in the Waiting time or Running state. After you perform this operation, the instance enters the Run failed state.
Note: You can stop multiple retroactive instances at a time based on your business requirements. However, you cannot delete multiple retroactive instances at a time. A retroactive instance is automatically deleted about 30 days after it expires.
Rerun | Allows you to rerun the instance.
Rerun Descendent Nodes | Allows you to rerun the descendant nodes of the current node.
Freeze | Allows you to freeze the current node and pause the scheduling of the node.
Unfreeze | Allows you to resume the scheduling of the frozen node.
View Lineage | Allows you to view the lineage of the node.

Manage retroactive instances in a DAG

Click DAG in the Actions column that corresponds to a node to view the DAG of the node. You can perform the following operations in a DAG:
  • Aggregate nodes
    If an auto triggered node has multiple ancestor and descendant nodes, or these nodes are distributed across multiple levels, you can aggregate the nodes by dimensions such as node status, workspace, owner, and priority. You can then view the number of nodes for each value of the selected dimension. This helps you understand the number of nodes from different dimensions and helps the system run the nodes. The following figures show the node distribution when the ancestor and descendant nodes of an auto triggered node are not aggregated or are aggregated by priority.
    • The following figure shows the node distribution when the ancestor and descendant nodes of an auto triggered node are not aggregated.
      Figure: Ancestor and descendant nodes of an auto triggered node not aggregated
    • The following figure shows the node distribution when the ancestor and descendant nodes of an auto triggered node are aggregated by priority. The figure shows at a glance that the current auto triggered node has six descendant nodes whose priority is 1.
      Figure: Ancestor and descendant nodes of an auto triggered node aggregated by priority
  • Analyze ancestor nodes
    In most cases, an auto triggered node has upstream and downstream relationships. If an auto triggered node is not run for a long time, you can analyze the ancestor nodes of the node. You can view the ancestor node that blocks the running of the node in the DAG of the node, and quickly locate and troubleshoot the issue. This improves the running efficiency of the node.
    Note You can analyze the ancestor nodes of only the auto triggered nodes that are not run.
    The following figure shows how to analyze the ancestor nodes of an auto triggered node. For example, the 2_ node is not run for a long time. In this case, you can click the node and click Upstream Analysis in the upper-left corner to analyze the ancestor nodes of the node.
    Figure: An auto triggered node is not run
    The analysis result shows that the ancestor nodes that block the running of the 2_ node are the table data synchronization and metric statistics nodes. Then, you can quickly troubleshoot the issue based on the analysis result.
  • Analyze descendant nodes
    If an auto triggered node has multiple descendant nodes or the descendant nodes of an auto triggered node are distributed at multiple levels, you can analyze the descendant nodes of the auto triggered node. You can aggregate the descendant nodes by node status, workspace, owner, or priority. Then, you can view the number of nodes at different levels from your required dimension or the total number of nodes at all levels from your required dimension.
    Note
    • By default, the descendant nodes of an auto triggered node are aggregated by owner. The system calculates the total number of nodes at all levels from the owner dimension.
    • If you analyze the descendant nodes of an auto triggered node, the analysis result is displayed by level, and a maximum of six levels of nodes can be displayed. If you want to view more levels of nodes, click Continue Analysis in the upper-left corner.
    In the following example, the descendant nodes of the tag node are analyzed. The following figures show the analysis results that are displayed by using different methods.
    • The descendant nodes of the tag node are aggregated based on the workspaces to which the descendant nodes belong, and the analysis result is presented by level. This way, the number of the descendant nodes in different workspaces is displayed at different levels.
      Figure: Analysis result displayed by level
    • The descendant nodes of the tag node are aggregated based on the workspaces to which the descendant nodes belong, and the analysis result is presented by using the merging method. This way, all the descendant nodes are placed at the same level, and the number of the descendant nodes that belong to different workspaces is displayed.
      Figure: Analysis result displayed by using the merging method
  • Select a display pattern for a DAG

    You can click the icons in the upper-right corner of a DAG panel to adjust the display pattern of the DAG based on your business requirements. For example, you can click Toggle Full Screen View or Fit Screen to perform the operation.

    In the following examples, the DAG of the 0_2 node is displayed after the descendant nodes of the 0_2 node are ungrouped or grouped:
    • The following figure shows the DAG of the 0_2 node when the descendant nodes of the 0_2 node are ungrouped. In this pattern, you can clearly view the upstream and downstream relationships of all the nodes.
      Figure: DAG of the 0_2 node when the descendant nodes of the 0_2 node are ungrouped
    • The following figure shows the DAG of the 0_2 node when the descendant nodes of the 0_2 node are grouped. In this pattern, every five descendant nodes of the 0_2 node are placed at the same level. This way, these descendant nodes are displayed in an orderly manner, and you can quickly obtain the total number of the descendant nodes.
      Figure: DAG of the 0_2 node when the descendant nodes of the 0_2 node are grouped
  • Right-click your desired node in a DAG and perform operations on the node.
    Note After you click the Refresh icon in the upper-right corner, only the DAG of the instance is refreshed, but the operational logs of the instance are not.
    Operation | Description
    Show Ancestor Nodes or Show Descendent Nodes | If a workflow contains three or more nodes, specific nodes are automatically hidden in the DAG in Operation Center. You can select the number of levels to view all nodes at one or more levels.
    View Runtime Log | Allows you to view the operational logs of the current instance if it is in the Running, Successful, or Failed state.
    View Code | Allows you to view the code of the current instance.
    Edit Node | Allows you to go to the DataStudio page to modify the current node.
    View Lineage | Allows you to view the lineage of the current instance.
    Stop | Allows you to stop the instance. You can stop an instance only if it is in the Waiting time or Running state. After you perform this operation, the instance enters the Run failed state.
    Rerun | Allows you to rerun the instance if it is in the Failed state or an abnormal state.
    Rerun Descendent Nodes | Allows you to rerun all the descendant instances of the current instance. If multiple descendant instances exist, all these instances are rerun.
    Set Status to Successful | Allows you to set the status of the current instance to Successful and run its pending descendant instances. Perform this operation if an instance fails.
    Note: Only the status of a failed instance can be set to successful. This operation does not apply to workflows.
    Emergency Operations | You can perform emergency operations only in emergencies. An emergency operation takes effect only once and only on the current node.
    Select Delete Dependencies to delete the dependencies of the current node. You can perform this operation to start the current node if the ancestor instances fail and the current instance does not depend on the data of the ancestor instances.
    Freeze | Allows you to freeze the current instance and pause the scheduling of the instance.
    Unfreeze | Allows you to resume the scheduling of the frozen instance.
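
The descendant analysis described earlier in this section (aggregation by a dimension, displayed by level or merged, with at most six levels per analysis) can be sketched in Python as a breadth-first traversal. This is an illustration only and not the DataWorks implementation. The functions analyze_descendants and merge_levels, the sample DAG, and the rule that a node reachable at several depths is counted once at its shallowest level are all assumptions.

```python
from collections import Counter


def analyze_descendants(children, attrs, root, dimension="owner", max_levels=6):
    """Count descendant nodes per level, grouped by one dimension.

    children: dict that maps a node to the list of its direct descendants.
    attrs: dict that maps a node to its attributes, such as owner or workspace.
    Returns a list of Counters, one per level below `root`.
    """
    levels = []
    seen = {root}
    frontier = [root]
    while frontier and len(levels) < max_levels:
        next_frontier = []
        for node in frontier:
            for child in children.get(node, ()):
                if child not in seen:  # count each node once (assumption)
                    seen.add(child)
                    next_frontier.append(child)
        if not next_frontier:
            break
        levels.append(Counter(attrs[n][dimension] for n in next_frontier))
        frontier = next_frontier
    return levels


def merge_levels(levels):
    """Merged display: total number of nodes per dimension value."""
    total = Counter()
    for level in levels:
        total += level
    return total


# Hypothetical DAG below a node named "tag".
children = {"tag": ["a", "b"], "a": ["c"], "b": ["c", "d"]}
attrs = {
    "a": {"workspace": "ws_1"},
    "b": {"workspace": "ws_2"},
    "c": {"workspace": "ws_1"},
    "d": {"workspace": "ws_1"},
}
by_level = analyze_descendants(children, attrs, "tag", dimension="workspace")
print(by_level)
print(merge_levels(by_level))
```

The default dimension is owner and the default depth is six levels, which follows the notes on descendant analysis. Calling the function again from the nodes of the last level would correspond to clicking Continue Analysis.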

Instance states

No. | State
1 | Run successfully
2 | Not running
3 | Run failed
4 | Running
5 | Waiting time
6 | Freeze
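
The state requirements that this topic gives for individual operations can be collected into a small lookup table. The following Python sketch is a reading aid under stated assumptions and is not a DataWorks API: the names ALLOWED_STATES, can_perform, and stop are hypothetical, the Successful and Failed states named in the operation descriptions are assumed to correspond to Run successfully and Run failed, and the unspecified "abnormal state" for Rerun is left out.

```python
STATES = (
    "Run successfully",
    "Not running",
    "Run failed",
    "Running",
    "Waiting time",
    "Freeze",
)

# Operation -> states in which this topic says the operation is allowed.
ALLOWED_STATES = {
    "Stop": {"Waiting time", "Running"},
    "Rerun": {"Run failed"},  # "abnormal state" is not defined here, so it is omitted
    "Set Status to Successful": {"Run failed"},
    "View Runtime Log": {"Running", "Run successfully", "Run failed"},
}


def can_perform(operation, state):
    """Return True if `operation` is allowed for an instance in `state`."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    return state in ALLOWED_STATES[operation]


def stop(state):
    """Return the state of an instance after it is stopped."""
    if not can_perform("Stop", state):
        raise ValueError(f"cannot stop an instance in the {state} state")
    return "Run failed"


print(can_perform("Stop", "Running"))
print(can_perform("Set Status to Successful", "Running"))
print(stop("Waiting time"))
```

Operations for which this topic states no state requirement, such as View Code or Freeze, are intentionally left out of the table.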