This topic provides answers to some frequently asked questions about data backfill.

Retroactive data generation for nodes

DataWorks allows you to generate retroactive data for nodes for a specified time range in the past or the future. The scheduling parameters of the nodes are automatically replaced with specific values based on the data timestamps that you specify for retroactive data generation. The following figure shows how to write incremental data from a MySQL database to a specified time partition in MaxCompute.

Figure: Incremental data synchronization
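The parameter replacement described above can be pictured with a minimal sketch. It is not DataWorks code: the `${bizdate}` placeholder, the `yyyymmdd` format, the helper name `render_for_backfill`, and the table names are all assumptions chosen for illustration. The sketch renders one statement per data timestamp in the backfill range.

```python
from datetime import date, timedelta


def render_for_backfill(sql_template: str, start: date, end: date) -> dict[date, str]:
    """Return one rendered statement per data timestamp in [start, end].

    Hypothetical helper: the '${bizdate}' placeholder and its yyyymmdd
    format are assumptions made for this sketch.
    """
    rendered = {}
    day = start
    while day <= end:
        # Replace the placeholder with this data timestamp in yyyymmdd form.
        rendered[day] = sql_template.replace("${bizdate}", day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return rendered


# Hypothetical table names; each data timestamp targets its own partition.
template = (
    "INSERT OVERWRITE TABLE ods_orders PARTITION (ds='${bizdate}') "
    "SELECT * FROM src_orders WHERE order_date = '${bizdate}'"
)
statements = render_for_backfill(template, date(2024, 1, 30), date(2024, 2, 1))
for data_timestamp, sql in statements.items():
    print(data_timestamp, sql)
```

Each rendered statement writes to a different partition, which is why generating retroactive data for several data timestamps fills several partitions.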

Why do the retroactive instances of a node that is scheduled by hour or minute not run in parallel after I enable the parallelism feature for the node?

The parallelism feature lets multiple retroactive instances of a daily scheduled node run in parallel, so that retroactive data for several days is generated concurrently based on the data timestamp. For a node that is scheduled by hour or minute, the parallelism feature does not control whether the retroactive instances generated for a single day run in parallel. Those instances run in parallel only if the node is not configured to depend on its instance in the last cycle. For more information, see Scenario 2: Configure scheduling dependencies for a node that depends on last-cycle instances.

  1. If you disable the parallelism feature, one retroactive instance is run multiple times in sequence based on the data timestamp.

    In other words, the retroactive instance can be run again only after the retroactive data is generated for the last cycle.

  2. If you enable the parallelism feature, you can set the Number of Concurrent nodes parameter to a value that is allowed by the resource groups as needed. In this case, multiple retroactive instances are generated.

    The retroactive instances are run in parallel based on the data timestamp.

Scenario: You want to generate retroactive data for one week for a node that is scheduled by hour or minute.
  • If you configure the node to depend on its instance in the last cycle, one retroactive instance is run multiple times in sequence on each day based on the data timestamp.
  • If you do not configure the node to depend on its instance in the last cycle, multiple retroactive instances are run in parallel on each day based on the data timestamp.
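The interaction between the two settings can be sketched with a simplified model. The function below is hypothetical, not a DataWorks API. It assumes unlimited resources, ignores dependencies across days, and assumes that the parallelism feature spreads the days evenly over a number of groups (`day_groups`, where 1 means that parallelism is disabled). It counts how many instance runs must happen one after another.

```python
import math


def sequential_steps(days: int, instances_per_day: int,
                     day_groups: int, depends_on_last_cycle: bool) -> int:
    """Estimate the length of the longest chain of runs in a backfill.

    Simplified model (assumption): days are split evenly across
    `day_groups` groups, and each group processes its days in order.
    """
    if days < 1 or instances_per_day < 1 or day_groups < 1:
        raise ValueError("all counts must be at least 1")
    # Parallelism feature: controls how many days are processed side by side.
    days_in_sequence = math.ceil(days / day_groups)
    # Last-cycle dependency: controls whether the instances of one day
    # run one after another (True) or all at once (False).
    steps_per_day = instances_per_day if depends_on_last_cycle else 1
    return days_in_sequence * steps_per_day


# One week of an hourly node (24 instances per day).
print(sequential_steps(7, 24, day_groups=1, depends_on_last_cycle=True))
print(sequential_steps(7, 24, day_groups=1, depends_on_last_cycle=False))
print(sequential_steps(7, 24, day_groups=7, depends_on_last_cycle=True))
```

In this model, enabling the parallelism feature shortens only the sequence of days. The instances within a day are still run one after another as long as the node depends on its instance in the last cycle.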

The retroactive instances of a node are not run after I specify the data timestamp for retroactive data generation. The retroactive instances are in the Pending (Schedule) state and are highlighted in yellow in the DAG. Why does this happen?

When you generate retroactive data for a node, if you set the Data Timestamp parameter to a future time range that is later than the current time, the retroactive instances of the node are in the Pending (Schedule) state. You can choose whether to run these instances immediately by selecting or clearing Run Retroactive Instances Scheduled to Run after the Current Time based on your business requirements:
  • If you set the Data Timestamp parameter to a future time range and do not select this option, the retroactive instances are in the Pending (Schedule) state and are highlighted in yellow in the directed acyclic graph (DAG).
  • If you set the Data Timestamp parameter to a future time range and select this option, the retroactive instances are immediately run.

Why is a retroactive instance of an auto triggered node in the Pending (Schedule) state after I specify the previous day and the current day for the Data Timestamp parameter?

DataWorks runs an auto triggered node on the current day to process data whose data timestamp is the previous day. Generating retroactive data for the previous day is therefore equivalent to running the auto triggered node on the current day. By the same logic, the retroactive instance whose data timestamp is the current day is scheduled to run on the next day, which is later than the current time, so it remains in the Pending (Schedule) state.
Note: To query the instance that is generated by the auto triggered node for the current day, set the Data Timestamp parameter to T1 on the Cycle Instance page. The data timestamp of that instance is the previous day, and its scheduled runtime is on the current day.
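The relationship between the data timestamp and the scheduled runtime can be sketched as follows. This is a minimal model for a daily scheduled node, not DataWorks code: the function names, the `"Runnable"` label, and the `run_future_immediately` flag (standing in for the Run Retroactive Instances Scheduled to Run after the Current Time option) are assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta


def scheduled_runtime(data_timestamp: date, run_at: time) -> datetime:
    """A daily instance runs on the day after its data timestamp."""
    return datetime.combine(data_timestamp + timedelta(days=1), run_at)


def initial_state(data_timestamp: date, run_at: time, now: datetime,
                  run_future_immediately: bool = False) -> str:
    """Return a hypothetical state label for a new retroactive instance."""
    is_future = scheduled_runtime(data_timestamp, run_at) > now
    if is_future and not run_future_immediately:
        return "Pending (Schedule)"
    return "Runnable"  # illustrative label, not a DataWorks state name


now = datetime(2024, 5, 10, 9, 0)
run_at = time(0, 30)
# Data timestamp = previous day: scheduled for today 00:30, already past.
print(initial_state(date(2024, 5, 9), run_at, now))
# Data timestamp = current day: scheduled for tomorrow 00:30, still in the future.
print(initial_state(date(2024, 5, 10), run_at, now))
```

Under these assumptions, only the instance whose data timestamp is the current day waits, because its scheduled runtime falls on the next day.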

Why are multiple retroactive instances generated for a node if I set the data timestamp to 00:00:00 to 01:00:00?

The number of retroactive instances that are generated for a node depends on the scheduled runtime that you specify for the node.

  • Scenario 1: You configure a node to be scheduled by hour from 00:00:00 to 23:59:00. If you set the data timestamp to 00:00:00 to 01:00:00, two retroactive instances are generated and scheduled at 00:00:00 and 01:00:00.
  • Scenario 2: You configure a node to be scheduled every 30 minutes from 00:00:00 to 23:59:00. If you set the data timestamp to 00:00:00 to 01:00:00, three instances are generated and scheduled at 00:00:00, 00:30:00, and 01:00:00.
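The counts in both scenarios can be reproduced with a short sketch. The helper `scheduled_times` is hypothetical. It assumes that both ends of the specified time range are inclusive, which is what the scenarios above imply, and that the node runs at a fixed interval between its first and last scheduled runtime of the day.

```python
from datetime import datetime, timedelta


def scheduled_times(window_start: datetime, window_end: datetime,
                    first_run: datetime, last_run: datetime,
                    interval_minutes: int) -> list[datetime]:
    """List the scheduled runtimes that fall inside [window_start, window_end]."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    times = []
    t = first_run
    while t <= last_run:
        # Both window boundaries are treated as inclusive (assumption).
        if window_start <= t <= window_end:
            times.append(t)
        t += step
    return times


day = datetime(2024, 1, 1)
first, last = day, day.replace(hour=23, minute=59)
window = (day, day + timedelta(hours=1))  # 00:00:00 to 01:00:00

hourly = scheduled_times(*window, first, last, interval_minutes=60)
half_hourly = scheduled_times(*window, first, last, interval_minutes=30)
print(len(hourly), len(half_hourly))
```

If the time range contains no scheduled runtime, for example 00:10:00 to 00:50:00 for the hourly node, the function returns an empty list, which corresponds to the case in which no retroactive instance can be generated.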

If a large number of retroactive instances are generated for a node, the retroactive instances are in the Pending (Resources) state and are highlighted in yellow in the DAG. Why does this happen?

Each resource group for scheduling has a maximum number of instances that it can run concurrently. If the number of concurrent instances of a node exceeds this limit, the excess instances are in the Pending (Resources) state until resources become available. For more information about how to troubleshoot this issue, see Pending (Resources).

Why do I receive an error message indicating that the scheduled runtime of a node is not within the specified data timestamp range?

For a node that is scheduled by hour or minute, you must specify a time range that includes the scheduled runtime of the node. Otherwise, retroactive instances cannot be generated for the node.

Why are no retroactive instances generated for a node after I enable retroactive data generation for the node?

Retroactive instances can be generated only for nodes whose scheduled runtime is within the specified data timestamp range. Make sure that the scheduled runtime of the node meets this requirement.