DataWorks allows you to backfill data for an auto triggered node to run the node in a specified time range. You can backfill data for an auto triggered node and its descendant nodes. You can view the status of the generated data backfill instances, and stop, rerun, or unfreeze these instances on the Patch Data page in Operation Center. This topic describes how to backfill data for auto triggered nodes and manage data backfill instances.
Background information
- Backfill Data for Current Node: This mode allows you to backfill data for the current node.
- Current and Descendent Nodes Retroactively: This mode allows you to backfill data for the current node and its descendant nodes at the same time. We recommend this mode if the current node has a small number of descendant nodes. In this mode, you can specify the descendant nodes for which you want to backfill data.
- Backfill Data for Massive Nodes: This mode allows you to backfill data for the current node and its descendant nodes at the same time. We recommend this mode if the current node has a large number of descendant nodes. In this mode, you can filter the descendant nodes by workspace. You can configure a whitelist to backfill data for some nodes that are not in the selected workspaces. You can also configure a blacklist to skip some nodes that are in the selected workspaces.
- Advanced Mode: This mode allows you to backfill data for multiple nodes at the same time. In this mode, you can select nodes that do not depend on each other. You can select the nodes for which you want to backfill data in the directed acyclic graph (DAG) of an auto triggered node or in the node list on the Cycle Task page.
  - In the DAG, you can use the node aggregation feature to group nodes by workspace, owner, or priority. This way, you can backfill data for multiple nodes at a time by specifying a node group. For more information about a DAG, see Manage instances in a DAG.
  - You can also select nodes in the node list on the Cycle Task page. You can filter nodes based on specific conditions and select the nodes for which you want to backfill data.
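
The workspace filter, whitelist, and blacklist of the Backfill Data for Massive Nodes mode can be read as a simple set selection. The following Python sketch is only an illustration of that reading, not DataWorks code: the `Node` class, the `select_nodes` function, and the rule that the blacklist wins when a node appears in both lists are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a descendant node."""
    node_id: int
    workspace: str


def select_nodes(descendants, workspaces, whitelist=frozenset(), blacklist=frozenset()):
    """Return the IDs of the descendant nodes that would be backfilled.

    A node is in scope if it belongs to a selected workspace or is whitelisted.
    Assumption: a blacklisted node is always excluded, even if it is whitelisted.
    """
    selected = set()
    for node in descendants:
        in_scope = node.workspace in workspaces or node.node_id in whitelist
        if in_scope and node.node_id not in blacklist:
            selected.add(node.node_id)
    return selected


# Hypothetical descendant nodes spread over three workspaces.
nodes = [Node(1, "ws_a"), Node(2, "ws_a"), Node(3, "ws_b"), Node(4, "ws_c")]
chosen = select_nodes(nodes, workspaces={"ws_a"}, whitelist={3}, blacklist={2})
print(sorted(chosen))
```

Here node 1 is selected through its workspace, node 3 through the whitelist, node 2 is dropped by the blacklist, and node 4 is out of scope.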
Limits
- You can use the advanced mode only in workspaces that reside in the China (Shenzhen) and UAE (Dubai) regions.
- Data backfill instances cannot be manually deleted. The system deletes data backfill instances after their validity period elapses. The validity period of data backfill instances is approximately 30 days. If you do not need to use a data backfill instance, you can freeze it.
- Instances that run on the shared resource group for scheduling are retained for one month (30 days), and logs for the instances are retained for one week (7 days).
- Instances that run on exclusive resource groups for scheduling are retained for one month (30 days), and logs for the instances are also retained for one month (30 days).
- If the size of the logs of an instance in the Complete state exceeds 3 MB, the logs are cleaned up on a daily basis.
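
The retention periods above can be summarized as a small lookup table. The following Python sketch is an illustration only: the names `RETENTION_DAYS` and `expiry_dates` and the keys `shared` and `exclusive` are hypothetical, and the sketch assumes that retention is counted in whole days from the creation date, which the limits above do not state precisely.

```python
from datetime import date, timedelta

# Retention periods in days, taken from the limits listed above.
RETENTION_DAYS = {
    "shared": {"instance": 30, "log": 7},      # shared resource group for scheduling
    "exclusive": {"instance": 30, "log": 30},  # exclusive resource groups for scheduling
}


def expiry_dates(created: date, resource_group: str) -> dict:
    """Estimate when an instance and its logs are removed.

    Assumption: the period starts on the creation date and is counted in whole days.
    """
    rules = RETENTION_DAYS[resource_group]
    return {kind: created + timedelta(days=days) for kind, days in rules.items()}


print(expiry_dates(date(2024, 1, 1), "shared"))
```

Because the validity period is described as approximately 30 days, treat the computed dates as estimates.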
Usage notes
- When DataWorks backfills data for a node for a specific time range, if an instance generated for the node fails on a day within the time range, the status of the data backfill instance of the node for that day is also set to failed. In this case, DataWorks does not run the instances generated for this node for the next day. DataWorks runs the instances generated for a node on a day only after all instances generated for the node on the previous day are successfully run.
- When you backfill data for a node that is scheduled by hour or minute, whether all instances for a day run in parallel depends on whether the node is self-dependent. If the node is self-dependent and the first instance to be backfilled depends on an auto triggered instance from the previous day, the data backfill instance cannot be triggered if that auto triggered instance did not run on the previous day. If the first instance for which data needs to be backfilled does not depend on an instance generated on the previous day, the data backfill instance of the node is run directly.
- If both an auto triggered node instance and a data backfill instance are running for a node, you must stop the data backfill instance to ensure that the auto triggered node instance can run as expected.
- If you backfill data for multiple instances or run a large number of data backfill instances in parallel, scheduling resources may be insufficient. Make sure that your configurations are appropriate based on your business requirements.
- To prevent data backfill instances from consuming too many resources and affecting the running of auto triggered node instances, DataWorks applies the following rules to data backfill instances:
  - If you set the data timestamp to yesterday (T-1), the priority of the data backfill task is determined by the baseline priority of the task.
  - If you select a historical date (T-2) as the data timestamp, the priority of the data backfill task is downgraded based on the following rules:
    - The priority of level 7 and level 8 tasks is reduced to level 3.
    - The priority of level 5 and level 3 tasks is reduced to level 2.
    - The priority of level 1 tasks remains unchanged.
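
The priority rules above can be expressed as a short function. The following Python sketch is an illustration under stated assumptions, not DataWorks code: the names `DOWNGRADE` and `backfill_priority` are hypothetical, and the sketch assumes that every data timestamp earlier than yesterday (T-1) counts as a historical date.

```python
from datetime import date, timedelta

# Downgrade table from the rules above: baseline priority -> priority for historical dates.
DOWNGRADE = {8: 3, 7: 3, 5: 2, 3: 2, 1: 1}


def backfill_priority(baseline_priority: int, data_date: date, today: date) -> int:
    """Return the priority used for a data backfill task.

    Assumption: any data timestamp earlier than T-1 is treated as a historical date.
    """
    if baseline_priority not in DOWNGRADE:
        raise ValueError(f"unsupported priority level: {baseline_priority}")
    yesterday = today - timedelta(days=1)
    if data_date >= yesterday:
        # T-1: the baseline priority of the task applies unchanged.
        return baseline_priority
    return DOWNGRADE[baseline_priority]


today = date(2024, 6, 10)
print(backfill_priority(8, date(2024, 6, 9), today))  # data timestamp is T-1
print(backfill_priority(8, date(2024, 6, 1), today))  # historical data timestamp
```

In this example, the level 8 task keeps its priority when the data timestamp is yesterday and is downgraded to level 3 when the data timestamp is a historical date.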
Backfill data
Manage data backfill instances

Section | Description |
---|---|
1 | In this section, you can specify filter conditions to search for a data backfill instance. For example, you can search for a data backfill instance by node name, node ID, or one or more of the following conditions: Retroactive Instance Name, Created By, Creation Date, Status, Data Timestamp, My Nodes, and Initiated by Me. |
2 | In this section, you can view information about a data backfill instance and perform operations on data backfill instances. |
3 | In this section, you can view information about each node for which the data backfill instance is generated and perform operations on a node. |
4 | You can select multiple nodes in Section 3 and click Stop or Rerun in Section 4 to stop or rerun the selected nodes at the same time. |
Instance status
No. | Status |
---|---|
1 | Succeeded |
2 | Not Running |
3 | Run failed |
4 | Running |
5 | Waiting time |
6 | Freeze |