You can use the intelligent baseline feature to detect an exception that prevents a node in a baseline from being completed on time. If an exception is detected, the system sends you an alert notification about the exception at the earliest opportunity. This ensures that important data is generated as expected in scenarios in which dependencies between nodes in the baseline are complex. This also helps you reduce configuration costs, prevent invalid alerts, and automatically monitor important nodes.
Scenarios
- Manage the priorities of nodes.
When the number of nodes grows but resources are limited, nodes compete for resources. You can create a baseline and add important nodes to the baseline. Then, you can configure a high priority for the baseline so that the system preferentially allocates resources to the nodes in the baseline.
- Calculate the estimated completion time of a node.
The running of a node is affected by resource supply and the status of ancestor nodes of the node. After you add a node that is scheduled to run every day or every hour to a baseline, DataWorks can calculate and display the estimated completion time of the node on the specified day or in the specified hour.
- Ensure that a node finishes running before the committed point in time.
You can add a node to a baseline and configure a committed point in time for the baseline. If the system predicts that the node in the baseline cannot finish running before the committed point in time, or if an ancestor node of the node fails or slows down, the system sends you an alert notification. You can then troubleshoot the issue at the earliest opportunity so that the node finishes running before the committed point in time.
Terms
- Baseline: You can create a baseline, add important nodes to the baseline, and configure a committed point in time for the baseline. This way, the system can calculate the estimated completion time for nodes in the baseline based on the status of the nodes. If the system determines that a node in the baseline cannot finish running before the committed point in time, the system sends you an alert notification.
- Committed point in time: the point in time by which all nodes in a baseline must finish running. DataWorks ensures that all nodes in a data application finish running before the committed point in time. If you want to reserve time for O&M personnel to handle exceptions that occur on nodes in a baseline, you can configure an alert margin threshold for the baseline. The system subtracts the alert margin threshold from the committed point in time and uses the result as the alert time of the baseline. The alert time is also the expected point in time by which all nodes in the baseline finish running.
- Alert time: the committed point in time minus the alert margin threshold.
- Baseline node: a node that you add to a baseline.
- Baseline instance: an instance that is generated by a node in a baseline. Each time the system calculates the estimated completion time for a node in the baseline, it uses the baseline instance. A baseline instance can be in the safe, alert, or overtime state.
- A baseline instance is in the safe state if its estimated completion time is earlier than the alert time.
- A baseline instance is in the alert state if its estimated completion time is later than the alert time and earlier than the committed point in time.
- A baseline instance is in the overtime state if its estimated completion time is later than the committed point in time.
- Key path: Multiple paths can affect data generation of a node in a baseline. The path that takes the longest time to run is the key path.
- Event: an event that is generated if an error is reported for a node in a baseline or for an ancestor node of the node in the baseline, or if a node in the key path slows down. Events prevent nodes in the baseline from being completed on time.
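The relationship between the committed point in time, the alert margin threshold, the alert time, and the three baseline instance states can be sketched in Python. This is a minimal illustration, not DataWorks code: times are assumed to be minutes after midnight on the same day, the helper names (`to_minutes`, `alert_time`, `classify_instance`) are hypothetical, and because the description does not say how exact ties are handled, the sketch assigns a tie to the later state.

```python
from enum import Enum


class InstanceState(Enum):
    SAFE = "safe"
    ALERT = "alert"
    OVERTIME = "overtime"


def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock time to minutes after midnight (same-day times only)."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def alert_time(committed: int, margin: int) -> int:
    """Alert time = committed point in time - alert margin threshold (all in minutes)."""
    return committed - margin


def classify_instance(estimated_completion: int, committed: int, margin: int) -> InstanceState:
    """Map the estimated completion time of a baseline instance to its state."""
    if estimated_completion < alert_time(committed, margin):
        return InstanceState.SAFE
    if estimated_completion < committed:
        # Between the alert time and the committed point in time
        # (a tie with the alert time is treated as alert, by assumption).
        return InstanceState.ALERT
    # At or after the committed point in time (a tie is treated as overtime, by assumption).
    return InstanceState.OVERTIME


# Hypothetical baseline: committed point in time 8:00, alert margin threshold 60 minutes.
committed = to_minutes("8:00")
for estimate in ("6:30", "7:30", "8:15"):
    print(estimate, classify_instance(to_minutes(estimate), committed, 60).value)
```

With a committed point in time of 8:00 and a 60-minute margin, the alert time is 7:00, so the estimates 6:30, 7:30, and 8:15 are classified as safe, alert, and overtime respectively.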
Description
After you add nodes to a baseline, DataWorks preferentially allocates resources to the nodes in the baseline based on the priority of the baseline to ensure data generation of the nodes. In addition, DataWorks determines a monitoring scope for the nodes in the baseline based on the dependencies of the nodes. A baseline alert or an event alert is triggered based on the status of the nodes within the monitoring scope.

- Create and manage a baseline
You can create and manage a baseline on the Baseline management tab.
- You can add important nodes to a baseline, configure basic information such as the committed point in time for the baseline, and configure alert rule parameters such as the notification method and alert contact for the baseline. The system monitors nodes and sends you alert notifications based on the baseline configurations.
- You can also configure the priority of the baseline, which determines the priority of the nodes in the baseline. The higher the priority of a baseline, the higher the priority of the nodes in the baseline. If scheduling resources are insufficient, the system preferentially allocates the scheduling resources to nodes in the baseline for which you configure a high priority.
- Determine the monitoring scope
DataWorks determines the monitoring scope based on the dependencies of nodes in a baseline. All nodes that may affect data generation of the nodes in the baseline are monitored. For more information, see Core logic: monitoring scope.
- Trigger an alert and send an alert notification
- Baseline alert
- Calculate the time
DataWorks can calculate the latest start time and latest completion time for each node in a baseline based on the committed point in time configured for the baseline and the average running duration of the node in the baseline over a historical period of time. Then, DataWorks can calculate related points in time for all nodes that are within the monitoring scope. For more information, see Core logic: baseline alert.
Time calculation is based on baseline instances. Nodes in an enabled baseline generate baseline instances every day, and these instances are used to calculate, for each run, the points in time for nodes within the monitoring scope. On the Baseline instance tab, you can view the list and status of baseline instances. For more information, see Manage baseline instances.
- Trigger an alert
DataWorks automatically triggers an alert based on the alert rule parameters that you configure for a baseline, the calculated points in time, and the status of nodes within the monitoring scope. If DataWorks predicts that nodes in a baseline cannot finish running before the committed point in time, it sends you an alert notification by using the notification method that you specify. For more information, see Core logic: baseline alert.
- Event alert
After a monitoring scope is determined, when an error is reported for a node in a baseline or for an ancestor node of the node in the baseline, or when a node in the key path slows down, the corresponding event is generated and DataWorks sends you an alert notification. You can view existing events on the Event management tab. For more information, see Manage events.
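The key path that event alerts watch can be illustrated as a longest-path search over the dependency graph. The sketch below is an illustration under stated assumptions rather than the DataWorks algorithm: it measures a path by the sum of the nodes' average running durations, and the function name, node names, dependencies, and durations are all hypothetical.

```python
def key_path(target, parents, duration):
    """Return (total_minutes, path) for the longest-running chain of ancestors ending at target.

    parents:  node -> list of nodes it depends on (must form a DAG)
    duration: node -> average running duration in minutes
    """
    memo = {}

    def longest(node):
        if node not in memo:
            best_total, best_path = 0, []
            for parent in parents.get(node, []):
                total, path = longest(parent)
                if total > best_total:  # on ties, the first parent listed wins
                    best_total, best_path = total, path
            memo[node] = (best_total + duration[node], best_path + [node])
        return memo[node]

    return longest(target)


# Hypothetical graph: baseline node D depends on B and C, and B depends on A.
parents = {"D": ["B", "C"], "B": ["A"]}
duration = {"A": 60, "B": 120, "C": 90, "D": 30}
print(key_path("D", parents, duration))  # (210, ['A', 'B', 'D'])
```

Here A → B → D takes 210 minutes and C → D takes 120 minutes, so A → B → D is the key path, and a slowdown on A or B is the kind of event that threatens D.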
Billing
Number of baseline instances: Nodes in a baseline that is enabled generate instances in the baseline. You are billed based on the number of baseline instances that are generated before 00:23:59 each day. For more information, see Baseline instances.
Limits
Only users of DataWorks Standard Edition or a more advanced edition can use the intelligent baseline feature. If your DataWorks service does not meet the requirements, you must upgrade it to Standard Edition or a more advanced edition first. For more information, see Differences among DataWorks editions.
Core logic: monitoring scope
- Ancestor nodes: The ancestor nodes that affect data generation of nodes in a baseline are monitored.
- Descendant nodes: The descendant nodes of nodes in a baseline are not monitored. If an error is reported for a descendant node of a node in a baseline or for a descendant node that is in a different branch from the node in the baseline, the system does not send an alert notification.

As shown in the preceding figure, nodes A, B, C, D, E, and F are created in DataWorks. Nodes D and E are nodes in a baseline. The ancestor nodes that affect data generation of nodes D and E are nodes A and B. Therefore, nodes A, B, D, and E are within the monitoring scope. If an exception occurs on a node within the monitoring scope or if a node within the monitoring scope slows down, the system can detect the issue. However, nodes C and F are not within the monitoring scope of the intelligent baseline.
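The monitoring-scope rule (baseline nodes plus all of their ancestors, but no descendants) amounts to an upstream graph traversal. Because the figure is not available here, the edges below are hypothetical, chosen only to match the text: A and B are the ancestors of D and E, F is a descendant of E, and C sits in a different branch. The function name is also hypothetical.

```python
def monitoring_scope(baseline_nodes, parents):
    """Return the baseline nodes plus every ancestor reachable through dependencies."""
    scope = set()
    stack = list(baseline_nodes)
    while stack:
        node = stack.pop()
        if node in scope:
            continue
        scope.add(node)
        # Walk upstream only; descendants are never added to the scope.
        stack.extend(parents.get(node, []))
    return scope


# Hypothetical edges (node -> nodes it depends on).
parents = {"B": ["A"], "D": ["B"], "E": ["B"], "F": ["E", "C"]}
print(sorted(monitoring_scope({"D", "E"}, parents)))  # ['A', 'B', 'D', 'E']
```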
Core logic: baseline alert
- DataWorks subtracts the alert margin threshold from the committed point in time and uses the result as the alert time of the baseline. Based on the alert time and the average running durations of nodes within the monitoring scope over a historical period of time, DataWorks uses baseline instances to calculate the latest completion time and latest start time for each node within the monitoring scope.
- If DataWorks predicts that the status of a node within the monitoring scope may prevent nodes in a baseline from finishing before the alert time, DataWorks sends you an alert notification.
Time calculation
- Calculate the estimated completion time for a baseline
The estimated completion time for a baseline is calculated based on the average completion time for each node in the baseline over a historical period of time. If a baseline contains multiple nodes, the estimated completion time for the baseline is the latest of the average completion times of those nodes.
Calculation results:
Estimated completion time for Baseline A = Average completion time for Node D over a historical period of time = 5:00
Estimated completion time for Baseline B = Average completion time for Node E over a historical period of time = 6:00
Note Baseline B contains nodes E and F. The average completion time for Node E is later than the average completion time for Node F. Therefore, Node E is the node that affects the latest completion time of Baseline B, and the estimated completion time for Baseline B is determined by the average completion time for Node E.
- Estimate the latest completion time and latest start time for each node within the monitoring scope
Calculation scenario | Calculation formula | Calculation result example |
---|---|---|
Calculate the latest start time for a node in a baseline. | The latest start time for a node in a baseline = Estimated completion time for the baseline - Average running duration of the node in the baseline | |
Calculate the latest start time and latest completion time for an ancestor node of a node in a baseline. | The latest completion time for the ancestor node = The latest start time for the node in the baseline that depends on it; The latest start time for the ancestor node = The latest completion time for the ancestor node - Average running duration of the ancestor node | |
Calculate the latest completion time for a node that is the ancestor node of nodes in different baselines. | The latest completion time for a node that is the ancestor node of nodes in different baselines = The earliest of the latest start times of the nodes that depend on the ancestor node | The nodes that depend on Node B must start by 4:00 and 4:10. Because 4:00 is earlier than 4:10, the latest completion time for Node B is 4:00; only then can the nodes in both Baseline A and Baseline B start on time. The latest start time for Node B = The latest completion time for Node B (4:00) - Average running duration of Node B (2 h) = 2:00 |
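The two calculations in this section (the estimated completion time for a baseline, then a backward pass that derives each node's latest completion and latest start times from the table's formulas) can be sketched in Python. This is a hypothetical illustration, not DataWorks code, and all function names are invented. Times are minutes after midnight. The average completion times for Node D (5:00) and Node E (6:00), Node B's 2-hour running duration, and the expected results for Node B (4:00 and 2:00) come from this section; Node F's 5:30 average, the running durations of Nodes A, D, and E, and the dependency edges are assumptions, chosen so that the nodes depending on Node B must start by 4:00 and 4:10.

```python
def to_minutes(clock: str) -> int:
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(minutes: int) -> str:
    return f"{minutes // 60}:{minutes % 60:02d}"


def baseline_estimated_completion(avg_completion):
    """The node with the latest average completion time determines the baseline's estimate."""
    node = max(avg_completion, key=avg_completion.get)
    return node, avg_completion[node]


def latest_times(children, baseline_deadline, duration):
    """Backward pass returning {node: (latest_completion, latest_start)}.

    children:          node -> nodes in the monitoring scope that depend on it
    baseline_deadline: baseline node -> estimated completion time of its baseline
    duration:          node -> average running duration in minutes
    Every node must be a baseline node or have at least one dependent in children.
    """
    memo = {}

    def visit(node):
        if node not in memo:
            # Each dependent must be able to start on time: the earliest latest start wins.
            limits = [visit(child)[1] for child in children.get(node, [])]
            if node in baseline_deadline:
                # A baseline node must also finish by its baseline's estimated completion time.
                limits.append(baseline_deadline[node])
            completion = min(limits)
            memo[node] = (completion, completion - duration[node])
        return memo[node]

    for node in duration:
        visit(node)
    return memo


baseline_a = {"D": to_minutes("5:00")}
baseline_b = {"E": to_minutes("6:00"), "F": to_minutes("5:30")}  # F's time is hypothetical
_, deadline_a = baseline_estimated_completion(baseline_a)
_, deadline_b = baseline_estimated_completion(baseline_b)

children = {"A": ["B"], "B": ["D", "E"]}            # hypothetical edges
duration = {"A": 30, "B": 120, "D": 60, "E": 110}   # minutes; only B's 2 h is from the text
times = latest_times(children, {"D": deadline_a, "E": deadline_b}, duration)
for node in ("D", "E", "B"):
    completion, start = times[node]
    print(node, to_clock(completion), to_clock(start))
```

With these assumptions, D must start by 4:00 and E by 4:10, so Node B's latest completion time is 4:00 and its latest start time is 2:00, matching the table. Taking the minimum over dependents is what lets one ancestor serve several baselines at once.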
Alerting mechanism

- Alert rule for a baseline before a node in the baseline is run:
Note The baseline that you create can estimate the completion time for each node within the monitoring scope on the current day. The completion time is estimated based on the average points in time at which the nodes within the monitoring scope finish running over a historical period of time. The system can predict nodes that cannot finish running before the alert time for the baseline and send an alert notification that contains exception information to the specified alert contact at the earliest opportunity before the nodes in the baseline start to run on that day. A baseline can help you identify exceptions and receive an alert notification about the exceptions at the earliest opportunity in the scenario where dependencies between nodes in the baseline are complex and the dependencies frequently change.
- If the estimated completion time for a node in a baseline is later than the alert time for the baseline, a baseline alert is triggered. You can view the estimated completion time for a baseline on the Baseline management tab. For more information, see Manage baselines.
- If the estimated completion time for an ancestor node of a node in a baseline is later than the alert time for the baseline, a baseline alert is triggered. The estimated completion time for the ancestor node is calculated based on the average completion time for the ancestor node over a historical period of time.
- Alert rule for a baseline when a node in the baseline is running:
If the completion time for a node in a baseline is later than the alert time for the baseline, a baseline alert is triggered.
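The two alert rules can be sketched as simple comparisons against the alert time. This is a hypothetical illustration: the function names are invented, times are minutes after midnight, and the estimated completion times are assumed to have already been derived from historical averages as described in the note above.

```python
def nodes_triggering_prerun_alert(estimated_completion, alert_time):
    """Before the run: nodes (baseline nodes or their ancestors) estimated to finish after the alert time."""
    return sorted(node for node, t in estimated_completion.items() if t > alert_time)


def running_alert(completion_time, alert_time):
    """While running: alert if the node's completion time is later than the alert time."""
    return completion_time > alert_time


alert_at = 7 * 60  # hypothetical alert time 7:00
estimates = {"A": 3 * 60, "B": 7 * 60 + 10, "D": 6 * 60 + 40}  # hypothetical estimates
print(nodes_triggering_prerun_alert(estimates, alert_at))  # ['B']
print(running_alert(7 * 60 + 5, alert_at))  # True
```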
Core logic: event alert
- Error: indicates that a node fails to run.
- Slow: indicates that the running duration of a node is significantly longer than the average running duration of the node in the previous periods.
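The two event types can be illustrated with a small classifier. The description only says that a Slow event means the running duration is significantly longer than the historical average; the factor of 1.5 below is a hypothetical threshold, not a documented DataWorks value, and the function name is invented.

```python
from statistics import mean
from typing import Optional, Sequence

SLOW_FACTOR = 1.5  # hypothetical threshold for "significantly longer"


def classify_event(failed: bool, elapsed: float, history: Sequence[float]) -> Optional[str]:
    """Return 'Error', 'Slow', or None for a node run (durations in minutes)."""
    if failed:
        return "Error"
    # Without history there is no average to compare against, so no Slow event.
    if history and elapsed > SLOW_FACTOR * mean(history):
        return "Slow"
    return None


print(classify_event(False, 50, [28, 30, 32]))  # Slow: 50 > 1.5 * 30
```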
Core logic: key path and key instance

