You can use the intelligent baseline feature to detect an exception that prevents a node in a baseline from being completed on time. If an exception is detected, the system sends you an alert notification about the exception at the earliest opportunity. This ensures that important data is generated as expected in scenarios in which dependencies between nodes in the baseline are complex. This also helps you reduce configuration costs, prevent invalid alerts, and automatically monitor important nodes.
Scenarios
Manage the priorities of nodes.
As the number of nodes increases while resources remain limited, nodes preempt resources from one another. You can create a baseline and add important nodes to the baseline. Then, you can configure a high priority for the baseline to ensure that the system preferentially allocates the resources to nodes in the baseline.
Calculate the estimated completion time of a node.
The running of a node is affected by resource supply and the status of ancestor nodes of the node. After you add a node that is scheduled to run every day or every hour to a baseline, DataWorks can calculate and display the estimated completion time of the node on the specified day or in the specified hour.
Ensure that a node finishes running before the committed point in time.
You can add a node to a baseline and configure a committed point in time for the baseline. The system sends you an alert notification if it predicts that the node in the baseline cannot finish running before the committed point in time, or if an ancestor node of the node reports an error or slows down. This way, you can troubleshoot issues based on the alert at the earliest opportunity to ensure that the node can finish running before the committed point in time.
Terms
Baseline: You can create a baseline, add important nodes to the baseline, and configure a committed point in time for the baseline. This way, the system can calculate the estimated completion time for nodes in the baseline based on the status of the nodes. If the system determines that a node in the baseline cannot finish running before the committed point in time, the system sends you an alert notification.
Committed point in time: the committed point in time before which all nodes in a baseline finish running. DataWorks ensures that all nodes in a data application finish running before the committed point in time. If you want to reserve a certain amount of time for O&M personnel to handle exceptions that occur on nodes in a baseline, you can configure an alert margin threshold for the baseline. The system uses the time obtained by subtracting the alert margin threshold from the committed point in time as the alert time of the baseline. The alert time is also the estimated point in time before which all nodes in the baseline finish running.
Alert time: the time that is obtained by subtracting the alert margin threshold from the committed point in time.
Baseline node: a node that you add to a baseline.
Baseline instance: an instance that is generated by a node in a baseline. The system uses baseline instances to calculate the estimated completion time for a node in the baseline each time. The status of a baseline instance can be safe, alert, or overtime.
A baseline instance is in the safe state if the estimated completion time for the baseline instance is earlier than the alert time.
A baseline instance is in the alert state if the estimated completion time for the baseline instance is later than the alert time and earlier than the committed point in time.
A baseline instance is in the overtime state if the estimated completion time for the baseline instance is later than the committed point in time.
Key path: Multiple paths affect data generation of the current node in a baseline. The path that takes the longest time for node execution is considered the key path.
Event: an event that is generated if an error is reported for a node in a baseline or for an ancestor node of the node in the baseline, or if a node in the key path slows down. Events prevent nodes in the baseline from being completed on time.
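The relationship between the committed point in time, the alert margin threshold, the alert time, and the three baseline instance states can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any DataWorks API. The definitions above do not say how an estimated completion time that exactly equals a boundary is classified, so the sketch assumes that it falls into the less severe state.

```python
from datetime import datetime, timedelta


def alert_time(committed: datetime, margin: timedelta) -> datetime:
    """Alert time = committed point in time - alert margin threshold."""
    return committed - margin


def baseline_status(estimated: datetime, committed: datetime, margin: timedelta) -> str:
    """Classify a baseline instance as safe, alert, or overtime.

    Assumption: an estimate exactly on a boundary is treated as the
    less severe state, because the definitions leave this open.
    """
    if estimated > committed:
        return "overtime"
    if estimated > alert_time(committed, margin):
        return "alert"
    return "safe"


# Hypothetical baseline: committed at 09:00 with a 30-minute alert margin.
committed = datetime(2024, 1, 1, 9, 0)
margin = timedelta(minutes=30)

print(alert_time(committed, margin))                                   # 2024-01-01 08:30:00
print(baseline_status(datetime(2024, 1, 1, 8, 0), committed, margin))   # safe
print(baseline_status(datetime(2024, 1, 1, 8, 45), committed, margin))  # alert
print(baseline_status(datetime(2024, 1, 1, 9, 10), committed, margin))  # overtime
```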
Features
After you add nodes to a baseline, DataWorks preferentially allocates resources to the nodes in the baseline based on the priority of the baseline to ensure data generation of the nodes. In addition, DataWorks determines a monitoring scope for the nodes in the baseline based on the dependencies between the nodes. A baseline alert or an event alert is triggered based on the status of the nodes within the monitoring scope.
You can perform the following operations on a baseline:
Create and manage a baseline
You can create and manage a baseline on the Baselines tab.
You can add important nodes to a baseline, configure basic information such as the committed point in time for the baseline, and configure alert rule parameters such as the notification method and alert contact for the baseline. The system monitors nodes and sends you alert notifications based on the baseline configurations.
You can also configure the priority of the baseline, which determines the priority of the nodes in the baseline. The higher the priority of a baseline, the higher the priority of the nodes in the baseline. If scheduling resources are insufficient, the system preferentially allocates the scheduling resources to nodes in the baseline for which you configure a high priority.
Note: The priority that you configured for the baseline is mapped to the priority of MaxCompute compute nodes if the following conditions are met:
The priority feature is enabled for MaxCompute projects.
MaxCompute projects use subscription computing resources.
The priority of a MaxCompute job is calculated based on the following formula: 9 - Priority of a baseline in DataWorks.
For more information about how to create and manage a baseline, see Manage baselines.
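As a quick illustration of the mapping formula in the preceding note, the following Python sketch applies 9 - Priority of a baseline in DataWorks. The function name is hypothetical, and the input values are examples only. The valid range of baseline priorities is not stated here, so the sketch does not validate it.

```python
def maxcompute_job_priority(baseline_priority: int) -> int:
    """Map a DataWorks baseline priority to a MaxCompute job priority.

    Implements the formula from the note: 9 - baseline priority.
    The input range is not validated because it is not specified here.
    """
    return 9 - baseline_priority


# Example values only; they are not a list of supported priorities.
for p in (1, 5, 8):
    print(p, "->", maxcompute_job_priority(p))
# 1 -> 8
# 5 -> 4
# 8 -> 1
```

The formula inverts the scale: a higher baseline priority in DataWorks yields a smaller MaxCompute value.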
Determine the monitoring scope
DataWorks determines the monitoring scope based on the dependencies between nodes in a baseline. All nodes that may affect data generation of the nodes in the baseline are monitored. For more information, see Core logic: monitoring scope.
Trigger an alert and send an alert notification
Baseline alert
An alert is automatically triggered based on the alert rule parameters that you configure for a baseline and the status of nodes that are within the monitoring scope. If DataWorks predicts that nodes in a baseline cannot finish running before the committed point in time, DataWorks sends you an alert notification by using the notification method that you specify. For more information, see Core logic: baseline alert.
Event alert
After a monitoring scope is determined, when an error is reported for a node in a baseline or for an ancestor node of the node in the baseline, or when a node in the key path slows down, the related event is generated and DataWorks sends you an alert notification. You can view existing events on the Events tab. For more information, see Manage events.
Billing
Number of baseline instances: Nodes in a baseline that is enabled generate instances in the baseline. You are billed based on the number of baseline instances that are generated before 00:23:59 each day. For more information, see Baseline instances.
Numbers of alert text messages and alert phone calls: You are charged for text messages and phone calls that are generated when baseline alerts are triggered. For more information, see Billing of alert text messages and alert phone calls.
Limits
Only users of DataWorks Standard Edition or a more advanced edition can use the intelligent baseline feature. If your DataWorks service does not meet the requirements, you must upgrade it to Standard Edition or a more advanced edition first. For more information, see Differences among DataWorks editions.
Core logic: monitoring scope
After you create a baseline and add a node to the baseline, the intelligent baseline feature does not monitor every ancestor and descendant node of that node. The monitoring scope for nodes in a baseline is determined as follows:
Ancestor nodes: The ancestor nodes that affect data generation of nodes in a baseline are monitored.
Descendant nodes: The descendant nodes of nodes in a baseline are not monitored. If an error is reported for a descendant node of a node in a baseline, or for a node in a different branch that does not affect the node in the baseline, the system does not send an alert notification.
As shown in the preceding figure, nodes A, B, C, D, E, and F are created in DataWorks. Nodes D and E are nodes in a baseline. Nodes A and B are ancestor nodes of nodes D and E and affect data generation of nodes D and E. Therefore, nodes A, B, D, and E are within the monitoring scope. If an exception occurs on a node within the monitoring scope or if a node within the monitoring scope slows down, the system can detect the issue. However, nodes C and F are not within the monitoring scope of the intelligent baseline.
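The rule that the monitoring scope consists of the baseline nodes plus every ancestor that feeds them can be sketched as a reverse graph traversal. The following Python example is illustrative only. The edge list is an assumption chosen to match the description above (A and B upstream of D and E, C in a separate branch, F downstream), because the exact dependencies in the figure are not reproduced in the text. The function name is hypothetical.

```python
from collections import deque


def monitoring_scope(edges, baseline_nodes):
    """Return the baseline nodes plus all of their ancestors.

    edges: iterable of (upstream, downstream) pairs.
    Descendants and unrelated branches are never visited.
    """
    parents = {}
    for up, down in edges:
        parents.setdefault(down, set()).add(up)

    scope = set(baseline_nodes)
    queue = deque(baseline_nodes)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in scope:
                scope.add(parent)
                queue.append(parent)
    return scope


# Assumed dependencies; the real figure may differ.
edges = [("A", "B"), ("B", "D"), ("B", "E"), ("A", "C"), ("D", "F")]
print(sorted(monitoring_scope(edges, ["D", "E"])))  # ['A', 'B', 'D', 'E']
```

With these assumed edges, C and F stay outside the scope because neither lies upstream of D or E.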
Core logic: baseline alert
You can add important nodes to a baseline, and configure the committed point in time and alert margin threshold for the baseline.
DataWorks uses the time obtained by subtracting the alert margin threshold from the committed point in time as the alert time of the baseline. DataWorks uses baseline instances to calculate the latest completion time and latest start time for each node within the monitoring scope based on the alert time and the average running durations of nodes within the monitoring scope over a historical period of time.
If DataWorks predicts that the status of a node within the monitoring scope may prevent nodes in a baseline from being completed before the alert time, DataWorks sends you an alert notification.
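One plausible way to derive the latest completion time and latest start time is a backward pass from the alert time through the dependency graph. The following Python sketch is an assumption about how such a calculation could work, not a description of the actual DataWorks algorithm. The function name, the edge list, and the average durations are all hypothetical.

```python
from datetime import datetime, timedelta


def latest_times(edges, avg_duration, baseline_nodes, alert_at):
    """Return {node: (latest_start, latest_finish)} for nodes in scope.

    A baseline node must finish by alert_at. Any node must finish by the
    latest start of each in-scope child. Nodes that do not lead to a
    baseline node are omitted.
    """
    children = {}
    for up, down in edges:
        children.setdefault(up, []).append(down)
    baseline = set(baseline_nodes)
    memo = {}

    def latest_finish(node):
        if node in memo:
            return memo[node]
        candidates = [alert_at] if node in baseline else []
        for child in children.get(node, []):
            child_finish = latest_finish(child)
            if child_finish is not None:
                # The child's latest start bounds this node's finish.
                candidates.append(child_finish - avg_duration[child])
        memo[node] = min(candidates) if candidates else None
        return memo[node]

    result = {}
    for node in avg_duration:
        finish = latest_finish(node)
        if finish is not None:
            result[node] = (finish - avg_duration[node], finish)
    return result


# Hypothetical graph and average running durations in minutes.
edges = [("A", "B"), ("B", "D"), ("B", "E"), ("A", "C"), ("D", "F")]
minutes = {"A": 20, "B": 30, "C": 10, "D": 40, "E": 15, "F": 5}
avg = {n: timedelta(minutes=m) for n, m in minutes.items()}
alert_at = datetime(2024, 1, 1, 8, 30)

for node, (start, finish) in sorted(latest_times(edges, avg, ["D", "E"], alert_at).items()):
    print(node, start.time(), finish.time())
# A 07:00:00 07:20:00
# B 07:20:00 07:50:00
# D 07:50:00 08:30:00
# E 08:15:00 08:30:00
```

In this sketch, a node that is still running after its latest completion time, or that has not started by its latest start time, would be a candidate for triggering a baseline alert.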
Core logic: event alert
After a monitoring scope is determined, the intelligent monitoring system generates an event and reports an alert when an exception occurs on a node within the monitoring scope. The alert is reported based on the analysis results of the event. The following types of exceptions generate events:
Error: indicates that a node fails to run.
Slow: indicates that the running duration of a node is significantly longer than the average running duration of the node in the previous periods.
If a node slows down and then encounters an error, two events are generated.
You can go to the Events tab to view the details about an event.
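The two exception types can be sketched as a small detection routine in Python. The text does not define how much longer than average counts as "significantly longer", so the slow_factor parameter below is purely an assumption, and the Event class and function name are hypothetical rather than part of DataWorks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    node: str
    kind: str  # "slow" or "error"


def detect_events(node, running_seconds, failed, history_seconds, slow_factor=2.0):
    """Return the events raised by one run of a node in the monitoring scope.

    slow_factor is an assumed threshold: the run counts as slow when it
    takes more than slow_factor times the historical average.
    """
    events = []
    if history_seconds and running_seconds > slow_factor * mean(history_seconds):
        events.append(Event(node, "slow"))
    if failed:
        events.append(Event(node, "error"))
    return events


history = [600, 620, 580]  # hypothetical previous durations, in seconds

# A run that slows down and then fails produces two events.
print([e.kind for e in detect_events("B", 1500, True, history)])   # ['slow', 'error']
# A run close to the average that succeeds produces none.
print([e.kind for e in detect_events("B", 650, False, history)])   # []
```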
Core logic: key path and key instance
The dependencies between the nodes that you want to monitor in a baseline may be complex. DataWorks provides Gantt charts for you to quickly identify the key path and key instances that prevent the nodes in the baseline from generating data. Among the paths whose nodes affect data generation of the nodes that you want to monitor in the baseline, the key path is the path that takes the longest time for node execution.
You can use a Gantt chart to view the key path of a node that you want to monitor. The following Gantt chart shows the key path of the nodes in the preceding figure and the point in time at which each exception occurs.
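The idea of a key path, the upstream path that takes the longest time to run, can be sketched as a longest-path search over the dependency graph. The following Python example is a simplification that sums assumed running durations and ignores waiting time and resource contention, which a real Gantt chart would reflect. The node names N1 to N4, the durations, and the function name are hypothetical.

```python
def key_path(edges, duration, target):
    """Return (total_duration, path) of the longest upstream path ending at target.

    edges: iterable of (upstream, downstream) pairs in an acyclic graph.
    duration: running duration per node, in any consistent unit.
    """
    parents = {}
    for up, down in edges:
        parents.setdefault(down, []).append(up)
    memo = {}

    def longest_to(node):
        if node in memo:
            return memo[node]
        best_total, best_path = 0, []
        for parent in parents.get(node, []):
            total, path = longest_to(parent)
            if total > best_total:
                best_total, best_path = total, path
        memo[node] = (best_total + duration[node], best_path + [node])
        return memo[node]

    return longest_to(target)


# Hypothetical graph: N4 depends on N2 and N3, which both depend on N1.
edges = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4")]
duration = {"N1": 10, "N2": 30, "N3": 5, "N4": 20}  # minutes

print(key_path(edges, duration, "N4"))  # (60, ['N1', 'N2', 'N4'])
```

In this example, the path through N2 dominates, so a slowdown on N2 delays N4 directly, whereas N3 has 25 minutes of slack.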