Use dynamic thresholds to detect anomalies in metric data

To detect anomalies and configure alerting for metrics whose values fluctuate even in a normal state, such as response time (RT) and queries per second (QPS), we recommend that you enable dynamic thresholds, which adapt to different periods of time. Anomaly detection based on dynamic thresholds is mainly used to monitor metrics whose trends are stable. If a metric exceeds the thresholds, the system generates an exception event.

Scenarios

Application performance monitoring: monitors the key metrics of a website or service, such as the response time and request speed. If the response time of a service suddenly exceeds the dynamic thresholds, the system immediately issues an exception warning. This enables website administrators to quickly locate and solve the problem.
Server resource optimization: monitors the CPU utilization and memory usage of a server. If the resource usage of a server continuously exceeds the dynamic thresholds, the system automatically generates an exception event. This helps you adjust resource allocation in a timely manner to prevent system crashes.
Application connection pool analysis: monitors key metrics, such as the query speed and the number of concurrent connections. If some metrics of a thread exceed the dynamic thresholds, the system automatically triggers an exception event so that you can optimize program performance in a timely manner.
Microservice model monitoring: monitors the resource usage and response performance of each microservice. The interactions and dependencies among microservices are complex. With dynamic thresholds, if an exception occurs in a microservice, you can quickly locate the problem to ensure the stability of the entire microservice system.

Example:

Assume that the normal number of page views of a website from 10:00 to 18:00 is greater than 1,000. If the number of page views is still greater than 1,000 from 22:00 to 06:00, the website is likely under attack. In this case, the expected range of page views changes over time. If you configure a static threshold of 1,000, you receive alert notifications when the number of page views drops below 1,000 during the day. However, if the website is attacked at night, no alert is triggered, because the static rule only checks whether the value falls below 1,000. In this case, you can use dynamic thresholds to intelligently update the expected range and detect anomalies.
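
The difference between the two rules in this example can be sketched in a few lines of Python. This is only an illustration: the hourly ranges returned by `expected_range` are made-up values, not numbers produced by ARMS, and the function names are hypothetical.

```python
# Illustrative values only; none of these numbers come from ARMS.
STATIC_THRESHOLD = 1000


def expected_range(hour: int) -> tuple:
    """Return an assumed normal page-view range (lower, upper) for an hour (0-23)."""
    if 10 <= hour < 18:
        return (1000, 5000)  # busy daytime traffic
    if hour >= 22 or hour < 6:
        return (0, 200)      # quiet night traffic
    return (200, 2000)       # hours in between


def static_alert(page_views: int) -> bool:
    """Static rule: alert when page views drop below the fixed threshold."""
    return page_views < STATIC_THRESHOLD


def dynamic_alert(hour: int, page_views: int) -> bool:
    """Dynamic rule: alert when page views leave the range expected for that hour."""
    lower, upper = expected_range(hour)
    return not (lower <= page_views <= upper)
```

With these assumed ranges, 1,500 page views at 02:00 pass the static rule unnoticed (`static_alert(1500)` is `False`) but are flagged by the hour-aware rule (`dynamic_alert(2, 1500)` is `True`). Conversely, 50 page views at 02:00 trip the static rule even though that level of traffic is normal at night.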

Prerequisites

Your application is monitored in Application Monitoring eBPF Edition. For more information, see Connect an application to Application Monitoring eBPF Edition and Manually connect an application to Application Monitoring eBPF Edition.

Configure dynamic thresholds

Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring eBPF > Application Monitoring eBPF Edition Alert Rules.
On the page that appears, click Create Alert Rule for Application Monitoring eBPF Edition.
On the Create Alert Rule for Application Monitoring eBPF Edition page, set Alert Rule Name, and set Alert Detection Type to Interval detection.
Note
To configure threshold detection, see Threshold detection.

In the Alert Contact section, select the required parameters.

Parameter	Description
Select Application	Select the application to monitor. Interval detection currently supports configuring alerts for only a single application.
Metric Type	Select the type of metric to monitor. For more information, see Alert rule metrics. After you make a selection, the system automatically calculates the upper and lower bounds and renders them in real time. You can preview the metric trend in the Alert Condition section. Note: The condition fields for the Alert Rule and the Filter Condition vary based on the selected Metric Type. The initial rendering takes about 2 to 4 seconds. For information about how the upper and lower bounds are calculated, see How the threshold range is calculated.
Filter Conditions	Further filter the metric to narrow the monitoring scope. Dimensions for the alert metric:
- Traverse: The alert content shows the specific dimension value that triggered the alert.
- No Dimension: The alert content shows the sum of all values for this dimension.
- =: The alert content only shows data for the specified dimension value.
- !=: The alert content only shows data for dimension values other than the one specified.
- Contain: The alert content only shows data for dimension values that contain the specified string.
- Do Not Contain: The alert content only shows data for dimension values that do not contain the specified string.
- Match Regular Expression: The alert content only shows data for dimension values that match the specified regular expression.
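
As a rough mental model of the value-based filter conditions, the following Python sketch evaluates one condition against a dimension value. The function `matches` is hypothetical and is not part of ARMS. In particular, whether ARMS applies a regular expression as a partial or a full match is not stated here, so the partial match via `re.search` is an assumption.

```python
import re


def matches(operator: str, dimension_value: str, operand: str) -> bool:
    """Evaluate one filter condition against a single dimension value."""
    if operator == "=":
        return dimension_value == operand
    if operator == "!=":
        return dimension_value != operand
    if operator == "Contain":
        return operand in dimension_value
    if operator == "Do Not Contain":
        return operand not in dimension_value
    if operator == "Match Regular Expression":
        # Assumption: a partial match anywhere in the value counts as a match.
        return re.search(operand, dimension_value) is not None
    raise ValueError(f"unsupported operator: {operator}")
```

For example, `matches("Contain", "/api/orders", "orders")` keeps the data point, while `matches("!=", "/health", "/health")` drops it.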

In the Alert rules section, configure the Alert Condition parameter.

Parameter	Description
Alert trigger mode	Valid value: Single Condition.
Alert Condition	Configure the alert conditions, including the following options:
Last X Minutes: the time period for triggering alerts. Maximum value: 60.
Data: the data that you want to monitor. Various data types can be specified, such as the number of calls or the response time.
Calculation method: specifies how data is calculated. Various methods can be specified, such as the average value, maximum value, or minimum value, depending on the metric and data type.
Comparison method: compares calculated data to find anomalies. Valid values:
- Outside the range of the dynamic threshold: automatically calculates the upper and lower boundaries of an allowed data range for the time period. If a data point falls outside the range, the data is abnormal and an alert is triggered.
- Larger than the maximum value of the dynamic threshold: automatically calculates the upper and lower boundaries of an allowed data range for the time period. If a data point is larger than the upper boundary, the data is abnormal and an alert is triggered.
- Lower than the minimum value of the dynamic threshold: automatically calculates the upper and lower boundaries of an allowed data range for the time period. If a data point is lower than the lower boundary, the data is abnormal and an alert is triggered.
Alert level: the severity of the alert. Valid values: P1, P2, P3, and P4.
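
The three comparison methods differ only in which boundary they check. A minimal Python sketch, assuming the lower and upper boundaries have already been calculated (the `Comparison` enum and `is_abnormal` function are hypothetical names used for illustration):

```python
from enum import Enum


class Comparison(Enum):
    OUTSIDE_RANGE = "Outside the range of the dynamic threshold"
    ABOVE_UPPER = "Larger than the maximum value of the dynamic threshold"
    BELOW_LOWER = "Lower than the minimum value of the dynamic threshold"


def is_abnormal(value: float, lower: float, upper: float, method: Comparison) -> bool:
    """Return True if the data point counts as abnormal under the chosen method."""
    if method is Comparison.OUTSIDE_RANGE:
        return value < lower or value > upper
    if method is Comparison.ABOVE_UPPER:
        return value > upper
    if method is Comparison.BELOW_LOWER:
        return value < lower
    raise ValueError(f"unknown comparison method: {method}")
```

For a metric such as response time, where only increases matter, checking only the upper boundary avoids alerts when the service is unusually fast. For a metric such as the number of calls, a drop below the lower boundary may be the more meaningful signal.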

In the data preview section, blue represents the data points and green represents the allowed data range.

Configure the Alert Notification parameter and parameters in the Advanced Alert Settings section.

Parameter	Description
Notification Policy	This field is displayed only when Alert Notification is set to Standard Mode. Valid values: Do Not Specify Notification Policy: If alerts are triggered, no notification is sent directly. Notifications are sent only when the matching rules of a notification policy are triggered. Specify a notification policy: If you specify a notification policy, ARMS sends notifications by using the notification method specified in the notification policy. You can select an existing notification policy or create a notification policy. For more information, see Create and manage a notification policy.
Advanced Alert Settings
No data	This parameter is used to fix data anomalies, such as missing data, abnormal composite metrics, and abnormal period-over-period comparison results. If data anomalies can be fixed, the data is automatically changed to 0 or 1, or the alert is not triggered. For more information, see Terms.

Click Save.

Threshold calculation

The dynamic thresholds of ARMS are mainly developed based on the Prophet algorithm. After dynamic thresholds are enabled, ARMS analyzes the historical data of the last 7 days every 24 hours, extracts the trend and seasonality, and then draws a trend chart of the predicted data for the next 24 hours. At the same time, an expected data range is calculated based on the fluctuations of the metric. When you configure dynamic thresholds, you can preview the upper and lower boundaries calculated by the algorithm.
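
To make the idea of a time-dependent expected range concrete, here is a deliberately simplified stand-in written in Python. It is not the Prophet algorithm and not how ARMS computes its boundaries. It only groups historical samples by hour of day and uses the mean plus or minus `k` standard deviations as the range for that hour. The function names and the default `k = 3.0` are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def hourly_bounds(history, k=3.0):
    """Build {hour: (lower, upper)} from (hour_of_day, value) samples.

    Simplified stand-in for a seasonal model: each hour of the day gets
    its own range, mean +/- k population standard deviations.
    """
    by_hour = {}
    for hour, value in history:
        by_hour.setdefault(hour, []).append(value)

    bounds = {}
    for hour, values in by_hour.items():
        center = mean(values)
        spread = pstdev(values)
        bounds[hour] = (center - k * spread, center + k * spread)
    return bounds


def is_anomalous(bounds, hour, value):
    """Return True if the value falls outside the range learned for that hour."""
    lower, upper = bounds[hour]
    return value < lower or value > upper
```

Fed with 7 days of hourly samples, `hourly_bounds` returns one range per hour, so a value that is normal at 14:00 can still be flagged at 02:00. Recomputing the bounds once a day on the most recent 7 days mirrors the daily refresh described above, although a real seasonal model also accounts for trend, which this sketch ignores.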

Unlike static thresholds, dynamic thresholds do not need to be updated by manually editing alert rules, even if the expected data range of a metric changes over time. This is because ARMS analyzes metric trends once a day and predicts the upper and lower boundaries only for the next day.

Alert quantity prediction

The alert quantity prediction feature uses an algorithm to analyze historical data, displays the times at which historical alerts occurred, and then predicts the number of alerts within a specified period of time. The feature helps you configure static thresholds or improve alert sensitivity for dynamic thresholds.

Implementation

Based on metric data in the last 24 hours, ARMS calculates the number of times that each threshold of a metric is exceeded to predict the quantity of alerts in the future. In addition, ARMS provides the metric details, including the specific time when each threshold is exceeded. You can adjust thresholds based on your business requirements.
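
The counting step described above can be sketched as follows. This Python snippet is an illustration under simple assumptions, not the ARMS implementation: samples are `(timestamp, value)` pairs from the last 24 hours, a threshold is "exceeded" when the value is strictly greater than it, and `count_exceedances` is a hypothetical name.

```python
def count_exceedances(samples, thresholds):
    """Map each candidate threshold to the timestamps at which it was exceeded.

    samples: iterable of (timestamp, value) pairs.
    thresholds: iterable of candidate threshold values.
    """
    samples = list(samples)
    return {
        threshold: [ts for ts, value in samples if value > threshold]
        for threshold in thresholds
    }
```

The length of each list is the number of alerts that threshold would have produced over the period, and the timestamps show when they would have fired. Comparing the counts for several candidate thresholds helps you pick a value that is neither too noisy nor too insensitive.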

After an alert rule is triggered, view historical alert events.
After you receive an alert notification, view historical alerts.