
Application Real-Time Monitoring Service: Dynamic threshold

Last Updated: Jan 15, 2024

Some metrics, such as response time (RT) or queries per second (QPS), fluctuate even when they are normal, so the right alert threshold differs from one time period to another. For these metrics, you can use the interval detection feature to detect anomalies based on dynamic thresholds. When a data point moves outside the upper and lower boundaries that the system calculates, the system generates an interval anomaly detection event. This detection method is mainly used to monitor metrics with stable trends.

Use scenarios

  • Application performance monitoring: Webmasters can use this feature to monitor golden metrics of a site, such as response time and request rate. When the response time of a service suddenly exceeds the normal range, the system immediately issues an alert. Administrators can then quickly locate and resolve the problem.

  • Server resource optimization: You can monitor the CPU and memory usage of servers. If the resource usage of a server continuously exceeds the normal range, the system automatically generates an anomaly event. This helps the team adjust resource allocation in time and avoid potential system crashes.

  • Application connection pool analysis: You can use this feature to monitor key metrics such as query speed and the number of concurrent connections. When a metric exceeds its normal range, the system automatically triggers an anomaly event so that you can optimize program performance in time.

  • Microservice architecture monitoring: In a microservice architecture, the interactions and dependencies between services are complex. This feature lets teams monitor the resource usage and response performance of each service. When an anomaly occurs in a service, you can quickly locate the problem and keep the entire system running stably.

Example:

Consider a work-related website. A page view count below 1,000 during working hours (for example, 10:00 to 18:00) is abnormal. A page view count above 1,000 at night (for example, 22:00 to 06:00) may indicate an attack. In this scenario, the normal level of the metric changes over time. Suppose you configure a fixed threshold, for example, an alert when page views fall below 1,000. You then receive alert notifications for abnormal daytime traffic, but you do not receive timely notifications if the website is attacked at night. With interval detection, the system learns the normal level of the metric and automatically updates the threshold interval.
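A small Python sketch makes the difference between the two approaches concrete. It hard-codes the day and night levels from the example. The function names and exact hour ranges are illustrative assumptions. Interval detection learns these levels from historical data and does not use hand-written rules.

```python
def static_rule(hour: int, page_views: int, threshold: int = 1000) -> bool:
    """Fixed threshold: alert whenever page views fall below 1,000 (hour is ignored)."""
    return page_views < threshold


def time_aware_rule(hour: int, page_views: int) -> bool:
    """Hypothetical hand-written stand-in for the time-dependent normal level."""
    if 10 <= hour < 18:  # working hours: low traffic is abnormal
        return page_views < 1000
    if hour >= 22 or hour < 6:  # night: high traffic may indicate an attack
        return page_views > 1000
    return False  # other hours are not covered by this sketch


# A nighttime traffic spike (02:00, 5,000 page views):
print(static_rule(2, 5000))      # False: the fixed threshold misses it
print(time_aware_rule(2, 5000))  # True: the time-aware rule flags it
```

Both rules catch low daytime traffic. Only the time-aware rule catches the nighttime spike, which is the gap that dynamic thresholds are meant to close.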

Prerequisites

An application is connected to Application Monitoring eBPF. For more information, see Connect an application to Application Monitoring eBPF and Manually connect an application to Application Monitoring eBPF.

Configure interval detection

  1. Log on to the ARMS console. In the left-side navigation pane, choose Application Monitoring eBPF > Application List Alert Rules.

  2. On the Alert Rules page, click Create Alert Rule for Application Real-time Monitoring (eBPF).

  3. On the Create Alert Rule for Application Real-time Monitoring eBPF page, enter an Alert Name and set Alert Detection Type to Interval Detection.

    Note

    For more information about how to configure threshold detection, see Threshold detection.

  4. In the Alert Object section, select an alert application, metric type, and filter conditions.

    Parameter

    Description

    Select Applications

    Select the application that you want to monitor. Interval detection supports only one application per alert rule.

    Metric Type

    Select the type of the metric that you want to detect. For more information, see Alert rule metrics.

    After you select a metric, the system automatically calculates the upper and lower boundaries and renders the metric in real time. You can preview the metric trend in the Alert Condition section.

    Note
    • The values of the Alert Condition and Filter Condition parameters vary based on the value of the Metric Type parameter.

    • The initial rendering may take about 2 to 4 seconds.

    • For more information about how to calculate the upper and lower boundary values, see How to calculate threshold intervals.

    Filter Conditions

    Further filter the metrics to narrow the monitoring scope.

    The method that is used to filter the metrics for which alerts are generated. Valid values:

    • Traversal: traverses all values of the metric type that you specify.

    • No: calculates the sum of all values of the metric type that you specify.

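    • =: filters the values of the metric type that you specify. The filtered values must be equal to the value that you specify in the text box on the right, and the alert content shows only these values.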

    • !=: filters the values of the metric type that you specify. The filtered values must be unequal to the value that you specify in the text box on the right.

    • Contain: filters the values of the metric type that you specify. The filtered values must contain the value that you specify in the text box on the right.

    • Do Not Contain: filters the values of the metric type that you specify. The filtered values cannot contain the value that you specify in the text box on the right.

    • Match Regular Expression: filters the values of the metric type that you specify. The filtered values must conform to the regular expression that you specify in the text box on the right.


  5. In the Alert Rules section, configure Alert Conditions.

    Parameter

    Description

    Alert Trigger Mode

    Interval detection supports only single-condition triggers. It does not support combining multiple trigger modes.

    Alert Conditions

    Configure specific alert conditions, including the following factors:

    • Last X Minutes: the time range to monitor. You can select a maximum of 60 minutes.

    • Metric Measures: the quantifiable values of a metric. The available measures depend on the metric type, for example, the number of calls or the call response time.

    • Aggregation method: the method used to aggregate the metric data, such as the average, maximum, or minimum value. The available methods depend on the metric and measure.

    • Comparison method: compares the aggregated data with the dynamic threshold to find abnormal points. Interval detection provides three comparison methods:

      • Outside the upper and lower bounds of the dynamic threshold: The system automatically calculates the upper and lower bounds for the current time. If a data point falls above the upper bound or below the lower bound, the data point is abnormal and an alert is triggered.

      • Dynamic Threshold Outside Upper Bound: The system automatically calculates the upper and lower bounds for the current time. If a data point falls above the upper bound, the data point is abnormal and an alert is triggered.

      • Outside the lower bound of the dynamic threshold: The system automatically calculates the upper and lower bounds for the current time. If a data point falls below the lower bound, the data point is abnormal and an alert is triggered.

    • Alert Level: Set the severity level from P1 to P4.

    In the data preview area, the blue line represents the actual data points, and the green area is the upper and lower boundary range.

    Tolerance

    The tolerance stretches or shrinks the upper and lower boundaries that the system calculates automatically. A higher tolerance (slider moved to the right) widens the boundaries. A data point must then deviate further before it is identified as abnormal, so fewer alerts are triggered. A lower tolerance (slider moved to the left) narrows the boundaries. Smaller deviations are then identified as abnormal, so more alerts are triggered.

    Alert count prediction

    View the number of times that the metric is expected to exceed the threshold within the selected time period. Click a specific alert value to query the metric value that triggers an alert at a historical point in time.

    We recommend that you use the Alert count prediction feature each time you create or modify an alert rule. This feature uses algorithms to analyze historical data and predict the number of alerts within the specified time range, which helps you adjust the threshold. For more information, see Alert count prediction.

  6. Set Notification Policy and Advanced Alert Settings.

    Parameter

    Description

    Notification Policy

    • If you do not specify a notification policy, no alert notification is sent when the alert is triggered. Notifications are sent only when the alert matches a rule of a notification policy.

    • If you specify a notification policy, ARMS sends alert notifications by using the notification method specified in the policy. You can select an existing notification policy or create one. For more information, see Create and manage a notification policy.

    Advanced Alert Settings

    No data

    This parameter is used to handle data anomalies, such as missing data, abnormal composite metrics, and abnormal period-over-period comparison results. If the metric data does not meet the specified conditions, the metric data is automatically set to 0 or 1, or the alert is not triggered.

    For more information, see Terminologies.

  7. Click Save.
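Steps 4 and 5 describe how a rule selects and judges metric data. The following Python sketch models that flow under stated assumptions. The filter operators map to the options listed in step 4. The aggregation method reduces the last X minutes to one value, which is compared with the bounds for the current time. The tolerance scales the width of the band around its midpoint. These internals are not documented here, so all function names, operator strings, and the tolerance formula are hypothetical.

```python
import re
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch of steps 4 and 5; none of these names are ARMS APIs.
AGGREGATIONS = {"avg": mean, "max": max, "min": min}


def filter_series(series: Dict[str, List[float]], op: str,
                  operand: Optional[str] = None) -> Dict[str, List[float]]:
    """Apply a Filter Conditions operator to {dimension value: data points}."""
    if op == "traversal":  # evaluate every dimension value separately
        return dict(series)
    if op == "no":  # sum all dimension values point by point
        return {"*": [sum(p) for p in zip(*series.values())]}
    matchers = {
        "=": lambda v: v == operand,
        "!=": lambda v: v != operand,
        "contain": lambda v: operand in v,
        "not_contain": lambda v: operand not in v,
        # Assumption: the whole value must match the regular expression.
        "regex": lambda v: re.fullmatch(operand, v) is not None,
    }
    return {v: pts for v, pts in series.items() if matchers[op](v)}


def apply_tolerance(lower: float, upper: float,
                    tolerance: float) -> Tuple[float, float]:
    """Assumption: tolerance scales the band's half-width around its midpoint."""
    mid = (lower + upper) / 2
    half = (upper - lower) / 2 * tolerance
    return mid - half, mid + half


def is_abnormal(window: List[float], lower: float, upper: float,
                aggregation: str = "avg", mode: str = "both",
                tolerance: float = 1.0) -> bool:
    """Aggregate the last X minutes and compare the result with the adjusted band.

    mode: "both" (outside either bound), "upper", or "lower".
    """
    value = AGGREGATIONS[aggregation](window)
    lo, hi = apply_tolerance(lower, upper, tolerance)
    if mode == "upper":
        return value > hi
    if mode == "lower":
        return value < lo
    return value > hi or value < lo
```

Consider a band of 80 to 200 ms and response times of 120, 130, and 250 ms. The `avg` aggregation (about 167 ms) stays inside the band, while `max` (250 ms) falls outside it. Raising the tolerance to 2.0 widens the band to 20 to 260 ms, so the same maximum no longer triggers an alert.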

Calculation principle of threshold interval

The interval detection feature of ARMS is mainly based on the Prophet algorithm. After an interval detection task is configured, ARMS works in the background once every 24 hours. It learns from the metric data of the previous seven days, extracts characteristics of the metric such as trend and seasonality, and obtains a forecast curve for the next 24 hours. ARMS then estimates an interval for the next day's data based on how much the metric itself fluctuates, that is, the size of the error variance. When you configure interval detection alerts, you can preview the upper and lower boundaries calculated by the algorithm. In the following figure, the blue line is the actual value of the metric, and the green shaded area is the range between the upper and lower boundaries.
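A much simpler model can illustrate the idea behind this calculation. The following Python sketch is not Prophet and is not how ARMS implements the feature. In place of Prophet's trend and seasonality fitting, it uses a per-slot average across the training days (seasonality only, no trend). It sizes the band as `k` times the standard deviation of the residuals around that average. The function name `predict_band` and the parameters `slots_per_day` and `k` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def predict_band(history: Sequence[float], slots_per_day: int,
                 k: float = 3.0) -> List[Tuple[float, float, float]]:
    """Return (lower, expected, upper) for each time slot of the next day.

    history: chronological values covering whole days, for example 7 days.
    Simplification: seasonality is the per-slot mean across days; trend is ignored.
    """
    if slots_per_day <= 0 or not history or len(history) % slots_per_day:
        raise ValueError("history must cover one or more whole days")
    days = [history[i:i + slots_per_day]
            for i in range(0, len(history), slots_per_day)]
    # Seasonal profile: the average value at each time-of-day slot.
    expected = [mean(day[s] for day in days) for s in range(slots_per_day)]
    # Fluctuation: spread of the residuals around the seasonal profile.
    residuals = [day[s] - expected[s] for day in days for s in range(slots_per_day)]
    sigma = pstdev(residuals)
    return [(e - k * sigma, e, e + k * sigma) for e in expected]
```

Given 7 days of hourly data, `predict_band(history, slots_per_day=24)` returns 24 tuples for the next day. A noisier metric produces a larger `sigma` and therefore a wider band. Prophet also models trend and derives its uncertainty intervals from its own fitting procedure, so the boundaries ARMS shows will differ from this approximation.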

Unlike the static threshold recommendation feature, interval detection does not require you to manually edit alert rules to update thresholds when business changes shift the normal level of a metric. ARMS relearns the characteristics of the metric once a day and predicts the upper and lower boundaries only for the next day, so you do not need to adjust the thresholds manually over and over.

Alert count prediction

The alert count prediction feature uses algorithms to analyze historical data. It predicts the number of alerts within a specified period of time and displays when historical alerts occurred. This helps you set static alert thresholds or adjust the alert sensitivity of interval detection.

Alert count prediction principle

ARMS predicts the number of alerts for a given setting by counting how many times the metric exceeds each threshold in the metric data of the past 24 hours. ARMS also provides the metric details, which show the specific times when the actual value of the metric exceeded the threshold. You can use this information to adjust the threshold to suit your business needs.
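The counting principle above can be sketched in Python. The sketch assumes, for simplicity, that every data point outside the threshold counts as one alert. Real alert rules also apply the trigger conditions and notification settings configured earlier. The function `predict_alert_count` is hypothetical. It accepts either a static threshold or per-point bounds, so the same logic covers both static thresholds and interval detection.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A bound is either one static number or one value per data point (interval detection).
Bound = Union[float, Sequence[float]]


def _bound_at(bound: Bound, i: int) -> float:
    """Return the bound that applies to the i-th data point."""
    return bound[i] if isinstance(bound, (list, tuple)) else bound


def predict_alert_count(
    points: Sequence[Tuple[str, float]],
    upper: Optional[Bound] = None,
    lower: Optional[Bound] = None,
) -> Tuple[int, List[Tuple[str, float]]]:
    """Count historical points (for example, the past 24 hours) outside the bounds.

    Simplifying assumption: each breaching point counts as one alert.
    Also returns the (timestamp, value) pairs that breached, so you can see
    when the metric would have triggered an alert.
    """
    hits = []
    for i, (timestamp, value) in enumerate(points):
        above = upper is not None and value > _bound_at(upper, i)
        below = lower is not None and value < _bound_at(lower, i)
        if above or below:
            hits.append((timestamp, value))
    return len(hits), hits


history = [("10:00", 900), ("10:01", 1200), ("10:02", 1500), ("10:03", 800)]
print(predict_alert_count(history, upper=1000))  # (2, [('10:01', 1200), ('10:02', 1500)])
```

Returning the breaching timestamps along with the count mirrors the metric details view described above: you can check whether each predicted alert falls at a time you actually care about before you tighten or loosen the threshold.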