Log Service allows you to configure alert rules based on the charts in a dashboard to monitor the service status in real time.

Specify the time range and check frequency for a query

Log Service executes the specified query statement at the specified check frequency over the specified time range to query and analyze data. Then, Log Service checks whether the trigger condition is met based on the query and analysis results. If the trigger condition is met, an alert is triggered.

When you create an alert rule, we recommend that you do not set the time range to exactly the interval between two consecutive checks, that is, a relative time range of the same length as the Check Frequency value that ends at the execution time. If you do, some data may not be queried. The following examples explain why this issue occurs. In the examples, the time range is set to 1 Minute(Relative) and the Check Frequency parameter is set to Fixed Interval 1 Minutes.
  • After data is written to Log Service, the data becomes queryable only after a short delay. In most cases, the delay is less than 3 seconds. Because of this latency, some data may not be found. For example, if the alert rule is executed at 12:03:30 and the time range is set to 1 Minute(Relative), the time range is [12:02:30, 12:03:30). Logs that are written to Log Service at 12:03:29 may not be queryable at 12:03:30. The next check, at 12:04:30, queries [12:03:30, 12:04:30), so these logs are never evaluated.
    • If you have high requirements for alert accuracy, such as no duplicate alerts and no missing alerts, you can shift the time range to an earlier period. For example, if the alert rule is executed at 12:03:30, you can set the time range to [12:02:20, 12:03:20), which is shifted 10 seconds earlier. This prevents missing alerts that are caused by indexing latency.
    • If you have high requirements for real-time performance, we recommend that you set a time range that ends at the execution time but starts earlier than the check interval requires, so that consecutive time ranges overlap. This way, you receive alert notifications as soon as alerts are triggered, at the cost of possible duplicate alerts. For example, if the alert rule is executed at 12:03:30, you can set the time range to [12:02:20, 12:03:30).
  • If logs that are generated at different points in time within the same minute are written at the same time, the indexes of the later logs may be written to the point in time of the earlier logs. This issue is caused by the method that Log Service uses to build indexes. For example, the alert rule is executed at 12:03:30 and the time range is set to 1 Minute(Relative), so the time range is [12:02:30, 12:03:30). Multiple log entries that are generated at different points in time within the same minute, such as 12:02:20 and 12:02:50, are written at 12:02:50. The indexes of all these log entries may be written to 12:02:20. As a result, none of these log entries can be queried within the time range [12:02:30, 12:03:30), including the entry that was generated at 12:02:50.
    • If you have high requirements for alert accuracy, such as no duplicate alerts and no missing alerts, you can set the time range to a time frame, such as 1 Minute(Time Frame), 5 Minutes(Time Frame), or 1 Hour(Time Frame). Then, set the Check Frequency parameter to the same period, for example, 1 minute, 5 minutes, or 1 hour, respectively.
    • If you have high requirements for real-time performance, we recommend that you set a relative time range that covers at least the full minute before the current time. This way, you receive alert notifications as soon as alerts are triggered, at the cost of possible duplicate alerts. For example, the alert rule is executed at 12:03:30. If you set the time range to 90 Seconds(Relative), the time range is [12:02:00, 12:03:30). Then, set the Check Frequency parameter to Fixed Interval 1 Minutes.
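The following Python sketch illustrates the indexing-latency case above. It does not reproduce how Log Service schedules checks; the helper names, the sample timestamps, and the 1-second and 2-second latencies are hypothetical. A check counts a log as seen if the log time falls in the check's time range and the log is already queryable at execution time, so a count of 0 means a missed alert and a count above 1 means a duplicate. The time_frame_window helper assumes that an N-minute time frame covers the most recent complete, aligned N-minute period; this is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta


def relative_window(exec_time, length, delay=timedelta(0)):
    """Return [start, end) for a relative time range that ends `delay` before exec_time."""
    end = exec_time - delay
    return end - length, end


def time_frame_window(exec_time, period):
    """Return the most recent complete period aligned to period boundaries.

    Assumption: an N-minute Time Frame covers the last whole, aligned N-minute period.
    Alignment is computed from midnight, so `period` should divide a day evenly.
    """
    midnight = exec_time.replace(hour=0, minute=0, second=0, microsecond=0)
    end = midnight + ((exec_time - midnight) // period) * period
    return end - period, end


def times_seen(logs, exec_times, length, delay=timedelta(0)):
    """Count how many checks see each log.

    Each log is (event_time, visible_time), where visible_time is when the log
    becomes queryable (event time plus indexing latency). A check at t sees a log
    if the event time falls in the check's time range and the log is visible at t.
    """
    counts = []
    for event_time, visible_time in logs:
        seen = 0
        for t in exec_times:
            start, end = relative_window(t, length, delay)
            if start <= event_time < end and visible_time <= t:
                seen += 1
        counts.append(seen)
    return counts


def at(hms):
    """Build a timestamp on an arbitrary fixed date (hypothetical sample data)."""
    return datetime.strptime("2024-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


checks = [at("12:02:30"), at("12:03:30"), at("12:04:30"), at("12:05:30")]
logs = [
    (at("12:03:29"), at("12:03:31")),  # 2 s indexing latency
    (at("12:03:25"), at("12:03:26")),  # 1 s indexing latency
]
minute = timedelta(minutes=1)

print(times_seen(logs, checks, minute))                               # [0, 1]: first log missed
print(times_seen(logs, checks, minute, delay=timedelta(seconds=10)))  # [1, 1]: shifted range
print(times_seen(logs, checks, minute + timedelta(seconds=10)))       # [1, 2]: overlap, duplicate
print([t.strftime("%H:%M:%S") for t in time_frame_window(at("12:03:30"), minute)])
# ['12:02:00', '12:03:00']
```

With the shifted range, each log is seen exactly once, but a log near the end of a range may be evaluated only at the following check. With the overlapping range, alerts are raised sooner, but the same log can be evaluated twice. The sketch models only indexing latency, not the index-time issue described in the second case.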

Configure an alert rule to trigger alerts based on query results

When you configure an alert rule to trigger alerts based on query results, you can set the trigger condition to check whether a specific field exists. If every log that the query returns contains the field, the trigger condition is met whenever the query result is not empty.

For example, you can configure an alert rule to trigger an alert when a log contains the client_ip: 192.0.2.1 key-value pair. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule. In this example, each log that the query returns contains the client_ip field with a non-empty value. Therefore, the trigger condition checks whether the client_ip field is not empty, and an alert is triggered whenever the query returns at least one log.

[Figure: Configure an alert rule to trigger alerts based on query results]
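The following Python sketch shows the logic of this trigger condition on hypothetical query results, where each returned log is a dict that maps field names to values. The function name and the sample data are illustrative assumptions; Log Service evaluates the condition itself.

```python
def field_present(rows, field="client_ip"):
    """Return True if at least one returned log has a non-empty value for `field`."""
    return any(row.get(field) for row in rows)


# Hypothetical query results: each returned log is a dict of field -> value.
matching = [{"client_ip": "192.0.2.1", "status": "200"}]
print(field_present(matching))  # True: the query returned a log with client_ip
print(field_present([]))        # False: empty result, so no alert
```

An empty string counts as an empty value here, so a log that carries the key without a value does not trigger the alert in this sketch.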

Configure an alert rule to trigger alerts based on analysis results

In most cases, alert rules are configured based on analysis results. For example, you can configure an alert rule to trigger an alert when the number of logs that contain the ERROR keyword reaches a specified threshold. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.

[Figure: Configure an alert rule to trigger alerts based on analysis results]
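The following Python sketch mimics this rule on hypothetical log lines. The query string in the docstring, the error_count column name, and the whitespace-based keyword matching are assumptions for illustration; they are not the rule shown in the figure, and real keyword matching depends on the index configuration.

```python
def error_count_result(logs, keyword="ERROR"):
    """Mimic a hypothetical query such as `ERROR | SELECT count(*) AS error_count`.

    Simplification: a log matches if the keyword appears as a whitespace-separated token.
    """
    count = sum(1 for line in logs if keyword in line.split())
    # Analysis results are rows of string values, as in the sample Results element.
    return [{"error_count": str(count)}]


def threshold_reached(result, threshold):
    """Trigger condition equivalent to `error_count >= threshold`."""
    return int(result[0]["error_count"]) >= threshold


logs = ["ERROR db timeout", "INFO request ok", "ERROR disk full"]
result = error_count_result(logs)
print(result)                        # [{'error_count': '2'}]
print(threshold_reached(result, 2))  # True
print(threshold_reached(result, 3))  # False
```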

Associate multiple charts

When you configure an alert rule based on the charts in a dashboard, Log Service allows you to associate one to three query and analysis results at the same time.
Note
  • Each query can use its own time range. The time ranges of different queries do not conflict with each other.
  • When you associate multiple charts, you must reference a field in a query and analysis result in the $N.fieldName format, where N is the serial number of the chart. For example, if the trigger condition is set to $0.pv > 100000 && $1.uv < 1000, $0 indicates chart 0 and $1 indicates chart 1. For more information, see How can I view the serial number of a chart?
  • You must separate multiple conditions with two ampersands (&&), for example, pv%100 > 0 && uv > 0.
  • Configure an alert rule to trigger alerts based on multiple query and analysis results that cover different time ranges.
    For example, you can configure an alert rule to trigger an alert when the value of the pv field within the last 15 minutes is greater than 100000 and the value of the uv field within the last hour is less than 1000. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.
    [Figure: Create an alert rule for existing charts on a dashboard]
  • Configure an alert rule that triggers alerts based on one chart and uses another associated chart to provide details about the triggered alert.

    For example, you can configure an alert rule to trigger an alert when the number of log entries whose log level is ERROR reaches a specified threshold, and you want the alert notification to include the logs whose log level is ERROR. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.

    To include the logs whose log level is ERROR in the alert notification, set the notification content to the following value, where results[1] refers to chart 1, the chart that returns the ERROR logs:
    ${results[1].RawResultsAsKv}
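The following Python sketch illustrates how $0.pv-style references select values from different charts and how && combines comparisons. It is a hypothetical evaluator that accepts a small, safe subset of expressions (comparisons, + - * / %, and &&); it is not the expression engine of Log Service. Resolving bare field names such as pv against chart 0 is an assumption of this sketch.

```python
import ast
import operator
import re

_REF = re.compile(r"\$(\d+)\.(\w+)")
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Mod: operator.mod}
_CMPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
         ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(condition, charts):
    """Evaluate a trigger condition such as '$0.pv > 100000 && $1.uv < 1000'.

    charts[i] is one result row of chart i (a dict of field -> value).
    Assumption: bare field names (no $N. prefix) are read from chart 0.
    """
    refs = {}

    def bind(match):
        # Replace $N.field with a plain variable name and remember its value.
        name = f"_c{match.group(1)}_{match.group(2)}"
        refs[name] = float(charts[int(match.group(1))][match.group(2)])
        return name

    expr = _REF.sub(bind, condition).replace("&&", " and ").strip()

    def lookup(name):
        return refs[name] if name in refs else float(charts[0][name])

    return _eval(ast.parse(expr, mode="eval").body, lookup)


def _eval(node, lookup):
    """Walk a small, safe subset of Python expressions."""
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        return all(_eval(value, lookup) for value in node.values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, lookup)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _CMPS.get(type(op))
            if compare is None:
                raise ValueError("unsupported comparison operator")
            right = _eval(comparator, lookup)
            if not compare(left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, lookup), _eval(node.right, lookup))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, lookup)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return lookup(node.id)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


charts = [{"pv": "120000"}, {"uv": "800"}]  # chart 0: last 15 minutes, chart 1: last hour
print(evaluate("$0.pv > 100000 && $1.uv < 1000", charts))                           # True
print(evaluate("$0.pv > 100000 && $1.uv < 1000", [{"pv": "90000"}, {"uv": "800"}]))  # False
print(evaluate("pv%100 > 0 && uv > 0", [{"pv": "150", "uv": "3"}]))                 # True
```

Parsing with ast and walking only whitelisted node types avoids calling eval on the condition string.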

Suppress alerts

If the trigger condition of an alert rule is met, you may receive multiple alert notifications within a period of time. To prevent false and duplicate alerts that are caused by data jitter, you can set the Trigger Threshold and Notification Interval parameters to suppress alerts. For more information, see Configure an alert rule.
  • Specify the threshold of continuous triggers.

    An alert is triggered only when the specified trigger condition is met in consecutive check periods.

    For example, the Check Frequency parameter is set to Fixed Interval 1 Minutes and the Trigger Threshold parameter is set to 5. In this case, an alert notification is sent only when the trigger condition is met in five consecutive checks. If the trigger condition is not met in five consecutive checks, no alert is triggered.

  • Specify the interval at which alert notifications are sent.

    If the check frequency is set to a short interval, you can set the Notification Interval parameter to specify the minimum interval between two alert notifications. This prevents frequent alert notifications. For example, the Check Frequency parameter is set to Fixed Interval 1 Minutes and the Notification Interval parameter is set to 30 Minutes. In this case, after an alert notification is sent, no further alert notification is sent within the next 30 minutes, even if alerts are triggered.
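The following Python sketch simulates how the two parameters interact for a check that runs once per minute. The function name and the sample outcomes are hypothetical. The counter-reset and notification-timing rules in the docstring are assumptions of this sketch, not documented Log Service behavior.

```python
def notification_checks(outcomes, trigger_threshold, notification_interval):
    """Return the indexes of the checks (one per minute) at which a notification is sent.

    outcomes[i] is True if the trigger condition is met at check i.
    Assumptions for this sketch:
      * the consecutive-hit counter resets whenever the condition is not met;
      * every check at which the counter is >= trigger_threshold triggers an alert;
      * a notification is allowed once notification_interval minutes have passed
        since the previous notification.
    """
    sent = []
    streak = 0
    last_sent = None
    for minute, met in enumerate(outcomes):
        streak = streak + 1 if met else 0
        if streak < trigger_threshold:
            continue  # condition not met often enough in a row: no alert
        if last_sent is None or minute - last_sent >= notification_interval:
            sent.append(minute)
            last_sent = minute
        # otherwise the alert is triggered but its notification is suppressed
    return sent


# Condition met for 4 checks, missed once, then met for 40 checks in a row.
outcomes = [True] * 4 + [False] + [True] * 40
print(notification_checks(outcomes, trigger_threshold=5, notification_interval=30))  # [9, 39]
```

The miss at check 4 resets the counter, so the first alert fires at check 9, and the 30-minute interval suppresses every notification until check 39.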

Use template variables

When you configure a notification method for an alert rule, you can use template variables. When you specify the Content and Subject parameters, you can use the ${fieldName} syntax to reference a template variable. When Log Service sends an alert notification, Log Service replaces the template variables that are referenced in the Content and Subject parameters with the actual values. For example, Log Service replaces ${Project} with the name of the project to which the alert rule belongs. For more information, see Template variables.
  • You can use the ${fieldName} syntax to reference the Project, AlertName, or Dashboard variable. These variables are not case-sensitive.
  • Each time an alert is triggered, Log Service automatically generates the alert information and stores it in the Results field. The value of the Results field is an array. Each element in the array corresponds to a chart that is associated with the triggered alert. In most cases, the array contains only one element. For more information, see Fields in alert rule evaluation logs. You can reference the fields in the Results array by using the following methods:
    • Fields of the array type are referenced in the ${fieldName[{index}]} format. {index} indicates an array subscript that starts from 0. For example, ${results[0]} indicates that the first element in the Results array is referenced.
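    • Fields of the object type are referenced in the ${object.key} format. For example, ${results[0].StartTimeTs} references the StartTimeTs field of the first element in the Results array. In the following sample, its value is the timestamp 1542453580.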
    Note The fields in RawResults and FireResult contain query results and are case-sensitive. Other fields are not case-sensitive. The following sample shows an element of the Results array:
    {
      "EndTime": "2006-01-02 15:04:05",
      "EndTimeTs": 1542507580,
      "FireResult": {
        "__time__": "1542453580",
        "field": "value1",
        "count": "100"
      },
      "FireResultAsKv": "[field:value1,count:100]",
      "Truncated": false,
      "LogStore": "test-logstore",
      "Query": "* | SELECT field, count(1) group by field",
      "QueryUrl": "http://xxxx",
      "RawResultCount": 2,
      "RawResults": [
        {
          "__time__": "1542453580",
          "field": "value1",
          "count": "100"
        },
        {
          "__time__": "1542453580",
          "field": "value2",
          "count": "20"
        }
      ],
      "RawResultsAsKv": "[field:value1,count:100],[field:value2,count:20]",
      "StartTime": "2006-01-02 15:04:05",
      "StartTimeTs": 1542453580
    }
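The following Python sketch illustrates how a ${...} reference walks the alert context: [index] selects an array element and .key selects an object field, with case-insensitive matching except inside FireResult and RawResults. The render and resolve helpers and the Project and AlertName values are hypothetical; Log Service performs the substitution itself.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")
# Fields inside these objects come from query results and are case-sensitive.
_CASE_SENSITIVE_CONTAINERS = {"fireresult", "rawresults"}


def _get(obj, key, case_sensitive):
    if case_sensitive:
        return obj[key]
    for name, value in obj.items():
        if name.lower() == key.lower():
            return value
    raise KeyError(key)


def resolve(path, context):
    """Resolve a path such as 'results[0].StartTimeTs' against the alert context."""
    value, case_sensitive = context, False
    for index, key in _TOKEN.findall(path):
        if index:
            value = value[int(index)]  # ${fieldName[{index}]}: 0-based array subscript
        else:
            value = _get(value, key, case_sensitive)  # ${object.key}
            if key.lower() in _CASE_SENSITIVE_CONTAINERS:
                case_sensitive = True
    return value


def render(template, context):
    """Replace every ${...} reference in a notification Subject or Content."""
    return _VAR.sub(lambda m: str(resolve(m.group(1).strip(), context)), template)


# Hypothetical context; the Results element is trimmed from the sample above.
context = {
    "Project": "my-project",
    "AlertName": "error-alert",
    "Results": [{
        "StartTimeTs": 1542453580,
        "FireResult": {"field": "value1", "count": "100"},
        "RawResultsAsKv": "[field:value1,count:100],[field:value2,count:20]",
    }],
}
print(render("${project}/${AlertName} fired at ${results[0].StartTimeTs}", context))
# my-project/error-alert fired at 1542453580
print(render("count=${results[0].FireResult.count}", context))  # count=100
```

In this sketch, ${results[0].FireResult.Count} raises KeyError because FireResult fields are matched case-sensitively, while ${project} and ${results[0]...} match regardless of case.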

Troubleshoot why no alert is triggered

After you configure an alert rule, you can use alert logs to find out why no alert is triggered. For more information, see View the evaluation results of alert rules. For more information about the log fields, see Fields in alert rule evaluation logs.