After logs are collected to Log Service, you can use the alerting system of Log Service to configure alerts based on log keywords.

Background information

Logs record the operating process and exceptions of a system, such as warnings, errors, panic errors in Go, and java.lang.StackOverflowError errors in Java. Logs also record the status of a system, such as payment failures. Keyword-based log retrieval, monitoring, and alerting are therefore common requirements: you retrieve keywords from logs and configure alerts based on them so that you can identify issues as early as possible. Log Service provides an O&M-free alerting solution that features high performance and flexible configurations to help you configure alerts based on log keywords.

Case 1: Specify keywords to trigger alerts

This case provides an example of how to configure a query statement and an alert monitoring rule that triggers alerts when a specified keyword appears in logs.

  • Query statement

    Set the time range to 15 Minutes(Relative) and execute the following statement to query the logs that include the ERROR keyword. For more information, see Query and analyze logs.

    ERROR
  • Query and analysis result

    The following query and analysis result shows that the ERROR keyword appears once within the last 15 minutes.

    Figure: Keyword-based alert
  • Alert monitoring rule
    You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
    • Set the Trigger Condition parameter to Data is returned. An alert is triggered when the ERROR keyword appears in logs.
    • Set the Description field in the Add Annotation parameter to ${logging} and Alert Template to SLS built-in content template. This way, an alert notification includes the content of the logging field in a raw log.
    Figure: Alert monitoring rule
  • Alert notification

    After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the ERROR keyword appears in logs. You can click View Details to view the log for which an alert is generated to identify root causes.
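
The logic of Case 1 can be sketched in a few lines of Python. This is a minimal illustration only, not how Log Service is implemented: the LogEntry type and the query_keyword and data_is_returned helpers are hypothetical names, and a plain substring test stands in for the index-based keyword search that Log Service performs.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    timestamp: int  # Unix time in seconds
    logging: str    # raw message text, mirroring the logging field used above


def query_keyword(logs, keyword, now, window_seconds=15 * 60):
    """Return entries in the look-back window whose message contains keyword."""
    start = now - window_seconds
    return [
        entry for entry in logs
        if start <= entry.timestamp <= now and keyword in entry.logging
    ]


def data_is_returned(rows):
    """Mimic the 'Data is returned' condition: fire when the result is non-empty."""
    return len(rows) > 0


now = 1_700_000_000  # arbitrary reference time for the example
logs = [
    LogEntry(now - 1200, "ERROR payment failed"),  # 20 minutes old, outside the window
    LogEntry(now - 300, "INFO request served"),    # inside the window, no keyword
    LogEntry(now - 60, "ERROR java.lang.StackOverflowError"),
]

matches = query_keyword(logs, "ERROR", now)
fired = data_is_returned(matches)
```

Only the last entry is both recent enough and contains the keyword, so the rule fires on a single matching log.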

Case 2: Configure alerts based on the number of times that a keyword appears in logs

This case provides an example of how to configure a query statement and an alert monitoring rule that triggers alerts when the number of times that a keyword appears in logs exceeds a specified threshold within a specified time range.

  • Query statement

    Set the time range to 1 Hour(Relative) and execute the following statement to query the number of times that the ERROR keyword appears in logs within an hour. For more information, see Query and analyze logs.

    ERROR | SELECT count(*) AS cnt
  • Query and analysis result

    The following query and analysis result shows that the ERROR keyword appears 11 times within the last hour.

    Figure: Query and analysis result
  • Alert monitoring rule
    You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
    • Set the Trigger Condition parameter to data matches the expression, with cnt > 5 as the expression. An alert is triggered when the ERROR keyword appears in logs more than 5 times within an hour.
    • Set Alert Template to SLS builtin content template, and set the Description field in the Add Annotation parameter to the following text: ${cnt} times that the ERROR keyword appears within an hour. This way, an alert notification displays the number of times that the ERROR keyword appears within the last hour.
    Figure: Alert monitoring rule
  • Alert notification

    After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the number of times that the ERROR keyword appears in logs exceeds 5 within the last hour. You can click View Details to view the log for which an alert is generated to identify root causes.
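
The counting rule and the ${cnt} placeholder in Case 2 can be illustrated with the following Python sketch. It is an approximation under stated assumptions: count_keyword, evaluate_rule, and render_description are hypothetical helpers, and Python's string.Template is used only because it happens to share the ${name} placeholder syntax. It is not the template engine that Log Service uses.

```python
from string import Template


def count_keyword(messages, keyword):
    """Count the messages that contain keyword (stand-in for count(*))."""
    return sum(1 for message in messages if keyword in message)


def evaluate_rule(cnt, threshold=5):
    """Mimic the trigger expression cnt > 5."""
    return cnt > threshold


def render_description(template_text, row):
    """Fill ${...} placeholders in the description from the query result row."""
    return Template(template_text).safe_substitute(row)


# Illustrative data: 11 matching messages, as in the query result above.
messages = ["ERROR timeout"] * 11 + ["INFO ok"] * 4

row = {"cnt": count_keyword(messages, "ERROR")}
fired = evaluate_rule(row["cnt"])
description = render_description(
    "${cnt} times that the ERROR keyword appears within an hour", row
)
```

With 11 occurrences the expression cnt > 5 holds, and the rendered description begins with the actual count.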

Case 3: Configure alerts by comparing the number of times that a keyword appears within a specific time range on a specified day and the day before

The number of times that a keyword appears may follow a regular cycle, such as a daily cycle in which the keyword is more likely to appear during daytime than during nighttime. In this case, an absolute value such as the number of times that a keyword appears may not reflect the actual status of a system. You can use interval-valued comparison and periodicity-valued comparison functions to compare the number of times that a keyword appears in logs within a specific time range on one day with the number of times that the keyword appears within the same time range on a different day, and configure alerts based on the resulting growth rate.

  • Query statement

    Set the time range to 1 Hour(Relative) and execute the following statement to calculate the growth rate of the number of times that the ERROR keyword appears in logs within the last hour, compared with the number of times that the ERROR keyword appeared in logs within the same time range the day before. The value 86400 that is passed to the compare function is the number of seconds in a day. For more information, see Query and analyze logs. For more information about the compare function, see Interval-valued comparison and periodicity-valued comparison functions.

    ERROR |
    SELECT
      diff[1] AS today,
      diff[2] AS yesterday,
      round((diff[3] - 1) * 100, 2) AS ratio
    FROM (
        SELECT
          compare(cnt, 86400) AS diff
        FROM (
            SELECT
              COUNT(*) AS cnt
            FROM log
          )
      )
  • Query and analysis result

    The following query and analysis result shows that the ERROR keyword appears 11 times within the last hour and 6 times within the same time range the day before. The growth rate is 83.33%.

    Figure: Query and analysis result
  • Alert monitoring rule
    You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
    • Set the Trigger Condition parameter to data matches the expression, with ratio > 10 as the expression. An alert is triggered when the number of times that the ERROR keyword appears in logs within the last hour is more than 10% higher than the number of times that the ERROR keyword appeared in logs within the same time range the day before.
    • Set Alert Template to SLS builtin content template, and set the Description field in the Add Annotation parameter to the following text: ${today} times that the keyword ERROR appears in logs within the last hour, ${yesterday} times that the ERROR keyword appeared in logs within the same time range the day before, and the growth rate is ${ratio}%. This way, an alert notification displays the number of times that the ERROR keyword appears in logs within the last hour, the number of times that the ERROR keyword appeared in logs within the same time range the day before, and the growth rate.
    Figure: Alert monitoring rule
  • Alert notification

    After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the growth rate of the number of times that the ERROR keyword appears in logs within the last hour, compared with the same time range the day before, exceeds 10%. You can click View Details to view the log for which an alert is generated to identify root causes.
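
The arithmetic behind the ratio column in Case 3 can be checked with a short Python sketch. The compare helper below is a hypothetical stand-in that only reproduces the three-element shape implied by the query aliases (current count, previous count, and current divided by previous). It does not query any data, and the real function may handle edge cases such as a previous count of 0 differently.

```python
def compare(current, previous):
    """Return [current, previous, current / previous], as the query aliases imply.

    The previous count must be non-zero in this sketch.
    """
    return [current, previous, current / previous]


def growth_rate(diff):
    """Mirror round((diff[3] - 1) * 100, 2).

    SQL arrays are 1-indexed, so diff[3] in the query is diff[2] in Python.
    """
    return round((diff[2] - 1) * 100, 2)


# Counts taken from the query and analysis result above.
diff = compare(11, 6)
today, yesterday = diff[0], diff[1]
ratio = growth_rate(diff)

# Mimic the trigger expression ratio > 10.
fired = ratio > 10
```

For 11 occurrences today and 6 the day before, 11 / 6 is about 1.8333, so the growth rate is (1.8333 - 1) × 100, which rounds to 83.33 and exceeds the threshold of 10.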

Case 4: Configure alerts for anomalies based on machine learning algorithms

The preceding cases describe the common scenarios for keyword-based alert configurations. However, in special scenarios, you need to use Log Service machine learning algorithms to configure alerts. For example, the number of times that a keyword appears in a day does not frequently fluctuate, but the number may sharply increase or decrease at a specific point in time. To identify the change at the earliest opportunity, you can perform time series forecasting and anomaly detection based on Log Service machine learning algorithms. For more information about machine learning algorithms, see Machine learning functions.

  • Query statement

    Set the time range to 4 Hours(Relative) and execute the following statement to detect anomalies in the number of times that the ERROR keyword appears within the last 4 hours. The statement counts the occurrences of the keyword in 30-second windows and passes the resulting time series to the ts_predicate_simple function. For more information, see Query and analyze logs. For more information about the ts_predicate_simple function, see ts_predicate_simple.

    ERROR |
    SELECT
      ts_predicate_simple(stamp, value, 6)
    FROM (
        SELECT
          __time__ - __time__ % 30 AS stamp,
          count(1) AS value
        FROM log
        GROUP BY
          stamp
        ORDER BY
          stamp
      )
  • Query and analysis result
    The following query and analysis result shows that the src, predict, upper, lower, and anomaly_prob columns are returned. If the value of anomaly_prob for a data entry is greater than 0, an anomaly is detected for that entry. The total number of anomalies is therefore equal to the number of data entries for which the value of anomaly_prob is greater than 0. You can configure alerts based on this number.

    Figure: Query and analysis result

    The query and analysis result can be displayed in a time series chart. This way, you can easily identify abrupt changes. Each small red circle in the following time series chart represents an anomaly. The chart shows that 15 anomalies are detected within the specified time range.

    Figure: Query and analysis result
  • Alert monitoring rule
    You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
    • Set the Trigger Condition parameter to the query result contains, >, 5, anomaly_prob > 0. This condition is met when the query result contains more than 5 data entries for which anomaly_prob > 0. An alert is therefore triggered when more than 5 anomalies are detected within the last 4 hours.
    • Set Alert Template to SLS builtin content template, and set the Description field in the Add Annotation parameter to the following text: the number of times that anomalies are detected exceeds 5. This way, an alert notification states that more than 5 anomalies were detected within the last 4 hours.
    Figure: Alert monitoring rule
  • Alert notification

    After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the number of times that anomalies are detected exceeds 5 within the last 4 hours. You can click View Details to view the log for which an alert is generated to identify root causes.
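
Two parts of Case 4 can be sketched in Python: the 30-second bucketing that the inner query performs with __time__ - __time__ % 30, and the trigger condition that counts entries for which anomaly_prob > 0. The sketch does not reimplement ts_predicate_simple. The helper names and the anomaly_prob values below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bucket_counts(timestamps, width=30):
    """Group timestamps into width-second windows, like t - t % 30 with GROUP BY."""
    counts = Counter(t - t % width for t in timestamps)
    return sorted(counts.items())  # ordered by window start, like ORDER BY stamp


def count_anomalies(rows):
    """Count result rows for which anomaly_prob is greater than 0."""
    return sum(1 for row in rows if row["anomaly_prob"] > 0)


def should_alert(rows, threshold=5):
    """Mimic the condition: more than 5 rows match anomaly_prob > 0."""
    return count_anomalies(rows) > threshold


# Timestamps in seconds, relative to an arbitrary starting point.
timestamps = [0, 12, 29, 30, 45, 95]
series = bucket_counts(timestamps)

# Made-up detection output; only the anomaly_prob column matters for the rule.
rows = [
    {"anomaly_prob": p}
    for p in [0, 0.2, 0, 0.9, 0.5, 0, 0.7, 0.3, 0.8]
]
fired = should_alert(rows)
```

The six timestamps fall into the windows that start at 0, 30, and 90 seconds. Six of the nine sample rows have a positive anomaly_prob, so the rule fires.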