You can use the log monitoring feature of CloudMonitor to count how many times a specific keyword appears in the logs that Log Service collects, and to configure an alert rule for the keyword. If the number of occurrences meets a specified condition, an alert is triggered. This topic describes how to create a metric that monitors a keyword in logs and how to configure an alert rule for the keyword.

Prerequisites

On-premises logs are collected and stored in Log Service. For more information, see Log Service.

Background information

The following example shows the sample logs that are collected by Log Service:
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid
2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=
2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid
2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms
This example uses ERROR as the keyword to show how to create a log monitoring metric and configure an alert rule for the keyword. In each extracted key-value pair, the key is level and the value is the content of a log entry. The following table describes some of the key-value pairs that are extracted from the sample logs.
Key | Value
level | 2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
level | 2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
level | 2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms
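To make the counting concrete, the following is a minimal local sketch in Python of what the metric measures: it extracts the level field from each sample line and counts the entries whose level is ERROR. It does not reproduce how Log Service parses logs (field extraction is configured in Log Service itself); it assumes every line follows the "date time [LEVEL] [logger] message" layout of the sample, and the regular expression and the names parse_line, level_counts, and count_keyword are hypothetical.

```python
import re
from collections import Counter

# The sample logs from this topic, copied verbatim.
SAMPLE_LOGS = """\
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid
2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=
2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid
2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms
"""

# Hypothetical pattern for the "date time [LEVEL] [logger] message" layout.
LINE_RE = re.compile(
    r"^(?P<time>\S+ \S+) \[(?P<level>[A-Z]+)\] \[(?P<logger>[^\]]+)\] (?P<content>.*)$"
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def level_counts(text):
    """Count log entries per value of the level field."""
    counts = Counter()
    for line in text.splitlines():
        record = parse_line(line)
        if record is not None:
            counts[record["level"]] += 1
    return counts


def count_keyword(text, keyword="ERROR"):
    """Number of entries whose level field equals the keyword."""
    return level_counts(text)[keyword]


if __name__ == "__main__":
    print(count_keyword(SAMPLE_LOGS))
```

With the sample logs, count_keyword(SAMPLE_LOGS) returns 2, because two of the eight entries have the level ERROR.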

Procedure

  1. Optional. Grant CloudMonitor the permissions to access Log Service.
    The first time you use the log monitoring feature, you must grant CloudMonitor the permissions to access Log Service.
    1. Log on to the CloudMonitor console.
    2. In the left-side navigation pane, click Log Monitoring.
    3. In the Service-linked Role for CloudMonitor dialog box, click OK.
  2. Create a log monitoring metric to monitor the logs in which the value of the level field contains the keyword ERROR.
    1. In the upper-left corner of the Log Monitoring page, click Create Log Monitoring Metric.
    2. In the Associate Resource step, set the parameters and click Next.
      Parameter | Description
      Region | The region in which the Log Service project resides.
      Project | The name of the Log Service project.
      Logstore | The name of the Log Service Logstore.
    3. In the Define Metric step, set the parameters and click Next.
      Parameter | Description
      Metric Name | The name of the metric.
      Unit | The unit of the metric.
      Computing Cycle | The statistical period of the metric. Unit: minutes. Valid values: 1, 2, 3, 4, 5, 10, 15, 20, 30, and 60.
      Statistical Method | The function that is used to aggregate the log data of a statistical period. If the specified field has numeric values, you can use all statistical methods. Otherwise, you can use only Count and countps. Valid values:
      • Count: calculates the number of values of the specified field within a statistical period.
      • Sum: calculates the sum of the values of the specified field within a statistical period.
      • Max: calculates the maximum value of the specified field within a statistical period.
      • Min: calculates the minimum value of the specified field within a statistical period.
      • Average: calculates the average of the values of the specified field within a statistical period.
      • countps: calculates the number of values of the specified field divided by the total number of seconds in a statistical period.
      • sumps: calculates the sum of the values of the specified field divided by the total number of seconds in a statistical period.
      • distinct: calculates the number of unique values of the specified field within a statistical period.
      Extended Field | Performs basic arithmetic on calculation results. For example, you can use the Statistical Method parameter to define a field named TotalNumber that counts all HTTP requests and another field named 5XXNumber that counts the HTTP requests whose status code is greater than 499. You can then specify an extended field that calculates the server error rate by using the following formula: 5XXNumber/TotalNumber × 100%.
      Log Filter | Filters log data. This parameter is equivalent to the WHERE clause in SQL. For example, to monitor logs in which the value of the level field is ERROR, set this parameter to level=ERROR.

      The name of the log field that you want to use to filter data cannot contain Chinese characters.

      Group-by | The dimension by which data is aggregated. This parameter is equivalent to the GROUP BY clause in SQL.

      Log data is grouped by the specified dimension. If you do not specify a dimension, all data is aggregated by using the specified aggregate function.

      For more information, see GROUP BY clause.

      Select SQL | The SQL statement that is generated from the statistical methods that you specify. It shows how the log data is processed.
      Application Group | The name of the application group to which the metric is added.
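The Define Metric parameters combine roughly like a SQL query: Log Filter selects records (like WHERE), Group-by splits them (like GROUP BY), and Statistical Method aggregates each group over one Computing Cycle. The following Python sketch illustrates that pipeline locally; it is not CloudMonitor's implementation. The record dicts, the function names aggregate, compute_metric, and percent, and the 60-second default cycle are hypothetical, and the percent helper only mirrors the Extended Field formula 5XXNumber/TotalNumber × 100%.

```python
from collections import defaultdict
from statistics import mean


def aggregate(values, method, period_seconds):
    """Aggregate one group's values for one Computing Cycle of period_seconds."""
    if method == "Count":
        return len(values)
    if method == "countps":
        return len(values) / period_seconds
    # The remaining methods need numeric values; float() raises ValueError otherwise.
    nums = [float(v) for v in values]
    if method == "Sum":
        return sum(nums)
    if method == "Max":
        return max(nums)
    if method == "Min":
        return min(nums)
    if method == "Average":
        return mean(nums)
    if method == "sumps":
        return sum(nums) / period_seconds
    if method == "distinct":
        return len(set(nums))
    raise ValueError(f"unknown statistical method: {method}")


def compute_metric(records, field, method, period_seconds=60, where=None, group_by=()):
    """Apply a Log Filter (like WHERE), then Group-by (like GROUP BY), then aggregate.

    records: dicts of extracted log fields for one Computing Cycle.
    Returns {group key tuple: aggregated value}; the key is () without Group-by.
    """
    groups = defaultdict(list)
    for record in records:
        if where is not None and not where(record):
            continue  # dropped by the Log Filter
        if field not in record:
            continue  # the field to aggregate is missing from this record
        key = tuple(record.get(dim) for dim in group_by)
        groups[key].append(record[field])
    return {key: aggregate(vals, method, period_seconds) for key, vals in groups.items()}


def percent(numerator, denominator):
    """Extended Field style ratio, for example 5XXNumber / TotalNumber x 100%."""
    return 0.0 if denominator == 0 else numerator / denominator * 100
```

For the ERROR example, compute_metric(records, "level", "Count", where=lambda r: r["level"] == "ERROR") counts the ERROR entries in one cycle; adding group_by=("logger",), where logger is a hypothetical extracted field, returns one count per logger instead of a single total.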
    4. In the Configure Alert Rule step, configure an alert rule to monitor the keyword ERROR and click Next.
      Parameter | Description
      Alert Rule | The name of the alert rule.
      Rule Description | The condition that triggers alerts. If the metric meets the specified condition, an alert is triggered.
      Alert Level | The alert notification method. Valid value:

      Email + DingTalk

      Triggered when threshold is exceeded for | The number of consecutive times that the threshold must be exceeded before an alert is triggered. After the threshold is exceeded for the specified number of consecutive times, the alert contacts in the alert contact group receive alert notifications. Valid values: 1, 3, 5, 10, 15, and 30.
      Mute For | The interval at which CloudMonitor resends alert notifications while the alert that is triggered based on the alert rule remains uncleared. Valid values: 5 m, 10 m, 15 m, 30 m, 60 m, 3 h, 6 h, 12 h, and 24 h.

      An alert is triggered when the condition of the alert rule is met. CloudMonitor does not resend an alert notification if the alert is triggered again within the mute period. If the alert is still not cleared after the mute period ends, CloudMonitor resends the alert notification.

      Effective Time | The period during which the alert rule is effective. CloudMonitor evaluates monitoring data against the alert rule only within this period.
      Alert Callback | A callback URL that is accessible over the Internet. CloudMonitor sends a POST or GET request to push alert notifications to this URL. Only HTTP requests are supported. For information about how to configure alert callback, see Use the alert callback feature to send notifications about threshold-triggered alerts.
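If you use Alert Callback, you need an HTTP endpoint that accepts the POST or GET request. The following is a minimal Python sketch of such an endpoint built on the standard library http.server module. This topic does not describe the request payload, so the sketch makes no assumption about its fields: it only records the method, the query string, and the raw body. The handler name, the in-memory received list, and the default address 0.0.0.0:8080 are hypothetical, and a real endpoint must also be reachable from the Internet.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory store of callback requests, for illustration only.
received = []


class AlertCallbackHandler(BaseHTTPRequestHandler):
    def _record(self, body: bytes) -> None:
        """Store the request as-is and reply with HTTP 200."""
        query = parse_qs(urlsplit(self.path).query)
        received.append({"method": self.command, "query": query, "body": body})
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def do_GET(self):
        self._record(b"")

    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        self._record(self.rfile.read(length))

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def make_server(host="0.0.0.0", port=8080):
    return HTTPServer((host, port), AlertCallbackHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Keeping the raw body lets you inspect what CloudMonitor actually sends before you write any parsing logic against it.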
    5. In the Creation Result step, click Close.
  3. View the monitoring data of the keyword ERROR.
    After you create the log monitoring metric, wait for 3 to 5 minutes. On the Log Monitoring page, find the metric whose monitoring chart you want to view and click the Monitoring chart icon in the Actions column.
  4. View the alert notifications that are sent for the keyword ERROR.
    If ERROR-level logs appear in Log Service and the metric meets the condition of the alert rule, CloudMonitor sends an alert notification.
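When notifications arrive depends on the "Triggered when threshold is exceeded for" and Mute For settings. The following Python sketch simulates that timing for a series of metric values; it is an illustration under stated assumptions, not CloudMonitor's evaluation logic. It assumes one metric value per Computing Cycle, that a notification is sent once the condition has held for the configured number of consecutive cycles, and that the alert is cleared as soon as the condition stops holding. The function name notification_minutes is hypothetical.

```python
def notification_minutes(samples, condition, consecutive, mute_minutes, period_minutes=1):
    """Return the minutes at which an alert notification would be sent.

    samples: one metric value per Computing Cycle, starting at minute 0.
    condition: the Rule Description, for example lambda v: v > 0.
    consecutive: the "Triggered when threshold is exceeded for" setting.
    mute_minutes: the Mute For setting.
    """
    sent = []
    streak = 0          # consecutive cycles in which the condition held
    last_sent = None    # minute of the last notification for the current alert
    for i, value in enumerate(samples):
        minute = i * period_minutes
        if not condition(value):
            streak = 0
            last_sent = None  # assumption: the alert is cleared immediately
            continue
        streak += 1
        if streak < consecutive:
            continue
        # First notification, or a resend after the mute period has elapsed.
        if last_sent is None or minute - last_sent >= mute_minutes:
            sent.append(minute)
            last_sent = minute
    return sent
```

For example, if the ERROR count is above 0 in every one-minute cycle, a rule that requires 3 consecutive cycles with a 5-minute mute period gives notification_minutes([1] * 12, lambda v: v > 0, 3, 5) == [2, 7]: the first notification after the third cycle and one resend after the mute period.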