CloudMonitor: Use the log monitoring feature to monitor log keywords and configure alert rules

Last Updated: Mar 26, 2024

You can use the log monitoring feature of CloudMonitor to calculate the number of times that a specific keyword appears in the logs that are collected by Simple Log Service. You can also use the log monitoring feature to configure an alert rule for the keyword. If the number of times that the keyword appears meets a specified condition, an alert is triggered. This topic describes how to create a metric to monitor a specific keyword in logs and how to configure an alert rule for the keyword.

Important

The log monitoring feature is available only to users who purchased subscription CloudMonitor Basic (cms_edition) before 22:00:00 on September 13, 2022, or users who have activated pay-as-you-go CloudMonitor Basic (cms_post). The log monitoring feature will no longer be available from September 14, 2024. We recommend that you use the Simple Log Service log monitoring feature in Hybrid Cloud Monitoring.

Prerequisites

On-premises logs are collected and stored in Simple Log Service. For more information, see Simple Log Service.

Background information

The following example shows the sample logs that are collected by Simple Log Service:

2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid
2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=
2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid
2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms

This example uses ERROR as the keyword to show how to create a metric and configure an alert rule by using the log monitoring feature. In each extracted key-value pair, the key is level and the value is the content of the log entry. The following table describes the key-value pairs that are extracted from the sample logs.

Key

Value

level

2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=

level

2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms

level

2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms
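
To make the goal of this example concrete, the following Python sketch counts how often the keyword ERROR appears in the sample logs above. It runs locally and does not call CloudMonitor or Simple Log Service. The function names and the assumption that the first bracketed token is the log level are illustrative only.

```python
import re
from collections import Counter

# The sample logs shown in the Background information section.
SAMPLE_LOGS = [
    "2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=",
    "2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms",
    "2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid",
    "2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid",
    "2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=",
    "2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=",
    "2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid",
    "2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms",
]

# Assumed layout: "<date> <time> [LEVEL] [logger] message".
LINE_RE = re.compile(r"^\S+ \S+ \[(\w+)\] \[[^\]]+\] .*$")


def count_keyword(lines, keyword):
    """Count the log lines whose content contains the keyword."""
    return sum(1 for line in lines if keyword in line)


def count_levels(lines):
    """Count log lines per level, taken from the first bracketed token."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


print(count_keyword(SAMPLE_LOGS, "ERROR"))
print(dict(count_levels(SAMPLE_LOGS)))
```

A log monitoring metric that uses the Count method with a filter on ERROR would report a comparable number for each statistical period.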

Procedure

  1. Optional. Grant CloudMonitor the permissions to access Simple Log Service.

    The first time you use the log monitoring feature, you must grant CloudMonitor the permissions to access Simple Log Service.

    1. Log on to the CloudMonitor console.

    2. In the left-side navigation pane, choose Access Center > Log Monitoring.

    3. In the Service-linked Role for CloudMonitor dialog box, click OK.

  2. Create a log monitoring metric to monitor the logs in which the value of the level field contains the keyword ERROR.

    1. In the upper-left corner of the Log Monitoring page, click Create Log Monitoring Metric.

    2. In the Associate Resource step, configure the parameters and click Next.

      Parameter

      Description

      Region

      The region in which the Simple Log Service project resides.

      Project

      The name of the Simple Log Service project.

      Logstore

      The name of the Simple Log Service Logstore.

    3. In the Define Metric step, configure the parameters and click Next.

      Parameter

      Description

      Metric Name

      The name of the metric.

      Unit

      The unit of the metric.

      Computing Cycle

      The statistical period of the metric. Unit: minutes. Valid values: 1, 2, 3, 4, 5, 10, 15, 20, 30, and 60.

      Statistical Method

      The function that is used to aggregate log data within a statistical period. If the value of the specified field is a numeric value, you can use all statistical methods. Otherwise, you can use only the Count and countps methods to aggregate log data. Valid values:

      • Count: calculates the number of values of the specified field within a statistical period.

      • Sum: calculates the sum of the values of the specified field within a statistical period.

      • Max: calculates the maximum value of the specified field within a statistical period.

      • Min: calculates the minimum value of the specified field within a statistical period.

      • Average: calculates the average of the values of the specified field within a statistical period.

      • countps: calculates the number of values of the specified field divided by the total number of seconds within a statistical period.

      • sumps: calculates the sum of the values of the specified field divided by the total number of seconds within a statistical period.

      • distinct: calculates the number of unique values of the specified field within a statistical period.

      Extended Field

      Performs basic arithmetic operations on the calculation results of statistical methods. For example, you use one statistical method to define a field named TotalNumber that counts the total number of HTTP requests, and another statistical method to define a field named 5xxNumber that counts the HTTP requests whose status code is greater than 499. In this case, you can specify an extended field to calculate the server error rate by using the following formula: 5xxNumber/TotalNumber × 100%.

      Log Filter

      Filters log data. This parameter is equivalent to the WHERE clause in SQL. For example, if you want to monitor logs in which the value of the level field is ERROR, set the parameter to level=ERROR.

      The name of the log field that you use to filter data cannot contain Chinese characters.

      Group-By

      The dimension based on which data is aggregated. This parameter is equivalent to the GROUP BY clause in SQL.

      Log data is grouped by the specified dimension. If you do not specify a dimension, all data is aggregated based on the specified aggregate function.

      For more information, see GROUP BY clause.

      Select SQL

      Converts the statistical methods that you specify to an SQL statement. This parameter indicates how data is processed.

      Application Groups

      The name of the application group. The metric is added to the specified application group.
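
The parameters in the Define Metric step can be read as a small aggregation pipeline: filter (Log Filter), group (Group-By), aggregate (Statistical Method), and then combine results (Extended Field). The following Python sketch emulates that pipeline over a handful of hypothetical parsed log records. The record fields (service, status, latency_ms), the aggregate function, and its arguments are assumptions made for illustration and are not part of the CloudMonitor API.

```python
from collections import defaultdict

# Hypothetical log records that fall within one statistical period.
RECORDS = [
    {"level": "INFO", "service": "Fav", "status": 200, "latency_ms": 100},
    {"level": "WARN", "service": "Shop", "status": 200, "latency_ms": 200},
    {"level": "INFO", "service": "Shop", "status": 500, "latency_ms": 100},
    {"level": "ERROR", "service": "User", "status": 503, "latency_ms": 300},
    {"level": "ERROR", "service": "Shop", "status": 200, "latency_ms": 200},
]


def aggregate(records, field, method, period_seconds=60, where=None, group_by=None):
    """Aggregate one field over a statistical period.

    where    : optional predicate, plays the role of Log Filter (WHERE)
    group_by : optional field name, plays the role of Group-By (GROUP BY)
    Returns a dict that maps each group key to the aggregated value.
    Without group_by, the single key is None.
    """
    groups = defaultdict(list)
    for record in records:
        if where is not None and not where(record):
            continue
        if field not in record:
            continue
        key = record.get(group_by) if group_by else None
        groups[key].append(record[field])

    methods = {
        "count": len,
        "sum": sum,
        "max": max,
        "min": min,
        "average": lambda values: sum(values) / len(values),
        "countps": lambda values: len(values) / period_seconds,
        "sumps": lambda values: sum(values) / period_seconds,
        "distinct": lambda values: len(set(values)),
    }
    return {key: methods[method](values) for key, values in groups.items()}


def is_error(record):
    """Log Filter equivalent of level=ERROR."""
    return record["level"] == "ERROR"


# Count of ERROR logs, overall and grouped by the hypothetical service field.
error_count = aggregate(RECORDS, "level", "count", where=is_error)
error_by_service = aggregate(RECORDS, "level", "count", where=is_error, group_by="service")

# Extended Field: server error rate = 5xxNumber / TotalNumber x 100%.
total_number = aggregate(RECORDS, "status", "count")[None]
five_xx_number = aggregate(RECORDS, "status", "count", where=lambda r: r["status"] > 499)[None]
error_rate_percent = 100 * five_xx_number / total_number

print(error_count, error_by_service, error_rate_percent)
```

Count and countps work on any field because they only count values, whereas Sum, Max, Min, Average, and sumps need numeric values, which matches the restriction described for the Statistical Method parameter.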

    4. In the Configure Alert Rule step, configure an alert rule to monitor the keyword ERROR and click Next.

      Parameter

      Description

      Alert Rule

      The name of the alert rule.

      Rule Description

      The condition that triggers an alert. If the metric meets the specified condition, an alert is triggered.

      Alert Level

      The alert notification method. Valid value:

      Email + Webhook

      Triggered when threshold is exceeded for

      The number of consecutive times that the threshold value must be exceeded before an alert is triggered. If the threshold value is exceeded for the specified number of consecutive times, the alert contacts in the alert contact groups receive alert notifications. Valid values: 1, 3, 5, 10, 15, 30, 60, 90, 120, and 180.

      Mute For

      The interval at which CloudMonitor resends alert notifications before the alert is cleared. Valid values: 5 Minutes, 15 Minutes, 30 Minutes, 60 Minutes, 3 Hours, 6 Hours, 12 Hours, and 24 Hours.

      If the threshold value is exceeded, CloudMonitor sends an alert notification. If the threshold value is exceeded again within the mute period, CloudMonitor does not resend an alert notification. If the alert is not cleared after the mute period ends, CloudMonitor resends an alert notification.

      Effective Period

      The period during which the alert rule is effective. CloudMonitor monitors the metric based on the alert rule only within the specified period.

      Alert Callback

      The callback URL that can be accessed over the Internet. CloudMonitor sends a POST request to push an alert notification to the callback URL that you specify. Only HTTP requests are supported. For information about how to configure alert callbacks, see Use the alert callback feature to send notifications about threshold-triggered alerts.
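
The interaction between the Triggered when threshold is exceeded for parameter and the Mute For parameter can be easier to follow as code. The following Python sketch is one possible reading of the descriptions above, not the actual CloudMonitor implementation. The function name, its arguments, and the use of a simple "greater than" comparison are assumptions.

```python
def notification_points(samples, threshold, consecutive, mute_minutes, period_minutes=1):
    """Return the indexes of the samples at which a notification would be sent.

    samples        : one metric value per statistical period
    threshold      : an alert condition is met when a value is greater than this
    consecutive    : consecutive periods required before the first notification
    mute_minutes   : minimum time between notifications while the alert persists
    period_minutes : length of one statistical period
    """
    sent_at = []
    streak = 0        # consecutive periods in which the threshold was exceeded
    last_sent = None  # minute of the most recent notification for this alert

    for index, value in enumerate(samples):
        now = index * period_minutes
        if value > threshold:
            streak += 1
        else:
            # The alert is cleared, so the next alert starts from scratch.
            streak = 0
            last_sent = None

        if streak >= consecutive:
            muted = last_sent is not None and now - last_sent < mute_minutes
            if not muted:
                sent_at.append(index)
                last_sent = now
    return sent_at


# ERROR count per minute stays above 0 for ten minutes.
print(notification_points([1] * 10, threshold=0, consecutive=3, mute_minutes=5))
```

In this sketch, the first notification is sent once the threshold has been exceeded three times in a row, and a second one is sent only after the five-minute mute period ends while the alert is still active.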

    5. In the Creation Result step, click Close.

  3. View the monitoring data of the keyword ERROR.

    After you create the log monitoring metric, wait for 3 to 5 minutes. On the Log Monitoring page, find the metric whose monitoring chart you want to view and click the Monitoring Chart icon in the Actions column.

  4. View the alert notifications that are sent for the keyword ERROR.

    If an ERROR-level log appears in Simple Log Service, CloudMonitor sends an alert notification.
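
If you specified a callback URL for the Alert Callback parameter, you can also inspect the notifications on your own server. The following Python sketch is a minimal HTTP receiver built on the standard library. The payload format is an assumption: the sketch accepts either a form-encoded or a JSON body and makes no claim about which fields CloudMonitor actually sends. The names parse_callback_body, CallbackHandler, and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_callback_body(content_type, body):
    """Decode a callback body into Python data.

    JSON bodies are parsed with json.loads. Any other body is treated as
    form-encoded key-value pairs (an assumption about the payload format).
    """
    text = body.decode("utf-8", errors="replace")
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(text)
    parsed = parse_qs(text)
    # Collapse single-element lists so that each field maps to one value.
    return {key: values[0] if len(values) == 1 else values for key, values in parsed.items()}


class CallbackHandler(BaseHTTPRequestHandler):
    """Logs every POST request that arrives at the callback URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        fields = parse_callback_body(self.headers.get("Content-Type", ""), body)
        print("alert callback received:", fields)
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Start the receiver. The port is a placeholder; the URL must be reachable over the Internet."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Call serve() on a host that is reachable over HTTP at the callback URL, then check the printed output after an ERROR-level log triggers the alert rule.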