Use the log monitoring feature to monitor log keywords and configure alert rules - Cloud Monitor

You can use the log monitoring feature of Cloud Monitor to calculate the number of times that a specific keyword appears in the logs that are collected by Simple Log Service. You can also use the log monitoring feature to configure an alert rule for the keyword. If the number of times that the keyword appears meets a specified condition, an alert is triggered. This topic describes how to create a metric to monitor a specific keyword in logs and how to configure an alert rule for the keyword.

Important

The log monitoring feature is available only to users who purchased subscription CloudMonitor Basic (cms_edition) before 22:00:00 on September 13, 2022, or who have activated pay-as-you-go CloudMonitor Basic (cms_post). The feature will no longer be available from September 14, 2024. We recommend that you use the Simple Log Service log monitoring feature in Hybrid Cloud Monitoring instead.

Prerequisites

On-premises logs are collected and stored in Simple Log Service. For more information, see Simple Log Service.

Background information

The following example shows the sample logs that are collected by Simple Log Service:

2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid
2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=
2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid
2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms

This example uses ERROR as the keyword to show how to create a log monitoring metric and configure an alert rule for the keyword. Each log is stored as a key-value pair in which the key is level and the value is the content of the log. The following table describes the key-value pairs that are extracted from the sample logs.

Key	Value
level	`2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=`
level	`2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms`
level	`2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms`
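
To make the idea of counting keyword occurrences concrete, the following is a minimal local sketch in Python. It is not how Cloud Monitor processes logs internally. It only assumes that the log level is the first bracketed token after the timestamp, and the names `SAMPLE_LOGS`, `LEVEL_PATTERN`, and `count_levels` are hypothetical.

```python
import re
from collections import Counter

# The sample logs from this topic, copied verbatim.
SAMPLE_LOGS = """\
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:05 [WARN] [impl.ShopServiceImpl] execute_fail, wait moment 200ms
2017-06-21 14:38:05 [INFO] [impl.ShopServiceImpl] execute_fail and run time is 100ms,reason:user_id invalid
2017-06-21 14:38:05 [INFO] [impl.FavServiceImpl] execute_success, wait moment ,reason:user_id invalid
2017-06-21 14:38:05 [WARN] [impl.UserServiceImpl] execute_fail and run time is 100msuserid=
2017-06-21 14:38:06 [WARN] [impl.FavServiceImpl] execute_fail, wait moment userid=
2017-06-21 14:38:06 [ERROR] [impl.UserServiceImpl] userid=, action=, test=, wait moment ,reason:user_id invalid
2017-06-21 14:38:06 [ERROR] [impl.ShopServiceImpl] execute_success:send msg,200ms
"""

# Assumption: the level is the first bracketed token after the timestamp.
LEVEL_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\]")


def count_levels(log_text):
    """Count how many log lines carry each level keyword."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LEVEL_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


level_counts = count_levels(SAMPLE_LOGS)
print(level_counts["ERROR"])  # prints 2
```

With the sample logs, the keyword ERROR appears in two entries, which is the value a Count metric for this keyword would be expected to report for that period.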

Procedure

Optional. Grant Cloud Monitor the permissions to access Simple Log Service.
The first time you use the log monitoring feature, you must grant Cloud Monitor the permissions to access Simple Log Service.
1. Log on to the Cloud Monitor console.
2. In the left-side navigation pane, choose Access Center > Log Monitoring.
3. In the Service-linked Role for CloudMonitor dialog box, click OK.

Create a log monitoring metric to monitor the logs in which the value of the level field contains the keyword ERROR.

In the upper-left corner of the Log Monitoring page, click Create Log Monitoring Metric.
In the Associate Resource step, configure the parameters and click Next.
Parameter	Description
Region	The region in which the Simple Log Service project resides.
Project	The name of the Simple Log Service project.
Logstore	The name of the Simple Log Service Logstore.

In the Define Metric step, configure the parameters and click Next.

Parameter	Description
Metric Name	The name of the metric.
Unit	The unit of the metric.
Computing Cycle	The statistical period of the metric. Unit: minutes. Valid values: 1, 2, 3, 4, 5, 10, 15, 20, 30, and 60.
Statistical Method	The function that is used to aggregate log data within a statistical period. If the value of the specified field is a numeric value, you can use all statistical methods. Otherwise, you can use only the Count and countps methods to aggregate log data. Valid values: Count: calculates the number of values of the specified field within a statistical period. Sum: calculates the sum of the values of the specified field within a statistical period. Max: calculates the maximum value of the specified field within a statistical period. Min: calculates the minimum value of the specified field within a statistical period. Average: calculates the average of the values of the specified field within a statistical period. countps: calculates the number of values of the specified field divided by the total number of seconds within a statistical period. sumps: calculates the sum of the values of the specified field divided by the total number of seconds within a statistical period. distinct: calculates the number of unique values of the specified field within a statistical period.
Extended Field	Performs basic operations on calculation results. For example, after you set the Statistical Method parameter to aggregate log data, you specify a field as TotalNumber to calculate the total number of HTTP requests. You also specify another field as 5xxNumber to calculate the number of HTTP requests whose status code is greater than 499. In this case, you can specify an extended field to calculate the server error rate by using the following formula: 5xxNumber/TotalNumber × 100%.
Log Filter	Filters log data. This parameter is equivalent to the WHERE clause in SQL. For example, if you want to monitor logs in which the value of the level field is ERROR, set the parameter to `level=ERROR`. The name of the log field that you use to filter data cannot contain Chinese characters.
Group-By	The dimension based on which data is aggregated. This parameter is equivalent to the GROUP BY clause in SQL. Log data is grouped by the specified dimension. If you do not specify a dimension, all data is aggregated based on the specified aggregate function. For more information, see GROUP BY clause.
Select SQL	Converts the statistical methods that you specify to an SQL statement. This parameter indicates how data is processed.
Application Groups	The name of the application group. The metric is added to the specified application group.
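
The statistical methods, Log Filter, Group-By, and Extended Field parameters in the preceding table can be illustrated with a small local sketch in Python. It only mirrors the descriptions in the table and is not the Cloud Monitor implementation. The function `aggregate`, the in-memory record format, and the `service` and `status` fields are hypothetical and do not appear in the sample logs.

```python
from collections import defaultdict


def aggregate(records, field, method, period_seconds, where=None, group_by=None):
    """Aggregate the records of one statistical period.

    where plays the role of Log Filter and group_by the role of Group-By.
    Returns {group_value: result}; the key is None when group_by is not set.
    """
    groups = defaultdict(list)
    for record in records:
        if where is not None and not where(record):
            continue  # dropped by the filter, like a SQL WHERE clause
        if field not in record:
            continue
        key = record.get(group_by) if group_by else None
        groups[key].append(record[field])

    results = {}
    for key, values in groups.items():
        if method == "count":
            results[key] = len(values)
        elif method == "countps":
            results[key] = len(values) / period_seconds
        elif method == "distinct":
            results[key] = len(set(values))
        elif method == "sum":
            results[key] = sum(values)
        elif method == "sumps":
            results[key] = sum(values) / period_seconds
        elif method == "max":
            results[key] = max(values)
        elif method == "min":
            results[key] = min(values)
        elif method == "average":
            results[key] = sum(values) / len(values)
        else:
            raise ValueError(f"unknown statistical method: {method}")
    return results


# Hypothetical parsed logs for a 60-second statistical period.
records = [
    {"level": "ERROR", "service": "UserServiceImpl", "status": 500},
    {"level": "ERROR", "service": "ShopServiceImpl", "status": 200},
    {"level": "WARN", "service": "FavServiceImpl", "status": 503},
    {"level": "INFO", "service": "FavServiceImpl", "status": 200},
]


def is_error(record):
    return record["level"] == "ERROR"  # corresponds to level=ERROR


def is_5xx(record):
    return record["status"] > 499


error_count = aggregate(records, "level", "count", 60, where=is_error)
errors_by_service = aggregate(
    records, "level", "count", 60, where=is_error, group_by="service"
)

# Extended field: 5xxNumber / TotalNumber x 100%.
total_number = aggregate(records, "status", "count", 60)[None]
five_xx_number = aggregate(records, "status", "count", 60, where=is_5xx)[None]
error_rate = five_xx_number / total_number * 100
```

In this sketch, `error_count` is `{None: 2}`, grouping by `service` yields one ERROR log per service, and `error_rate` is 50.0 because two of the four hypothetical requests have a status code greater than 499.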

In the Configure Alert Rule step, configure an alert rule to monitor the keyword ERROR and click Next.

Parameter	Description
Alert Rule	The name of the alert rule.
Rule Description	The condition that triggers an alert. If the metric meets the specified condition, an alert is triggered.
Alert Level	The alert notification method. Valid value: Email + Webhook
Number of times the threshold is exceeded before alerts are triggered	The number of consecutive times that the threshold value is exceeded. If the number of times exceeds the limit that you specify, the alert contacts in the alert contact groups receive alert notifications. Valid values: 1, 3, 5, 10, 15, 30, 60, 90, 120, and 180.
Mute Period	The interval at which Cloud Monitor resends an alert notification before the alert is cleared. Valid values: 5 Minutes, 15 Minutes, 30 Minutes, 60 Minutes, 3 Hours, 6 Hours, 12 Hours, and 24 Hours. Cloud Monitor sends an alert notification when a metric value reaches the threshold. If the metric value reaches the threshold again within the mute period, Cloud Monitor does not resend an alert notification. If the alert is not cleared after the mute period ends, Cloud Monitor resends an alert notification.
Effective Period	The period during which the alert rule is effective. Cloud Monitor monitors the metric based on the alert rule only within the specified period.
Alert Callback	The callback URL that can be accessed over the Internet. Cloud Monitor sends a POST request to push an alert notification to the callback URL that you specify. Only HTTP requests are supported. For information about how to configure alert callbacks, see Use the alert callback feature to send notifications about threshold-triggered alerts.
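
The interaction between the consecutive-breach count and the mute period can be sketched as follows. This is a simplified model based only on the parameter descriptions in the preceding table, not the actual Cloud Monitor evaluation logic. The class `AlertState`, the "greater than or equal to" comparison, and the choice to reset the mute timer when the alert is cleared are all assumptions.

```python
class AlertState:
    """Tracks one alert rule across successive metric data points."""

    def __init__(self, threshold, consecutive_required, mute_seconds):
        self.threshold = threshold
        self.consecutive_required = consecutive_required
        self.mute_seconds = mute_seconds
        self.breaches = 0      # consecutive data points at or above threshold
        self.last_sent = None  # timestamp of the last notification, if any

    def observe(self, timestamp, value):
        """Return True if a notification would be sent for this data point."""
        if value < self.threshold:
            # Condition no longer met: treat the alert as cleared (assumption).
            self.breaches = 0
            self.last_sent = None
            return False
        self.breaches += 1
        if self.breaches < self.consecutive_required:
            return False  # not enough consecutive breaches yet
        if (self.last_sent is not None
                and timestamp - self.last_sent < self.mute_seconds):
            return False  # still inside the mute period
        self.last_sent = timestamp
        return True


# Hypothetical ERROR counts, one data point every 60 seconds.
rule = AlertState(threshold=1, consecutive_required=3, mute_seconds=300)
values = [0, 2, 1, 3, 4, 5, 1, 2, 1, 0, 2]
sent = [
    index * 60
    for index, value in enumerate(values)
    if rule.observe(index * 60, value)
]
print(sent)  # prints [180, 480]
```

In this run, the first notification is sent at 180 seconds, after three consecutive breaches. The next one is sent at 480 seconds, when the 5-minute mute period has ended and the alert is still not cleared.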

In the Creation Result step, click Close.

View the monitoring data of the keyword ERROR.
After you create the log monitoring metric, wait for 3 to 5 minutes. On the Log Monitoring page, find the metric whose monitoring chart you want to view and click the icon in the Actions column.
View the alert notifications that are sent for the keyword ERROR.
If an ERROR-level log appears in Simple Log Service, Cloud Monitor sends an alert notification.
