Log Service allows you to configure alert rules based on the charts in a dashboard to monitor the service status in real time.
Specify the time range and check frequency for a query
Log Service executes the query statement at the specified check frequency over the specified time range, and then checks whether the query and analysis result meets the trigger condition. If the trigger condition is met, an alert is triggered.
- After data is written to Log Service, the data becomes queryable only after a short delay. In most cases, the delay is less than 3 seconds. Because of this latency, a query may miss recently written data. For example, if the alert rule is executed at 12:03:30 and the time range is set to 1 Minute(Relative), the time range is [12:02:30, 12:03:30). Logs that are written to Log Service at 12:03:29 may not be returned by the query at 12:03:30.
  - If you have high requirements for alert accuracy, such as no duplicate alerts and no missing alerts, you can shift the time range to an earlier period. For example, if the alert rule is executed at 12:03:30, you can set the time range to [12:02:20, 12:03:20), which is 10 seconds earlier. This prevents missing alerts that are caused by indexing latency.
  - If you have high requirements for real-time performance, we recommend that you keep the end of the time range at the execution time and move the start time earlier. This allows you to receive alert notifications as soon as alerts are triggered, at the cost of possible duplicate alerts. For example, if the alert rule is executed at 12:03:30, you can set the time range to [12:02:20, 12:03:30).
- If logs that are generated at different points in time within the same minute are written to Log Service at the same time, the indexes of the later logs may be written to the point in time of the earlier logs. This is caused by the method that Log Service uses to build indexes. For example, the alert rule is executed at 12:03:30 and the time range is set to 1 Minute(Relative), which is [12:02:30, 12:03:30). Multiple log entries that are generated at different points in time within the same minute, for example, 12:02:20 and 12:02:50, are written at 12:02:50. In this case, the indexes of these log entries may be written to 12:02:20. As a result, no log entries may be found within the time range [12:02:30, 12:03:30).
  - If you have high requirements for alert accuracy, such as no duplicate alerts and no missing alerts, you can set the time range to a time frame. You can set the time range to 1 Minute(Time Frame), 5 Minutes(Time Frame), or 1 Hour(Time Frame). Then, set the Check Frequency parameter to the same period, for example, 1 Minutes, 5 Minutes, or 1 Hours.
  - If you have high requirements for real-time performance, we recommend that you set the time range to a specific period that includes at least the full minute before the current time. This allows you to receive alert notifications as soon as alerts are triggered, at the cost of possible duplicate alerts. For example, if the alert rule is executed at 12:03:30 and you set the time range to 90 Seconds(Relative), the time range is [12:02:00, 12:03:30). Then, you can set the Check Frequency parameter to Fixed Interval 1 Minutes.
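The time ranges discussed above can be reproduced with a little date arithmetic. The following Python sketch is illustrative only: the function names and the calendar date are hypothetical, and it assumes that a Time Frame range is the most recent complete period aligned to the period size. It does not call Log Service.

```python
from datetime import datetime, timedelta

def relative_window(run_at, length, delay=timedelta(0)):
    """Half-open window [start, end) that ends `delay` before the rule runs."""
    end = run_at - delay
    return end - length, end

def time_frame_window(run_at, frame):
    """Most recent complete frame aligned to the frame size (assumed semantics)."""
    epoch = datetime(1970, 1, 1)
    complete_frames = (run_at - epoch) // frame  # integer count of whole frames
    end = epoch + complete_frames * frame
    return end - frame, end

run_at = datetime(2024, 1, 1, 12, 3, 30)  # arbitrary date, executed at 12:03:30
minute = timedelta(minutes=1)

# 1 Minute(Relative): [12:02:30, 12:03:30)
plain = relative_window(run_at, minute)
# Shifted 10 seconds earlier to tolerate indexing latency: [12:02:20, 12:03:20)
shifted = relative_window(run_at, minute, delay=timedelta(seconds=10))
# 90-second relative range that covers the whole previous minute: [12:02:00, 12:03:30)
overlapping = relative_window(run_at, timedelta(seconds=90))
# 1 Minute(Time Frame): [12:02:00, 12:03:00)
framed = time_frame_window(run_at, minute)
```

The shifted and framed windows trade a short delay for complete data, whereas the overlapping window favors timeliness and may evaluate the same logs twice.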
Configure an alert rule to trigger alerts based on query results
When you configure an alert rule to trigger alerts based on query results, you can set the trigger condition to check whether a specific field exists. This way, the trigger condition is met whenever the query returns a non-empty result.
For example, you can configure an alert rule to trigger an alert when a log contains the client_ip: 192.0.2.1 key-value pair. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.
In this example, each log that is returned by the query contains the client_ip field and the field value is not empty. Therefore, an alert is triggered if the client_ip field is not empty.
Configure an alert rule to trigger alerts based on analysis results
In most cases, alert rules are configured based on analysis results. For example, you can configure an alert rule to trigger an alert when the number of logs that contain the ERROR keyword reaches a specified threshold. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.
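As a rough illustration of an analysis-based rule, the sketch below pairs a query statement with a threshold check. The query text, the err_count alias, and the threshold of 10 are assumptions made for this example and do not come from the figure. The Python function only mimics how a trigger condition such as err_count >= 10 would be evaluated on one result row.

```python
# Hypothetical query statement: count the logs that contain the ERROR keyword.
QUERY = "ERROR | SELECT count(1) AS err_count"
THRESHOLD = 10  # hypothetical threshold

def condition_met(row, threshold=THRESHOLD):
    """Mimic a trigger condition such as `err_count >= 10` on one result row."""
    # Values in the sample alert payload are strings, so convert before comparing.
    return int(row["err_count"]) >= threshold

above = condition_met({"err_count": "12"})
below = condition_met({"err_count": "3"})
```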
Associate multiple charts
- Each query uses its own time range. The time ranges of different queries do not conflict with one another.
- When you associate multiple charts, you must use the ${Serial number}.{Field name} syntax to reference a field in a query and analysis result. For example, the trigger condition is set to $0.pv > 100000 && $1.uv < 1000. In the trigger condition, $0 indicates chart 0, and $1 indicates chart 1. For more information, see How can I view the serial number of a chart?.
- You must separate multiple conditions with two ampersands (&&), for example, pv%100 > 0 && uv > 0.
- Configure an alert rule to trigger alerts based on multiple query and analysis results that are generated in different time ranges.
For example, you can configure an alert rule to trigger an alert when the value of the pv field within the last 15 minutes is greater than 100000 and the value of the uv field within the last hour is less than 1000. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.
- Configure an alert rule to trigger alerts based on one chart, and use the other associated chart to provide details for the triggered alert.
For example, you can configure an alert rule to trigger an alert when the number of log entries whose log level is ERROR reaches a specified threshold. You also want the alert notification to contain the logs whose log level is ERROR. You can configure the alert rule as shown in the following figure. For more information, see Configure an alert rule.
If you want to view the logs whose log level is ERROR in an alert notification, set the notification content to the following value: ${results[1].RawResultsAsKv}
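To make the chart references concrete, the following Python sketch resolves references such as $0.pv against a list of per-chart results and evaluates the example condition. The field helper and the sample values are hypothetical. The sketch only shows how the serial number selects a chart, and it is not the evaluator that Log Service uses.

```python
# One result row per associated chart; the list index is the chart serial number.
results = [
    {"pv": 120000},  # chart 0, for example queried over the last 15 minutes
    {"uv": 800},     # chart 1, for example queried over the last hour
]

def field(ref, results):
    """Resolve a reference such as '$0.pv' to results[0]['pv']."""
    serial, name = ref.lstrip("$").split(".", 1)
    return results[int(serial)][name]

# Equivalent of the trigger condition: $0.pv > 100000 && $1.uv < 1000
triggered = field("$0.pv", results) > 100000 and field("$1.uv", results) < 1000
```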
Suppress alerts
- Specify the threshold of continuous triggers.
An alert is triggered only when the specified trigger condition is met during continuous check periods.
For example, the Check Frequency parameter is set to Fixed Interval 1 Minutes and the Trigger Threshold parameter is set to 5. In this case, an alert notification is sent only when the specified trigger condition is met in five consecutive check periods. If the trigger condition is not met in any of these periods, no alert is triggered.
- Specify the interval at which alert notifications are sent.
If the check frequency is set to a short interval, you can set the Notification Interval parameter to specify the minimum interval between alert notifications. This prevents frequent alert notifications. For example, the Check Frequency parameter is set to Fixed Interval 1 Minutes and the Notification Interval parameter is set to 30 Minutes. In this case, no alert notification is sent within 30 minutes after the previous alert notification is sent, even if an alert is triggered.
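The combined effect of the two suppression settings can be simulated in a few lines. The Python sketch below is a simplified model with one check per minute. The function name is hypothetical, and the model assumes that the count of consecutive triggers is not reset after a notification is sent, which the text above does not specify.

```python
def notifications(checks, trigger_threshold, notification_interval):
    """Return the minutes at which a notification would be sent.

    checks: one boolean per check period (one period per minute here),
            True when the trigger condition is met.
    """
    sent = []
    streak = 0        # consecutive periods in which the condition was met
    last_sent = None  # minute of the previous notification
    for minute, met in enumerate(checks):
        streak = streak + 1 if met else 0
        if streak < trigger_threshold:
            continue  # not enough consecutive triggers yet
        if last_sent is not None and minute - last_sent < notification_interval:
            continue  # still inside the notification interval
        sent.append(minute)
        last_sent = minute
    return sent

# Condition met every minute for 40 minutes, threshold 5, interval 30 minutes.
always_met = notifications([True] * 40, 5, 30)
# A single miss in the fifth period restarts the count of consecutive triggers.
interrupted = notifications([True] * 4 + [False] + [True] * 5, 5, 30)
```

In the first run, the first notification is sent at the fifth consecutive check and the next one 30 minutes later. In the second run, the miss delays the first notification until five further consecutive checks have passed.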
Use template variables
- You can use the ${fieldName} syntax to reference the Project, AlertName, or Dashboard variable. These variables are not case-sensitive.
- Each time an alert is triggered, Log Service automatically generates the alert information
and stores it in the Results field. The value of the Results field is an array. Each element in the array corresponds
to a chart that is associated with the triggered alert. In most cases, the array contains
only one element. For more information, see Fields in alert rule evaluation logs. You can reference the fields in the Results array by using the following methods:
- Fields of the array type are referenced in the ${fieldName[{index}]} format. {index} indicates an array subscript that starts from 0. For example, ${results[0]} indicates that the first element in the Results array is referenced.
- Fields of the object type are referenced in the ${object.key} format. For example, ${results[0].StartTimeTs} references the StartTimeTs field of the first element, which is the timestamp 1542453580 in the following example.
Note: The fields in RawResults and FireResult come from the query results and are case-sensitive. Other fields are not case-sensitive.
The following example shows an element of the Results array:
{
  "EndTime": "2006-01-02 15:04:05",
  "EndTimeTs": 1542507580,
  "FireResult": {
    "__time__": "1542453580",
    "field": "value1",
    "count": "100"
  },
  "FireResultAsKv": "[field:value1,count:100]",
  "Truncated": false,
  "LogStore": "test-logstore",
  "Query": "* | SELECT field, count(1) group by field",
  "QueryUrl": "http://xxxx",
  "RawResultCount": 2,
  "RawResults": [
    {
      "__time__": "1542453580",
      "field": "value1",
      "count": "100"
    },
    {
      "__time__": "1542453580",
      "field": "value2",
      "count": "20"
    }
  ],
  "RawResultsAsKv": "[field:value1,count:100],[field:value2,count:20]",
  "StartTime": "2006-01-02 15:04:05",
  "StartTimeTs": 1542453580
}
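The reference rules above can be sketched as a small template renderer. This Python example is a minimal model, not the Log Service implementation. The function names and the Project and AlertName values are hypothetical, and the code assumes that names are matched without regard to case everywhere except inside RawResults and FireResult.

```python
import re

context = {
    "Project": "test-project",   # hypothetical value
    "AlertName": "error-count",  # hypothetical value
    "Results": [{
        "StartTimeTs": 1542453580,
        "RawResultsAsKv": "[field:value1,count:100],[field:value2,count:20]",
        "FireResult": {"field": "value1", "count": "100"},
    }],
}

# A path is a sequence of names and [index] subscripts, e.g. results[0].StartTimeTs
TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")

def lookup(obj, key, case_sensitive):
    """Fetch obj[key], ignoring case unless case_sensitive is set."""
    if case_sensitive:
        return obj[key]
    for name, value in obj.items():
        if name.lower() == key.lower():
            return value
    raise KeyError(key)

def resolve(path, ctx):
    """Walk a reference path through the alert context."""
    value = ctx
    inside_query_result = False
    for name, index in TOKEN.findall(path):
        if index:
            value = value[int(index)]  # array subscript, starting from 0
        else:
            value = lookup(value, name, inside_query_result)
            if name.lower() in ("rawresults", "fireresult"):
                inside_query_result = True  # fields below are case-sensitive
    return value

def render(template, ctx):
    """Replace every ${...} reference in the template with its value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(resolve(m.group(1), ctx)), template)

message = render("Alert ${alertname} in ${project}: ${results[0].RawResultsAsKv}", context)
```

With this model, ${results[0].fireresult.count} resolves to 100, whereas ${results[0].FireResult.COUNT} fails because the field names inside FireResult must match exactly.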
Troubleshoot the issue that no alert is triggered
After you configure an alert rule, you can use alert logs to find out why no alert is triggered. For more information, see View the evaluation results of alert rules. For more information about log fields, see Fields in alert rule evaluation logs.