Configure cluster alerting

Alibaba Cloud Elasticsearch monitors your clusters and lets you customize alert thresholds for them. When an alert is triggered, the system sends you a notification. To ensure the stability of your Elasticsearch cluster, we recommend that you configure monitoring and alerting for the cluster. The system then monitors items such as cluster status and disk usage in real time, so you can check alert notifications and take measures at the earliest opportunity. This topic describes how to configure alerting for an Elasticsearch cluster, including the initiative alert feature and custom alert rules.

Enable the initiative alert feature

The initiative alert feature is provided by CloudMonitor and is disabled by default. After this feature is enabled, alert rules are created to detect errors, such as abnormal cluster status, high disk usage (greater than 75%), and high JVM heap memory usage (greater than 85%). These rules apply to all Elasticsearch clusters within your Alibaba Cloud account.

Log on to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
On the Elasticsearch Clusters page, click Initiative Alert.
In the Initiative Alert dialog box, click Enable Now.
Note
If the Disable Now button is displayed in the dialog box, the initiative alert feature is already enabled. In this case, you do not need to perform the following steps.
On the Initiative Alert page of the CloudMonitor console, turn on the Initiative Alert switch for Elasticsearch.
(Optional) Go to the Elasticsearch console and check whether the initiative alert feature is enabled.
1. On the Elasticsearch Clusters page, find your cluster and click its ID.
2. In the left-side navigation pane of the page that appears, choose Monitoring and Logs > Cluster Monitoring.
3. In the upper-right corner of the Basic Monitoring tab, view the value of Initiative Alert.
  If the value of Initiative Alert is Enabled, the initiative alert feature is enabled.

Configure custom alert rules in CloudMonitor

Log on to the CloudMonitor console.
In the left-side navigation pane, choose Alerts > Alert Rules.
On the Alert Rules page, click Create Alert Rule.

In the Create Alert Rule panel, configure an alert rule.

In this example, an alert rule is created to monitor the NodeDiskUtilization, ClusterStatus, and NodeHeapMemoryUtilization metrics. The following table describes some parameters for configuring the alert rule. For parameters that are not provided in the following table, default values are used. For more information about the involved parameters, see Create an alert rule.

Parameter	Description
Product	Select Elasticsearch.
Resource Range	Select Instances.
Associated Resources	Select the cluster that you want to monitor.
Rule Description	Click Add Rule and select a metric type. In the Configure Rule Description panel, specify a rule name in the Alert Rule field and configure the following parameters: Metric Type: Select Combined Metrics. Alert Level: Select Warning(Warn). Multi-metric Alert Condition: Choose clusterId > ClusterStatus, select Value, select >=, and then specify 2.0. Choose nodeName > NodeDiskUtilization, select Average, select >=, and then specify 75. Choose nodeName > NodeHeapMemoryUtilization, select Average, select >=, and then specify 85. Relationship Between Metrics: Select Generate alerts if one of the conditions is met. Alert Threshold Triggers: Select 3 Consecutive Cycles (1 Cycle = 1 Minutes). You can also select Single Metric for Metric Type to configure an alert rule only for disk usage. For more information, see Example of configuring an alert rule for disk usage.
Alert Contact Group	Select the alert contact group that you created. For information about how to create an alert contact group, see Create an alert contact or alert contact group.
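
The combined-metric rule in this example can be pictured with a minimal Python sketch. It is not CloudMonitor code: the CombinedRule class, its observe method, and the sample dictionaries are hypothetical names used only for illustration. The sketch also assumes that "3 Consecutive Cycles" means at least one condition must hold in each of three consecutive one-minute samples.

```python
from collections import deque
from dataclasses import dataclass, field

# Thresholds copied from the example rule; the data model is hypothetical.
CONDITIONS = {
    "ClusterStatus": 2.0,               # Value >= 2.0 (red)
    "NodeDiskUtilization": 75.0,        # Average >= 75
    "NodeHeapMemoryUtilization": 85.0,  # Average >= 85
}
CONSECUTIVE_CYCLES = 3  # 1 cycle = 1 minute in the example


def any_condition_met(sample: dict) -> bool:
    """Return True if at least one metric in the sample reaches its threshold."""
    return any(sample.get(name, 0.0) >= limit for name, limit in CONDITIONS.items())


@dataclass
class CombinedRule:
    # Keeps only the outcome of the most recent CONSECUTIVE_CYCLES samples.
    history: deque = field(default_factory=lambda: deque(maxlen=CONSECUTIVE_CYCLES))

    def observe(self, sample: dict) -> bool:
        """Record one cycle and return True if the rule would fire (assumed semantics)."""
        self.history.append(any_condition_met(sample))
        return len(self.history) == CONSECUTIVE_CYCLES and all(self.history)
```

In this sketch, a different metric may breach its threshold in each cycle and the rule still fires, because the relationship between metrics is "one of the conditions is met".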

Note

You can also click Advanced Settings and enter a URL that can be accessed over the Internet in the Alert Callback field. This way, CloudMonitor can push alert notifications to the URL through a POST request. Only HTTP requests are supported. For more information, see Use the alert callback feature to send notifications about threshold-triggered alerts.
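
The following Python sketch shows one way to receive such a POST request with the standard library. It rests on assumptions: the request body is treated as form-encoded, and the field names in the usage note (alertName, curValue) are placeholders rather than a confirmed payload format. Refer to the alert callback topic for the actual parameters. The parse_alert, AlertHandler, and run names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_alert(body: bytes) -> dict:
    """Decode a form-encoded body into a flat dict (first value per key)."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        alert = parse_alert(self.rfile.read(length))
        print("alert received:", alert)
        # Acknowledge the notification with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Serve plain HTTP on all interfaces; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling run() starts the listener. For example, parse_alert(b"alertName=disk&curValue=81") returns {"alertName": "disk", "curValue": "81"}.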

You can configure alert rules for the metrics of your Elasticsearch cluster based on the instructions in the following table. For more information about the metrics, see Metrics and exception handling suggestions.

Metric	Description
ClusterStatus(value)	Required. This metric checks the status of your cluster. Green indicates that your cluster is in a normal state. Yellow or red indicates that your cluster is in an abnormal state. The metric value is 0.00 for green, 1.00 for yellow, and 2.00 for red. Use these values to choose a suitable threshold for the ClusterStatus(value) metric.
NodeDiskUtilization(%)	Required. We recommend a threshold of less than 75%. The threshold cannot exceed 80%.
NodeHeapMemoryUtilization(%)	Required. We recommend a threshold of less than 85%. The threshold cannot exceed 90%.
NodeCPUUtilization(%)	Optional. Set the threshold to a value that is less than or equal to 95%.
NodeLoad_1m(value)	Optional. Set the threshold to a value that is 80% of the number of vCPUs for each node.
ClusterQueryQPS(Count/Second)	Optional. Set the threshold based on the actual test result.
ClusterIndexQPS(Count/Second)	Optional. Set the threshold based on the actual test result.
NodeStatsFullGcCollectionCount(Count)	Optional. A value other than 0 indicates that an error has occurred on your cluster.
NodeStatsExceptionLogCount(Count)	Optional. A value other than 0 indicates that an error has occurred on your cluster.
ClusterAutoSnapshotLatestStatus(value)	Optional. A value of -1 or 0 indicates that your cluster is normal. A value of 2 indicates that an error has occurred on your cluster.
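
Several rows in this table reduce to small lookups or calculations. The Python sketch below restates three of them, using only the values given in the table. The function names are hypothetical, and any value that the table does not describe is reported as "unknown" rather than guessed.

```python
# Values taken from the ClusterStatus(value) row of the table.
CLUSTER_STATUS = {0.0: "green", 1.0: "yellow", 2.0: "red"}


def cluster_status_name(value: float) -> str:
    """Map a ClusterStatus metric value to its color name."""
    return CLUSTER_STATUS.get(float(value), "unknown")


def node_load_threshold(vcpus: int) -> float:
    """Suggested NodeLoad_1m threshold: 80% of the vCPU count of a node."""
    return 0.8 * vcpus


def snapshot_status(value: int) -> str:
    """Interpret ClusterAutoSnapshotLatestStatus as described in the table."""
    if value in (-1, 0):
        return "normal"
    if value == 2:
        return "error"
    return "unknown"  # the table does not describe other values
```

For a node with 4 vCPUs, node_load_threshold(4) gives a threshold of about 3.2.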

Click Confirm.
The system then starts to monitor your cluster. If the system detects an exception on a metric that is configured in the alert rule, the specified alert contacts receive an alert notification through the notification method configured in the alert rule.

Example of configuring an alert rule for disk usage

You can configure an alert rule for the disk usage of nodes in an Elasticsearch cluster in the CloudMonitor console. This way, you can detect disk usage exceptions and troubleshoot related issues at the earliest opportunity.

For the general procedure, see Configure custom alert rules in CloudMonitor. The following table describes the parameter settings that are specific to this example.

Parameter	Description
Alert Rule	Set the value to Disk Usage Alerting.
Metric Type	Select Single Metric.
Metric	Choose nodeName > NodeDiskUtilization.
Threshold and Alert Level	Critical: Select 3 Consecutive Cycles (1 Cycle = 1 Minutes), select Average, select >=, and then specify 80. Warn: Select 3 Consecutive Cycles (1 Cycle = 1 Minutes), select Average, select >=, and then specify 75. Info: Select 3 Consecutive Cycles (1 Cycle = 1 Minutes), select Average, select >=, and then specify 70.
Chart Preview	The chart in which the monitoring data of the selected metric is displayed.
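
The three alert levels in this example can be sketched in Python as follows. The sketch assumes that the input is a list of per-minute Average values of NodeDiskUtilization for one node, oldest first, and that the most severe matching level is the one reported. The function name disk_alert_level is hypothetical, and CloudMonitor may resolve overlapping levels differently.

```python
from typing import Optional, Sequence

# Thresholds from the example, ordered from most to least severe.
LEVELS = [("Critical", 80.0), ("Warn", 75.0), ("Info", 70.0)]
CYCLES = 3  # 3 consecutive cycles, 1 cycle = 1 minute


def disk_alert_level(averages: Sequence[float]) -> Optional[str]:
    """Return the most severe level whose threshold held for the last CYCLES samples."""
    if len(averages) < CYCLES:
        return None  # not enough data to cover three cycles
    recent = averages[-CYCLES:]
    for level, threshold in LEVELS:
        if all(value >= threshold for value in recent):
            return level
    return None
```

With these thresholds, three samples of 76, 82, and 85 yield Warn rather than Critical, because the first sample is below 80.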