Alibaba Cloud Elasticsearch monitors your clusters and lets you customize alert thresholds for them. When an alert is triggered, the system notifies you. To keep your Elasticsearch cluster stable, we recommend that you configure monitoring and alerting for it. The system then monitors items such as cluster status and disk usage in real time, so you can check alert notifications and respond promptly. This topic describes how to configure alerting for an Elasticsearch cluster, including the initiative alert feature and custom alert rules.
Enable the initiative alert feature
The initiative alert feature is provided by CloudMonitor. After this feature is enabled, alert rules are created to detect errors, such as abnormal cluster status, high disk usage (greater than 75%), and high JVM heap memory usage (greater than 85%). These rules apply to all Elasticsearch clusters within your Alibaba Cloud account.
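The thresholds above can also be checked by hand against data returned by the Elasticsearch cluster health and node stats APIs. The following is a minimal sketch, not the way CloudMonitor evaluates its rules: the function name `find_violations` and the sample data are hypothetical, treating any non-green status as abnormal is an assumption, and computing disk usage as used bytes divided by total bytes is an assumption that may differ from how the NodeDiskUtilization metric is calculated.

```python
# Thresholds quoted in the text above for the initiative alert rules.
DISK_USAGE_LIMIT = 75.0   # percent
HEAP_USAGE_LIMIT = 85.0   # percent


def find_violations(cluster_health, nodes_stats):
    """Return human-readable findings for one health/stats snapshot.

    cluster_health: dict shaped like the GET _cluster/health response.
    nodes_stats:    dict shaped like the GET _nodes/stats/jvm,fs response.
    """
    findings = []

    # "status" is green, yellow, or red; anything but green is flagged here.
    status = cluster_health["status"]
    if status != "green":
        findings.append(f"cluster status is {status}")

    for node_id, node in nodes_stats["nodes"].items():
        name = node.get("name", node_id)

        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        if heap_pct > HEAP_USAGE_LIMIT:
            findings.append(f"{name}: JVM heap usage {heap_pct}%")

        # Assumption: disk usage = used bytes / total bytes over all data paths.
        total = node["fs"]["total"]["total_in_bytes"]
        available = node["fs"]["total"]["available_in_bytes"]
        disk_pct = 100.0 * (total - available) / total
        if disk_pct > DISK_USAGE_LIMIT:
            findings.append(f"{name}: disk usage {disk_pct:.1f}%")

    return findings


# Hand-written sample data shaped like the two API responses.
sample_health = {"status": "yellow"}
sample_stats = {
    "nodes": {
        "node-a": {
            "name": "es-node-1",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 200}},
        },
        "node-b": {
            "name": "es-node-2",
            "jvm": {"mem": {"heap_used_percent": 40}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 600}},
        },
    }
}

for finding in find_violations(sample_health, sample_stats):
    print(finding)
```

With the sample data, only es-node-1 is reported, once for heap usage and once for disk usage, together with the yellow cluster status. In practice the two dictionaries would come from your own HTTP client calls to the cluster.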
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- On the Elasticsearch Clusters page, click Alarms.
- In the Alarms message, click Enable Now. The initiative alert feature is disabled by default.
- On the Initiative Alert List page of the CloudMonitor console, turn on the Initiative Alert switch for Elasticsearch.
- Go to the Elasticsearch console and check whether the initiative alert feature is enabled.
- On the Elasticsearch Clusters page, click the ID of your cluster.
- In the left-side navigation pane of the page that appears, choose .
- In the upper-right corner of the Basic Monitoring tab, view the value of Initiative Alerting. If the value of Initiative Alerting is Enabled, the initiative alert feature is enabled.
Configure custom alert rules in CloudMonitor
- Log on to the CloudMonitor console.
- In the left-side navigation pane, choose .
- Click Create Alert Rule.
- In the Create Alert Rule panel, configure an alert rule. In this example, an alert rule is created to monitor the NodeDiskUtilization, ClusterStatus, and NodeHeapMemoryUtilization metrics. The following table describes some parameters for configuring the alert rule. For parameters that are not provided in the following table, default values are used. For more information about the involved parameters, see Create an alert rule.
|Parameter|Description|
|Product|Select Elasticsearch.|
|Resource Range|Select Instances.|
|Associated Resources|Select the cluster that you want to monitor.|
|Rule Description|Click Add Rule. In the Add Rule Description panel, specify a rule name in the Alert Rule field and configure the following parameters:|
- Metric Type: Select Multiple Metrics.
- Alert Level: Select Warning(Warn).
- Metric Type: Select Standard creation.
- Multi-metric Alert Condition:
- Choose , select Value, select >=, and then specify 2.0.
- Choose , select Average, select >=, and then specify 75.
- Choose , select Average, select >=, and then specify 85.
- Relationship Between Metrics: Select Generate alerts if one of the conditions is met.
- Select the number of times the threshold is reached before an alert is triggered: Select Continuous 3 Count Period.
You can also select Single Metric for Metric Type to configure an alert rule only for disk usage. For more information, see Example of configuring an alert rule for disk usage.
|Alert Contact Group|Select the alert contact group that you created. For more information about how to create an alert contact group, see Create an alert contact or alert contact group.|

Note: You can also click Advanced Settings and enter a URL that can be accessed over the Internet in the Alert Callback field. This way, CloudMonitor can push alert notifications to the URL through a POST request. Only HTTP requests are supported. For more information, see Use the alert callback feature to send notifications about threshold-triggered alerts.

You can configure alert rules for the metrics of your Elasticsearch cluster based on the instructions in the following table. For more information about the metrics, see Metrics and exception handling suggestions.

|Metric|Description|
|ClusterStatus(value)|Required. This metric checks the status of your cluster. Green indicates that your cluster is in a normal state. Yellow or red indicates that your cluster is in an abnormal state. The value for the cluster state green is 0.0, that for the cluster state yellow is 1.0, and that for the cluster state red is 2.0. Reference these values and specify a suitable threshold for the ClusterStatus(value) metric.|
|NodeDiskUtilization(%)|Required. Set the threshold to a value that is less than 75%. The upper limit is 80%.|
|NodeHeapMemoryUtilization(%)|Required. Set the threshold to a value that is less than 85%. The upper limit is 90%.|
|NodeCPUUtilization(%)|Optional. Set the threshold to a value that is less than or equal to 95%.|
|NodeLoad_1m(value)|Optional. Set the threshold to a value that is 80% of the number of vCPUs for each node.|
|ClusterQueryQPS(Count/Second)|Optional. Set the threshold based on the actual test result.|
|ClusterIndexQPS(Count/Second)|Optional. Set the threshold based on the actual test result.|
|NodeStatsFullGcCollectionCount(Count)|Optional. If the value of this metric is not 0, an error occurs on your cluster.|
|NodeStatsExceptionLogCount(Count)|Optional. If the value of this metric is not 0, an error occurs on your cluster.|
|ClusterAutoSnapshotLatestStatus(value)|Optional. If the value of this metric is -1 or 0, your cluster is normal. If the value of this metric is 2, an error occurs on your cluster.|
- Click OK. The system then starts to monitor your cluster. If the system detects an exception on the metrics that are configured in the alert rule, the specified alert contact can receive an alert notification based on the notification method configured in the alert rule.
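To make the multi-metric rule from this procedure concrete, the following sketch models how such a rule could be evaluated over a series of monitoring periods. It is an illustration only and does not reflect CloudMonitor internals. Two points are assumptions: the thresholds 2.0, 75, and 85 are paired with ClusterStatus, NodeDiskUtilization, and NodeHeapMemoryUtilization based on the context of this example, and "Continuous 3 Count Period" is read as "at least one condition is met in each of the last three periods". The names `Condition` and `should_alert` are hypothetical.

```python
from dataclasses import dataclass

# Numeric encoding of ClusterStatus given in the metric table.
CLUSTER_STATUS_VALUE = {"green": 0.0, "yellow": 1.0, "red": 2.0}


@dataclass
class Condition:
    metric: str
    threshold: float

    def breached(self, sample):
        # Every condition in the example rule uses the >= operator.
        return sample[self.metric] >= self.threshold


# Assumed pairing of the three example metrics with the three thresholds.
CONDITIONS = [
    Condition("ClusterStatus", 2.0),
    Condition("NodeDiskUtilization", 75.0),
    Condition("NodeHeapMemoryUtilization", 85.0),
]
CONSECUTIVE_PERIODS = 3


def should_alert(samples, conditions=CONDITIONS, periods=CONSECUTIVE_PERIODS):
    """Return True if any condition holds in each of the last `periods` samples."""
    if len(samples) < periods:
        return False
    recent = samples[-periods:]
    return all(any(c.breached(s) for c in conditions) for s in recent)


# One dict per monitoring period, oldest first (made-up values).
history = [
    {"ClusterStatus": 0.0, "NodeDiskUtilization": 70.0, "NodeHeapMemoryUtilization": 60.0},
    {"ClusterStatus": 0.0, "NodeDiskUtilization": 76.0, "NodeHeapMemoryUtilization": 60.0},
    {"ClusterStatus": 0.0, "NodeDiskUtilization": 78.0, "NodeHeapMemoryUtilization": 60.0},
    {"ClusterStatus": CLUSTER_STATUS_VALUE["yellow"], "NodeDiskUtilization": 81.0,
     "NodeHeapMemoryUtilization": 88.0},
]

print(should_alert(history[:3]))  # False: the first period is below every threshold
print(should_alert(history))      # True: disk usage is >= 75 in the last three periods
```

Note that with a ClusterStatus threshold of 2.0, a yellow cluster (1.0) does not satisfy the status condition on its own; lower the threshold to 1.0 if you want yellow to count.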
Example of configuring an alert rule for disk usage
You can configure an alert rule for the disk usage of nodes in an Elasticsearch cluster in the CloudMonitor console. This way, you are notified of disk usage exceptions and can troubleshoot related issues at the earliest opportunity.
|Alert Rule||Set the value to Disk Usage Alerting.|
|Metric Type||Select Single Metric.|
|Threshold and Alert Level|
|Chart Preview||The chart in which the monitoring data of the selected metric is displayed.|