When you run queries on Elasticsearch, a single slow query can consume all the resources on the nodes in a cluster and affect your online business. To address this issue, the Alibaba Cloud Elasticsearch team developed the slow query isolation feature. The feature tracks the overhead of each query request and logically isolates requests whose overhead is high. If the overhead of a request exceeds a specific threshold, the system treats the query as anomalous and suspends it. This prevents a single anomalous query from causing exceptions across the cluster and improves cluster stability. This topic describes how to use the slow query isolation feature.
Background information
To use the slow query isolation feature, you must configure a slow query isolation pool that has a fixed memory size. If the memory consumed by a single query request exceeds a specific threshold, the request is directed to the pool for management. If the total memory consumed by the query requests in the pool exceeds a specific threshold, the system suspends the requests that consume the most memory based on a priority policy. You can adopt the priority policy based on your business requirements.
Precautions
The slow query isolation feature is available for Alibaba Cloud Elasticsearch V6.7.0 clusters whose kernel version is V1.3.0 and for Alibaba Cloud Elasticsearch V7.10.0 clusters.
Note: If the version of your Elasticsearch cluster is V6.7.0, make sure that the kernel version of the cluster is V1.3.0 before you use the slow query isolation feature. If the kernel version is not V1.3.0, upgrade the kernel. Kernel upgrades are supported only for Standard Edition clusters whose kernel version is V0.3.0, V1.0.2, or V1.2.0. If the version of your Elasticsearch cluster is V7.10.0, you can use this feature directly.
The slow query isolation feature is disabled by default. You must enable the feature before you use it.
All commands provided in this topic can be run in the Kibana console. For more information about how to log on to the Kibana console, see Log on to the Kibana console.
Procedure
Enable the slow query isolation feature.
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.enabled": true
  }
}
Note: To disable the feature, set search.isolator.enabled to null or false.
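If you prefer to script this step instead of using the Kibana console, the same request can be built in Python. The following is a minimal sketch that uses only the standard library. The endpoint URL and the helper name build_settings_request are assumptions for illustration, and authentication is omitted. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; replace it with the address of your cluster.
ES_URL = "http://localhost:9200"


def build_settings_request(settings, base_url=ES_URL):
    """Build (but do not send) a PUT _cluster/settings request."""
    body = json.dumps({"persistent": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Enable the feature.
enable_req = build_settings_request({"search.isolator.enabled": True})

# None is serialized as JSON null, one of the two values that disable the feature.
disable_req = build_settings_request({"search.isolator.enabled": None})

# Sending is left to the caller, for example:
# with urllib.request.urlopen(enable_req) as resp:
#     print(resp.read().decode())
```

Building the request separately from sending it makes it possible to inspect the method, URL, and body before anything reaches the cluster.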
Configure the thresholds that trigger isolation. If the memory consumption or latency of a query request exceeds the related threshold, the request is directed to the slow query isolation pool.
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.trigger.task.mem_cost": "500mb",
    "search.isolator.trigger.task.latency": "10s"
  }
}
| Parameter | Default value | Description |
| --- | --- | --- |
| search.isolator.trigger.task.mem_cost | 100mb | The threshold for the memory that a single query request can consume. If a query request consumes more memory than this threshold, the system directs the request to the slow query isolation pool. |
| search.isolator.trigger.task.latency | 10s | The threshold for the latency of a query request. If the time spent on a query request exceeds this threshold, the system directs the request to the slow query isolation pool. |
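The two trigger thresholds combine as an "either one" condition: a request is isolated as soon as its memory cost or its latency exceeds the configured value. The following Python sketch models that decision only. It is not the implementation used by Elasticsearch. The function names are hypothetical, and the parsers accept only the small set of unit suffixes listed in the code, with binary multiples assumed for sizes.

```python
import re

# Unit multipliers supported by this sketch (assumption: binary size units).
_SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
_TIME_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_size(value):
    """Convert a string such as '500mb' to a number of bytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]


def parse_duration(value):
    """Convert a string such as '10s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _TIME_UNITS[match.group(2)]


def should_isolate(mem_bytes, elapsed_seconds, mem_cost="100mb", latency="10s"):
    """Return True if either trigger threshold is exceeded."""
    return (
        mem_bytes > parse_size(mem_cost)
        or elapsed_seconds > parse_duration(latency)
    )


# A request that uses 600 MB is isolated when mem_cost is 500mb,
# even though it has run for only one second.
print(should_isolate(600 * 1024**2, 1.0, mem_cost="500mb"))
```

The defaults in should_isolate mirror the default values of the two parameters, so calling it without overrides models an unconfigured cluster.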
Configure the limits of the slow query isolation pool: the total memory that the query requests in the pool can consume, the heap memory usage of the cluster, and the maximum number of query requests that the pool can process at the same time. If any of these limits is exceeded, the system suspends the query requests that consume the most memory in the pool.
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.total.mem.limit": "60%",
    "search.isolator.total.heap.usage.limit": "75%",
    "search.isolator.total.tasks.limit": 1000
  }
}
| Parameter | Default value | Description |
| --- | --- | --- |
| search.isolator.total.mem.limit | 60% | The threshold for the ratio of the heap memory consumed by the query requests in the slow query isolation pool to the memory of the whole cluster. If the ratio reaches this threshold, the system suspends the query requests that consume the most memory in the pool based on a priority policy. You can adopt the priority policy based on your business requirements. |
| search.isolator.total.heap.usage.limit | 75% | The threshold for the heap memory usage of the cluster. If the usage reaches this threshold, the system suspends the query requests that consume the most memory in the pool based on a priority policy. You can adopt the priority policy based on your business requirements. |
| search.isolator.total.tasks.limit | 1000 | The maximum number of query requests that the slow query isolation pool can process at the same time. If the number of concurrent query requests exceeds this value, the system suspends the query requests that consume the most memory in the pool based on a priority policy. You can adopt the priority policy based on your business requirements. |
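To make the interaction of the three pool limits concrete, the following Python sketch models one possible reading of the behavior: while any limit is exceeded, the request that consumes the most memory is suspended. This is an illustration under stated assumptions, not the actual implementation. The real priority policy is not described in this topic, and the names PooledTask, select_tasks_to_suspend, heap_bytes, and other_heap_used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PooledTask:
    task_id: str
    mem_bytes: int


def select_tasks_to_suspend(pool, heap_bytes, other_heap_used=0,
                            mem_limit=0.60, heap_usage_limit=0.75,
                            tasks_limit=1000):
    """Return the tasks to suspend, largest memory consumer first.

    Assumption: both percentage limits are measured against heap_bytes.
    """
    # Sort ascending so that pop() removes the largest consumer.
    remaining = sorted(pool, key=lambda t: t.mem_bytes)
    suspended = []

    def over_limit():
        pool_mem = sum(t.mem_bytes for t in remaining)
        return (
            pool_mem >= mem_limit * heap_bytes
            or pool_mem + other_heap_used >= heap_usage_limit * heap_bytes
            or len(remaining) > tasks_limit
        )

    while remaining and over_limit():
        suspended.append(remaining.pop())
    return suspended


pool = [PooledTask("a", 100), PooledTask("b", 300), PooledTask("c", 400)]
# The pool holds 800 of 1000 heap bytes, which is above the 60% limit.
victims = select_tasks_to_suspend(pool, heap_bytes=1000)
print([t.task_id for t in victims])
```

Suspending the largest consumer first brings the pool back under its limits with the fewest suspended requests, which matches the "consume the most memory" wording in the parameter descriptions.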
View the query requests in the slow query isolation pool.
GET _tasks/isolator?detailed=true
Cancel a query request.
POST _tasks/<taskId>/_cancel
Replace <taskId> with the ID of the query request. You can find the ID in the list of query requests returned in the previous step.
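When many requests are in the pool, the cancel commands can be generated from the task list instead of being typed by hand. The following Python sketch assumes that the response of GET _tasks/isolator follows the layout of the standard Elasticsearch task list (nodes, then tasks keyed by task ID, each with a cancellable flag). That layout is an assumption for this endpoint, and the sample response is invented for illustration.

```python
def cancel_requests(tasks_response):
    """Return one console-style cancel line per cancellable task.

    Assumed layout: {"nodes": {node_id: {"tasks": {task_id: {...}}}}}
    """
    lines = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("cancellable", False):
                lines.append(f"POST _tasks/{task_id}/_cancel")
    return lines


# Hypothetical response used only to exercise the function.
sample = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {"action": "indices:data/read/search", "cancellable": True},
                "nodeA:102": {"action": "cluster:monitor/tasks/lists", "cancellable": False},
            }
        }
    }
}

for line in cancel_requests(sample):
    print(line)
```

Each printed line can be pasted into the Kibana console. Review the list before you run the commands, because a canceled query request cannot be resumed.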