When you query an Elasticsearch cluster, a single slow query can consume most of the resources on the nodes in the cluster and affect your online business. To address this issue, the Alibaba Cloud Elasticsearch team developed the slow query isolation feature. This feature tracks the overhead of each query request and logically isolates expensive requests. If the overhead of a request exceeds a specific threshold, the system considers the query anomalous and suspends it. This prevents a single anomalous query from causing exceptions in the cluster and improves cluster stability. This topic describes how to use the slow query isolation feature.

Background information

To use the slow query isolation feature, you must configure a resource isolation pool that has a fixed memory size. If the memory requested by a single query exceeds a specific threshold, the query is directed to the isolation pool for management. If the total memory used by the queries in the pool exceeds a specific threshold, the system suspends the queries that consume the most memory based on a priority policy. You can choose the priority policy based on your business requirements.
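The admission and suspension behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the actual implementation: the class and field names are hypothetical, the thresholds are illustrative, and the priority policy is assumed to be "largest memory consumer first".

```python
from dataclasses import dataclass, field

MB = 1024 * 1024


@dataclass
class Query:
    task_id: str    # hypothetical identifier for the query task
    mem_bytes: int  # memory the query has requested so far


@dataclass
class IsolationPool:
    """Toy model of a fixed-size isolation pool (not the real implementation)."""
    trigger_bytes: int   # per-query threshold for entering the pool
    capacity_bytes: int  # total memory the pool may hold
    members: list = field(default_factory=list)

    def admit(self, query: Query) -> bool:
        """Direct the query to the pool if it exceeds the per-query threshold."""
        if query.mem_bytes <= self.trigger_bytes:
            return False  # inexpensive query: stays outside the pool
        self.members.append(query)
        return True

    def enforce(self) -> list:
        """Suspend queries until the pool fits within its capacity.

        Assumed priority policy: the largest memory consumer goes first.
        """
        suspended = []
        self.members.sort(key=lambda q: q.mem_bytes)  # ascending by memory
        while self.members and sum(q.mem_bytes for q in self.members) > self.capacity_bytes:
            suspended.append(self.members.pop())  # remove the largest consumer
        return suspended


# Illustrative values only: a 500 MB per-query trigger and a 2000 MB pool.
pool = IsolationPool(trigger_bytes=500 * MB, capacity_bytes=2000 * MB)
queries = [
    Query("a", 100 * MB),
    Query("b", 900 * MB),
    Query("c", 800 * MB),
    Query("d", 700 * MB),
]
for q in queries:
    pool.admit(q)
suspended = pool.enforce()
```

In this example, query `a` never enters the pool, and queries `b`, `c`, and `d` together exceed the pool capacity, so only `b`, the largest consumer, is suspended.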

Precautions

  • The slow query isolation feature is available for Alibaba Cloud Elasticsearch V6.7.0 clusters whose kernel version is V1.3.0. Before you use this feature, make sure that the kernel version of your cluster is V1.3.0. If it is not, upgrade the kernel. Only the kernels of Standard Edition clusters whose kernel versions are V0.3.0, V1.0.2, or V1.2.0 can be upgraded.
    Notice: Only the kernels of clusters whose endpoint or IP addresses are included in a whitelist can be upgraded. To upgrade the kernel of a cluster whose endpoint or IP addresses are not included in the whitelist, submit a ticket to the technical support engineers of Alibaba Cloud Elasticsearch.
  • The slow query isolation feature is disabled by default. You must enable the feature before you use it.
  • All commands in this topic can be run in the Kibana console. For more information about how to log on to the Kibana console, see Log on to the Kibana console.

Procedure

  1. Enable the slow query isolation feature.
    PUT _cluster/settings
    {
      "persistent": {
        "search.isolator.enabled": true
      }
    }
    Note: To disable the feature, set search.isolator.enabled to null or false.
  2. Configure thresholds for query interception. If the memory usage or latency of a query request exceeds the related threshold, the query is directed to the slow query isolation pool.
    PUT _cluster/settings
    {
      "persistent": {
        "search.isolator.trigger.task.mem_cost": "500mb",
        "search.isolator.trigger.task.latency": "10s"
      }
    }
    | Parameter | Default value | Description |
    | --- | --- | --- |
    | search.isolator.trigger.task.mem_cost | 500mb | The threshold for the size of memory that can be used for a single query request. If the size of memory that is used for a query exceeds the threshold, the system directs the query to the slow query isolation pool. |
    | search.isolator.trigger.task.latency | 10s | The threshold for the latency of a query request. If the time spent on a query exceeds the threshold, the system directs the query to the slow query isolation pool. |
  3. Configure thresholds for the total memory that slow queries in the isolation pool can use and for the number of query requests that the pool can process. If the memory used by slow queries or the number of query requests in the isolation pool exceeds the related threshold, the system suspends the queries that consume the most memory in the isolation pool.
    PUT _cluster/settings
    {
      "persistent": {
        "search.isolator.total.mem.limit": "60%",
        "search.isolator.total.heap.usage.limit": "75%",
        "search.isolator.total.tasks.limit": 1000
      }
    }
    | Parameter | Default value | Description |
    | --- | --- | --- |
    | search.isolator.total.mem.limit | 60% | The threshold for the ratio of the heap memory consumed by slow queries in the isolation pool to the memory of the whole cluster. Slow queries are suspended when the ratio reaches this value. |
    | search.isolator.total.heap.usage.limit | 75% | The threshold for the heap memory usage of the cluster. Slow queries are suspended when the usage reaches this value. |
    | search.isolator.total.tasks.limit | 1000 | The maximum number of query requests that can be processed in the slow query isolation pool at the same time. Slow queries are suspended when the number exceeds this value. |
  4. View query requests in the slow query isolation pool.
    GET _tasks/isolator?detailed=true
  5. Cancel a query request.
    POST _tasks/<taskId>/_cancel
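If you prefer to script the procedure instead of using the Kibana console, the same requests can be built with the Python standard library. The following is a minimal sketch, not an official client. The endpoint `http://localhost:9200`, the helper names, and the sample task ID are assumptions made for illustration. Replace the endpoint with your own cluster endpoint and add authentication as your cluster requires.

```python
import json
import urllib.request

# Assumption: placeholder endpoint; replace it with your cluster endpoint.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        headers=headers,
        method=method,
    )


def isolator_settings(settings):
    """Wrap search.isolator.* settings in a persistent cluster-settings body."""
    return {
        "persistent": {
            f"search.isolator.{key}": value for key, value in settings.items()
        }
    }


def cancel_request(task_id):
    """Build the request that cancels one query task (step 5)."""
    return build_request("POST", f"_tasks/{task_id}/_cancel")


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Steps 1 to 4 of the procedure, using the default values from this topic.
requests_in_order = [
    build_request("PUT", "_cluster/settings", isolator_settings({"enabled": True})),
    build_request("PUT", "_cluster/settings", isolator_settings({
        "trigger.task.mem_cost": "500mb",
        "trigger.task.latency": "10s",
    })),
    build_request("PUT", "_cluster/settings", isolator_settings({
        "total.mem.limit": "60%",
        "total.heap.usage.limit": "75%",
        "total.tasks.limit": 1000,
    })),
    build_request("GET", "_tasks/isolator?detailed=true"),
]
```

To apply the settings, pass each prepared request to `send`, for example `for r in requests_in_order: print(send(r))`. Building the requests separately from sending them makes it easy to review the exact bodies before they reach a production cluster.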