Use the analytic-search plug-in - Elasticsearch - Alibaba Cloud Documentation Center

analytic-search is a log query plug-in developed by the Alibaba Cloud Elasticsearch team. The plug-in uses index sorting (index.sort) on the time field to accelerate queries that meet specific search conditions on the Discover page of the Kibana console of an Elasticsearch cluster. The plug-in also provides a concurrent query feature. Together, these features significantly reduce the time required to query data. This topic describes how to use the analytic-search plug-in.

Background information

This section describes the features of the analytic-search plug-in, the scenarios in which each feature applies, and the results of the performance tests performed on each feature.

Acceleration feature for queries performed on the Discover page of the Kibana console

Use scenarios: This feature is suitable for log query scenarios. For example, you can use the analytic-search plug-in to accelerate unconditional queries and single-condition queries on the Discover page of the Kibana console.
Benefits: Index merging policies and date histogram aggregation policies are optimized. This significantly improves the performance of unconditional queries and single-condition queries in log query scenarios. In scenarios in which more than 1 TB of data is added each day, the time required to complete a query is reduced from minutes to 5 seconds or less.

Performance test:
- Test environment
  - Node: 10 nodes, each of which offers 16 vCPUs and 64 GiB of memory.
  - Dataset: 60 billion business log documents each day. The data is stored in 12 indexes, each of which is configured with 60 shards.
- Test results: The following table lists the percentage by which query time is reduced on each storage medium after the acceleration feature is enabled for queries performed on the Discover page of the Kibana console.


Query type	Standard SSD	Ultra disk	OpenStore
Unconditional query	Reduced by 96%	Reduced by 95%	Reduced by 94%
Single-condition query	Reduced by 88%	Reduced by 77%	Reduced by 85%
Multi-condition query	Reduced by 8%	Reduced by 11%	Reduced by 14%
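
To compare the rows of the table, it can help to convert each reduction percentage into a speedup factor: a query whose time is reduced by r% runs 100 / (100 - r) times faster. The following Python sketch applies only that arithmetic to the figures in the table. The helper name `speedup_factor` is illustrative and is not part of the plug-in.

```python
# Reduction percentages copied from the table above, keyed by query type.
# Column order: standard SSD, ultra disk, OpenStore.
REDUCTIONS = {
    "Unconditional query": (96, 95, 94),
    "Single-condition query": (88, 77, 85),
    "Multi-condition query": (8, 11, 14),
}


def speedup_factor(reduction_pct: float) -> float:
    """Convert 'time reduced by N%' into 'this many times faster'."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 100 / (100 - reduction_pct)


for query_type, reductions in REDUCTIONS.items():
    factors = ", ".join(f"{speedup_factor(r):.1f}x" for r in reductions)
    print(f"{query_type}: {factors}")
```

Read this way, a 96% reduction corresponds to a 25x speedup, whereas an 8% reduction is only about 1.1x, which is consistent with the feature mainly targeting unconditional and single-condition queries.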

Concurrent query feature

Use scenarios: This feature is suitable for scenarios in which the number of queries per second (QPS) is low, individual queries take a long time to return results, and nodes have sufficient computing resources.
Benefits: Queries are processed by concurrent threads. The average time required to return query results is reduced by 50%, and resource utilization is improved.
Performance test:
- Test environment
  - Node: three warm-hot shared computing nodes provided by OpenStore, each of which offers 16 vCPUs and 64 GiB of memory.
    
    Note: Warm-hot shared computing nodes provided by OpenStore are available for purchase in the Elasticsearch console only at the Alibaba Cloud China site (aliyun.com).
  - Dataset: 1.6 TB of business log data. The data is stored as 6 billion documents in an index for which 60 shards are configured.
  - Query: three term queries combined with AND, a time range filter, a sort, and a date histogram aggregation. The query matches about 10 million documents in each shard, which is a hit rate of 10%.
- Test results:
  - The time required to query data from a single shard is reduced by 65%.
  - The time required to query data from multiple shards is reduced by 53%.
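
The test query is described only by its shape. As a rough illustration, a request body of that shape could be assembled as in the following Python sketch. The field names (`service`, `level`, `region`), the term values, the time range, and the histogram interval are all hypothetical; the actual test query is not published in this topic.

```python
import json

# Hypothetical field names and values. Only the overall shape
# (three ANDed term queries + time range + sort + date histogram)
# comes from the test description.
term_filters = [
    {"term": {"service": "checkout"}},
    {"term": {"level": "error"}},
    {"term": {"region": "cn-hangzhou"}},
]

query_body = {
    "query": {
        "bool": {
            # Every clause under "filter" must match, which gives AND semantics.
            "filter": term_filters
            + [{"range": {"@timestamp": {"gte": "now-1d", "lte": "now"}}}]
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
        }
    },
}

print(json.dumps(query_body, indent=2))
```

A multi-condition query with an aggregation like this one is the kind of request that the concurrent query feature is tested against, as opposed to the unconditional and single-condition queries targeted by the Discover acceleration feature.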

Prerequisites

An Alibaba Cloud Elasticsearch V7.10.0 cluster whose kernel version is V1.7.0 or later is created. For information about how to create an Alibaba Cloud Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster.

Note: By default, the analytic-search plug-in is installed and cannot be removed. For information about the plug-in, see Overview of plug-ins.

Enable the acceleration feature for queries performed on the Discover page of the Kibana console

To enable the acceleration feature for queries performed on the Discover page of the Kibana console, add the following configurations in the settings and mappings fields when you create an index.

Note: The following sample code is provided for reference only. In actual business scenarios, you must specify the fields that you want to use to sort your index and the sorting order for each field based on your business requirements.

{
  "settings": {
    "index.points.same_sort_order_as_index_sort": true,
    "index.sort.field": [
      "@timestamp"
    ],
    "index.sort.order": [
      "desc"
    ]
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
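
Because the sort fields and sorting orders depend on your data, it can be convenient to generate the request body from a list of fields. The following Python sketch shows one way to do that and prints a body equivalent to the sample above. The helper name `build_index_body` and its tuple layout are illustrative assumptions, and how the plug-in treats more than one sort field is not described in this topic.

```python
import json


def build_index_body(sort_fields):
    """Build create-index settings and mappings for a sorted index.

    sort_fields: list of (field_name, field_type, order) tuples, where
    order is "asc" or "desc". Every sort field is also added to the
    mappings, because index sorting needs the field to be mapped when
    the index is created.
    """
    for _, _, order in sort_fields:
        if order not in ("asc", "desc"):
            raise ValueError(f"unsupported sort order: {order}")
    return {
        "settings": {
            "index.points.same_sort_order_as_index_sort": True,
            "index.sort.field": [name for name, _, _ in sort_fields],
            "index.sort.order": [order for _, _, order in sort_fields],
        },
        "mappings": {
            "properties": {
                name: {"type": ftype} for name, ftype, _ in sort_fields
            }
        },
    }


body = build_index_body([("@timestamp", "date", "desc")])
print(json.dumps(body, indent=2))
```

The printed body can be sent with a create-index request. Index sort settings must be specified at index creation and cannot be added to an existing index afterward.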

Use the concurrent query feature

Log on to the Kibana console of your Elasticsearch cluster and go to the homepage of the Kibana console as prompted. For more information about how to log on to the Kibana console, see Log on to the Kibana console.

Note: In this example, an Elasticsearch V7.10.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.

In the upper-right corner of the page that appears, click Dev tools.

On the Console tab, run the following command to enable the concurrent query feature:

PUT _cluster/settings
{
  "persistent": {
    "apack.analytic_search.doc_concurrency.enabled": "true"
  }
}

After the preceding command is successfully run, requests that are received by the Elasticsearch cluster are processed based on the default settings for the concurrent query feature. You can modify the configurations related to the concurrent query feature to control concurrent query operations. The following tables describe the parameters that you can configure.

Cluster-level configurations


Parameter	Default value	Description
apack.analytic_search.doc_concurrency.enabled	false	Specifies whether to enable the concurrent query feature. Valid values: true: Enables the concurrent query feature. false: Disables the concurrent query feature.
apack.analytic_search.doc_concurrency.concurrent.policy	80%:4;90%:2	The query concurrency policy. Configure this parameter in the `Threshold 1:Concurrency 1;Threshold 2:Concurrency 2;...` format. `Threshold n` is a threshold for node CPU utilization, and `Concurrency n` is the number of concurrent threads that are used for queries when the CPU utilization of a node is less than `Threshold n`. For example, the value `80%:4;90%:2` indicates that four concurrent threads are used for queries when the CPU utilization of a node is less than 80%, and two concurrent threads are used when the CPU utilization is at least 80% but less than 90%. If the CPU utilization of a node is greater than or equal to 90%, only one thread is used for queries.
apack.analytic_search.doc_concurrency.min_support_doc	10000	The minimum number of documents in an index on which the concurrent query feature can take effect. If the number of documents in an index is less than the lower limit, the concurrent query feature does not take effect on the index.
apack.analytic_search.doc_concurrency.min_support_processors	4	The minimum number of vCPUs that are configured for a node on which the concurrent query feature can take effect. If the number of vCPUs that are configured for a node is less than the lower limit, the concurrent query feature does not take effect on the node.
apack.analytic_search.doc_concurrency.max_support_heap_usage	80%	The maximum Java Virtual Machine (JVM) heap memory usage of a node on which the concurrent query feature can take effect. If the JVM heap memory usage of a node is greater than the upper limit, the concurrent query feature does not take effect on the node.
apack.analytic_search.doc_concurrency.max_support_cpu_usage	90%	The maximum CPU utilization of a node on which the concurrent query feature can take effect. If the CPU utilization of a node is greater than the upper limit, the concurrent query feature does not take effect on the node.
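
To make the parameter semantics concrete, the following Python sketch models how the policy string and the four gating parameters could be read. It is an interpretation of the descriptions in the table, not the plug-in's implementation; the function names `parse_policy`, `threads_for_cpu`, and `is_eligible` are hypothetical.

```python
def parse_policy(policy: str):
    """Parse '80%:4;90%:2' into [(80.0, 4), (90.0, 2)], sorted by threshold."""
    rules = []
    for part in policy.split(";"):
        part = part.strip()
        if not part:
            continue
        threshold, concurrency = part.split(":")
        rules.append((float(threshold.strip().rstrip("%")), int(concurrency)))
    return sorted(rules)


def threads_for_cpu(rules, cpu_pct: float) -> int:
    """Return the thread count of the lowest threshold that cpu_pct is below."""
    for threshold, concurrency in rules:
        if cpu_pct < threshold:
            return concurrency
    return 1  # at or above every threshold: a single thread


def is_eligible(doc_count, processors, heap_pct, cpu_pct,
                min_docs=10000, min_processors=4,
                max_heap_pct=80.0, max_cpu_pct=90.0) -> bool:
    """Apply the four gating parameters, using the documented defaults."""
    return (doc_count >= min_docs
            and processors >= min_processors
            and heap_pct <= max_heap_pct
            and cpu_pct <= max_cpu_pct)


rules = parse_policy("80%:4;90%:2")
for cpu in (50, 85, 95):
    print(f"CPU {cpu}% -> {threads_for_cpu(rules, cpu)} thread(s)")
```

Under this reading, the default policy yields four threads at 50% CPU utilization, two at 85%, and one at 95%, and an index with fewer than 10,000 documents is never queried concurrently regardless of CPU load.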

Index-level configurations


Parameter	Default value	Description
index.apack.analytic_search.doc_concurrency.enabled	true	Specifies whether to enable the concurrent query feature for the index. Valid values: true: Enables the concurrent query feature. false: Disables the concurrent query feature.
index.apack.analytic_search.doc_concurrency.allow_no_agg	false	Specifies whether to enable the concurrent query feature for queries other than aggregate queries. Valid values: true: Enables the concurrent query feature for queries other than aggregate queries. false: Disables the concurrent query feature for queries other than aggregate queries.
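
As an illustration of tuning these parameters, the following Python sketch builds one cluster-level and one index-level settings request and prints them in the form used on the Console tab. The index name `my-log-index` and the policy value are hypothetical. The sketch also assumes that the index-level parameter can be changed on an existing index through the standard `PUT <index>/_settings` API, which this topic does not state.

```python
import json


def console_request(method: str, path: str, body: dict) -> str:
    """Render a request the way it would be typed on the Kibana Console tab."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


# Cluster level: enable the feature and use a more conservative policy
# (hypothetical value: 4 threads below 60% CPU, 2 threads below 80%).
cluster_body = {
    "persistent": {
        "apack.analytic_search.doc_concurrency.enabled": True,
        "apack.analytic_search.doc_concurrency.concurrent.policy": "60%:4;80%:2",
    }
}

# Index level: also allow concurrency for queries without aggregations.
# Assumption: the setting is dynamically updatable on an existing index.
index_body = {
    "index.apack.analytic_search.doc_concurrency.allow_no_agg": True
}

print(console_request("PUT", "_cluster/settings", cluster_body))
print(console_request("PUT", "my-log-index/_settings", index_body))
```

Keeping the cluster-level switch and the per-index overrides in separate requests mirrors the two tables above: the cluster settings decide whether and how aggressively nodes run queries concurrently, and the index settings opt individual indexes in or out.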