The faster-bulk plug-in is developed by the Alibaba Cloud Elasticsearch team. This plug-in aggregates bulk write requests in batches based on the specified maximum request size and aggregation interval. This prevents small bulk requests from blocking the write queue, improves write throughput, and reduces write request rejections. This topic introduces the use scenarios of the plug-in and describes how to use the plug-in.

Scenarios

The faster-bulk plug-in is ideal for scenarios in which write throughput is high and numerous index shards exist. The plug-in can improve the write throughput by more than 20% in these scenarios.
Notice The faster-bulk plug-in aggregates bulk write requests in batches before it writes data to shards. Therefore, we recommend that you do not use this plug-in in latency-sensitive scenarios.
  • Test environment
    • Node configuration: 3 data nodes and 2 independent client nodes. Each node offers 16 vCPUs and 64 GiB of memory.
    • Dataset: nyc_taixs provided by Rally in open source Elasticsearch. The size of a single document is 650 bytes.
    • Parameter setting: The apack.fasterbulk.combine.interval parameter is set to 200ms.
    • Translog status: Tests are performed in each of the synchronous and asynchronous states. If the index.translog.durability parameter is set to request, translogs are in the synchronous state. If the index.translog.durability parameter is set to async, translogs are in the asynchronous state.
  • Test results
    Translog status Write performance of an open source Elasticsearch cluster without faster-bulk (document/s) Write performance of an Alibaba Cloud Elasticsearch cluster with faster-bulk (document/s) Performance improvement (percentage)
    Synchronous 182,314 226,242 23%
    Asynchronous 218,732 241,060 10%
  • Test conclusion

    The write performance is improved in both the synchronous state (default state) and asynchronous state after the faster-bulk plug-in is used. In the synchronous state, the write performance is improved by 23%.

Prerequisites

  • An Alibaba Cloud Elasticsearch V6.7.0 or V7.10.0 cluster is created.
    For more information, see Create an Alibaba Cloud Elasticsearch cluster.
    Note Only Alibaba Cloud Elasticsearch V6.7.0 and V7.10.0 clusters of the Standard or Advanced Edition support the faster-bulk plug-in.
  • The faster-bulk plug-in is installed.

    For more information, see Install and remove a built-in plug-in. After the plug-in is installed, the bulk request aggregation feature is disabled by default. Before you use this plug-in, you must enable this feature.

Enable the bulk request aggregation feature

  1. Log on to the Kibana console of the Elasticsearch cluster.
    For more information, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab of the page that appears, run the following command to enable the bulk request aggregation feature:
    PUT _cluster/settings
    {
       "transient" : {
          "apack.fasterbulk.combine.enabled":"true"
       }
    }
    Note You can also use the cURL tool or a third-party visualizer to run the preceding command.

Configure the maximum request size and aggregation interval

Run the following command to configure the maximum request size and aggregation interval. If the total size of bulk requests or the aggregation interval on a single data node reaches the configured threshold, the system writes data to shards.
PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.flush_threshold_size":"1mb",
      "apack.fasterbulk.combine.interval":"50"
   }
}
  • apack.fasterbulk.combine.flush_threshold_size: the maximum size of bulk requests. Default value: 1mb.
  • apack.fasterbulk.combine.interval: the maximum interval at which bulk requests are aggregated. Default value: 50. Unit: ms.
Note To process highly concurrent bulk requests and prevent the requests from blocking the write queue, you can increase the maximum request size or aggregation interval based on your business requirements.

Enable directed routing

If no routing setting and primary key are configured for the documents that need to be written in a bulk write request, you can enable directed routing for the Elasticsearch cluster or a specified index to improve data write speed. The primary key is specified by the _id parameter.
Note If directed routing is enabled for the Elasticsearch cluster or the specified index, and the routing setting and primary key are configured for the documents that need to be written, directed routing does not take effect and data write operations are not affected.
  • Enable directed routing for the Elasticsearch cluster
    PUT _cluster/settings
    {
      "persistent" : {
        "index.direct_routing.global.enable" : "true"
      }
    }
  • Enable directed routing for a specified index
    PUT index/settings
    {
      "index.direct_routing.enable" : "true"
    }

Disable the bulk request aggregation feature

Run the following command to disable the bulk request aggregation feature:
PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.enabled":"false"
   }
}