How to use the faster-bulk plug-in

The faster-bulk plug-in is a built-in plug-in that optimizes write operations by aggregating bulk requests on each data node until a configured size or time interval is reached. This prevents small-batch writes from blocking the write queue, which makes the plug-in well suited to high-throughput scenarios with many index shards. Because aggregation adds latency, the plug-in is not recommended for latency-sensitive write scenarios. The plug-in is disabled by default and must be enabled manually.

Usage notes

You must install the plug-in before using it. For more information, see Install or uninstall a built-in plug-in.

Write performance

The following reference data shows the performance of the faster-bulk plug-in in a specific test environment.

Test environment: three data nodes (16 cores, 64 GB each) and two separate client nodes (16 cores, 64 GB each). The test used the official esrally nyc-taxis dataset (650 bytes per document), with apack.fasterbulk.combine.interval set to 200 ms.

Translog status	Without plug-in	With plug-in	Performance improvement
Synchronous (default)	182,314/s	226,242/s	23%
Asynchronous	218,732/s	241,060/s	10%

Enable bulk aggregation

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.enabled":"true"
   }
}
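
If you apply this setting from a script instead of the console, the request can be built as in the following Python sketch. It uses only the standard library. The helper names, the endpoint URL, and the credentials are hypothetical placeholders, not part of the plug-in or of any client library, and HTTP basic authentication is an assumption about how your cluster is secured.

```python
import base64
import json
import urllib.request


def build_cluster_settings_request(settings, scope="transient"):
    """Return (method, path, body) for a cluster settings update.

    scope is "transient" or "persistent", matching the top-level key
    used in the console examples. (Hypothetical helper.)
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    body = json.dumps({scope: settings}).encode("utf-8")
    return "PUT", "/_cluster/settings", body


def send(base_url, user, password, method, path, body):
    """Send the request with HTTP basic authentication (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


method, path, body = build_cluster_settings_request(
    {"apack.fasterbulk.combine.enabled": "true"}
)
# To apply it, call send() with your own endpoint and credentials, for example:
# send("http://<your-endpoint>:9200", "<user>", "<password>", method, path, body)
```

The same helper can build the disable request by passing "false" as the value.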

Configure aggregation parameters

Configure the aggregation size and time interval for bulk requests. The system triggers a data write when either the cumulative size of bulk requests or the aggregation time interval on a single data node reaches the configured threshold.

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.flush_threshold_size":"1mb",
      "apack.fasterbulk.combine.interval":"50"
   }
}

Parameter	Description	Default
apack.fasterbulk.combine.flush_threshold_size	The maximum cumulative size of aggregated bulk requests on a single data node.	1mb
apack.fasterbulk.combine.interval	The maximum time interval for aggregating bulk requests. Unit: ms.	50

For high-concurrency scenarios with large data volumes, you can increase the maximum aggregation size or time interval within the capacity of your cluster. This helps prevent bulk requests from blocking the write queue.
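
The size-or-interval trigger described in this section can be illustrated with a small Python simulation. This is a toy model, not the plug-in's implementation: the function name is hypothetical, and it assumes that the interval timer starts when the first request enters an empty buffer, which the documentation does not state. The defaults mirror the documented values of 1mb (taken as 1,048,576 bytes) and 50 ms.

```python
def simulate_flushes(requests, threshold_bytes=1024 * 1024, interval_ms=50):
    """Simulate when buffered bulk requests would be flushed on one data node.

    requests: list of (arrival_ms, size_bytes), sorted by arrival time.
    Returns a list of (flush_time_ms, flushed_bytes, reason), where reason
    is "size" or "interval".
    """
    flushes = []
    pending = 0
    window_start = None  # assumption: timer starts with the first buffered request
    for arrival_ms, size in requests:
        # If the interval expired before this request arrived, the earlier
        # batch has already been flushed by the timer.
        if window_start is not None and arrival_ms - window_start >= interval_ms:
            flushes.append((window_start + interval_ms, pending, "interval"))
            pending, window_start = 0, None
        if window_start is None:
            window_start = arrival_ms
        pending += size
        # The size threshold flushes immediately, without waiting for the timer.
        if pending >= threshold_bytes:
            flushes.append((arrival_ms, pending, "size"))
            pending, window_start = 0, None
    # Whatever is still buffered is flushed when its interval expires.
    if window_start is not None:
        flushes.append((window_start + interval_ms, pending, "interval"))
    return flushes


result = simulate_flushes([(0, 400_000), (10, 400_000), (20, 700_000), (100, 1_000)])
```

In this run, the first three requests are combined and flushed by size at 20 ms, while the small request that arrives at 100 ms waits the full interval and is flushed at 150 ms. That wait is the added latency that makes the plug-in a poor fit for latency-sensitive writes.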

Directed routing

When you batch-write documents without specifying a routing value or a primary key (_id), you can enable directed routing for the cluster or a specific index to improve write speed. This feature does not affect write requests that already specify a routing value or a primary key (_id).

To enable directed routing for a cluster:

PUT _cluster/settings
{
  "persistent" : {
    "index.direct_routing.global.enable" : "true"
  }
}

To enable directed routing for a specific index:

PUT <index_name>/_settings
{
  "index.direct_routing.enable" : "true"
}
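
For reference, the following Python sketch builds the kind of bulk body that directed routing applies to: action lines that specify neither _id nor routing. The index name my-index, the fare field, and both helper functions are hypothetical and serve only as illustration. The sketch assumes the standard Elasticsearch bulk format, in which each action line is followed by its document and the body ends with a newline.

```python
import json


def build_bulk_body(index_name, documents):
    """Build an NDJSON _bulk body whose action lines carry no _id and no routing."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def has_explicit_id_or_routing(action_line):
    """Return True if a bulk action line specifies _id or routing."""
    (metadata,) = json.loads(action_line).values()
    return "_id" in metadata or "routing" in metadata


body = build_bulk_body("my-index", [{"fare": 12.5}, {"fare": 7.0}])
```

Requests whose action lines do set _id or routing are handled as before, so the check in has_explicit_id_or_routing is a quick way to see which of your writes could benefit from the feature.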

Disable bulk aggregation

PUT _cluster/settings
{
   "transient" : {
      "apack.fasterbulk.combine.enabled":"false"
   }
}