Configure request throttling with aliyun-qos - Elasticsearch

When upstream services send bursts of read or write traffic that you cannot control, the aliyun-qos plug-in lets you enforce cluster-level throttling to protect Elasticsearch stability. The plug-in can lower the priority of specific indexes and limit queries per second (QPS), transactions per second (TPS), throughput, and concurrent thread counts, which gives you granular control over which workloads get through.

Prerequisites

Before you begin, ensure that:

The aliyun-qos plug-in is installed on your Elasticsearch cluster. Check the Plug-ins page in the Elasticsearch console. If the plug-in is not installed, see Install and remove a built-in plug-in. Note that the plug-in cannot be uninstalled after installation.
The plug-in is upgraded to the latest version:
- Elasticsearch V7.10 clusters: 7.10.0_ali1.6.0.2
- All other versions: <ES-version>-rc4

To check the current version, run the following command in the Kibana console:

GET /_cat/plugins?v
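
If you save the text output of this command, a short script can pull out the aliyun-qos version on each node. The following is a minimal sketch in Python. It assumes the default column order of the verbose cat output (name, component, version). The sample text and node names are made up for illustration.

```python
def plugin_versions(cat_output: str, plugin: str = "aliyun-qos") -> dict:
    """Map node name -> version of `plugin`, parsed from `GET /_cat/plugins?v` text."""
    versions = {}
    lines = cat_output.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header printed by ?v
        cols = line.split()
        if len(cols) >= 3 and cols[1] == plugin:
            versions[cols[0]] = cols[2]
    return versions

# Hypothetical sample output; node and plug-in names other than aliyun-qos are invented.
sample = """\
name    component     version
node-1  aliyun-qos    7.10.0_ali1.6.0.2
node-1  other-plugin  7.10.0
node-2  aliyun-qos    7.10.0_ali1.6.0.2
"""
print(plugin_versions(sample))
```

Comparing the values in the returned dictionary against the required version shows at a glance whether any node still runs an older build.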


Upgrade paths:

Elasticsearch V7.10 clusters: Update the kernel version to V1.6.0. For more information, see Upgrade the version of a cluster.
All other versions: Submit a ticket to contact Elasticsearch technical engineers. After the upgrade, restart your cluster for the change to take effect.

If the plug-in version is earlier than rc4, the system reports an unsupported_operation_exception error. The plug-in can be upgraded only on clusters running Elasticsearch V6.7.0 or later. If your cluster runs an earlier version, upgrade the cluster first.

Usage notes

aliyun-qos is a built-in plug-in and cannot be uninstalled.
Throttling is disabled by default. Enable it before use.
The plug-in throttles at the cluster level but does not precisely measure per-node traffic. Measured values may differ from actual traffic.
Before upgrading to the latest version, note the following:
- Throttling may be ineffective for a short period during the upgrade. It recovers after the dedicated master node is upgraded.
- Some limiters may fail to upgrade. If this happens, run the following command. Repeat until hasError returns false.
```
POST /_qos/limiter/ops/upgrade
```
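
The repeat-until-clean step can be scripted. The sketch below is one possible shape in Python, not an official client. `post_upgrade` is a hypothetical callable that sends `POST /_qos/limiter/ops/upgrade` and returns the parsed JSON response. Only the `hasError` field is taken from this topic, and the rest of the response shape is not assumed.

```python
from typing import Callable

def upgrade_limiters(post_upgrade: Callable[[], dict], max_attempts: int = 5) -> int:
    """Call the upgrade endpoint until the response reports hasError == false.

    Returns the number of calls made; raises RuntimeError if errors persist.
    """
    for attempt in range(1, max_attempts + 1):
        response = post_upgrade()
        # Treat a missing hasError field as an error so the loop never stops early.
        if not response.get("hasError", True):
            return attempt
    raise RuntimeError(f"limiters still report errors after {max_attempts} attempts")

# Stand-in for the real call: reports errors twice, then succeeds.
replies = iter([{"hasError": True}, {"hasError": True}, {"hasError": False}])
print(upgrade_limiters(lambda: next(replies)))  # 3
```

The attempt cap is a design choice of this sketch, so that a limiter that never upgrades surfaces as an error instead of looping forever.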

Evaluate thresholds

Before configuring limiters, estimate appropriate throttling thresholds using the following rules.

Query (read) requests:

Throttling threshold = end-to-end QPS from client to Elasticsearch (the number of query requests sent to client nodes per second).

Write requests:

Use the same calculation as for query requests, then adjust for the number of replica shards.

For example: a cluster with two data nodes stores one index that has one primary shard and one replica shard, and 10 MB is written per operation. Because of the replica shard, each data node receives 10 MB per write operation. Also account for write traffic generated by X-Pack Monitor, Audit, and Watcher when setting the threshold.
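
The arithmetic behind this example can be written down as a small helper. This is a rough sketch that assumes shard copies are spread evenly across data nodes. The function name is illustrative, and it ignores the X-Pack traffic mentioned above.

```python
def per_node_write_mb(payload_mb: float, replicas: int, data_nodes: int) -> float:
    """Average write volume each data node receives for one write operation.

    Assumes primary and replica shard copies are spread evenly across data nodes.
    """
    copies = 1 + replicas  # the primary shard plus its replica copies
    return payload_mb * copies / data_nodes

# The example above: 10 MB per operation, one replica, two data nodes.
print(per_node_write_mb(10, replicas=1, data_nodes=2))  # 10.0
```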

Enable throttling

Throttling is disabled by default. Enable it by running the appropriate command in the Kibana console based on your cluster version.

All commands in this topic can be run in the Kibana console.

Elasticsearch V7.10 (latest plug-in version):

PUT _cluster/settings
{
  "persistent": {
    "apack.qos.limiter.enabled": true
  }
}

All other Elasticsearch versions:

PUT _cluster/settings
{
  "persistent" : {
    "apack.qos.ratelimit.enabled":"true"
  }
}

Disable throttling

Set the throttling parameter to false or null to disable it.

Elasticsearch V7.10 — set to `false`:

PUT _cluster/settings
{
  "persistent": {
    "apack.qos.limiter.enabled": false
  }
}

Elasticsearch V7.10 — set to `null`:

PUT _cluster/settings
{
  "persistent": {
    "apack.qos.limiter.enabled": null
  }
}

All other versions — set to `false`:

PUT _cluster/settings
{
  "persistent" : {
    "apack.qos.ratelimit.enabled":"false"
  }
}

All other versions — set to `null`:

PUT _cluster/settings
{
  "persistent" : {
    "apack.qos.ratelimit.enabled":null
  }
}
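
If you manage several clusters from a script, the enable and disable bodies above can be generated from two inputs. The helper below is a sketch in Python. The function name is hypothetical, and sending the body with `PUT _cluster/settings` is left to whichever HTTP client you use.

```python
import json
from typing import Optional

def throttling_settings(is_v710: bool, enabled: Optional[bool]) -> dict:
    """Build the body for PUT _cluster/settings. enabled=None sends null."""
    if is_v710:
        return {"persistent": {"apack.qos.limiter.enabled": enabled}}
    # Other versions: the examples above pass the flag as a string.
    value = None if enabled is None else str(enabled).lower()
    return {"persistent": {"apack.qos.ratelimit.enabled": value}}

print(json.dumps(throttling_settings(True, True)))
print(json.dumps(throttling_settings(False, None)))
```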

Configure a limiter (Elasticsearch V7.10 only)

This section applies only to the aliyun-qos plug-in installed on Elasticsearch V7.10 clusters.

A limiter defines what to throttle and how. Each limiter has two parts:

limiters: the throttling type and threshold (for example, search.qps:1000)
tags: the resources to apply throttling to (for example, a specific index or node)

Limiters come in two types: common limiters (match a specific resource) and default limiters (use ** in the tags field to match all resources of a type and create a separate limiter for each). When a threshold is reached, the system rejects subsequent requests.

Preview mode before enforcement

Before enforcing throttling, use watchMode to validate your configuration safely. When watchMode is set to true, the plug-in records denied request counts in metrics without actually rejecting requests. Review the metrics to confirm thresholds are appropriate, then switch to enforcement mode by setting watchMode to false.
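
The preview-then-enforce workflow amounts to submitting the same limiter body twice with a different `watchMode` value. The Python sketch below only prepares the two bodies. It assumes that re-submitting a limiter under the same name replaces its earlier definition, which this topic does not state explicitly. The helper name is illustrative.

```python
import copy

def set_watch_mode(limiter_body: dict, enabled: bool) -> dict:
    """Return a copy of a limiter body with params.watchMode set to `enabled`."""
    body = copy.deepcopy(limiter_body)
    body.setdefault("params", {})["watchMode"] = enabled
    return body

draft = {"limiters": {"search.qps": "1000"}, "tags": {"index": "twitter"}}
preview = set_watch_mode(draft, True)       # record would-be denials only
enforced = set_watch_mode(preview, False)   # start rejecting requests
print(preview["params"], enforced["params"])
```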

Limiter syntax

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "${action}.${limiter_type}": ${threshold}
  },
  "tags": {
    "${tagName}": ${tagValue}
  },
  "priority":0,
  "params":{
      "watchMode":true
  }
}

Limiter parameters

Parameter	Description	Valid values
`action`	The operation type to throttle.	`write` (index or create a document), `update`, `delete`, `search` (query), `search_shards` (query the number of primary and replica shards for an index)
`limiter_type`	The throttling metric. See the limiter type reference below.	`rate`, `qps`, `tps`, `throughput`, `thread_count`, `concurrent_count`, `max_per_request`, `max_size_per_request`
`threshold`	The throttling threshold.	An integer >= -1. For `throughput`, a string with a unit (for example, `100MB`).
`tagName`	The resource type to match.	`node`, `is_master`, `index`, `shard`, `index_in_url`
`tagValue`	The value to match against the tag.	A string or array. Supports exact match (`"abc"`), fuzzy match (`"ab*"`), and all values (`"*"`). When set to `"**"`, a separate limiter is created for each matching resource.
`priority`	The limiter priority. A higher value takes precedence. When multiple default limiters match, the one with the highest priority applies.	Integer. Default: `0`.
`params.watchMode`	When `true`, records denied request counts in metrics without throttling. Use this to validate thresholds before enforcement.	`true` or `false`. Default: `false`.

Limiter type reference

`limiter_type`	Throttling metric	Applicable `action` values	Threshold unit
`rate`	Rate	All	Integer
`qps`	Queries per second	All	Integer
`tps`	Transactions per second	All	Integer
`throughput`	Data volume per second	`write`, `update`, `delete` only	GB, MB, or KB (max 2 GB)
`thread_count`	Concurrent threads (one thread per request by default)	All	Integer
`concurrent_count`	Concurrent threads (calculated per operation)	All	Integer
`max_per_request`	Max times an operation is allowed in a single request	All	Integer
`max_size_per_request`	Max operations in a single request	`write`, `update`, `delete` only	Integer

Tag name reference

`tagName`	Description
`node`	The name of the current node.
`is_master`	Whether the current node is a dedicated master node. `tagValue` is `true` or `false`.
`index`	The name of an index. Accepts an array for multiple indexes. Resolves index aliases to actual names. Applies only to IndicesRequest subrequests.
`shard`	The shard name in `index[id]` format (for example, `test[0]`). Applies only to ReplicationRequest subrequests.
`index_in_url`	The index name string in the URL. Resolves aliases. Applies only to IndicesRequest subrequests.
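
A limiter body can be checked against the three reference tables above before it is sent. The following Python sketch encodes only what those tables list. It is a client-side convenience with illustrative names, not a description of how the plug-in validates requests.

```python
ACTIONS = {"write", "update", "delete", "search", "search_shards"}
LIMITER_TYPES = {"rate", "qps", "tps", "throughput", "thread_count",
                 "concurrent_count", "max_per_request", "max_size_per_request"}
WRITE_ONLY_TYPES = {"throughput", "max_size_per_request"}  # per the type reference
WRITE_ACTIONS = {"write", "update", "delete"}
TAG_NAMES = {"node", "is_master", "index", "shard", "index_in_url"}

def check_limiter(body: dict) -> list:
    """Return the problems found in a limiter body (an empty list means none found)."""
    problems = []
    for key in body.get("limiters", {}):
        # Keys look like "<action>.<limiter_type>", for example "search.qps".
        action, _, limiter_type = key.partition(".")
        if action not in ACTIONS:
            problems.append(f"unknown action: {action}")
        if limiter_type not in LIMITER_TYPES:
            problems.append(f"unknown limiter_type: {limiter_type}")
        elif limiter_type in WRITE_ONLY_TYPES and action not in WRITE_ACTIONS:
            problems.append(f"{limiter_type} does not apply to {action}")
    for tag in body.get("tags", {}):
        if tag not in TAG_NAMES:
            problems.append(f"unknown tagName: {tag}")
    return problems

print(check_limiter({"limiters": {"search.throughput": "100MB"}, "tags": {"index": "twitter"}}))
```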

Limiter configuration examples

Throttle QPS for read requests

Limit the number of search requests a client node receives per second. Both exact index names and wildcard patterns are supported for tag values. For Elasticsearch V7.10, use the index tag key; for all other versions, use the index_patterns tag key.

Elasticsearch V7.10 — throttle a specific index:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search.qps": "1000"
  },
  "tags": {
    "index": "twitter"
  }
}

Elasticsearch V7.10 — throttle indexes with a name prefix:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search.qps": "1000"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

Elasticsearch V7.10 — throttle each index individually (per-index limit):

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search.qps": "1000"
  },
  "tags": {
    "index": "**"
  }
}

index:** applies the threshold to each index separately. For example, if the cluster has indexes A, B, and C, each is limited to 1,000 QPS independently.

Elasticsearch V7.10 — throttle total QPS across all indexes:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search.qps": "1000"
  },
  "tags": {
    "index": "*"
  }
}

index:* applies the threshold to the combined QPS of all indexes. You can also omit the tags field to achieve the same effect.
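
The difference between `**` and `*` is easiest to see with numbers. The Python function below is a toy model of the two behaviors described above, not the plug-in's algorithm. The per-index QPS figures are invented.

```python
def over_limit(observed_qps: dict, threshold: int, tag_value: str) -> list:
    """Return the indexes whose searches would exceed the limit under a simplified model."""
    if tag_value == "**":
        # One limiter per index: each index is compared with the threshold on its own.
        return sorted(name for name, qps in observed_qps.items() if qps > threshold)
    if tag_value == "*":
        # One shared limiter: the combined QPS is compared with the threshold.
        return sorted(observed_qps) if sum(observed_qps.values()) > threshold else []
    raise ValueError("this sketch only models '*' and '**'")

traffic = {"A": 600, "B": 300, "C": 1200}  # hypothetical per-index search QPS
print(over_limit(traffic, 1000, "**"))  # ['C']
print(over_limit(traffic, 1000, "*"))   # ['A', 'B', 'C']
```

With `**`, only index C exceeds its own 1,000 QPS budget. With `*`, the combined 2,100 QPS exceeds the shared budget, so searches against any index can be rejected.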

All other versions — throttle a specific index:

PUT _qos/_ratelimit/<limiterName>
{
  "search.index_patterns" : "twitter",
  "search.max_queries_per_sec" : 1000
}

All other versions — throttle indexes with a name prefix:

PUT _qos/_ratelimit/<limiterName>
{
  "search.index_patterns" : "nginx-log-*",
  "search.max_queries_per_sec" : 1000
}

All other versions — throttle total QPS across all indexes:

PUT _qos/_ratelimit/<limiterName>
{
  "search.index_patterns" : "*",
  "search.max_queries_per_sec" : 1000
}

Multiple rules can be active at the same time. A request is throttled if it matches any rule.

QPS throttling errors

If QPS exceeds the configured threshold, the system returns a 429 error. Reduce client-side QPS to resolve this error.

For Elasticsearch V7.10:

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "search blocked, limited by [<limiterName>][search.qps](<limiterId>) threshold:[x]"
      }
    ],
    "type": "status_exception",
    "reason": "search blocked, limited by [<limiterName>][search.qps](<limiterId>) threshold:[x]"
  },
  "status": 429
}

For all other versions:

{
  "error": {
    "root_cause": [
      {
        "type": "rate_limited_exception",
        "reason": "request indices:data/read/search rejected, limited by [l1:t*:1.0]"
      }
    ],
    "type": "rate_limited_exception",
    "reason": "request indices:data/read/search rejected, limited by [l1:t*:1.0]"
  },
  "status": 429
}
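
One common way to reduce client-side QPS after a 429 response is to retry with exponential backoff. This topic does not prescribe a retry strategy, so treat the following Python sketch as an assumption. `send` is a hypothetical callable that performs the search and returns the HTTP status with the parsed body.

```python
import time
from typing import Callable, Tuple

def send_with_backoff(send: Callable[[], Tuple[int, dict]], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call `send` until it returns a status other than 429 or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # wait 0.5 s, 1 s, 2 s, ...
        status, body = send()
    return status, body

# Stand-in sender: throttled twice, then accepted. Sleeping is skipped in this demo.
replies = iter([(429, {}), (429, {}), (200, {"hits": {}})])
print(send_with_backoff(lambda: next(replies), sleep=lambda seconds: None))
```

If the last attempt is still throttled, the function returns the 429 response so the caller can decide whether to drop or queue the request.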

Throttle TPS for write requests

Limit the number of write requests a client node receives per second. Both exact index names and wildcard patterns are supported for tag values. On Elasticsearch V7.10, use the index tag key.

Elasticsearch V7.10:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "write.tps": "100000"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

TPS throttling is not supported for other Elasticsearch versions.

Throttle bulk write throughput (total per second)

Limit the total bytes written per second across all bulk requests to a client node. For more information about bulk requests, see Bulk API. Both exact index names and wildcard patterns are supported for tag values. For Elasticsearch V7.10, use the index tag key; for all other versions, use the index_patterns tag key.

Elasticsearch V7.10:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "write.throughput": "100MB"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

All other versions:

PUT _qos/_ratelimit/<limiterName>
{
  "bulk.index_patterns": "nginx-log-*",
  "bulk.max_throughput_in_bytes" : 104857600
}
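
The two examples above express the same limit in different units: `100MB` on V7.10 and 104857600 bytes on other versions. The Python sketch below converts between them, assuming 1024-based units (which is what makes the two examples agree). The function name is illustrative.

```python
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def throughput_to_bytes(value: str) -> int:
    """Convert a threshold such as '100MB' to bytes, assuming 1024-based units."""
    number, unit = value[:-2], value[-2:].upper()
    if unit not in UNITS or not number.isdigit():
        raise ValueError(f"expected an integer followed by KB, MB, or GB: {value!r}")
    return int(number) * UNITS[unit]

print(throughput_to_bytes("100MB"))  # 104857600
```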

Multiple rules can be active at the same time. A request is throttled if it matches any rule.

Throttle bulk write size per request

Limit the maximum bytes written in a single bulk request to a client node. For more information about bulk requests, see Bulk API. Both exact index names and wildcard patterns are supported for tag values. For Elasticsearch V7.10, use the index tag key; for all other versions, use the index_patterns tag key.

Elasticsearch V7.10:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "write.max_size_per_request": "1000"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

All other versions:

PUT _qos/_ratelimit/<limiterName>
{
  "bulk.index_patterns": "nginx-log-*",
  "bulk.max_request_size_in_bytes" : 1000
}

Multiple rules can be active at the same time. A request is throttled if it matches any rule.

Bulk write size throttling errors

If a single bulk request exceeds the configured size threshold, the system rejects it.

For Elasticsearch V7.10 (HTTP 400):

{
  "error" : {
    "root_cause" : [
      {
        "type" : "status_exception",
        "reason" : "write_size blocked, limited by [<limiterName>][write.max_size_per_request](<limiterId>) threshold:[x] try acquire [x]"
      }
    ],
    "type" : "status_exception",
    "reason" : "write_size blocked, limited by [<limiterName>][write.max_size_per_request](<limiterId>) threshold:[x] try acquire [x]"
  },
  "status" : 400
}

For all other versions (HTTP 413):

{
  "error": {
    "root_cause": [
      {
        "type": "rate_limited_exception",
        "reason": "request indices:data/write/bulk rejected, limited by [b2:ByteSizePreSeconds:992.0]"
      }
    ],
    "type": "rate_limited_exception",
    "reason": "request indices:data/write/bulk rejected, limited by [b2:ByteSizePreSeconds:992.0]"
  },
  "status": 413
}

Reduce the bytes in each bulk request to resolve this error.
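
One way to keep each bulk request under the limit is to split the documents into several smaller requests on the client. The Python sketch below assumes the limit is counted in bytes of the request body, as described above, and that every operation is an action line followed by a source line (as for index and create actions in the Bulk API). The function name and sample documents are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def split_bulk(pairs: Iterable[Tuple[str, str]], max_bytes: int) -> Iterator[str]:
    """Yield NDJSON bulk bodies, each at most `max_bytes` when UTF-8 encoded.

    `pairs` holds (action line, source line) tuples without trailing newlines.
    """
    chunk, size = [], 0
    for action, source in pairs:
        entry = f"{action}\n{source}\n"
        entry_size = len(entry.encode("utf-8"))
        if entry_size > max_bytes:
            raise ValueError("a single document exceeds the size limit")
        if size + entry_size > max_bytes:
            yield "".join(chunk)  # flush the current request before it overflows
            chunk, size = [], 0
        chunk.append(entry)
        size += entry_size
    if chunk:
        yield "".join(chunk)

docs = [('{"index":{"_index":"nginx-log-1"}}', '{"msg":"a"}')] * 3
for body in split_bulk(docs, max_bytes=100):
    print(len(body.encode("utf-8")), "bytes")
```

A document that is larger than the limit on its own cannot be fixed by splitting, so the sketch raises an error for it rather than sending a request that is certain to be rejected.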

Throttle concurrent shard queries

To reduce cluster load, limit the number of concurrent threads that serve search_shards requests (queries for the number of primary and replica shards of an index). Both exact index names and wildcard patterns are supported for tag values. On Elasticsearch V7.10, use the index tag key.

Elasticsearch V7.10:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search_shards.concurrent_count": "10"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

This configuration is not supported for other Elasticsearch versions.

Configure multiple throttle rules in one limiter

Apply multiple throttling rules to a single limiter in one request.

Elasticsearch V7.10:

PUT /_qos/limiter/<limiterName>
{
  "limiters": {
    "search.qps": "1000",
    "write.tps": "100000",
    "write.throughput": "1000000",
    "write.max_size_per_request": "1000",
    "search_shards.concurrent_count": "10"
  },
  "tags": {
    "index": "nginx-log-*"
  }
}

Multiple rules can be active at the same time. A request is throttled if it matches any rule.

Query limiters

Operation	Elasticsearch V7.10	All other versions
Query all limiters	`GET _qos/limiter`	`GET _qos/_ratelimit`
Query a specific limiter	`GET _qos/limiter/<limiterName>`	`GET _qos/_ratelimit/<limiterName>`
Query multiple limiters	`GET _qos/limiter/<limiterName1>,<limiterName2>`	`GET _qos/_ratelimit/<limiterName1>,<limiterName2>`

Separate multiple limiter names with commas. Wildcards are not supported.

Delete limiters

Operation	Elasticsearch V7.10	All other versions
Delete a specific limiter	`DELETE _qos/limiter/<limiterName>`	`DELETE _qos/_ratelimit/<limiterName>`
Delete multiple limiters	`DELETE _qos/limiter/<limiterName1>,<limiterName2>`	`DELETE _qos/_ratelimit/<limiterName1>,<limiterName2>`

Separate multiple limiter names with commas. Wildcards are not supported.
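
When limiter names come from a script, the comma-separated path can be built and checked in one place. The Python helper below is a sketch with a hypothetical name. It only builds the path used with GET or DELETE and refuses wildcards, in line with the note above.

```python
def limiter_path(names, is_v710: bool = True) -> str:
    """Build the path used to query or delete one or more limiters by name."""
    base = "_qos/limiter" if is_v710 else "_qos/_ratelimit"
    names = [names] if isinstance(names, str) else list(names)
    if not names:
        raise ValueError("at least one limiter name is required")
    for name in names:
        if "*" in name or "," in name:
            raise ValueError(f"wildcards and commas are not allowed in a name: {name!r}")
    return f"{base}/{','.join(names)}"

print(limiter_path("l1"))                          # _qos/limiter/l1
print(limiter_path(["l1", "l2"], is_v710=False))   # _qos/_ratelimit/l1,l2
```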

Monitor throttling metrics

Use the following APIs to retrieve throttling data. Create limiters in watch mode first, and review these metrics to validate thresholds before you enforce throttling.

Current metrics:

# All nodes
GET /_qos/limiter/nodes/stats

# A specific node
GET /_qos/limiter/nodes/{nodeId}/stats

# A specific node and limiter
GET /_qos/limiter/nodes/{nodeId}/stats/{limiterIds}

Historical metrics:

# All limiters
GET /_qos/limiter/metric

# A specific limiter
GET /_qos/limiter/metric/{limiterId}

FAQ

How do I safely test a throttling configuration before enforcing it?

Set watchMode to true in the limiter params. The plug-in records denied request counts in metrics without rejecting requests. Use the monitoring APIs to review the data, then set watchMode to false to enforce throttling.