If the disk usage of your Alibaba Cloud Elasticsearch cluster exceeds 85%, the cluster or Kibana may become unavailable. This topic describes how to resolve this issue.

Important Disclaimer: This topic may contain information about third-party products. Such information is only for reference. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

Problem description

  • After the system receives an index request, it returns an error message that indicates that the index is read-only, such as [FORBIDDEN/12/index read-only / allow delete (api)];
  • The cluster is in the red state. In severe cases, some nodes fail to join the cluster. You can run the GET _cat/nodes command to view the nodes in the cluster. In addition, some shards are not allocated to nodes. You can run the GET _cat/allocation?v command to view the allocation of shards.
    Note: If a cluster is in the red state, at least one primary shard is unavailable, and data on the cluster may be lost.
  • When you create a pipeline or enroll a Beat in the Kibana console, an internal server error message is returned.
  • The Cluster Monitoring page of the cluster or the Monitoring page in the Kibana console of the cluster shows that the disk usage recently reached 100%.
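
To illustrate the first symptom, the following Python sketch checks whether a response body carries the read-only block message. The function name is_read_only_block and the sample response shape used to exercise it are assumptions made for this example. They are not part of any Alibaba Cloud or Elasticsearch client API.

```python
import json

# Text fragment taken from the error message quoted in this topic.
READ_ONLY_MARKER = "index read-only / allow delete"


def is_read_only_block(response_body: str) -> bool:
    """Return True if the body reports the read-only / allow-delete block.

    Hypothetical helper: it assumes the error text appears either in the
    "error.reason" field of a JSON body or directly in a plain-text body.
    """
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        # Not JSON: fall back to a plain substring check.
        return READ_ONLY_MARKER in response_body

    error = payload.get("error", {}) if isinstance(payload, dict) else {}
    if isinstance(error, dict):
        reason = str(error.get("reason") or "")
    else:
        reason = str(error)
    return READ_ONLY_MARKER in reason
```

A caller could use this check to decide whether a failed write should be retried only after disk space is freed, rather than retried immediately.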

Cause

The preceding issues are caused by high disk usage. The disk usage of nodes has the following thresholds:

  • 85%: If the disk usage of a node exceeds 85%, the system no longer allocates new shards to the node.
  • 90%: If the disk usage of a node exceeds 90%, the system migrates the shards on the node to other data nodes with low disk usage.
  • 95%: If the disk usage of a node exceeds 95%, the system forcibly adds the read_only_allow_delete attribute to every index that has at least one shard on the node. As a result, data cannot be written to these indexes, and you can only read data from the indexes or delete the indexes.
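
The three thresholds above can be summarized in a short Python sketch. The function name disk_stage and the returned labels are hypothetical and chosen only for this example. The percentages are the ones listed in this topic, and a cluster with customized thresholds would need different values.

```python
def disk_stage(used_percent: float) -> str:
    """Map a node's disk usage (0-100) to the threshold it has exceeded.

    Labels are illustrative names, not values returned by Elasticsearch.
    """
    if used_percent > 95:
        # Affected indexes become read-only (delete is still allowed).
        return "read_only"
    if used_percent > 90:
        # Shards are migrated to data nodes with lower disk usage.
        return "migrate_shards"
    if used_percent > 85:
        # No new shards are allocated to the node.
        return "no_new_shards"
    return "ok"


if __name__ == "__main__":
    for usage in (70, 86, 92, 97):
        print(usage, disk_stage(usage))
```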

Solution

  1. Run the following command to delete data:
    Warning: Deleted data cannot be restored. Proceed with caution. To retain the data instead, you must resize the disks. For more information, see Upgrade the configuration of a cluster.
    curl -u <username>:<password> -XDELETE http://<host>:<port>/<index-name>
    • Set <host> to the internal or public endpoint of the cluster. We recommend that you configure the related whitelist before you run this command.
    • If the cluster does not respond after you run the preceding command, we recommend that you trigger a forced restart and run the command again while the cluster restarts.
  2. Check whether indexes are still read-only. If they are, run the following command to set the index.blocks.read_only_allow_delete attribute to null for all indexes so that no index in the cluster remains read-only:
    PUT _settings
    {
       "index.blocks.read_only_allow_delete": null
    }
  3. Check whether the cluster is still in the red state. If it is, run the GET _cat/allocation?v command to check whether the cluster contains shards that are not allocated.
  4. If the cluster contains shards that are not allocated, run the GET _cluster/allocation/explain command to view the reason. If the reason is similar to that shown in the following figure, run the POST /_cluster/reroute?retry_failed=true command.
    Figure: Reason why shards are not allocated
  5. After shards are allocated, view the cluster status. If the cluster is still in the red state, contact Alibaba Cloud technical support engineers.
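
The requests used in steps 1, 2, and 4 can also be sent from a script. The following Python sketch builds them with the standard library only. All function names are hypothetical, HTTP basic authentication is assumed to match the curl -u option used in step 1, and the endpoint and credentials must be replaced with those of your cluster. It is a minimal illustration under these assumptions, not an official client.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, method: str, path: str,
                  username: str, password: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request (basic authentication is an assumption)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def delete_index_request(base_url, index_name, username, password):
    """Step 1: delete an index. Deleted data cannot be restored."""
    return build_request(base_url, "DELETE", index_name, username, password)


def clear_read_only_request(base_url, username, password):
    """Step 2: set index.blocks.read_only_allow_delete to null for all indexes."""
    body = {"index.blocks.read_only_allow_delete": None}  # None serializes to null
    return build_request(base_url, "PUT", "_settings", username, password, body)


def retry_failed_request(base_url, username, password):
    """Step 4: retry the allocation of shards whose allocation failed."""
    return build_request(base_url, "POST", "_cluster/reroute?retry_failed=true",
                         username, password)


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON response. Raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode())
```

For example, send(clear_read_only_request("http://<host>:<port>", "<username>", "<password>")) would perform step 2. Building the request separately from sending it makes it possible to inspect the method, URL, and body before anything is changed on the cluster.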

Additional information

To prevent high disk usage from affecting your Alibaba Cloud Elasticsearch cluster, we recommend that you enable disk usage monitoring and alerting. In addition, check alert text messages promptly and take appropriate measures before the disk usage reaches a threshold. For more information, see Configure cluster alerting.
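
As a supplement to alerting, disk usage can also be checked from the output of the allocation command used earlier in this topic. The following Python sketch assumes that the command is run as GET _cat/allocation?format=json and that each row contains the node and disk.percent fields. The function name nodes_over_threshold and the default threshold of 85 are choices made for this example. It is not a replacement for the alerting feature.

```python
import json
from typing import List, Tuple


def nodes_over_threshold(allocation_json: str,
                         threshold: float = 85.0) -> List[Tuple[str, float]]:
    """Return (node, disk usage percent) pairs at or above the threshold.

    Assumes rows shaped like the JSON output of _cat/allocation, where
    numeric values may be strings and the row for unassigned shards has
    no disk usage value.
    """
    flagged = []
    for row in json.loads(allocation_json):
        percent = row.get("disk.percent")
        if percent is None:
            continue  # Skip rows without disk usage, such as unassigned shards.
        value = float(percent)
        if value >= threshold:
            flagged.append((row.get("node"), value))
    # Highest usage first, so the most urgent node is listed at the top.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```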