This topic describes how to troubleshoot an Alibaba Cloud Elasticsearch cluster or Kibana console that stops responding because disk usage exceeds 85% or even reaches 100%.

Notice This topic includes information about third-party products. The information is for reference only. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

Problem description

  • An index request returns an error message indicating that the index is read-only, such as [FORBIDDEN/12/index read-only / allow delete (api)];.
  • The cluster is in the Unhealthy or Paused state (indicated by the color red). In severe cases, some nodes fail to join the cluster. You can run the GET _cat/nodes?v command to view the nodes in the cluster. In addition, some shards are not allocated to nodes. You can run the GET _cat/allocation?v command to view shard allocation.
    Note The red status indicates that at least one primary shard is unavailable, so data may have been lost.
  • The following error message is returned when you create a pipeline or enroll a Beat in the Kibana console: internal server error.
  • On the Cluster Monitoring page of the cluster or the Monitoring page in the Kibana console of the cluster, the disk usage has reached 100% over a recent period of time.
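
If the Kibana console does not respond, the same checks can be sent to the cluster with curl. The following is a minimal sketch, not an official procedure: ES_ENDPOINT, ES_USER, and ES_PASS are hypothetical variable names, and the _cat/health request is an addition that is not part of the symptom list above.

```shell
# Sketch: run the symptom checks from a terminal.
# ES_ENDPOINT (for example http://<host>:<port>), ES_USER, and ES_PASS are
# hypothetical variable names that you must set yourself.

# es_get PATH: send an authenticated GET request to the cluster.
es_get() {
  curl -sS -u "$ES_USER:$ES_PASS" "$ES_ENDPOINT/${1#/}"
}

check_symptoms() {
  es_get '_cat/health?h=status'   # prints green, yellow, or red
  es_get '_cat/nodes?v'           # nodes that are currently in the cluster
  es_get '_cat/allocation?v'      # shard count and disk usage for each node
}

# Usage:
#   ES_ENDPOINT=http://<host>:<port> ES_USER=<username> ES_PASS=<password>
#   check_symptoms
```

A node that is missing from the _cat/nodes output, or an UNASSIGNED row in the _cat/allocation output, corresponds to the symptoms described above.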

Cause

The disk usage of nodes has three thresholds:

  • 85%: If the disk usage of a node exceeds 85%, the system no longer allocates new shards to the node.
  • 90%: If the disk usage of a node exceeds 90%, the system attempts to relocate shards from the node to other data nodes with low disk usage.
  • 95%: If the disk usage of a node exceeds 95%, the system adds the read_only_allow_delete attribute to every index that has at least one shard on the node. As a result, data cannot be written to these indexes, and you can only read or delete data from them.
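
The thresholds above can be checked against live data. The following is a minimal sketch that labels each node by the threshold it has crossed. It assumes integer percentages as printed by the disk.percent column of _cat/allocation; the function names and the labels are illustrative only.

```shell
# Sketch: map disk usage to the thresholds listed above.
# The function names and labels are hypothetical; 85, 90, and 95 are the
# thresholds from this topic.

# disk_stage PERCENT: print the threshold that an integer percentage exceeds.
disk_stage() {
  if   [ "$1" -gt 95 ]; then echo "read-only"      # indexes on the node are blocked for writes
  elif [ "$1" -gt 90 ]; then echo "relocating"     # shards are moved to other nodes
  elif [ "$1" -gt 85 ]; then echo "no-new-shards"  # no new shards are allocated to the node
  else                       echo "ok"
  fi
}

# classify_nodes: read "node disk.percent" pairs from standard input.
classify_nodes() {
  while read -r node pct; do
    # Skip rows without a numeric percentage, such as the UNASSIGNED row.
    case "$pct" in ''|*[!0-9]*) continue ;; esac
    printf '%s %s%% %s\n' "$node" "$pct" "$(disk_stage "$pct")"
  done
}

# Usage:
#   curl -sS -u <username>:<password> \
#     "http://<host>:<port>/_cat/allocation?h=node,disk.percent" | classify_nodes
```

The comparisons are strict because each threshold applies only when usage exceeds the value, so a node at exactly 85% is still reported as ok.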

Solution

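  1. Run the following command to delete indexes that you no longer need. Replace <index_name> with the name of the index that you want to delete:
    curl -u <username>:<password> -XDELETE http://<host>:<port>/<index_name>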
    Notice
    • If you want to retain the data, resize the disk. For more information, see Upgrade the configuration of a cluster.
    • Set <host> to the internal or public endpoint of the cluster. We recommend that you configure the related whitelist before you run this command.
    • If the cluster does not respond, we recommend that you trigger a forced restart and run this command while the cluster is restarting.
  2. Check whether any index is still read-only. If yes, run the following command to set the index.blocks.read_only_allow_delete attribute to null for all indexes:
    PUT _settings
    {
      "index.blocks.read_only_allow_delete": null
    }

    After the command is executed, no index in the cluster is read-only.

  3. Check whether the cluster is still in the Unhealthy or Paused state. If yes, run the GET _cat/allocation?v command to check whether the cluster contains shards that are not allocated.
  4. If the cluster contains shards that are not allocated, run the GET _cluster/allocation/explain command to view the reason. If the reason is similar to that shown in the following figure, run the POST /_cluster/reroute?retry_failed=true command.
    Figure: Reason why shards are not allocated
  5. After shards are allocated, view the cluster status. If the cluster is still in the Unhealthy or Paused state, contact Alibaba Cloud technical support engineers.
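
If the Kibana console is unavailable, steps 2 to 4 can also be sent with curl. The following is a minimal sketch under that assumption. ES_ENDPOINT, ES_USER, ES_PASS, and all function names are hypothetical; the request paths and the request body are the ones used in the steps above.

```shell
# Sketch: steps 2 to 4 as curl requests.
# ES_ENDPOINT (for example http://<host>:<port>), ES_USER, and ES_PASS are
# hypothetical variable names that you must set yourself.

# es_request METHOD PATH [JSON_BODY]: send an authenticated request.
es_request() {
  if [ -n "${3:-}" ]; then
    curl -sS -u "$ES_USER:$ES_PASS" -X "$1" "$ES_ENDPOINT/${2#/}" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -sS -u "$ES_USER:$ES_PASS" -X "$1" "$ES_ENDPOINT/${2#/}"
  fi
}

# Step 2: remove the read-only block from all indexes.
clear_read_only_block() {
  es_request PUT '_settings' '{"index.blocks.read_only_allow_delete": null}'
}

# Step 3: check whether shards are still not allocated.
show_allocation() {
  es_request GET '_cat/allocation?v'
}

# Step 4: view the reason, then retry allocations that failed.
explain_unassigned() {
  es_request GET '_cluster/allocation/explain'
}

retry_failed_allocation() {
  es_request POST '_cluster/reroute?retry_failed=true'
}
```

Call the functions in the order of the steps, and call retry_failed_allocation only after the output of explain_unassigned confirms the reason described in step 4.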

Additional information

To limit the impact of high disk usage on Alibaba Cloud Elasticsearch, we recommend that you enable disk usage monitoring and alerting. You then receive alert text messages promptly and can take protective measures in advance. For more information, see Configure the monitoring and alerting feature in CloudMonitor.