Alibaba Cloud Logstash is used and tuned in the same way as open source Logstash. A Logstash pipeline processes data in three stages: input, filter, and output. Each stage runs in independent worker threads. In the input stage, events are written to a central queue, which resides in memory by default but can instead be persisted to disk. Each pipeline worker thread then pulls a batch of events from the central queue, processes the events in the filter stage, and transfers the processed events to the destination in the output stage. This topic describes how to troubleshoot Logstash performance issues. The instructions provided in this topic are for reference only.
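The three stages map directly to the three sections of a pipeline configuration file. The following is a minimal sketch; the file path, Elasticsearch endpoint, and index name are placeholder values, not settings from this topic:

```
input {
  file {
    path => "/var/log/app/*.log"        # input stage: events enter the central queue
  }
}
filter {
  grok {                                # filter stage: worker threads process events in batches
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {                       # output stage: processed events are sent to the destination
    hosts => ["http://es-cn-example.elasticsearch.aliyuncs.com:9200"]
    index => "app-log-%{+YYYY.MM.dd}"
  }
}
```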

Enable monitoring

If you use an Alibaba Cloud Logstash cluster, we recommend that you configure monitoring and alerting for the cluster by using the following methods:
  • Configure a custom alert policy for the Logstash cluster in Alibaba Cloud CloudMonitor: This allows you to obtain monitoring data for the system metrics of the Logstash cluster. For more information, see Configure a custom alert policy.
  • Enable the X-Pack Monitoring feature: To enable this feature, the Alibaba Cloud Elasticsearch cluster that is associated with the Logstash cluster must reside in the same virtual private cloud (VPC) as the Logstash cluster. The feature provides monitoring data for metrics such as the CPU utilization and memory usage of the Logstash cluster, the rate at which the cluster receives events, and the rate at which it emits processed events. For more information, see Enable the X-Pack Monitoring feature.

Recommendations for debugging

  • During debugging, we recommend that you tune one parameter at a time and increase or decrease its value gradually. As you do so, observe how the data write rate at the destination and the data consumption rate at the source change.
  • If the system resources of the Logstash cluster are sufficient, tune the Pipeline Batch Size parameter first. When the write throughput of the destination reaches its upper limit, tune the Pipeline Workers parameter. If you write data to an Elasticsearch cluster, we recommend that you keep each bulk write request at around 5 MB. For more information, see Check the configurations of pipeline parameters.
  • During debugging, monitor the loads on the source, the Logstash cluster, and the destination, and make sure that sufficient heap memory is available to absorb sudden traffic bursts. For more information, see Check the performance of a Logstash cluster.
    Note Logstash processes events by using pipelines. The overall throughput is bounded by the production rate of the source and the consumption capability of the destination.
  • You can install the logstash-output-file_extend plug-in on the Logstash cluster. After a pipeline starts, you can then analyze how your business data is processed based on the debug logs. For more information, see Use the pipeline configuration debugging feature.
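To turn the ~5 MB bulk-request recommendation above into a concrete Pipeline Batch Size value, divide the target bulk size by your average event size. The following hypothetical helper (not part of Logstash) sketches the arithmetic; the 1 KB average event size in the example is an assumption that you must replace with a measured value for your own data:

```python
# Hypothetical sizing helper: estimate a Pipeline Batch Size value so that
# one Elasticsearch bulk request stays near the recommended ~5 MB.

TARGET_BULK_BYTES = 5 * 1024 * 1024  # ~5 MB per bulk request


def estimate_batch_size(avg_event_bytes: int) -> int:
    """Return the number of events whose combined size is roughly 5 MB."""
    return TARGET_BULK_BYTES // avg_event_bytes


# With an assumed average event size of ~1 KB, a batch of 5120 events
# yields a bulk request of about 5 MB.
print(estimate_batch_size(1024))  # → 5120
```

Treat the result as a starting point for tuning, not a final value: re-measure the actual bulk request size and adjust.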

Check the performance of a Logstash cluster

Note Upgrading the specifications of a Logstash cluster improves data throughput only if the cluster's resources are fully utilized. If the CPU or memory resources of the cluster are not fully utilized, increasing the specifications does not improve throughput.

CPU

You can view the CPU utilization of the Logstash cluster in the monitoring data. If the CPU utilization is excessively high, also check the heap memory usage of the cluster, because frequent garbage collection caused by heap pressure can drive up CPU usage.

Heap memory

  • In most cases, a heap memory size of 4 GB to 8 GB is sufficient for a Logstash cluster. If your workload would require a larger heap, scale out the Logstash cluster instead of further increasing the heap size. Before you use the cluster in a production environment, we recommend that you test your Logstash cluster and configure a heap memory size based on your business requirements.
  • An undersized heap causes persistently high heap usage and frequent garbage collection (GC), which increases CPU utilization. As an experiment, you can double the heap memory size of the Logstash cluster and check whether its performance improves.
  • In line with the best practices of open source Logstash, we recommend that you set the initial heap size (Xms) and the maximum heap size (Xmx) to the same value.
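For example, in the jvm.options file of open source Logstash, a 4 GB heap with equal initial and maximum sizes looks like the following. The 4 GB figure is illustrative; choose a value in the 4 GB to 8 GB range based on your own tests:

```
# jvm.options (illustrative values)
-Xms4g    # initial heap size
-Xmx4g    # maximum heap size; keep equal to -Xms
```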

Check the configurations of pipeline parameters

  • Pipeline Workers: This parameter specifies the number of worker threads that run the filter and output stages. By default, it is set to the number of vCPUs on a node in the Logstash cluster. If the CPU utilization of your Logstash cluster is not yet saturated, you can increase the value of this parameter to improve event processing performance.
  • Pipeline Batch Size: This parameter specifies the number of events that each worker thread collects before it runs the filter and output stages. You can increase the value of this parameter to improve the efficiency of event processing, at the cost of higher memory usage. For the Elasticsearch output, one batch is sent as one bulk request.
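In open source Logstash, the two parameters correspond to the following settings in logstash.yml. The values shown are illustrative starting points, not recommendations for every workload:

```
# logstash.yml (illustrative values)
pipeline.workers: 8         # defaults to the number of vCPUs; raise only if CPU is not saturated
pipeline.batch.size: 2000   # events per worker per batch; one batch ≈ one bulk request to Elasticsearch
pipeline.batch.delay: 50    # milliseconds to wait for a batch to fill before flushing (default: 50)
```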

Common scenario

Kafka

How do I improve the speed at which Logstash consumes messages from Kafka when messages accumulate in Kafka?

You can use one or more of the following methods, separately or together, to improve the speed at which Logstash consumes messages from Kafka. For more information, see Tips and Best Practices in the documentation for open source Logstash.
  • In scenarios with large data volumes, set the number of partitions in a Kafka topic to the number of nodes in the Logstash cluster multiplied by the number of consumer threads on each node, so that every consumer thread is assigned a partition.
    Note More partitions require more overheads. We recommend that you configure partitions based on your business requirements.
  • You can configure the same group ID in the Kafka inputs of multiple pipelines in the Logstash cluster so that they form one consumer group. Kafka distributes the partitions of a topic among the consumers in the group, which spreads the load across multiple physical servers and improves consumption capability.
  • You can increase the value of the Pipeline Workers parameter and the value of the Pipeline Batch Size parameter.
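The partition and consumer-group methods above come together in the Kafka input configuration. In the following sketch, the bootstrap servers, topic, and group ID are placeholder values; if two Logstash nodes each run this pipeline with consumer_threads => 4, the topic needs 2 × 4 = 8 partitions for every consumer thread to own a partition:

```
input {
  kafka {
    bootstrap_servers => "kafka-host1:9092,kafka-host2:9092"  # placeholder brokers
    topics            => ["app-log"]                          # placeholder topic
    group_id          => "logstash-app-log"  # same group ID on every node and pipeline
    consumer_threads  => 4                   # threads per node; nodes × threads = partitions
  }
}
```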

References

Documentation for performance tuning of open source Logstash