Troubleshoot issues that occur when you use Logstash to transfer data - Elasticsearch

When you use an Alibaba Cloud Logstash cluster to transfer data to an Alibaba Cloud Elasticsearch cluster that is specified as the output of a Logstash pipeline, you may encounter the following issues: the network connection fails, the pipeline configurations are invalid, the load on the source, the Logstash pipeline, or the Elasticsearch cluster is high, the pipeline starts but no data is written to the Elasticsearch cluster, or all services run normally but data cannot be queried from the source or written to the destination as expected. This topic describes solutions to these issues.

Failed network connection

Check item	Sample scenario for the issue	Recommended solution
Check whether your Logstash cluster resides in the same network environment as the destination Elasticsearch cluster and the source. Note Alibaba Cloud Logstash clusters and Elasticsearch clusters are deployed in virtual private clouds (VPCs). We recommend that you deploy the Logstash cluster and the Elasticsearch cluster in the same VPC.	The source resides on the Internet, while the Logstash cluster resides in a VPC.	Use one of the following solutions: Use a network connection tool to connect the Logstash cluster to the source and the Elasticsearch cluster. Configure a NAT gateway to transmit data over the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet. Create a Logstash cluster and an Elasticsearch cluster that reside in the same VPC and configure a Logstash pipeline.
Check whether the NAT gateway is correctly configured.	The IP address or port number specified for a NAT entry is incorrect. The type of the NAT gateway is not suitable for the scenario.	Use one of the following solutions based on your business requirements: Check the IP address and port number specified for the NAT entry to ensure the network connection. Select an IP address translation type based on your business requirements: Source Network Address Translation (SNAT): allows Logstash to access the Internet. Destination Network Address Translation (DNAT): allows services on the Internet to transfer data to nodes in a Logstash cluster.
Check whether the uploaded Java Database Connectivity (JDBC) driver is valid.	When a JDBC driver is used to synchronize data from PolarDB, no error is reported in the log but data cannot be written to the destination Elasticsearch cluster. After a JDBC driver of an earlier version is used, data can be written.	Use a JDBC driver of an appropriate version. For more information, see Configure third-party libraries.
Check whether the whitelist or security group rules of the source block access between the source and the Logstash cluster.	Filebeat is used to collect data from the source and send the data to the Logstash cluster. Filebeat is deployed on client-side ECS instances, but the required ports of the ECS instances are not opened in the security group.	Use one of the following solutions based on your business requirements: Add the IP addresses of the nodes in the Logstash cluster to the whitelist of the source. Note For more information about how to obtain the IP addresses of the nodes in a Logstash cluster, see View the basic information of a cluster. Open the required port of the ECS instance in the security group to allow access to the Logstash cluster. For more information, see Add a security group rule.
Check whether the RAM users specified in the input and output configurations of the Logstash pipeline have the required permissions.	You specify a RAM user in the output configurations of a Logstash pipeline for the access to the Elasticsearch cluster. However, the RAM user does not have the required permissions on the destination index in the Elasticsearch cluster. The error code 401 is reported in the cluster logs of the Logstash cluster.	Use one of the following solutions based on your business requirements: Grant the required permissions to the RAM users. For more information, see Grant permissions to a RAM user. Specify valid usernames and passwords for the Elasticsearch cluster and the source. The passwords cannot contain special characters. If a password contains special characters, change the password. For more information, see Reset the access password for an Elasticsearch cluster.
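
The reachability checks in the preceding table can be approximated with a plain TCP probe. The following Python sketch is illustrative only. It assumes that you run it on a machine in the same VPC as the Logstash cluster, such as an ECS instance, because that machine sees roughly the same network path. The function name `can_connect` and the endpoints in the usage note are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("es-cn-example.elasticsearch.aliyuncs.com", 9200)` probes a placeholder Elasticsearch endpoint. A result of `False` points to a VPC, NAT, whitelist, or security group problem rather than a pipeline configuration problem. A result of `True` only shows that the port is open. It does not verify credentials or permissions.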

Invalid pipeline configurations

Check item	Sample scenario for the issue	Recommended solution
Query the cluster logs of the Logstash cluster and check whether errors are reported in the logs. For information about how to query the cluster logs of a Logstash cluster, see Query logs.	A required plug-in is not installed for the Logstash cluster. For example, if the cluster logs contain the error message `Couldn't find any output plugin named 'file_extend'`, the logstash-output-file_extend plug-in is not installed for the Logstash cluster.	Use one of the following solutions: Install the plug-in for the Logstash cluster. Delete the configuration information of the plug-in in the pipeline configurations.
	Configuration information contains hidden special characters.	Manually enter the configuration information.
	The code in the filter part fails to run. For example, the filter part contains invalid Ruby code.	Use one of the following solutions: Reduce the filter part to its original minimal configurations, and then add configurations back step by step to find the root cause and troubleshoot the issue. Use a third-party debugging tool to debug the configurations in the filter part.
	The parameter names or parameter values that you configure in pipeline configurations are invalid. For example, you enter the parameter name hosts as host when you configure the logstash-output-elasticsearch plug-in, or you enter an invalid RDS instance name.	For information about how to configure pipeline settings, see the open source Logstash documentation or Best practices for Alibaba Cloud Logstash.
	The connection times out when the Logstash cluster connects to the source or the Elasticsearch cluster. For example, if the Logstash cluster cannot connect to the Elasticsearch cluster for a long period of time, the error message `Elasticsearch Unreachable: [http://xxxx:9200/][Manticore::ConnectTimeout] connect timed out` appears.	Make sure that the Logstash cluster can connect to the Elasticsearch cluster and the endpoints of the source and the Elasticsearch cluster are correct.
	HTTPS is enabled for the Elasticsearch cluster, but you specify http when you configure the Logstash pipeline.	Modify the configurations of the pipeline to make sure that the Logstash pipeline, the source, and the Elasticsearch cluster use the same protocol.
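
Hidden special characters usually enter a configuration when it is copied from a web page or a document. As a rough aid, the following Python sketch scans a configuration text and reports characters that are invisible or that only look like ordinary spaces. The function name `find_hidden_characters` and the choice of Unicode categories are assumptions made for illustration. This is not a Logstash feature.

```python
import unicodedata


def find_hidden_characters(config_text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each suspicious character."""
    findings = []
    for line_no, line in enumerate(config_text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue  # tabs are ordinary indentation
            category = unicodedata.category(ch)
            # Cc: control characters; Cf: format characters such as the
            # zero-width space or byte order mark; Zs: space separators,
            # of which only the ASCII space is expected in a config file.
            if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                findings.append((line_no, col, name))
    return findings
```

An empty result means that none of these characters were found. If the function reports an entry such as `NO-BREAK SPACE`, retype the affected line manually.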

Abnormal load

Check item	Sample scenario for the issue	Recommended solution
Check whether the disk usage of nodes is excessively high. For more information, see Cluster monitoring overview.	One of the following situations occurs: In pipeline configurations, the Queue Type parameter is set to PERSISTED. In this case, queued data is persistently stored on a disk, and the disk space is exhausted as the amount of data increases. `stdout{}` is configured in the output part of the pipeline configurations.	Use one of the following solutions based on your business requirements: Set the Queue Type parameter to MEMORY, which is the default queue type. For information about how to change the queue type, see Use configuration files to manage pipelines. Important Alibaba Cloud Logstash does not provide an entry point for you to clear a disk. If your disk space is exhausted, you must contact Alibaba Cloud technical personnel to clear the disk at the backend. Delete `stdout{}` from the output part of the pipeline configurations. Important Do not configure `stdout{}` in the output part of a pipeline. Otherwise, the disk usage may become excessively high.
Check whether an out of memory (OOM) error is reported for the memory usage of nodes. For more information, see Cluster monitoring overview.	An OOM error is reported for the memory usage of nodes. As a result, nodes fail to start.	Restart the nodes in the Elasticsearch cluster.
Check whether the load of the source or the Elasticsearch cluster is normal.	The Elasticsearch cluster is in an unhealthy state. As a result, data cannot be written to the cluster.	Pause the write operation and recover the Elasticsearch cluster to a normal state. We recommend that you scale out the cluster.
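
To decide whether to pause write operations, you can inspect the response of the open source Elasticsearch `GET _cluster/health` API. The following Python sketch interprets the `status` field of that response. The function name `should_pause_writes` and the policy of pausing only when the status is red or unknown are assumptions made for illustration. Adjust the policy to your own requirements.

```python
import json


def should_pause_writes(health: dict) -> tuple[bool, str]:
    """Map a _cluster/health response to a pause decision and a reason."""
    status = health.get("status")
    if status == "green":
        return False, "all primary and replica shards are assigned"
    if status == "yellow":
        # All primary shards are assigned, so writes still succeed.
        return False, "some replica shards are unassigned; monitor the load"
    if status == "red":
        # At least one primary shard is unassigned, so some writes fail.
        return True, "a primary shard is unassigned; recover the cluster first"
    return True, "status is missing or unknown; treat the cluster as unhealthy"


# Sample response body, shortened to the fields used in this sketch.
sample = json.loads('{"status": "red", "unassigned_shards": 4}')
pause, reason = should_pause_writes(sample)
```

In this sample, `pause` is `True` because the status is red.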

No data written to the Elasticsearch cluster after the pipeline is started

Check item	Sample scenario for the issue	Recommended solution
Enable the pipeline configuration debugging feature for the Logstash cluster and view the debug logs of the cluster to check whether data is transferred to the Logstash cluster. To enable the pipeline configuration debugging feature, you must install the logstash-output-file_extend plug-in for the Logstash cluster. For information about how to use the pipeline configuration debugging feature, see Use the pipeline configuration debugging feature. If no data is transferred to the Logstash cluster, check whether the configuration information of the source is valid. If data is transferred to the Logstash cluster, check whether the configuration information of the Elasticsearch cluster is valid.	If no data is transferred to the Logstash cluster, the following situations may occur: The configuration information of the source contains the AccessKey pair of an Alibaba Cloud account, but the AccessKey pair is invalid. No data is written to the source in real time. For example, Filebeat is used to collect data from a file of the source in real time, but no data is stored to the file in real time. In this case, Filebeat cannot collect real-time data from the source or transfer data to the Logstash cluster.	Use one of the following solutions based on your business requirements: Check the configuration information of the source and modify invalid information. If a real-time streaming data plug-in is installed for the Logstash cluster, make sure that data is written to the source in real time.
	If data is transferred to the Logstash cluster, the following situations may occur: The Auto Indexing feature is disabled for the Elasticsearch cluster. The write operation is not allowed for the Elasticsearch cluster. For example, data is not allowed to be written to the destination index of the Elasticsearch cluster.	Use one of the following solutions based on your business requirements: Enable the Auto Indexing feature for the Elasticsearch cluster. Make sure that the write operation is allowed for the Elasticsearch cluster.
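
The checks in this section follow a fixed order, which the following Python sketch expresses as a small triage function. It assumes that the Auto Indexing feature corresponds to the open source Elasticsearch setting `action.auto_create_index`, and that write restrictions appear as `index.blocks.*` settings in the flat index settings of the destination index. The function name `diagnose_no_data` and its parameters are hypothetical.

```python
# Open source Elasticsearch index settings that reject write requests.
WRITE_BLOCKS = (
    "index.blocks.write",
    "index.blocks.read_only",
    "index.blocks.read_only_allow_delete",
)


def diagnose_no_data(events_reached_logstash: bool,
                     auto_create_index: bool,
                     index_exists: bool,
                     flat_index_settings: dict) -> str:
    """Return the next thing to check when a started pipeline writes no data."""
    if not events_reached_logstash:
        # The debug logs show no events, so the problem is on the input side.
        return "check the source configuration and whether the source receives data"
    if not index_exists and not auto_create_index:
        return "enable Auto Indexing or create the destination index"
    active = [key for key in WRITE_BLOCKS
              if str(flat_index_settings.get(key, "false")).lower() == "true"]
    if active:
        return "remove write block: " + ", ".join(active)
    return "no known cause found; check the cluster logs"
```

For example, `diagnose_no_data(True, True, True, {"index.blocks.write": "true"})` returns `"remove write block: index.blocks.write"`.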

Abnormal data queries from the source or abnormal data writes to the destination while all services run normally

Check item	Sample scenario for the issue	Recommended solution
Perform the following operations based on the pipeline configuration scenario and the attributes of the pipeline plug-ins: Check whether the JDBC query statement that you use is correct. If the logstash-input-elasticsearch plug-in is used, check whether data is written to the source in real time.	If you use JDBC, the following situations may occur: The query returns no data. The values of identifier fields, such as the time field and ID, do not increase incrementally. The time zone of the JDBC driver is different from that of the Elasticsearch cluster.	Use one of the following solutions based on your business requirements: Debug the query statement in the source. Check whether the data types of the identifier fields are the same as the data types recommended by open source Elasticsearch. We recommend that you use the numeric or timestamp type. Check whether the time zone of the JDBC driver is the same as that of the Elasticsearch cluster. If the time zones are different, modify the related configurations so that the time zones match.
	If you use the logstash-input-elasticsearch plug-in, the following situations may occur: Data is written to the source in real time. You specify a short query interval in the pipeline configurations. In this case, a large amount of data is written to the Elasticsearch cluster at short intervals. As a result, data piles up in the Elasticsearch cluster.	Logstash is not suitable for real-time data synchronization. If data is written to the source in real time, we recommend that you specify a long query interval. This can prevent frequent queries in the source and frequent write operations in the Elasticsearch cluster.
View the slow logs of the Logstash cluster to check whether data is written to the Elasticsearch cluster at a low speed. For information about how to view slow logs, see Query logs.	The loads on the source and the Elasticsearch cluster have not encountered bottlenecks. However, the default value is retained for the Pipeline Workers parameter. As a result, data is written to the destination Elasticsearch cluster at a low speed.	Set the Pipeline Batch Size and Pipeline Workers parameters to larger values. For more information, see Use configuration files to manage pipelines.
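
The effect of a time zone mismatch between the JDBC driver and the Elasticsearch cluster can be shown with a short calculation. The following Python sketch assumes a source database that stores zone-less timestamps in UTC+8 and a destination that stores dates in UTC. The sample timestamp and the helper name `interpret` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed time zone of the source database


def interpret(naive: datetime, assumed_zone: timezone) -> datetime:
    """Attach a time zone to a zone-less timestamp and convert it to UTC."""
    return naive.replace(tzinfo=assumed_zone).astimezone(UTC)


# Value read from a database column that carries no time zone information.
db_value = datetime(2024, 1, 1, 8, 0, 0)

correct = interpret(db_value, UTC_PLUS_8)  # zone matches the source database
wrong = interpret(db_value, UTC)           # value mistaken for UTC
skew = wrong - correct                     # offset introduced by the mismatch
```

In this sketch, `skew` is 8 hours. An offset of this kind makes incremental queries on a time field skip or repeat rows. In the open source logstash-input-jdbc plug-in, the `jdbc_default_timezone` option declares the time zone of the source timestamps.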