Elasticsearch: Use configuration files to manage pipelines

Last Updated: Sep 11, 2023

Logstash uses pipelines to collect and process data. Each pipeline requires input and output plug-ins, and you can add filter plug-ins based on your business requirements. The input and output plug-ins define the input and output data sources, and the filter plug-ins preprocess the collected data. Alibaba Cloud Logstash allows you to run a maximum of 20 pipelines at the same time. This topic describes how to use the configuration file of an Alibaba Cloud Logstash cluster to manage pipelines.
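
A pipeline configuration file consists of three parts. The following snippet is a minimal sketch of the skeleton; the sample configuration later in this topic shows how each part is filled in:

    input {
        # One or more input plug-ins that read data from a source
    }
    filter {
        # Optional filter plug-ins that preprocess the collected events
    }
    output {
        # One or more output plug-ins that write the processed data to a destination
    }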

Prerequisites

The following operations are performed:

  • Create an Alibaba Cloud Elasticsearch cluster.

    For more information, see Create an Alibaba Cloud Elasticsearch cluster.

  • Enable the Auto Indexing feature for the Alibaba Cloud Elasticsearch cluster, or create an index in the Alibaba Cloud Elasticsearch cluster and configure mappings for the index. In this example, the Auto Indexing feature is used.

    For more information about how to enable the Auto Indexing feature, see Configure the YML file. For more information about how to create an index and configure mappings for the index, see Step 3: Create an index.

    Note

    To ensure data security, Alibaba Cloud Elasticsearch disables the Auto Indexing feature by default. When you use Alibaba Cloud Logstash to transfer data to an Alibaba Cloud Elasticsearch cluster, indexes are created in the Elasticsearch cluster by submitting data instead of calling the create index API. Therefore, before you use Alibaba Cloud Logstash to transfer data, you must enable the Auto Indexing feature for the destination Elasticsearch cluster, or create an index in the destination Elasticsearch cluster and configure mappings for the index. A minimal sketch of the related YML setting is provided after this list.

  • Create an Alibaba Cloud Logstash cluster.

    For more information, see Create an Alibaba Cloud Logstash cluster.
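
The Auto Indexing feature maps to the standard Elasticsearch action.auto_create_index setting. The following snippet is a minimal sketch of the corresponding YML configuration, assuming that you change the setting in the YML file editor of the destination cluster, as described in Configure the YML file:

    # Elasticsearch cluster YML configuration
    # Allow indexes to be created automatically when data is submitted
    action.auto_create_index: true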

Limits

  • Alibaba Cloud Logstash allows you to run a maximum of 20 pipelines at the same time.

  • If you specify an Alibaba Cloud Elasticsearch cluster in the output configuration of a pipeline, you must make sure that the Auto Indexing feature is enabled for the cluster or an index is created in the cluster and configured with mappings.

  • If the data sources that you specify in the configurations of a pipeline belong to other Alibaba Cloud services, you must make sure that the data sources reside in the same virtual private cloud (VPC) as your Alibaba Cloud Logstash cluster. Otherwise, you must configure network and security settings for your Logstash cluster. For more information, see Configure a NAT gateway for data transmission over the Internet.

  • If the file_extend parameter is configured in the output configuration of a pipeline, you must make sure that the logstash-output-file_extend plug-in is installed for the Alibaba Cloud Logstash cluster. For more information, see Install and remove a plug-in.

Create a pipeline

  1. Go to the Logstash Clusters page of the Alibaba Cloud Elasticsearch console.
  2. Navigate to the desired cluster.
    1. In the top navigation bar, select the region where the cluster resides.
    2. On the Logstash Clusters page, find the cluster and click its ID.
  3. In the left-side navigation pane of the page that appears, click Pipelines.

  4. On the Pipelines page, click Create Pipeline.

  5. In the Config Settings step of the Create Task wizard, configure Pipeline ID and Config Settings.

    The following sample code provides a configuration example:

    input {
        beats {
            # Listen for data from Beats shippers on port 8000. The port must range from 8000 to 9000.
            port => 8000
        }
    }
    filter {
        # No filter plug-ins are configured in this example.
    }
    output {
        elasticsearch {
            # Write the processed data to the destination Alibaba Cloud Elasticsearch cluster.
            hosts => ["http://es-cn-o40xxxxxxxxxx****.elasticsearch.aliyuncs.com:9200"]
            index => "logstash_test_1"
            password => "es_password"
            user => "elastic"
        }
        file_extend {
            # Store debug logs for the pipeline configuration debugging feature.
            path => "/ssd/1/ls-cn-v0h1kzca****/logstash/logs/debug/test"
        }
    }

    The following descriptions explain each part of the configuration:

    • input: Specifies the input data source. For more information about supported data source types, see Input plugins.

      Note
      • If you want input plug-ins to listen on the port of the node where the Logstash process runs, specify a port in the range of 8000 to 9000.
      • If you want to use a custom plug-in, driver, or file in the input part, perform the following steps: Click Show Third-party Libraries. In the Third-party Libraries dialog box, click Upload. Then, upload the files as prompted. For more information, see Configure third-party libraries.

    • filter: Specifies the plug-ins that are used to filter input data. For more information about supported plug-ins, see Filter plugins.

    • output: Specifies the output data source. For more information about supported data source types, see Output plugins.

      file_extend: Specifies whether to enable the pipeline configuration debugging feature. This parameter is optional, but we recommend that you configure it. You can use the path field to specify the path that stores debug logs. After this parameter is configured, you can view the output data of the pipeline directly in the console. If this parameter is not configured, you must check the output data of the pipeline in the destination, and if the output data is incorrect, modify the pipeline configuration in the console, which increases time and labor costs. For more information, see Use the pipeline configuration debugging feature.

    Important

    Before you use the file_extend parameter, you must install the logstash-output-file_extend plug-in. For more information, see Install and remove a plug-in. By default, the path field is set to a system-specified path. We recommend that you do not change the path. You can click Start Configuration Debug to obtain the path.
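
    The filter part in the preceding sample is empty. The following snippet is a minimal sketch of a populated filter part; the grok pattern and the removed field are illustrative assumptions, not required settings:

        filter {
            grok {
                # Parse the message field of each event as an Apache combined access log
                match => { "message" => "%{COMBINEDAPACHELOG}" }
            }
            mutate {
                # Remove a metadata field that the destination does not need
                remove_field => ["@version"]
            }
        }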

    For more information about the code structure in Config Settings and the supported data types, see Structure of a Config File. The supported data types may vary based on the Logstash version.

    Important
    • For security purposes, if you use a JDBC driver to configure a pipeline, you must add allowLoadLocalInfile=false&autoDeserialize=false at the end of the jdbc_connection_string parameter, such as jdbc_connection_string => "jdbc:mysql://xxx.drds.aliyuncs.com:3306/<Database name>?allowLoadLocalInfile=false&autoDeserialize=false". Otherwise, the system displays an error message that indicates a check failure. A fuller sketch of a JDBC input configuration is provided after this list.

    • If a parameter similar to last_run_metadata_path exists in Config Settings, the file path must be one that Logstash provides. The backend provides paths in the /ssd/1/<Logstash cluster ID>/logstash/data/ format for tests, and the system does not delete the data in such a path. Make sure that your disk has sufficient storage space when you use this path. After you specify a path, Logstash automatically generates the file in that path, but you cannot view the data in the file.

    • Alibaba Cloud Logstash clusters are deployed in VPCs. If you want to specify data sources that belong to other Alibaba Cloud services in the configurations of a pipeline, we recommend that you use data sources that are deployed in the same VPC as your Alibaba Cloud Logstash cluster. If you want to allow access to your Logstash cluster over the Internet, configure network and security settings for your Logstash cluster. For more information, see Configure a NAT gateway for data transmission over the Internet.

    • We recommend that you use file_extend to print logs for tests. Do not use stdout.
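
    The following snippet is a minimal sketch of a JDBC input part that follows the preceding rules. The database name, table name, account, driver file name, and metadata file name are hypothetical placeholders, and the driver JAR is assumed to be uploaded by using the Third-party Libraries feature:

        input {
            jdbc {
                # Hypothetical path of the uploaded MySQL driver JAR
                jdbc_driver_library => "/path/to/mysql-connector-java-5.1.48.jar"
                jdbc_driver_class => "com.mysql.jdbc.Driver"
                # The allowLoadLocalInfile=false&autoDeserialize=false suffix is required by the security check
                jdbc_connection_string => "jdbc:mysql://xxx.drds.aliyuncs.com:3306/test_db?allowLoadLocalInfile=false&autoDeserialize=false"
                jdbc_user => "test_user"
                jdbc_password => "test_password"
                # Run the query every minute
                schedule => "* * * * *"
                statement => "SELECT * FROM test_table"
                # Use a Logstash-provided path, as described in the preceding notes
                last_run_metadata_path => "/ssd/1/<Logstash cluster ID>/logstash/data/test_metadata"
            }
        }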

  6. Click Next to configure pipeline parameters.


    The following descriptions explain each parameter:

    • Pipeline Workers: The number of worker threads that run the filter and output plug-ins of the pipeline in parallel. If events are backlogged or some CPU resources are unused, we recommend that you increase this number to make better use of CPU resources. The default value is the number of vCPUs.

    • Pipeline Batch Size: The maximum number of events that a single worker thread collects from input plug-ins before it attempts to run the filter and output plug-ins. A larger value allows a single worker thread to collect more events but consumes more memory. To make sure that worker threads have sufficient memory to collect more events, specify the LS_HEAP_SIZE variable to increase the Java virtual machine (JVM) heap size. Default value: 125.

    • Pipeline Batch Delay: The maximum amount of time that Logstash waits for each new event before it dispatches an undersized batch to a pipeline worker thread. Default value: 50. Unit: milliseconds.

    • Queue Type: The internal queue model for buffering events. Valid values:
      • MEMORY: traditional memory-based queue. This is the default value.
      • PERSISTED: disk-based ACKed queue, which is a persistent queue.

    • Queue Max Bytes: The total capacity of the persistent queue on disk. The value must be less than the total capacity of your disk. Default value: 1024. Unit: MB.

    • Queue Checkpoint Writes: The maximum number of events that are written before a checkpoint is enforced when persistent queues are enabled. The value 0 indicates no limit. Default value: 1024.
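
    For reference, these console parameters correspond to the following settings in open source Logstash. The snippet below is a sketch that pairs each open source setting name with its console counterpart; on Alibaba Cloud Logstash, you configure these values in the console instead of editing configuration files:

        # Open source Logstash settings that correspond to the console parameters
        pipeline.workers: 2              # Pipeline Workers (default: number of vCPUs)
        pipeline.batch.size: 125         # Pipeline Batch Size
        pipeline.batch.delay: 50         # Pipeline Batch Delay, in milliseconds
        queue.type: persisted            # Queue Type (MEMORY or PERSISTED)
        queue.max_bytes: 1024mb          # Queue Max Bytes
        queue.checkpoint.writes: 1024    # Queue Checkpoint Writes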

    Warning

    After you configure the parameters, you must save the settings and deploy the pipeline. This triggers a restart of the Logstash cluster. Before you proceed, make sure that the restart does not affect your business.

  7. Click Save or Save and Deploy.

    • Save: After you click this button, the system stores the pipeline settings and triggers a cluster change. However, the settings do not take effect. After you click Save, the Pipelines page appears. On the Pipelines page, find the created pipeline and click Deploy in the Actions column. Then, the system restarts the Logstash cluster to make the settings take effect.

    • Save and Deploy: After you click this button, the system restarts the Logstash cluster to make the settings take effect.

  8. In the message that appears, click OK. Then, you can view the created pipeline on the Pipelines page.

    After the cluster is restarted, the pipeline is created.

Modify a pipeline

Warning

After you modify a pipeline, save the modifications, and deploy the pipeline again, the system restarts the Logstash cluster. Before you proceed, make sure that the restart does not affect your business.

  1. On the Pipelines page, find the desired pipeline and click Modify in the Actions column.

  2. In the Modify wizard, modify the settings in the Config Settings and Pipeline Parameters steps. You cannot change the value of Pipeline ID.

  3. Click Save or Save and Deploy. After the cluster is restarted, the pipeline is modified.

Copy a pipeline

Warning

After you copy a pipeline, save the pipeline settings, and deploy the pipeline, the system restarts the Logstash cluster. Before you proceed, make sure that the restart does not affect your business.

  1. On the Pipelines page, find the desired pipeline and choose More > Copy in the Actions column.

  2. In the Copy wizard, configure Pipeline ID and retain other settings.

  3. Click Save or Save and Deploy. After the cluster is restarted, the pipeline is copied.

Delete a pipeline

Warning
  • After a pipeline is deleted, it cannot be recovered, and the tasks that are running on the pipeline are stopped. Before you proceed, make sure that the deletion does not affect your business.

  • The deletion triggers an update of the Logstash cluster. Before you proceed, make sure that the update does not affect your business.

  1. On the Pipelines page, find the desired pipeline and choose More > Delete in the Actions column.

  2. In the Delete Pipeline message, read the risk warnings.

  3. Click Continue. After the cluster is updated, the pipeline is deleted.

References

FAQ