This topic describes how to create an Alibaba Cloud Logstash cluster and configure a Logstash pipeline to synchronize data between Alibaba Cloud Elasticsearch clusters.

Background information

Before you begin, make sure that you understand the following information:

Prerequisites

Limits

An Alibaba Cloud Logstash cluster can be used to synchronize data only between Elasticsearch clusters that are of the same version and reside in the same VPC.

Procedure

  1. Preparations
    Create a source Elasticsearch cluster and a destination Elasticsearch cluster, enable the Auto Indexing feature for the destination Elasticsearch cluster, and prepare data.
  2. Step 1: Create a Logstash cluster
    Create a Logstash cluster. After the state of the Logstash cluster changes to Active, you can create and run a Logstash pipeline.
  3. Step 2: Create and run a Logstash pipeline
    Create and configure a Logstash pipeline to synchronize data.
  4. Step 3: View synchronization results
    Log on to the Kibana console of the destination Elasticsearch cluster and view data synchronization results.

Preparations

  1. Create Alibaba Cloud Elasticsearch clusters.
    1. Log on to the Elasticsearch console.
    2. In the left-side navigation pane, click Elasticsearch Clusters.
    3. Click Create on the Elasticsearch Clusters page and create two Elasticsearch clusters.
      The two Elasticsearch clusters are used as the input and output of a Logstash pipeline. For more information, see Create an Alibaba Cloud Elasticsearch cluster. In this example, a Logstash V6.7.0 cluster is used to synchronize data between Elasticsearch V6.7.0 clusters of the Standard Edition. The code provided in this topic applies only to this type of data synchronization. The following figure shows the configurations of the created Elasticsearch clusters.
      Note If you want to perform other types of data synchronization, you must check whether your Elasticsearch clusters and Logstash cluster are compatible with each other based on the instructions in Compatibility matrixes. If they are not compatible with each other, you can upgrade their versions or purchase new clusters.
      Cluster configuration
      Notice
      • The Logstash cluster and the two Elasticsearch clusters must reside in the same VPC. If the Logstash cluster resides in a different VPC from the two Elasticsearch clusters, you must configure a Network Address Translation (NAT) gateway for the Logstash cluster. This way, the Logstash cluster can be connected to the two Elasticsearch clusters over the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.
      • The default username that is used to access the Elasticsearch clusters is elastic. If you want to use a user other than elastic, you must grant the required permissions to a role and assign the role to the user.
  2. Enable the Auto Indexing feature for the destination Elasticsearch cluster.
    For more information, see Configure the YML file.
    Note To ensure data security, Alibaba Cloud Elasticsearch disables the Auto Indexing feature by default. When you use Alibaba Cloud Logstash to migrate data to an Alibaba Cloud Elasticsearch cluster, indexes are created in the Elasticsearch cluster by submitting data instead of calling the create index API. Therefore, before you use Alibaba Cloud Logstash to migrate data, you must enable the Auto Indexing feature for the destination Elasticsearch cluster or create an index in the destination Elasticsearch cluster.
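    The console enables this feature through the YML file; the underlying Elasticsearch setting is action.auto_create_index. As a reference sketch (the console may expose this as a dedicated option rather than as raw YML), the setting looks like this:
      # elasticsearch.yml: allow an index to be created automatically
      # the first time a document is written to it
      action.auto_create_index: true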
  3. Create an index and documents.
    Log on to the Kibana console of the source Elasticsearch cluster. In the left-side navigation pane, click Dev Tools. On the Console tab of the page that appears, create an index and documents.
    Notice
    • For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    • The following code applies only to Elasticsearch V6.7.0 clusters and is used for tests only. For more information about sample code for Elasticsearch clusters of V7.0 or later, see Quick start of Elasticsearch.
    1. Create an index named my_index.
      PUT /my_index
      {
        "settings": {
          "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
          }
        },
        "mappings": {
          "my_type": {
            "properties": {
              "post_date": {
                "type": "date"
              },
              "tags": {
                "type": "keyword"
              },
              "title": {
                "type": "text"
              }
            }
          }
        }
      }
    2. Create a document named 1.
      PUT /my_index/my_type/1?pretty
      {
        "title": "One", 
        "tags": ["ruby"],
        "post_date":"2009-11-15T13:00:00"
      }
    3. Create a document named 2.
      PUT /my_index/my_type/2?pretty
      {
        "title": "Two", 
        "tags": ["ruby"],
        "post_date":"2009-11-15T14:00:00"
      }

Step 1: Create a Logstash cluster

  1. Go to the Logstash Clusters page.
    1. In the top navigation bar, select the region where the destination Elasticsearch cluster resides.
    2. In the left-side navigation pane, click Logstash Clusters.
  2. On the Logstash Clusters page, click Create.
  3. On the buy page, configure cluster launch settings.
    In this example, Billing Method is set to Pay-As-You-Go, and Logstash Version is set to 6.7. Default values are retained for other parameters. For more information, see Create a cluster.
    Note
    • We recommend that you purchase pay-as-you-go Logstash clusters for program development or functional testing.
    • Discounts are offered for subscription clusters.
  4. Read the terms of service, select Logstash (Pay-as-you-go) Terms of Service, and then click Buy Now.
  5. After the message indicating that the cluster is created appears, click Console.
  6. In the top navigation bar, select the region where the Logstash cluster resides. In the left-side navigation pane, click Logstash Clusters. On the Logstash Clusters page, view the newly created Logstash cluster.

Step 2: Create and run a Logstash pipeline

After the state of the newly created Logstash cluster changes to Active, you can create and run a Logstash pipeline to synchronize data.

  1. On the Logstash Clusters page, find the newly created Logstash cluster and click Manage Pipeline in the Actions column.
  2. In the Pipelines section of the page that appears, click Create Pipeline.
  3. In the Config Settings step, configure Pipeline ID and Config Settings.
    In this topic, the following configurations are used for the pipeline:
    input {
        elasticsearch {
            hosts => ["http://es-cn-0pp1f1y5g000h****.elasticsearch.aliyuncs.com:9200"]
            user => "elastic"
            password => "your_password"
            index => "*,-.monitoring*,-.security*,-.kibana*"
            docinfo => true
        }
    }
    filter {}
    output {
        elasticsearch {
            hosts => ["http://es-cn-mp91cbxsm000c****.elasticsearch.aliyuncs.com:9200"]
            user => "elastic"
            password => "your_password"
            index => "%{[@metadata][_index]}"
            document_type => "%{[@metadata][_type]}"
            document_id => "%{[@metadata][_id]}"
        }
        file_extend {
            path => "/ssd/1/ls-cn-v0h1kzca****/logstash/logs/debug/test"
        }
    }
    Parameter descriptions:
    • hosts: The endpoint of the source or destination Elasticsearch cluster. In the input part, use the format http://<ID of the source Elasticsearch cluster>.elasticsearch.aliyuncs.com:9200. In the output part, use the format http://<ID of the destination Elasticsearch cluster>.elasticsearch.aliyuncs.com:9200.
    • user: The username that is used to access the source or destination Elasticsearch cluster. The default username is elastic.
    • password: The password that is used to access the source or destination Elasticsearch cluster. The password is specified when you create the cluster. If you forget the password, you can reset it. For more information about the procedure and precautions for resetting a password, see Reset the access password for an Elasticsearch cluster.
    • index: The name of the index whose data you want to synchronize or to which you want to synchronize data. If you set this parameter to *,-.monitoring*,-.security*,-.kibana* in the input part, the system synchronizes data in all indexes except the system indexes whose names start with a period (.). The value %{[@metadata][_index]} in the output part indicates that the system reads the _index field from the metadata, so the name of each destination index is the same as the name of the corresponding source index.
      Note System indexes store the monitoring logs of Elasticsearch clusters and do not need to be synchronized.
    • docinfo: The value true indicates that the system extracts the metadata of Elasticsearch documents, such as the index, type, and ID.
    • document_type: The type of the destination index. The value %{[@metadata][_type]} indicates that the system reads the _type field from the metadata, so the type of each destination index is the same as the type of the corresponding source index.
    • document_id: The IDs of documents in the destination index. The value %{[@metadata][_id]} indicates that the system reads the _id field from the metadata, so the IDs of documents in the destination index are the same as the IDs of documents in the source index.
    • file_extend: Optional. Enables the pipeline configuration debugging feature. Use the path field to specify the directory that stores debug logs. We recommend that you configure this parameter: after it is configured, you can view the output data of the pipeline directly in the console. Otherwise, you must check the output data in the destination and, if the data is incorrect, modify the pipeline configuration in the console, which costs extra time and labor. For more information, see Use the pipeline configuration debugging feature.
      Notice Before you use the file_extend parameter, you must install the logstash-output-file_extend plug-in. For more information, see Install and remove a plug-in. By default, the path field is set to a system-specified path. We recommend that you do not change it. You can click Start Configuration Debug to obtain the path.
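The value of the index parameter in the input part uses Elasticsearch's multi-target syntax: patterns are applied from left to right, and a pattern that starts with a minus sign (-) excludes the indexes it matches. The following Python sketch illustrates this selection logic only; it is not how Logstash is actually implemented:

```python
from fnmatch import fnmatch

def select_indexes(names, expression):
    """Apply a comma-separated index expression such as
    '*,-.monitoring*,-.security*,-.kibana*' to a list of index names."""
    selected = set()
    for pattern in expression.split(","):
        if pattern.startswith("-"):
            # Exclusion: drop names that match the pattern after the minus sign.
            selected -= {n for n in names if fnmatch(n, pattern[1:])}
        else:
            # Inclusion: add names that match the pattern.
            selected |= {n for n in names if fnmatch(n, pattern)}
    return sorted(selected)

names = ["my_index", ".monitoring-es-6-2024", ".kibana_1", ".security-6"]
print(select_indexes(names, "*,-.monitoring*,-.security*,-.kibana*"))  # ['my_index']
```

With the expression used in this topic, the wildcard * first selects every index, and the three exclusion patterns then remove the system indexes, so only my_index is synchronized.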

    For more information about the code structure in Config Settings and the supported data types, see Structure of a Config File. The supported data types may differ in different versions.
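The %{[...]} placeholders in the output part are Logstash field references that are resolved for each event against the metadata extracted by the input plug-in. The following Python sketch illustrates the idea only; the event dictionary is a hypothetical example, not Logstash's internal representation:

```python
import re

def resolve(template, event):
    """Replace Logstash-style %{[a][b]} field references with values
    looked up in a nested event dictionary."""
    def lookup(match):
        value = event
        for key in match.group(1).split("]["):
            value = value[key]
        return str(value)
    return re.sub(r"%\{\[(.+?)\]\}", lookup, template)

# Hypothetical metadata as populated by the elasticsearch input when docinfo => true.
event = {"@metadata": {"_index": "my_index", "_type": "my_type", "_id": "1"}}
print(resolve("%{[@metadata][_index]}", event))  # my_index
```

This is why the destination index, type, and document ID mirror those of the source: each output field is filled in from the per-document metadata at write time.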

  4. Click Next to configure pipeline parameters.
    In the Pipeline Parameters step, set Pipeline Workers to the number of vCPUs that you configured for the Logstash cluster. Default values are retained for other parameters. For more information, see Use configuration files to manage pipelines.
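    For reference, these parameters correspond to the pipeline settings in the logstash.yml file of open-source Logstash. The following sketch assumes a cluster with two vCPUs; the batch values shown are the Logstash defaults:
      pipeline.workers: 2       # set to the number of vCPUs of the Logstash cluster
      pipeline.batch.size: 125  # maximum number of events a worker collects per batch
      pipeline.batch.delay: 50  # milliseconds to wait before dispatching an undersized batch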
  5. Click Save or Save and Deploy.
    • Save: After you click this button, the system stores the pipeline settings and triggers a cluster change. However, the settings do not take effect. After you click Save, the Pipelines page appears. In the Pipelines section, find the created pipeline and click Deploy in the Actions column. Then, the system restarts the Logstash cluster to make the settings take effect.
    • Save and Deploy: After you click this button, the system restarts the Logstash cluster to make the settings take effect.
  6. In the message that appears, click OK.
    Then, you can view the newly created pipeline in the Pipelines section. After the state of the pipeline changes to Running, your Logstash cluster starts to synchronize data.

Step 3: View synchronization results

After the Logstash pipeline is created and starts to run, you can log on to the Kibana console of the destination Elasticsearch cluster to view data synchronization results.

  1. Log on to the Kibana console of the destination Elasticsearch cluster.
    For more information, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab of the page that appears, run the following command to view data synchronization results:
    GET /my_index/_search
    {
      "query": {
        "match_all": null
      }
    }
    If the data in the destination Elasticsearch cluster is the same as the data in the source Elasticsearch cluster, the data is successfully synchronized. You can also run the GET _cat/indices?v command and check whether the size of the destination index is the same as the size of the source index to determine whether the data is successfully synchronized.

References

FAQ

FAQ about Alibaba Cloud Logstash: FAQ about Logstash