This topic describes how to create an Alibaba Cloud Logstash cluster and configure a Logstash pipeline to synchronize data between Alibaba Cloud Elasticsearch clusters.

Background information

Before you proceed, make sure that you understand the following limit:

An Alibaba Cloud Logstash cluster can synchronize data only between Elasticsearch clusters that run the same version and reside in the same virtual private cloud (VPC).

Procedure

  1. Preparations
    Create a source Elasticsearch cluster and a destination Elasticsearch cluster, enable the Auto Indexing feature for the destination Elasticsearch cluster, and prepare data.
  2. Step 1: Create a Logstash cluster
    Create a Logstash cluster. After the state of the Logstash cluster changes to Active, you can create and run a Logstash pipeline.
  3. Step 2: Create and run a Logstash pipeline
    Create and configure a Logstash pipeline to synchronize data.
  4. Step 3: View synchronization results
    Log on to the Kibana console of the destination Elasticsearch cluster and view data synchronization results.

Preparations

  1. Create Alibaba Cloud Elasticsearch clusters.
    1. Log on to the Elasticsearch console.
    2. In the left-side navigation pane, click Elasticsearch Clusters.
    3. Click Create on the Elasticsearch Clusters page and create two Elasticsearch clusters.
      The two Elasticsearch clusters are used as the input and output of Logstash. For more information about how to create an Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster. In this example, a Logstash V6.7.0 cluster is used to synchronize data between two Elasticsearch V6.7.0 clusters of the Standard Edition. The code provided in this topic applies only to this type of data synchronization.
      Notice
      • The Logstash cluster and the two Elasticsearch clusters must reside in the same VPC. If the Logstash cluster resides in a different VPC, you must configure a Network Address Translation (NAT) gateway for the Logstash cluster so that it can connect to the two Elasticsearch clusters over the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.
      • The versions of the three clusters must be compatible with each other. For more information, see Compatibility matrixes.
      • The default username that is used to access the Elasticsearch clusters is elastic. If you want to use a username other than elastic, you must assign the required roles and grant permissions to the username.
  2. Enable the Auto Indexing feature for the destination Elasticsearch cluster.
    To ensure data security, Alibaba Cloud Elasticsearch disables the Auto Indexing feature by default. When Logstash synchronizes data between Elasticsearch clusters, indexes are created by submitting data instead of by calling the create index API. Therefore, before you use Logstash to synchronize data, you must enable the Auto Indexing feature for the destination Elasticsearch cluster. For more information about how to enable this feature, see Configure the YML file. A sketch of the relevant YML setting is provided after this list.
  3. Create an index and documents.
    Log on to the Kibana console of the source Elasticsearch cluster. In the left-side navigation pane, click Dev Tools. On the Console tab of the page that appears, create an index and documents. A quick query that verifies the created data is provided after this list.
    Notice
    • For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    • The following code applies only to Elasticsearch V6.7.0 clusters.
    1. Create an index named my_index.
      PUT /my_index
      {
        "settings": {
          "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
          }
        },
        "mappings": {
          "my_type": {
            "properties": {
              "post_date": {
                "type": "date"
              },
              "tags": {
                "type": "keyword"
              },
              "title": {
                "type": "text"
              }
            }
          }
        }
      }
    2. Create a document with the ID 1.
      PUT /my_index/my_type/1?pretty
      {
        "title": "One", 
        "tags": ["ruby"],
        "post_date":"2009-11-15T13:00:00"
      }
    3. Create a document with the ID 2.
      PUT /my_index/my_type/2?pretty
      {
        "title": "Two", 
        "tags": ["ruby"],
        "post_date":"2009-11-15T14:00:00"
      }
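
The Auto Indexing feature mentioned in step 2 is controlled by the action.auto_create_index setting in the YML configuration of the destination Elasticsearch cluster. The following minimal sketch shows the setting as it would appear in the YML file; the pattern-based variant is only an illustration:

    # Allow indexes to be created automatically when documents are submitted.
    action.auto_create_index: true
    # Alternatively, restrict automatic creation to specific index-name patterns:
    # action.auto_create_index: +my_index*,-*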
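
Before you create the Logstash cluster, you can verify that the test data from step 3 exists on the source cluster. Run the following command on the Console tab of the Kibana console of the source cluster. In this example, the command returns a count of 2 for the two documents created above:

    GET /my_index/_count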

Step 1: Create a Logstash cluster

  1. Go to the Logstash Clusters page.
    1. In the top navigation bar, select the region where the destination Elasticsearch cluster resides.
    2. In the left-side navigation pane, click Logstash Clusters.
  2. On the Logstash Clusters page, click Create.
  3. On the buy page, configure cluster launch settings.
    In this example, Billing Method is set to Pay-As-You-Go, and Logstash Version is set to 6.7. Default values are retained for other parameters. For more information, see Parameters on the buy page.
    Note
    • We recommend that you purchase pay-as-you-go Logstash clusters for program development or functional testing.
    • Discounts are offered for subscription clusters.
  4. Read and select Logstash (Pay-as-you-go) Terms of Service. Then, click Buy Now.
  5. After the cluster is created, click Console to navigate to the Logstash Clusters page and view the newly created cluster.

Step 2: Create and run a Logstash pipeline

After the state of the newly created Logstash cluster becomes Active, you can create and run a Logstash pipeline to synchronize data.

  1. On the Logstash Clusters page, find the newly created Logstash cluster and click Manage Pipelines in the Actions column.
  2. In the Pipelines section of the page that appears, click Create Pipeline.
  3. In the Config Settings step, specify Pipeline ID and Config Settings.
    The following configurations are used in this example:
    input {
      elasticsearch {
        # Endpoint of the source Elasticsearch cluster.
        hosts => ["http://es-cn-0pp1f1y5g000h****.elasticsearch.aliyuncs.com:9200"]
        user => "elastic"
        password => "your_password"
        # Synchronize all indexes on the source cluster.
        index => "*"
        # Extract document metadata (index, type, and ID) into @metadata.
        docinfo => true
      }
    }
    filter {
    }
    output {
      elasticsearch {
        # Endpoint of the destination Elasticsearch cluster.
        hosts => ["http://es-cn-mp91cbxsm000c****.elasticsearch.aliyuncs.com:9200"]
        user => "elastic"
        password => "your_password"
        # Reuse the source index name, type, and document ID from the metadata.
        index => "%{[@metadata][_index]}"
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
      }
      # Write debug logs for the pipeline configuration debugging feature.
      file_extend {
        path => "/ssd/1/ls-cn-v0h1kzca****/logstash/logs/debug/test"
      }
    }
    The following list describes the parameters:
    • hosts: The endpoints of the Elasticsearch clusters. In the input part, use the format http://<ID of the source Elasticsearch cluster>.elasticsearch.aliyuncs.com:9200. In the output part, use the format http://<ID of the destination Elasticsearch cluster>.elasticsearch.aliyuncs.com:9200.
    • user: The username that is used to access each Elasticsearch cluster. The default username is elastic.
    • password: The password that is used to access each Elasticsearch cluster. The passwords are specified when you create the clusters. If you forget a password, you can reset it. For more information about the procedure and precautions for resetting a password, see Reset the access password for an Elasticsearch cluster.
    • index: In the input part, the name of the index whose data you want to synchronize. In the output part, the name of the index to which you want to write data. The value %{[@metadata][_index]} indicates that the system uses the index name from the document metadata. In this case, the name of the destination index is the same as that of the source index.
    • docinfo: The value true indicates that the system extracts the metadata of Elasticsearch documents, such as the index, type, and ID.
    • document_type: The type of the destination index. The value %{[@metadata][_type]} indicates that the system uses the type from the document metadata. In this case, the type of the destination index is the same as that of the source index.
    • document_id: The IDs of documents in the destination index. The value %{[@metadata][_id]} indicates that the system uses the document ID from the document metadata. In this case, the IDs of the documents in the destination index are the same as those in the source index.
    • file_extend: Enables the pipeline configuration debugging feature. You can use the path field to specify the path that stores debug logs. Before you use this feature, you must install the logstash-output-file_extend plug-in. For more information, see Use the pipeline configuration debugging feature.
    Notice By default, the path field is set to a system-specified path. We recommend that you do not change the path. You can click Start Configuration Debug to obtain the path.

    For more information about the configurations in Config Settings, see Logstash configuration files. A variant of this pipeline that synchronizes only a single index is sketched after this procedure.

  4. Click Next to configure pipeline parameters.
    In the Pipeline Parameters step, set Pipeline Workers to the number of vCPU cores that you configured for the Logstash cluster. Default values are retained for other parameters. For more information, see Use configuration files to manage pipelines.
  5. Click Save or Save and Deploy.
    • Save: After you click this button, the system stores the pipeline settings and triggers a cluster change, but the settings do not take effect yet. On the Pipelines page that appears, find the created pipeline and click Deploy in the Actions column. The system then restarts the Logstash cluster to make the settings take effect.
    • Save and Deploy: After you click this button, the system restarts the Logstash cluster to make the settings take effect.
  6. In the message that appears, click OK.
    Then, you can view the newly created pipeline in the Pipelines section. After the state of the pipeline changes to Running, the Logstash cluster starts to synchronize data.
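
The pipeline in this example synchronizes all indexes on the source cluster because index is set to "*" in the input part. If you want to synchronize only the my_index index that was created in the preparations, a minimal variant of the input part looks as follows; the filter and output parts remain unchanged, and the endpoint and password are the same placeholders as in the example above:

    input {
      elasticsearch {
        hosts => ["http://es-cn-0pp1f1y5g000h****.elasticsearch.aliyuncs.com:9200"]
        user => "elastic"
        password => "your_password"
        # Restrict the synchronization to a single index.
        index => "my_index"
        docinfo => true
      }
    }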

Step 3: View synchronization results

After the Logstash pipeline is created and starts to run, you can log on to the Kibana console of the destination Elasticsearch cluster to view data synchronization results.

  1. Log on to the Kibana console of the destination Elasticsearch cluster.
    For more information, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab of the page that appears, run the following command to view data synchronization results:
    GET /my_index/_search
    {
      "query": {
        "match_all": {}
      }
    }
    If the command is successfully run, the index and documents that were created on the source cluster are returned.
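
To further confirm that the synchronization is complete, you can compare document counts between the source and destination clusters. Run the following command in the Kibana console of each cluster; in this example, both clusters return a count of 2:

    GET /my_index/_count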

What to do next