You can use an Alibaba Cloud Logstash pipeline to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster. This topic describes the migration procedure in detail.

Prerequisites

  • A self-managed Elasticsearch cluster is created.
    We recommend that you create a self-managed Elasticsearch cluster on Alibaba Cloud Elastic Compute Service (ECS) instances. For more information, see Install and Run Elasticsearch.
    Notice
    • The ECS instances that host the self-managed Elasticsearch cluster must be deployed in a virtual private cloud (VPC). You cannot use ECS instances that are connected to a VPC over ClassicLink.
    • Alibaba Cloud Logstash clusters are deployed in VPCs. Before you configure a Logstash pipeline, you must check whether the ECS instances that host the self-managed Elasticsearch cluster reside in the same VPC as the Alibaba Cloud Logstash cluster that you want to use. If they reside in different VPCs, you must configure NAT gateways to connect the ECS instances and Logstash cluster to the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.
    • In the security groups of the ECS instances that host the self-managed Elasticsearch cluster, you must configure rules that allow access from the IP addresses of the nodes in the Logstash cluster and open port 9200. You can obtain the IP addresses of the nodes in the Logstash cluster on the Basic Information page of the Logstash cluster.
    • In this example, an Alibaba Cloud Logstash V6.7.0 cluster is used to migrate data from a self-managed Elasticsearch 5.6.16 cluster to an Alibaba Cloud Elasticsearch V6.7.0 cluster. The scripts provided in this topic apply only to this type of data migration. If you want to perform other types of data synchronization, you must check whether your Elasticsearch clusters and Logstash cluster are compatible with each other based on the instructions in Compatibility matrixes. If they are not compatible with each other, you can upgrade their versions or purchase new clusters.
  • An Alibaba Cloud Logstash cluster is created.

    For more information, see Create an Alibaba Cloud Logstash cluster.

  • An Alibaba Cloud Elasticsearch cluster is created in the VPC where the Alibaba Cloud Logstash cluster resides. Make sure that the Alibaba Cloud Elasticsearch cluster is of the same version as the Logstash cluster. In this example, V6.7.0 is used.

    For more information, see Create an Alibaba Cloud Elasticsearch cluster.

  • The Auto Indexing feature is enabled for the Alibaba Cloud Elasticsearch cluster.

    For more information, see Configure the YML file.

    Note Logstash migrates only documents. It does not synchronize index configurations such as mappings and settings. Therefore, if you rely on the Auto Indexing feature, the structure of the indexes that are automatically created in the destination may differ from that of the source indexes. If you want the index structure to remain unchanged, we recommend that you create an empty index in the destination before you migrate data. When you create the index, copy the mappings and settings of the source index and set the numbers of shards and replicas to appropriate values, as shown in the following example.
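    The following Kibana Dev Tools commands are a minimal sketch of this approach. The index name my_index, the type name doc, and the field definitions are hypothetical; replace them with the actual configuration of your source index.

    # On the source cluster, retrieve the mappings and settings of the index
    # that you want to migrate. my_index is a hypothetical index name.
    GET my_index

    # On the destination Alibaba Cloud Elasticsearch cluster, create an empty
    # index that uses the same mappings. Adjust the numbers of shards and
    # replicas as needed. The type name doc and the fields are examples only.
    PUT my_index
    {
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
      },
      "mappings": {
        "doc": {
          "properties": {
            "name": { "type": "keyword" },
            "created_at": { "type": "date" }
          }
        }
      }
    }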

Configure and run a Logstash pipeline

  1. Log on to the Elasticsearch console.
  2. Navigate to the desired cluster.
    1. In the top navigation bar, select the region where the cluster resides.
    2. In the left-side navigation pane, click Logstash Clusters. On the Logstash Clusters page, find the cluster and click its ID.
  3. In the left-side navigation pane of the page that appears, click Pipelines.
  4. On the Pipelines page, click Create Pipeline.
  5. In the Create Task wizard, enter a pipeline ID and configure the pipeline.
    In this example, the following configurations are used for the pipeline:
    input {
      elasticsearch {
        hosts => ["http://<IP address of the master node in the self-managed Elasticsearch cluster>:9200"]
        user => "elastic"
        index => "*,-.monitoring*,-.security*,-.kibana*"
        password => "your_password"
        docinfo => true
      }
    }
    filter {
    }
    output {
      elasticsearch {
        hosts => ["http://es-cn-mp91cbxsm000c****.elasticsearch.aliyuncs.com:9200"]
        user => "elastic"
        password => "your_password"
        index => "%{[@metadata][_index]}"
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
      }
      file_extend {
        path => "/ssd/1/ls-cn-v0h1kzca****/logstash/logs/debug/test"
      }
    }
    Table 1. Parameters
    • hosts: The endpoint of the self-managed Elasticsearch cluster or Alibaba Cloud Elasticsearch cluster. In the input part, specify a value in the format of http://<IP address of the master node in the self-managed Elasticsearch cluster>:<Port number>. In the output part, specify a value in the format of http://<ID of the Alibaba Cloud Elasticsearch cluster>.elasticsearch.aliyuncs.com:9200.
      Notice When you configure this parameter, replace <IP address of the master node in the self-managed Elasticsearch cluster>, <Port number>, and <ID of the Alibaba Cloud Elasticsearch cluster> with your actual values.
    • user: The username that is used to access the self-managed Elasticsearch cluster or Alibaba Cloud Elasticsearch cluster.
      Notice
      • The user and password parameters are required in most cases. If the X-Pack plug-in is not installed on the self-managed Elasticsearch cluster, you can leave the two parameters empty.
      • The default username that is used to access Alibaba Cloud Elasticsearch clusters is elastic, and this username is used in this example. You can also use a custom username. Before you use a custom username, you must create a role for it and grant the required permissions to the role. For more information, see Use the RBAC mechanism provided by Elasticsearch X-Pack to implement access control.
    • password: The password that is used to access the self-managed Elasticsearch cluster or Alibaba Cloud Elasticsearch cluster.
    • index: The names of the indexes from which or to which you want to migrate data. If you set this parameter to *,-.monitoring*,-.security*,-.kibana* in the input part, the system migrates data from all indexes except the .monitoring*, .security*, and .kibana* system indexes. If you set this parameter to %{[@metadata][_index]} in the output part, the system uses the index name in the document metadata. This way, the indexes that are generated on the Alibaba Cloud Elasticsearch cluster have the same names as the indexes on the self-managed Elasticsearch cluster. To check which indexes exist on the source cluster before you configure this parameter, see the example that follows.
    • docinfo: If you set this parameter to true, the system extracts the metadata of documents in the self-managed Elasticsearch cluster, such as the index, type, and id fields.
    • document_type: If you set this parameter to %{[@metadata][_type]}, the system uses the index type in the document metadata. This way, the indexes that are generated on the Alibaba Cloud Elasticsearch cluster have the same types as the indexes on the self-managed Elasticsearch cluster.
    • document_id: If you set this parameter to %{[@metadata][_id]}, the system uses the document ID in the document metadata. This way, the documents that are generated on the Alibaba Cloud Elasticsearch cluster have the same IDs as the documents on the self-managed Elasticsearch cluster.
    • file_extend: Optional. Specifies whether to enable the pipeline configuration debugging feature. You can use the path field to specify the path that stores debug logs. We recommend that you configure this parameter. After it is configured, you can view the output data of the pipeline directly in the console. If it is not configured, you must check the output data in the destination and, if the data is incorrect, modify the pipeline configuration in the console, which increases time and labor costs. For more information, see Use the pipeline configuration debugging feature.
      Notice Before you use the file_extend parameter, you must install the logstash-output-file_extend plug-in. For more information, see Install and remove a plug-in. By default, the path field is set to a system-specified path. We recommend that you do not change it. You can click Start Configuration Debug to obtain the path.

    For more information about how to configure parameters in the Config Settings field, see Logstash configuration files.
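    Before you configure the index parameter in the input part, you may want to check which indexes exist on the source cluster and how large they are. The following command is a sketch; you can run it in a Kibana console that is connected to the self-managed Elasticsearch cluster or send it to the cluster's HTTP endpoint. The h parameter limits the output to the listed columns.

    # List the indexes on the self-managed Elasticsearch cluster, together with
    # their document counts and storage sizes, to decide which indexes to migrate.
    GET /_cat/indices?v&h=index,docs.count,store.size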

  6. Click Next to configure pipeline parameters.
    Table 2. Pipeline parameters
    • Pipeline Workers: The number of worker threads that run the filter and output plug-ins of the pipeline in parallel. If a backlog of events exists or some CPU resources are idle, we recommend that you increase the number of threads to maximize CPU utilization. The default value of this parameter is the number of vCPUs.
    • Pipeline Batch Size: The maximum number of events that a single worker thread collects from input plug-ins before it runs the filter and output plug-ins. A larger value allows a single worker thread to collect more events but consumes more memory. To make sure that the worker thread has sufficient memory, specify the LS_HEAP_SIZE variable to increase the Java virtual machine (JVM) heap size. Default value: 125.
    • Pipeline Batch Delay: The maximum amount of time that Logstash waits for a new event when it creates event batches, before it dispatches an undersized batch to a pipeline worker thread. Default value: 50. Unit: milliseconds.
    • Queue Type: The internal queue model for buffering events. Valid values:
      • MEMORY: a traditional memory-based queue. This is the default value.
      • PERSISTED: a disk-based ACKed queue, which is a persistent queue.
    • Queue Max Bytes: The maximum amount of disk space that the persistent queue can consume. The value must be less than the total capacity of your disk. Default value: 1024. Unit: MB.
    • Queue Checkpoint Writes: The maximum number of events that are written before a checkpoint is enforced when persistent queues are enabled. The value 0 indicates no limit. Default value: 1024.
    Warning After you configure the parameters, you must save the settings and deploy the pipeline. This triggers a restart of the Logstash cluster. Before you can proceed, make sure that the restart does not affect your services.
  7. Click Save or Save and Deploy.
    • Save: After you click this button, the system stores the pipeline settings and triggers a cluster change. However, the settings do not take effect. After you click Save, the Pipelines page appears. On the Pipelines page, find the created pipeline and click Deploy in the Actions column. Then, the system restarts the Logstash cluster to make the settings take effect.
    • Save and Deploy: After you click this button, the system restarts the Logstash cluster to make the settings take effect.

View migration results

  1. Log on to the Kibana console of the Alibaba Cloud Elasticsearch cluster.
    For more information, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab of the page that appears, run the GET /_cat/indices?v command to view the indexes that store the migrated data.
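    To further check whether all data is migrated, you can compare the document counts on the source and destination clusters. The following commands are a sketch; my_index is a hypothetical index name, and you can run the same commands on both clusters.

    # List all indexes together with their document counts.
    GET /_cat/indices?v

    # Check the document count of a specific index on both clusters.
    GET /_cat/count/my_index?v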

FAQ

  • Q: How do I connect the ECS instances that host the self-managed Elasticsearch cluster to the Alibaba Cloud Logstash cluster when the ECS instances and the Logstash cluster belong to different accounts?

    A: The ECS instances and the Logstash cluster belong to different accounts. Therefore, the ECS instances and the Logstash cluster reside in different VPCs. In this case, you can use Cloud Enterprise Network (CEN) to connect the ECS instances to the Logstash cluster. For more information, see Step 3: Attach network instances.

  • Q: An error occurs when Logstash writes data to the destination. What do I do?

    A: Troubleshoot the error based on the instructions provided in FAQ about data transfer by using Logstash.