This topic describes how to migrate data from an Amazon Elasticsearch Service (Amazon ES) domain to an Alibaba Cloud Elasticsearch cluster.

The following figure shows the reference architecture for the migration.

Reference architecture for Elasticsearch data migration

Terms

  • Elasticsearch: a distributed, RESTful search and analysis engine designed for various scenarios. As the core of the Elastic Stack, Elasticsearch stores your data in a centralized manner and searches and analyzes data.
  • Kibana: provides a visual interface for you to search and analyze data.
  • Amazon ES: a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. This service also provides the availability, scalability, and security that are required for production workloads. You can use Amazon ES to easily deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.
  • Alibaba Cloud Elasticsearch: a service that is built on open-source Elasticsearch for scenarios such as data analytics and search. It provides enterprise-grade access control, automated reporting, and security monitoring and alerting.
  • Snapshot and restore: You can store snapshots of individual indexes or an entire cluster in a remote repository, such as a shared file system, Amazon Simple Storage Service (Amazon S3), or HDFS. The snapshots can be used to restore data. However, the data can be restored only to Elasticsearch clusters of specific versions:
    • Data in a snapshot created in an Elasticsearch 5.x cluster can be restored to an Elasticsearch 6.x cluster.
    • Data in a snapshot created in an Elasticsearch 2.x cluster can be restored to an Elasticsearch 5.x cluster.
    • Data in a snapshot created in an Elasticsearch 1.x cluster can be restored to an Elasticsearch 2.x cluster.
    Note The first snapshot that you create for an index is a full snapshot. Subsequent snapshots are incremental snapshots.
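
The version rules above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the rule that a snapshot can also be restored to a cluster of the same major version and an equal or later release (as in this topic, 5.5.2 to 5.5.3) is an assumption that is not stated in the list above.

```python
# Snapshot major version -> cluster major versions listed above as restore targets.
RESTORE_TARGETS = {1: {2}, 2: {5}, 5: {6}}


def parse_version(version):
    """Turn a version string such as '5.5.2' into a tuple of integers (5, 5, 2)."""
    return tuple(int(part) for part in version.split("."))


def can_restore(snapshot_version, cluster_version):
    """Return True if a snapshot from snapshot_version may be restored to cluster_version."""
    snapshot = parse_version(snapshot_version)
    cluster = parse_version(cluster_version)
    if snapshot[0] == cluster[0]:
        # Assumption: within one major version, the cluster must not be older
        # than the cluster that created the snapshot.
        return cluster >= snapshot
    return cluster[0] in RESTORE_TARGETS.get(snapshot[0], set())


if __name__ == "__main__":
    print(can_restore("5.5.2", "5.5.3"))
    print(can_restore("1.7.0", "5.5.3"))
```

Running such a check before you transfer a snapshot can prevent moving data that the destination cluster cannot restore.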

Migration plan

To migrate data to an Alibaba Cloud Elasticsearch cluster, follow these steps:
  1. Create a baseline index.
    1. Create a snapshot repository and associate it with an S3 bucket.
    2. Create the first snapshot for the index whose data you want to migrate. The first snapshot is a full snapshot.

      This snapshot is automatically stored in the S3 bucket.

    3. Create an Object Storage Service (OSS) bucket on Alibaba Cloud, and register it as a snapshot repository for your Alibaba Cloud Elasticsearch cluster.
    4. Use ossimport to transfer the full snapshot from the S3 bucket to the OSS bucket.
    5. Restore data from the full snapshot to your Alibaba Cloud Elasticsearch cluster.
  2. Process incremental snapshots on a regular basis.

    Repeat the preceding steps to restore data from incremental snapshots.

  3. Identify the final snapshot and perform a service switchover.
    1. Stop services that may modify index data.
    2. Create the final snapshot for your Amazon ES domain.
    3. Transfer the final snapshot to your OSS bucket. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.
    4. Perform a service switchover to the cluster.

Prerequisites

  • An Amazon ES 5.5.2 domain is created in the Asia Pacific (Singapore) region.
  • An Alibaba Cloud Elasticsearch V5.5.3 cluster is created in the China (Hangzhou) region.

    For more information, see Create an Elasticsearch cluster.

  • The index whose data you want to migrate is prepared. This topic uses the movies index as an example.

Prerequisites for creating manual snapshots in an Amazon ES domain

Amazon ES automatically creates snapshots for the primary index shards in a domain every day and stores them in a pre-configured S3 bucket. These snapshots are retained for a maximum of 14 days at no additional charge. You can use these snapshots to restore data to the domain. However, you cannot use them to migrate data to other domains. To migrate data, you must use manual snapshots stored in your S3 bucket. Standard S3 charges apply to manual snapshots.

To create manual snapshots and restore data from the snapshots, you must use AWS Identity and Access Management (IAM) and S3. Before you create snapshots, perform the operations that are listed in the following table.

Operation | Description
Create an S3 bucket | The bucket stores manual snapshots of your Amazon ES domain.
Create an IAM role | The role delegates permissions to Amazon ES. When you add a trust relationship for the role, you must specify Amazon ES in the Principal element. This role is also required when you register a snapshot repository with Amazon ES. Only IAM users that have access to this role can register the snapshot repository.
Create an IAM policy | The policy specifies the S3 actions that are allowed on your S3 bucket. The policy must be attached to the IAM role that delegates permissions to Amazon ES. You must specify your S3 bucket in the Resource element of the policy.
  • Create an S3 bucket

    You need an S3 bucket to store manual snapshots. Record its Amazon Resource Name (ARN). The ARN is used by the following items:

    • Resource element of the IAM policy that is attached to your IAM role
    • Python client that is used to register a snapshot repository
    The following example shows the ARN of an S3 bucket:
    arn:aws:s3:::eric-es-index-backups
  • Create an IAM role
    You must have an IAM role, for which Amazon ES (es.amazonaws.com) is specified in the Service element in its trust relationship. Example:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "es.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

    You can view the trust relationship details in the AWS IAM console.

    View trust relationships
    Note When you create a role in the IAM console, Amazon ES is not included in the Select role type drop-down list. You can select Amazon EC2 from the drop-down list and create the role as prompted. Then, change ec2.amazonaws.com in the trust relationship of the role to es.amazonaws.com.
  • Create an IAM policy
    You must attach an IAM policy to the IAM role. The policy specifies the S3 bucket that is used to store the manual snapshots of your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::eric-es-index-backups"
                ]
            },
            {
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::eric-es-index-backups/*"
                ]
            }
        ]
    }
    1. Copy the policy content to the Edit policy section.
      Edit policy section
    2. Click Policy summary to check whether the policy is correct.
      Policy summary
    3. Attach the policy to the role.
      Attach an IAM policy to an IAM role
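
If you manage several buckets, you can generate the two JSON documents above instead of editing them by hand. The following Python sketch only builds the documents shown in this section from a bucket name. The function names are hypothetical, and the sketch does not call any AWS API.

```python
import json


def bucket_arn(bucket_name):
    """Return the ARN of an S3 bucket, for example arn:aws:s3:::eric-es-index-backups."""
    return "arn:aws:s3:::" + bucket_name


def trust_policy(service="es.amazonaws.com"):
    """Build the trust relationship that lets Amazon ES assume the IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_policy(bucket_name):
    """Build the IAM policy that allows listing the bucket and managing its objects."""
    arn = bucket_arn(bucket_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Action": ["s3:ListBucket"], "Effect": "Allow", "Resource": [arn]},
            {
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [arn + "/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print both documents so that they can be pasted into the IAM console.
    print(json.dumps(trust_policy(), indent=4))
    print(json.dumps(bucket_policy("eric-es-index-backups"), indent=4))
```

Note that the ListBucket statement targets the bucket ARN itself, whereas the object-level actions target the bucket ARN followed by /*.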

Register a manual snapshot repository

You can create manual snapshots only after you register a snapshot repository with Amazon ES. The registration request must be signed with the AWS credentials of an IAM user or role that has access to the IAM role that you created. For more information, see Prerequisites for creating manual snapshots in an Amazon ES domain.

Notice You cannot use a curl command to register a snapshot repository because the command does not support AWS request signing. Instead, use the sample Python client to register a snapshot repository.
  1. Download the Sample Python Client file.
  2. Modify the file.

    Change the values highlighted in yellow in the file based on actual conditions. Then, copy the content into a Python file named snapshot.py.

    The following table describes the variables in the Sample Python Client file.

    Variable | Description
    region | The AWS region where the snapshot repository is created.
    host | The endpoint of your Amazon ES domain.
    aws_access_key_id | The access key ID of your IAM credentials.
    aws_secret_access_key | The secret access key of your IAM credentials.
    path | The name of the snapshot repository.
    data: bucket, region, role_arn | The value must include the name of the S3 bucket, the region of the bucket, and the ARN of the IAM role that you created in Prerequisites for creating manual snapshots in an Amazon ES domain.
    Notice
    • If you want to enable server-side encryption with S3-managed keys for the snapshot repository, add "server_side_encryption": true to the settings JSON object.
    • If the S3 bucket resides in the ap-southeast-1 region, replace "region": "ap-southeast-1" with "endpoint": "s3.amazonaws.com".
  3. Install Amazon Web Services Library boto-2.48.0.
    The preceding sample Python client requires that you install the boto package of version 2.x on the computer where you register your snapshot repository.
    # wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970 
    # tar zxvf boto-2.48.0.tar.gz
    # cd boto-2.48.0
    # python setup.py install
  4. Run the Python client to register the snapshot repository.
    # python snapshot.py
  5. Log on to the Kibana console of your Amazon ES domain. In the left-side navigation pane, click Dev Tools. On the Console tab of the page that appears, run the following command to view the registration result:
    GET _snapshot
    View the registration result
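
The data variable described in step 2 is a JSON document that names the repository type and its settings. The following Python sketch shows one way to assemble that document and the request path. It is not the Sample Python Client: the function names are hypothetical, the role ARN in the example is a placeholder, and the request that carries this body must still be signed as described in the preceding Notice.

```python
import json


def repository_settings(bucket, role_arn, region=None, endpoint=None,
                        server_side_encryption=False):
    """Build the body used to register an S3 snapshot repository with Amazon ES."""
    # Exactly one of region or endpoint is expected, as described in step 2.
    if (region is None) == (endpoint is None):
        raise ValueError("specify exactly one of region or endpoint")
    settings = {"bucket": bucket, "role_arn": role_arn}
    if region is not None:
        settings["region"] = region
    else:
        settings["endpoint"] = endpoint
    if server_side_encryption:
        settings["server_side_encryption"] = True
    return {"type": "s3", "settings": settings}


def registration_request(repository, **settings_kwargs):
    """Return the request path and the JSON body for the registration call."""
    path = "/_snapshot/" + repository
    return path, json.dumps(repository_settings(**settings_kwargs))


if __name__ == "__main__":
    path, body = registration_request(
        "eric-snapshot-repository",
        bucket="eric-es-index-backups",
        # Placeholder ARN: replace it with the ARN of your own IAM role.
        role_arn="arn:aws:iam::123456789012:role/YourSnapshotRole",
        region="ap-southeast-1",
    )
    print(path)
    print(body)
```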

Create the first snapshot and restore data from the snapshot

  1. Create a snapshot in your Amazon ES domain.
    Note You can run the following commands in the Kibana console or by using curl commands in the Linux or Mac OS X command line interface (CLI).
    • Create a snapshot named snapshot_movies_1 for the movies index in the eric-snapshot-repository snapshot repository.
      PUT _snapshot/eric-snapshot-repository/snapshot_movies_1
      {
          "indices": "movies"
      }
    • View snapshot status.
      GET _snapshot/eric-snapshot-repository/snapshot_movies_1
      View snapshot status
    • In the S3 console, view snapshot objects.
      View snapshot objects
  2. Transfer the created snapshot from your S3 bucket to your OSS bucket.

    For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.

    After the snapshot is transferred, view the snapshot in the OSS console.

    Snapshot stored in OSS
  3. Restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.
    1. Create a snapshot repository.
      Log on to the Kibana console of your Elasticsearch cluster. In the left-side navigation pane, click Dev Tools. On the Console tab of the page that appears, run the following command to create a snapshot repository. The name of the snapshot repository must be the same as that of the snapshot repository registered with Amazon ES:
      PUT _snapshot/eric-snapshot-repository
      {
          "type": "oss",
          "settings": {
              "endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",
              "access_key_id": "your AccessKeyID",
              "secret_access_key": "your AccessKeySecret",
              "bucket": "eric-oss-aws-es-snapshot-s3",
              "compress": true
          }
      }
    2. View the status of the snapshot named snapshot_movies_1.
      GET _snapshot/eric-snapshot-repository/snapshot_movies_1
      View snapshots
      Note Record the start time and end time of the snapshot creation operation. This record is used when you use ossimport to migrate data in incremental snapshots. Example:
      • "start_time_in_millis": 1519786844591
      • "end_time_in_millis": 1519786846236
  4. Restore data from the snapshot.
    Log on to the Kibana console of your Elasticsearch cluster. For more information, see Log on to the Kibana console. Then, in the left-side navigation pane, click Dev Tools. On the Console tab of the page that appears, run the following commands to restore the movies index from the snapshot and view the recovery status of the index:
    POST _snapshot/eric-snapshot-repository/snapshot_movies_1/_restore
    {
        "indices": "movies"
    }
    GET movies/_recovery
    After the commands are executed successfully, the movies index contains three documents, which are the same as those in the Amazon ES domain.
    View index data
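
The start time and end time that you record in step 3 can be extracted from the response of the GET _snapshot command with a few lines of Python. The following sketch is a minimal example. The function name is hypothetical, and the sample response is abridged: only the two timestamps come from this topic, and the state value is an assumed example.

```python
import json

# Abridged sample response. Only the two timestamps are taken from this topic.
# The "state" value is an assumed example of a completed snapshot.
SAMPLE_RESPONSE = """
{
  "snapshots": [
    {
      "snapshot": "snapshot_movies_1",
      "indices": ["movies"],
      "state": "SUCCESS",
      "start_time_in_millis": 1519786844591,
      "end_time_in_millis": 1519786846236
    }
  ]
}
"""


def snapshot_summary(response_text, snapshot_name):
    """Return the state and timing of one snapshot from a GET _snapshot response."""
    for snapshot in json.loads(response_text).get("snapshots", []):
        if snapshot.get("snapshot") == snapshot_name:
            start = snapshot["start_time_in_millis"]
            end = snapshot["end_time_in_millis"]
            return {
                "state": snapshot.get("state"),
                "start_time_in_millis": start,
                "end_time_in_millis": end,
                "duration_in_millis": end - start,
            }
    raise KeyError("snapshot not found: " + snapshot_name)


if __name__ == "__main__":
    print(snapshot_summary(SAMPLE_RESPONSE, "snapshot_movies_1"))
```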

Create the final snapshot and restore data from the snapshot

  1. Insert data into the movies index in your Amazon ES domain.

    The movies index contains three documents. Insert two more documents.

    Insert two more documents

    You can run the GET movies/_count command to view the data volume in the index.

  2. Create a snapshot.
    Run the following command to create a snapshot. For more information, see step 1 in the "Create the first snapshot and restore data from the snapshot" section.
    PUT _snapshot/eric-snapshot-repository/snapshot_movies_2
    {
    "indices": "movies"
    }
    After the snapshot is created, run the following command to view the status of the snapshot:
    GET _snapshot/eric-snapshot-repository/snapshot_movies_2

    View objects in your S3 bucket.

    View objects in your S3 bucket
  3. Transfer the snapshot from your S3 bucket to your OSS bucket.

    You can use ossimport to transfer the snapshot from the S3 bucket to the OSS bucket. The S3 bucket now stores the objects of two snapshots. To migrate only the objects of the incremental snapshot, set the isSkipExistFile variable in the local_job.cfg file to true.

    The isSkipExistFile variable specifies whether objects that already exist in the destination are skipped during data migration. The value is of the Boolean type, and the default value is false. If you set the value to true, existing objects are skipped based on their size and LastModifiedTime. If you set the value to false, existing objects are overwritten. If jobType is set to audit, this variable does not take effect.

    Then, you can view the incremental snapshot object in the OSS bucket.

    Incremental snapshot object in the OSS bucket
  4. Restore data from the snapshot.
    For more information, see step 4 in the "Create the first snapshot and restore data from the snapshot" section. Because the movies index already exists in your Elasticsearch cluster, you must close the index before you restore data. After the restoration, you can open the index.
    • Close the movies index
      POST /movies/_close
    • View the status of the movies index
      GET movies/_stats
    • Restore data from the snapshot
      POST _snapshot/eric-snapshot-repository/snapshot_movies_2/_restore
      {
          "indices": "movies"
      }
    • Open the movies index
      POST /movies/_open

    After data is restored from the snapshot, the number of documents in the movies index of your Elasticsearch cluster is 5. This number is the same as that in the index of your Amazon ES domain.

    Data restoration result
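
The effect of the isSkipExistFile variable described in step 3 can be pictured with a small decision function. The following Python sketch is a simplified model and not ossimport code. The class and function names are hypothetical, and the exact comparison (same size, and a destination LastModifiedTime that is not earlier than that of the source) is an assumption. The audit job type is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    size: int             # Object size in bytes.
    last_modified: float  # LastModifiedTime as seconds since the epoch.


def should_copy(source: ObjectInfo, target: Optional[ObjectInfo],
                is_skip_exist_file: bool) -> bool:
    """Decide whether a source object is transferred to the destination bucket."""
    if target is None:
        # The object does not exist in the destination, so it is always copied.
        return True
    if not is_skip_exist_file:
        # isSkipExistFile=false: existing objects are overwritten.
        return True
    # isSkipExistFile=true: skip objects that look unchanged (assumed rule).
    unchanged = (source.size == target.size
                 and target.last_modified >= source.last_modified)
    return not unchanged


if __name__ == "__main__":
    old = ObjectInfo(size=100, last_modified=1000.0)
    print(should_copy(old, old, is_skip_exist_file=True))   # Already migrated object.
    print(should_copy(old, None, is_skip_exist_file=True))  # New incremental object.
```

With the value set to true, the objects of the first snapshot that are already in the OSS bucket are skipped, and only the objects added by the incremental snapshot are transferred.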

Summary

You can use the snapshot and restore feature to migrate data from an Amazon ES domain to an Alibaba Cloud Elasticsearch cluster. Before you restore a snapshot to an index that already exists in the cluster, you must close the index. Before you create the final snapshot, also stop the services that write to the index so that no requests or write operations change the data during the migration.
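
The order of operations for restoring a snapshot to an existing index can be summarized as a list of requests. The following Python sketch only builds the method, path, and body of each request used in this topic. The function name is hypothetical, and sending the requests (for example, from the Kibana console) is left to you.

```python
import json


def restore_sequence(repository, snapshot, index):
    """Return the requests, in order, that restore one index from a snapshot."""
    restore_body = json.dumps({"indices": index})
    return [
        # 1. Close the existing index so that it can be overwritten.
        ("POST", "/%s/_close" % index, None),
        # 2. Restore the index from the snapshot.
        ("POST", "/_snapshot/%s/%s/_restore" % (repository, snapshot), restore_body),
        # 3. Open the index again.
        ("POST", "/%s/_open" % index, None),
        # 4. Compare the document count with that of the Amazon ES domain.
        ("GET", "/%s/_count" % index, None),
    ]


if __name__ == "__main__":
    for method, path, body in restore_sequence(
            "eric-snapshot-repository", "snapshot_movies_2", "movies"):
        print(method, path, body or "")
```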
