
Elasticsearch index migration from AWS to Alibaba Cloud

Last Updated: Feb 21, 2019

Abstract

This document describes Elasticsearch (ES) index migration from AWS to Alibaba Cloud.

A reference architecture diagram is shown in the following figure: [Figure: reference architecture]

Introduction

Concepts

  • Elasticsearch: A distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
  • Kibana: Lets you visualize your Elasticsearch data and navigate the Elastic Stack.
  • Amazon Elasticsearch Service: It’s easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.
  • Alibaba Elasticsearch Service: Alibaba Cloud’s Elasticsearch service. This guide explains how to use Elasticsearch through the Alibaba Cloud China site; the service is not yet available on the International site.
  • Snapshot and Restore: You can store snapshots of individual indexes or an entire cluster in a remote repository like a shared file system, S3, or HDFS. These snapshots are great for backups because they can be restored relatively quickly. However, snapshots can only be restored to versions of Elasticsearch that can read the indexes:

    • A snapshot of an index created in 5.x can be restored to 6.x.
    • A snapshot of an index created in 2.x can be restored to 5.x.
    • A snapshot of an index created in 1.x can be restored to 2.x.

    Conversely, snapshots of indexes created in 1.x cannot be restored to 5.x or 6.x, and snapshots of indexes created in 2.x cannot be restored to 6.x.

    Snapshots are incremental and can contain indexes created in various versions of Elasticsearch. If any index in a snapshot was created in an incompatible version, you will not be able to restore the snapshot.
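
The compatibility rules above can be expressed as a small lookup, which is handy for a pre-migration check. The following is a minimal sketch in Python, not part of any Elasticsearch client. The names RESTORABLE_TO, major_version and can_restore are hypothetical, and treating a restore within the same major version as allowed is an assumption that the list above does not state explicitly.

```python
# Restore paths taken from the list above. Same-major restores are an
# assumption added for illustration; the list itself only names the
# next supported major version.
RESTORABLE_TO = {
    1: {1, 2},
    2: {2, 5},
    5: {5, 6},
}


def major_version(version: str) -> int:
    """Return the major component of a version string such as '5.5.2'."""
    return int(version.split(".")[0])


def can_restore(index_created_in: str, target_cluster: str) -> bool:
    """Check whether an index created in one version can be restored to another."""
    allowed = RESTORABLE_TO.get(major_version(index_created_in), set())
    return major_version(target_cluster) in allowed


if __name__ == "__main__":
    # The versions used later in this guide: AWS ES 5.5.2, Alibaba Cloud ES 5.5.3.
    print(can_restore("5.5.2", "5.5.3"))
```

With the versions used in this guide the check passes, because both clusters are on the 5.x line.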

Solution overview

Elasticsearch (ES) indexes can be migrated with the following steps:

Step 1: Create baseline indexes

  1. Create a snapshot repository and associate it to an AWS S3 Bucket.
  2. Create the first snapshot of the indexes to be migrated, which is a full snapshot.

    The snapshot will be automatically stored in the AWS S3 bucket created in the first step.

  3. Create an OSS Bucket on Alibaba Cloud, and register it to a snapshot repository of an Alibaba Cloud ES instance.

  4. Use the OSSImport tool to pull the data from the AWS S3 bucket into the Alibaba Cloud OSS bucket.

  5. Restore this full snapshot to the Alibaba Cloud ES instance.

Step 2: Periodic incremental snapshots

Take and restore several incremental snapshots, repeating as needed.

Step 3: Final snapshot and service switchover

  1. Stop services which can modify index data.
  2. Create a final incremental snapshot of the AWS ES instance.
  3. Transfer and restore the final incremental snapshot to an Alibaba Cloud ES instance.
  4. Perform service switchover to the Alibaba Cloud ES instance.

Prerequisites

Elasticsearch service

  • The version number of AWS ES is 5.5.2, located in the Singapore region.
  • The version number of Alibaba Cloud ES is 5.5.3, located in Hangzhou.
  • The demo index name is movies.

Manual Snapshot Prerequisites on AWS

Amazon ES takes daily automated snapshots of the primary index shards in a domain, and stores these automated snapshots in a preconfigured Amazon S3 bucket for 14 days at no additional charge to you. You can use these snapshots to restore the domain.

You cannot, however, use automated snapshots to migrate to new domains. Automated snapshots are read-only from within a given domain. For migrations, you must use manual snapshots stored in your own repository (an S3 bucket). Standard S3 charges apply for manual snapshots.

To create and restore index snapshots manually, you must work with IAM and Amazon S3. Verify that you have met the following prerequisites before you attempt to take a snapshot.

Prerequisite and description:

  • S3 bucket: Stores manual snapshots for your Amazon ES domain.
  • IAM role: Delegates permissions to Amazon Elasticsearch Service. The trust relationship for the role must specify Amazon Elasticsearch Service in the Principal statement. The IAM role also is required to register your snapshot repository with Amazon ES. Only IAM users with access to this role may register the snapshot repository.
  • IAM policy: Specifies the actions that Amazon S3 may perform with your S3 bucket. The policy must be attached to the IAM role that delegates permissions to Amazon Elasticsearch Service. The policy must specify an S3 bucket in a Resource statement.

S3 bucket

You need an S3 bucket to store manual snapshots. Make a note of its Amazon Resource Name (ARN). You need it for the following:

  • Resource statement of the IAM policy that is attached to your IAM role.
  • Python client that is used to register a snapshot repository.

The following example shows an ARN for an S3 bucket:

arn:aws:s3:::eric-es-index-backups

IAM role

You must have a role that specifies Amazon Elasticsearch Service, es.amazonaws.com, in a Service statement in its trust relationship, as shown in the following example:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "es.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

In the AWS IAM Console, you can find the trust relationship details on the Trust relationships tab of the role.

When you create an AWS service role by using the IAM Console, Amazon ES is not included in the Select role type list. However, you can still create the role by choosing Amazon EC2, following the steps to create the role, and then editing the role’s trust relationships to es.amazonaws.com instead of ec2.amazonaws.com.

IAM Policy

You must attach an IAM policy to the IAM role. The policy specifies the S3 bucket that is used to store manual snapshots for your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:ListBucket"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::eric-es-index-backups"
          ]
        },
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::eric-es-index-backups/*"
          ]
        }
      ]
    }
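
If you manage several buckets, the same policy document can be generated rather than edited by hand. The following Python sketch builds the policy shown above for a given bucket name. The function name build_snapshot_bucket_policy is hypothetical; the actions and ARN patterns are copied from the example policy.

```python
import json


def build_snapshot_bucket_policy(bucket_name: str) -> dict:
    """Build the IAM policy document shown above for one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [bucket_arn],
            },
            {
                # Object operations apply to every key inside the bucket.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = build_snapshot_bucket_policy("eric-es-index-backups")
    print(json.dumps(policy, indent=2))
```

Note that the bucket-level action uses the bare bucket ARN, while the object-level actions use the ARN with the /* suffix.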

Paste this policy document into the policy editor in the IAM console when you create the policy.

You can make sure the policy is correct by reviewing its Policy summary in the IAM console.

Attach IAM Policy to IAM Role

In the IAM console, attach the IAM policy created above to the IAM role.

Registering a manual snapshot directory

You must register the snapshot directory with Amazon Elasticsearch Service before you can take manual index snapshots. This one-time operation requires that you sign your AWS request with credentials for one of the users or roles specified in the IAM role’s trust relationship, as described in Section Manual snapshot prerequisites on AWS.

You can’t use curl to perform this operation because it doesn’t support AWS request signing. Instead, use the sample Python client to register your snapshot directory.

Modify sample python client

Download a copy of the file “Sample Python Client.docx”, then modify the values in yellow in that document to match your real values. Copy the contents of “Sample Python Client.docx” into a Python file called “snapshot.py” after you have finished editing.

Sample Python Client.docx

Variable name and description:

  • region: AWS Region where you created the snapshot repository.
  • host: Endpoint for your Amazon ES domain.
  • aws_access_key_id: IAM credential.
  • aws_secret_access_key: IAM credential.
  • path: Name of the snapshot repository.
  • data (bucket, region, role_arn): Must include the name of the S3 bucket and the ARN for the IAM role that you created in Section Manual snapshot prerequisites on AWS. To enable server-side encryption with S3-managed keys for the snapshot repository, add "server_side_encryption": true to the settings JSON.

Important: If the S3 bucket is in the us-east-1 region, you need to use "endpoint": "s3.amazonaws.com" in place of "region": "us-east-1".
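
To make the data variable concrete, the following Python sketch assembles the JSON body that the client sends when it registers the repository. It only builds the string and does not sign or send the request. The function name build_repository_body, the account ID and the role name in the example are hypothetical placeholders; the "type": "s3" value and the setting names follow the description above.

```python
import json


def build_repository_body(bucket: str, region: str, role_arn: str,
                          server_side_encryption: bool = False) -> str:
    """Build the JSON body used to register an S3 snapshot repository."""
    settings = {"bucket": bucket, "role_arn": role_arn}
    if region == "us-east-1":
        # us-east-1 uses an endpoint setting in place of the region setting.
        settings["endpoint"] = "s3.amazonaws.com"
    else:
        settings["region"] = region
    if server_side_encryption:
        settings["server_side_encryption"] = True
    return json.dumps({"type": "s3", "settings": settings})


if __name__ == "__main__":
    # Placeholder role ARN; replace it with the ARN of your own IAM role.
    print(build_repository_body(
        bucket="eric-es-index-backups",
        region="ap-southeast-1",
        role_arn="arn:aws:iam::123456789012:role/MyElasticsearchRole",
    ))
```

The resulting string is what you would place in the data value of the sample client.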

Install Amazon Web Services Library boto-2.48.0

This sample Python client requires that you install version 2.x of the boto package on the computer where you register your snapshot repository.

    # wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970
    # tar zxvf boto-2.48.0.tar.gz
    # cd boto-2.48.0
    # python setup.py install

Execute Python client to register snapshot directory

    # python snapshot.py

Registering Snapshot Repository

Check the result in Kibana->Dev Tools with the following request:

    GET _snapshot

Snapshot and restore for the first time

Take a snapshot manually on AWS ES

The following commands are all performed in Kibana->Dev Tools. You can also perform them using curl from the Linux or Mac OS X command line.

  • Take a snapshot with the name snapshot_movies_1 only for the index movies in the repository eric-snapshot-repository.

    PUT _snapshot/eric-snapshot-repository/snapshot_movies_1
    {
      "indices": "movies"
    }

  • Check the snapshot status.

    GET _snapshot/eric-snapshot-repository/snapshot_movies_1

  • Check the snapshot files in the AWS S3 console.

Pull snapshot data from AWS S3 to Alibaba Cloud OSS

In this step, you need to pull snapshot data from your AWS S3 bucket into Alibaba Cloud OSS. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.

After the data transfer completes, check the stored snapshot data in the OSS console.

Restore snapshot to an Alibaba Cloud ES instance

Create snapshot repository

Perform the following request in Kibana->Dev Tools to create a snapshot repository with the same name as the one on AWS ES. Replace the values below with your own.

    PUT _snapshot/eric-snapshot-repository
    {
      "type": "oss",
      "settings": {
        "endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",
        "access_key_id": "Put your AccessKey id here.",
        "secret_access_key": "Put your secret AccessKey here.",
        "bucket": "eric-oss-aws-es-snapshot-s3",
        "compress": true
      }
    }

After creating the snapshot directory, check the status of the snapshot named snapshot_movies_1, which was created in the AWS ES manual snapshot step.

    GET _snapshot/eric-snapshot-repository/snapshot_movies_1

Note: Record the start time and end time of this snapshot operation. You will need them when you transfer incremental snapshot data with the Alibaba Cloud OSSImport tool. For example:

    "start_time_in_millis": 1519786844591

    "end_time_in_millis": 1519786846236
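
The two fields are epoch timestamps in milliseconds, which are hard to read at a glance. The following Python sketch pulls them out of a snapshot status response and converts them to UTC. The helper names millis_to_utc and snapshot_window are hypothetical, and the sample response is trimmed to the fields used here, with the values from the example above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def snapshot_window(response: dict, snapshot_name: str):
    """Return (start, end) as UTC datetimes for the named snapshot."""
    for snap in response["snapshots"]:
        if snap["snapshot"] == snapshot_name:
            return (millis_to_utc(snap["start_time_in_millis"]),
                    millis_to_utc(snap["end_time_in_millis"]))
    raise KeyError(snapshot_name)


# Trimmed sample of a GET _snapshot/<repository>/<snapshot> response.
sample_response = {
    "snapshots": [
        {
            "snapshot": "snapshot_movies_1",
            "start_time_in_millis": 1519786844591,
            "end_time_in_millis": 1519786846236,
        }
    ]
}

if __name__ == "__main__":
    start, end = snapshot_window(sample_response, "snapshot_movies_1")
    print(start.isoformat(), end.isoformat(), end - start)
```

Adding a timedelta to the epoch keeps the millisecond precision exact, which a floating-point division by 1000 would not guarantee.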

Restore snapshots

Perform the following request on Kibana->Dev Tools.

    POST _snapshot/eric-snapshot-repository/snapshot_movies_1/_restore
    {
      "indices": "movies"
    }

    GET movies/_recovery

Check the availability of the index movies in Kibana->Dev Tools. The index contains three records, the same number as on your AWS ES instance.

Snapshot and restore for the last time

Create some sample data on AWS ES index movies

The previous steps showed that the index movies contains only three records, so insert another two records.

You can also check the number of documents in the index with this request: GET movies/_count.

Take another snapshot manually

Follow the steps in Section Take a snapshot manually on AWS ES to take a second snapshot named snapshot_movies_2, then check the snapshot status.

Check the files listed in the S3 bucket.

If you check the folder indices, you can also find some differences.

Pull incremental snapshot data from AWS S3 to Alibaba Cloud OSS

You can use the OSSImport tool to migrate data from S3 to OSS. Because the S3 bucket now holds two snapshots, migrate only the new files by setting isSkipExistFile to true in the configuration file local_job.cfg.

Field, meaning, and description:

  • isSkipExistFile: Whether to skip the existing objects during data migration; a Boolean value. If it is set to true, the objects are skipped according to the size and LastModifiedTime. If it is set to false, the existing objects are overwritten. The default value is false. This option is invalid when jobType is set to audit.
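
If you script the migration, the flag can be flipped programmatically before each incremental run. The following Python sketch rewrites one key in the text of a configuration file. It assumes that local_job.cfg uses simple key=value lines with # comments; the function name set_property and the sample content are hypothetical.

```python
def set_property(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with key set to value, appending the key if absent."""
    lines = cfg_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without an assignment untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt; a real local_job.cfg contains many more settings.
    sample = "# job settings\njobName=local_test\nisSkipExistFile=false\n"
    print(set_property(sample, "isSkipExistFile", "true"))
```

To apply it, read local_job.cfg into a string, pass it through set_property, and write the result back before starting the OSSImport job.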

After the OSSImport migration job completes, only the new files have been migrated to OSS. You can confirm this by comparing the object listing of your Alibaba Cloud OSS bucket with that of the AWS S3 bucket.

Restore an incremental snapshot

Follow the steps in Section Restore snapshots, but close the index movies first, then restore the snapshot, and open the index again after the restore:

    POST /movies/_close

    GET movies/_stats

    POST _snapshot/eric-snapshot-repository/snapshot_movies_2/_restore
    {
      "indices": "movies"
    }

    POST /movies/_open

After the restore procedure completes, you can see the count (5) of documents in the index movies is the same as it is in our AWS ES instance.
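
Because the order of the three calls matters, it can help to generate them in one place when restoring several incremental snapshots. The following Python sketch returns the close, restore and open requests as (method, path, body) tuples and does not send them. The function name incremental_restore_requests is hypothetical; the restore body uses the indices field of the Elasticsearch restore API.

```python
import json


def incremental_restore_requests(repository: str, snapshot: str, index: str):
    """Return the ordered requests for restoring a snapshot over an existing index."""
    return [
        # 1. Close the index so that it can be overwritten by the restore.
        ("POST", f"/{index}/_close", None),
        # 2. Restore only this index from the given snapshot.
        ("POST", f"/_snapshot/{repository}/{snapshot}/_restore",
         json.dumps({"indices": index})),
        # 3. Open the index again once the restore has finished.
        ("POST", f"/{index}/_open", None),
    ]


if __name__ == "__main__":
    for method, path, body in incremental_restore_requests(
            "eric-snapshot-repository", "snapshot_movies_2", "movies"):
        print(method, path, body or "")
```

You can issue each tuple through Kibana->Dev Tools or any HTTP client, and monitor progress with GET movies/_recovery between the restore and open steps.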


Conclusion

It is possible to migrate AWS Elasticsearch service data to Alibaba Cloud’s Elasticsearch service by the snapshot and restore method.

This solution requires that services which write to the AWS ES instance are stopped before the final snapshot, so that no index changes are made during the final transfer and restore.
