This document describes Elasticsearch (ES) index migration from AWS to Alibaba Cloud.
A reference architecture diagram is shown in the following figure:
- Elasticsearch: A distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
- Kibana: Lets you visualize your Elasticsearch data and navigate the Elastic Stack.
- Amazon Elasticsearch Service: Makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full-text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.
- Alibaba Elasticsearch Service: Alibaba Cloud’s Elasticsearch service. In this guide, we explain how to use Elasticsearch through our Alibaba Cloud China site. The service has not yet been launched on the International site.
Snapshot and Restore: You can store snapshots of individual indexes or an entire cluster in a remote repository like a shared file system, S3, or HDFS. These snapshots are great for backups because they can be restored relatively quickly. However, snapshots can only be restored to versions of Elasticsearch that can read the indexes:
- A snapshot of an index created in 5.x can be restored to 6.x.
- A snapshot of an index created in 2.x can be restored to 5.x.
- A snapshot of an index created in 1.x can be restored to 2.x.
Conversely, snapshots of indexes created in 1.x cannot be restored to 5.x or 6.x, and snapshots of indexes created in 2.x cannot be restored to 6.x.
Snapshots are incremental and can contain indexes created in various versions of Elasticsearch. If any index in a snapshot was created in an incompatible version, you will not be able to restore the snapshot.
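The compatibility rules above can be encoded in a small check. This is a sketch, and it makes two assumptions. First, it covers only the major versions this guide mentions (1.x, 2.x, 5.x, 6.x). Second, it treats restoring within the same major version as allowed, as in this guide's 5.5.2 to 5.5.3 demo. A major version outside that list raises ValueError.

```python
from typing import Iterable

# Major versions covered by this guide, in release order (there was no 3.x or 4.x).
RELEASE_ORDER = [1, 2, 5, 6]


def major_of(version: str) -> int:
    """Return the major version of a string such as "5.5.2"."""
    return int(version.split(".")[0])


def index_restorable(created_in: str, target: str) -> bool:
    """True if an index created in `created_in` can be restored to `target`.

    Allowed: the same major version, or the next major in RELEASE_ORDER.
    Raises ValueError for a major version not listed in RELEASE_ORDER.
    """
    src = RELEASE_ORDER.index(major_of(created_in))
    dst = RELEASE_ORDER.index(major_of(target))
    return dst - src in (0, 1)


def snapshot_restorable(index_versions: Iterable[str], target: str) -> bool:
    """A snapshot can be restored only if every index in it is compatible."""
    return all(index_restorable(v, target) for v in index_versions)


# The demo in this guide: AWS ES 5.5.2 -> Alibaba Cloud ES 5.5.3.
print(snapshot_restorable(["5.5.2"], "5.5.3"))  # True
```

Note that restoring to an older major version (for example 6.x to 5.x) is rejected by the same rule.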
Elasticsearch (ES) indexes can be migrated with the following steps:
- Create a snapshot repository and associate it with an AWS S3 bucket.
- Create the first snapshot of the indexes to be migrated. This is a full snapshot, and it is automatically stored in the AWS S3 bucket created in the first step.
- Create an OSS bucket on Alibaba Cloud, and register it as a snapshot repository of an Alibaba Cloud ES instance.
- Use the OSSImport tool to pull the data from the AWS S3 bucket into the Alibaba Cloud OSS bucket.
- Restore this full snapshot to the Alibaba Cloud ES instance.
- Repeat the incremental snapshot and restore steps several times.
- Stop services that can modify index data.
- Create a final incremental snapshot of the AWS ES instance.
- Transfer and restore the final incremental snapshot to the Alibaba Cloud ES instance.
- Perform a service switchover to the Alibaba Cloud ES instance.
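The step sequence above can be written out as an ordered plan. `migration_plan` is a hypothetical helper and is not part of any tool. It assumes snapshots are numbered `snapshot_<index>_<n>`, following this guide's `snapshot_movies_1`; the later numbers are illustrative. It also closes the target index before each incremental restore and reopens it afterwards, as the incremental restore procedure in this guide does.

```python
def migration_plan(index, incremental_rounds):
    """Return the migration steps, in order, as human-readable strings."""
    steps = [
        "AWS ES: register an S3 snapshot repository",
        "Alibaba Cloud ES: register an OSS snapshot repository with the same name",
    ]
    final = incremental_rounds + 2  # snapshot 1 is full, the last one is final
    for n in range(1, final + 1):
        name = f"snapshot_{index}_{n}"
        if n == final:
            # Freeze writes so the final snapshot captures everything.
            steps.append("stop services that modify index data")
        steps.append(f"AWS ES: create snapshot {name}")
        # From round 2 on, OSSImport should skip files already copied.
        steps.append(f"OSSImport: copy {name} files from S3 to OSS")
        if n > 1:
            steps.append(f"Alibaba Cloud ES: close {index}")
        steps.append(f"Alibaba Cloud ES: restore {name}")
        if n > 1:
            steps.append(f"Alibaba Cloud ES: open {index}")
    steps.append("switch services over to the Alibaba Cloud ES instance")
    return steps


for step in migration_plan("movies", incremental_rounds=1):
    print(step)
```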
The demo environment in this guide is as follows:
- The AWS ES version is 5.5.2, located in the Singapore region.
- The Alibaba Cloud ES version is 5.5.3, located in the China (Hangzhou) region.
- The demo index is named movies.
Amazon ES takes daily automated snapshots of the primary index shards in a domain, and stores these automated snapshots in a preconfigured Amazon S3 bucket for 14 days at no additional charge to you. You can use these snapshots to restore the domain.
You cannot, however, use automated snapshots to migrate to new domains. Automated snapshots are read-only from within a given domain. For migrations, you must use manual snapshots stored in your own repository (an S3 bucket). Standard S3 charges apply for manual snapshots.
To create and restore index snapshots manually, you must work with IAM and Amazon S3. Verify that you have met the following prerequisites before you attempt to take a snapshot.
|Prerequisite|Description|
|---|---|
|S3 bucket|Stores manual snapshots for your Amazon ES domain.|
|IAM role|Delegates permissions to Amazon Elasticsearch Service. The trust relationship for the role must specify Amazon Elasticsearch Service in the Principal statement. The IAM role is also required to register your snapshot repository with Amazon ES. Only IAM users with access to this role may register the snapshot repository.|
|IAM policy|Specifies the actions that Amazon S3 may perform with your S3 bucket. The policy must be attached to the IAM role that delegates permissions to Amazon Elasticsearch Service. The policy must specify an S3 bucket in a Resource statement.|
You need an S3 bucket to store manual snapshots. Make a note of its Amazon Resource Name (ARN). You need it for the following:
- Resource statement of the IAM policy that is attached to your IAM role.
- Python client that is used to register a snapshot repository.
The following example shows an ARN for an S3 bucket:
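An S3 bucket ARN has the form `arn:aws:s3:::<bucket-name>`. The helper below builds it. Three assumptions apply: the bucket name `eric-es-snapshots` is hypothetical, the name check is a simplified version of the S3 naming rules, and the `arn:aws` prefix assumes the standard AWS partition (which includes the Singapore region used here).

```python
import re

# Simplified S3 bucket-name check: 3-63 chars of lowercase letters, digits, dots, hyphens.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def s3_bucket_arn(bucket: str) -> str:
    """Return the ARN of an S3 bucket, for example arn:aws:s3:::my-bucket."""
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    return f"arn:aws:s3:::{bucket}"


print(s3_bucket_arn("eric-es-snapshots"))  # arn:aws:s3:::eric-es-snapshots
```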
You must have a role that specifies Amazon Elasticsearch Service (es.amazonaws.com) in a Service statement in its trust relationship, as shown in the following example:
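The trust relationship is a JSON policy document. The sketch below builds the standard form, which names `es.amazonaws.com` as the principal allowed to call `sts:AssumeRole`. The Python wrapper is only a convenience for printing the JSON you paste into the IAM Console.

```python
import json


def es_trust_policy() -> dict:
    """Trust policy that lets Amazon ES (es.amazonaws.com) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(es_trust_policy(), indent=2))
```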
In the AWS IAM Console, you can find Trust Relationship details here:
When you create an AWS service role by using the IAM Console, Amazon ES is not included in the Select role type list. However, you can still create the role by choosing Amazon EC2, following the steps to create the role, and then editing the role’s trust relationship to specify es.amazonaws.com instead of
You must attach an IAM policy to the IAM role. The policy specifies the S3 bucket that is used to store manual snapshots for your Amazon ES domain. The following example specifies the ARN of the
Paste the policy here:
You can make sure the policy is correct by looking at the Policy summary, as follows:
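Here is a minimal sketch of the permissions policy. It assumes the snapshot role only needs to list the bucket and to read, write, and delete objects in it, which are the operations used on snapshot files. The bucket name is hypothetical. Replace it with your own and compare the result against the policy summary.

```python
import json


def snapshot_bucket_policy(bucket: str) -> dict:
    """IAM permissions policy that lets the snapshot role use one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


# Hypothetical bucket name; paste the printed JSON into the IAM policy editor.
print(json.dumps(snapshot_bucket_policy("eric-es-snapshots"), indent=2))
```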
You must register the snapshot directory with Amazon Elasticsearch Service before you can take manual index snapshots. This one-time operation requires that you sign your AWS request with credentials for one of the users or roles specified in the IAM role’s trust relationship, as described in Section Manual snapshot prerequisites on AWS.
You can’t use curl to perform this operation because it doesn’t support AWS request signing. Instead, use the sample Python client to register your snapshot directory.
Download a copy of the file “Sample Python Client.docx” and modify the values highlighted in yellow to match your environment. When you have finished editing, copy its contents into a Python file named “snapshot.py”.
|Variable|Description|
|---|---|
|region|AWS Region where you created the snapshot repository|
|host|Endpoint for your Amazon ES domain|
|path|Name of the snapshot repository|
|data|Must include the name of the S3 bucket and the ARN for the IAM role that you created in Section Manual snapshot prerequisites on AWS. To enable server-side encryption with S3-managed keys for the snapshot repository, add|
If the S3 bucket is in the us-east-1 region, you need to use
This sample Python client requires that you install version 2.x of the boto package on the computer where you register your snapshot repository.
# wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970
# tar zxvf boto-2.48.0.tar.gz
# cd boto-2.48.0
# python setup.py install
# python snapshot.py
Registering Snapshot Repository
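The `path` and `data` values you edit in snapshot.py can be built and checked before you run the script. The sketch below produces a repository registration body of type `s3` with `bucket`, `region`, and `role_arn` settings. The bucket name and role ARN are hypothetical. `ap-southeast-1` is the Singapore region used by the demo domain.

```python
import json

REPOSITORY = "eric-snapshot-repository"  # repository name used in this guide


def s3_repository_request(bucket, region, role_arn, repository=REPOSITORY):
    """Return (path, data) for registering an S3 snapshot repository on Amazon ES."""
    path = f"/_snapshot/{repository}"
    data = json.dumps({
        "type": "s3",
        "settings": {
            "bucket": bucket,      # S3 bucket that stores the manual snapshots
            "region": region,      # region of that bucket
            "role_arn": role_arn,  # IAM role with the trust relationship and policy
        },
    })
    return path, data


path, data = s3_repository_request(
    bucket="eric-es-snapshots",  # hypothetical bucket name
    region="ap-southeast-1",     # Singapore
    role_arn="arn:aws:iam::123456789012:role/MyElasticsearchRole",  # hypothetical ARN
)
print(path)
print(data)
```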
Check the result in Kibana->Dev Tools with this request:
The following commands are all performed in Kibana->Dev Tools. You can also run them with curl from the Linux or macOS command line.
- Take a snapshot named snapshot_movies_1 of only the index movies in the repository eric-snapshot-repository:
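If you prefer a script to Kibana, the snapshot request can be built with Python's standard library. This is a sketch: the domain endpoint is hypothetical, and the request is only constructed here, not sent. `wait_for_completion=true` makes the call block until the snapshot finishes. `include_global_state: false` is a choice made for this sketch so that only the listed index is captured.

```python
import json
import urllib.request


def create_snapshot_request(es_url, repository, snapshot, indices):
    """Build (but do not send) PUT _snapshot/<repository>/<snapshot>."""
    body = {
        "indices": indices,             # only snapshot the listed index(es)
        "ignore_unavailable": True,
        "include_global_state": False,  # skip cluster-wide state
    }
    url = (f"{es_url.rstrip('/')}/_snapshot/{repository}/{snapshot}"
           "?wait_for_completion=true")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_snapshot_request(
    "https://search-example.ap-southeast-1.es.amazonaws.com",  # hypothetical endpoint
    "eric-snapshot-repository", "snapshot_movies_1", "movies")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it, if the domain's access policy allows.
```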
- Check snapshot status
GET _snapshot/eric-snapshot-repository/snapshot_movies_1
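The status response lists each snapshot with a `state` and a `shards` summary. The sketch below treats a snapshot as done only when its state is `SUCCESS` and no shard failed. The sample response is abridged, and its values are illustrative.

```python
def snapshot_succeeded(status: dict, snapshot: str) -> bool:
    """Check a GET _snapshot/<repo>/<snapshot> response for a clean finish."""
    for snap in status.get("snapshots", []):
        if snap.get("snapshot") == snapshot:
            shards = snap.get("shards", {})
            return snap.get("state") == "SUCCESS" and shards.get("failed", 0) == 0
    return False  # snapshot not present in the response


# Abridged, illustrative response.
example = {"snapshots": [{
    "snapshot": "snapshot_movies_1",
    "indices": ["movies"],
    "state": "SUCCESS",
    "shards": {"total": 5, "failed": 0, "successful": 5},
}]}
print(snapshot_succeeded(example, "snapshot_movies_1"))  # True
```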
- Check snapshot files on the AWS S3 console
In this step, you need to pull snapshot data from your AWS S3 bucket into Alibaba Cloud OSS. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.
After the transfer completes, check the stored snapshot data in the OSS console:
On the Alibaba Cloud ES instance, perform the following request in Kibana->Dev Tools to create a snapshot repository with the same name. Modify the values to match your environment:
"access_key_id": "Put your AccessKey id here.",
"secret_access_key": "Put your secret AccessKey here.",
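The repository body for the Alibaba Cloud side can be sketched the same way. Only `access_key_id` and `secret_access_key` appear in this guide. The `endpoint`, `bucket`, and `compress` setting names are assumptions based on Alibaba Cloud's OSS repository plugin, so check them against the Alibaba Cloud ES documentation for your version. The endpoint, bucket name, and environment variable names are hypothetical.

```python
import json
import os


def oss_repository_body(endpoint, bucket, access_key_id, secret_access_key):
    """JSON body for PUT _snapshot/<repository> backed by an OSS bucket (assumed settings)."""
    return json.dumps({
        "type": "oss",
        "settings": {
            "endpoint": endpoint,  # assumed: OSS endpoint in the instance's region
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "bucket": bucket,      # assumed: bucket that OSSImport copies into
            "compress": True,      # assumed: compress snapshot metadata
        },
    })


body = oss_repository_body(
    endpoint="http://oss-cn-hangzhou-internal.aliyuncs.com",  # assumed Hangzhou endpoint
    bucket="eric-es-snapshots",                               # hypothetical bucket
    # Hypothetical variable names; avoid hard-coding keys in scripts.
    access_key_id=os.environ.get("OSS_ACCESS_KEY_ID", "Put your AccessKey id here."),
    secret_access_key=os.environ.get("OSS_ACCESS_KEY_SECRET", "Put your secret AccessKey here."),
)
print(body)
```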
After creating the snapshot repository, check the status of the snapshot snapshot_movies_1, which was created in the AWS ES manual snapshot step.
Note: Record the start time and end time of this snapshot operation. You will need them when you transfer incremental snapshot data with the Alibaba Cloud OSSImport tool. For example:
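In Elasticsearch 5.x, each entry in the snapshot status response includes `start_time_in_millis` and `end_time_in_millis`. The sketch below converts them to UTC datetimes that are easy to record. `timestamp()` gives Unix seconds, which may be useful if your OSSImport job filters by time; treat that use as an assumption and check the OSSImport documentation. The sample values are illustrative.

```python
from datetime import datetime, timezone


def snapshot_window(snap: dict):
    """Return (start, end) of one snapshot entry as UTC datetimes."""
    start = datetime.fromtimestamp(snap["start_time_in_millis"] / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(snap["end_time_in_millis"] / 1000, tz=timezone.utc)
    return start, end


# Illustrative entry from GET _snapshot/eric-snapshot-repository/snapshot_movies_1.
snap = {"start_time_in_millis": 1514764800000, "end_time_in_millis": 1514764805000}
start, end = snapshot_window(snap)
print(start.isoformat(), end.isoformat())  # 2018-01-01T00:00:00+00:00 2018-01-01T00:00:05+00:00
print(int(start.timestamp()))              # 1514764800
```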
Perform the following request in Kibana->Dev Tools to restore the snapshot.
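For a scripted restore of the first (full) snapshot, the request can be built as in the sketch below. The Alibaba Cloud ES endpoint is hypothetical, the request is only constructed here, and you would add whatever authentication your instance requires before sending it.

```python
import json
import urllib.request


def restore_request(es_url, repository, snapshot, indices):
    """Build (but do not send) POST _snapshot/<repository>/<snapshot>/_restore."""
    url = (f"{es_url.rstrip('/')}/_snapshot/{repository}/{snapshot}/_restore"
           "?wait_for_completion=true")
    body = {"indices": indices, "include_global_state": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = restore_request(
    "http://es-cn-example.elasticsearch.aliyuncs.com:9200",  # hypothetical endpoint
    "eric-snapshot-repository", "snapshot_movies_1", "movies")
print(req.get_method(), req.full_url)
```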
Check the availability of the index movies in Kibana->Dev Tools. The index contains three records, the same number as on your AWS ES instance.
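The count check can be automated by comparing the `GET movies/_count` responses from both clusters. In this sketch, a response that reports failed shards is treated as unreliable rather than compared. The sample responses are illustrative.

```python
def doc_count(count_response: dict) -> int:
    """Extract the document count from a GET <index>/_count response."""
    failed = count_response.get("_shards", {}).get("failed", 0)
    if failed:
        raise RuntimeError(f"{failed} shard(s) failed; the count may be incomplete")
    return count_response["count"]


def counts_match(aws_response: dict, alibaba_response: dict) -> bool:
    """True if both clusters report the same number of documents."""
    return doc_count(aws_response) == doc_count(alibaba_response)


# Abridged responses shaped like GET movies/_count output (values illustrative).
aws = {"count": 3, "_shards": {"total": 5, "successful": 5, "failed": 0}}
alibaba = {"count": 3, "_shards": {"total": 5, "successful": 5, "failed": 0}}
print(counts_match(aws, alibaba))  # True
```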
The previous steps showed that the index movies contains only three records. Now insert two more records on the AWS ES instance.
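Test documents can be inserted with one `_bulk` request, whose body is newline-delimited JSON with a trailing newline. In this sketch, the type name `movie` and the document fields are hypothetical, so use your index's real mapping. Elasticsearch 5.x still requires a `_type` in each action line.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build an NDJSON body for POST _bulk that indexes docs keyed by id."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


body = bulk_index_body("movies", "movie", {  # hypothetical type and documents
    "4": {"title": "Hypothetical Movie A", "year": 2001},
    "5": {"title": "Hypothetical Movie B", "year": 2002},
})
print(body, end="")
# Send as POST _bulk with Content-Type: application/x-ndjson.
```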
You can also see the number of indexes using this request:
Take another snapshot as described in Section Take a snapshot manually on AWS ES, then check the snapshot status:
Check the files listed in the S3 bucket:
If you check the indices folder, you can also see some differences.
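The differences between the two listings are what the incremental transfer needs to copy. Here is a small sketch that compares two object listings (key to size) and returns the keys that are new or changed. The object keys and sizes in the example are hypothetical, and deleted objects are not reported.

```python
def new_objects(before: dict, after: dict) -> list:
    """Keys present in `after` but not in `before`, or whose size changed."""
    return sorted(key for key, size in after.items() if before.get(key) != size)


# Hypothetical listings of the S3 bucket before and after the second snapshot.
before = {"index-0": 100, "snap-a.dat": 200}
after = {"index-0": 100, "index-1": 150, "snap-a.dat": 200, "snap-b.dat": 210}
print(new_objects(before, after))  # ['index-1', 'snap-b.dat']
```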
You can use the OSSImport tool to migrate data from S3 to OSS. Because the S3 bucket now contains two snapshots, migrate only the new files by setting isSkipExistFile to true in the configuration file local_job.cfg.
|Parameter|Description|
|---|---|
|isSkipExistFile|Whether to skip the existing objects during data migration, a Boolean value. If it is set to true, the objects are skipped according to the size and LastModifiedTime. If it is set to false, the existing objects are overwritten. The default value is false. This option is invalid when jobType is set to audit.|
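This sketch assumes local_job.cfg holds one `key=value` pair per line, as OSSImport's sample configuration does. With that format, the change can be scripted instead of edited by hand. `set_cfg_value` is a hypothetical helper, and the `jobName` line in the example is illustrative.

```python
def set_cfg_value(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with `key=value` set, replacing an existing line if present."""
    lines = cfg_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:  # commented-out lines (#key=...) do not match
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")   # key missing: append it
    return "\n".join(lines) + "\n"


sample = "jobName=local_test\nisSkipExistFile=false\n"
print(set_cfg_value(sample, "isSkipExistFile", "true"), end="")
```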
After the OSSImport migration job completes, you can see that only the new files were migrated to OSS.
In the Alibaba Cloud OSS bucket:
In the AWS S3 bucket:
Follow the steps in Section Restore snapshots, but close the index movies first, then restore the snapshot, and open the index again after the restore completes:
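Elasticsearch refuses to restore over an open index with the same name, which is why the index is closed first. The close, restore, and open sequence can be built as three requests, sent in order. In this sketch, the endpoint and the snapshot name `snapshot_movies_2` are hypothetical. The Elasticsearch restore documentation states that restore normally reopens the indices it restores, so the final `_open` mirrors this guide as a safeguard.

```python
import json
import urllib.request


def incremental_restore_requests(es_url, repository, snapshot, index):
    """Build (but do not send) the close -> restore -> open requests, in order."""
    base = es_url.rstrip("/")
    body = json.dumps({"indices": index}).encode("utf-8")
    return [
        urllib.request.Request(f"{base}/{index}/_close", method="POST"),
        urllib.request.Request(
            f"{base}/_snapshot/{repository}/{snapshot}/_restore?wait_for_completion=true",
            data=body, headers={"Content-Type": "application/json"}, method="POST"),
        urllib.request.Request(f"{base}/{index}/_open", method="POST"),
    ]


reqs = incremental_restore_requests(
    "http://es-cn-example.elasticsearch.aliyuncs.com:9200",  # hypothetical endpoint
    "eric-snapshot-repository", "snapshot_movies_2", "movies")  # hypothetical snapshot name
for r in reqs:
    print(r.get_method(), r.full_url)
```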
After the restore completes, the document count of the index movies (5) is the same as on the AWS ES instance.
It is possible to migrate AWS Elasticsearch service data to Alibaba Cloud’s Elasticsearch service by the snapshot and restore method.
This solution requires stopping the services that modify index data on the AWS ES instance before the final incremental snapshot, so that no writes occur during the switchover.