Elasticsearch: Use OSS to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster

Last Updated: May 07, 2024

You can use the snapshots that are stored in Object Storage Service (OSS) to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster. To migrate data, call the snapshot API to create a snapshot for the self-managed Elasticsearch cluster and store the snapshot in OSS. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster. This topic describes the procedure in detail.

Background information

OSS allows you to migrate large volumes of data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster.

Procedure

  1. Step 1: Make preparations

    Prepare a self-managed Elasticsearch cluster, and create an OSS bucket and an Alibaba Cloud Elasticsearch cluster.

  2. Step 2: Install the elasticsearch-repository-oss plug-in

    Install the elasticsearch-repository-oss plug-in on each node of the self-managed Elasticsearch cluster. You can create an OSS repository for the self-managed Elasticsearch cluster only after the plug-in is installed.

  3. Step 3: Create a snapshot repository for the self-managed Elasticsearch cluster

    Call the snapshot API to create a snapshot repository for the self-managed Elasticsearch cluster.

  4. Step 4: Create a snapshot for specific indexes

    Create a snapshot for the indexes that you want to migrate and store the snapshot in the created snapshot repository.

  5. Step 5: Create the same snapshot repository for the Alibaba Cloud Elasticsearch cluster

    In the Kibana console of the Alibaba Cloud Elasticsearch cluster, call the snapshot API to create a snapshot repository for the cluster. The snapshot repository must have the same name as the snapshot repository for the self-managed Elasticsearch cluster.

  6. Step 6: Restore data to the Alibaba Cloud Elasticsearch cluster from the created snapshot

    Restore data from the snapshot in the snapshot repository of the self-managed Elasticsearch cluster to the Alibaba Cloud Elasticsearch cluster.

  7. Step 7: View restoration results

    View the restored indexes and the data in the indexes.

Step 1: Make preparations

  1. Prepare a self-managed Elasticsearch cluster.

    We recommend that you deploy an Elasticsearch cluster on Alibaba Cloud Elastic Compute Service (ECS) instances. For more information, see Installing and Running Elasticsearch.

    Note

    We recommend that you use Alibaba Cloud ECS instances in the same virtual private cloud (VPC) as the Alibaba Cloud Elasticsearch cluster to deploy a self-managed Elasticsearch cluster. If you use self-managed servers to deploy a cluster, a network connectivity issue may occur.

    In this example, a single-node Elasticsearch V6.7.0 cluster is used. In actual production, you can purchase multiple ECS instances that reside in the same VPC to deploy a multi-node Elasticsearch cluster. For more information about how to purchase an ECS instance, see Create an instance on the Custom Launch tab.

  2. Activate OSS, and create a bucket in the region where the ECS instance that hosts the self-managed Elasticsearch cluster resides.

    For more information, see Activate OSS and Create a bucket.

    Important

    The storage class of the bucket must be Standard. Elasticsearch does not support the Archive storage class.

  3. Create an Alibaba Cloud Elasticsearch cluster in the region where the created bucket resides.

Step 2: Install the elasticsearch-repository-oss plug-in

  1. Connect to the ECS instance that hosts the self-managed Elasticsearch cluster.

    For more information, see Connect to a Linux instance by using a password or key.

    Note

    In this example, a regular user is used.

  2. Download the installation package of the elasticsearch-repository-oss plug-in.

    In this example, the version of the plug-in is V6.7.0, which requires JDK 11.0 or later.

    wget https://github.com/aliyun/elasticsearch-repository-oss/releases/download/v6.7.0/elasticsearch-repository-oss-6.7.0.zip

    Note

    For more information about how to obtain the installation package of the elasticsearch-repository-oss plug-in of another version, see FAQ.

  3. Decompress the installation package to the plugins folder in the installation path for the self-managed Elasticsearch cluster on the ECS instance.

    sudo unzip -d /usr/local/elasticsearch-6.7.0/plugins/elasticsearch-repository-oss elasticsearch-repository-oss-6.7.0.zip

    You can also use a command to install the plug-in.

    sudo ./bin/elasticsearch-plugin install file:///usr/local/elasticsearch-repository-oss-6.7.0.zip

  4. Start the self-managed Elasticsearch cluster on the ECS instance.

    cd /usr/local/elasticsearch-6.7.0
    ./bin/elasticsearch -d
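
Before you create the repository, it can help to confirm that the plug-in was loaded. The following is a minimal sketch, assuming that Elasticsearch listens on localhost:9200 and that the plug-in name reported by the _cat/plugins API contains the string repository-oss. The function name has_oss_plugin is hypothetical.

```shell
# Hypothetical helper: reads the text output of GET /_cat/plugins from stdin
# and succeeds only if at least one line mentions the OSS repository plug-in.
# Assumption: the reported plug-in name contains "repository-oss".
has_oss_plugin() {
  grep -q 'repository-oss'
}

# Usage on the ECS instance (not run here):
#   curl -s localhost:9200/_cat/plugins | has_oss_plugin && echo "plug-in loaded"
```

In a multi-node cluster, the _cat/plugins output contains one line per node and plug-in, so you may want to check each node name separately.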

Step 3: Create a snapshot repository for the self-managed Elasticsearch cluster

Connect to the ECS instance that hosts the self-managed Elasticsearch cluster and run the following command to create a snapshot repository:

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_snapshot/<yourBackupName>" -d '
{
  "type": "oss",
  "settings": {
    "endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",
    "access_key_id": "<yourAccesskeyId>",
    "secret_access_key": "<yourAccesskeySecret>",
    "bucket": "<yourBucketName>",
    "compress": true
  }
}'

Parameters:

  • <yourBackupName>: The name of the repository. You can specify a custom name.

  • type: The type of the repository. Set this parameter to oss.

  • endpoint: The endpoint of your OSS bucket. For more information, see Regions and endpoints.

    Note

    If the ECS instance that hosts the self-managed Elasticsearch cluster resides in the same region as your OSS bucket, use the internal endpoint of the OSS bucket. Otherwise, use the public endpoint of the OSS bucket.

  • access_key_id: The AccessKey ID of the Alibaba Cloud account that is used to create the OSS bucket. For more information about how to obtain the AccessKey ID, see How do I obtain an AccessKey pair?

  • secret_access_key: The AccessKey secret of the Alibaba Cloud account that is used to create the OSS bucket. For more information about how to obtain the AccessKey secret, see How do I obtain an AccessKey pair?

  • bucket: The name of the OSS bucket.

  • compress: Specifies whether to enable compression. Valid values: true (compression is enabled) and false (compression is disabled).

If the repository is created, "acknowledged":true is returned.
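
To keep the AccessKey pair out of the command line, the request body can be generated from variables and piped to curl. The following is a minimal sketch, not the documented procedure: the helper oss_repo_body and the environment variables OSS_AK_ID and OSS_AK_SECRET are hypothetical, and the values are assumed to contain no double quotation marks or backslashes.

```shell
# Hypothetical helper: prints the repository definition as JSON.
# Arguments: endpoint, AccessKey ID, AccessKey secret, bucket name.
# Assumption: none of the values contain characters that need JSON escaping.
oss_repo_body() {
  printf '{"type":"oss","settings":{"endpoint":"%s","access_key_id":"%s","secret_access_key":"%s","bucket":"%s","compress":true}}' \
    "$1" "$2" "$3" "$4"
}

# Usage (not run here); OSS_AK_ID and OSS_AK_SECRET are assumed environment variables:
#   oss_repo_body "http://oss-cn-hangzhou-internal.aliyuncs.com" "$OSS_AK_ID" "$OSS_AK_SECRET" "<yourBucketName>" |
#     curl -H "Content-Type: application/json" -XPUT "localhost:9200/_snapshot/<yourBackupName>" -d @-
#   curl -XPOST "localhost:9200/_snapshot/<yourBackupName>/_verify?pretty"
```

The second command in the usage comment calls the repository verification API of Elasticsearch, which checks whether the nodes can access the repository.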

Step 4: Create a snapshot for specific indexes

Create a snapshot for indexes whose data you want to migrate. By default, all indexes in the open state are backed up in the snapshot. If you do not want to back up system indexes, such as indexes whose names start with .kibana, .security, or .monitoring, you can specify the indexes that you want to back up.

Important

We recommend that you do not back up system indexes because they occupy large storage space.

curl -H "Content-Type: application/json" -XPUT "localhost:9200/_snapshot/<yourBackupName>/snapshot_1?pretty" -d'
{
"indices": "index1,index2"
}'

Replace <yourBackupName> with the name of the snapshot repository that you created in Step 3: Create a snapshot repository for the self-managed Elasticsearch cluster. Replace index1 and index2 with the names of the indexes that you want to back up. If the request is accepted, "accepted" : true is returned, and the snapshot is created in the background.

During snapshot creation, you can run the GET /_snapshot/<yourBackupName>/<yourSnapshotName>/_status command to view the details of the snapshot. If the value of state in the response is SUCCESS, the snapshot is created.
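
The status check can be scripted so that the migration continues only after the snapshot is complete. The following is a minimal sketch, assuming that the response describes a single snapshot. The helper snapshot_state is hypothetical.

```shell
# Hypothetical helper: extracts the snapshot "state" value from a snapshot
# status response read from stdin. Tolerates the spacing added by ?pretty.
# Assumption: the response describes a single snapshot.
snapshot_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Usage (not run here): print the current state of snapshot_1.
#   curl -s "localhost:9200/_snapshot/<yourBackupName>/snapshot_1/_status" | snapshot_state
```

If you poll in a loop, stop when the state is no longer an in-progress value, because a failed snapshot never reaches SUCCESS.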

Step 5: Create the same snapshot repository for the Alibaba Cloud Elasticsearch cluster

  1. Log on to the Kibana console of your Elasticsearch cluster and go to the homepage of the Kibana console as prompted.
    For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    Note In this example, an Elasticsearch V6.7.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.
  2. In the left-side navigation pane of the page that appears, click Dev Tools.
  3. On the Console tab of the page that appears, run the following command to create a snapshot repository that has the same name as the snapshot repository for the self-managed Elasticsearch cluster.

    PUT _snapshot/<yourBackupName>
    {
        "type": "oss",
        "settings": {
            "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
            "access_key_id": "<yourAccesskeyId>",
            "secret_access_key": "<yourAccesskeySecret>",
            "bucket": "<yourBucketName>",
            "compress": true
        }
    }

    Replace <yourBackupName> and <yourBucketName> with the repository name and bucket name that you specified in Step 3: Create a snapshot repository for the self-managed Elasticsearch cluster.
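
Before you restore data, you may want to confirm that the new repository can read the snapshot that was written by the self-managed cluster. In the Kibana console, GET _cat/snapshots/<yourBackupName>?v lists the snapshots in the repository. The following is a minimal sketch of the same check from a shell. The helper snapshot_visible is hypothetical, and the endpoint, user name, and password in the usage comment are placeholders that you must replace with the values of your Alibaba Cloud Elasticsearch cluster.

```shell
# Hypothetical helper: reads the output of GET _cat/snapshots/<repository>
# from stdin and succeeds if a snapshot with the given ID is listed as SUCCESS.
# Relies on the first two columns of the output being the ID and the status.
snapshot_visible() {
  awk -v id="$1" '$1 == id && $2 == "SUCCESS" { found = 1 } END { exit !found }'
}

# Usage (not run here); endpoint and credentials are placeholders:
#   curl -s -u "elastic:<yourPassword>" "http://<yourClusterEndpoint>:9200/_cat/snapshots/<yourBackupName>" |
#     snapshot_visible snapshot_1 && echo "snapshot_1 is ready to restore"
```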

Step 6: Restore data to the Alibaba Cloud Elasticsearch cluster from the created snapshot

In the Kibana console of the Alibaba Cloud Elasticsearch cluster, run the following command to restore all indexes in the snapshot except the system indexes whose names match .monitoring* or .security_audit*. To open the console, follow the instructions in Step 5: Create the same snapshot repository for the Alibaba Cloud Elasticsearch cluster. In this example, the snapshot repository is named es_backup and the snapshot is named snapshot_1.

POST _snapshot/es_backup/snapshot_1/_restore
{"indices":"*,-.monitoring*,-.security_audit*","ignore_unavailable":"true"}

If the command is successfully run, "accepted" : true is returned.

The preceding command restores every index in the snapshot that is not explicitly excluded. You can also specify the indexes that you want to restore. If the Alibaba Cloud Elasticsearch cluster already contains an index that has the same name as an index you want to restore, and you do not want to overwrite the data in the existing index, rename the index during the restoration.

POST _snapshot/es_backup/snapshot_1/_restore
{
  "indices":"index1",
  "rename_pattern": "index(.+)",
  "rename_replacement": "restored_index_$1"
}

Note

For more information about the commands that are used to create snapshots or restore data, see Create manual snapshots and restore data from manual snapshots.
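
To see how rename_pattern and rename_replacement interact, you can preview the substitution locally. The following sketch approximates the rule with sed. Elasticsearch applies Java regular expressions, so this is an illustration rather than the exact engine, and preview_rename is a hypothetical helper.

```shell
# Hypothetical helper: previews the restored name for an index, given the
# pattern index(.+) and the replacement restored_index_$1 from the example.
# sed writes the capture group as \1 where Elasticsearch uses $1.
preview_rename() {
  printf '%s\n' "$1" | sed -E 's/index(.+)/restored_index_\1/'
}

preview_rename index1   # prints restored_index_1
preview_rename index2   # prints restored_index_2
preview_rename logs     # prints logs (no match, so the name is unchanged)
```

The capture group (.+) holds whatever follows the literal text index, so index1 is restored as restored_index_1 and the existing index1 is left untouched.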

Step 7: View restoration results

In the Kibana console of the Alibaba Cloud Elasticsearch cluster, run the following commands to view the restoration results. To open the console, follow the instructions in Step 5: Create the same snapshot repository for the Alibaba Cloud Elasticsearch cluster.

  • View the restored indexes

    GET /_cat/indices?v

    The response lists the restored indexes.

  • View the data in the restored indexes

    GET /index1/_search

    If the command is successfully run, the following result is returned:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "index1",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "productName" : "testpro",
              "annual_rate" : "3.22%",
              "describe" : "testpro"
            }
          }
        ]
      }
    }
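
In addition to inspecting individual documents, you can compare the number of documents in each index on the source and destination clusters. The following is a minimal sketch that uses the _count API. The helper doc_count is hypothetical, and the endpoint, user name, and password of the Alibaba Cloud Elasticsearch cluster in the usage comment are placeholders.

```shell
# Hypothetical helper: extracts the "count" value from a _count API
# response read from stdin.
doc_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (not run here); the destination endpoint and credentials are placeholders:
#   src=$(curl -s "localhost:9200/index1/_count" | doc_count)
#   dst=$(curl -s -u "elastic:<yourPassword>" "http://<yourClusterEndpoint>:9200/index1/_count" | doc_count)
#   [ "$src" = "$dst" ] && echo "index1: document counts match ($src)"
```

Matching counts do not prove that every field was restored correctly, but a mismatch is a quick signal that the restoration is incomplete.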

FAQ

Q: How do I obtain the installation package of the elasticsearch-repository-oss plug-in of another version?

A: You can download the installation package of the elasticsearch-repository-oss plug-in of the required version from GitHub. If GitHub does not provide the installation package of the required version, we recommend that you download the installation package of a version whose minor version is nearest to the minor version of the required version. Then, change the values of the following parameters in the plugin-descriptor.properties file of the plug-in, package the file again, and then install the plug-in.

  • version=Required plug-in version

  • elasticsearch.version=Version of the self-managed Elasticsearch cluster

    Note

    The version of the plug-in must be the same as that of the self-managed Elasticsearch cluster.

  • java.version=1.8

    Note
    • Different versions of Elasticsearch clusters depend on different versions of JDKs. The actual JDK version is determined by open source Elasticsearch and the plug-in.

    • Open source Elasticsearch provides a variety of cluster versions, and different versions of clusters are compiled in different ways. Therefore, before you install the elasticsearch-repository-oss plug-in for your cluster, you need to compile and debug the plug-in based on the cluster version. For example, you deploy a self-managed Elasticsearch V7.6.2 cluster, and the required JDK version is 1.8 or later. After compilation and debugging, the plug-in is elasticsearch-repository-oss-7.6.2.
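
The descriptor changes described in this FAQ can be scripted. The following is a minimal sketch, assuming that the installation package has already been decompressed and that the plug-in version and the cluster version are set to the same value, as the preceding note requires. The helper set_plugin_version is hypothetical, and java.version is left unchanged.

```shell
# Hypothetical helper: rewrites version and elasticsearch.version in a
# plugin-descriptor.properties file. Arguments: file path, target version.
# java.version and all other properties are left as they are.
set_plugin_version() {
  file=$1
  ver=$2
  tmp="${file}.tmp"
  sed -e "s/^version=.*/version=${ver}/" \
      -e "s/^elasticsearch\.version=.*/elasticsearch.version=${ver}/" \
      "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage (not run here); the path to the decompressed package is a placeholder:
#   set_plugin_version "<yourUnzippedPluginDir>/plugin-descriptor.properties" 6.8.0
```

After you edit the file, package the plug-in again and install it as described in Step 2.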