Blanche

TableStore: Using DataX to migrate data from instance A to instance B

Posted: Sep 18, 2016 16:25
Background
We need to migrate the data from instance A to instance B and keep a backup. We plan to use the DataX tool for the transfer; it uses the OTS Reader and OTS Writer plug-ins.
Migrating a copy of the data from A to B
• Step 1: Prepare the TableStore environment. Currently DataX does not support automatic creation of tables, so we need to create the target table on instance B in advance. There are two options for creating the table: the OTS CLI or the TableStore SDK. We suggest that you use the SDK (a sketch follows the links below).
OTS CLI: https://market.aliyun.com/products/53690006/cmgj000264.html?spm=5176.730005.0.0.CEf1EF
SDK: https://www.aliyun.com/product/ots
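The post does not show the creation code itself, so here is a minimal sketch using the TableStore Java SDK. It assumes the 4.x-style API (class names differ in the older SDK that was current when this post was written) and uses instance B's endpoint from the listing below; replace the masked credentials with your own. TableStore attribute columns are schema-free, so only the primary key columns need to be declared.

import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.CreateTableRequest;
import com.alicloud.openservices.tablestore.model.PrimaryKeyType;
import com.alicloud.openservices.tablestore.model.TableMeta;
import com.alicloud.openservices.tablestore.model.TableOptions;

public class CreateMigrationTable {
    public static void main(String[] args) {
        // NOTE: 4.x-style SDK API; adjust class names for older SDK versions.
        // Endpoint and instance name are instance B's values from this post;
        // the access key pair is masked here, as in the post.
        SyncClient client = new SyncClient(
                "http://b.cn-hangzhou.ots.aliyuncs.com",
                "<accessId>", "<accessKey>", "B");
        try {
            // Same primary key schema as the source table: uid(string), pid(int).
            TableMeta meta = new TableMeta("person_info");
            meta.addPrimaryKeyColumn("uid", PrimaryKeyType.STRING);
            meta.addPrimaryKeyColumn("pid", PrimaryKeyType.INTEGER);
            // No TTL, keep a single version per cell.
            TableOptions options = new TableOptions(-1, 1);
            client.createTable(new CreateTableRequest(meta, options));
        } finally {
            client.shutdown();
        }
    }
}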
At the same time, obtain the relevant information about A and B. The following information is needed for the configuration below:
## Instance A
## endpoint:http://a.cn-hangzhou.ots.aliyuncs.com
## ak(key): ******
## Instance: A
## Table: person_info
## Primary key: uid(string), pid(int)
## Properties:length(int),address(string),country(string),description(string)

## Instance B
## endpoint:http://b.cn-hangzhou.ots.aliyuncs.com
## ak(key): ******
## Instance: B
## Table:  person_info
## Primary key: uid(string), pid(int)


• Step 2: Prepare the DataX environment. Find an ECS machine that can connect to both instances, and deploy DataX on it.
git clone https://github.com/red-chen/one_key_install_datax.git
cd one_key_install_datax
sh datax.sh install


• Step 3: Edit the DataX job configuration.
# Here we will use two plug-ins: OTS Reader and OTS Writer.
# OTS Reader help file: https://github.com/alibaba/DataX/blob/master/otsreader/doc/otsreader.md
# OTS Writer help file: https://github.com/alibaba/DataX/blob/master/otswriter/doc/otswriter.md
# Configuration is as follows:

vim ots_to_ots.json

{
    "job": {
        "setting": {
            "speed": {
                "channel": "1"
            }
        },
        "content": [
            {
                "reader": {
                    "name": "otsreader",    
                    "parameter": {
                        "endpoint":"http://a.cn-hangzhou.ots.aliyuncs.com",
                        "accessId":"*********",
                        "accessKey":"*********",
                        "instanceName":"A",
                        "table":"person_info",
                        "column" : [
                            {"name":"uid"},
                            {"name":"pid"},
                            {"name":"length"},
                            {"name":"address"},
                            {"name":"country"},
                            {"name":"description"}
                        ],
                       "range": {
                            "begin":[{"type", "INF_MIN"},{"type", "INF_MIN"}],
                            "end":[{"type", "INF_MAX"},{"type", "INF_MAX"}],
                           "split":[]      
                       }
                    }
                },
               "writer": {
                    "name": "otswriter",
                    "parameter": {
                        "endpoint":"http://b.cn-hangzhou.ots.aliyuncs.com",
                        "accessId":"*********",
                        "accessKey":"*********",
                        "instanceName":"B",
                        "table":"person_info",
                        "primaryKey" : [
                            {"name":"uid", "type":"string"},
                            {"name":"pid", "type":"int"}
                        ],
                        "column" : [
                            {"name":"length", "type":"int"},
                            {"name":"address", "type":"string"},
                            {"name":"country", "type":"string"},
                            {"name":"description", "type":"string"}
                        ],
                        "writeMode" : "PutRow"
                    }
                }
            }
        ]
    }
}


• Step 4: Run the DataX job
# DataX prints the transfer rate every 10 seconds. You can estimate the total migration time from this rate.
sh datax.sh run ots_to_ots.json

# When execution completes, statistics like the following are printed:
Task start time                 :  2016-06-30 00:00:00
Task finish time                :  2016-06-30 16:00:00
Total task time                 :               57600s
Average task traffic            :                1.2M/s
Record write speed              :             1736rec/s
Total records read              :            100000000
Failed read/write attempts      :                    0


Advanced options
In many cases the data size is very large, but for business reasons the time window for a full migration (backup) is limited, so the backup must be performed very quickly. In this situation the advanced options must be enabled. The OTS Reader plug-in supports splitting the read range so that the table can be read concurrently. The principle is very simple: the table is divided into multiple ranges, and the data in these ranges is read in parallel. The steps are outlined below:
# Step 1: Prepare environment and create target table.

# Step 2: Calculate the number of ranges. Run a simple migration task first and observe the speed of a single range (without configuring split, the whole table is one range by default). It is generally between 1 MB/s and 10 MB/s, depending on the row size. For example, suppose we want to finish within 1000s, a single range reads at 5 MB/s, and the data size is 10 GB: 10 GB / 5 MB/s = 2000s of single-range reading, so 2000s / 1000s = 2 concurrent ranges are required.
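Spelled out, the calculation is just a ceiling division. Here is the same arithmetic as a small sketch; all numbers are the example's assumptions above, not measured values:

public class RangeCount {
    public static void main(String[] args) {
        double dataSizeMB = 10 * 1000;   // 10 GB of data, in MB
        double rangeSpeed = 5;           // observed single-range speed, MB/s
        double timeBudget = 1000;        // required completion time, seconds

        double oneRangeTime = dataSizeMB / rangeSpeed;            // 2000s with one range
        int ranges = (int) Math.ceil(oneRangeTime / timeBudget);  // ceil(2000/1000) = 2
        System.out.println("ranges needed: " + ranges);           // -> 2
    }
}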

# Step 3: Open a ticket and ask Alibaba Cloud's engineers to split the source and target tables into two partitions, and to provide the partition point. For example, suppose the first primary key column of our table is an Int, the data ranges from 0 to 10000, and it is evenly distributed over this range. After partitioning, the first partition covers 0 to 5000 and the second 5000 to 10000, so we can migrate with the ranges (0, 5000) and (5000, 10000), using a partition point (split) of 5000.

# Step 4: Modify the configuration: in the reader's "range" section (alongside the "begin" and "end" settings shown earlier), add the partition point's type and value under "split", as shown below:

    "split":[
        {"type":"int", "value":"5000"}
    ]

Then change the channel count to 2, one channel per range:

"speed": {
        "channel": "2"
 }

# With the above configuration, the import task is split at one point, the partition point 5000. The DataX job will run as two threads: the first thread imports the data range up to 5000, and the second thread imports from 5000 to infinity.