This topic describes how to prepare for a data migration task.
Step 1: Upload list files
An HTTP or HTTPS list consists of two types of files: one manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file, and a single example.csv.gz file cannot exceed 25 MB in size. The manifest.json file is used to configure the columns in each CSV file. You can upload the list files to Alibaba Cloud Object Storage Service (OSS) or Amazon Simple Storage Service (Amazon S3).
Create a CSV list file.
Create a CSV list file on your on-premises machine. A list file can contain up to eight columns separated by commas (,). Each line represents one file to be migrated. Multiple files are separated by line feeds (\n). The following tables describe the columns.
Important: The Key and URL columns are required. Other columns are optional.
Required columns
Column
Required
Description
Limit
URL
Yes
The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.
Note: Make sure that the URL can be accessed by using commands such as [curl -L --HEAD "$url"] and [curl -L --GET "$url"].
The values of the URL and Key columns must be encoded. Otherwise, the migration may fail due to the special characters contained in the values.
Before you encode the value of the URL column, make sure that the URL can be accessed by using CLI tools such as curl. Then, perform URL encoding.
Before you encode the value of the Key column, make sure that the value is the object name that you expect in OSS after the migration. Then, perform URL encoding.
Important: After you encode the values of the URL and Key columns, make sure that the following requirements are met. Otherwise, the migration may fail, or the source files may not be migrated to the specified destination path.
A plus sign (+) in the original string is encoded as %2B.
A percent sign (%) in the original string is encoded as %25.
A comma (,) in the original string is encoded as %2C.
For example, if the original string is a+b%c,d.file, the encoded string is a%2Bb%25c%2Cd.file.
Key
Yes
The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.
The following code provides an example on how to encode the values of the URL and Key columns in Python:
# -*- coding: utf-8 -*-
import sys

if sys.version_info.major == 3:
    from urllib.parse import quote_plus
else:
    from urllib import quote_plus

raw_urls = [
    # Format: ($url, $key)
    # url: These URLs can be accessed by using the Linux 'curl' or 'wget' command.
    # key: These keys are the object names that you expect in OSS.
    ("http://www.example1.com/path/ab.file?t=aef87", "ab.file"),
    ("http://www.example2.com/path/a+b.file", "a+b.file"),
    ("http://www.example3.com/path/a%b.file", "a%b.file"),
    ("http://www.example4.com/path/a,b.file", "a,b.file"),
    ("http://www.example5.com/path/a b.file", "a b.file"),
    ("http://www.example6.com/path/a and b.file", "a and b.file"),
    ("http://www.example7.com/path/a%E4%B8%8Eb.file", "a%E4%B8%8Eb.file"),
    ("http://www.example8.com/path/a\\b.file", "a\\b.file")
]

for item in raw_urls:
    url, key = item[0], item[1]
    enc_url = quote_plus(url)
    enc_key = quote_plus(key)
    # enc_url and enc_key hold the encoded values. Use them to build CSV files.
    print("(%s, %s) -> (%s, %s)" % (url, key, enc_url, enc_key))
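As a quick sanity check on the encoding requirements listed above, the following sketch encodes a value with quote_plus (the function used in the preceding example) and confirms that plus signs, percent signs, and commas are escaped and that decoding restores the original string. The helper name encode_and_check is hypothetical and not part of Data Online Migration, and the sketch assumes Python 3.

```python
from urllib.parse import quote_plus, unquote_plus

# Characters that must be escaped, mapped to their required encoded forms.
REQUIRED_ESCAPES = {"+": "%2B", "%": "%25", ",": "%2C"}


def encode_and_check(value):
    """Encode a URL or Key value and verify the encoding requirements.

    Hypothetical helper for illustration only.
    """
    encoded = quote_plus(value)
    # Each special character in the original must appear as its escape.
    for char, escape in REQUIRED_ESCAPES.items():
        if encoded.count(escape) < value.count(char):
            raise ValueError("%r was not encoded as %s" % (char, escape))
    # A literal comma would be read as a CSV column separator.
    if "," in encoded:
        raise ValueError("unencoded comma in %r" % encoded)
    # Decoding must give back the original string.
    if unquote_plus(encoded) != value:
        raise ValueError("round trip failed for %r" % value)
    return encoded


if __name__ == "__main__":
    print(encode_and_check("a+b%c,d.file"))  # prints a%2Bb%25c%2Cd.file
```

Running such a check over every URL and Key value before you write the CSV file can catch values that were left unencoded or encoded twice.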
All columns
| Column | Required | Description |
| --- | --- | --- |
| Key | Yes | The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name. |
| URL | Yes | The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files. |
| Size | No | The size of the file to be migrated. |
| StorageClass | No | The storage class of the file in the source bucket. |
| LastModifiedDate | No | The time when the file to be migrated was last modified. |
| ETag | No | The entity tag (ETag) of the file to be migrated. |
| HashAlg | No | The hash algorithm of the file to be migrated. |
| HashValue | No | The hash value of the file to be migrated. |
Note: The order of the preceding columns can vary in CSV files. You only need to make sure that the order of the columns in a CSV file is the same as that in the fileSchema field of the manifest.json file.
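The following sketch shows one way to write a CSV list file that contains only the two required columns. It assumes Python 3, assumes that the list file has no header row, and uses a hypothetical helper named build_list_csv. The column order in COLUMNS is an example and must match the fileSchema value that you use in manifest.json.

```python
import csv
import io
from urllib.parse import quote_plus

# Column order used in this sketch (an example). It must match the
# fileSchema value in manifest.json.
COLUMNS = ["Url", "Key"]


def build_list_csv(entries):
    """Return CSV text with one encoded (URL, Key) line per file.

    `entries` is an iterable of (url, key) pairs. Hypothetical helper.
    """
    buffer = io.StringIO()
    # Lines are separated by line feeds (\n), not \r\n.
    writer = csv.writer(buffer, lineterminator="\n")
    for url, key in entries:
        row = [quote_plus(url), quote_plus(key)]
        assert len(row) == len(COLUMNS)
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    text = build_list_csv([
        ("http://www.example2.com/path/a+b.file", "a+b.file"),
        ("http://www.example4.com/path/a,b.file", "a,b.file"),
    ])
    # The file name example1.csv is a placeholder.
    with open("example1.csv", "w", newline="") as f:
        f.write(text)
```

Because the encoded values contain no commas or quotation marks, each line consists of exactly two comma-separated fields.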
Compress one or more CSV list files.
Compress each CSV file into a .csv.gz file. The following examples show how to compress one or more CSV files:
Compress a CSV file
In this example, a file named file1 resides in the dir directory. Run the following command to compress the file:
gzip -r dir
Note: If you run the preceding gzip command to compress a file, the source file is not retained. To retain both the compressed file and the source file, run the gzip -c Source file > Source file.gz command.
The file1.gz file is generated.
Compress multiple CSV files
In this example, the file1, file2, and file3 files reside in the dir directory. Run the following command to compress the files:
gzip -r dir
Note: The gzip command does not package a directory. It only compresses each file in the directory separately.
The file1.gz, file2.gz, and file3.gz files are generated.
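If you prefer to compress the list files from a script, the following sketch does the equivalent of gzip -c with the Python standard library: it writes a .gz file next to the source file and keeps the source file. It also checks the 25 MB limit on a single compressed list file. The helper name compress_keep_source is hypothetical, and interpreting 25 MB as 25 × 1024 × 1024 bytes is an assumption.

```python
import gzip
import os
import shutil

# Size limit for one compressed list file. Treating 1 MB as
# 1024 * 1024 bytes is an assumption of this sketch.
MAX_LIST_FILE_BYTES = 25 * 1024 * 1024


def compress_keep_source(csv_path):
    """Write csv_path + '.gz' and leave the source file in place.

    Hypothetical helper for illustration only.
    """
    gz_path = csv_path + ".gz"
    with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    size = os.path.getsize(gz_path)
    if size > MAX_LIST_FILE_BYTES:
        raise ValueError("%s is %d bytes, which exceeds the 25 MB limit"
                         % (gz_path, size))
    return gz_path
```

If the check fails, one option is to split the CSV list file into several smaller files and compress each of them separately.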
Create a manifest.json file.
You can use a manifest.json file to configure multiple CSV files. A manifest.json file contains the following fields:
fileFormat: the format of the list file. Example: CSV.
fileSchema: the columns in the CSV file. Pay attention to the order of columns.
files:
key: the location of the CSV file in the source bucket.
mD5checksum: the MD5 value of the CSV file. The value is a hexadecimal MD5 string, which is not case-sensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this parameter, the CSV file is not verified.
size: the size of the CSV file.
The following sample code provides an example:
{
  "fileFormat":"CSV",
  "fileSchema":"Url, Key, Bucket, Size, StorageClass, LastModifiedDate, ETag, HashAlg, HashValue ",
  "files":[{
    "key":"dir/example1.csv.gz",
    "mD5checksum":"",
    "size":0
  },{
    "key":"dir/example2.csv.gz",
    "mD5checksum":"",
    "size":0
  }]
}
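The manifest.json content can also be generated from the compressed list files so that the MD5 values and sizes do not have to be entered by hand. The following sketch is one way to do this under stated assumptions: the helper name build_manifest is hypothetical, the size is assumed to be the size in bytes of the compressed file, and the MD5 value is assumed to be calculated over the compressed file that is uploaded. The field names are taken from the preceding sample.

```python
import hashlib
import json
import os


def build_manifest(gz_paths, key_prefix, file_schema):
    """Build the manifest.json content for local .csv.gz files.

    Hypothetical helper. `key_prefix` is the directory in the bucket
    to which the list files are uploaded, for example "dir/".
    """
    files = []
    for path in gz_paths:
        with open(path, "rb") as f:
            # Hexadecimal MD5 string; the value is not case-sensitive.
            digest = hashlib.md5(f.read()).hexdigest().upper()
        files.append({
            "key": key_prefix + os.path.basename(path),
            "mD5checksum": digest,
            # Assumption: size in bytes of the compressed file.
            "size": os.path.getsize(path),
        })
    return {"fileFormat": "CSV", "fileSchema": file_schema, "files": files}


def write_manifest(manifest, out_path="manifest.json"):
    """Serialize the manifest to a JSON file."""
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
```

For example, build_manifest(["example1.csv.gz"], "dir/", "Url, Key") returns a dictionary whose files entry has the key dir/example1.csv.gz.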
Upload the list files that you create to OSS or Amazon S3.
Upload the list files to OSS. For more information, see Simple upload.
Note: After the list files are uploaded to OSS, Data Online Migration downloads the list files and migrates the files based on the specified URLs.
When you create a migration task, specify the bucket in which the list files are stored. You must specify the path of the list file in the <directory in which the list file resides>/manifest.json format. Example: dir/manifest.json.
Upload the list files to Amazon S3.
Note: After the list files are uploaded to Amazon S3, Data Online Migration downloads the list files and migrates the files based on the specified URLs.
When you create a migration task, specify the bucket in which the list files are stored. You must specify the path of the list file in the <directory in which the list file resides>/manifest.json format. Example: dir/manifest.json.
Step 2: Create a destination bucket
Create a destination bucket to store the migrated data. For more information, see Create buckets.
Step 3: Create a RAM user
- Log on to the Resource Access Management (RAM) console.
- In the left-side navigation pane, choose .
- On the Users page, click Create User.
- In the User Account Information section of the Create User page, configure the Logon Name and Display Name parameters.
- In the Access Mode section, select Console Access and OpenAPI Access. Then, save the generated logon name, password, AccessKey ID, and AccessKey secret.
- Console Access: If you select this option, you must configure the console password, password reset settings, and multi-factor authentication (MFA) settings.
- OpenAPI Access: If you select this option, an AccessKey pair is automatically created for the RAM user. The RAM user can call API operations or use other development tools to access Alibaba Cloud resources.
Note: If you need to migrate data across accounts, you must save the logon name, password, AccessKey ID, and AccessKey secret that are generated for each RAM user by the corresponding Alibaba Cloud account.
- After the RAM user is created, go to the Users page. Find the RAM user that you want to manage and click Add Permissions in the Actions column to grant the RAM user the AliyunOSSFullAccess permission.
- In the left-side navigation pane, click Overview.
- On the page that appears, navigate to the Account Management section and click the link under RAM user logon. On the page that appears, enter the logon name and password of the RAM user to log on to the Alibaba Cloud Management Console.
Step 4: Grant permissions to the RAM user
After the RAM user is created, go to the Users page in the RAM console. Find the RAM user that you want to manage and click Add Permissions in the Actions column to grant permissions to the RAM user.