This topic describes the preparations required before data migration.
Step 1: Upload the list file
An HTTP/HTTPS list file includes two types of files: a manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file. A single example.csv.gz file cannot exceed 100 MB in size. The manifest.json file defines the schema of the manifest and a series of CSV files.
Create a CSV list file
Create a CSV-formatted list file on your local machine. Each row represents a file, and rows are separated by a line feed (\n). Each file has multiple properties, which are separated by commas (,).
Important: Key and Url are required, while the other items are optional.
Each row must end with a line feed. Otherwise, the task may be interrupted because of a CSV parsing failure.
Required items
Name: Url
Required: Yes
Description: Data Online Migration uses this link to download file content with a GET request and to get file metadata with a HEAD request.
Notes:
Note: Ensure that the Url can be accessed by commands such as `curl --head "$Url"` and `curl --get "$Url"`. Data Online Migration does not support redirection for `$Url`.
The Url and Key items must be encoded. If they are not encoded and contain special characters, the file migration may fail.
Url item: Start from a URL that command-line tools such as curl can access without redirection, and then URL-encode it.
Key item: URL-encode the object name that you want the file to have in OSS.
Important: After encoding the Url and Key items, confirm the following. Otherwise, the file migration may fail, or the file path at the destination may not be what you expect.
The plus sign (+) in the original string is encoded as %2B.
The percent sign (%) in the original string is encoded as %25.
The comma (,) in the original string is encoded as %2C.
For example, if the original string is a+b%c,d.file, the encoded string should be a%2Bb%25c%2Cd.file.
Name: Key
Required: Yes
Description: The object name after migration is `prefix + file name`.
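The three encoding rules above can be checked quickly in Python. The following is a minimal sketch for Python 3 only; `encode_item` is a hypothetical helper name, and the sketch assumes that `urllib.parse.quote_plus` produces an encoding that the service accepts, which matches the sample script in this topic.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_item(raw: str) -> str:
    """URL-encode one Url or Key value for the CSV list file."""
    return quote_plus(raw)


original = "a+b%c,d.file"
encoded = encode_item(original)
print(encoded)  # a%2Bb%25c%2Cd.file

# Decoding must give back the original string.
assert unquote_plus(encoded) == original
```

Because the comma is encoded as %2C, an encoded value never contains a raw comma, so it cannot be mistaken for a column separator.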
Assume that you have generated a CSV file named plain_example.csv that is not URL-encoded. The file has only two columns. The first column is Url, and these URLs can be accessed directly using the curl command. The second column is Key, and these keys are the object names that you expect for the files in OSS. The following is an example:
https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/1354977961/p486238.jpg,assets/img/zh-CN/1354977961/p486238.jpg
https://www.example-fake1.com/url-that-can-be-accessed-only-after-encoding/123.png,url-that-can-be-accessed-only-after-encoding/123.png
https://www.example-fake2.com/url-that-can-be-accessed-without-encoding/123.png,url-that-can-be-accessed-without-encoding/123.png
https://www.example-fake3.com/Chinese/Japanese/Korean/123.png,Chinese/Japanese/Korean/123.png
Important: Do not use the built-in Notepad application in Windows to edit manifest.json or plain_example.csv. This application may add a byte order mark (BOM) (0xefbbbf) to the first 3 bytes of the file, which can cause parsing errors in Data Online Migration. On Linux or macOS, you can run `od -c plain_example.csv | less` to check whether the first 3 bytes of the file contain this mark. In Windows, use an application such as Notepad++ or Visual Studio Code to create or edit files.
The following sample Python code reads plain_example.csv line by line and outputs the encoded result to example.csv. This code is for reference only. You can modify it as needed.
# -*- coding: utf-8 -*-
import sys

if sys.version_info.major == 3:
    from urllib.parse import quote_plus
else:
    from urllib import quote_plus
    reload(sys)
    sys.setdefaultencoding("utf-8")

# Source CSV file path.
src_path = "plain_example.csv"
# URL-encoded file path.
out_path = "example.csv"

# The sample CSV contains only two columns: url and key.
with open(src_path) as fin, open(out_path, "w") as fout:
    for line in fin:
        items = line.strip().split(",")
        url, key = items[0], items[1]
        enc_url = quote_plus(url.encode("utf-8"))
        enc_key = quote_plus(key.encode("utf-8"))
        # The enc_url and enc_key vars are encoded format.
        fout.write(enc_url + "," + enc_key + "\n")
After you run the preceding code, the content of example.csv is as follows:
https%3A%2F%2Fhelp-static-aliyun-doc.aliyuncs.com%2Fassets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg,assets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg
https%3A%2F%2Fwww.example-fake1.com%2Furl-that-can-be-accessed-only-after-encoding%2F123.png,url-that-can-be-accessed-only-after-encoding%2F123.png
https%3A%2F%2Fwww.example-fake2.com%2Furl-that-can-be-accessed-without-encoding%2F123.png,url-that-can-be-accessed-without-encoding%2F123.png
https%3A%2F%2Fwww.example-fake3.com%2FChinese%2FJapanese%2FKorean%2F123.png,Chinese%2FJapanese%2FKorean%2F123.png
All items
Name: Key
Required: Yes
Notes: The object name after migration is `prefix + file name`.
Name: Url
Required: Yes
Notes: Data Online Migration uses this link to download file content with a GET request and to get file metadata with a HEAD request.
Name: Size
Required: No
Notes: The size of the file to be migrated, in bytes.
Note: This field is used to calculate the storage usage of the migrated files. If this field is missing, the storage usage chart in the console will be unavailable.
Note: The column order shown in the preceding example is not fixed, but it must match the order of the items in the fileSchema field of the manifest.json file.
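Before compressing a list file, it can help to check it against the rules in this section: no byte order mark, a line feed at the end of every row, and the same number of columns in every row. The following is a minimal sketch; `check_list_file` is a hypothetical helper, and counting commas is only valid because commas inside Url and Key values must already be encoded as %2C.

```python
BOM = b"\xef\xbb\xbf"


def check_list_file(data: bytes, expected_columns: int) -> list:
    """Return a list of human-readable problems found in a CSV list file."""
    problems = []
    if data.startswith(BOM):
        problems.append("file starts with a UTF-8 byte order mark")
        data = data[len(BOM):]
    if data and not data.endswith(b"\n"):
        problems.append("last row does not end with a line feed")
    for number, row in enumerate(data.split(b"\n"), start=1):
        if not row:
            continue  # empty pieces (for example after the final line feed) are ignored
        columns = row.count(b",") + 1
        if columns != expected_columns:
            problems.append(
                "row %d has %d columns, expected %d" % (number, columns, expected_columns)
            )
    return problems


good = b"https%3A%2F%2Fwww.example-fake2.com%2F123.png,123.png\n"
print(check_list_file(good, expected_columns=2))  # []
print(check_list_file(BOM + b"only-one-column", expected_columns=2))
```

To check a real file, read it in binary mode, for example `check_list_file(open("example.csv", "rb").read(), 2)`.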
Compress the CSV file
Compress the CSV file into a .csv.gz file. The following compression methods are available:
Compress a single file
For example, if a file named example.csv exists in the dir directory, you can run the following command to compress it:
gzip -c example.csv > example.csv.gz
Note: Because of the -c option, the preceding command writes the compressed data to standard output and retains the source file. If you run `gzip example.csv` without -c, the source file is replaced by example.csv.gz and is not retained.
After compression, a .csv.gz file is generated.
Compress multiple files
For example, if three files named example1.csv, example2.csv, and example3.csv exist in the dir directory, you can run the following command to compress them:
gzip -r dir
Note: The gzip command does not package the directory. Instead, it compresses each file in the specified directory separately and does not retain the corresponding source files.
After compression, three files named example1.csv.gz, example2.csv.gz, and example3.csv.gz are generated in the dir directory.
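If you prefer to compress from a script, the standard-library gzip module can produce the same kind of .csv.gz file while keeping the source file. The following is a sketch under stated assumptions: `compress_csv` and `MAX_LIST_FILE_BYTES` are hypothetical names, and the 100 MB limit is read conservatively as 100 * 1000 * 1000 bytes because this topic does not define the unit precisely.

```python
import gzip
import os
import shutil
import tempfile

# Assumption: "100 MB" is read conservatively as 100 * 1000 * 1000 bytes.
MAX_LIST_FILE_BYTES = 100 * 1000 * 1000


def compress_csv(src_path: str) -> str:
    """Write src_path + '.gz' next to the source file and keep the source."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    size = os.path.getsize(dst_path)
    if size > MAX_LIST_FILE_BYTES:
        raise ValueError("%s is %d bytes, which exceeds the limit" % (dst_path, size))
    return dst_path


# Demonstration in a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as workdir:
    csv_path = os.path.join(workdir, "example.csv")
    with open(csv_path, "w") as f:
        f.write("url,key\n")
    gz_path = compress_csv(csv_path)
    print(os.path.basename(gz_path))  # example.csv.gz
    print(os.path.exists(csv_path))   # True
```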
Create the manifest.json file
You can configure multiple CSV files. The details are as follows:
fileFormat: Specifies that the list file format is CSV.
fileSchema: Corresponds to the items in the CSV file. Note the order.
Note: Ensure that the number of columns in the CSV file is the same as the number of fields in this configuration. The migration service checks for consistency.
files:
key: The location of the CSV file in the bucket.
MD5checksum: The MD5 hash of the list file, written as a 32-character hexadecimal string (16 bytes). The value is case-insensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this value, no check is performed.
size: The size of the list file.
The following example is for reference only.
{
  "fileFormat": "CSV",
  "fileSchema": "Url, Key",
  "files": [
    {
      "key": "dir/example1.csv.gz",
      "MD5checksum": "",
      "size": 0
    },
    {
      "key": "dir/example2.csv.gz",
      "MD5checksum": "",
      "size": 0
    }
  ]
}
Upload the manifest file that you created to OSS or AWS S3.
Upload the manifest.json file and the compressed CSV list files. The names of the CSV list files must match the CSV file names in the manifest.json file.
Record the path of the manifest.json file. You must specify the location of the manifest when you create the source data address.
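The MD5checksum and size values in manifest.json can be computed rather than typed by hand. The following is a minimal sketch, not the service's own tooling: `describe_list_file` and `build_manifest` are hypothetical helpers, and the sketch assumes that both values refer to the compressed .csv.gz file as it is uploaded and that size is expressed in bytes.

```python
import hashlib
import json
import os
import tempfile


def describe_list_file(local_path: str, bucket_key: str) -> dict:
    """Build one entry of the "files" array for a compressed list file."""
    digest = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return {
        "key": bucket_key,
        "MD5checksum": digest.hexdigest().upper(),
        # Assumption: size is the byte size of the uploaded .csv.gz file.
        "size": os.path.getsize(local_path),
    }


def build_manifest(entries: list, file_schema: str = "Url, Key") -> dict:
    """Assemble the manifest structure described in this section."""
    return {"fileFormat": "CSV", "fileSchema": file_schema, "files": entries}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example1.csv.gz")
    with open(path, "wb") as f:
        f.write(b"hello")  # stand-in bytes, not a real gzip file
    manifest = build_manifest([describe_list_file(path, "dir/example1.csv.gz")])
    print(json.dumps(manifest, indent=2))
```

The bucket key passed to `describe_list_file` must be the location to which you actually upload the file, because the names in manifest.json must match the uploaded CSV list files.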
Step 2: Create a destination bucket
Create an Object Storage Service (OSS) bucket as the destination to store the migrated data. For more information, see Create buckets.
Step 3: Create a RAM user and grant permissions to the RAM user
The Resource Access Management (RAM) user is used to perform the data migration task. You must create RAM roles and perform the data migration task as the RAM user. We recommend that you create the RAM user within the Alibaba Cloud account that owns the source or destination OSS bucket.
For more information, see Create a RAM user and grant permissions to the RAM user.
Log on to the RAM console with an Alibaba Cloud account. On the Users page, find the RAM user that you created and click Add Permissions in the Actions column.
System policy: AliyunOSSImportFullAccess. This policy grants permissions to manage Data Online Migration.
Custom policy: The policy must include the `ram:CreateRole`, `ram:CreatePolicy`, `ram:AttachPolicyToRole`, and `ram:ListRoles` permissions.
For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy:
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ram:CreateRole",
        "ram:CreatePolicy",
        "ram:AttachPolicyToRole",
        "ram:ListRoles"
      ],
      "Resource": "*"
    }
  ]
}
Step 4: Grant permissions on the manifest bucket
Perform the corresponding operations based on whether the bucket that stores inventory lists belongs to the current Alibaba Cloud account.
The bucket that stores inventory lists belongs to the current Alibaba Cloud account
Automatic authorization
We recommend that you complete the authorization in the Data Online Migration console. For more information, see the "Step 2: Create a source data address" section of the Migrate data topic.
Manual authorization
Grant permissions on the bucket that stores inventory lists
On the Roles page, find the created RAM role and click Grant Permission in the Actions column.
Custom policy: Attach a custom policy that includes the `oss:List*` and `oss:Get*` permissions to the RAM role.
For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy.
Note: The following policy is only for reference. Replace <myInvBucket> with the name of the bucket that stores inventory lists.
For more information about RAM policies for OSS, see Common examples of RAM policies.
Important: If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*"
      ],
      "Resource": [
        "acs:oss:*:*:<myInvBucket>",
        "acs:oss:*:*:<myInvBucket>/*"
      ]
    }
  ]
}
The bucket that stores inventory lists does not belong to the current Alibaba Cloud account
1. Grant permissions on the bucket that stores inventory lists
Log on to the OSS console with the Alibaba Cloud account that owns the bucket that stores inventory lists.
In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket that stores inventory lists.
In the left-side navigation pane, choose Permission Control > Bucket Policy.
On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.
Custom policy:
Grant the RAM role the permissions to list and read all resources in the bucket that stores inventory lists.
Note: The following policy is only for reference. Replace <otherInvBucket> with the name of the bucket that stores inventory lists, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, <otherUid> with the ID of the Alibaba Cloud account that owns the bucket that stores inventory lists, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*"
      ],
      "Principal": [
        "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherInvBucket>",
        "acs:oss:*:<otherUid>:<otherInvBucket>/*"
      ]
    }
  ]
}
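Filling in four placeholders by hand is easy to get wrong, so the cross-account bucket policy can also be generated. The following is a minimal sketch that only builds the JSON text for you to paste into the console; `build_bucket_policy` is a hypothetical helper, and the account IDs, role name, and bucket name in the usage line are made-up values.

```python
import json


def build_bucket_policy(my_uid, role_name, owner_uid, bucket, actions):
    """Return a bucket policy that lets the assumed RAM role act on the bucket."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Principal": ["arn:sts::%s:assumed-role/%s/*" % (my_uid, role_name)],
                "Resource": [
                    "acs:oss:*:%s:%s" % (owner_uid, bucket),
                    "acs:oss:*:%s:%s/*" % (owner_uid, bucket),
                ],
            }
        ],
    }


# Hypothetical values for illustration only.
policy = build_bucket_policy(
    my_uid="1111111111111111",
    role_name="migration-role",
    owner_uid="2222222222222222",
    bucket="other-inv-bucket",
    actions=["oss:List*", "oss:Get*"],
)
print(json.dumps(policy, indent=2))
```

Generating both Resource entries from the same bucket variable avoids a mismatch between the bucket-level and object-level resource names.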
2. Configure a policy for a custom key
If server-side encryption by using SSE-KMS is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.
If a custom key of KMS is used to encrypt data in the bucket that stores inventory lists, perform the following steps to configure a policy for the custom key:
Log on to the KMS console and find the custom key.
On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.

Step 5: Grant permissions on the destination bucket
Perform the corresponding operations based on whether the destination bucket belongs to the current Alibaba Cloud account.
The destination bucket belongs to the current Alibaba Cloud account
Automatic authorization
We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 3: Create a destination data address" section of the Migrate data topic.
Manual authorization
Note: You can perform manual authorization in the following scenarios:
You want to grant permissions on multiple source buckets to a RAM role. This allows you to effectively manage multiple source buckets.
You do not want to create more RAM roles because the number of RAM roles within the current Alibaba Cloud account is close to the upper limit.
Automatic authorization is not applicable or cannot be used.
1. Create a RAM role that is used to migrate data
Log on to the RAM console in which the RAM user is created. On the Roles page, click Create Role.
Principal Type: Select Cloud Service.
Principal Name: Select Data Transport.
Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


2. Grant permissions on the destination bucket to the RAM role
On the Roles page, find the created RAM role and click Grant Permission in the Actions column.
Custom policy: Attach a custom policy that includes the `oss:List*`, `oss:Get*`, `oss:Put*`, and `oss:AbortMultipartUpload` permissions to the RAM role.
For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of the custom policy:
Note: The following policy is only for reference. Replace <myDestBucket> with the name of the destination bucket.
For more information about RAM policies for OSS, see Common examples of RAM policies.
Important: If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Resource": [
        "acs:oss:*:*:<myDestBucket>",
        "acs:oss:*:*:<myDestBucket>/*"
      ]
    }
  ]
}
The destination bucket does not belong to the current Alibaba Cloud account
1. Create a RAM role that is used to migrate data
Log on to the RAM console in which the RAM user is created. On the Roles page, click Create Role.
Principal Type: Select Cloud Service.
Principal Name: Select Data Transport.
Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


2. Grant permissions on the destination bucket to the RAM role
If you configure a bucket policy by specifying policy statements to grant the RAM role the required permissions, the new bucket policy overwrites the existing bucket policy. Make sure that the new bucket policy contains the content of the existing bucket policy. Otherwise, the authorization based on the existing bucket policy may fail.
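Because saving a new bucket policy replaces the existing one, one way to keep the existing statements is to merge them with the new statement before you paste the result into the editor. The following is a minimal sketch that works on JSON you have copied from the console; `merge_bucket_policy` is a hypothetical helper and it makes no API calls.

```python
import json


def merge_bucket_policy(existing: dict, new_statement: dict) -> dict:
    """Return a policy with all existing statements plus the new statement."""
    merged = {
        "Version": existing.get("Version", "1"),
        "Statement": list(existing.get("Statement", [])),
    }
    # Skip the append when an identical statement is already present.
    if new_statement not in merged["Statement"]:
        merged["Statement"].append(new_statement)
    return merged


# Hypothetical existing policy and new statement for illustration only.
existing_policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["oss:Get*"],
            "Principal": ["1111111111111111"],
            "Resource": ["acs:oss:*:2222222222222222:other-dest-bucket/*"],
        }
    ],
}
new_statement = {
    "Effect": "Allow",
    "Action": ["oss:List*", "oss:Get*", "oss:Put*", "oss:AbortMultipartUpload"],
    "Principal": ["arn:sts::1111111111111111:assumed-role/migration-role/*"],
    "Resource": [
        "acs:oss:*:2222222222222222:other-dest-bucket",
        "acs:oss:*:2222222222222222:other-dest-bucket/*",
    ],
}
print(json.dumps(merge_bucket_policy(existing_policy, new_statement), indent=2))
```

The helper copies the statement list instead of modifying it in place, so the original policy text stays available if you need to roll back.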
Log on to the OSS console with the Alibaba Cloud account that owns the destination bucket.
In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.
In the left-side pane of the bucket details page, choose Permission Control > Bucket Policy.
On the Bucket Policy tab, click Add by Syntax and then click Edit. In the code editor, enter the custom bucket policy. Then, click Save.
Grant the RAM role the permissions to list and read objects in the destination bucket, write objects to it, and abort multipart uploads.
The following policy is only for reference. Replace <otherDestBucket> with the name of the destination bucket, <otherUid> with the ID of the Alibaba Cloud account that owns the destination bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Principal": [
        "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherDestBucket>",
        "acs:oss:*:<otherUid>:<otherDestBucket>/*"
      ]
    }
  ]
}
3. Configure a policy for a custom key
If SSE-KMS is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.
If a custom key of KMS is used to encrypt data in the destination bucket, perform the following steps to configure a policy for the custom key.
Log on to the KMS console and find the custom key.
On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.
