
Data Online Migration: Preparations

Last Updated: Nov 19, 2025

This topic describes the preparations required before data migration.

Step 1: Upload the list file

An HTTP/HTTPS list file consists of two types of files: a manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file and cannot exceed 100 MB in size. The manifest.json file defines the schema of the CSV list files and lists the CSV files that make up the list.

  1. Create a CSV list file

    Create a CSV-formatted list file on your local machine. Each row represents a file, and rows are separated by a line feed (\n). Each file has multiple properties, which are separated by commas (,).

    Important
    • Key and Url are required, while the other items are optional.

    • Each row must end with a line feed. Otherwise, the task may be interrupted because of a CSV parsing failure.

    • Required items

      Url (required)

      Data Online Migration uses this link to download the file content with a GET request and to retrieve the file metadata with a HEAD request.

      Note

      Ensure that the Url can be accessed by commands such as `curl --head "$Url"` (HEAD request) and `curl "$Url"` (GET request). Data Online Migration does not support redirection for `$Url`.

      The Url and Key items must be URL-encoded. If they are not encoded and contain special characters, the file migration may fail.

      • Url item: URL-encode a URL that is accessible by command-line tools such as curl (without redirection).

      • Key item: URL-encode the object name that you want the file to have in OSS.

      Important

      After encoding the Url and Key items, confirm the following. Otherwise, the file migration may fail, or the file path at the destination may not be what you expect.

      • The plus sign (+) in the original string is encoded as %2B.

      • The percent sign (%) in the original string is encoded as %25.

      • The comma (,) in the original string is encoded as %2C.

      For example, if the original string is a+b%c,d.file, the encoded string should be a%2Bb%25c%2Cd.file.

      Key (required)

      The object name after migration is `prefix + file name`.

      Assume that you have generated a CSV file named plain_example.csv that is not URL-encoded. The file has only two columns. The first column is Url, and these URLs can be accessed directly using the curl command. The second column is Key, and these keys are the object names that you expect for the files in OSS. The following is an example:

      https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/1354977961/p486238.jpg,assets/img/zh-CN/1354977961/p486238.jpg
      https://www.example-fake1.com/url-that-can-be-accessed-only-after-encoding/123.png,url-that-can-be-accessed-only-after-encoding/123.png
      https://www.example-fake2.com/url-that-can-be-accessed-without-encoding/123.png,url-that-can-be-accessed-without-encoding/123.png
      https://www.example-fake3.com/Chinese/Japanese/Korean/123.png,Chinese/Japanese/Korean/123.png
      Important

      Do not use the built-in Notepad application in Windows to edit manifest.json or plain_example.csv. This application may add a byte order mark (BOM) (0xefbbbf) to the first 3 bytes of the file, which can cause parsing errors in Data Online Migration. On Linux or macOS, you can run od -c plain_example.csv | less to check whether the first 3 bytes of the file contain this mark. In Windows, use an application such as Notepad++ or Visual Studio Code to create or edit files.
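      As an alternative, you can run the following minimal Python sketch to check the file. It assumes the file name plain_example.csv and verifies that the file does not start with a BOM and that the last row ends with a line feed.

      # -*- coding: utf-8 -*-
      from __future__ import print_function
      
      path = "plain_example.csv"
      
      with open(path, "rb") as f:
          data = f.read()
      
      # The first 3 bytes must not be the UTF-8 byte order mark (0xEF 0xBB 0xBF).
      print("starts with BOM:", data[:3] == b"\xef\xbb\xbf")
      # Every row, including the last one, must end with a line feed (\n).
      print("ends with line feed:", data.endswith(b"\n"))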

      The following sample Python code reads plain_example.csv line by line and outputs the encoded result to example.csv. This code is for reference only. You can modify it as needed.

      # -*- coding: utf-8 -*-
      import sys
      
      if sys.version_info.major == 3:
          from urllib.parse import quote_plus
      else:
          from urllib import quote_plus
          reload(sys)
          sys.setdefaultencoding("utf-8")
      
      # Source CSV file path.
      src_path = "plain_example.csv"
      # URL-encoded file path.
      out_path = "example.csv"
      
      # The sample CSV contains only two columns: url and key.
      with open(src_path) as fin, open(out_path, "w") as fout:
          for line in fin:
              items = line.strip().split(",")
              url, key = items[0], items[1]
              enc_url = quote_plus(url.encode("utf-8"))
              enc_key = quote_plus(key.encode("utf-8"))
              # enc_url and enc_key hold the URL-encoded values.
              fout.write(enc_url + "," + enc_key + "\n")
      

      After you run the preceding code, the content of example.csv is as follows:

      https%3A%2F%2Fhelp-static-aliyun-doc.aliyuncs.com%2Fassets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg,assets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg
      https%3A%2F%2Fwww.example-fake1.com%2Furl-that-can-be-accessed-only-after-encoding%2F123.png,url-that-can-be-accessed-only-after-encoding%2F123.png
      https%3A%2F%2Fwww.example-fake2.com%2Furl-that-can-be-accessed-without-encoding%2F123.png,url-that-can-be-accessed-without-encoding%2F123.png
      https%3A%2F%2Fwww.example-fake3.com%2FChinese%2FJapanese%2FKorean%2F123.png,Chinese%2FJapanese%2FKorean%2F123.png

    • All items

      Key (required)

      The object name after migration is `prefix + file name`.

      Url (required)

      Data Online Migration uses this link to download the file content with a GET request and to retrieve the file metadata with a HEAD request.

      Size (optional)

      The size of the file to be migrated, in bytes.

      Note

      This field is used to calculate the storage usage of the migrated files. If this field is missing, the storage usage chart in the console will be unavailable.

      Note

      The order of the items in the preceding list is not fixed. However, the order must match that of the fields in the fileSchema field in the manifest.json file.
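
      For example, a row that also carries the Size item can look like the following line, where the byte count 1048576 is a hypothetical value. The corresponding fileSchema in the manifest.json file would then be "Url, Key, Size".

      https%3A%2F%2Fwww.example-fake2.com%2Furl-that-can-be-accessed-without-encoding%2F123.png,url-that-can-be-accessed-without-encoding%2F123.png,1048576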

  2. Compress the CSV file

    Compress the CSV file into a .csv.gz file. The following compression methods are available:

    • Compress a single file

      For example, if a file named example.csv exists in the dir directory, you can run the following command to compress it:

      gzip -c example.csv > example.csv.gz
      Note

      The preceding command uses the -c option and redirects the output to example.csv.gz, so the source file example.csv is retained. If you run gzip <source_file> without the -c option, the source file is not retained.

      After compression, a .csv.gz file is generated.

    • Compress multiple files

      For example, if three files named example1.csv, example2.csv, and example3.csv exist in the dir directory, you can run the following command to compress them:

      gzip -r dir
      Note

      The gzip command does not package the directory. Instead, it compresses each file in the specified directory separately and does not retain the corresponding source files. If you want to keep the source files, see the sketch at the end of this step.

      After compression, three files named example1.csv.gz, example2.csv.gz, and example3.csv.gz are generated in the dir directory.
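
    If you want to compress every CSV file in a directory and keep the source files, the following Python sketch is one option. It is for reference only; the directory name dir and the .csv suffix are assumptions taken from the preceding examples.

    # -*- coding: utf-8 -*-
    import glob
    import gzip
    import shutil
    
    # Directory that contains the CSV list files (assumed, as in the preceding example).
    src_dir = "dir"
    
    # Compress each .csv file into a .csv.gz file and keep the source file.
    for src_path in glob.glob(src_dir + "/*.csv"):
        gz_path = src_path + ".gz"
        with open(src_path, "rb") as fin, gzip.open(gz_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        print("compressed %s -> %s" % (src_path, gz_path))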

  3. Create the manifest.json file

    In the manifest.json file, you can reference multiple CSV files. The fields are described as follows:

    • fileFormat: Specifies that the list file format is CSV.

    • fileSchema: Corresponds to the items in the CSV file. The field order must match the column order in the CSV files.

      Note

      Ensure that the number of columns in the CSV file is the same as the number of fields in this configuration. The migration service checks for consistency.

    • files:

      • key: The location of the CSV file in the bucket.

      • MD5checksum: The hexadecimal MD5 string (32 characters) of the list file. The value is case-insensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this value, no check is performed.

      • size: The size of the list file, in bytes.

    The following example is for reference only. A sketch that computes the MD5checksum and size values follows the example.

    {
        "fileFormat":"CSV",
        "fileSchema":"Url, Key",
        "files":[{
            "key":"dir/example1.csv.gz",
            "MD5checksum":"",
            "size":0
        },{
            "key":"dir/example2.csv.gz",
            "MD5checksum":"",
            "size":0
        }]
    }
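
    If you want to fill in the MD5checksum and size values, you can compute them locally. The following Python sketch is for reference only; it assumes that the compressed list files are stored in a local directory named dir and that the fileSchema is "Url, Key", as in the preceding example.

    # -*- coding: utf-8 -*-
    import glob
    import hashlib
    import json
    import os
    
    # Local directory that contains the compressed CSV list files (assumed).
    src_dir = "dir"
    
    files = []
    for path in sorted(glob.glob(src_dir + "/*.csv.gz")):
        with open(path, "rb") as f:
            # Hexadecimal MD5 string of the compressed list file.
            md5 = hashlib.md5(f.read()).hexdigest().upper()
        files.append({
            "key": path,                    # location of the CSV file in the bucket
            "MD5checksum": md5,
            "size": os.path.getsize(path),  # size of the list file in bytes
        })
    
    manifest = {
        "fileFormat": "CSV",
        "fileSchema": "Url, Key",
        "files": files,
    }
    
    with open("manifest.json", "w") as fout:
        json.dump(manifest, fout, indent=4)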
  4. Upload the manifest file that you created to OSS or AWS S3.

    1. Upload the manifest.json file and the compressed CSV list files. The names of the CSV list files must match the key values specified in the manifest.json file. For an example that uses the OSS Python SDK, see the sketch after these steps.

    2. Record the path of the manifest.json file. You must specify the location of the manifest when you create the source data address.
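
    The following sketch uses the OSS Python SDK (oss2) to upload the files. It is for reference only; the endpoint, bucket name, and object paths are assumptions, and the AccessKey pair is read from environment variables.

    # -*- coding: utf-8 -*-
    import os
    import oss2
    
    # Replace the endpoint and bucket name with your own values (assumed here).
    endpoint = "https://oss-cn-hangzhou.aliyuncs.com"
    bucket_name = "my-manifest-bucket"
    
    # Read the AccessKey pair of the RAM user from environment variables.
    auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    
    # Upload the manifest.json file and the compressed CSV list files.
    # The object names of the CSV list files must match the key values in manifest.json.
    bucket.put_object_from_file("dir/manifest.json", "manifest.json")
    bucket.put_object_from_file("dir/example1.csv.gz", "dir/example1.csv.gz")
    bucket.put_object_from_file("dir/example2.csv.gz", "dir/example2.csv.gz")
    
    # Record the path of manifest.json, for example, oss://my-manifest-bucket/dir/manifest.json.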

Step 2: Create a destination bucket

Create an Object Storage Service (OSS) bucket as the destination to store the migrated data. For more information, see Create buckets.

Step 3: Create a RAM user and grant permissions to the RAM user

Important
  • The Resource Access Management (RAM) user is used to perform the data migration task. You must create RAM roles and perform the data migration task as the RAM user. We recommend that you create the RAM user within the Alibaba Cloud account that owns the source or destination OSS bucket.

  • For more information, see Create a RAM user and grant permissions to the RAM user.

Log on to the RAM console with an Alibaba Cloud account. On the Users page, find the RAM user that you created and click Add Permissions in the Actions column.

  1. System policy: AliyunOSSImportFullAccess. This policy grants permissions to manage Data Online Migration.

  2. Custom policy: The policy must include the ram:CreateRole, ram:CreatePolicy, ram:AttachPolicyToRole, and ram:ListRoles permissions.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy:

    {
        "Version":"1",
        "Statement":[
            {
                "Effect":"Allow",
                "Action":[
                    "ram:CreateRole",
                    "ram:CreatePolicy",
                    "ram:AttachPolicyToRole",
                    "ram:ListRoles"
                ],
                "Resource":"*"
            }
        ]
    }

Step 4: Grant permissions on the manifest bucket

Perform the corresponding operations based on whether the bucket that stores inventory lists belongs to the current Alibaba Cloud account.

The bucket that stores inventory lists belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you complete the authorization in the Data Online Migration console. For more information, see the "Step 2: Create a source data address" section of the Migrate data topic.

  • Manual authorization

    Grant permissions on the bucket that stores inventory lists

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List* and oss:Get* permissions to the RAM role.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy.

    Note

    The following policy is only for reference. Replace <myInvBucket> with the name of the bucket that stores inventory lists.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    Important

    If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Resource": [
            "acs:oss:*:*:<myInvBucket>",
            "acs:oss:*:*:<myInvBucket>/*"
          ]
        }
      ]
    }

The bucket that stores inventory lists does not belong to the current Alibaba Cloud account

1. Grant permissions on the bucket that stores inventory lists

  1. Log on to the OSS console with the Alibaba Cloud account that owns the bucket that stores inventory lists.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket that stores inventory lists.

  3. In the left-side navigation pane, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

  • Custom policy:

    Grant the RAM role the permissions to list and read all resources in the bucket that stores inventory lists.

    Note

    The following policy is only for reference. Replace <otherInvBucket> with the name of the bucket that stores inventory lists, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, <otherUid> with the ID of the Alibaba Cloud account that owns the bucket that stores inventory lists, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Principal": [
             "arn:sts::<myUid>:assumed-role/<roleName>/*"
          ],
          "Resource": [
            "acs:oss:*:<otherUid>:<otherInvBucket>",
            "acs:oss:*:<otherUid>:<othereInvBucket>/*"
          ]
        }
      ]
    }

2. Configure a policy for a custom key

  1. If server-side encryption by using SSE-KMS is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.

  2. If a custom key of KMS is used to encrypt data in the bucket that stores inventory lists, perform the following steps to configure a policy for the custom key:

    1. Log on to the KMS console and find the custom key.

    2. On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.

Step 5: Grant permissions on the destination bucket

Perform the corresponding operations based on whether the destination bucket belongs to the current Alibaba Cloud account.

The destination bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 3: Create a destination data address" section of the Migrate data topic.

  • Manual authorization

    Note

    You can perform manual authorization in the following scenarios:

    • You want to grant permissions on multiple source buckets to a RAM role. This allows you to effectively manage multiple source buckets.

    • You do not want to create more RAM roles because the number of RAM roles within the current Alibaba Cloud account is close to the upper limit.

    • Automatic authorization is not applicable or cannot be used.

    1. Create a RAM role that is used to migrate data

    Log on to the RAM console with the Alibaba Cloud account in which the RAM user was created. On the Roles page, click Create Role.

    1. Principal Type: Select Cloud Service.

    2. Principal Name: Select Data Transport.

    3. Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


    2. Grant permissions on the destination bucket to the RAM role

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List*, oss:Get*, oss:Put*, and oss:AbortMultipartUpload permissions to the RAM role.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of the custom policy:

    Note

    The following policy is only for reference. Replace <myDestBucket> with the name of the destination bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    Important

    If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*",
            "oss:Put*",
            "oss:AbortMultipartUpload"
          ],
          "Resource": [
            "acs:oss:*:*:<myDestBucket>",
            "acs:oss:*:*:<myDestBucket>/*"
          ]
        }
      ]
    }

The destination bucket does not belong to the current Alibaba Cloud account

1. Create a RAM role that is used to migrate data

Log on to the RAM console with the Alibaba Cloud account in which the RAM user was created. On the Roles page, click Create Role.

  1. Principal Type: Select Cloud Service.

  2. Principal Name: Select Data Transport.

  3. Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


2. Grant permissions on the destination bucket to the RAM role

Important

If you configure a bucket policy by specifying policy statements to grant the RAM role the required permissions, the new bucket policy overwrites the existing bucket policy. Make sure that the new bucket policy contains the content of the existing bucket policy. Otherwise, the authorization based on the existing bucket policy may fail.

  1. Log on to the OSS console with the Alibaba Cloud account that owns the destination bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.

  3. In the left-side pane of the bucket details page, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax and then click Edit. In the code editor, enter the custom bucket policy. Then, click Save.

    • Grant the RAM role the permissions to list, read, write, and delete objects in the destination bucket.

Note

The following policy is only for reference. Replace <otherDestBucket> with the name of the destination bucket, <otherUid> with the ID of the Alibaba Cloud account that owns the destination bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Principal": [
         "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherDestBucket>",
        "acs:oss:*:<otherUid>:<otherDestBucket>/*"
      ]
    }
  ]
}

3. Configure a policy for a custom key

  1. If SSE-KMS is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.

  2. If a custom key of KMS is used to encrypt data in the destination bucket, perform the following steps to configure a policy for the custom key.

    1. Log on to the KMS console and find the custom key.

    2. On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.