
Data Online Migration:Preparations

Last Updated:Nov 19, 2025

This topic describes the preparations required before data migration.

Step 1: Upload the inventory files

Data Online Migration supports native cloud storage inventory and custom inventory.

  • Native cloud storage inventory: If the source storage service supports the inventory feature, configure the inventory on the source bucket to generate inventory files. For more information, see Bucket inventory.

  • Custom inventory for Data Online Migration: This is an inventory format defined by Data Online Migration. You can follow the steps below to generate inventory files in the required format.

Note
  • Native inventory is supported for the following data storage sources: Alibaba Cloud OSS, AWS S3, Volcengine TOS, Baidu BOS, Tencent COS, and Huawei OBS.

  • Custom inventory is supported for the following data storage sources: Alibaba Cloud OSS, AWS S3, Volcengine TOS, Baidu BOS, Tencent COS, Huawei OBS, Qiniu Kodo, Azure Blob, Google Cloud Storage, Kingsoft KS3, UCloud US3, Youpai USS, and Compatible S3. Data sources that are not listed do not support this feature.

An inventory consists of two types of files: one manifest.json file and one or more example.csv.gz files. Each example.csv.gz file is a compressed CSV list file and cannot exceed 100 MB in size. The manifest.json file defines the schema of the inventory and lists the CSV files.

  1. Create a CSV list file

    Create a CSV list file on your local machine. In the file, each row represents a file and must end with a line feed (\n). Each row contains multiple properties that are separated by commas (,).

    Key (Required)

    The ObjectName at the destination is the destination data address Prefix + Key.

    The Key field must be URL-encoded. If the Key field is not encoded or contains special characters, the file migration may fail.

    The encoding rule for the Key field is as follows: URL-encode the ObjectName that you want to create in OSS.

    Important

    After you encode the Key field, you must confirm the following. Otherwise, the file migration may fail, or the file path at the destination may not be what you expect.

    • Plus signs (+) in the original string are encoded as %2B.

    • Percent signs (%) in the original string are encoded as %25.

    • Commas (,) in the original string are encoded as %2C.

    For example, if the original string is a+b%c,d.file, the encoded string is a%2Bb%25c%2Cd.file.

    Size (Optional)

    The size of the file to be migrated, in bytes.

    Note

    This field is used to calculate the total storage size of the files to be migrated. If this field is missing, the storage size chart in the console will not be available.

    Important
    • The Key field is required, but the other fields are optional.

    • Each row must end with a line feed. If a row does not end with a line feed, the task fails due to a CSV parsing error.
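    The encoding rules above can be reproduced with the Python standard library. The following snippet is a minimal sketch that uses quote_plus from urllib, which applies exactly these substitutions:

```python
# -*- coding: utf-8 -*-
# Minimal sketch: URL-encode a Key value for the inventory list (Python 3).
from urllib.parse import quote_plus

original = "a+b%c,d.file"
encoded = quote_plus(original)
print(encoded)  # a%2Bb%25c%2Cd.file
```

    Note that quote_plus also encodes slashes (/) as %2F and spaces as plus signs (+), which matches the encoded example output shown later in this topic.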

    • CSV file description

      For example, consider a CSV file named plain_example.csv that is not URL-encoded. The file has two columns. The first column is Key, which is the name of the file to migrate. The second column is Size, which is the size of the file in bytes. An example is shown below:

      assets/img/en-US/1354977961/p486238.jpg,1024
      url-that-requires-encoding/123.png,0
      url-accessible-without-encoding/123.png,1024
      Chinese/Japanese/Korean/123.png,1
      Important

      Do not use Notepad in Windows to edit manifest.json or plain_example.csv. This application may add a special mark (0xefbbbf) at the beginning of the file, which can cause parsing errors in Data Online Migration. On Linux or macOS, you can run the od -c plain_example.csv | less command to check if the file starts with this mark. On Windows, use an editor such as Notepad++ or Visual Studio Code to create or edit files.
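      This check can also be scripted. The following Python function is a sketch that detects and removes the 3-byte mark; the file name in the usage example is illustrative:

```python
def strip_bom(path):
    """Remove a UTF-8 byte order mark (0xEF 0xBB 0xBF) from the start of
    the file at path, if present. Return True if a mark was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xef\xbb\xbf"):
        # Rewrite the file without the 3-byte mark.
        with open(path, "wb") as f:
            f.write(data[3:])
        return True
    return False
```

      For example, strip_bom("plain_example.csv") rewrites the file and returns True if the mark is present.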

      The following Python code provides an example of how to read plain_example.csv line by line and write the encoded result to example.csv. Modify the code as needed.

      # -*- coding: utf-8 -*-
      import sys
      
      if sys.version_info.major == 3:
          from urllib.parse import quote_plus
      else:
          from urllib import quote_plus
          reload(sys)
          sys.setdefaultencoding("utf-8")
      
      # Source CSV file path.
      src_path = "plain_example.csv"
      # URL-encoded file path.
      out_path = "example.csv"
      
      # The sample CSV contains two columns: key and size.
      with open(src_path) as fin, open(out_path, "w") as fout:
          for line in fin:
              line = line.strip()
              if not line:
                  # Skip empty lines, which would otherwise cause an error.
                  continue
              items = line.split(",")
              key = items[0]
              size = items[1]
              enc_key = quote_plus(key.encode("utf-8"))
              # The enc_key variable holds the URL-encoded key.
              fout.write(enc_key + "," + size + "\n")
      

      After you run the code, the content of the output file example.csv is:

      assets%2Fimg%2Fen-US%2F1354977961%2Fp486238.jpg,1024
      url-that-requires-encoding%2F123.png,0
      url-accessible-without-encoding%2F123.png,1024
      Chinese%2FJapanese%2FKorean%2F123.png,1

      Note

      The rows in the example CSV file can be in any order. However, the columns must be in the same order as specified in the fileSchema field of the manifest.json file.
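      This consistency can be checked locally before you upload the files. The following Python function is a sketch; the manifest and list file paths are examples:

```python
import csv
import gzip
import json

def check_columns(manifest_path, csv_gz_path):
    """Check that every row of a .csv.gz list file has as many columns as
    the fileSchema field of manifest.json declares. Return the row count."""
    with open(manifest_path) as f:
        schema = json.load(f)["fileSchema"].split(",")
    rows = 0
    with gzip.open(csv_gz_path, "rt") as f:
        for lineno, row in enumerate(csv.reader(f), 1):
            if len(row) != len(schema):
                raise ValueError("row %d has %d columns, expected %d"
                                 % (lineno, len(row), len(schema)))
            rows += 1
    return rows
```

      Because the Key values are URL-encoded, commas inside the original names appear as %2C and do not affect the column count.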

  2. Compress the CSV file

    You must compress the CSV file into a .csv.gz file. You can use one of the following methods:

    • Compress a single file

      For example, to compress a file named example.csv in the dir directory, run the following command:

      gzip -c example.csv > example.csv.gz
      Note

      By default, the gzip command deletes the source file after compression. The -c option in the preceding command writes the compressed data to standard output instead, so the source file example.csv is preserved.

      After compression, a .csv.gz file is created.

    • Compress multiple files

      For example, to compress three files named example1.csv, example2.csv, and example3.csv in the dir directory, run the following command:

      gzip -r dir
      Note

      The gzip command does not create an archive of the directory. Instead, it compresses each file in the specified directory separately and deletes the source files.

      After compression, three files are created in the dir directory: example1.csv.gz, example2.csv.gz, and example3.csv.gz.
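      If the gzip command is not available, for example on Windows, you can perform the compression with Python's standard gzip module instead. The following sketch keeps the source file and checks the 100 MB limit described above; the file name is an example:

```python
import gzip
import os
import shutil

def compress_csv(src_path):
    """Compress src_path to src_path + ".gz" without deleting the source.
    Raise ValueError if the compressed file exceeds the 100 MB limit."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if os.path.getsize(dst_path) > 100 * 1024 * 1024:
        raise ValueError(dst_path + " exceeds the 100 MB limit")
    return dst_path
```

      For example, compress_csv("example.csv") creates example.csv.gz and leaves example.csv in place.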

  3. Create the manifest.json file

    The manifest.json file lists one or more CSV files and contains the following fields.

    • fileFormat: Specifies that the list file format is CSV.

    • fileSchema: Corresponds to the file properties in the CSV file. Ensure that the order of properties matches the order in the CSV file.

      Note

      Make sure that the number of columns in the CSV file matches the number of fields in this configuration. Data Online Migration validates this consistency.

    • files:

      • key: The location of the CSV file in the bucket.

      • MD5checksum: The hexadecimal MD5 string used for validation. The string is not case-sensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If this field is left empty, no validation is performed.

      • size: The size of the list file, in bytes. If the value is not 0, the size is validated.

    The following example is for reference only.

    Note

    If you use a custom inventory for Data Online Migration, the mgwInventoryVersion field is required. Otherwise, the inventory is parsed as a native inventory. The native inventory formats of third-party sources vary, which can cause compatibility issues.

    {
        "fileFormat":"CSV",
        "fileSchema":"Key,Size",
        "files":[{
            "key":"dir/example1.csv.gz",
            "MD5checksum":"",
            "size":0
        },{
            "key":"dir/example2.csv.gz",
            "MD5checksum":"",
            "size":0
        }],
      "mgwInventoryVersion": "1.0" 
    }
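    The manifest can also be generated programmatically. The following Python sketch computes the MD5checksum and size fields for each compressed list file; it assumes that the bucket-relative key of each file equals its local path, which you may need to adjust:

```python
import hashlib
import json
import os

def build_manifest(csv_gz_paths, file_schema="Key,Size"):
    """Build a custom-inventory manifest dict with per-file MD5 and size.
    Assumes the bucket-relative key of each file equals its local path."""
    files = []
    for path in csv_gz_paths:
        with open(path, "rb") as f:
            # Hexadecimal MD5; the inventory accepts either case.
            md5 = hashlib.md5(f.read()).hexdigest().upper()
        files.append({"key": path,
                      "MD5checksum": md5,
                      "size": os.path.getsize(path)})
    return {"fileFormat": "CSV",
            "fileSchema": file_schema,
            "files": files,
            "mgwInventoryVersion": "1.0"}
```

    Writing the result with json.dump produces a manifest.json in the format shown above.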
  4. Upload the inventory files to OSS or a third-party source.

    1. Upload the manifest.json file and the compressed CSV list files. The names of the CSV list files must match the names specified in the manifest.json file.

    2. Record the path of the manifest.json file. You will need this path when you create the source data address.

Step 2: Create a destination bucket

Create an Object Storage Service (OSS) bucket as the destination to store the migrated data. For more information, see Create buckets.

Step 3: Create a RAM user and grant permissions

Important
  • The Resource Access Management (RAM) user is used to perform the data migration task. You must create the required RAM roles and perform the migration task as this RAM user. We recommend that you create the RAM user within the Alibaba Cloud account that owns the source or destination OSS bucket.

  • For more information, see Create a RAM user and grant permissions to the RAM user.

Log on to the RAM console with an Alibaba Cloud account. On the Users page, find the RAM user that you created and click Add Permissions in the Actions column.

  1. System policy: AliyunOSSImportFullAccess. This policy grants permissions to manage Data Online Migration.

  2. Custom policy: The policy must include the ram:CreateRole, ram:CreatePolicy, ram:AttachPolicyToRole, and ram:ListRoles permissions.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy:

    {
        "Version":"1",
        "Statement":[
            {
                "Effect":"Allow",
                "Action":[
                    "ram:CreateRole",
                    "ram:CreatePolicy",
                    "ram:AttachPolicyToRole",
                    "ram:ListRoles"
                ],
                "Resource":"*"
            }
        ]
    }

Step 4: Grant permissions on the bucket that stores inventory lists

Perform the corresponding operations based on whether the bucket that stores inventory lists belongs to the current Alibaba Cloud account.

The bucket that stores inventory lists belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you complete the authorization in the Data Online Migration console. For more information, see the "Step 2: Create a source data address" section of the Migrate data topic.

  • Manual authorization

    Grant permissions on the bucket that stores inventory lists

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List* and oss:Get* permissions to the RAM role.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of a custom policy.

    Note

    The following policy is only for reference. Replace <myInvBucket> with the name of the bucket that stores inventory lists.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    Important

    If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Resource": [
            "acs:oss:*:*:<myInvBucket>",
            "acs:oss:*:*:<myInvBucket>/*"
          ]
        }
      ]
    }

The bucket that stores inventory lists does not belong to the current Alibaba Cloud account

1. Grant permissions on the bucket that stores inventory lists

  1. Log on to the OSS console with the Alibaba Cloud account that owns the bucket that stores inventory lists.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket that stores inventory lists.

  3. In the left-side navigation pane, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

  • Custom policy:

    Grant the RAM role the permissions to list and read all resources in the bucket that stores inventory lists.

    Note

    The following policy is only for reference. Replace <otherInvBucket> with the name of the bucket that stores inventory lists, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, <otherUid> with the ID of the Alibaba Cloud account that owns the bucket that stores inventory lists, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Principal": [
             "arn:sts::<myUid>:assumed-role/<roleName>/*"
          ],
          "Resource": [
            "acs:oss:*:<otherUid>:<otherInvBucket>",
            "acs:oss:*:<otherUid>:<othereInvBucket>/*"
          ]
        }
      ]
    }

2. Configure a policy for a custom key

  1. If server-side encryption by using SSE-KMS is configured for the bucket that stores inventory lists, you must attach the AliyunKMSFullAccess system policy to the RAM role.

  2. If a custom key of KMS is used to encrypt data in the bucket that stores inventory lists, perform the following steps to configure a policy for the custom key:

    1. Log on to the KMS console and find the custom key.

    2. On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.

Step 5: Grant permissions to the destination bucket

Perform the corresponding operations based on whether the destination bucket belongs to the current Alibaba Cloud account.

The destination bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 3: Create a destination data address" section of the Migrate data topic.

  • Manual authorization

    Note

    You can perform manual authorization in the following scenarios:

    • You want to grant permissions on multiple source buckets to a RAM role. This allows you to effectively manage multiple source buckets.

    • You do not want to create more RAM roles because the number of RAM roles within the current Alibaba Cloud account is close to the upper limit.

    • Automatic authorization is not applicable or cannot be used.

    1. Create a RAM role that is used to migrate data

    Log on to the RAM console in which the RAM user is created. On the Roles page, click Create Role.

    1. Principal Type: Select Cloud Service.

    2. Principal Name: Select Data Transport.

    3. Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


    2. Grant permissions on the destination bucket to the RAM role

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List*, oss:Get*, oss:Put*, and oss:AbortMultipartUpload permissions to the RAM role.

    For more information about how to attach a custom policy, see Create a custom policy. The following sample code provides an example of the custom policy:

    Note

    The following policy is only for reference. Replace <myDestBucket> with the name of the destination bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    Important

    If server-side encryption by using Key Management Service managed keys (SSE-KMS) is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*",
            "oss:Put*",
            "oss:AbortMultipartUpload"
          ],
          "Resource": [
            "acs:oss:*:*:<myDestBucket>",
            "acs:oss:*:*:<myDestBucket>/*"
          ]
        }
      ]
    }

The destination bucket does not belong to the current Alibaba Cloud account

1. Create a RAM role that is used to migrate data

Log on to the RAM console in which the RAM user is created. On the Roles page, click Create Role.

  1. Principal Type: Select Cloud Service.

  2. Principal Name: Select Data Transport.

  3. Role Name: Enter the RAM role name. The RAM role name must be in lowercase.


2. Grant permissions on the destination bucket to the RAM role

Important

If you configure a bucket policy by specifying policy statements to grant the RAM role the required permissions, the new bucket policy overwrites the existing bucket policy. Make sure that the new bucket policy contains the content of the existing bucket policy. Otherwise, the authorization based on the existing bucket policy may fail.

  1. Log on to the OSS console with the Alibaba Cloud account that owns the destination bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.

  3. In the left-side pane of the bucket details page, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax and then click Edit. In the code editor, enter the custom bucket policy. Then, click Save.

    • Grant the RAM role the permissions to list, read, and write objects in the destination bucket, and to abort multipart uploads.

Note

The following policy is only for reference. Replace <otherDestBucket> with the name of the destination bucket, <otherUid> with the ID of the Alibaba Cloud account that owns the destination bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Principal": [
         "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherDestBucket>",
        "acs:oss:*:<otherUid>:<otherDestBucket>/*"
      ]
    }
  ]
}
3. Configure a policy for a custom key
  1. If SSE-KMS is configured for the destination bucket, you must attach the AliyunKMSFullAccess system policy to the RAM role.

  2. If a custom key of KMS is used to encrypt data in the destination bucket, perform the following steps to configure a policy for the custom key.

    1. Log on to the KMS console and find the custom key.

    2. On the Key Policy tab of the details page, click Configure Key Policy. In the Key Policy panel, enter the ARN of the RAM role in the Cross-account User field. For more information, see Configure a key policy.