Data Online Migration: Preparations

Last Updated: Apr 07, 2024

This topic describes how to prepare for a data migration task.

Step 1: Upload list files

The list files for an HTTP or HTTPS migration consist of two types of files: one manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file. The size of a single example.csv.gz file cannot exceed 25 MB. The manifest.json file describes the columns in each CSV file. You can upload the list files to Alibaba Cloud Object Storage Service (OSS) or Amazon Simple Storage Service (Amazon S3).

  1. Create a CSV list file.

    Create a CSV list file on your on-premises machine. A list file can contain up to eight columns separated by commas (,). Each line represents one file to be migrated, and lines are separated by line feeds (\n). The following tables describe the columns.

    Important

    The Key and URL columns are required. Other columns are optional.

    • Required columns

      Column

      Required

      Description

      Limit

      URL

      Yes

      The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      Note

      Make sure that the URL can be accessed by using commands such as curl -L --head "$url" and curl -L --get "$url".

      The values of the URL and Key columns must be encoded. Otherwise, the migration may fail due to the special characters contained in the values.

      • Before you encode the value of the URL column, make sure that the URL can be accessed by using CLI tools such as curl. Then, perform URL encoding.

      • Before you encode the value of the Key column, make sure that you can obtain the required object name in OSS after the migration. Then, perform URL encoding.

      Important

      After you encode the values of the URL and Key columns, make sure that the following requirements are met. Otherwise, the migration may fail, or the source files may not be migrated to the specified destination path.

      • A plus sign (+) in the original string is encoded as %2B.

      • A percent sign (%) in the original string is encoded as %25.

      • A comma (,) in the original string is encoded as %2C.

      For example, if the original string is a+b%c,d.file, the encoded string is a%2Bb%25c%2Cd.file.

      Key

      Yes

      The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      The following code provides an example on how to encode the values of the URL and Key columns in Python:

      # -*- coding: utf-8 -*-
      import sys
      if sys.version_info.major == 3:
          from urllib.parse import quote_plus
      else:
          from urllib import quote_plus
      
      raw_urls = [
          # Format: ($url, $key)
          #    url: a URL that can be accessed by using the Linux curl or wget command.
          #    key: the object name that you expect in OSS after the migration.
          ("http://www.example1.com/path/ab.file?t=aef87",  "ab.file"),
          ("http://www.example2.com/path/a+b.file",         "a+b.file"),
          ("http://www.example3.com/path/a%b.file",         "a%b.file"),
          ("http://www.example4.com/path/a,b.file",         "a,b.file"),
          ("http://www.example5.com/path/a b.file",         "a b.file"),
          ("http://www.example6.com/path/a and b.file",     "a and b.file"),
          ("http://www.example7.com/path/a%E4%B8%8Eb.file", "a%E4%B8%8Eb.file"),
          ("http://www.example8.com/path/a\\b.file",        "a\\b.file")
      ]
      
      for url, key in raw_urls:
          enc_url = quote_plus(url)
          enc_key = quote_plus(key)
          # enc_url and enc_key are the encoded values that you can write to the CSV file.
          print("(%s, %s) -> (%s, %s)" % (url, key, enc_url, enc_key))
      
    • All columns

      Column

      Required

      Description

      Key

      Yes

      The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      URL

      Yes

      The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      Size

      No

      The size of the file to be migrated.

      StorageClass

      No

      The storage class of the file in the source bucket.

      LastModifiedDate

      No

      The time when the file to be migrated was last modified.

      ETag

      No

      The entity tag (ETag) of the file to be migrated.

      HashAlg

      No

      The hash algorithm of the file to be migrated.

      HashValue

      No

      The hash value of the file to be migrated.

      Note

      The preceding columns can appear in any order in a CSV file. You only need to make sure that the order of the columns in a CSV file is the same as the order specified in the fileSchema field of the manifest.json file.
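    For reference, a minimal list file that uses only the two required columns might look like the following. This example assumes a fileSchema of "Url, Key"; the URLs are hypothetical, and both values are URL-encoded as described above:

```
http%3A%2F%2Fwww.example1.com%2Fpath%2Fab.file,ab.file
http%3A%2F%2Fwww.example2.com%2Fpath%2Fa%2Bb.file,a%2Bb.file
```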

  2. Compress one or more CSV list files.

    Compress the CSV file into a CSV GZ file. The following examples show how to compress one or more CSV files:

    • Compress a CSV file

      In this example, a file named file1 resides in the dir directory. Run the following command to compress the file:

      gzip -r dir
      Note

      If you run the preceding gzip command to compress a file, the source file is not retained. To retain both the compressed file and the source file, run the gzip -c <source file> > <source file>.gz command. Example: gzip -c dir/file1 > dir/file1.gz.

      The file1.gz file is generated.

    • Compress multiple CSV files

      In this example, the file1, file2, and file3 files reside in the dir directory. Run the following command to compress the files:

      gzip -r dir
      Note

      The gzip -r command does not package the directory into a single archive. It separately compresses each file in the directory.

      The file1.gz, file2.gz, and file3.gz files are generated.
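    As an alternative to the gzip command, the compression step can also be scripted. The following sketch (the dir directory name and the compress_csv_files helper are illustrative, not part of the service) compresses each CSV file with Python's gzip module, retains the source files, and warns when a compressed file exceeds the 25 MB limit:

```python
# Compress every *.csv file in a directory into a *.csv.gz file,
# keeping the source files, and warn if a compressed file exceeds
# the 25 MB limit for a single list file.
import glob
import gzip
import os
import shutil

MAX_GZ_SIZE = 25 * 1024 * 1024  # 25 MB limit for a single .csv.gz file

def compress_csv_files(directory):
    for csv_path in glob.glob(os.path.join(directory, "*.csv")):
        gz_path = csv_path + ".gz"
        with open(csv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # the source file is retained
        if os.path.getsize(gz_path) > MAX_GZ_SIZE:
            print("Warning: %s exceeds 25 MB; split the CSV file" % gz_path)

if __name__ == "__main__":
    compress_csv_files("dir")
```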

  3. Create a manifest.json file.

    You can use a manifest.json file to configure multiple CSV files. A manifest.json file contains the following fields:

    • fileFormat: the format of the list file. Example: CSV.

    • fileSchema: the columns in the CSV file. Pay attention to the order of columns.

    • files:

      • key: the location of the CSV file in the source bucket.

      • mD5checksum: the MD5 value of the CSV file. The value is a hexadecimal MD5 string, which is not case-sensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this parameter, the CSV file is not verified.

      • size: the size of the CSV file.

    The following sample code provides an example:

    {
        "fileFormat":"CSV",
        "fileSchema":"Url, Key, Bucket, Size, StorageClass, LastModifiedDate, ETag, HashAlg, HashValue ",
        "files":[{
            "key":"dir/example1.csv.gz",
            "mD5checksum":"",
            "size":0
        },{
            "key":"dir/example2.csv.gz",
            "mD5checksum":"",
            "size":0
        }]
    }
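    The manifest can also be generated programmatically. The following sketch (the build_manifest helper and the dir/ prefix are illustrative, not part of the service) computes the mD5checksum and size fields for each CSV GZ file:

```python
# Build the content of a manifest.json file for a set of .csv.gz list files.
import hashlib
import json
import os

def build_manifest(gz_paths, oss_prefix="dir/"):
    files = []
    for path in gz_paths:
        with open(path, "rb") as f:
            md5_hex = hashlib.md5(f.read()).hexdigest().upper()
        files.append({
            "key": oss_prefix + os.path.basename(path),  # location in the bucket
            "mD5checksum": md5_hex,                      # hex MD5 string, case-insensitive
            "size": os.path.getsize(path),               # size in bytes
        })
    return json.dumps({
        "fileFormat": "CSV",
        "fileSchema": "Url, Key, Size, StorageClass, LastModifiedDate, ETag, HashAlg, HashValue",
        "files": files,
    }, indent=4)
```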
  4. Upload the list files that you created to OSS or Amazon S3.

    • Upload the list files to OSS. For more information, see Simple upload.

      Note
      • After the list files are uploaded to OSS, Data Online Migration downloads the list files and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which the list files are stored, and specify the path of the manifest.json file in the <directory in which the list files reside>/manifest.json format. Example: dir/manifest.json.

    • Upload the list files to Amazon S3.

      Note
      • After the list files are uploaded to Amazon S3, Data Online Migration downloads the list files and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which the list files are stored, and specify the path of the manifest.json file in the <directory in which the list files reside>/manifest.json format. Example: dir/manifest.json.

Step 2: Create a destination bucket

Create a destination bucket to store the migrated data. For more information, see Create buckets.

Step 3: Create a RAM user

  1. Log on to the Resource Access Management (RAM) console.
  2. In the left-side navigation pane, choose Identities > Users.
  3. On the Users page, click Create User.
  4. In the User Account Information section of the Create User page, configure the Logon Name and Display Name parameters.
  5. In the Access Mode section, select Console Access and OpenAPI Access. Then, save the generated logon name, password, AccessKey ID, and AccessKey secret.
    • Console Access: If you select this option, you must configure the console password, password reset settings, and multi-factor authentication (MFA) settings.
    • OpenAPI Access: If you select this option, an AccessKey pair is automatically created for the RAM user. The RAM user can call API operations or use other development tools to access Alibaba Cloud resources.
    Note If you need to migrate data across accounts, you must save the logon name, password, AccessKey ID, and AccessKey secret that are generated for each RAM user by the corresponding Alibaba Cloud account.
  6. After the RAM user is created, go to the Users page. Find the RAM user that you want to manage and click Add Permissions in the Actions column to grant the AliyunOSSFullAccess permission to the RAM user.
  7. In the left-side navigation pane, click Overview.
  8. On the page that appears, navigate to the Account Management section and click the link under RAM user logon. On the page that appears, enter the logon name and password of the RAM user to log on to the Alibaba Cloud Management Console.

Step 4: Grant permissions to the RAM user

After the RAM user is created, go to the Users page in the RAM console. Find the RAM user that you want to manage and click Add Permissions in the Actions column to grant permissions to the RAM user.