MaxCompute: Migrate MaxCompute data

Last Updated: Mar 11, 2024

This topic describes how to prepare for and perform MaxCompute data migration by using MaxCompute Migration Assist (MMA).

Preparations

  • Data migration across MaxCompute projects in the same region

    Prepare an account that can access both the source project and the destination project. The account must have the List and CreateTable permissions on the projects and read and write permissions on tables in the projects.

  • MaxCompute data migration by using E-MapReduce (EMR), Data Lake Formation (DLF), and Object Storage Service (OSS)

    Build a data lakehouse for data migration. For more information about how to build a data lakehouse, see Lakehouse of MaxCompute. The migration process in this scenario is the same as the process for migrating data across MaxCompute projects in the same region.

  • Data migration across MaxCompute projects in different regions

    • Due to security compliance requirements, copytasks cannot be used for cross-border data replication. If the copytask feature is enabled for projects in the same region, the feature is valid for one month by default and is disabled after that month ends.

    • The copytask feature must be enabled for the source project. To enable it, submit an application for the source project.

Procedure

  1. Add a data source.

    1. In the left-side navigation pane of the MaxCompute Migration Assist (MMA) console, click Data Sources. The Data Sources page appears.

    2. Click Add Data Source to go to the Add Data Source page.

    3. Set Data Source Type to MAXCOMPUTE and click Next.

    4. Configure the parameters for the data source. The following table describes the parameters.

      | Parameter | Description |
      | --- | --- |
      | Data Source Name | The name of the data source. Enter a custom name. |
      | maxcompute endpoint | The endpoint of the region in which the source project resides. For more information, see Endpoints. |
      | maxcompute access id | The AccessKey ID that is used to access MaxCompute. For more information about how to obtain an AccessKey ID, see Create an Alibaba Cloud account. |
      | maxcompute access key | The AccessKey secret that is used to access MaxCompute. For more information about how to obtain an AccessKey secret, see Create an Alibaba Cloud account. |
      | maxcompute default project (for SQL) | The default project whose quota is used by the SQL statements related to the source project. For example, if the source project is Project A and the default project is Project B, the quota of Project B is used when you execute select * from A.table. |
      | Source MaxCompute Projects | The names of the projects whose data you want to migrate. Separate multiple project names with commas (,). |
      | instance number of one copyTask (Only for Data Migration Across MaxCompute Projects in Different Regions) | The number of parallel threads in each copytask. This parameter is used only when you migrate data across MaxCompute projects in different regions. |
      | SQL Parameter of MaxCompute Migration Task (Only for Data Migration Across MaxCompute Projects in the Same Region) | The flag parameter that is used when the SQL statement that migrates data is executed. In most cases, retain the default value. If an SQL error is reported during execution, send the LogView information to MaxCompute technical support personnel. This parameter is used only when you migrate data across MaxCompute projects in the same region. |
      | Maximum Number of Partitions Processed in a Task (Only for Data Migration Across MaxCompute Projects in the Same Region) | The maximum number of partitions whose data can be migrated at the same time. An MMA subtask can migrate data from multiple partitions of a table at the same time. This parameter is used only when you migrate data across MaxCompute projects in the same region. |
      | Meta API Access Concurrency | The maximum number of concurrent accesses to the source project. Recommended value: 20. |
      | Table Blacklist (dbname.tablename Format) | The tables to exclude from migration, in the dbname.tablename format. Separate multiple table names with commas (,). |
      | Table Whitelist (dbname.tablename Format) | The tables to include in migration, in the dbname.tablename format. Separate multiple table names with commas (,). |

    5. Click Submit at the bottom of the page.

      Note

      If the configurations are correct, MMA pulls the metadata and then redirects you to the Data Sources page. If the configurations are incorrect, an error is reported. In this case, check the configurations, correct the values, and submit the configurations again.

    6. When the progress bar for pulling metadata reaches 100%, you are redirected to the Data Sources page.
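Several data source parameters above take comma-separated values, and the blacklist and whitelist expect entries in the dbname.tablename format. The following minimal sketch shows one way to check such values before you paste them into the console. The helper names are hypothetical and are not part of MMA, and the allowed character set (letters, digits, and underscores) is an assumption made for illustration.

```python
import re

# Assumption: names consist of letters, digits, and underscores only.
_TABLE_REF = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def split_csv(value):
    """Split a comma-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def invalid_table_refs(value):
    """Return the entries that are not in the dbname.tablename format."""
    return [ref for ref in split_csv(value) if not _TABLE_REF.match(ref)]


# Hypothetical project and table names, used only for illustration.
projects = split_csv("project_a, project_b")
bad = invalid_table_refs("project_a.orders, project_b.users, orders")
print(projects)  # ['project_a', 'project_b']
print(bad)       # ['orders']
```

An empty list from invalid_table_refs means that every entry has exactly one dot between a database name and a table name.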

  2. Create a data migration task.

    MMA allows you to create migration tasks for a single database, multiple tables, and multiple partitions.

    Note
    • A migration task for a single database migrates data from a single database.

    • A migration task for multiple tables migrates data from one or more tables.

    • A migration task for multiple partitions migrates data from one or more partitions.

    • Migrate data from multiple tables.

      1. In the left-side navigation pane of the MMA console, click Data Sources. On the Data Sources page, click the name of the data source whose data you want to migrate.

      2. On the details page of the selected data source, click the name of the database whose data you want to migrate.

      3. Select the tables whose data you want to migrate and click Create Migration Task.

      4. In the Create Migration Task dialog box, configure the parameters based on your business requirements. The following table describes the parameters.

        | Parameter | Description |
        | --- | --- |
        | Name | The name of the migration task. We recommend that you enter a task name that helps you organize migration records. |
        | Task Type | The type of the task. Configure this parameter based on your business requirements. Valid values: MaxCompute Projects in the Same Region; MaxCompute Projects Across Regions; MaxCompute Verification (compares the data of the tables that exist in both the source and destination projects). |
        | MaxCompute Project | The destination MaxCompute project. |
        | Tables | A list of tables whose data you want to migrate. Separate multiple table names with commas (,). |
        | Enable Verification | By default, this feature is enabled. |
        | Incremental Update | By default, this feature is enabled. If this feature is enabled, partitions whose data has already been migrated are not migrated again. |
        | Migrate Schema Only | Specifies whether to migrate only the table schema and partition values. Enable this feature based on your business requirements. |
        | Partition Filter | For more information, see Partition filter expressions. |
        | Table Name Mapping | The name of the table in the destination project after data of the table is migrated. |

      5. Click OK.

        Note

        If the configuration of the migration task is correct, you can view the migration task in the Tasks section on the Migration Tasks page and view the related subtasks in the Subtasks section on the Migration Tasks page.
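The effect of the Incremental Update parameter can be pictured as a filter over the selected partitions. The sketch below only models the behavior described in the parameter table. How MMA records which partitions are already migrated is not covered in this topic, so the function name and the partition strings are hypothetical, and the behavior when the feature is disabled (every selected partition is migrated) is an assumption.

```python
def partitions_to_migrate(selected, already_migrated, incremental_update=True):
    """Return the selected partitions that still need to be migrated.

    With incremental_update enabled, partitions whose data has already been
    migrated are skipped. With it disabled, this sketch assumes that every
    selected partition is migrated.
    """
    if not incremental_update:
        return list(selected)
    done = set(already_migrated)
    return [p for p in selected if p not in done]


# Hypothetical partition values, used only for illustration.
selected = ["ds=20240301", "ds=20240302", "ds=20240303"]
migrated = ["ds=20240301"]
print(partitions_to_migrate(selected, migrated))
# ['ds=20240302', 'ds=20240303']
```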

    • Migrate data from multiple partitions.

      1. In the left-side navigation pane of the MMA console, click Data Sources. On the Data Sources page, click the name of the data source whose data you want to migrate.

      2. On the details page of the selected data source, click the name of the database whose data you want to migrate.

      3. Click the Partitions tab. On this tab, select the partitions whose data you want to migrate.

      4. Click Create Migration Task. In the Create Migration Task dialog box, configure the parameters based on your business requirements. The following table describes the parameters.

        | Parameter | Description |
        | --- | --- |
        | Name | The name of the migration task. We recommend that you enter a task name that helps you organize migration records. |
        | Task Type | The type of the task. Configure this parameter based on your business requirements. Valid values: MaxCompute Projects in the Same Region; MaxCompute Projects Across Regions; MaxCompute Verification (compares the data of the tables that exist in both the source and destination projects). |
        | MaxCompute Project | The destination MaxCompute project. |
        | Enable Verification | By default, this feature is enabled. |
        | Migrate Schema Only | Specifies whether to migrate only the table schema and partition values. Enable this feature based on your business requirements. |
        | Partitions | Retain the default value. |
        | Table Name Mapping | The name of the table in the destination project after data of the table is migrated. |

      5. Click OK.

        Note

        If the configuration of the migration task is correct, you can view the migration task in the Tasks section on the Migration Tasks page and view the related subtasks in the Subtasks section on the Migration Tasks page.

    • Migrate data from a single database (project).

      Note

      If a large amount of data exists in a project, we recommend that you do not migrate all data from the project at a time. You can create migration tasks for multiple tables to migrate data in batches.

      1. In the left-side navigation pane of the MMA console, click Data Sources. On the Data Sources page, click the name of the data source whose data you want to migrate.

      2. Find the database whose data you want to migrate and click Migrate.

      3. In the Create Migration Task dialog box, configure the parameters based on your business requirements. The following table describes the parameters.

        | Parameter | Description |
        | --- | --- |
        | Name | The name of the migration task. We recommend that you enter a task name that helps you organize migration records. |
        | Task Type | The type of the task. Configure this parameter based on your business requirements. Valid values: MaxCompute Projects in the Same Region; MaxCompute Projects Across Regions; MaxCompute Verification (compares the data of the tables that exist in both the source and destination projects). |
        | MaxCompute Project | The destination MaxCompute project. |
        | Table Whitelist | A list of tables whose data you want to migrate. Separate multiple table names with commas (,). |
        | Enable Verification | By default, this feature is enabled. |
        | Incremental Update | By default, this feature is enabled. If this feature is enabled, partitions whose data has already been migrated are not migrated again. |
        | Migrate Schema Only | Specifies whether to migrate only the table schema and partition values. |
        | Partition Filter | For more information, see Partition filter expressions. |
        | Table Name Mapping | The name of the table in the destination project after data of the table is migrated. |

      4. Click OK.

        Note

        If the configuration of the migration task is correct, you can view the migration task in the Tasks section on the Migration Tasks page and view the related subtasks in the Subtasks section on the Migration Tasks page.
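The note on migrating a single project recommends that you migrate large projects in batches by creating migration tasks for multiple tables. As a rough illustration, the following sketch splits a list of table names into fixed-size batches and joins each batch with commas, which is the separator that the Tables parameter expects. The function name, the table names, and the batch size are hypothetical. Choose a batch size that suits your data volume.

```python
def table_batches(tables, batch_size):
    """Yield comma-separated table lists, at most batch_size tables each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tables), batch_size):
        yield ",".join(tables[start:start + batch_size])


# Hypothetical table names, used only for illustration.
tables = ["orders", "users", "events", "logs", "items"]
batches = list(table_batches(tables, 2))
for batch in batches:
    print(batch)
# orders,users
# events,logs
# items
```

Each printed line can serve as the value of the Tables parameter for one migration task.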