
MaxCompute: Overview of MMA

Last Updated: Apr 11, 2024

MaxCompute Migration Assist (MMA) is a tool provided by MaxCompute to migrate data. This topic describes how to use MMA.

Features

  • Migration of data from Hive to MaxCompute

  • Migration of data across MaxCompute projects

  • Migration of MaxCompute data by using E-MapReduce (EMR), Data Lake Formation (DLF), and Object Storage Service (OSS)

How to migrate data

Hive data migration

You can use a Hive user-defined table-valued function (UDTF) or OSS to migrate Hive data to MaxCompute. This topic describes the data migration methods.

  • Use a Hive UDTF to migrate Hive data to MaxCompute.

    In this scenario, the UDTF runs on the Hive cluster, so Hive data is transmitted with high concurrency by using the distributed capability of Hive.

    • Prerequisites

      Each node of the Hive cluster can access MaxCompute.

    • Data migration process

      [Figure: Data migration by using a Hive UDTF]

      1. MMA uses a Hive metastore to obtain metadata, including all table names, table schemas, and partition information.

      2. MMA creates tables and partitions in MaxCompute based on the obtained schemas.

      3. MMA commits SQL statements to Hive for UDTF execution.

      4. The UDTF uses the Tunnel SDK of MaxCompute to write table data to MaxCompute.

      5. Perform data verification.

        Note

        To check whether data migration is successful, execute the SELECT COUNT(*) statement for the same table, or for the same partitions of a table, in both the Hive console and the MaxCompute console. Then, compare the row counts that the two sides return.

  • Use OSS to migrate Hive data to MaxCompute.

    In this scenario, MMA migrates data to OSS and then MaxCompute reads the data from OSS. The following figure shows the data migration process.

    [Figure: Hive data migration by using OSS]

    1. Alibaba Cloud Data Transport, Jindo DistCp, or Juicesync is used to migrate data from Hadoop Distributed File System (HDFS) to OSS.

    2. MMA uses a Hive metastore to obtain metadata, including all table names, table schemas, and partition information.

    3. MMA creates an OSS external table and a standard table that corresponds to the external table in MaxCompute based on the obtained schemas and OSS path.

    4. You can execute an INSERT statement that selects data from the OSS external table and writes the data to the standard table. This imports the data from OSS to MaxCompute.
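The row-count comparison described in the data verification note can be sketched as follows. This is a minimal illustration, not part of MMA: the function name `compare_row_counts`, the key format, and the sample counts are all hypothetical, and the counts are assumed to have been collected beforehand by running SELECT COUNT(*) on each side.

```python
def compare_row_counts(hive_counts, maxcompute_counts):
    """Return (name, hive_count, maxcompute_count) for every entry whose counts differ.

    Keys identify a table or a partition. A key that is missing on one side
    is reported with None as its count on that side.
    """
    mismatches = []
    for name in sorted(set(hive_counts) | set(maxcompute_counts)):
        hive_n = hive_counts.get(name)
        mc_n = maxcompute_counts.get(name)
        if hive_n != mc_n:
            mismatches.append((name, hive_n, mc_n))
    return mismatches


# Hypothetical counts, as if gathered from SELECT COUNT(*) on each side.
hive_counts = {"sales/pt=20240101": 1200, "sales/pt=20240102": 950}
maxcompute_counts = {"sales/pt=20240101": 1200, "sales/pt=20240102": 948}

mismatches = compare_row_counts(hive_counts, maxcompute_counts)
for name, hive_n, mc_n in mismatches:
    print(f"{name}: Hive={hive_n}, MaxCompute={mc_n}")
```

An empty result means that every table and partition checked has the same number of rows on both sides. Matching row counts indicate that no rows were lost, but they do not prove that the row contents are identical.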

MaxCompute data migration

You can migrate data across MaxCompute projects that are in the same region, migrate MaxCompute data by using EMR, DLF, and OSS, and migrate data across MaxCompute projects that are in different regions. This topic describes the data migration methods.

  • Migrate data across MaxCompute projects that are in the same region.

    1. Obtain all tables and partitions in the source project and create tables and partitions in the destination project.

    2. Execute an INSERT OVERWRITE statement that selects data from the source table and writes the data to the destination table.

  • Migrate MaxCompute data by using EMR, DLF, and OSS.

    In this scenario, you must use MaxCompute and DLF to create an external project of MaxCompute.

    1. Obtain all tables and partitions in the source project and create tables and partitions in the destination project.

    2. Execute an INSERT OVERWRITE statement that selects data from the source table and writes the data to the destination table.

  • Migrate data across MaxCompute projects that are in different regions.

    In this scenario, MaxCompute CopyTask is used. CopyTask can copy table data from a project in one region to a project in another region.

    Prerequisites: CopyTask is enabled for the source project.
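For the INSERT OVERWRITE step used when the source and destination are reachable from the same SQL session, the statement could be assembled as in the following sketch. The exact SQL that MMA generates is not shown in this topic, so the statement shape is an assumption based on standard MaxCompute SQL. The helper `build_insert_overwrite`, the project names, the table name, and the column names are hypothetical, and partition values are assumed to be strings that need no escaping.

```python
def build_insert_overwrite(src_project, dst_project, table, columns, partition=None):
    """Build an INSERT OVERWRITE statement that copies one table or one partition.

    columns lists the non-partition columns to copy. partition maps partition
    column names to string values, or is None for a non-partitioned table.
    """
    src = f"{src_project}.{table}"
    dst = f"{dst_project}.{table}"
    col_list = ", ".join(columns)
    if partition is None:
        return f"INSERT OVERWRITE TABLE {dst} SELECT {col_list} FROM {src};"
    # Static partition spec on the destination, matching filter on the source.
    spec = ", ".join(f"{k}='{v}'" for k, v in partition.items())
    where = " AND ".join(f"{k}='{v}'" for k, v in partition.items())
    return (
        f"INSERT OVERWRITE TABLE {dst} PARTITION ({spec}) "
        f"SELECT {col_list} FROM {src} WHERE {where};"
    )


# Hypothetical names for illustration only.
full_copy = build_insert_overwrite("src_prj", "dst_prj", "orders", ["id", "amount"])
one_partition = build_insert_overwrite(
    "src_prj", "dst_prj", "orders", ["id", "amount"], {"pt": "20240101"}
)
print(full_copy)
print(one_partition)
```

Because INSERT OVERWRITE replaces the existing data in the destination table or partition, rerunning the same statement after a failure does not duplicate rows.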

MMA tasks and subtasks

  • MMA can commit migration tasks for a single database, multiple tables, or multiple partitions.

  • MMA divides a migration task into subtasks by non-partitioned table and by partition, and runs the subtasks to perform the migration. Each subtask migrates data from one non-partitioned table or from one or more partitions.
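The way a task breaks down into subtasks can be pictured with the following sketch. How MMA actually groups partitions into subtasks is not specified in this topic. The function `plan_subtasks`, the `partitions_per_subtask` parameter, and the table names are hypothetical and only illustrate the rule that a subtask covers one non-partitioned table or one or more partitions.

```python
def plan_subtasks(tables, partitions_per_subtask=2):
    """Split a migration task into subtasks.

    tables maps a table name to its list of partitions. An empty list marks
    a non-partitioned table, which always becomes exactly one subtask.
    """
    subtasks = []
    for table, partitions in tables.items():
        if not partitions:
            subtasks.append((table, []))
            continue
        # Group the partitions of one table into fixed-size batches.
        for i in range(0, len(partitions), partitions_per_subtask):
            subtasks.append((table, partitions[i:i + partitions_per_subtask]))
    return subtasks


# Hypothetical task: one non-partitioned table and one partitioned table.
tables = {
    "dim_user": [],
    "fact_orders": ["pt=20240101", "pt=20240102", "pt=20240103"],
}
subtasks = plan_subtasks(tables)
for table, partitions in subtasks:
    print(table, partitions or "(whole table)")
```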

MMA migration solutions

This topic describes two migration solutions that you can use to migrate data from Hadoop to MaxCompute.

  • Migration solution 1

    If Express Connect circuits are used, you can use MMA to migrate Hive data to MaxCompute. The following figure shows the migration solution.

    [Figure: Migration solution 1]

  • Migration solution 2

    If no Express Connect circuits are used, you can use Data Transport to migrate HDFS data to OSS and then use MMA to write the data to MaxCompute. The following figure shows the migration solution.

    [Figure: Migration solution 2]

References

  • Install and configure MMA: This topic describes the preparations for configuring MMA and the configuration process. This helps you quickly build an MMA environment.

  • Migrate Hive data: This topic describes the preparations for Hive data migration by using a Hive user-defined table-valued function (UDTF) and the data migration process. This helps you quickly understand the process of Hive data migration.

  • Migrate MaxCompute data: This topic describes the preparations for data migration between MaxCompute projects and the data migration process. This helps you quickly understand the process of MaxCompute data migration.

  • Partition filter expressions: This topic describes the formats of partition filter expressions. This helps you quickly configure partition filtering parameters when you create a data migration task.

  • View and manage migration tasks: This topic describes how to view and manage data migration tasks and also describes the incremental migration method. This helps you quickly understand the features of migration tasks.