MaxCompute Migration Assist (MMA) is a tool provided by MaxCompute to migrate data. This topic describes how to use MMA.
Features
Migration of data from Hive to MaxCompute
Migration of data across MaxCompute projects
Migration of MaxCompute data by using E-MapReduce (EMR), Data Lake Formation (DLF), and Object Storage Service (OSS)
How to migrate data
Hive data migration
You can use a Hive user-defined table-valued function (UDTF) or OSS to migrate Hive data to MaxCompute. This topic describes the data migration methods.
Use a Hive UDTF to migrate Hive data to MaxCompute.
In this scenario, the transfer runs as Hive jobs, so data is written to MaxCompute with high concurrency by using the distributed computing capability of Hive.
Prerequisites
Each node of the Hive cluster can access MaxCompute.
Data migration process
MMA uses a Hive metastore to obtain metadata, including all table names, table schemas, and partition information.
MMA creates tables and partitions in MaxCompute based on the obtained schemas.
MMA commits SQL statements to Hive for UDTF execution.
The UDTF uses the Tunnel SDK of MaxCompute to write table data to MaxCompute.
Perform data verification.
Note: You can separately execute the SELECT COUNT(*) statement for the same table, or for the same partitions of a table, in the Hive console and the MaxCompute console. Then, compare the row counts returned on both sides to check whether the data migration is successful.
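The verification step can be sketched as follows. This is a minimal illustration, not part of MMA: the helper names `count_statement` and `compare_row_counts` are hypothetical, and the row counts are assumed to have been collected already by running the generated statements in the Hive console and the MaxCompute console.

```python
from typing import Dict, List, Tuple


def count_statement(table: str, partition_filter: str = "") -> str:
    """Build the SELECT COUNT(*) statement to run on both sides.

    partition_filter is an optional WHERE condition such as "ds='20240101'".
    """
    where = f" WHERE {partition_filter}" if partition_filter else ""
    return f"SELECT COUNT(*) FROM {table}{where};"


def compare_row_counts(
    hive_counts: Dict[str, int], mc_counts: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (name, hive_count, maxcompute_count) for every mismatch.

    A table or partition that is missing on the MaxCompute side is
    reported with a MaxCompute count of -1.
    """
    mismatches = []
    for name, hive_count in sorted(hive_counts.items()):
        mc_count = mc_counts.get(name, -1)
        if mc_count != hive_count:
            mismatches.append((name, hive_count, mc_count))
    return mismatches


if __name__ == "__main__":
    print(count_statement("sales.orders", "ds='20240101'"))
    hive = {"sales.orders/ds=20240101": 1200, "sales.users": 50}
    maxcompute = {"sales.orders/ds=20240101": 1200, "sales.users": 49}
    print(compare_row_counts(hive, maxcompute))
```

An empty result means that the row counts match for every table and partition that was checked. Matching counts do not prove that the contents are identical.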
Use OSS to migrate Hive data to MaxCompute.
In this scenario, MMA migrates data to OSS and then MaxCompute reads the data from OSS. The following figure shows the data migration process.
Alibaba Cloud Data Transport, Jindo DistCp, or Juicesync is used to migrate data from Hadoop Distributed File System (HDFS) to OSS.
MMA uses a Hive metastore to obtain metadata, including all table names, table schemas, and partition information.
MMA creates an OSS external table and a standard table that corresponds to the external table in MaxCompute based on the obtained schemas and OSS path.
You can execute an INSERT ... SELECT statement that reads from the OSS external table and writes to the standard table to import data from OSS to MaxCompute.
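The import step can be illustrated with a small statement generator. This is a sketch under assumptions: the function name and the table names are hypothetical, the tables are assumed to be non-partitioned, and the exact statement that MMA generates may differ.

```python
def build_import_statement(
    standard_table: str, external_table: str, overwrite: bool = True
) -> str:
    """Build a statement that copies all rows of an OSS external table
    into the corresponding standard table.

    Assumes non-partitioned tables whose columns are in the same order.
    """
    mode = "OVERWRITE" if overwrite else "INTO"
    return f"INSERT {mode} TABLE {standard_table} SELECT * FROM {external_table};"


if __name__ == "__main__":
    # Hypothetical table names, used only for illustration.
    print(build_import_statement("orders", "orders_oss_external"))
```

Using INSERT OVERWRITE keeps a rerun from duplicating rows. Pass `overwrite=False` to append instead.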
MaxCompute data migration
You can migrate data across MaxCompute projects that are in the same region, migrate MaxCompute data by using EMR, DLF, and OSS, and migrate data across MaxCompute projects that are in different regions. This topic describes the data migration methods.
Migrate data across MaxCompute projects that are in the same region.
Obtain all tables and partitions in the source project and create tables and partitions in the destination project.
Execute an INSERT OVERWRITE ... SELECT statement that writes the data of each source table to the corresponding destination table to migrate data.
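For a partitioned table, the statement is typically issued per partition. The following sketch builds such statements for two projects in the same region. The function name, project names, and column names are hypothetical. It assumes that the destination table already exists with the same schema and that `columns` lists only non-partition columns.

```python
from typing import Dict, List, Optional


def build_copy_statement(
    src_project: str,
    dst_project: str,
    table: str,
    columns: List[str],
    partition: Optional[Dict[str, str]] = None,
) -> str:
    """Build an INSERT OVERWRITE statement that copies one table, or one
    partition of a table, from the source project to the destination project.
    """
    cols = ", ".join(columns)
    src = f"{src_project}.{table}"
    dst = f"{dst_project}.{table}"
    if not partition:
        # Non-partitioned table: copy all rows in a single statement.
        return f"INSERT OVERWRITE TABLE {dst} SELECT {cols} FROM {src};"
    # Static partition: name it in the PARTITION clause and filter the source.
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    where = " AND ".join(f"{key}='{value}'" for key, value in partition.items())
    return (
        f"INSERT OVERWRITE TABLE {dst} PARTITION ({spec}) "
        f"SELECT {cols} FROM {src} WHERE {where};"
    )


if __name__ == "__main__":
    print(build_copy_statement("src_prj", "dst_prj", "dim_city", ["id", "name"]))
    print(
        build_copy_statement(
            "src_prj", "dst_prj", "orders", ["id", "amount"], {"ds": "20240101"}
        )
    )
```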
Migrate MaxCompute data by using EMR, DLF, and OSS.
In this scenario, you must use MaxCompute and DLF to create an external project of MaxCompute.
Obtain all tables and partitions in the source project and create tables and partitions in the destination project.
Execute an INSERT OVERWRITE ... SELECT statement that writes the data of each source table to the corresponding destination table to migrate data.
Migrate data across MaxCompute projects that are in different regions.
In this scenario, MaxCompute CopyTask is used. CopyTask can copy table data from a project in one region to a project in another region.
Prerequisites: CopyTask is enabled for the source project.
MMA tasks and subtasks
MMA can commit migration tasks for a single database, multiple tables, or multiple partitions.
A migration task is divided into subtasks, and the subtasks perform the actual migration. Each subtask migrates either one non-partitioned table or one or more partitions of a partitioned table.
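The division into subtasks can be sketched as follows. This is an illustrative model, not the MMA implementation: the `Subtask` class, the `split_into_subtasks` function, and the `max_partitions_per_subtask` parameter are hypothetical. How MMA groups partitions into a subtask is not stated here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subtask:
    table: str
    # Partition names handled by this subtask; empty for a non-partitioned table.
    partitions: List[str] = field(default_factory=list)


def split_into_subtasks(
    tables: Dict[str, Optional[List[str]]], max_partitions_per_subtask: int = 2
) -> List[Subtask]:
    """Split a migration task into subtasks.

    tables maps a table name to its partition list, or to None if the table
    is non-partitioned. A non-partitioned table yields one subtask. A
    partitioned table yields one subtask per group of partitions.
    """
    subtasks: List[Subtask] = []
    for table, partitions in tables.items():
        if partitions is None:
            subtasks.append(Subtask(table))
            continue
        for start in range(0, len(partitions), max_partitions_per_subtask):
            group = partitions[start:start + max_partitions_per_subtask]
            subtasks.append(Subtask(table, group))
    return subtasks


if __name__ == "__main__":
    task = {"orders": ["ds=20240101", "ds=20240102", "ds=20240103"], "dim_city": None}
    for subtask in split_into_subtasks(task):
        print(subtask)
```

In this model, a partitioned table that has no partitions yields no subtasks, because it contains no data to migrate.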
MMA migration solutions
This topic describes two migration solutions that you can use to migrate data from Hadoop to MaxCompute.
Migration solution 1
If Express Connect circuits are used, you can use MMA to migrate Hive data to MaxCompute. The following figure shows the migration solution.
Migration solution 2
If no Express Connect circuits are used, you can use Data Transport to migrate HDFS data to OSS and then use MMA to write the data to MaxCompute. The following figure shows the migration solution.
References
Topic | Description |
| This topic describes the preparations for configuring MMA and the configuration process. This helps you quickly build an MMA environment. |
| This topic describes the preparations for Hive data migration by using a Hive user-defined table-valued function (UDTF) and the data migration process. This helps you quickly understand the process of Hive data migration. |
| This topic describes the preparations for data migration between MaxCompute projects and the data migration process. This helps you quickly understand the process of MaxCompute data migration. |
| This topic describes the formats of partition filter expressions. This helps you quickly configure partition filtering parameters when you create a data migration task. |
| This topic describes how to view and manage data migration tasks and also describes the incremental migration method. This helps you quickly understand the features of migration tasks. |