MaxCompute Migration Service (MMS)

Last Updated: Nov 03, 2025

MaxCompute Migration Service (MMS) migrates data from various data sources to MaxCompute. It integrates with the MaxCompute Spark engine to simplify large-scale data migration from self-managed data sources, which reduces configuration complexity and operations and maintenance (O&M) costs.

Overview

Architecture

MaxCompute Migration Service (MMS) migrates both metadata and data.

(Figure: MMS architecture for metadata and data migration)
  • Metadata migration: MMS retrieves metadata from the data source through metadata APIs, such as the Hive Metastore SDK and the Databricks SDK. It then generates MaxCompute Data Definition Language (DDL) statements and executes them in MaxCompute to complete the metadata migration (see the DDL sketch after this list).

  • Data migration: After the metadata is synchronized, MMS generates and submits one or more Spark jobs that run on MaxCompute based on the migration job configuration. These Spark jobs pull data from the data source and write it to the target tables in MaxCompute. This process is fully managed by MMS, which eliminates the need for Spark job development and O&M.
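
As an illustration of the metadata step, assume a hypothetical Hive source table named orders with a single partition column ds (these names are assumptions, not output produced by MMS). The MaxCompute DDL generated for such a table could look roughly like the following:

    -- Hypothetical Hive source table:
    --   CREATE TABLE orders (order_id BIGINT, amount DOUBLE) PARTITIONED BY (ds STRING);
    -- Illustrative MaxCompute DDL that the metadata migration could produce:
    CREATE TABLE IF NOT EXISTS orders (
      order_id BIGINT,
      amount   DOUBLE
    )
    PARTITIONED BY (ds STRING);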

Migration flow

The following figure illustrates the workflow of MaxCompute Migration Service (MMS). The process includes the following core steps:

  1. Load metadata: After you create a migration job, MMS connects to the external data source to read and load the metadata, such as table schemas and partition information. MMS then stores the metadata in its own database for later use.

  2. Create a migration job: MMS supports three types of migration jobs: full database migration, partial table migration, and partial partition migration. Each migration job is split into multiple subtasks that run concurrently to migrate data.

  3. Transfer data and metadata: Each concurrent subtask independently pulls data from the data source. It first creates the corresponding target table or partition in the destination project and then writes the data (see the sketch after the figure below).

  4. Verify data (Optional): After the data is migrated, MMS can perform a data verification step. It verifies data integrity by comparing the number of rows in the source and destination tables or partitions.

(Figure: MMS migration workflow)
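
The following sketch uses the same hypothetical orders table to illustrate what one subtask conceptually does for a single partition. MMS performs these steps internally through its managed Spark or SQL jobs, so none of the statements below need to be written by hand:

    -- Step 1 (metadata): ensure that the target partition exists in the destination project.
    ALTER TABLE orders ADD IF NOT EXISTS PARTITION (ds='20250101');
    -- Step 2 (data): the managed Spark job reads the source partition and writes the rows
    -- into the target partition; no manual Spark or SQL code is required.
    -- Step 3 (optional): verify the migrated data by comparing row counts
    -- (see Data verification in the Glossary).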

Glossary

  • Data source

    • The object to be migrated, such as one or more Hive databases. Data sources organize data in different layers, and MMS maps these layers to a three-level model: Database, Schema, and Table, where the schema is a property of the table. The following table describes the mapping.

      Data source     Data layer
      -----------     --------------------------------------
      Hive            Database.Table
      MaxCompute      Project.Schema.Table or Project.Table

    • The following table lists the APIs that MMS uses to read data from each type of data source:

      Data source type     Data retrieval API
      ----------------     -----------------------------------
      MaxCompute           Storage API, SQL
      BigQuery             Storage Read API
      Hive                 HDFS or S3
      Databricks           Azure Blob Storage, Databricks JDBC

  • Migration job

    A migration job defines the objects to be migrated, which can be a database, multiple tables, or multiple partitions.

  • Migration task

    After you select the objects to migrate and submit the migration job, MMS splits the job into multiple independent migration tasks based on the configuration. A migration task is the actual unit of execution. The task types include Spark and SQL jobs. Each task can correspond to a non-partitioned table or multiple partitions of a partitioned table. The task execution process includes metadata migration, data migration, and data verification.

  • Data verification

    After data migration is complete, MMS verifies data consistency by executing SELECT COUNT(*) on both the source and the destination and comparing the row counts for each table or partition. The verification results are recorded in the task logs.
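
    For example, assuming a partitioned table named orders with a partition ds='20250101' (both names are illustrative, not part of MMS itself), the check conceptually amounts to running the same count on both sides and comparing the results:

      -- On the source (for example, Hive):
      SELECT COUNT(*) FROM orders WHERE ds = '20250101';
      -- On the destination (MaxCompute):
      SELECT COUNT(*) FROM orders WHERE ds = '20250101';
      -- The two counts must match; the comparison result is recorded in the task logs.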


FAQ

What fees are incurred when I use MMS for data migration?

MaxCompute Migration Service (MMS) is free of charge. The following fees are incurred during data migration:

  1. Computing resource fees at the destination: MMS submits Spark or SQL jobs in a MaxCompute project to perform data migration. These jobs consume MaxCompute compute units (CUs) and are billed according to MaxCompute billing standards. The supported billing methods are pay-as-you-go and subscription.

  2. Network traffic fees: Data is transferred over the network between the source and MaxCompute during migration, which incurs network traffic fees.

  3. Data read fees at the source: During data migration, MMS calls the data retrieval APIs of the source to read data. This may incur data read fees from the source, based on its billing rules.

How do I choose between MMS and DataWorks Data Integration for data migration?

  • MMS: MMS is suitable for one-time, large-batch data migration.

  • DataWorks Data Integration: This service is suitable for scheduled and continuous data synchronization and integration. It also supports a wide range of data sources.