
Data Lake Formation: Metadata migration

Last Updated: Mar 26, 2026

When you move from an E-MapReduce (EMR) cluster or a self-managed Hive environment to Data Lake Formation (DLF), metadata migration lets you transfer databases, tables, partitions, and functions into DLF without rewriting your data pipelines.

Prerequisites

Before you begin, ensure that you have:

  • A Hive version that DLF supports: 2.3.x or 3.1.x

  • A MySQL-backed Hive Metastore (the only supported source database type)

  • Network access from DLF to your MySQL instance:

    • Alibaba Cloud VPC connection: the VPC, vSwitch, and Security Group that match your EMR cluster or ApsaraDB RDS instance

    • Public Network Connection: port 3306 (the MySQL default) on the host that runs your metastore MySQL, such as your EMR cluster, opened to the DLF Elastic IP address for your region (see DLF region and Elastic IP address reference)

  • An OSS path for storing migration task logs
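Before creating a task, you can pre-check the first three prerequisites from a machine you control. The following is a minimal sketch, assuming Python 3.9+ and a standard MySQL JDBC URL of the form jdbc:mysql://host:port/database (the same kind of URL you later enter for Other Mysql). The helper names and the example address below are hypothetical. A successful TCP connection from a host in the same VPC (or from the public internet, for Public Network Connection) only approximates what DLF will see; it does not prove that DLF itself can connect.

```python
import re
import socket
from urllib.parse import urlparse

# Hive release series that DLF metadata migration supports (2.3.x and 3.1.x).
SUPPORTED_HIVE_SERIES = {"2.3", "3.1"}


def hive_version_supported(version: str) -> bool:
    """Return True if a version such as "3.1.3" belongs to a supported series."""
    match = re.match(r"(\d+)\.(\d+)(?:\.|$)", version.strip())
    if match is None:
        return False
    return f"{match.group(1)}.{match.group(2)}" in SUPPORTED_HIVE_SERIES


def parse_mysql_jdbc_url(jdbc_url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL such as jdbc:mysql://10.0.0.12:3306/hivemeta."""
    if not jdbc_url.startswith("jdbc:mysql://"):
        raise ValueError(f"not a MySQL JDBC URL: {jdbc_url}")
    parsed = urlparse(jdbc_url[len("jdbc:"):])  # parse the mysql://... part
    if parsed.hostname is None:
        raise ValueError(f"no host in JDBC URL: {jdbc_url}")
    return parsed.hostname, parsed.port or 3306  # fall back to the MySQL default port


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable(*parse_mysql_jdbc_url("jdbc:mysql://10.0.0.12:3306/hivemeta"))` tests the hypothetical intranet address 10.0.0.12 on port 3306.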

Create a migration task

  1. Log on to the DLF console.

  2. In the left-side navigation pane, click Metadata > Metadata Migration.

  3. On the Migration Task tab, click Create Migration Task.

  4. Configure the source database, then click Next.

    Parameter | Description
    Database Type | Only MySQL is supported.
    Mysql Type | Select the type that matches your Hive Metastore backend. See the options below.
    Network Connection Method | Select Alibaba Cloud VPC or Public Network Connection based on your MySQL type.

    Mysql Type options:

    • Aliyun RDS — ApsaraDB RDS MySQL provided by Alibaba Cloud. Enter the RDS Instance, Database Name, Username, and Password.

      Important

      For a metastore on ApsaraDB RDS, only the Alibaba Cloud VPC connection method is supported.

    • Other Mysql — MySQL built into an EMR cluster, a self-managed MySQL instance, or any other MySQL database. Enter the JDBC URL, Username, and Password.

      Important

      Use the intranet IP in the JDBC URL and connect via Alibaba Cloud VPC when possible. If you use Public Network Connection, enter the public IP instead.

    Network Connection Method options:

    • Alibaba Cloud VPC: Select the Virtual Private Cloud (VPC), vSwitch, and Security Group that match your EMR cluster or ApsaraDB RDS instance to avoid connectivity issues.

    • Public Network Connection: On the EMR console (or wherever the security group or firewall for your MySQL host is managed), add an inbound rule that opens port 3306 (the MySQL default) to the DLF Elastic IP address for your region. For details on adding security group rules, see Manage security groups.

  5. Configure the migration task, then click Next.

    Parameter | Description
    Task Name | Enter a name for the migration task.
    Task Description | (Optional) Enter notes about the task.
    Data Catalog | Select the target Data Catalog.
    Conflict Resolution Policy | Choose how to handle metadata that already exists in DLF. See the options below.
    Log Storage Path | All migration task logs are stored at this OSS location.
    Synchronization Object | Select the objects to migrate: Database, Function, Table, and Partition. Select all unless you have a specific reason to exclude an object type.
    Location Replacement | (Optional) Replace source paths with new paths during migration. For example, when moving from HDFS to OSS-based storage, replace hdfs:// paths with oss:// paths.

    Conflict Resolution Policy options:

    • Update legacy metadata (recommended): Keeps the metadata that already exists in DLF and updates it with metadata from the source. Existing DLF metadata is not deleted.

    • Rebuild metadata: Deletes all existing DLF metadata, then creates new records from the source.

  6. Review the task configuration and click Confirm to create the task.
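Two settings in step 5, Location Replacement and Conflict Resolution Policy, are easier to reason about with a toy model. The sketch below is an illustration under stated assumptions, not DLF's implementation: it assumes Location Replacement behaves like a prefix rewrite (the exact matching rule is not specified on this page), and it reduces each table to a name-to-location entry so the difference between the two policies is visible. All paths, bucket names, and helper names are hypothetical.

```python
def replace_location(location: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite a storage path whose beginning matches old_prefix; leave others untouched."""
    if location.startswith(old_prefix):
        return new_prefix + location[len(old_prefix):]
    return location


def apply_conflict_policy(existing: dict[str, str], incoming: dict[str, str], policy: str) -> dict[str, str]:
    """Toy model of the two policies over {qualified_table_name: location} maps."""
    if policy == "update":
        # Update legacy metadata: nothing already in DLF is deleted;
        # source entries are added or overwrite entries with the same name.
        return {**existing, **incoming}
    if policy == "rebuild":
        # Rebuild metadata: existing DLF entries are discarded first.
        return dict(incoming)
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical example: an EMR warehouse on HDFS moving to an OSS bucket.
# Ending both prefixes with "/" keeps ".../warehouse2/..." from matching ".../warehouse".
OLD_PREFIX = "hdfs://emr-cluster/user/hive/warehouse/"
NEW_PREFIX = "oss://my-bucket/warehouse/"

source_tables = {
    "sales.orders": OLD_PREFIX + "sales.db/orders",
    "sales.refunds": "oss://my-bucket/warehouse/sales.db/refunds",  # already on OSS
}
dlf_tables = {"sales.orders": "oss://old-bucket/orders", "tmp.scratch": "oss://old-bucket/scratch"}

migrated = {name: replace_location(loc, OLD_PREFIX, NEW_PREFIX) for name, loc in source_tables.items()}
after_update = apply_conflict_policy(dlf_tables, migrated, "update")    # keeps tmp.scratch
after_rebuild = apply_conflict_policy(dlf_tables, migrated, "rebuild")  # drops tmp.scratch
```

The model shows why Update legacy metadata is the safer default: objects that exist only in DLF (here, the hypothetical tmp.scratch) survive, while Rebuild metadata leaves exactly what the source provides.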

Manage migration tasks

On the Migration Task tab, use the actions in the Operation column for the target task:

Action | Description
Run | Start the migration task.
Stop | Stop a currently running task.
Run Record | View detailed information about past task runs.
Edit | Modify the Source Database Configuration or Migration Task Configuration.
Delete | Remove the migration task.

To view run logs, click the Execution History tab, then click View Log in the Operation column. After migration completes, the log shows whether the task succeeded or failed.

Verify migration results

After a successful migration run, confirm that the metadata is available in DLF:

  1. In the left-side navigation pane, click Metadata > Metadata Management.

  2. On the Database tab, select a Data Catalog and enter the Database Name to look up the migrated database.

  3. On the Data Table tab, select a Data Catalog and Database Name, then enter the Table Name to look up a migrated table.
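Looking up individual databases and tables confirms that migration worked, but comparing totals can catch gaps. The sketch below counts objects on the source side by querying the standard Hive Metastore schema tables (DBS, TBLS, PARTITIONS, FUNCS) through any Python DB-API 2.0 connection, for example one opened with a MySQL driver such as PyMySQL. The DLF-side counts are assumed to be tallied by hand from the Metadata Management pages; no DLF API is used. Treat a mismatch as a prompt to investigate rather than proof of failure: TBLS also contains views, a Hive 3.1 metastore can hold several catalogs, and Update legacy metadata keeps objects that already existed in DLF, so target counts can legitimately be higher. The helper names are hypothetical.

```python
# Hive Metastore schema tables backing each object type DLF can migrate.
METASTORE_TABLES = {
    "databases": "DBS",
    "tables": "TBLS",
    "partitions": "PARTITIONS",
    "functions": "FUNCS",
}


def source_object_counts(conn) -> dict[str, int]:
    """Count metastore objects through a DB-API 2.0 connection to the metastore database."""
    counts = {}
    cur = conn.cursor()
    try:
        for kind, table in METASTORE_TABLES.items():
            # Table names come from the fixed mapping above, never from user input.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[kind] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def count_mismatches(source: dict[str, int], target: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {kind: (source_count, target_count)} for every kind whose counts differ."""
    return {
        kind: (n, target.get(kind, 0))
        for kind, n in source.items()
        if target.get(kind, 0) != n
    }
```

Running `source_object_counts` just before a migration run gives a snapshot to compare against once the run's log reports success.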


DLF region and Elastic IP address reference

Use this table to find the DLF Elastic IP address when configuring Public Network Connection access.

Region | Elastic IP address
Hangzhou | 121.41.166.235
Shanghai | 47.103.63.0
Beijing | 47.94.234.203
Shenzhen | 39.108.114.206
Singapore | 161.117.233.48
Frankfurt | 8.211.38.47
Zhangjiakou | 8.142.121.7
Hong Kong (China) | 8.218.148.213
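If you script your security group changes, the table above can be turned into the inbound rule that Public Network Connection needs. This is a minimal sketch: the addresses are copied from the table, but the dictionary keys and the start/end port_range format are illustrative assumptions, not the schema of any particular Alibaba Cloud API or CLI. Map them onto whatever your tooling expects. Allowing only the single DLF address (a /32 source) keeps port 3306 closed to everyone else.

```python
import ipaddress

# DLF Elastic IP addresses by region, copied from the table above.
DLF_ELASTIC_IPS = {
    "Hangzhou": "121.41.166.235",
    "Shanghai": "47.103.63.0",
    "Beijing": "47.94.234.203",
    "Shenzhen": "39.108.114.206",
    "Singapore": "161.117.233.48",
    "Frankfurt": "8.211.38.47",
    "Zhangjiakou": "8.142.121.7",
    "Hong Kong (China)": "8.218.148.213",
}


def inbound_rule_for_region(region: str, port: int = 3306) -> dict[str, str]:
    """Describe an inbound rule opening the MySQL port to DLF's Elastic IP in one region.

    The key names are illustrative only (hypothetical), not a real API schema.
    """
    try:
        eip = DLF_ELASTIC_IPS[region]
    except KeyError:
        raise ValueError(f"no DLF Elastic IP listed for region: {region}") from None
    ipaddress.IPv4Address(eip)  # raises ValueError if the copied address is malformed
    return {
        "direction": "ingress",
        "protocol": "tcp",
        "port_range": f"{port}/{port}",
        "source_cidr": f"{eip}/32",  # allow exactly one address
    }
```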