When you move from an E-MapReduce (EMR) cluster or a self-managed Hive environment to Data Lake Formation (DLF), metadata migration lets you transfer databases, tables, partitions, and functions into DLF without rewriting your data pipelines.
Prerequisites
Before you begin, ensure that you have:
A Hive version that DLF supports: 2.3.x or 3.1.x
A MySQL-backed Hive Metastore (the only supported source database type)
Network access from DLF to your MySQL instance:
Alibaba Cloud VPC connection — the VPC, vSwitch, and Security Group that match your EMR cluster or ApsaraDB RDS instance
Public Network Connection — port 3306 on your EMR cluster opened to the DLF Elastic IP address for your region (see DLF region and Elastic IP address reference)
An OSS path for storing migration task logs
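The version requirement above can be checked before you create a task. The following is a minimal sketch, assuming you already have the Hive version string (for example, from `hive --version` output). The function name `is_supported_hive_version` is hypothetical and is not part of DLF.

```python
import re

# Release lines listed in the prerequisites: 2.3.x and 3.1.x.
SUPPORTED_HIVE_LINES = ("2.3", "3.1")


def is_supported_hive_version(version: str) -> bool:
    """Return True if the version string belongs to a supported release line."""
    # Capture major and minor, and require a dot or end of string after the
    # minor number so that "2.30.1" is not mistaken for the 2.3 line.
    match = re.match(r"^(\d+)\.(\d+)(?:\.|$)", version.strip())
    if match is None:
        return False
    release_line = f"{match.group(1)}.{match.group(2)}"
    return release_line in SUPPORTED_HIVE_LINES


if __name__ == "__main__":
    for candidate in ("2.3.9", "3.1.2", "3.0.0"):
        print(candidate, is_supported_hive_version(candidate))
```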
Create a migration task
Log on to the DLF console.
In the left-side navigation pane, click Metadata > Metadata Migration.
On the Migration Task tab, click Create Migration Task.
Configure the source database, then click Next.
| Parameter | Description |
|---|---|
| Database Type | Only MySQL is supported. |
| Mysql Type | Select the type that matches your Hive Metastore backend. See the options below. |
| Network Connection Method | Select Alibaba Cloud VPC or Public Network Connection based on your MySQL type. |

Mysql Type options:
Aliyun RDS — ApsaraDB RDS MySQL provided by Alibaba Cloud. Enter the RDS Instance, Database Name, Username, and Password.
Important: ApsaraDB RDS metadata can be accessed only over an Alibaba Cloud VPC connection.
Other Mysql — MySQL built into an EMR cluster, a self-managed MySQL instance, or any other MySQL database. Enter the JDBC URL, Username, and Password.
Important: Use the intranet IP address in the JDBC URL and connect through Alibaba Cloud VPC when possible. If you use Public Network Connection, enter the public IP address instead.
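For the Other Mysql option, the JDBC URL follows the usual MySQL form `jdbc:mysql://host:port/database`. The sketch below builds such a URL and suggests a connection method from the host address. The host `192.168.0.10`, the database name `hivemeta`, and both function names are hypothetical placeholders. Substitute the values from your own Hive Metastore configuration.

```python
import ipaddress


def build_jdbc_url(host: str, database: str, port: int = 3306) -> str:
    """Build a MySQL JDBC URL for the Hive Metastore database."""
    if not host or not database:
        raise ValueError("host and database are both required")
    return f"jdbc:mysql://{host}:{port}/{database}"


def suggest_connection_method(host: str) -> str:
    """Suggest a Network Connection Method based on the host address.

    A private (intranet) IP address points to Alibaba Cloud VPC. Anything
    else, including hostnames this sketch cannot classify, falls back to
    Public Network Connection.
    """
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "Public Network Connection"
    if address.is_private:
        return "Alibaba Cloud VPC"
    return "Public Network Connection"


if __name__ == "__main__":
    host = "192.168.0.10"  # hypothetical intranet IP of the MySQL host
    print(build_jdbc_url(host, "hivemeta"))
    print(suggest_connection_method(host))
```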
Network Connection Method options:
Alibaba Cloud VPC — Select the Virtual Private Cloud (VPC), vSwitch, and Security Group that match your EMR cluster or ApsaraDB RDS instance to avoid connectivity issues.
Public Network Connection — Add a rule on the EMR console to open port 3306 (default) to the DLF Elastic IP Address for your region. For details on adding security group rules, see Manage security groups.
Configure the migration task, then click Next.
| Parameter | Description |
|---|---|
| Task Name | Enter a name for the migration task. |
| Task Description | (Optional) Enter notes about the task. |
| Data Catalog | Select the target Data Catalog. |
| Conflict Resolution Policy | Choose how to handle metadata that already exists in DLF. See the options below. |
| Log Storage Path | All migration task logs are stored at this OSS location. |
| Synchronization Object | Select the objects to migrate: Database, Function, Table, and Partition. Select all unless you have a specific reason to exclude an object type. |
| Location Replacement | (Optional) Replace source paths with new paths during migration. For example, when moving from HDFS to OSS-based storage, replace hdfs:// paths with oss:// paths. |

Conflict Resolution Policy options:
Update legacy metadata (recommended): Keeps the metadata that already exists in DLF and updates it with the metadata from the source. Nothing is deleted.
Rebuild metadata: Deletes all existing DLF metadata, then creates new records from the source.
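To preview what Location Replacement does to a table or partition location, the following sketch applies a prefix substitution. It assumes simple prefix matching, which the documentation does not spell out, so treat it as an illustration and not as DLF's actual matching logic. The HDFS address, the bucket name, and the function name `replace_location` are hypothetical.

```python
from typing import Mapping


def replace_location(location: str, replacements: Mapping[str, str]) -> str:
    """Replace the longest matching source prefix with its target prefix."""
    # Try longer prefixes first so that a more specific rule takes priority.
    for source_prefix in sorted(replacements, key=len, reverse=True):
        if location.startswith(source_prefix):
            return replacements[source_prefix] + location[len(source_prefix):]
    # No rule matched: leave the location unchanged.
    return location


if __name__ == "__main__":
    rules = {
        # hypothetical HDFS warehouse root mapped to a hypothetical OSS bucket
        "hdfs://emr-header-1:9000/user/hive/warehouse": "oss://my-bucket/warehouse",
    }
    old = "hdfs://emr-header-1:9000/user/hive/warehouse/sales.db/orders"
    print(replace_location(old, rules))
```

Running a few real locations from your metastore through a check like this before migration can help you catch a prefix that does not match as expected.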
Review the task configuration and click Confirm to create the task.
Manage migration tasks
On the Migration Task tab, use the actions in the Operation column for the target task:
| Action | Description |
|---|---|
| Run | Start the migration task. |
| Stop | Stop a currently running task. |
| Run Record | View detailed information about past task runs. |
| Edit | Modify the Source Database Configuration or Migration Task Configuration. |
| Delete | Remove the migration task. |
To view run logs, click the Execution History tab, then click View Log in the Operation column. After migration completes, the log shows whether the task succeeded or failed.
Verify migration results
After a successful migration run, confirm that the metadata is available in DLF:
In the left-side navigation pane, click Metadata > Metadata Management.
On the Database tab, select a Data Catalog and enter the Database Name to look up the migrated database.
On the Data Table tab, select a Data Catalog and Database Name, then enter the Table Name to look up a migrated table.
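In addition to the console lookups above, you can compare object counts between the source and DLF. The sketch below lists count queries against the source MySQL database, assuming the standard Hive Metastore schema table names (`DBS`, `TBLS`, `PARTITIONS`, `FUNCS`), so confirm them against your Hive version. It does not connect to anything: run the queries with your own MySQL client, collect the DLF counts yourself, and pass both to the hypothetical helper `find_count_mismatches`.

```python
from typing import Dict, Mapping, Tuple

# Count queries for the source Hive Metastore database (assumed schema names).
METASTORE_COUNT_QUERIES: Dict[str, str] = {
    "Database": "SELECT COUNT(*) FROM DBS",
    "Table": "SELECT COUNT(*) FROM TBLS",
    "Partition": "SELECT COUNT(*) FROM PARTITIONS",
    "Function": "SELECT COUNT(*) FROM FUNCS",
}


def find_count_mismatches(
    source_counts: Mapping[str, int],
    dlf_counts: Mapping[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Return {object type: (source count, DLF count)} where the counts differ."""
    mismatches: Dict[str, Tuple[int, int]] = {}
    for object_type in METASTORE_COUNT_QUERIES:
        source = source_counts.get(object_type, 0)
        target = dlf_counts.get(object_type, 0)
        if source != target:
            mismatches[object_type] = (source, target)
    return mismatches


if __name__ == "__main__":
    # Illustrative numbers only, not real migration results.
    source = {"Database": 3, "Table": 42, "Partition": 1200, "Function": 5}
    dlf = {"Database": 3, "Table": 41, "Partition": 1200, "Function": 5}
    print(find_count_mismatches(source, dlf))
```

A mismatch is not always an error. If the Data Catalog already contained metadata, or if you excluded an object type under Synchronization Object, the counts can differ legitimately.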
What's next
Migrate EMR metadata to DLF — Step-by-step walkthrough for migrating metadata from an EMR cluster.
DLF region and Elastic IP address reference
Use this table to find the DLF Elastic IP address when configuring Public Network Connection access.
| Region | Elastic IP address |
|---|---|
| Hangzhou | 121.41.166.235 |
| Shanghai | 47.103.63.0 |
| Beijing | 47.94.234.203 |
| Shenzhen | 39.108.114.206 |
| Singapore | 161.117.233.48 |
| Frankfurt | 8.211.38.47 |
| Zhangjiakou | 8.142.121.7 |
| Hong Kong (China) | 8.218.148.213 |
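The table above can be turned into a small lookup when you prepare the port 3306 rule for Public Network Connection. This is a sketch only: the rule is a plain dictionary with hypothetical field names (`protocol`, `port`, `source_cidr`), not a call to any Alibaba Cloud API. Map the values onto the EMR console form or whichever tool you use to manage security group rules.

```python
import ipaddress
from typing import Dict, Union

# DLF Elastic IP addresses by region, copied from the table above.
DLF_ELASTIC_IPS: Dict[str, str] = {
    "Hangzhou": "121.41.166.235",
    "Shanghai": "47.103.63.0",
    "Beijing": "47.94.234.203",
    "Shenzhen": "39.108.114.206",
    "Singapore": "161.117.233.48",
    "Frankfurt": "8.211.38.47",
    "Zhangjiakou": "8.142.121.7",
    "Hong Kong (China)": "8.218.148.213",
}


def dlf_source_cidr(region: str) -> str:
    """Return the DLF Elastic IP address for a region as a single-host CIDR."""
    try:
        ip = DLF_ELASTIC_IPS[region]
    except KeyError:
        raise ValueError(f"no DLF Elastic IP address listed for region: {region}")
    # /32 restricts the rule to exactly one IPv4 address.
    return str(ipaddress.ip_network(f"{ip}/32"))


def build_mysql_ingress_rule(region: str, port: int = 3306) -> Dict[str, Union[str, int]]:
    """Describe an inbound rule that opens the MySQL port to DLF only."""
    return {
        "protocol": "tcp",
        "port": port,
        "source_cidr": dlf_source_cidr(region),
    }


if __name__ == "__main__":
    print(build_mysql_ingress_rule("Hangzhou"))
```

Limiting the source to a single /32 address keeps the MySQL port closed to the rest of the public internet.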