E-MapReduce (EMR) supports migrating Hive metadata stored in legacy storage types—Built-in MySQL or Unified Metabases—to Data Lake Formation (DLF). In 2020, Alibaba Cloud EMR launched DLF Unified Metadata as a new storage type to provide a better unified metadata service. This document covers when to migrate, what DLF provides, and how the four-phase migration process works.
When to migrate
Migrate to DLF if any of the following applies to your cluster:
- Your cluster uses Built-in MySQL. The MySQL database is deployed locally on the cluster in standalone mode, which cannot guarantee high availability and is prone to service interruptions.
- Your cluster uses Unified Metabases. This storage type is being gradually discontinued. Clusters must switch to DLF Unified Metadata, available in the new EMR console.
- Your cluster uses ApsaraDB RDS. Migration is optional but provides better storage performance and scalability.
Why DLF
DLF is a fully managed, highly available, and high-performance metadata service. It is compatible with multiple Hive metastore versions and integrates with open-source compute engines in EMR. Capabilities include:
- Data profiling, data exploration, and data lake management
- Data permission management
- Integration with MaxCompute, Databricks DataInsight (DDI), and Hologres
For more information, see DLF overview.
Migration process
The Alibaba Cloud EMR and DLF teams support the entire migration. All cluster tasks must be suspended during Phase 2 (Migration), so plan for approximately 30 minutes of task downtime.
The following table describes each phase, the steps involved, the participants, and the estimated duration.
| Phase | Steps | Participant | Estimated duration |
|---|---|---|---|
| 1. Preparations | | EMR team + you | 2 hours |
| 2. Migration | 1. Suspend running tasks and stop the metadata service. 2. Back up existing metadata. 3. Migrate metadata to DLF using the metadata migration feature, and check whether the migration is performed as expected. 4. Set the Type parameter to DLF Unified Metadata. 5. Resume suspended tasks. | EMR team + you | 30 minutes |
| 3. Check | Observe task execution for at least one week. If tasks run as expected, the migration is complete. If issues occur, determine whether to fix them online or initiate a rollback (see Phase 4). | EMR team + you | 1 week |
| 4. Rollback (optional) | 1. Suspend running tasks. 2. Compare metadata between DLF and the Hive metastore; write incremental data back to the Hive metastore. 3. Set the Type parameter to Unified Metabases. 4. Start the Hive metastore. 5. Resume suspended tasks and verify results. | EMR team + you | 30 minutes |
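
Both the verification in Phase 2 (step 3) and the comparison in Phase 4 (step 2) come down to finding out which tables exist on one side but not the other. The following is a minimal sketch of that comparison, not the method used by the EMR or DLF teams. It assumes you have already exported a snapshot of database and table names from each side, for example by running `SHOW DATABASES` and `SHOW TABLES` in Hive. The snapshot format, `MetadataDiff`, `flatten`, and `diff_metadata` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataDiff:
    """Fully qualified table names that appear on only one side."""
    only_in_left: set = field(default_factory=set)
    only_in_right: set = field(default_factory=set)

    @property
    def is_identical(self) -> bool:
        return not self.only_in_left and not self.only_in_right


def flatten(snapshot):
    """Turn {database: [tables]} into a set of 'database.table' names.

    Assumption: comparing names is enough for a first check. Schemas,
    partitions, and databases with no tables are NOT compared here.
    """
    return {f"{db}.{tbl}" for db, tables in snapshot.items() for tbl in tables}


def diff_metadata(left, right):
    """Compare two snapshots and report the names unique to each side."""
    left_names, right_names = flatten(left), flatten(right)
    return MetadataDiff(
        only_in_left=left_names - right_names,
        only_in_right=right_names - left_names,
    )


# Hypothetical snapshots: the Hive metastore before migration, and DLF
# after tasks have been running against it for a while.
hive_snapshot = {"sales": ["orders", "customers"], "logs": ["clicks"]}
dlf_snapshot = {"sales": ["orders", "customers", "refunds"], "logs": ["clicks"]}

diff = diff_metadata(hive_snapshot, dlf_snapshot)
print(sorted(diff.only_in_left))   # [] -> nothing was lost in the migration
print(sorted(diff.only_in_right))  # ['sales.refunds'] -> created after the switch
```

Right after Phase 2, both sets would be expected to be empty. Before a rollback, the names in `only_in_right` are candidates for the incremental data to write back to the Hive metastore. A fuller check would also compare columns and partitions.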
Get support
To start the migration, join the DingTalk group by searching for group number 33719678. Engineers will reach out to plan the migration with you.