
E-MapReduce: Select a metadata service

Last Updated: Dec 10, 2025

E-MapReduce (EMR) metadata is the core information about the data storage, data structures, and access permissions of EMR clusters. EMR can store metadata in Data Lake Formation (DLF), ApsaraDB RDS for MySQL, or a built-in MySQL database. This topic describes the differences among these metadata services to help you choose the one that fits your business requirements.

Differences among the metadata services

Backend storage

  • DLF: Data is stored in DLF.

  • Self-managed ApsaraDB RDS: Data is stored in ApsaraDB RDS for MySQL instances. You must purchase an ApsaraDB RDS for MySQL instance in advance and configure the network environment.

  • Built-in MySQL: Data is stored in the MySQL instance of the EMR cluster.

Applicable environment

  • DLF: Test and production environments.

  • Self-managed ApsaraDB RDS: Test and production environments.

  • Built-in MySQL: Proof of concept (POC) testing of a single cluster.

Note

We recommend that you do not use built-in MySQL. The local MySQL database is deployed on a single node of the EMR cluster. As a result, high availability cannot be ensured for the service, and stability issues may occur.

Cross-cluster metadata sharing

  • DLF: Supported.

  • Self-managed ApsaraDB RDS: Supported.

  • Built-in MySQL: Not supported.

Engine compatibility

  • DLF: Hive, Spark, Presto, MaxCompute, and Hologres are supported.

  • Self-managed ApsaraDB RDS: Hive, Spark, and Presto are supported. MaxCompute and Hologres are not supported.

  • Built-in MySQL: Hive, Spark, and Presto are supported. MaxCompute and Hologres are not supported.

Metadata management

  • DLF: Visualized metadata retrieval, metadata management, multi-version management, data statistics, and lifecycle management are supported.

  • Self-managed ApsaraDB RDS: None.

  • Built-in MySQL: None.

High availability (HA)

  • DLF: Primary/secondary disaster recovery is supported.

  • Self-managed ApsaraDB RDS: Primary/secondary disaster recovery is supported.

  • Built-in MySQL: Disaster recovery is not supported.

O&M cost

  • DLF: Auto scaling is supported, so you do not need to perform O&M operations.

  • Self-managed ApsaraDB RDS: You must manually perform O&M operations such as upgrades and scale-outs. Self-managed ApsaraDB RDS is suitable for scenarios in which you want to manage EMR clusters in a flexible and fine-grained manner.

  • Built-in MySQL: The local MySQL database is deployed on a single node of the EMR cluster, which increases upgrade costs.

Billing

  • DLF: DLF is currently free of charge. For more information about DLF billing, see Billing.

  • Self-managed ApsaraDB RDS: The basic fees of an ApsaraDB RDS instance consist of fees for computing resources and fees for storage resources. For more information, see Billable items.

  • Built-in MySQL: None.

Note

When you select a metadata service, check which regions support it. For information about the regions in which DLF is available, see Supported regions and endpoints.
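The comparison above can be turned into a rough filter when you shortlist options. The following Python sketch is illustrative only: the `MetadataService` class, the `candidate_services` function, and the requirement flags are hypothetical names, and the data encodes only the documented sharing, engine, environment, and HA differences. It does not model region availability, which you still need to verify separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataService:
    """Hypothetical encoding of one metadata service in the comparison above."""
    name: str
    cross_cluster_sharing: bool
    engines: frozenset[str]
    high_availability: bool
    production_ready: bool  # True for "Test and production environments"


# Values taken from the comparison; region availability is NOT modeled.
SERVICES = [
    MetadataService("DLF", True,
                    frozenset({"Hive", "Spark", "Presto", "MaxCompute", "Hologres"}), True, True),
    MetadataService("Self-managed ApsaraDB RDS", True,
                    frozenset({"Hive", "Spark", "Presto"}), True, True),
    MetadataService("Built-in MySQL", False,
                    frozenset({"Hive", "Spark", "Presto"}), False, False),
]


def candidate_services(required_engines: set[str], need_sharing: bool,
                       production: bool) -> list[str]:
    """Return the names of services whose documented capabilities cover the requirements."""
    matches = []
    for svc in SERVICES:
        if not required_engines <= svc.engines:  # every required engine must be supported
            continue
        if need_sharing and not svc.cross_cluster_sharing:
            continue
        if production and not (svc.production_ready and svc.high_availability):
            continue
        matches.append(svc.name)
    return matches


print(candidate_services({"Hive", "Hologres"}, need_sharing=True, production=True))
# ['DLF']
```

For example, a production workload that must serve Hologres across clusters narrows to DLF, while a single-cluster Spark POC leaves all three services as candidates.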

Deployment architecture of the metadata services

Deployment architecture of DLF

Metadata is stored in DLF, which supports metadata sharing among multiple clusters. The DLF client SDK provides APIs that are compatible with the Hive Metastore, so supported engines can use the SDK to access metadata in DLF directly. You can also use the DLF client to access metadata in DLF. For more information, see the topics in the Product Introduction directory.
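Because the DLF client SDK exposes Hive-Metastore-compatible APIs, an engine can code against one metastore interface while the service behind it is shared by several clusters. The following Python sketch is a toy model of that idea, not the DLF SDK: `MetastoreClient`, `SharedCatalog`, `Engine`, and the OSS path are hypothetical, and an in-memory dictionary stands in for the central metadata store.

```python
from typing import Protocol


class MetastoreClient(Protocol):
    """Hypothetical subset of a Hive-Metastore-style client interface."""

    def create_table(self, db: str, table: str, location: str) -> None: ...

    def get_table_location(self, db: str, table: str) -> str: ...


class SharedCatalog:
    """Toy in-memory stand-in for a central metadata service (not the real DLF SDK)."""

    def __init__(self) -> None:
        self._tables: dict[tuple[str, str], str] = {}

    def create_table(self, db: str, table: str, location: str) -> None:
        self._tables[(db, table)] = location

    def get_table_location(self, db: str, table: str) -> str:
        return self._tables[(db, table)]


class Engine:
    """An engine depends only on the interface, not on where metadata is stored."""

    def __init__(self, cluster: str, client: MetastoreClient) -> None:
        self.cluster = cluster
        self.client = client

    def resolve(self, db: str, table: str) -> str:
        return self.client.get_table_location(db, table)


# Two engines in different clusters share one catalog instance.
catalog = SharedCatalog()
spark_on_a = Engine("cluster-a", catalog)
hive_on_b = Engine("cluster-b", catalog)

# A table registered from cluster A is immediately visible to cluster B.
spark_on_a.client.create_table("sales", "orders", "oss://example-bucket/warehouse/sales.db/orders")
print(hive_on_b.resolve("sales", "orders"))
# oss://example-bucket/warehouse/sales.db/orders
```

Swapping `SharedCatalog` for a per-cluster object would leave `Engine` unchanged, which is the property the Hive-Metastore-compatible API is meant to provide.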

Deployment architecture of DLF in a single cluster

Deployment architecture of DLF in multiple clusters


Deployment architecture of self-managed ApsaraDB RDS

Metadata is stored in ApsaraDB RDS for MySQL instances. Self-managed ApsaraDB RDS supports metadata sharing among multiple clusters, and metadata can be accessed by Hive Metastores in these clusters.
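In this setup, sharing typically comes from configuring the Hive Metastore in each cluster with the same JDBC connection to the RDS database. The following Python sketch renders such settings as a hive-site.xml fragment. The `javax.jdo.option.*` keys are standard Hive Metastore properties; the endpoint, database name, account, port 3306, and driver class are assumptions for illustration, and how EMR applies these settings to a cluster is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical values: substitute your own RDS endpoint, database, and account.
RDS_ENDPOINT = "rm-example.mysql.rds.aliyuncs.com"
METASTORE_DB = "hivemeta"


def metastore_properties(user: str, password: str) -> dict[str, str]:
    """Hive Metastore JDBC properties that point at the shared RDS database."""
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{RDS_ENDPOINT}:3306/{METASTORE_DB}",
        # Assumed driver class; MySQL Connector/J 8.x uses com.mysql.cj.jdbc.Driver.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def to_hive_site(props: dict[str, str]) -> str:
    """Render properties as a hive-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Every cluster gets the same connection settings, so all of them use one metastore database.
configs = {c: metastore_properties("hive_user", "example-password")
           for c in ("cluster-a", "cluster-b")}
print(to_hive_site(configs["cluster-a"]))
```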

Deployment architecture of self-managed ApsaraDB RDS in a single cluster

Deployment architecture of self-managed ApsaraDB RDS in multiple clusters


Deployment architecture of built-in MySQL

Metadata is stored in MySQL. MySQL Server instances are deployed in EMR clusters, usually on the master nodes. Because each cluster has its own MySQL database, metadata cannot be shared among multiple clusters.
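Whether clusters share metadata comes down to which database each cluster's Hive Metastore points at. The following Python sketch groups clusters by the host, port, and database in their metastore JDBC URLs. The host names, the `hivemeta` database, and the `sharing_groups` helper are hypothetical: with built-in MySQL each URL points at the cluster's own master node, so every cluster lands in its own group.

```python
from collections import defaultdict
from urllib.parse import urlparse


def metastore_key(jdbc_url: str) -> tuple[str, int, str]:
    """Identify the backing database of a jdbc:mysql:// URL as (host, port, database)."""
    parsed = urlparse(jdbc_url.removeprefix("jdbc:"))
    return parsed.hostname or "", parsed.port or 3306, parsed.path.lstrip("/")


def sharing_groups(cluster_urls: dict[str, str]) -> list[list[str]]:
    """Group clusters whose metastores point at the same database."""
    groups: dict[tuple[str, int, str], list[str]] = defaultdict(list)
    for cluster, url in cluster_urls.items():
        groups[metastore_key(url)].append(cluster)
    return sorted(sorted(members) for members in groups.values())


# Built-in MySQL: each cluster uses a database on its own (hypothetical) master node.
builtin = {
    "cluster-a": "jdbc:mysql://master-1-1.cluster-a:3306/hivemeta",
    "cluster-b": "jdbc:mysql://master-1-1.cluster-b:3306/hivemeta",
}
print(sharing_groups(builtin))  # [['cluster-a'], ['cluster-b']]

# An external database shared by both clusters (hypothetical endpoint).
shared = {c: "jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/hivemeta"
          for c in ("cluster-a", "cluster-b")}
print(sharing_groups(shared))  # [['cluster-a', 'cluster-b']]
```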

Note

To log on to the built-in MySQL database, use the username root and the password EMRroot1234.

Deployment architecture of built-in MySQL in a single cluster

Deployment architecture of built-in MySQL in multiple clusters


References