
E-MapReduce: Select a metadata service

Last Updated: Mar 26, 2026

E-MapReduce (EMR) metadata includes core information about data storage, data structure, and access permissions within EMR clusters. EMR supports three metadata services: Data Lake Formation (DLF), self-managed ApsaraDB RDS, and built-in MySQL. This topic compares the three services to help you choose the right one for your workload.

Which service should I use?

| Situation | Recommended service |
| --- | --- |
| New cluster for test or production use | DLF |
| Existing clusters that need cross-cluster metadata sharing | DLF |
| Fine-grained control over the metadata store with your own RDS instance | Self-managed ApsaraDB RDS |
| Short-term proof-of-concept (POC) testing on a single cluster | Built-in MySQL |

For most workloads, use DLF. It requires no operations and maintenance (O&M) effort, supports all major engines (including MaxCompute and Hologres), and provides built-in high availability (HA).

Warning

Do not use built-in MySQL in test or production environments. The MySQL database runs on a single master node with no HA or cross-cluster sharing support, which can cause service instability.
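
The decision table and warning above can be sketched as a small helper. This is only an illustration: the `Requirements` fields and the order in which the rules are checked are assumptions made for the example, not part of EMR.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of a planned cluster (field names are illustrative)."""
    poc_only: bool = False         # short-term POC on a single cluster
    needs_own_rds: bool = False    # fine-grained control through your own RDS instance
    shares_metadata: bool = False  # metadata must be visible to several clusters


def recommend_metadata_service(req: Requirements) -> str:
    """Return the service that the decision table suggests for these requirements."""
    # Fine-grained control over the metadata store points to your own RDS instance.
    if req.needs_own_rds:
        return "Self-managed ApsaraDB RDS"
    # Built-in MySQL only fits a short-lived POC on one cluster without sharing.
    if req.poc_only and not req.shares_metadata:
        return "Built-in MySQL"
    # Everything else, including cross-cluster sharing, falls back to DLF.
    return "DLF"
```

For example, `recommend_metadata_service(Requirements(poc_only=True, shares_metadata=True))` falls through to DLF, because built-in MySQL cannot share metadata across clusters.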

Comparison of metadata services

| Item | DLF | Self-managed ApsaraDB RDS | Built-in MySQL |
| --- | --- | --- | --- |
| Backend storage | Data is stored in DLF. | Data is stored in ApsaraDB RDS for MySQL instances. Purchase and configure an instance before cluster creation. | Data is stored in the MySQL instance of an EMR cluster. |
| Applicable environment | Test and production | Test and production | POC testing of a single cluster only |
| Cross-cluster metadata sharing | Supported | Supported | Not supported |
| Engine compatibility | Hive, Spark, Presto, MaxCompute, and Hologres | Hive, Spark, and Presto | Hive, Spark, and Presto |
| Metadata management | Visualized metadata retrieval, metadata management, multi-version management, data statistics, and lifecycle management | None | None |
| High availability | Primary/secondary disaster recovery | Primary/secondary disaster recovery | Not supported |
| O&M cost | No O&M required. Auto scaling is supported. | Manual O&M required (upgrades and scale-out). Suitable for fine-grained cluster management. | Increases upgrade costs. MySQL runs on a single cluster node. |
| Billing | Currently free. See Billing for details. | Charged by computing and storage resources. See Billable items for details. | Free |

Check supported regions before selecting a metadata service. For DLF supported regions, see Supported regions and endpoints.
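
To check a set of requirements against the comparison above, the table can be encoded as data and filtered. The sketch below is a minimal illustration: the dictionary keys and the `candidates` function are hypothetical names chosen for this example, and only four rows of the table are transcribed.

```python
# Feature matrix transcribed from the comparison table (subset of rows).
SERVICES = {
    "DLF": {
        "engines": {"Hive", "Spark", "Presto", "MaxCompute", "Hologres"},
        "cross_cluster_sharing": True,
        "high_availability": True,
        "test_or_production": True,
    },
    "Self-managed ApsaraDB RDS": {
        "engines": {"Hive", "Spark", "Presto"},
        "cross_cluster_sharing": True,
        "high_availability": True,
        "test_or_production": True,
    },
    "Built-in MySQL": {
        "engines": {"Hive", "Spark", "Presto"},
        "cross_cluster_sharing": False,
        "high_availability": False,
        "test_or_production": False,  # POC testing of a single cluster only
    },
}


def candidates(engines, need_sharing=False, need_ha=False, need_test_or_production=False):
    """List the services whose table entries satisfy every stated requirement."""
    return [
        name
        for name, features in SERVICES.items()
        if set(engines) <= features["engines"]
        and (features["cross_cluster_sharing"] or not need_sharing)
        and (features["high_availability"] or not need_ha)
        and (features["test_or_production"] or not need_test_or_production)
    ]
```

Calling `candidates({"Hive", "MaxCompute"})` leaves only DLF, since the other two services do not list MaxCompute under engine compatibility.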

Deployment architecture

DLF

Metadata is stored in DLF and shared across multiple clusters. The DLF client SDK exposes APIs compatible with Hive Metastore, so engines can access metadata directly through the SDK. For more information, see the Product introduction documentation.

[Figures: Deployment architecture of DLF in a single cluster; deployment architecture of DLF in multiple clusters]

Self-managed ApsaraDB RDS

Metadata is stored in ApsaraDB RDS for MySQL instances and shared across multiple clusters via Hive Metastore.
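
As an illustration of how Hive Metastore is typically pointed at an external MySQL database, the sketch below builds the standard Hive `javax.jdo.option.*` connection properties and renders them as a `hive-site.xml` fragment. The host name, database name, and credentials in the usage note are placeholders, and the exact settings that EMR expects for a self-managed RDS instance are not described in this topic, so treat this as an assumption-laden example rather than the EMR procedure.

```python
import xml.etree.ElementTree as ET


def metastore_properties(host, database, user, password, port=3306):
    """Build Hive's JDO connection properties for a MySQL-backed metastore."""
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{host}:{port}/{database}",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def to_hive_site_xml(props):
    """Render properties as a <configuration> element in hive-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values: replace with the endpoint and account of your own RDS instance.
example_props = metastore_properties(
    "rm-example.mysql.rds.aliyuncs.com", "hivemeta", "hive_user", "change-me"
)
example_xml = to_hive_site_xml(example_props)
```

Because every cluster that uses the same connection URL reads and writes the same database, this is also what makes cross-cluster sharing possible with this option.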

[Figures: Deployment architecture of self-managed ApsaraDB RDS in a single cluster; deployment architecture of self-managed ApsaraDB RDS in multiple clusters]

Built-in MySQL

Metadata is stored in MySQL instances deployed in EMR clusters, usually on the master nodes of the clusters. Because each cluster has its own MySQL database, metadata cannot be shared across clusters.
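
The difference between a shared backend and one database per cluster can be shown with a toy model. The classes below are purely illustrative stand-ins (they track table names in memory and do not talk to EMR, DLF, or MySQL).

```python
class MetadataStore:
    """Toy stand-in for a metadata backend; it only tracks table names."""

    def __init__(self, name):
        self.name = name
        self.tables = set()


class Cluster:
    """Toy cluster that reads and writes metadata through one store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def create_table(self, table):
        self.store.tables.add(table)

    def can_see(self, table):
        return table in self.store.tables


# Shared backend (as with DLF or a self-managed RDS instance): both clusters use one store.
shared = MetadataStore("shared")
cluster_a, cluster_b = Cluster("a", shared), Cluster("b", shared)
cluster_a.create_table("sales")
shared_visible = cluster_b.can_see("sales")

# Built-in MySQL: each cluster has its own store, so nothing is shared.
cluster_c = Cluster("c", MetadataStore("c-local"))
cluster_d = Cluster("d", MetadataStore("d-local"))
cluster_c.create_table("sales")
local_visible = cluster_d.can_see("sales")
```

In this model `shared_visible` is `True` and `local_visible` is `False`, which mirrors why tables created on one built-in MySQL cluster are invisible to another.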

The default credentials for built-in MySQL are username root and password EMRroot1234.

[Figures: Deployment architecture of built-in MySQL in a single cluster; deployment architecture of built-in MySQL in multiple clusters]

What's next