
E-MapReduce: Hive unified metadata

Last Updated: Mar 26, 2026

In E-MapReduce (EMR) V2.4.0 and later, you can replace the default per-cluster MySQL database with a centralized Hive metastore. The centralized metastore persists metadata independently of cluster lifecycle, enabling compute-storage separation and cross-cluster data sharing.

Important

The built-in ApsaraDB RDS metastore has fixed limits: 200 MiB total capacity, 720,000 queries per hour, and 144,000 updates per hour. It is not upgradeable. For production workloads or large datasets, create a dedicated ApsaraDB RDS instance to host your metastore.
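The hourly limits are easier to compare against a workload when expressed per second: 720,000 queries per hour is an average of 200 queries per second, and 144,000 updates per hour is an average of 40 updates per second. The following is a minimal Python sketch, assuming you have rough estimates of metadata size and peak metastore traffic. The function names are hypothetical. It treats each hourly quota as a flat per-second budget, which is conservative for bursty workloads.

```python
from __future__ import annotations

# Built-in metastore limits stated in this document.
BUILTIN_CAPACITY_MIB = 200
BUILTIN_QUERIES_PER_HOUR = 720_000
BUILTIN_UPDATES_PER_HOUR = 144_000
SECONDS_PER_HOUR = 3600


def per_second(limit_per_hour: int) -> float:
    """Convert an hourly limit to an average per-second rate."""
    return limit_per_hour / SECONDS_PER_HOUR


def fits_builtin_metastore(size_mib: float, peak_queries_per_sec: float,
                           peak_updates_per_sec: float) -> list[str]:
    """Return the names of the limits an estimated workload would exceed (empty if none)."""
    exceeded = []
    if size_mib > BUILTIN_CAPACITY_MIB:
        exceeded.append("capacity")
    if peak_queries_per_sec > per_second(BUILTIN_QUERIES_PER_HOUR):
        exceeded.append("queries")
    if peak_updates_per_sec > per_second(BUILTIN_UPDATES_PER_HOUR):
        exceeded.append("updates")
    return exceeded


if __name__ == "__main__":
    print(per_second(BUILTIN_QUERIES_PER_HOUR))  # 200.0
    print(per_second(BUILTIN_UPDATES_PER_HOUR))  # 40.0
    print(fits_builtin_metastore(150, 250, 10))  # ['queries']
```

If the returned list is not empty, that is a signal to host the metastore on a dedicated ApsaraDB RDS instance instead.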

Warning

The Hive unified metadata storage type is being phased out. Migrate to the Data Lake Formation (DLF) unified metadata storage type in the new EMR console. If you are a new EMR user, use DLF unified metadata from the start. See Migration of EMR metadata.

Hive metadatabases

Local vs. unified metastore

| Dimension | Local metastore (default) | Unified metastore |
|---|---|---|
| Metadata lifecycle | Tied to the cluster; deleted when the cluster is released | Independent; persists after the cluster is released |
| Cluster release | Export metadata manually before releasing | Release the cluster without migrating metadata |
| Data sharing | Not shared across clusters | All clusters read from the same metastore |
| Storage backend | MySQL on the cluster node | ApsaraDB RDS (managed) |
| Console management | Not available; use Hue on the cluster | Available in the EMR console |
| Recommended for | Quick testing and development | Production, shared workloads, and compute-storage separation |

Benefits

Persistent metadata storage

With a local metastore, metadata is stored in MySQL on the cluster node and is deleted when the cluster is released. This applies to every cluster, including pay-as-you-go clusters that you release once they are no longer needed. With unified metadata, the metastore survives cluster release. Because of this, the metastore can keep entries for data that no longer exists: before you release a cluster whose HDFS holds table data, or delete data in Object Storage Service (OSS) or Hadoop Distributed File System (HDFS), drop the corresponding tables and databases from the metastore to prevent dirty metadata from building up.
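Finding the tables to drop comes down to matching each table's storage location against the path that is about to disappear. The following is a minimal Python sketch, assuming you have already exported a list of tables and their locations from the metastore (for example, from the output of `DESCRIBE FORMATTED`). The `TableEntry` type, the function name, and the sample paths are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableEntry:
    """Hypothetical record of one table in the unified metastore."""
    database: str
    name: str
    location: str  # for example "hdfs://..." or "oss://..."


def drop_statements(tables: list[TableEntry], data_prefix: str) -> list[str]:
    """Return HiveQL DROP TABLE statements for tables stored under data_prefix.

    data_prefix is the path that is about to disappear: the cluster's HDFS
    root before release, or an OSS path before you delete its data. Include
    a trailing slash so that a prefix does not match a longer sibling path.
    """
    statements = []
    for table in tables:
        if table.location.startswith(data_prefix):
            statements.append(
                f"DROP TABLE IF EXISTS `{table.database}`.`{table.name}`;"
            )
    return statements


if __name__ == "__main__":
    tables = [
        TableEntry("sales", "orders_tmp",
                   "hdfs://emr-header-1:9000/user/hive/warehouse/sales.db/orders_tmp"),
        TableEntry("sales", "orders", "oss://example-bucket/warehouse/sales.db/orders"),
    ]
    for stmt in drop_statements(tables, "hdfs://emr-header-1:9000/"):
        print(stmt)
```

Only the HDFS-backed table is selected here; the OSS-backed table keeps valid metadata after the cluster is released.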

Separation of compute and storage

EMR can store data in OSS at significantly lower cost than in HDFS. Because the metastore does not depend on any cluster, clusters then serve purely as compute resources and can be released when idle, with no metadata migration required.

Data sharing

When data is stored in OSS, all clusters can query it directly through the shared metastore, without migrating or restructuring metadata. EMR clusters running different services can access the same datasets simultaneously.

Considerations

  • Public IP address required: The metastore is accessible only through a public IP address. Configure a public IP address for your cluster before you set up unified metadata, and do not change it afterward. The database whitelist is bound to that address, so changing it prevents the cluster from reaching the metastore.

  • Local metastore is managed through Hue only: You cannot manage a local metastore from the EMR console. Use the Hue tool on the cluster instead.
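The public IP requirement above can be sanity-checked before setup. The following is a minimal Python sketch, assuming you know the cluster's public IP address and the entries in the ApsaraDB RDS whitelist, which may be single addresses or CIDR blocks. The function name and the sample addresses are hypothetical; the check uses only the standard `ipaddress` module.

```python
from __future__ import annotations

import ipaddress


def metastore_access_problems(cluster_public_ip: str, whitelist: list[str]) -> list[str]:
    """Return reasons the cluster could not reach the metastore (empty if none found)."""
    problems = []
    ip = ipaddress.ip_address(cluster_public_ip)
    if not ip.is_global:
        problems.append(f"{ip} is not a public address")
    # strict=False lets a bare address such as "47.100.12.34" act as a /32 network.
    if not any(ip in ipaddress.ip_network(entry, strict=False) for entry in whitelist):
        problems.append(f"{ip} is not in the database whitelist")
    return problems


if __name__ == "__main__":
    whitelist = ["47.100.12.34"]
    print(metastore_access_problems("47.100.12.34", whitelist))  # []
    # A changed public IP no longer matches the whitelist entry.
    print(metastore_access_problems("47.100.12.35", whitelist))
```

The second call illustrates why the public IP address must not change after setup: the old whitelist entry no longer covers the new address.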

Create a cluster with unified metadata

Use either of the following methods.

EMR console

When creating a cluster, set the Type parameter to Unified Metabases in the Basic Settings step. See Create a cluster.

CreateClusterV2 API

Call the CreateClusterV2 operation with the useLocalMetaDb parameter set to false.
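The only metastore-specific setting in the API call is `useLocalMetaDb`. The following is a minimal Python sketch that builds the request parameters with that setting applied, assuming every other required CreateClusterV2 parameter is supplied by the caller in `base_params`. It does not sign or send the request; an Alibaba Cloud SDK or CLI would normally do that. The function name and the sample `RegionId` and `Name` values are hypothetical, and the exact parameter names and casing should be confirmed against the CreateClusterV2 API reference.

```python
from __future__ import annotations


def build_create_cluster_params(base_params: dict[str, str],
                                use_unified_metadata: bool = True) -> dict[str, str]:
    """Return CreateClusterV2 query parameters with the metastore choice applied.

    base_params holds all other required parameters (region, cluster name,
    version, network settings, and so on), which this sketch does not model.
    """
    params = dict(base_params)  # copy so the caller's dict is not mutated
    params["Action"] = "CreateClusterV2"
    # useLocalMetaDb=false selects the unified metastore instead of the
    # per-cluster MySQL database. Query-string APIs pass booleans as text.
    params["useLocalMetaDb"] = "false" if use_unified_metadata else "true"
    return params


if __name__ == "__main__":
    print(build_create_cluster_params({"RegionId": "cn-hangzhou", "Name": "etl-cluster"}))
```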

Manage tables

See Basic operations on Hive metadata.

View metastore usage and limits

  1. Log on to the Alibaba Cloud EMR console.

  2. In the top navigation bar, select the region where your cluster resides and select a resource group.

  3. Click the Metadata tab.

  4. In the left-side navigation pane, click Metabase Information.

The Metabase Information page shows the current usage and limits of your ApsaraDB RDS instance.
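To decide when to move to a dedicated ApsaraDB RDS instance, it can help to turn the figures on this page into utilization ratios. The following is a minimal Python sketch, assuming you copy the usage and limit values from the Metabase Information page by hand. The metric names, the sample usage values, and the 80% warning threshold are hypothetical choices, not values defined by EMR.

```python
from __future__ import annotations


def utilization_report(usage: dict[str, float], limits: dict[str, float],
                       warn_at: float = 0.8) -> dict[str, tuple[float, bool]]:
    """Return {metric: (fraction_used, at_or_above_threshold)} for each limited metric."""
    report = {}
    for metric, limit in limits.items():
        fraction = usage.get(metric, 0.0) / limit
        report[metric] = (round(fraction, 3), fraction >= warn_at)
    return report


if __name__ == "__main__":
    # Limits of the built-in metastore, as stated in this document.
    limits = {"capacity_mib": 200, "queries_per_hour": 720_000, "updates_per_hour": 144_000}
    # Hypothetical values read from the Metabase Information page.
    usage = {"capacity_mib": 170, "queries_per_hour": 90_000, "updates_per_hour": 30_000}
    for metric, (fraction, warn) in utilization_report(usage, limits).items():
        print(f"{metric}: {fraction:.1%}{' (near limit)' if warn else ''}")
```

With these sample values, only capacity (85% used) crosses the threshold.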