In E-MapReduce (EMR) versions earlier than V2.4.0, the Hive metadata of a cluster is stored in a local MySQL database that is deployed on the cluster. In EMR V2.4.0 and later versions, high-reliability Hive metadatabases are used for centralized metadata management.
An EMR cluster is created. For more information, see Create a cluster.
A metadatabase can be accessed only by using a public IP address. Make sure that you have configured a public IP address for your cluster. Do not change the public IP address. Otherwise, the database whitelist becomes invalid.
You cannot manage the metadata of a local metadatabase in the console. However, you can use the Hue tool on a cluster to manage the metadata.
- Total capacity: 200 MiB
- Maximum number of queries per hour: 720,000
- Maximum number of updates per hour: 144,000
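To make the hourly limits easier to compare with a workload, they can be converted to average per-second rates. The following is a minimal sketch. It assumes the limits are plain hourly totals; how the service actually enforces them (for example, over a sliding window) is not stated here.

```python
# Hourly limits quoted in the list above.
QUERIES_PER_HOUR = 720_000
UPDATES_PER_HOUR = 144_000
SECONDS_PER_HOUR = 3600


def per_second(hourly_limit: int) -> float:
    """Average sustained rate that would exactly use up an hourly limit."""
    return hourly_limit / SECONDS_PER_HOUR


queries_per_second = per_second(QUERIES_PER_HOUR)
updates_per_second = per_second(UPDATES_PER_HOUR)

# Prints the average query and update rates per second.
print(queries_per_second, updates_per_second)
```

On average, this corresponds to 200 queries and 40 updates per second.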
- Persistent metadata storage
In earlier versions, metadata is stored in MySQL databases that are deployed on clusters and is deleted when the clusters are released. This matters all the more because EMR allows you to release a pay-as-you-go cluster as soon as it is no longer needed. To retain the metadata, you must log on to the cluster and export the metadata manually.
After centralized metadata management is enabled, the metadata of released clusters is retained. Before you delete data in Object Storage Service (OSS) or in the Hadoop Distributed File System (HDFS) of a cluster, or before you release a cluster, make sure that the corresponding metadata is deleted. That is, drop the tables and databases that describe the data. This prevents dirty metadata from building up in the metadatabase.
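The metadata cleanup described above can be sketched as a small helper that generates standard HiveQL `DROP` statements. This is an illustration only: the function name `build_cleanup_statements` and the database and table names are hypothetical, and the generated statements would still need to be run through a Hive client of your choice.

```python
from typing import Dict, List


def build_cleanup_statements(tables_by_db: Dict[str, List[str]]) -> List[str]:
    """Build HiveQL statements that drop each table, then its database.

    Tables are dropped first because DROP DATABASE fails on a
    non-empty database unless CASCADE is specified.
    """
    statements: List[str] = []
    for db, tables in tables_by_db.items():
        for table in tables:
            statements.append(f"DROP TABLE IF EXISTS `{db}`.`{table}`;")
        statements.append(f"DROP DATABASE IF EXISTS `{db}`;")
    return statements


# Hypothetical database and table names, for illustration only.
for statement in build_cleanup_statements({"sales_db": ["orders", "refunds"]}):
    print(statement)
```

Note that in Hive, dropping a managed table also deletes its underlying data, whereas dropping an external table removes only the metadata.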
- Separation of computing and storage
EMR can store data in Alibaba Cloud OSS, which significantly reduces the costs for storing large volumes of data. EMR clusters are mainly used as computing resources and can be released if they are no longer needed. You do not need to migrate metadata before cluster release because data is stored in OSS.
- Data sharing
If all data is stored in OSS, all clusters can access data without the need to migrate or restructure metadata. This way, EMR clusters that process different services can directly share data.
Create a cluster that uses unified metadata
- Use the EMR console
When you create a cluster, set Type to Unified Metabases in the Basic Settings step. For information about how to create a cluster, see Create a cluster.
- Call the CreateClusterV2 API operation
See the description of the CreateClusterV2 operation.
Note: Set the useLocalMetaDb parameter to false.
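As a rough sketch of the note above, the following helper forces `useLocalMetaDb` to false in a set of request parameters before they are passed to whatever client you use to call CreateClusterV2. Only the `useLocalMetaDb` name comes from this topic. The helper name `with_unified_metadata` and the `name` key are hypothetical placeholders, and the exact parameter casing and the other required parameters should be taken from the CreateClusterV2 reference.

```python
from typing import Any, Dict


def with_unified_metadata(request_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the parameters with useLocalMetaDb set to false."""
    params = dict(request_params)  # Copy so the caller's dict is not modified.
    params["useLocalMetaDb"] = False
    return params


# Placeholder request: "name" is a hypothetical key used only for illustration.
base = {"name": "demo-cluster", "useLocalMetaDb": True}
params = with_unified_metadata(base)
print(params)
```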
For more information, see Basic operations on Hive metadata.
View metadata information
- Go to the metadata management page.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
- Click the Metadata tab.
- In the left-side navigation pane, click Metabase Information.
On the Metabase Information page, you can view the usage and limits of the current ApsaraDB RDS instance. Submit a ticket if you want to modify the information. For information about how to submit a ticket, see .