In E-MapReduce (EMR) versions earlier than V2.4.0, local MySQL databases are used to store the Hive metadata of clusters. In EMR V2.4.0 and later versions, high-reliability Hive metadatabases are used for centralized metadata management.
A metadatabase can be accessed only through a public IP address, so make sure that a public IP address is configured for your cluster. Do not change the public IP address after it is configured. Otherwise, the database whitelist becomes invalid.
You cannot manage the metadata of a local metadatabase in the console. However, you can use the Hue tool on a cluster to manage the metadata.
The unified metadatabase has the following limits:
- Total capacity: 200 MiB
- Maximum number of queries per hour: 720,000
- Maximum number of updates per hour: 144,000
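To get a feel for the hourly limits above, the following minimal sketch converts them into average per-second rates. It only does the arithmetic. How the limits are actually enforced (for example, whether short bursts above the average are tolerated) is not stated here, so treat the results as averages rather than hard per-second caps. The constant and function names are illustrative.

```python
# Hourly limits quoted in the list above.
MAX_QUERIES_PER_HOUR = 720_000
MAX_UPDATES_PER_HOUR = 144_000
SECONDS_PER_HOUR = 3600


def per_second(per_hour: int) -> float:
    """Convert an hourly quota into an average per-second rate."""
    return per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Average rates implied by the hourly quotas.
    print(f"queries: {per_second(MAX_QUERIES_PER_HOUR):.0f} per second")
    print(f"updates: {per_second(MAX_UPDATES_PER_HOUR):.0f} per second")
```

On average, this corresponds to 200 queries per second and 40 updates per second.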
- Persistent metadata storage
In earlier versions, metadata is stored in a MySQL database that is deployed on the cluster and is deleted when the cluster is released. Because a pay-as-you-go cluster can be released when it is no longer needed, you must log on to the cluster and export the metadata manually if you want to retain it.
After centralized metadata management is enabled, the metadata of released clusters is not cleared. Before you delete data from OSS or from the HDFS of a cluster, or before you release a cluster, make sure that the corresponding metadata has been deleted. That is, drop the tables and databases that describe the data. This prevents dirty metadata from accumulating in the metadatabase.
- Separation of computing and storage
EMR can store data in Alibaba Cloud OSS, which significantly reduces the usage costs for large volumes of data. EMR clusters are mainly used as computing resources and can be released if they are no longer needed. You do not need to migrate metadata before cluster release because data is stored in OSS.
- Data sharing
If all data is stored in OSS, all clusters can access data without the need to migrate or restructure metadata. This way, EMR clusters that process different services can directly share data.
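As a rough illustration of the cleanup described under persistent metadata storage, the sketch below generates the HiveQL statements that drop a database's tables and then the database itself. The helper name and the database and table names are hypothetical. The statements would still need to be run through a Hive client of your choice. Keep in mind that in Hive, dropping a managed table also removes its data, while dropping an external table removes only the metadata.

```python
from typing import Iterable, List


def build_cleanup_statements(database: str, tables: Iterable[str]) -> List[str]:
    """Build HiveQL that drops each table, then the (now empty) database.

    Hypothetical helper: it only produces the statement text and does not
    connect to Hive or execute anything.
    """
    statements = [
        f"DROP TABLE IF EXISTS `{database}`.`{table}`;" for table in tables
    ]
    statements.append(f"DROP DATABASE IF EXISTS `{database}`;")
    return statements


if __name__ == "__main__":
    # Example names are placeholders, not taken from a real cluster.
    for statement in build_cleanup_statements("sales_db", ["orders", "customers"]):
        print(statement)
```

Generating the statements first, rather than executing them directly, leaves room to review exactly what will be removed before the cluster or its data is deleted.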
Create a cluster that uses unified metadata
- Use the EMR console
When you create a cluster, set Type to Unified Metabases in the Basic Settings step. For information about how to create a cluster, see Create a cluster.
- Call the CreateClusterV2 API operation
See the description of the CreateClusterV2 operation.
Note: Set the value of the useLocalMetaDb parameter to false.
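When you assemble the request parameters for the API call yourself, a small guard can make sure the unified metadatabase is always selected. The sketch below is an assumption-laden illustration: only the useLocalMetaDb parameter name comes from this topic, while the helper name and the "Name" key in the example are placeholders. It does not call any SDK, and the exact way the value is serialized in a real request should be checked against the CreateClusterV2 description.

```python
from typing import Any, Dict


def with_unified_metadata(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with useLocalMetaDb set to False.

    Hypothetical helper: raises if the caller explicitly asked for a local
    metadatabase, so the conflict is not silently overwritten.
    """
    if params.get("useLocalMetaDb") is True:
        raise ValueError("useLocalMetaDb must be false to use unified metadata")
    merged = dict(params)  # Copy so the caller's dictionary is not modified.
    merged["useLocalMetaDb"] = False
    return merged


if __name__ == "__main__":
    # "Name" is a placeholder key used only for illustration.
    print(with_unified_metadata({"Name": "demo-cluster"}))
```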
For more information, see Basic operations on Hive metadata.
View metadata information
- Log on to the EMR console.
- In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
- Click the Metadata tab.
- In the left-side navigation pane, click Metabase Information.
On the Metabase Information page, you can view the usage and limits of the current ApsaraDB RDS instance. If you need to modify this information, submit a ticket. For information about how to submit a ticket, see Submit a ticket.