All Products
Search
Document Center

E-MapReduce:Configure a self-managed ApsaraDB RDS for MySQL database

Last Updated:Mar 26, 2026

Use a self-hosted RDS for MySQL database or an ApsaraDB RDS for MySQL database as the Hive metastore backend for E-MapReduce (EMR) DataLake, Custom, and Hadoop clusters. Unlike the default embedded metastore, an external database persists metadata across cluster lifecycles and can be shared across multiple clusters.

Prerequisites

Before you begin, ensure that you have:

Network requirements

Before creating the EMR cluster, confirm that the two resources can reach each other over the private network.

Same VPC

The EMR cluster and the RDS for MySQL instance communicate through the private network by default. Add the IPv4 CIDR block of the VPC where the EMR cluster will be created to the RDS for MySQL whitelist. After the whitelist is updated, the two resources can connect.

Different VPCs

Establish connectivity first using a VPC peering connection or another network method. For more information, see Use VPC peering connections for private network communication between VPCs. Then add the IPv4 CIDR block of the EMR cluster's VPC to the RDS for MySQL whitelist.

Step 1: Prepare a metadatabase

  1. Create a database in the RDS for MySQL instance. For more information, see Create a database.

  2. Create a standard account and grant it read and write permissions on the database. For more information, see Create an account.

    Save the username and password — you will enter them when creating the cluster in Step 2.
  3. Obtain the internal endpoint of the RDS instance:

    1. Add the IPv4 CIDR block of the VPC where the EMR cluster will be created to the RDS for MySQL whitelist. For more information, see Configure an IP address whitelist.

    2. On the instance details page, click Database Connection in the left-side navigation pane.

    3. On the Database Connection page, click the internal endpoint address to copy it.

      Save the internal endpoint — you will use it to build the connection URL in Step 2.

      image

Step 2: Create a cluster

When creating the cluster, go to the Software Configuration step and configure the metastore connection. For all other parameters, see Create a cluster.

The parameters differ by cluster type:

DataLake and Custom clusters

The Metadata parameter is available only when you select the HDFS (OSS-HDFS), YARN, and Hive services.
ParameterValue
MetadataSelect Self-managed RDS.
javax.jdo.option.ConnectionURLEnter the JDBC connection URL in the format jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name>, where rm-xxxxxx.mysql.rds.aliyuncs.com is the internal endpoint from Step 1 and <Database name> is the name of the database you created.
javax.jdo.option.ConnectionUserNameEnter the username from Step 1.
javax.jdo.option.ConnectionPasswordEnter the password from Step 1.

Hadoop clusters

ParameterValue
RDS EndpointEnter the JDBC connection URL in the format jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name>, where rm-xxxxxx.mysql.rds.aliyuncs.com is the internal endpoint from Step 1 and <Database name> is the name of the database you created.
RDS UsernameEnter the username from Step 1.
RDS PasswordEnter the password from Step 1.

(Optional) Step 3: Initialize the Metastore service

Skip this step if you are using a DataLake or Custom cluster — those cluster types initialize the Hive metadatabase automatically during creation.

Initialize the Metastore service if either of the following applies:

  • You created a Hadoop cluster running EMR V3.38.X or earlier, EMR V4.9.X or earlier, or EMR V5.4.X or earlier.

  • You changed the metadata storage of an existing cluster to an RDS for MySQL database.

Important

Before initialization, the HiveMetaStore and HiveServer2 components of Hive and the ThriftServer component of Spark may be in an abnormal state. They recover automatically after initialization completes.

To initialize the Metastore service:

  1. Log on to the master node of the cluster over SSH. For more information, see Log on to a cluster.

  2. Switch to the hadoop user:

    su - hadoop
  3. Initialize the Metastore service:

    schematool -initSchema -dbType mysql

After the command completes, the RDS for MySQL database is ready to use as the Hive metadatabase.