This topic describes how to configure an independent ApsaraDB RDS for MySQL database as the metadatabase of a Hadoop cluster in the E-MapReduce (EMR) console.

Prerequisites

An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance.
Note In this topic, an ApsaraDB RDS for MySQL instance that runs MySQL 5.7 is used.

Limits

When you create an ApsaraDB RDS for MySQL instance, you must set Database Engine to MySQL 5.7 and Edition to High-availability.

Procedure

  1. Step 1: Prepare a metadatabase
    Prepare a metadatabase.
  2. Step 2: Create a cluster
    Create a cluster in the EMR console and associate the cluster with the metadatabase.
  3. Optional:Step 3: Initialize the metastore service
    Initialize the metastore service based on the Hive version.
    Notice If you created a Hadoop cluster, perform this step.

Step 1: Prepare a metadatabase

  1. Create a database named hivemeta.
    For more information, see the "Create a database" section of the Create accounts and databases for an ApsaraDB RDS for MySQL instance topic. create_database
  2. Create an account and grant read and write permissions to the account.
    Notice

    For more information about how to create an account, see the "Create an account" section of the Create accounts and databases for an ApsaraDB RDS for MySQL instance topic.

    create_user

    Record the username and password of the account that you created. The username and password are required when you create a cluster. For information about how to create a cluster, see Step 2: Create a cluster.

  3. Obtain the internal endpoint of the database.
    1. Configure an IP address whitelist. For more information, see Use a database client or the CLI to connect to an ApsaraDB RDS for MySQL instance.
    2. In the left-side navigation pane of the instance details page, click Database Connection.
    3. On the Database Connection page, click the Copy icon on the right of the internal endpoint to copy the internal endpoint.
      net_inter

      Record the internal endpoint. This endpoint is required when you create a cluster. For more information about how to create a cluster, see Step 2: Create a cluster.

Step 2: Create a cluster

In the Basic Settings step, configure the parameters described in the following table. For more information about the configurations of other parameters, see Create a cluster.
Parameter Description
Cluster Name The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).
Type Select Independent ApsaraDB RDS for MySQL.
Connection URL Set Connection URL to a value in the format of jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name>?createDatabaseIfNotExist=true&characterEncoding=UTF-8.
Database Username Set this parameter to the username of the account created in Step 1: Prepare a metadatabase.
Database Password Set this parameter to the password of the account created in Step 1: Prepare a metadatabase.

Step 3: Initialize the metastore service

Notice For a Hadoop cluster, perform the following steps to initialize the metastore service based on the Hive version.
  1. Go to the Cluster Overview page.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  2. In the Software Info section of the Cluster Overview page, check the Hive version and initialize the metastore service of Hive.
    • If the Hive version is 2.3.X, perform the following steps to initialize the service:
      1. Enter the specified directory.
        cd /usr/lib/hive-current/scripts/metastore/upgrade/mysql/
      2. Log on to the ApsaraDB RDS for MySQL database.
        mysql -h {Private or public IP address of the ApsaraDB RDS for MySQL database} -u{Username to log on to the ApsaraDB RDS for MySQL database} -p{Password to log on to the ApsaraDB RDS for MySQL database}
      3. Run the following command on the command line of the ApsaraDB RDS for MySQL database:
        use {Name of the ApsaraDB RDS for MySQL database};
        source /usr/lib/hive-current/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql;
      Parameter description:
    • If the Hive version is not 2.3.X, perform the following steps to initialize the service:
      1. Log on to the master node of the cluster by using SSH. For more information, see Log on to a cluster.
      2. Run the following command to switch to the hadoop user:
        su hadoop
      3. Run the following command to log on to the ApsaraDB RDS for MySQL database:
        schematool -initSchema -dbType mysql
    After the metastore service is initialized, you can use the created ApsaraDB RDS for MySQL database as the Hive metadatabase.
    Note Before the metastore service is initialized, exceptions may occur in the MetaStore and HiveServer2 processes for Hive and the ThriftServer process for Spark. After the metastore service is initialized, the processes are recovered.

FAQ