This topic describes how to configure an independent ApsaraDB RDS for MySQL database as the metadatabase of a Hadoop or Data Development cluster in the E-MapReduce (EMR) console.

Prerequisites

An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance.
Note In this topic, an ApsaraDB RDS for MySQL instance that runs MySQL 5.7 is used.

Limits

  • For a Hadoop cluster, we recommend that you set Database Engine to MySQL 5.7 and Edition to High-availability when you create an ApsaraDB RDS for MySQL instance.
  • For a Data Development cluster, you must set Database Engine to MySQL 8.0 when you create an ApsaraDB RDS for MySQL instance.

Prepare a metadatabase

  1. Create a database named hivemeta.
    For more information, see the "Create a database" section in Create accounts and databases for an ApsaraDB RDS for MySQL instance. create_database
  2. Create an account and grant read and write permissions to the account.
    Notice
    create_user

    Record the username and password of the account that you created. They are required when you create a cluster. For information about how to create a cluster, see Create a cluster.

  3. Obtain the internal endpoint of the database.
    1. Configure an IP address whitelist. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.
    2. In the left-side navigation pane of the instance details page, click Database Connection.
    3. On the Database Connection page, click the Copy icon on the right of the internal endpoint to copy the internal endpoint.
      net_inter

      Record the internal endpoint. It is required when you create a cluster. For information about how to create a cluster, see Create a cluster.

Create a cluster

  • Hadoop clustercreate_hadoop
  • Data Development clustercreate_ddc
In the Basic Settings step, configure the parameters described in the following table. For more information about the configurations of other parameters, see Create a cluster.
Parameter Description
Cluster Name The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).
Type Select Independent ApsaraDB RDS for MySQL.
Connection URL Set Connection URL to a value in the format of jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name>?createDatabaseIfNotExist=true&characterEncoding=UTF-8.
Database Username Set this parameter to the username of the account created in Prepare a metadatabase.
Database Password Set this parameter to the password of the account created in Prepare a metadatabase.

Initialize the metastore service

For a Hadoop cluster, perform the following steps to initialize the metastore service based on the Hive version:

  1. Log on to the Alibaba Cloud EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Cluster Management tab.
  4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  5. In the Software Info section of the Cluster Overview page, check the Hive version and initialize the metastore service of Hive.
    • Hive 2.3.5:
      Log on to the master node of the cluster in SSH mode and perform the following steps to initialize the metastore service:
      1. Enter the specified directory.
        cd /usr/lib/hive-current/scripts/metastore/upgrade/mysql/
      2. Log on to the ApsaraDB RDS for MySQL database.
        mysql -h {Private or public IP address of the ApsaraDB RDS for MySQL database} -u{Username to log on to the ApsaraDB RDS for MySQL database} -p{Password to log on to the ApsaraDB RDS for MySQL database}
      3. Run the following command on the command line of the ApsaraDB RDS for MySQL database:
        use {Name of the ApsaraDB RDS for MySQL database};
        source /usr/lib/hive-current/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql;
      Parameter description:
      • {Username to log on to the ApsaraDB RDS for MySQL database} indicates the username of the account created in Prepare a metadatabase.
      • {Password to log on to the ApsaraDB RDS for MySQL database} indicates the password of the account created in Prepare a metadatabase.
      • {Name of the ApsaraDB RDS for MySQL database} indicates the name of the database created in Prepare a metadatabase.
    • Hive of a version other than 2.3.5:
      Log on to the master node of the cluster in SSH mode and run the following commands to initialize the metastore service:
      su hadoop
      schematool -initSchema -dbType mysql
    After the metastore service is initialized, you can use the created ApsaraDB RDS for MySQL database as the Hive metadatabase.
    Note Before the metastore service is initialized, exceptions may occur in the MetaStore and HiveServer2 processes for Hive and the ThriftServer process for Spark. After the metastore service is initialized, the processes are recovered.
    If an exception is reported during the initialization, check the security groups that are configured for the ApsaraDB RDS for MySQL instance. Make sure that the security groups of the ECS instances of your EMR cluster are added to the ApsaraDB RDS for MySQL instance. initialize_fail
    If Hive metadata contains Chinese characters, such as Chinese characters in column comments and partition names, you can run the following commands one at a time in the ApsaraDB RDS for MySQL database to encode the related fields in the UTF-8 format:
    1. Convert the data type of the COMMENT column:
      alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
    2. Convert the data type of the PARAM_VALUE column in the TABLE_PARAMS table:
      alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    3. Convert the data type of the PARAM_VALUE column in the PARTITION_PARAMS table:
      alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    4. Convert the data type of the PKEY_COMMENT column:
      alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
    5. Convert the data type of the PARAM_VALUE column in the INDEX_PARAMS table:
      alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;