To keep a large-scale Hive metadata service running stably, you can migrate the metadata from the unified metadatabase to an ApsaraDB for RDS instance that you own.

Procedure

  1. Purchase an ApsaraDB for RDS instance. Make sure that the master node of the EMR cluster can connect to this instance. To connect over a private address, add the ApsaraDB for RDS instance to the security group of the ECS instances in the EMR cluster.
  2. Create a database named hivemeta in the ApsaraDB for RDS instance. Create a user and grant the user read and write permissions on hivemeta.
  3. Export the data from the unified metadatabase. The table schema is not exported.
    1. To keep the data consistent and prevent metadata from changing during the export, stop the Hive metastore service on the Hive service page. For more information, see Stop the Hive metastore service.
    2. On the Hive service page, click the Configure tab.
    3. On the configuration page, search for the values of javax.jdo.option.ConnectionUserName, javax.jdo.option.ConnectionPassword, and javax.jdo.option.ConnectionURL.
      Note
      • For clusters of earlier versions, search for the values in the $HIVE_CONF_DIR/hive-site.xml file.
      • javax.jdo.option.ConnectionUserName indicates the database username. javax.jdo.option.ConnectionPassword indicates the database password. javax.jdo.option.ConnectionURL indicates the database connection URL and database name.
    4. Log on to the master node of the EMR cluster and run the following command:
      mysqldump -t DATABASENAME -h HOST -P PORT -u USERNAME -pPASSWORD > /tmp/metastore.sql
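      Note Replace DATABASENAME, HOST, and PORT with the values from javax.jdo.option.ConnectionURL, USERNAME with the value of javax.jdo.option.ConnectionUserName, and PASSWORD with the value of javax.jdo.option.ConnectionPassword. Do not put a space between -p and the password. The -t option exports the data without the table creation statements.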
  4. In the /usr/local/emr/emr-agent/run/meta_db_info.json file on the master node of the cluster, set use_local_meta_db to false. Then set the metadatabase connection URL, username, and password to those of the ApsaraDB for RDS instance. If the cluster is a high-availability (HA) cluster, make these changes on both master nodes.
    Note Skip this step if the cluster does not have this file.
  5. On the Hive service configuration page, set the metadatabase connection URL, username, and password to those of the ApsaraDB for RDS instance.
    For clusters of earlier versions, make these changes in the $HIVE_CONF_DIR/hive-site.xml file.
  6. On the master node, set the metadatabase connection URL, username, and password in the hive-site.xml file to those of the ApsaraDB for RDS instance. Then run the following commands to initialize the schema:
    cd /usr/lib/hive-current/bin
    ./schematool -initSchema -dbType mysql
  7. Import the exported metadata into the ApsaraDB for RDS instance. First, connect to the instance from the MySQL command line:
    mysql -h {rds url} -u {rds username} -p

    At the MySQL prompt, switch to the hivemeta database (use hivemeta;), and then run source /tmp/metastore.sql to import the data.

  8. Restart all Hive components. For more information, see Restart all Hive components.

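    Then run the hive command on the master node to start the Hive CLI, and check that the metadata, such as the databases and tables, matches what existed before the migration.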
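
The export in step 3 depends on splitting the javax.jdo.option.ConnectionURL value into the HOST, PORT, and DATABASENAME arguments of mysqldump, and on the dump containing data only. The following is a minimal sketch of both checks, not an official EMR tool. The function names parse_jdbc_url and count_create_table, the JDBC_* variables, and the sample URL are hypothetical, and the sketch assumes the URL has the usual jdbc:mysql://HOST[:PORT]/DATABASENAME[?params] form. It only prints the mysqldump command and does not connect to any database.

```shell
#!/bin/sh
# Sketch only: helper names and sample values are hypothetical.

# Split a MySQL JDBC URL (jdbc:mysql://HOST[:PORT]/DATABASENAME[?params])
# into the HOST, PORT, and DATABASENAME values that mysqldump expects.
parse_jdbc_url() {
  rest="${1#jdbc:mysql://}"        # HOST[:PORT]/DATABASENAME[?params]
  hostport="${rest%%/*}"           # HOST[:PORT]
  dbpart="${rest#*/}"              # DATABASENAME[?params]
  JDBC_DB="${dbpart%%[?]*}"        # drop any ?key=value parameters
  JDBC_HOST="${hostport%%:*}"
  case "$hostport" in
    *:*) JDBC_PORT="${hostport##*:}" ;;
    *)   JDBC_PORT=3306 ;;         # MySQL default port when the URL omits it
  esac
}

# Count the CREATE TABLE statements in a dump file.
# A dump taken with -t (data only) should contain none.
count_create_table() {
  grep -c '^CREATE TABLE' "$1"
}

# Example with a placeholder URL; substitute the value of
# javax.jdo.option.ConnectionURL from the Hive configuration.
parse_jdbc_url "jdbc:mysql://metastore.example.internal:3306/emr_meta_example?characterEncoding=UTF-8"
echo "mysqldump -t $JDBC_DB -h $JDBC_HOST -P $JDBC_PORT -u USERNAME -pPASSWORD > /tmp/metastore.sql"
```

After the real export, running count_create_table /tmp/metastore.sql should print 0. A nonzero count suggests that the dump was taken without -t and would conflict with the schema that schematool creates in step 6.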

Access the Hive service configuration page

  1. Log on to the Alibaba Cloud E-MapReduce console.
  2. On the EMR homepage, click the Cluster Management tab.
  3. On the Cluster Management tab, click Details in the Actions column that corresponds to a cluster.
  4. In the left-side navigation pane, choose Cluster Service > Hive.

Stop the Hive metastore service

  1. Access the Hive service configuration page. For more information, see Access the Hive service configuration page.
  2. In the Components section, click Stop in the Actions column that corresponds to Hive MetaStore.
    1. In the Cluster Activities dialog box, set Target Nodes, Rolling Execute, Actions on Failures, and Description.
    2. Click OK.

Restart all Hive components

  1. Access the Hive service configuration page. For more information, see Access the Hive service configuration page.
  2. Click the Configure tab.
  3. Select Restart All Components from the Actions drop-down list in the upper-right corner.
    1. In the Cluster Activities dialog box, set Target Nodes, Rolling Execute, Actions on Failures, and Description.
    2. Click OK.