E-MapReduce: Balance the load of HiveServer2

Last Updated: Nov 07, 2024

If HiveServer2 is deployed on multiple nodes of an E-MapReduce (EMR) cluster, you can use ZooKeeper or Server Load Balancer (SLB) to balance the load of HiveServer2. This topic describes the methods that you can use to balance the load of HiveServer2. You can select a method based on whether Kerberos authentication is enabled for your EMR cluster.

Prerequisites

A high-availability EMR cluster is created. For more information, see Create a cluster.

Limits

This topic applies only to clusters for which High Service Availability is turned on.

Common EMR clusters

This section describes how to balance the load of HiveServer2 for clusters for which Kerberos Authentication is not turned on.

Use ZooKeeper to balance the load of HiveServer2

By default, ZooKeeper is installed in high-availability clusters. To implement load balancing on HiveServer2 by using ZooKeeper, perform the following steps:

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to enable ZooKeeper to select and connect to a node on which HiveServer2 is deployed:

    beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

    Note

    In the preceding command, master-1-1:2181,master-1-2:2181,master-1-3:2181 are the addresses of the ZooKeeper nodes.

    To connect Hue to HiveServer2 based on this load balancing method, add the following parameters on the hue tab of the Hue service page in the EMR console.

    | Parameter | Description |
    | --- | --- |
    | zookeeper.clusters.default.hostports | The addresses of the ZooKeeper nodes. Configure this parameter based on your business requirements. In this example, master-1-1:2181,master-1-2:2181,master-1-3:2181 is used. |
    | beeswax.hive_discovery_hs2 | Set this parameter to true. |
    | beeswax.hive_discovery_hiveserver2_znode | Set this parameter to /hiveserver2. |
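The connection string in step 2 follows a fixed pattern, so it can be generated from a list of ZooKeeper hosts instead of being typed by hand. The following is a minimal sketch and not part of the EMR tooling: the function name zk_jdbc_url is hypothetical, and it assumes that every ZooKeeper node listens on port 2181, as in the example above.

```shell
# Hypothetical helper: build a ZooKeeper service-discovery JDBC URL.
# Usage: zk_jdbc_url <namespace> <zk_host>...
# Assumes every ZooKeeper node listens on port 2181, as in the example above.
zk_jdbc_url() (
  namespace="$1"
  shift
  quorum=""
  for host in "$@"; do
    # Append "host:2181", comma-separated after the first entry.
    quorum="${quorum:+${quorum},}${host}:2181"
  done
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$quorum" "$namespace"
)

# Example (run on a cluster node where beeline is installed):
#   beeline -u "$(zk_jdbc_url hiveserver2 master-1-1 master-1-2 master-1-3)"
```

The function body runs in a subshell, so its variables do not leak into the calling shell.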

Use SLB to balance the load of HiveServer2

  1. Create an SLB instance. For more information, see Create and manage a CLB instance.

  2. Add the Elastic Compute Service (ECS) instances on which HiveServer2 is deployed to the default server group and configure the weights of the ECS instances based on your business requirements.

    For more information, see Add and manage backend servers in the default server group.

  3. Add a listener to the SLB instance. Set the Select Listener Protocol parameter to TCP and configure the Listening Port parameter based on your business requirements. Set the Backend Protocol/Port parameter to TCP:10000.

    Configure the Scheduling Algorithm parameter based on your business requirements. For more information, see Add a TCP listener.

  4. Access HiveServer2.

    1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

    2. Run the following command to connect to HiveServer2 on multiple nodes of the cluster to implement load balancing:

      beeline -u 'jdbc:hive2://<slb_ip_or_host>:<slb_port>'

      Configure the following parameters based on your business requirements:

      1. <slb_ip_or_host>: indicates the IP address of the SLB instance or the hostname that is associated with the IP address of the SLB instance.

      2. <slb_port>: indicates the frontend listening port of the SLB instance.

    To connect Hue to HiveServer2 based on this load balancing method, modify the following parameters on the hue tab of the Hue service page in the EMR console.

    | Parameter | Description |
    | --- | --- |
    | hive_server2_host | The IP address of the SLB instance or the hostname that is associated with the IP address of the SLB instance. |
    | hive_server2_port | The frontend listening port of the SLB instance. |
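Because <slb_ip_or_host> and <slb_port> are placeholders, a mistyped port is an easy source of connection failures. The following sketch builds the JDBC URL from the two values and rejects an empty host or a port outside 1 to 65535. The function name slb_jdbc_url and the hostname in the usage comment are hypothetical and are not part of EMR or SLB.

```shell
# Hypothetical helper: build the JDBC URL for a HiveServer2 endpoint behind SLB.
# Usage: slb_jdbc_url <slb_ip_or_host> <slb_port>
slb_jdbc_url() (
  host="$1"
  port="$2"
  # Reject a port that is empty or contains non-digit characters.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; exit 1 ;;
  esac
  # Reject an empty host or a port outside the valid TCP range.
  if [ -z "$host" ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "invalid endpoint: $host:$port" >&2
    exit 1
  fi
  printf 'jdbc:hive2://%s:%s\n' "$host" "$port"
)

# Example (run on a cluster node where beeline is installed):
#   beeline -u "$(slb_jdbc_url slb.example.internal 10000)"
```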

EMR clusters with Kerberos authentication enabled

This section describes how to balance the load of HiveServer2 for clusters for which Kerberos Authentication is turned on.

Use ZooKeeper to balance the load of HiveServer2

  1. Run the kinit command to obtain the Ticket Granting Ticket (TGT). For more information, see Use Kerberos authentication to access a Hive client.

  2. Access HiveServer2.

    1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

    2. Run the following command to enable ZooKeeper to select and connect to a node on which HiveServer2 is deployed:

      beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

      Note

      In the preceding command, master-1-1:2181,master-1-2:2181,master-1-3:2181 are the addresses of the ZooKeeper nodes.

      Hue cannot connect to HiveServer2 of an EMR Kerberos cluster based on this load balancing method. To connect Hue to HiveServer2 of an EMR Kerberos cluster, use SLB to implement load balancing.
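On a Kerberos cluster, beeline fails with an authentication error if the TGT from step 1 is missing or expired. The following sketch checks for a valid ticket before connecting. The wrapper name beeline_with_tgt is hypothetical, and the check assumes MIT Kerberos, where klist -s produces no output and returns a non-zero exit status if the credential cache cannot be read or has expired.

```shell
# Hypothetical wrapper: refuse to start beeline without a valid Kerberos ticket.
# Assumes MIT Kerberos, where "klist -s" prints nothing and exits with a
# non-zero status if the credential cache is missing or expired.
# Usage: beeline_with_tgt <jdbc_url>
beeline_with_tgt() {
  if ! klist -s; then
    echo "No valid TGT found. Run kinit first." >&2
    return 1
  fi
  beeline -u "$1"
}

# Example:
#   beeline_with_tgt 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
```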

Use SLB to balance the load of HiveServer2

You can create and configure an SLB instance by following the instructions that are described in the Common EMR clusters section in this topic. To access HiveServer2 of Kerberos clusters, you must configure the Kerberos principal that is required by HiveServer2.

In this example, the IP address of the SLB instance is 121.40.**.**, and the frontend listening port is 10000. You can change the IP address and frontend listening port based on your business requirements.

  1. Run the kinit command to obtain the TGT. For more information, see Use Kerberos authentication to access a Hive client.

  2. Create a Hive principal that corresponds to the IP address 121.40.**.** and export the principal to a newly generated keytab file.

    1. Log on to the master-1-1 node. For more information, see Log on to a cluster.

    2. Run the following command to start the Kerberos administration tool:

      kadmin.local

    3. Run the following commands to create a principal and export the principal to the /tmp/slb.keytab file:

      addprinc -randkey hive/121.40.**.**
      xst -k /tmp/slb.keytab hive/121.40.**.**
      exit

      If an EMR Kerberos cluster uses open source MIT Kerberos for authentication, use the kadmin.local or kadmin CLI to connect to Key Distribution Center (KDC) and run the addprinc and xst commands.

      Important

      If you export the principal multiple times, the previously generated keytab files become invalid. Make sure that the slb.keytab file contains only the latest principal. If you want to export the principal again, you must delete the previously generated keytab file.

  3. Transfer the slb.keytab file to all nodes on which HiveServer2 is deployed. Then run the following commands on each of those nodes to import the content of the slb.keytab file into the /etc/taihao-apps/hive-conf/keytab/hive.keytab file:

    ktutil
    rkt /tmp/slb.keytab
    wkt /etc/taihao-apps/hive-conf/keytab/hive.keytab
    quit

  4. Run the following command to view the content of the hive.keytab file:

    klist -kt /etc/taihao-apps/hive-conf/keytab/hive.keytab

  5. Modify the configurations of Hive.

    On the Configure tab of the Hive service page in the EMR console, search for the hive.server2.authentication.kerberos.principal parameter and set the value to the principal that you created in step 2. In this example, the value is hive/121.40.**.**@EMR.**.COM. Save the configuration and apply it.

  6. Restart HiveServer.

    1. On the Status tab of the Hive service page in the EMR console, find HiveServer and click Restart in the Actions column.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    3. In the Confirm message, click OK.

  7. Run the following command on the master-1-1 node to access HiveServer2:

    beeline -u 'jdbc:hive2://121.40.**.**:10000/default;principal=hive/121.40.**.**@EMR.**.COM'
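The SLB address appears in three places in this procedure: the kadmin commands in step 2, the principal in step 5, and the JDBC URL in step 7. A mismatch between them causes authentication to fail. The following sketch derives all three strings from one set of inputs. The function names are hypothetical, and the address 192.0.2.10 and realm EXAMPLE.COM in the usage comments are placeholders for your own SLB address and Kerberos realm, which are masked in this topic.

```shell
# Hypothetical helpers for the Kerberos + SLB procedure above.
# <slb_addr> is the SLB IP address or hostname; <realm> is the Kerberos realm
# of your cluster (shown masked as EMR.**.COM in this topic).

# Print the Hive service principal that is bound to the SLB address.
# Usage: slb_principal <slb_addr> <realm>
slb_principal() (
  printf 'hive/%s@%s\n' "$1" "$2"
)

# Print the kadmin queries from step 2 so that they can be reviewed first.
# Usage: slb_kadmin_queries <slb_addr> <keytab_path>
slb_kadmin_queries() (
  addr="$1"
  keytab="$2"
  printf 'addprinc -randkey hive/%s\n' "$addr"
  printf 'xst -k %s hive/%s\n' "$keytab" "$addr"
)

# Print the JDBC URL that beeline uses in step 7.
# Usage: slb_kerberos_jdbc_url <slb_addr> <slb_port> <realm>
slb_kerberos_jdbc_url() (
  addr="$1"
  port="$2"
  realm="$3"
  printf 'jdbc:hive2://%s:%s/default;principal=%s\n' \
    "$addr" "$port" "$(slb_principal "$addr" "$realm")"
)

# Examples:
#   slb_kadmin_queries 192.0.2.10 /tmp/slb.keytab
#   beeline -u "$(slb_kerberos_jdbc_url 192.0.2.10 10000 EXAMPLE.COM)"
```

The helpers only print strings and do not contact the KDC or HiveServer2, so the output can be checked before any command is run.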