If HiveServer2 is deployed on multiple nodes of an E-MapReduce (EMR) cluster, you can use ZooKeeper or Server Load Balancer (SLB) to balance the load across the HiveServer2 nodes. This topic describes both methods. The procedure that you follow depends on whether Kerberos authentication is enabled for your EMR cluster.
Prerequisites
A high-availability EMR cluster is created. For more information, see Create a cluster.
Limits
This topic applies only to clusters for which High Service Availability is turned on.
Common EMR clusters
This section describes how to balance the load of HiveServer2 for clusters for which Kerberos Authentication is not turned on.
Use ZooKeeper to balance the load of HiveServer2
By default, ZooKeeper is installed in high-availability clusters. To implement load balancing on HiveServer2 by using ZooKeeper, perform the following steps:
Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to enable ZooKeeper to select and connect to a node on which HiveServer2 is deployed:
beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Note
In the preceding command, master-1-1:2181,master-1-2:2181,master-1-3:2181 are the addresses of ZooKeeper.
To connect Hue to HiveServer2 based on this load balancing method, add the following parameters on the hue tab of the Hue service page in the EMR console.
| Parameter | Description |
| --- | --- |
| zookeeper.clusters.default.hostports | The addresses of ZooKeeper. Configure this parameter based on your business requirements. In this example, master-1-1:2181,master-1-2:2181,master-1-3:2181 is used. |
| beeswax.hive_discovery_hs2 | Set this parameter to true. |
| beeswax.hive_discovery_hiveserver2_znode | Set this parameter to /hiveserver2. |
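The ZooKeeper discovery URL in the Beeline command above can also be assembled from a list of ZooKeeper addresses, which helps when the addresses differ between clusters. The following is a minimal sketch, not part of the EMR tooling: the function name build_zk_jdbc_url is hypothetical, and the host names are the example values used in this topic.

```shell
#!/usr/bin/env bash
# Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery.
# Arguments: the znode namespace, then one or more ZooKeeper host:port pairs.
# build_zk_jdbc_url is a hypothetical helper name used only for illustration.
build_zk_jdbc_url() {
  local namespace="$1"
  shift
  local quorum
  # Join the remaining arguments with commas to form the ZooKeeper address list.
  quorum="$(IFS=,; echo "$*")"
  echo "jdbc:hive2://${quorum}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=${namespace}"
}

# Example values taken from this topic. Replace them with your own addresses.
ZK_URL="$(build_zk_jdbc_url hiveserver2 master-1-1:2181 master-1-2:2181 master-1-3:2181)"
echo "${ZK_URL}"

# To connect, pass the URL to Beeline:
# beeline -u "${ZK_URL}"
```

The connect step is left as a comment so that the sketch can be run safely on a machine where Beeline is not installed.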
Use SLB to balance the load of HiveServer2
Create an SLB instance. For more information, see Create and manage a CLB instance.
Add the Elastic Compute Service (ECS) instances on which HiveServer2 is deployed to the default server group and configure the weights of the ECS instances based on your business requirements.
For more information, see Add and manage backend servers in the default server group.
Set the Select Listener Protocol parameter to TCP and configure the Listening Port parameter based on your business requirements. Set the Backend Protocol/Port parameter to TCP:10000.
Configure the Scheduling Algorithm parameter based on your business requirements. For more information, see Add a TCP listener.
Access HiveServer2.
Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to connect to HiveServer2 on multiple nodes of the cluster to implement load balancing:
beeline -u 'jdbc:hive2://<slb_ip_or_host>:<slb_port>'
Configure the following parameters based on your business requirements:
<slb_ip_or_host>: indicates the IP address of the SLB instance or the hostname that is associated with the IP address of the SLB instance.
<slb_port>: indicates the frontend listening port of the SLB instance.
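The two placeholders above can be kept in variables so that the same URL is reused for a quick connectivity check. The following is a rough sketch under stated assumptions: build_slb_jdbc_url is a hypothetical helper, and 192.0.2.10 is a documentation-only placeholder address, not a value from this topic.

```shell
#!/usr/bin/env bash
# Build a JDBC URL that points at the SLB frontend listener.
# build_slb_jdbc_url is a hypothetical helper name used only for illustration.
build_slb_jdbc_url() {
  local slb_host="$1"   # IP address or hostname of the SLB instance
  local slb_port="$2"   # frontend listening port of the SLB instance
  echo "jdbc:hive2://${slb_host}:${slb_port}"
}

# Placeholder values (assumptions). Replace them with your SLB address and port.
SLB_HOST="${SLB_HOST:-192.0.2.10}"
SLB_PORT="${SLB_PORT:-10000}"

SLB_URL="$(build_slb_jdbc_url "${SLB_HOST}" "${SLB_PORT}")"
echo "${SLB_URL}"

# Smoke test: run one trivial query through the SLB listener.
# beeline -u "${SLB_URL}" -e 'SELECT 1;'
```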
To connect Hue to HiveServer2 based on this load balancing method, modify the following parameters on the hue tab of the Hue service page in the EMR console.
| Parameter | Description |
| --- | --- |
| hive_server2_host | The IP address of the SLB instance or the hostname that is associated with the IP address of the SLB instance. |
| hive_server2_port | The frontend listening port of the SLB instance. |
EMR clusters with Kerberos authentication enabled
This section describes how to balance the load of HiveServer2 for clusters for which Kerberos Authentication is turned on.
Use ZooKeeper to balance the load of HiveServer2
Run the kinit command to obtain the Ticket Granting Ticket (TGT). For more information, see Use Kerberos authentication to access a Hive client.
Access HiveServer2.
Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to enable ZooKeeper to select and connect to a node on which HiveServer2 is deployed:
beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Note
In the preceding command, master-1-1:2181,master-1-2:2181,master-1-3:2181 are the addresses of ZooKeeper.
Hue cannot connect to HiveServer2 of an EMR Kerberos cluster based on this load balancing method. To connect Hue to HiveServer2 of an EMR Kerberos cluster, use SLB to implement load balancing.
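Because the connection fails without a valid ticket, a script can check the credential cache before it calls Beeline. The following is a minimal sketch that assumes MIT Kerberos client tools, whose klist -s option prints nothing and reports the result through its exit status. The KLIST_BIN variable and the have_valid_tgt function are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# KLIST_BIN is a hypothetical override so that the klist binary can be swapped out.
KLIST_BIN="${KLIST_BIN:-klist}"

# Return 0 only if the klist tool exists and the credential cache holds
# a readable, unexpired ticket.
have_valid_tgt() {
  command -v "${KLIST_BIN}" >/dev/null 2>&1 && "${KLIST_BIN}" -s
}

if have_valid_tgt; then
  echo "Valid Kerberos ticket found."
  # Connect through ZooKeeper service discovery, as in the command above:
  # beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
else
  echo "No valid Kerberos ticket. Run kinit first." >&2
fi
```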
Use SLB to balance the load of HiveServer2
You can create and configure an SLB instance by following the instructions that are described in the Common EMR clusters section in this topic. To access HiveServer2 of Kerberos clusters, you must configure the Kerberos principal that is required by HiveServer2.
In this example, the IP address of the SLB instance is 121.40.**.**, and the frontend listening port is 10000. You can change the IP address and frontend listening port based on your business requirements.
Run the kinit command to obtain the TGT. For more information, see Use Kerberos authentication to access a Hive client.
Create a Hive principal that corresponds to the IP address 121.40.**.** and export the principal to a newly generated keytab file.
Log on to the master-1-1 node. For more information, see Log on to a cluster.
Run the following command to start the Kerberos administration tool:
kadmin.local
Run the following commands to create a principal and export the principal to the /tmp/slb.keytab file:
addprinc -randkey hive/121.40.**.**
xst -k /tmp/slb.keytab hive/121.40.**.**
exit
If an EMR Kerberos cluster uses open source MIT Kerberos for authentication, use the kadmin.local or kadmin CLI to connect to Key Distribution Center (KDC) and run the addprinc and xst commands.
Important
If you export the principal multiple times, the previously generated keytab files become invalid. Make sure that the slb.keytab file contains only the latest principal. If you want to export the principal again, you must delete the previously generated keytab file.
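The interactive kadmin.local session above can also be scripted with the -q option of kadmin.local, which runs a single query and exits. The following is a sketch, not a definitive procedure: the run and create_slb_principal helpers and the DRY_RUN switch are hypothetical, and 192.0.2.10 is a placeholder for the SLB address. By default the script only prints the commands. Set DRY_RUN=0 on the KDC node to execute them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the hive/<slb_addr> principal and export it to a fresh keytab file.
create_slb_principal() {
  local slb_addr="$1"
  local keytab="$2"
  # Delete any keytab from an earlier export so that the file
  # contains only the latest keys.
  run rm -f "${keytab}"
  run kadmin.local -q "addprinc -randkey hive/${slb_addr}"
  run kadmin.local -q "xst -k ${keytab} hive/${slb_addr}"
}

# Placeholder address (assumption). Replace it with your SLB IP address.
SLB_ADDR="${SLB_ADDR:-192.0.2.10}"
create_slb_principal "${SLB_ADDR}" /tmp/slb.keytab
```

Removing the old keytab first follows the Important note above: a repeated export invalidates earlier keys, so the file should hold only the newest ones.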
Transfer the slb.keytab file to all nodes on which HiveServer2 is deployed. Then, run the following commands on each node to import the content of the slb.keytab file to the /etc/taihao-apps/hive-conf/keytab/hive.keytab file:
ktutil
rkt /tmp/slb.keytab
wkt /etc/taihao-apps/hive-conf/keytab/hive.keytab
quit
Run the following command to view the content of the hive.keytab file:
klist -kt /etc/taihao-apps/hive-conf/keytab/hive.keytab
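The transfer and import steps above can be sketched as a loop over the HiveServer2 nodes. This is an illustration under several assumptions: passwordless SSH to each node, HiveServer2 running on the three master nodes named in this topic, and a ktutil that reads its commands from standard input, as MIT Kerberos ktutil does. The helper names run, ktutil_script, and distribute_keytab are hypothetical. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

SRC_KEYTAB=/tmp/slb.keytab
HIVE_KEYTAB=/etc/taihao-apps/hive-conf/keytab/hive.keytab

# Emit the ktutil commands that read one keytab and write its entries to another.
ktutil_script() {
  printf 'rkt %s\nwkt %s\nquit\n' "$1" "$2"
}

# Copy the keytab to each host, then import it into the Hive keytab there.
distribute_keytab() {
  local host
  # The remote shell runs printf, which expands \n and pipes the commands to ktutil.
  local remote_cmd="printf 'rkt ${SRC_KEYTAB}\\nwkt ${HIVE_KEYTAB}\\nquit\\n' | ktutil"
  for host in "$@"; do
    run scp "${SRC_KEYTAB}" "${host}:${SRC_KEYTAB}"
    run ssh "${host}" "${remote_cmd}"
  done
}

# Assumed HiveServer2 nodes. Replace them with the nodes in your cluster.
distribute_keytab master-1-1 master-1-2 master-1-3
```

After the import, running the klist -kt command shown above on each node should list the hive principal for the SLB address alongside the existing entries.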
Modify the configurations of Hive.
On the Configure tab of the Hive service page in the EMR console, search for the hive.server2.authentication.kerberos.principal parameter and set the value to the principal that is created in the previous step. In this example, the value is hive/121.40.**.**@EMR.**.COM. Save the configurations and make the configurations take effect.
Restart HiveServer.
On the Status tab of the Hive service page in the EMR console, find HiveServer and click Restart in the Actions column.
In the dialog box that appears, configure the Execution Reason parameter and click OK.
In the Confirm message, click OK.
Run the following command on the master-1-1 node to access HiveServer2:
beeline -u 'jdbc:hive2://121.40.**.**:10000/default;principal=hive/121.40.**.**@EMR.**.COM'
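The principal in the JDBC URL must match the value of hive.server2.authentication.kerberos.principal, so it can help to derive both from the same inputs. The following is a minimal sketch with hypothetical helper names. The address 192.0.2.10 and the realm EXAMPLE.COM are placeholders, because the real values are masked in this topic.

```shell
#!/usr/bin/env bash
# Compose the Hive service principal for the SLB address and realm.
# Both helper names are hypothetical and used only for illustration.
hive_slb_principal() {
  local slb_addr="$1"
  local realm="$2"
  echo "hive/${slb_addr}@${realm}"
}

# Compose the JDBC URL that carries the principal.
build_kerberos_jdbc_url() {
  local slb_addr="$1"
  local slb_port="$2"
  local database="$3"
  local principal="$4"
  echo "jdbc:hive2://${slb_addr}:${slb_port}/${database};principal=${principal}"
}

# Placeholder values (assumptions). Replace them with your SLB address and realm.
SLB_ADDR="${SLB_ADDR:-192.0.2.10}"
SLB_PORT="${SLB_PORT:-10000}"
REALM="${REALM:-EXAMPLE.COM}"

PRINCIPAL="$(hive_slb_principal "${SLB_ADDR}" "${REALM}")"
KRB_URL="$(build_kerberos_jdbc_url "${SLB_ADDR}" "${SLB_PORT}" default "${PRINCIPAL}")"
echo "${KRB_URL}"

# To connect after you run kinit:
# beeline -u "${KRB_URL}"
```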