This topic describes how to use the Hive client, Beeline, and Java Database Connectivity (JDBC) to connect to Hive in an E-MapReduce (EMR) cluster.
Prerequisites
An EMR cluster that contains Hive is created, and the Assign Public Network IP switch is turned on for the master node in the cluster. For information about how to create a cluster, see Create a cluster.
This topic involves the following types of clusters:
Common clusters: Clusters for which Kerberos Authentication and High Service Availability are not turned on.
High-security clusters: Clusters for which Kerberos Authentication is turned on.
High-availability clusters: Clusters for which High Service Availability is turned on. When you create a high-availability cluster, you must select ZooKeeper.
You have logged on to the master node of a cluster. For more information, see Log on to a cluster.
Precautions
You can obtain the name and the public IP address of the master node of a cluster on the Nodes tab. For more information, see Log on to a cluster.
In most cases, the name of the master node is master-1-1. If your cluster is a Hadoop cluster, the name of the master node is emr-header-1.

By default, HiveServer2 does not verify the username and password. If you want the username and password to be authenticated, you must enable Lightweight Directory Access Protocol (LDAP) authentication. For more information, see Use LDAP authentication.
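For reference, when LDAP authentication is enabled, the client supplies the LDAP username and password when it connects, for example as the second and third arguments of DriverManager.getConnection in JDBC. The following minimal Java sketch is an illustration only; the user ldap_user and its password are placeholders, and you must replace the public IP address with that of your master node.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LdapAuthExample {
    public static void main(String[] args) throws Exception {
        // When LDAP authentication is enabled on HiveServer2, the LDAP username
        // and password are passed as the user and password connection arguments.
        // "ldap_user" and "ldap_password" are placeholders for illustration only.
        String url = "jdbc:hive2://<Public IP address of the master node>:10000";
        try (Connection con = DriverManager.getConnection(url, "ldap_user", "ldap_password");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("show databases")) {
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}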
Procedure
Common clusters
Use the Hive client to connect to Hive
Run the following command to connect to Hive:
hive
Optional. Run the quit; or exit; command to exit the Hive client.
Use the Beeline client to connect to Hive
Run the following command to connect to Hive:
beeline -u jdbc:hive2://master-1-1:10000
Optional. Run the !quit or !exit command to exit the Beeline client.
Use JDBC to connect to Hive
Before you perform the following steps, make sure that you have set up a Java environment, installed a Java programming tool, and configured environment variables.
Configure the hadoop-common and hive-jdbc project dependencies in the pom.xml file. Example:
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.2.1</version>
    </dependency>
</dependencies>
Make sure that the versions of hadoop-common and hive-jdbc match the versions of Hadoop-Common and Hive in the EMR cluster. You can view these versions in the Software Information section of the Basic Information tab of the cluster in the EMR console.
Write code to connect to HiveServer2 and perform operations on data of a Hive table. Sample code:
import java.sql.*;

public class App {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver.
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        // Connect to HiveServer2 on port 10000 of the master node.
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://<Public IP address of the master node>:10000", "root", "");
        Statement stmt = con.createStatement();
        // Query the first 10 rows of the sample_tbl table.
        String sql = "select * from sample_tbl limit 10";
        ResultSet res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
        // Release the JDBC resources.
        res.close();
        stmt.close();
        con.close();
    }
}
Note: You must enable port 10000. For more information, see Manage security groups.
Package the project into a JAR file and upload the JAR file to the host on which you want to run it.
Important: The hadoop-common and hive-jdbc dependencies are required to run the JAR file. If the two dependencies are not configured in the environment variables on the host, you must download and configure them on the host. Alternatively, you can package the two dependencies and the project into the same JAR file. If one of the dependencies is missing when you run the JAR file, an error message appears:
If hadoop-common is missing, the error message "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration" appears.
If hive-jdbc is missing, the error message "java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver" appears.
In this example, the JAR file emr-hiveserver2-1.0.jar is generated. You must upload the file to the master-1-1 node of the EMR cluster.
Check whether the JAR file can run properly.
Important: We recommend that you run the JAR file on a host that is in the same virtual private cloud (VPC) and security group as the EMR cluster, and make sure that the host and the EMR cluster can communicate with each other. If the host and the EMR cluster are in different VPCs or use different network types, they can communicate only over the Internet unless you connect the networks by using an Alibaba Cloud network service, which allows them to communicate over an internal network. Use the following methods to test the connectivity:
Internet:
telnet <Public IP address of the master-1-1 node> 10000
Internal network:
telnet <Internal IP address of the master-1-1 node> 10000
java -jar emr-hiveserver2-1.0.jar
High-security clusters
Use the Hive client to connect to Hive
Open the Kerberos kadmin CLI.
If you log on as the root user to the master-1-1 node, on which the Key Distribution Center (KDC) server of Kerberos resides, run the following command to open the kadmin CLI:
kadmin.local
If the output contains the following information, the kadmin CLI is opened:
Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password.
kadmin.local:
If you log on to a node other than master-1-1 or use a gateway, run the following command to open the kadmin CLI. Replace <admin-user> and <admin-password> with the actual username and password.
kadmin -p <admin-user> -w <admin-password>
Note: If you use a self-managed KDC server in EMR, take note of the following items:
<admin-user>: Set the value to root/admin.
<admin-password>: Enter the value of the admin_pwd parameter. You can obtain the value on the Configure tab of the Kerberos service in the EMR console.
If the output contains the following information, the kadmin CLI is opened:
Authenticating as principal root/admin with password.
kadmin:
Create a principal named test.
In this example, the password is 123456.
addprinc -pw 123456 test
If the output contains the following information, the principal is created:
Principal "test@EMR.C-85D4B8D74296****.COM" created.
Note: You must record the username and password, which are required when you create a ticket-granting ticket (TGT).
To exit the kadmin CLI, run the quit command.
Log on to the node on which you want to run the Hive client. Run the following commands to create and switch to a user named test:
useradd test
su test
Run the following command to create a TGT:
kinit
Press Enter and enter the password of the test user. In this example, the password is 123456.
Run the hive command to connect to the Hive client:
hive
Use the Beeline client to connect to Hive
Open the Kerberos kadmin CLI.
If you log on as the root user to the master-1-1 node, on which the Key Distribution Center (KDC) server of Kerberos resides, run the following command to open the kadmin CLI:
kadmin.local
If the output contains the following information, the kadmin CLI is opened:
Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password.
kadmin.local:
If you log on to a node other than master-1-1 or use a gateway, run the following command to open the kadmin CLI. Replace <admin-user> and <admin-password> with the actual username and password.
kadmin -p <admin-user> -w <admin-password>
Note: If you use a self-managed KDC server in EMR, take note of the following items:
<admin-user>: Set the value to root/admin.
<admin-password>: Enter the value of the admin_pwd parameter. You can obtain the value on the Configure tab of the Kerberos service in the EMR console.
If the output contains the following information, the kadmin CLI is opened:
Authenticating as principal root/admin with password.
kadmin:
Create a principal named test.
In this example, the password is 123456.
addprinc -pw 123456 test
If the output contains the following information, the principal is created:
Principal "test@EMR.C-85D4B8D74296****.COM" created.
Note: You must record the username and password, which are required when you create a ticket-granting ticket (TGT).
To exit the kadmin CLI, run the quit command.
Log on to the node on which you want to run the Beeline client. Run the following commands to create and switch to a user named test:
useradd test
su test
Run the following command to create a TGT:
kinit
Press Enter and enter the password of the test user. In this example, the password is 123456.
Run the following command to connect to Hive by using Beeline:
beeline -u "jdbc:hive2://master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com:10000/;principal=hive/_HOST@EMR.c-56187feb57f0****.COM"
Replace the following information based on your business requirements:
master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com: The fully qualified hostname, including the domain name. You can obtain the hostname by running the hostname -f command on the node where the HiveServer2 service is deployed. In most cases, this service is deployed on the master-1-1 node.
EMR.c-56187feb57f0****.COM: The realm name. You can search for the realm parameter on the Configure tab of the Kerberos service page in the EMR console; the value of this parameter is the realm name.
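If you want to connect to a high-security cluster from Java code instead of Beeline, you can use the same principal-based JDBC URL together with the Hadoop UserGroupInformation class. The following sketch is an illustration only: it assumes that the Kerberos client configuration (/etc/krb5.conf) is available on the host and that a valid TGT has already been created with kinit (or that a keytab file is available); the hostname, realm, principal, and keytab path are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHiveExample {
    public static void main(String[] args) throws Exception {
        // Switch the Hadoop security layer to Kerberos so that the Hive JDBC
        // driver authenticates with the Kerberos credentials of the current user.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Assumes a TGT already exists (created with kinit). Alternatively, log in
        // from a keytab file; the principal and path below are placeholders:
        // UserGroupInformation.loginUserFromKeytab(
        //         "test@<realm>", "/path/to/test.keytab");

        // Use the same fully qualified hostname and principal as in the Beeline command above.
        String url = "jdbc:hive2://<hostname of the HiveServer2 node>:10000/"
                + ";principal=hive/_HOST@<realm>";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("show databases")) {
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}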
High-availability clusters
Use the Beeline client to connect to Hive
Run one of the following commands to connect to Hive based on the service discovery mode that you selected.
ZooKeeper
beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Multi-servers
beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'
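The same service discovery URLs can also be used from Java code with the hive-jdbc dependency from the earlier pom.xml example. The following minimal sketch is an illustration only; it assumes the ZooKeeper mode and the hiveserver2 namespace shown in the Beeline command above, and the show databases query is just a placeholder.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HaHiveExample {
    public static void main(String[] args) throws Exception {
        // ZooKeeper-based service discovery: the driver selects an available
        // HiveServer2 instance that is registered under the hiveserver2 namespace.
        String url = "jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/"
                + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2";
        try (Connection con = DriverManager.getConnection(url, "root", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("show databases")) {
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}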