Connect to Hive in an E-MapReduce (EMR) cluster using the Hive client, Beeline, or Java Database Connectivity (JDBC).
Prerequisites
Before you begin, make sure that you have:
-
An EMR cluster with Hive installed, and the Assign Public Network IP switch turned on for the master node. To create a cluster, see Create a cluster.
-
Logged in to the master node of the cluster. See Log on to a cluster.
Cluster types referenced in this topic:
| Cluster type | Description |
|---|---|
| Common cluster | Kerberos Authentication and High Service Availability are both turned off |
| High-security cluster | Kerberos Authentication is turned on |
| High-availability cluster | High Service Availability is turned on; ZooKeeper must be selected during cluster creation |
Usage notes
-
Find the master node name and public IP address on the Nodes tab. See Log on to a cluster. The default master node name is
master-1-1. For Hadoop clusters, it isemr-header-1.
-
HiveServer2 does not verify the username and password by default. To enable authentication, turn on Lightweight Directory Access Protocol (LDAP) authentication. See Use LDAP authentication.
Connect to Hive on a common cluster
Use the Hive client
Run the following command:
hive
To exit, run quit; or exit;.
Use Beeline
Run the following command:
beeline -u jdbc:hive2://master-1-1:10000
To exit, run !quit or !exit.
Use JDBC
Make sure you have set up a Java environment, installed a Java development tool, and configured environment variables before proceeding.
-
Add
hadoop-commonandhive-jdbcas dependencies in yourpom.xmlfile:<dependencies> <dependency> <groupId>org.apache.hive</groupId> <artifactId>hive-jdbc</artifactId> <version>3.1.3</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>3.2.1</version> </dependency> </dependencies>Make sure the versions match the Hadoop-Common and Hive versions in your EMR cluster. Check these in the Software Information section of the Basic Information tab in the EMR console.
-
Write code to connect to HiveServer2 and query data:
import java.sql.*; public class App { private static String driverName = "org.apache.hive.jdbc.HiveDriver"; public static void main(String[] args) throws SQLException { try { Class.forName(driverName); } catch (ClassNotFoundException e) { e.printStackTrace(); } // Replace <Public IP address of the master node> with the actual public IP address Connection con = DriverManager.getConnection( "jdbc:hive2://<Public IP address of the master node>:10000", "root", ""); Statement stmt = con.createStatement(); String sql = "select * from sample_tbl limit 10"; ResultSet res = stmt.executeQuery(sql); while (res.next()) { System.out.println(res.getString(1) + "\t" + res.getString(2)); } } }NoteThis method requires port 10000 to be open. For more information, see Manage security groups.
-
Package the project as a JAR file and upload it to the host where you want to run it.
ImportantThe JAR file requires hadoop-common and hive-jdbc to run. If your runtime environment does not include these dependency packages in its environment variables, you must download and configure them or package them together in the JAR file. If these dependencies are missing when you run the JAR file, the following errors are reported:
-
Missing hadoop-common:
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration -
Missing hive-jdbc:
java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
In this example, the generated JAR file is
emr-hiveserver2-1.0.jar. Upload this JAR file to the master node of the E-MapReduce cluster. -
-
Run the JAR file:
ImportantThe server where you run the JAR file and the E-MapReduce cluster must be in the same VPC and security group and have network connectivity. If they are in different VPCs or network environments, you must use a public IP address for access. Alternatively, you can use a network product to establish a connection between them and then use the internal network for access. To test network connectivity:
-
Public network:
telnet <Public IP address of master-1-1> 10000 -
Internal network:
telnet <Internal IP address of master-1-1> 10000
java -jar emr-hiveserver2-1.0.jar -
Connect to Hive on a high-security cluster
High-security clusters require Kerberos authentication. Before connecting with the Hive client or Beeline, set up a Kerberos principal and obtain a Ticket Granting Ticket (TGT).
Set up Kerberos authentication
-
Open the Kerberos admin CLI.
-
On
master-1-1(the Key Distribution Center (KDC) node), run as root:kadmin.localThe CLI is ready when the output includes:
Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password. kadmin.local: -
On any other node or gateway, run:
kadmin -p <admin-user> -w <admin-password>Replace the placeholders:
Placeholder Value <admin-user>root/admin(for a self-managed KDC in EMR)<admin-password>The value of the admin_pwdparameter, found on the Configure tab of the Kerberos service in the EMR consoleThe CLI is ready when the output includes:
Authenticating as principal root/admin with password. kadmin:
-
-
Create a Kerberos principal named
test:NoteSave the username and password — you need them to create a TGT.
addprinc -pw 123456 testThe principal is created when the output includes:
Principal "test@EMR.C-85D4B8D74296****.COM" created.To exit the admin CLI, run
quit. -
Create a system user and switch to it:
useradd test su test -
Create a TGT:
kinitWhen prompted, enter the password for the
testuser (in this example,123456).
Use the Hive client
After completing the Kerberos setup above, run:
hive
Use Beeline
After completing the Kerberos setup above, run:
beeline -u "jdbc:hive2://master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com:10000/;principal=hive/_HOST@EMR.c-56187feb57f0****.COM"
Replace the following values based on your cluster:
| Placeholder | How to get it |
|---|---|
master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com |
The fully qualified hostname of the HiveServer2 node (usually master-1-1). Run hostname -f on that node to get this value. |
EMR.c-56187feb57f0****.COM |
The realm name. Search for the realm parameter on the Configure tab of the Kerberos service in the EMR console. |
Connect to Hive on a high-availability cluster
High-availability clusters expose multiple HiveServer2 instances. Use Beeline with a service discovery mode that matches your cluster setup.
Use Beeline with ZooKeeper service discovery
beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Use Beeline with multi-server service discovery
beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'
Troubleshooting
Cannot connect to Hive on port 10000
Symptom: Connections to HiveServer2 on port 10000 time out or are refused.
Cause: Either HiveServer2 is not running, or the security group does not allow traffic on port 10000.
Resolution:
-
Verify that HiveServer2 is running. On the node where HiveServer2 is deployed, run:
netstat -tulnp | grep 10000If no process appears, HiveServer2 is not running. Check the HiveServer2 startup log to diagnose the issue.
-
Verify that port 10000 is open in the security group. In the EMR console, go to the Basic Information tab and click the link next to Cluster Security Group. On the Security Group Details tab, confirm that port 10000 is open. If not, enable it. See Manage security groups.
Both high availability and Kerberos authentication are turned on
Symptom: The cluster has both High Service Availability and Kerberos Authentication turned on, and you need to connect to Hive.
Cause: This combination requires Kerberos credentials and a load-balanced HiveServer2 endpoint, which standard Beeline commands do not handle automatically.
Resolution: Create a Kerberos principal and obtain a TGT, then use JDBC to connect. See Balance the load of HiveServer2.