Connect to Hive

This topic describes how to connect to Hive from an E-MapReduce (EMR) cluster.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Precautions

To connect to Hive from an EMR cluster, you must configure the following parameters:

<Master node name>: You can go to the cluster details page in the EMR console and obtain the name of the master node on the Nodes tab. For more information, see Log on to a cluster.
cluster-xxx@EMR.xxx.COM: You must replace xxx with the hostname of the master node. You can run the hostname command on the master node to obtain the hostname.
By default, HiveServer2 does not verify the username and password. If you want the username and password to be authenticated, you can enable LDAP authentication. For more information, see Use LDAP authentication.

Method 1: Use the Hive client to connect to Hive

Common cluster

hive

High-security cluster

Run the following command to perform authentication:

kinit -kt /etc/ecm/hive-conf/hive.keytab hive/<Master node name>.cluster-xxx@EMR.xxx.COM

You can also use the user management feature to add a user. Before you connect to Hive, run the kinit Username command and enter the password of the user to perform authentication. For more information about how to add a user, see Manage users.

Run the following command to perform authentication:
```
kinit Username
```
Enter the password of the user.

Connect to Hive.

hive

Method 2: Use Beeline to connect to HiveServer2

Common cluster

beeline -u jdbc:hive2://<Master node name>:10000

Run one of the following commands based on your cluster type:

DataLake cluster

beeline -u jdbc:hive2://master-1-1:10000

Hadoop cluster

beeline -u jdbc:hive2://emr-header-1:10000

High-availability cluster

Run one of the following commands based on your cluster type:

DataLake cluster

Set serviceDiscoveryMode to zooKeeper.

beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

Set serviceDiscoveryMode to multiServers.

beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'

Hadoop cluster

Set serviceDiscoveryMode to zooKeeper.

beeline -u 'jdbc:hive2://emr-header-1:2181,emr-header-2:2181,emr-header-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

Set serviceDiscoveryMode to multiServers.

beeline -u 'jdbc:hive2://emr-header-1:10000,emr-header-2:10000,emr-header-3:10000/default;serviceDiscoveryMode=multiServers'
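When the same high-availability connection strings are needed in a Java program, they can be assembled from the list of master node names. The following is a minimal sketch, not an EMR or Hive API: the class name HiveHaUrls and its methods are hypothetical, and the ports, namespace, and database are copied from the Beeline commands above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles the HA JDBC URLs used in the Beeline commands above. */
final class HiveHaUrls {

    /** zooKeeper mode: each host is listed with ZooKeeper port 2181. */
    static String zooKeeperUrl(List<String> hosts) {
        return "jdbc:hive2://" + join(hosts, 2181)
                + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2";
    }

    /** multiServers mode: each host is listed with HiveServer2 port 10000. */
    static String multiServersUrl(List<String> hosts) {
        return "jdbc:hive2://" + join(hosts, 10000)
                + "/default;serviceDiscoveryMode=multiServers";
    }

    /** Joins hosts as host:port,host:port,... */
    private static String join(List<String> hosts, int port) {
        return hosts.stream()
                .map(host -> host + ":" + port)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Node names of a DataLake cluster, as used in this topic.
        List<String> masters = Arrays.asList("master-1-1", "master-1-2", "master-1-3");
        System.out.println(zooKeeperUrl(masters));
        System.out.println(multiServersUrl(masters));
    }
}
```

For a Hadoop cluster, pass emr-header-1, emr-header-2, and emr-header-3 instead. The resulting string can be passed to DriverManager.getConnection in the same way as the single-node URL in Method 3.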

High-security cluster

Run the following command to perform authentication:
```
kinit -kt /etc/ecm/hive-conf/hive.keytab hive/<Master node name>.cluster-xxx@EMR.xxx.COM
```
You can also use the user management feature to add a user. Before you use Beeline to connect to HiveServer2, run the kinit Username command and enter the password of the user to perform authentication. For more information about how to add a user, see Manage users.
1. Run the following command to perform authentication:
```
kinit Username
```
2. Enter the password of the user.

Connect to HiveServer2.

beeline -u "jdbc:hive2://<Master node name>:10000/;principal=hive/<Master node name>.cluster-xxx@EMR.xxx.COM"

Note

The JDBC URL must be enclosed in a pair of double quotation marks (").
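The quotation marks are needed on the command line because the URL contains a semicolon, which the shell would otherwise treat as a command separator. In Java the URL is an ordinary string, so no extra quoting is required. The sketch below only shows how the URL for a high-security cluster could be put together. The class HiveKerberosUrl is hypothetical, the xxx parts are the same placeholders as in the command above, and a valid Kerberos ticket (for example, one obtained by using kinit) is assumed to exist before the URL is used.

```java
/** Hypothetical helper that builds the JDBC URL for a high-security (Kerberos) cluster. */
final class HiveKerberosUrl {

    /**
     * @param masterNode  name of the master node, for example master-1-1
     * @param clusterName the cluster-xxx part of the principal
     * @param realm       the EMR.xxx.COM part of the principal
     */
    static String build(String masterNode, String clusterName, String realm) {
        // Principal format taken from the kinit command: hive/<Master node name>.cluster-xxx@EMR.xxx.COM
        String principal = "hive/" + masterNode + "." + clusterName + "@" + realm;
        return "jdbc:hive2://" + masterNode + ":10000/;principal=" + principal;
    }

    public static void main(String[] args) {
        // The xxx placeholders are intentionally left unresolved; replace them for your cluster.
        System.out.println(build("master-1-1", "cluster-xxx", "EMR.xxx.COM"));
    }
}
```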

Method 3: Use Java to connect to HiveServer2

Important

Before you perform the following steps, make sure that you have set up a Java environment, installed a Java programming tool, and configured environment variables.

Configure the project dependencies hadoop-common and hive-jdbc in the pom.xml file. Example:

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.3.9</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.5</version>
    </dependency>
</dependencies>

Write code to connect to HiveServer2 and query data in a Hive table. Sample code:

import java.sql.*;

public class App
{
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {

        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        // 1. After the code is packaged into a JAR file, you must map master-1-1 to
        //    the public or internal IP address of the EMR cluster in the hosts file of the host that runs the JAR file.
        // 2. For more JDBC connection strings, see "Method 2: Use Beeline to connect to HiveServer2".
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://master-1-1:10000", "root", "");
             Statement stmt = con.createStatement()) {
            // try-with-resources closes the statement and the connection, even if the query fails.

            String sql = "select * from sample_tbl limit 10";
            // Closing the statement also closes its result set.
            ResultSet res = stmt.executeQuery(sql);

            // Print the first two columns of each row, separated by a tab.
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
        }

    }
}
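The sample code passes root and an empty password, which works because HiveServer2 does not verify the username and password by default. If LDAP authentication is enabled, real credentials must be supplied. The following sketch reads them from environment variables instead of hard-coding them. The class HiveCredentials and the variable names HIVE_USER and HIVE_PASSWORD are assumptions made for illustration. The "user" and "password" property keys are the standard JDBC ones.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that builds JDBC connection properties from environment variables. */
final class HiveCredentials {

    /** Falls back to the defaults of the sample code (root, empty password) if a variable is unset. */
    static Properties fromEnv(Map<String, String> env) {
        Properties props = new Properties();
        props.setProperty("user", env.getOrDefault("HIVE_USER", "root"));
        props.setProperty("password", env.getOrDefault("HIVE_PASSWORD", ""));
        return props;
    }

    public static void main(String[] args) throws SQLException {
        Properties props = fromEnv(System.getenv());
        // Requires hive-jdbc on the classpath and a reachable HiveServer2.
        try (Connection con = DriverManager.getConnection("jdbc:hive2://master-1-1:10000", props)) {
            System.out.println("Connected as " + props.getProperty("user"));
        }
    }
}
```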

Package the project into a JAR file and upload the JAR file to the host on which you want to run it.
Important
The hadoop-common and hive-jdbc dependencies are required to run the JAR file. If the two dependencies are not configured in the environment variables on the host, download them and configure the environment variables on the host, or package both dependencies together with the project into the same JAR file. If either dependency is missing when you run the JAR file, an error message appears:
- If hadoop-common is missing, the error message java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration appears.
- If hive-jdbc is missing, the error message java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver appears.
In this example, the JAR file emr-hiveserver2-1.0.jar is generated and uploaded to the master-1-1 node of the EMR cluster.
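Both errors come from a class that cannot be found at run time, so the classpath can be checked before a connection is attempted. The following is a minimal sketch under that assumption. The class DependencyCheck is hypothetical, and the two class names are the ones that appear in the error messages above.

```java
/** Hypothetical preflight check for the classes named in the two error messages. */
final class DependencyCheck {

    /** Returns true if the class can be located, without running its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration", // provided by hadoop-common
            "org.apache.hive.jdbc.HiveDriver"       // provided by hive-jdbc
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run the check with the same classpath that is used for the JAR file. A MISSING line indicates which dependency still needs to be added.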
Check whether the JAR file can run properly.
Important
We recommend that you run the JAR file on a host that is in the same virtual private cloud (VPC) and security group as the EMR cluster. Make sure that the host and the EMR cluster can communicate with each other. If the host and the EMR cluster are in different VPCs or use different network types, they can communicate over the Internet, or over an internal network after you connect them by using an Alibaba Cloud network service. Use one of the following commands to test the connectivity:
- Internet: telnet <Public IP address of the master-1-1 node> 10000
- Internal network: telnet <Internal IP address of the master-1-1 node> 10000
```
java -jar emr-hiveserver2-1.0.jar
```
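If telnet is not installed on the host, the same connectivity test can be approximated in Java by opening a TCP connection to port 10000. This is only a sketch: the class PortCheck is hypothetical, and a successful connection shows that the port is reachable, not that HiveServer2 accepts the login.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical stand-in for the telnet connectivity test. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the public or internal IP address of the master-1-1 node as the first argument.
        String host = args.length > 0 ? args[0] : "master-1-1";
        boolean ok = canConnect(host, 10000, 3000);
        System.out.println(host + ":10000 is " + (ok ? "reachable" : "NOT reachable"));
    }
}
```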