
E-MapReduce:Connect to Hive

Last Updated: Feb 25, 2025

This topic describes how to use the Hive client, Beeline, and Java Database Connectivity (JDBC) to connect to Hive in an E-MapReduce (EMR) cluster.

Prerequisites

  • An EMR cluster that contains Hive is created, and the Assign Public Network IP switch is turned on for the master node in the cluster. For information about how to create a cluster, see Create a cluster.

    • Common clusters: Clusters for which Kerberos Authentication and High Service Availability are not turned on.

    • High-security clusters: Clusters for which Kerberos Authentication is turned on.

    • High-availability clusters: Clusters for which High Service Availability is turned on. When you create a high-availability cluster, you must select ZooKeeper.

  • You have logged on to the master node of a cluster. For more information, see Log on to a cluster.

Precautions

  • You can obtain the name and the public IP address of the master node of a cluster on the Nodes tab. For more information, see Log on to a cluster.

    In most cases, the name of the master node is master-1-1. If your cluster is a Hadoop cluster, the name of the master node is emr-header-1.


  • By default, HiveServer2 does not verify the username and password. If you want the username and password to be authenticated, you must enable Lightweight Directory Access Protocol (LDAP) authentication. For more information, see Use LDAP authentication.

Procedure

Common clusters

Use the Hive client to connect to Hive

  1. Run the following command to connect to Hive:

    hive
  2. Optional. Run the quit; or exit; command to exit the Hive client.

Use the Beeline client to connect to Hive

  1. Run the following command to connect to Hive:

    beeline -u jdbc:hive2://master-1-1:10000
  2. Optional. Run the !quit or !exit command to exit the Beeline client.
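The connection string in the preceding command uses the default HiveServer2 port. A minimal shell sketch that builds the JDBC URL from variables, assuming the default hostname master-1-1 and port 10000 described in this topic; the -e option of Beeline runs a single statement and exits:

```shell
# Hostname and port of HiveServer2; master-1-1 and 10000 are the defaults
# described above. Replace them with your cluster's values.
HIVE_HOST="master-1-1"
HIVE_PORT=10000
JDBC_URL="jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}"
echo "${JDBC_URL}"
# Connect interactively, or run a single statement and exit (requires a running cluster):
# beeline -u "${JDBC_URL}" -e "SHOW DATABASES;"
```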

Use JDBC to connect to Hive

Important

Before you perform the following steps, make sure that you have set up a Java environment, installed a Java development tool, and configured the required environment variables.

  1. Configure the hadoop-common and hive-jdbc project dependencies in the pom.xml file. Example:

    <dependencies>
            <dependency>
                <groupId>org.apache.hive</groupId>
                <artifactId>hive-jdbc</artifactId>
                <version>3.1.3</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>3.2.1</version>
            </dependency>
    </dependencies>

    Make sure that the version information of hadoop-common and hive-jdbc is consistent with the version information of Hadoop-Common and Hive in the EMR cluster. You can view the version information of Hadoop-Common and Hive in the Software Information section of the Basic Information tab of the cluster in the EMR console.

  2. Write code to connect to HiveServer2 and perform operations on data of a Hive table. Sample code:

    import java.sql.*;
    
    public class App
    {
        private static final String driverName = "org.apache.hive.jdbc.HiveDriver";
    
        public static void main(String[] args) throws SQLException {
            // Load the Hive JDBC driver.
            try {
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                return;
            }
    
            // Connect to HiveServer2. Replace the placeholder with the public IP
            // address of the master node. Use try-with-resources so that the
            // connection, statement, and result set are closed automatically.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://<Public IP address of the master node>:10000", "root", "");
                 Statement stmt = con.createStatement();
                 ResultSet res = stmt.executeQuery("select * from sample_tbl limit 10")) {
                // Print the first two columns of each row.
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }
        }
    }
    Note

    You must enable port 10000. For more information, see Manage security groups.

  3. Package the project to generate a JAR file and upload the JAR file to the host on which you want to run it.

    Important

    The hadoop-common and hive-jdbc dependencies are required to run the JAR file. If the two dependencies are not configured in the environment variables on the host, you must download and configure the dependencies on the host. Alternatively, you can package the two dependencies and the project to the same JAR file. If one of the dependencies is missing when you run the JAR file, an error message appears.

    • If hadoop-common is missing, the error message "java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration" appears.

    • If hive-jdbc is missing, the error message "java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver" appears.

    In this example, the JAR file emr-hiveserver2-1.0.jar is generated. You must upload the file to the master-1-1 node of the EMR cluster.
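    The note above suggests packaging the two dependencies and the project into the same JAR file as an alternative to configuring them on the host. A minimal sketch of a Maven Shade plugin configuration that can be added to the pom.xml file; the plugin version and the App main class are examples based on the sample code in this topic:

    ```xml
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.4.1</version>
                <executions>
                    <execution>
                        <!-- Build the shaded (dependency-bundled) JAR during mvn package. -->
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <!-- Set the entry point so that java -jar works without a classpath. -->
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>App</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    ```

    With this configuration, running mvn package produces a JAR file that already contains the hadoop-common and hive-jdbc classes.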

  4. Check whether the JAR file can run properly.

    Important

    We recommend that you run the JAR file on a host that is in the same virtual private cloud (VPC) and security group as the EMR cluster. Make sure that the host and the EMR cluster can communicate with each other. If the host and the EMR cluster are in different VPCs or have different network types, they can communicate only over the Internet. In this case, you must connect them by using an Alibaba Cloud network service so that they can communicate over an internal network. Use the following methods to test the connectivity:

    • Internet: telnet <public IP address of the master-1-1 node> 10000

    • Internal network: telnet <internal IP address of the master-1-1 node> 10000

    java -jar emr-hiveserver2-1.0.jar

High-security clusters

Use the Hive client to connect to Hive

  1. Open the admin.local CLI of Kerberos.

    • If you log on to the master-1-1 node, on which the Kerberos Key Distribution Center (KDC) server resides, as the root user, run the following command to open the admin.local CLI:

      kadmin.local

      If the output contains the following information, the admin.local CLI is opened:

      Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password.
      kadmin.local:
    • If you log on to a node other than master-1-1 or use a gateway, run the following command to open the admin.local CLI. Replace <admin-user> and <admin-password> with the actual username and password.

      kadmin -p <admin-user> -w <admin-password>
      Note

      If you use a self-managed KDC server in EMR, take note of the following items:

      • <admin-user>: Set the value to root/admin.

      • <admin-password>: Enter the value of the admin_pwd parameter. You can obtain the value on the Configure tab of the Kerberos service in the EMR console.

      If the output contains the following information, the admin.local CLI is opened:

      Authenticating as principal root/admin with password.
      kadmin:
  2. Create a principal named test.

    In this example, the password is 123456.

    addprinc -pw 123456 test

    If the output contains the following information, the principal is created:

    Principal "test@EMR.C-85D4B8D74296****.COM" created.
    Note

    You must record the username and password, which are required when you create a ticket-granting ticket (TGT).

    To exit the admin.local CLI, run the quit command.

  3. Log on to the node on which you want to run a Hive client. Run the following commands to create and switch to a user named test:

    useradd test
    su test
  4. Run the following command to create a TGT:

    kinit

    Press Enter and enter the password of the test user. In this example, the password is 123456.

  5. Run the following command to connect to Hive:

    hive

Use the Beeline client to connect to Hive

  1. Open the admin.local CLI of Kerberos.

    • If you log on to the master-1-1 node, on which the Kerberos Key Distribution Center (KDC) server resides, as the root user, run the following command to open the admin.local CLI:

      kadmin.local

      If the output contains the following information, the admin.local CLI is opened:

      Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password.
      kadmin.local:
    • If you log on to a node other than master-1-1 or use a gateway, run the following command to open the admin.local CLI. Replace <admin-user> and <admin-password> with the actual username and password.

      kadmin -p <admin-user> -w <admin-password>
      Note

      If you use a self-managed KDC server in EMR, take note of the following items:

      • <admin-user>: Set the value to root/admin.

      • <admin-password>: Enter the value of the admin_pwd parameter. You can obtain the value on the Configure tab of the Kerberos service in the EMR console.

      If the output contains the following information, the admin.local CLI is opened:

      Authenticating as principal root/admin with password.
      kadmin:
  2. Create a principal named test.

    In this example, the password is 123456.

    addprinc -pw 123456 test

    If the output contains the following information, the principal is created:

    Principal "test@EMR.C-85D4B8D74296****.COM" created.
    Note

    You must record the username and password, which are required when you create a ticket-granting ticket (TGT).

    To exit the admin.local CLI, run the quit command.

  3. Log on to the node on which you want to run a Hive client. Run the following commands to create and switch to a user named test:

    useradd test
    su test
  4. Run the following command to create a TGT:

    kinit

    Press Enter and enter the password of the test user. In this example, the password is 123456.

  5. Run the following command to connect to Hive:

    beeline -u "jdbc:hive2://master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com:10000/;principal=hive/_HOST@EMR.c-56187feb57f0****.COM"

    Replace the following information based on your business requirements:

    • master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com: The fully qualified hostname, including the domain name. You can obtain the hostname by running the hostname -f command on the node where the HiveServer2 service is deployed. This service is deployed on the master-1-1 node in most cases.

    • EMR.c-56187feb57f0****.COM: The realm name. You can search for the realm parameter on the Configure tab of the Kerberos service page in the EMR console. The value of this parameter is the realm name.

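The JDBC URL in the preceding command can be assembled from the fully qualified hostname and the realm name. A small sketch that uses the placeholder values from this example; replace them with the values obtained as described above:

```shell
# Placeholder values copied from the example above; replace them with your own.
HOST_FQDN="master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com"  # from `hostname -f` on the HiveServer2 node
REALM="EMR.c-56187feb57f0****.COM"                                      # from the realm parameter of the Kerberos service
JDBC_URL="jdbc:hive2://${HOST_FQDN}:10000/;principal=hive/_HOST@${REALM}"
echo "${JDBC_URL}"
# beeline -u "${JDBC_URL}"   # requires a valid TGT (kinit) on this node
```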

High-availability clusters

Use the Beeline client to connect to Hive

Run one of the following commands to connect to Hive based on the service discovery mode that is selected for the cluster.

  • ZooKeeper

    beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
  • Multi-servers

    beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'
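The ZooKeeper connection string can be assembled from a list of master nodes. A small shell sketch, assuming the node names and port shown in the commands above:

```shell
# Example master node names and the default ZooKeeper port; replace with your cluster's values.
NODES="master-1-1 master-1-2 master-1-3"
ZK_PORT=2181
# Append the port to each node name and join the results with commas.
QUORUM=$(printf '%s\n' $NODES | sed "s/$/:${ZK_PORT}/" | paste -sd, -)
echo "jdbc:hive2://${QUORUM}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```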

FAQ

What do I do if I cannot connect to Hive over a specific port?

Troubleshoot the issue from the following aspects:

  • Check whether HiveServer2 is running.

    Run the following command on a node on which HiveServer2 is deployed to check whether a service process listens on the specific port. By default, port 10000 is used.

    netstat -tulnp | grep 10000

    Check whether a service process of HiveServer2 exists in the result. If no related service process exists, check the startup log of HiveServer2 and check whether HiveServer2 is started as expected.

  • Check whether port 10000 is enabled for the security group.

    In the Security section of the Basic Information tab in the EMR console, click the link to the right of Cluster Security Group. On the Security Group Details tab, check whether port 10000 is enabled. If the port is not enabled, enable port 10000 for the security group. For more information, see Manage security groups.
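If telnet is not installed on the host, the same port check can be scripted with the /dev/tcp feature of bash. A sketch with an example hostname and the default port:

```shell
# Example hostname and default HiveServer2 port; replace with your cluster's values.
HOST="master-1-1"
PORT=10000
# Try to open a TCP connection within 3 seconds; /dev/tcp is a bash feature.
if timeout 3 bash -c "exec 3<>/dev/tcp/${HOST}/${PORT}" 2>/dev/null; then
    echo "port ${PORT} on ${HOST} is reachable"
else
    echo "port ${PORT} on ${HOST} is not reachable"
fi
```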

How do I connect to Hive if both High Service Availability and Kerberos Authentication are turned on for a cluster?

Create a Kerberos principal and obtain a Ticket Granting Ticket (TGT). Then, use JDBC to connect to Hive. For more information, see Balance the load of HiveServer2.