
E-MapReduce:Hive connection methods

Last Updated: Mar 26, 2026

Connect to Hive in an E-MapReduce (EMR) cluster using the Hive client, Beeline, or Java Database Connectivity (JDBC).

Prerequisites

Before you begin, make sure that you have:

  • An EMR cluster with Hive installed, and the Assign Public Network IP switch turned on for the master node. To create a cluster, see Create a cluster.

  • Logged in to the master node of the cluster. See Log on to a cluster.

Cluster types referenced in this topic:

  • Common cluster: Kerberos Authentication and High Service Availability are both turned off.

  • High-security cluster: Kerberos Authentication is turned on.

  • High-availability cluster: High Service Availability is turned on. ZooKeeper must be selected during cluster creation.

Usage notes

  • Find the master node name and public IP address on the Nodes tab. See Log on to a cluster. The default master node name is master-1-1. For Hadoop clusters, it is emr-header-1.


  • HiveServer2 does not verify the username and password by default. To enable authentication, turn on Lightweight Directory Access Protocol (LDAP) authentication. See Use LDAP authentication.
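With LDAP authentication turned on, clients must supply the LDAP username and password. A minimal Beeline sketch, assuming the default endpoint described above (the credential placeholders are yours to fill in):

```shell
# Default HiveServer2 endpoint on the master node; adjust if yours differs.
HS2_URL="jdbc:hive2://master-1-1:10000"
echo "$HS2_URL"
# With LDAP enabled, pass the credentials with Beeline's -n (username) and
# -p (password) flags; run this on a node that can reach HiveServer2:
#   beeline -u "$HS2_URL" -n <ldap-user> -p <ldap-password>
```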

Connect to Hive on a common cluster

Use the Hive client

Run the following command:

hive

To exit, run quit; or exit;.

Use Beeline

Run the following command:

beeline -u jdbc:hive2://master-1-1:10000

To exit, run !quit or !exit.

Use JDBC

Important

Make sure you have set up a Java environment, installed a Java development tool, and configured environment variables before proceeding.

  1. Add hadoop-common and hive-jdbc as dependencies in your pom.xml file:

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>3.1.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.1</version>
        </dependency>
    </dependencies>

    Make sure the versions match the Hadoop-Common and Hive versions in your EMR cluster. Check these in the Software Information section of the Basic Information tab in the EMR console.

  2. Write code to connect to HiveServer2 and query data:

    import java.sql.*;
    
    public class App {
        private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";
    
        public static void main(String[] args) throws SQLException {
            // Load the Hive JDBC driver. If this fails, hive-jdbc is missing
            // from the classpath.
            try {
                Class.forName(DRIVER_NAME);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                return;
            }
    
            // Replace <Public IP address of the master node> with the actual
            // public IP address. HiveServer2 does not verify the password by
            // default, so it can be left empty.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://<Public IP address of the master node>:10000", "root", "");
                 Statement stmt = con.createStatement();
                 ResultSet res = stmt.executeQuery("select * from sample_tbl limit 10")) {
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }
        }
    }
    Note

    This method requires port 10000 to be open. For more information, see Manage security groups.

  3. Package the project as a JAR file and upload it to the host where you want to run it.

    Important

    The JAR file requires hadoop-common and hive-jdbc at run time. If these dependencies are not on the runtime classpath, you must download and configure them or bundle them into the JAR file. If they are missing when you run the JAR file, the following errors are reported:

    • Missing hadoop-common: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

    • Missing hive-jdbc: java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver

    In this example, the generated JAR file is emr-hiveserver2-1.0.jar. Upload this JAR file to the master node of the E-MapReduce cluster.

  4. Run the JAR file:

    Important

    The server where you run the JAR file and the E-MapReduce cluster must be in the same VPC and security group and have network connectivity. If they are in different VPCs or network environments, you must use a public IP address for access. Alternatively, you can use a network product to establish a connection between them and then use the internal network for access. To test network connectivity:

    • Public network: telnet <Public IP address of master-1-1> 10000

    • Internal network: telnet <Internal IP address of master-1-1> 10000

    java -jar emr-hiveserver2-1.0.jar
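If you prefer to ship a single self-contained JAR instead of configuring the dependencies on the host, one common approach is the Maven Shade plugin. A sketch for the pom.xml above (the plugin version is illustrative; the main class name App matches the example code):

```xml
<build>
    <plugins>
        <!-- Bundle hadoop-common and hive-jdbc into the JAR so the runtime
             host does not need them on its classpath. -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.4.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <!-- Set Main-Class so the JAR runs with `java -jar`. -->
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>App</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

Alternatively, keep the thin JAR and supply the dependencies at run time, for example `java -cp 'emr-hiveserver2-1.0.jar:$(hadoop classpath)' App` on a node where the hadoop command is available; append the hive-jdbc JAR from your Hive installation to the classpath as well.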

Connect to Hive on a high-security cluster

High-security clusters require Kerberos authentication. Before connecting with the Hive client or Beeline, set up a Kerberos principal and obtain a Ticket Granting Ticket (TGT).

Set up Kerberos authentication

  1. Open the Kerberos admin CLI.

    • On master-1-1 (the Key Distribution Center (KDC) node), run as root:

      kadmin.local

      The CLI is ready when the output includes:

      Authenticating as principal hadoop/admin@EMR.C-85D4B8D74296****.COM with password.
      kadmin.local:
    • On any other node or gateway, run:

      kadmin -p <admin-user> -w <admin-password>

      Replace the placeholders:

      • <admin-user>: root/admin (for a self-managed KDC in EMR)

      • <admin-password>: the value of the admin_pwd parameter, found on the Configure tab of the Kerberos service in the EMR console

      The CLI is ready when the output includes:

      Authenticating as principal root/admin with password.
      kadmin:
  2. Create a Kerberos principal named test:

    Note

    Save the username and password — you need them to create a TGT.

    addprinc -pw 123456 test

    The principal is created when the output includes:

    Principal "test@EMR.C-85D4B8D74296****.COM" created.

    To exit the admin CLI, run quit.

  3. Create a system user and switch to it:

    useradd test
    su test
  4. Create a TGT:

    kinit

    When prompted, enter the password for the test user (in this example, 123456).

Use the Hive client

After completing the Kerberos setup above, run:

hive

Use Beeline

After completing the Kerberos setup above, run:

beeline -u "jdbc:hive2://master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com:10000/;principal=hive/_HOST@EMR.c-56187feb57f0****.COM"

Replace the following values based on your cluster:

  • master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com: the fully qualified hostname of the HiveServer2 node (usually master-1-1). Run hostname -f on that node to get this value.

  • EMR.c-56187feb57f0****.COM: the realm name. Search for the realm parameter on the Configure tab of the Kerberos service in the EMR console.
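The Kerberos connection string is just those two values spliced into a fixed pattern. A sketch that assembles it, using the example values from this topic (substitute your own hostname and realm):

```shell
# Both values below are the examples from this topic; replace them with your
# cluster's HiveServer2 hostname and Kerberos realm.
HS2_FQDN="master-1-1.c-56187feb57f0****.cn-hangzhou.emr.aliyuncs.com"
REALM="EMR.c-56187feb57f0****.COM"
# hive/_HOST is the HiveServer2 service principal; _HOST is expanded by the
# client to the server's hostname.
URL="jdbc:hive2://${HS2_FQDN}:10000/;principal=hive/_HOST@${REALM}"
echo "$URL"
# beeline -u "$URL"
```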

Connect to Hive on a high-availability cluster

High-availability clusters expose multiple HiveServer2 instances. Use Beeline with a service discovery mode that matches your cluster setup.

Use Beeline with ZooKeeper service discovery

beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

Use Beeline with multi-server service discovery

beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'
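Either URL is simply a list of endpoints plus a discovery mode. A sketch that assembles the ZooKeeper form (the host names assume the default three-master layout shown above):

```shell
# ZooKeeper quorum on the three master nodes (default client port 2181).
ZK_QUORUM="master-1-1:2181,master-1-2:2181,master-1-3:2181"
# HiveServer2 instances register under the hiveserver2 znode; with
# serviceDiscoveryMode=zooKeeper, Beeline resolves a live instance from it.
URL="jdbc:hive2://${ZK_QUORUM}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
echo "$URL"
# beeline -u "$URL"
```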

Troubleshooting

Cannot connect to Hive on port 10000

Symptom: Connections to HiveServer2 on port 10000 time out or are refused.

Cause: Either HiveServer2 is not running, or the security group does not allow traffic on port 10000.

Resolution:

  1. Verify that HiveServer2 is running. On the node where HiveServer2 is deployed, run:

    netstat -tulnp | grep 10000

    If no process appears, HiveServer2 is not running. Check the HiveServer2 startup log to diagnose the issue.

  2. Verify that port 10000 is open in the security group. In the EMR console, go to the Basic Information tab and click the link next to Cluster Security Group. On the Security Group Details tab, confirm that port 10000 is open. If not, enable it. See Manage security groups.

Both high availability and Kerberos authentication are turned on

Symptom: The cluster has both High Service Availability and Kerberos Authentication turned on, and you need to connect to Hive.

Cause: This combination requires Kerberos credentials and a load-balanced HiveServer2 endpoint, which standard Beeline commands do not handle automatically.

Resolution: Create a Kerberos principal and obtain a TGT, then use JDBC to connect. See Balance the load of HiveServer2.