
E-MapReduce: Use Java to connect to Hive with Kerberos authentication enabled

Last Updated: Mar 26, 2025

In an enterprise-level big data platform, Kerberos authentication is a key mechanism for securing services such as Hadoop, Hive, and HBase. To connect a local Java client to an E-MapReduce (EMR) cluster that has Kerberos authentication enabled, you must configure Kerberos on the client and use the Hive Java Database Connectivity (JDBC) driver to establish the connection. This topic describes how to use Java code to connect to a Kerberos-enabled EMR Hive service from a macOS or Linux environment.

Prerequisites

An EMR cluster is created with Kerberos Authentication turned on in the Advanced Settings section of the Software Configuration step. For more information, see Create a cluster.

Step 1: Obtain the Kerberos configuration of the EMR cluster

  1. Log on to the master node of the cluster by using SSH. For more information, see Log on to a cluster.

  2. Run the following command to view the Kerberos configuration file krb5.conf. In most cases, this file is located at /etc/krb5.conf on the master-1-1 node of the cluster.

    cat /etc/krb5.conf

    Record the value of the default_realm configuration item, which is used later in the Java code. In this example, the file contains the following configuration:

    [logging]
      default = FILE:/mnt/disk1/log/kerberos/krb5libs.log
      kdc = FILE:/mnt/disk1/log/kerberos/krb5kdc.log
      admin_server = FILE:/mnt/disk1/log/kerberos/kadmind.log
    
    [libdefaults]
      default_realm = EMR.C-EXAMPLE.COM
      dns_lookup_realm = false
      dns_lookup_kdc = false
      ticket_lifetime = 24h
      renew_lifetime = 7d
      forwardable = true
      rdns = false
      dns_canonicalize_hostname = true
      pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
      kdc_timeout = 30s
      max_retries = 3
    
    [realms]
      EMR.C-EXAMPLE.COM = {
        kdc = master-1-1.c-ce2fcb9c9c0b****.cn-hangzhou.emr.aliyuncs.com:88
        admin_server = master-1-1.c-ce2fcb9c9c0b****.cn-hangzhou.emr.aliyuncs.com:749
      }
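If you prefer to read default_realm and the KDC address programmatically instead of copying them by hand, the following is a minimal sketch. It assumes that you have copied krb5.conf to your local machine (for example, with scp as in Step 2). Krb5ConfReader is a hypothetical helper, not part of any EMR, Hadoop, or JDK API, and it only handles the simple key = value layout shown above, not the full krb5.conf grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads default_realm and kdc entries from krb5.conf text.
// Handles only the simple "key = value" layout of the EMR sample file.
final class Krb5ConfReader {

    /** Returns default_realm from the [libdefaults] section, or null if it is absent. */
    static String defaultRealm(String krb5Conf) {
        String section = "";
        for (String raw : krb5Conf.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line; // entering a new [section]
            } else if (section.equals("[libdefaults]") && "default_realm".equals(key(line))) {
                return value(line);
            }
        }
        return null;
    }

    /** Returns the kdc entries (host:port) declared for the given realm in [realms]. */
    static List<String> kdcsFor(String krb5Conf, String realm) {
        List<String> kdcs = new ArrayList<>();
        String section = "";
        boolean inRealm = false;
        for (String raw : krb5Conf.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line;
                inRealm = false;
            } else if (section.equals("[realms]")) {
                if (line.endsWith("{")) {
                    inRealm = realm.equals(key(line)); // e.g. "EMR.C-EXAMPLE.COM = {"
                } else if (line.equals("}")) {
                    inRealm = false;
                } else if (inRealm && "kdc".equals(key(line))) {
                    kdcs.add(value(line));
                }
            }
        }
        return kdcs;
    }

    // Text before the first '=', trimmed; null if the line contains no '='.
    private static String key(String line) {
        int eq = line.indexOf('=');
        return eq < 0 ? null : line.substring(0, eq).trim();
    }

    // Text after the first '=', trimmed.
    private static String value(String line) {
        return line.substring(line.indexOf('=') + 1).trim();
    }
}
```

For the sample file above, defaultRealm returns EMR.C-EXAMPLE.COM and kdcsFor returns the master-1-1 host with port 88. Reading the file itself (for example with java.nio.file.Files) is left to the caller so that the parser stays easy to test.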

Step 2: Copy the keytab file and obtain the principal

  1. Copy the keytab file of Hive to the local development environment.

    scp root@<Public IP address>:/etc/taihao-apps/hive-conf/keytab/hive.keytab /tmp/hive.keytab

    <Public IP address>: the public IP address of the master node. For more information about how to obtain the IP address, see Obtain the public IP address and the name of a node.

  2. Verify the validity of the keytab file in the local environment and obtain the principal.

    klist -kt /tmp/hive.keytab

    Sample command output:

    Keytab name: FILE:/tmp/hive.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       2 02/25/2025 10:40:41 hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM
       2 02/25/2025 10:40:41 hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM

    In this example, the principal is hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM. This value is used later in the Java code to connect to Hive.
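The realm part of the principal (after the @) should match default_realm from Step 1. The following sketch checks this before any Kerberos call is made, which catches copy-and-paste mistakes early. ServicePrincipal is a hypothetical class that splits a principal of the form service/host@REALM; it is not a Hadoop or JDK type.

```java
// Hypothetical helper: splits a Kerberos service principal of the form service/host@REALM.
final class ServicePrincipal {
    final String service;
    final String host;
    final String realm;

    private ServicePrincipal(String service, String host, String realm) {
        this.service = service;
        this.host = host;
        this.realm = realm;
    }

    /** Parses "service/host@REALM"; throws IllegalArgumentException for other shapes. */
    static ServicePrincipal parse(String principal) {
        int at = principal.lastIndexOf('@');
        int slash = principal.indexOf('/');
        if (at < 0 || slash < 0 || slash > at) {
            throw new IllegalArgumentException("Expected service/host@REALM: " + principal);
        }
        return new ServicePrincipal(
            principal.substring(0, slash),
            principal.substring(slash + 1, at),
            principal.substring(at + 1));
    }

    /** Kerberos realm names are case-sensitive, so compare them exactly. */
    boolean matchesRealm(String defaultRealm) {
        return realm.equals(defaultRealm);
    }
}
```

For the principal in this example, parse yields the service hive, the master-1-1 host name, and the realm EMR.C-EXAMPLE.COM, so matchesRealm("EMR.C-EXAMPLE.COM") returns true.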

Step 3: Configure an access control policy

To ensure that the local development environment can access the EMR cluster, perform the following operations to configure a security group rule:

  1. Obtain the IP address of your local development environment.

    You can look up the public IP address of your local development environment by using an IP address lookup website.

  2. Go to the Security Group Details tab.

    1. Log on to the EMR console.

    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click the name of the cluster.

    4. In the Security section of the Basic Information tab, click the link to the right of Cluster Security Group.

  3. On the Security Group Details tab of the page that appears, click Add Rule.

    Select All for the Protocol Type parameter, enter the IP address that you obtained in substep 1 into the Authorization Object field, and retain the default values for other parameters. For more information, see Add security group rules.
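After the rule takes effect, you can verify from the local machine that the KDC port (88, from the kdc entry in krb5.conf) and the HiveServer2 port (10000, used in the JDBC URL of the sample code) accept TCP connections. The following is a minimal sketch using java.net.Socket; PortCheck is a hypothetical name. Like nc -zv, it only tests TCP and does not test UDP.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity check, similar in spirit to "nc -zv host port".
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host name could not be resolved.
            return false;
        }
    }
}
```

For example, PortCheck.isReachable("<Public IP address>", 88, 3000) and PortCheck.isReachable("<Public IP address>", 10000, 3000) should both return true once the security group rule allows your IP address.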

Step 4: Write Java code

Configure Maven dependencies

Add the following dependencies to the pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-auth</artifactId>
        <version>3.2.1</version>
    </dependency>
</dependencies>

Sample code in Java

Modify the values of the parameters in the following sample code based on the configuration information obtained in Step 1: Obtain the Kerberos configuration of the EMR cluster and Step 2: Copy the keytab file and obtain the principal. Then, copy the code to the Main.java file.

package com.aliyun.emr.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Main {
    private static final String DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        // Configure the realm and KDC address for Kerberos authentication.
        // The realm must match default_realm in krb5.conf. The JDK requires both properties to be set together.
        System.setProperty("java.security.krb5.realm", "EMR.C-EXAMPLE.COM");
        System.setProperty("java.security.krb5.kdc", "$IPORHOST");

        // Define the principal of Hive. It is used both for the keytab logon and for JDBC authentication.
        String hivePrincipal = "hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM";

        // Enable Kerberos authentication in the Hadoop client and log on by using the keytab file.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(hivePrincipal, "/tmp/hive.keytab");

        // Load the Hive JDBC driver.
        Class.forName(DRIVER_CLASS);

        // Construct a Hive JDBC URL that contains the connection address and principal information.
        String hiveUrl = "jdbc:hive2://$IPORHOST:10000/;principal=" + hivePrincipal;

        // try-with-resources closes the result set, statement, and connection even if the query fails.
        try (Connection connection = DriverManager.getConnection(hiveUrl);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SHOW DATABASES")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString(1));
            }
        }
    }
}

The following table describes the parameters.

| Parameter | Description |
| --- | --- |
| java.security.krb5.realm | Set this parameter to the value of default_realm in the krb5.conf file obtained in Step 1: Obtain the Kerberos configuration of the EMR cluster. |
| java.security.krb5.kdc | The address of the KDC server. Set this parameter to an address of the master node, such as its public IP address or domain name, that is reachable from your local environment. |
| hivePrincipal | Set this parameter to the value of Principal obtained in Step 2: Copy the keytab file and obtain the principal. |
| UserGroupInformation.loginUserFromKeytab | Set the first parameter to the value of hivePrincipal and the second parameter to the local path of the keytab file, such as /tmp/hive.keytab. |
| hiveUrl | Replace $IPORHOST with the same address that you use for java.security.krb5.kdc. Port 10000 is the default HiveServer2 port. |
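If you connect to several clusters, building the JDBC URL in one place keeps the host, port, and principal consistent. The following is a sketch of a hypothetical helper, HiveJdbcUrl, that produces the same URL format as the sample code. An empty database name, as in the sample code, connects to the default database.

```java
// Hypothetical helper that assembles a Hive JDBC URL with a Kerberos principal.
final class HiveJdbcUrl {

    /** Builds jdbc:hive2://host:port/database;principal=hivePrincipal. */
    static String build(String host, int port, String database, String hivePrincipal) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        if (hivePrincipal == null || hivePrincipal.isEmpty()) {
            throw new IllegalArgumentException("hivePrincipal is required");
        }
        String db = database == null ? "" : database; // empty means the default database
        return "jdbc:hive2://" + host + ":" + port + "/" + db + ";principal=" + hivePrincipal;
    }
}
```

With the values in this topic, HiveJdbcUrl.build("$IPORHOST", 10000, "", hivePrincipal) returns exactly the hiveUrl string that the sample code constructs.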

To print Kerberos debugging information, add the following line at the beginning of the main method:

System.setProperty("sun.security.krb5.debug", "true");

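Common errors and solutions

| Error message | Cause | Solution |
| --- | --- | --- |
| Cannot contact any KDC | The KDC address is incorrect, or a network issue occurs. | Make sure that the KDC address (the java.security.krb5.kdc property, or the kdc entry in the krb5.conf file) is correct, and run nc -zv <KDC address> 88 to verify that the KDC port is reachable. |
| keytab contains no suitable keys | The keytab file does not match the principal. | Run the klist -kt /path/to/hive.keytab command and make sure that the principal in your code matches an entry in the keytab file. |
| LoginException: Unable to obtain password | The keytab file is inaccessible. | Make sure that the keytab path in your code is correct and that the user who runs the program can read the file. For example, if that user owns the file, run the chmod 400 /path/to/hive.keytab command. |
| GSS initiate failed | Kerberos is incorrectly configured. | Make sure that the Kerberos settings are correct: the java.security.krb5.realm and java.security.krb5.kdc properties, or the krb5.conf file specified by the java.security.krb5.conf property. |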
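For the keytab-related errors above, a preflight check before calling loginUserFromKeytab can produce clearer messages. The following is a minimal sketch under that assumption; KeytabPreflight is a hypothetical helper, not part of Hadoop or Hive. It uses the JDK class javax.security.auth.kerberos.KeyTab, whose getKeys method returns an empty array when it finds no usable keys for the given principal.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.security.auth.DestroyFailedException;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KeyTab;

// Hypothetical preflight helper for the keytab file and principal.
final class KeytabPreflight {

    /** Returns the problems found; an empty list means the basic checks passed. */
    static List<String> check(String keytabPath, String principal) {
        List<String> problems = new ArrayList<>();
        Path path = Paths.get(keytabPath);
        if (!Files.isRegularFile(path)) {
            problems.add("Keytab file not found: " + keytabPath);
            return problems;
        }
        if (!Files.isReadable(path)) {
            problems.add("Keytab file is not readable by the current user: " + keytabPath);
            return problems;
        }
        // Read only the keys stored for the given principal.
        KeyTab keyTab = KeyTab.getInstance(path.toFile());
        KerberosKey[] keys = keyTab.getKeys(new KerberosPrincipal(principal));
        if (keys.length == 0) {
            problems.add("Keytab contains no usable keys for principal: " + principal);
        }
        // The keys are secret material; destroy these copies because only their count is needed.
        for (KerberosKey key : keys) {
            try {
                key.destroy();
            } catch (DestroyFailedException ignored) {
                // Best effort only.
            }
        }
        return problems;
    }
}
```

Call KeytabPreflight.check("/tmp/hive.keytab", hivePrincipal) after the java.security.krb5.* properties are set and before the logon; because the JDK may consult the Kerberos configuration when it filters key types, running it with the same settings as the main program gives the most representative result.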