In an enterprise-level big data platform, Kerberos authentication is a key mechanism for ensuring the security of services such as Hadoop, Hive, and HBase. When a local Java client needs to connect to an E-MapReduce (EMR) cluster with Kerberos authentication enabled, you must properly configure Kerberos and use the Hive Java Database Connectivity (JDBC) driver to establish a connection. This topic describes how to use Java code to connect to the EMR Hive service for which Kerberos authentication is enabled in a macOS or Linux environment.
Prerequisites
An EMR cluster is created with Kerberos Authentication turned on in the Advanced Settings section of the Software Configuration step. For more information, see Create a cluster.
Step 1: Obtain the Kerberos configuration of the EMR cluster
Log on to the master node of the cluster by using SSH. For more information, see Log on to a cluster.
Run the following command to obtain the Kerberos configuration file krb5.conf. In most cases, this file is located at /etc/krb5.conf on the master-1-1 node of the cluster.
cat /etc/krb5.conf
Record the value of the default_realm configuration item and the kdc address in the [realms] section. These values will be used later in the Java code. In this example, the following configuration information is obtained:
[logging]
 default = FILE:/mnt/disk1/log/kerberos/krb5libs.log
 kdc = FILE:/mnt/disk1/log/kerberos/krb5kdc.log
 admin_server = FILE:/mnt/disk1/log/kerberos/kadmind.log

[libdefaults]
 default_realm = EMR.C-EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 dns_canonicalize_hostname = true
 pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
 kdc_timeout = 30s
 max_retries = 3

[realms]
 EMR.C-EXAMPLE.COM = {
  kdc = master-1-1.c-ce2fcb9c9c0b****.cn-hangzhou.emr.aliyuncs.com:88
  admin_server = master-1-1.c-ce2fcb9c9c0b****.cn-hangzhou.emr.aliyuncs.com:749
 }
Step 2: Copy the keytab file and obtain the principal
Copy the keytab file of Hive to the local development environment.
scp root@<Public IP address>:/etc/taihao-apps/hive-conf/keytab/hive.keytab /tmp/hive.keytab
<Public IP address>: the public IP address of the master node. For more information about how to obtain the IP address, see Obtain the public IP address and the name of a node.
Verify the validity of the keytab file in the local environment and obtain the principal.
klist -kt /tmp/hive.keytab
Sample command output:
Keytab name: FILE:/tmp/hive.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 02/25/2025 10:40:41 hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM
   2 02/25/2025 10:40:41 hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM
In this example, the obtained principal is hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM. This value will be used later when you use Java code to connect to Hive.
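Optionally, before you write the full JDBC program, you can verify the keytab and principal from Java by performing only the Kerberos logon. The following is a minimal sketch that reuses the values obtained in Step 1 and Step 2. It assumes that the Maven dependencies described in Step 4 are on the classpath; the class name KeytabCheck and the <KDC address> placeholder are illustrative.
package com.aliyun.emr.example;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
public class KeytabCheck {
    public static void main(String[] args) throws Exception {
        // Realm and KDC address obtained in Step 1. Replace <KDC address> with the address of the master node.
        System.setProperty("java.security.krb5.realm", "EMR.C-EXAMPLE.COM");
        System.setProperty("java.security.krb5.kdc", "<KDC address>");
        // Enable Kerberos authentication for the Hadoop security layer.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Log on with the principal obtained in Step 2 and the local keytab file.
        UserGroupInformation.loginUserFromKeytab(
            "hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM",
            "/tmp/hive.keytab"
        );
        System.out.println("Kerberos logon succeeded as: " + UserGroupInformation.getLoginUser());
    }
}
If the logon fails, check the keytab path, the principal, and the KDC address before you continue.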
Step 3: Configure an access control policy
To ensure that the local development environment can access the EMR cluster, perform the following operations to configure a security group rule:
Obtain the IP address of your local development environment.
To obtain the public IP address of your local development environment, you can use an IP address lookup website.
Go to the Security Group Details tab. To do so, perform the following operations:
Log on to the EMR console.
In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click the name of the cluster.
In the Security section of the Basic Information tab, click the link to the right of Cluster Security Group.
On the Security Group Details tab of the page that appears, click Add Rule.
Select All for the Protocol Type parameter, enter the IP address of your local development environment that you obtained earlier in this step into the field for the Authorization Object parameter, and retain the default values for other parameters. For more information, see Add security group rules. After the rule is added, you can check port connectivity as shown in the sketch after this list.
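The following is a minimal sketch for checking from your local development environment whether the required ports of the master node are reachable. The class name PortCheck is illustrative, and the ports are assumptions based on the defaults used in this topic (88 for the KDC and 10000 for HiveServer2).
package com.aliyun.emr.example;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
public class PortCheck {
    public static void main(String[] args) {
        // Replace with the public IP address or host name of the master node.
        String host = "<Public IP address>";
        // 88: Kerberos KDC. 10000: HiveServer2.
        int[] ports = {88, 10000};
        for (int port : ports) {
            try (Socket socket = new Socket()) {
                // Attempt a TCP connection with a 3-second timeout.
                socket.connect(new InetSocketAddress(host, port), 3000);
                System.out.println("Port " + port + " is reachable.");
            } catch (IOException e) {
                System.out.println("Port " + port + " is not reachable: " + e.getMessage());
            }
        }
    }
}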
Step 4: Write Java code
Configure Maven dependencies
Add the following dependencies to the pom.xml file:
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
Sample code in Java
Modify the values of the parameters in the following sample code based on the configuration information obtained in Step 1: Obtain the Kerberos configuration of the EMR cluster and Step 2: Copy the keytab file and obtain the principal. Then, copy the code to the Main.java file.
package com.aliyun.emr.example;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class Main {
private static final String DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver";
public static void main(String[] args) throws Exception {
// Configure the Realm information and KDC address for Kerberos authentication.
System.setProperty("java.security.krb5.realm", "EMR.EXAMPLE.COM");
System.setProperty("java.security.krb5.kdc", "$IPORHOST");
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
// Use the keytab file for Kerberos logon.
UserGroupInformation.loginUserFromKeytab(
"hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM",
"/tmp/hive.keytab"
);
Class.forName(DRIVER_CLASS);
// Define the principal of Hive for JDBC authentication.
String hivePrincipal = "hive/master-1-1.c-EXAMPLE.cn-hangzhou.emr.aliyuncs.com@EMR.C-EXAMPLE.COM";
// Construct a Hive JDBC URL that contains the connection address and principal information.
String hiveUrl = "jdbc:hive2://$IPORHOST:10000/;principal=" + hivePrincipal;
Connection connection = DriverManager.getConnection(hiveUrl);
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery("SHOW DATABASES");
while (resultSet.next()) {
System.out.println(resultSet.getString(1));
}
resultSet.close();
statement.close();
connection.close();
}
}
The following table describes the parameters.
Parameter | Description |
java.security.krb5.realm | Set this parameter to the value of default_realm in the krb5.conf file obtained in Step 1: Obtain the Kerberos configuration of the EMR cluster. |
java.security.krb5.kdc | The address of the KDC server. You can set this parameter to the address of the master node, such as the public IP address or domain name. Make sure that the address is accessible from your local development environment. |
hivePrincipal | Set this parameter to the value of Principal obtained in Step 2: Copy the keytab file and obtain the principal. |
UserGroupInformation.loginUserFromKeytab | Set the first parameter to the value of hivePrincipal and the second parameter to the local path of the keytab file, such as /tmp/hive.keytab. |
hiveUrl | Set $IPORHOST to the value of java.security.krb5.kdc. |
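Alternatively, instead of setting java.security.krb5.realm and java.security.krb5.kdc separately, you can point the JVM at a local copy of the krb5.conf file that you obtained in Step 1. The following is a sketch that assumes you saved the file as /tmp/krb5.conf; note that the KDC host names in the file must be resolvable and reachable from your local environment.
// Use a local copy of the cluster's krb5.conf file instead of setting the realm and KDC separately.
System.setProperty("java.security.krb5.conf", "/tmp/krb5.conf");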
If you want to debug Kerberos authentication issues, add the following code at the beginning of the main method:
System.setProperty("sun.security.krb5.debug", "true");
Common errors and solutions
Error message | Cause | Solution |
Cannot contact any KDC | The KDC address is incorrect, or a network issue occurs. | Make sure that the KDC address in the krb5.conf file is correct, and use the telnet command to check whether port 88 of the KDC is reachable from your local environment. |
keytab contains no suitable keys | The keytab file does not match. | Run the klist -kt /tmp/hive.keytab command to check whether the keytab file contains the principal that is specified in the code. |
LoginException: Unable to obtain password | The keytab file is inaccessible. | Run the ls -l /tmp/hive.keytab command to check whether the keytab file exists and whether the current user has read permissions on the file. |
GSS initiate failed | Kerberos is incorrectly configured. | Make sure that the principal in the JDBC URL is correct and that the realm and KDC settings in the code match the krb5.conf file obtained from the cluster. |
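When you troubleshoot these errors, it also helps to confirm which identity the process actually logged in as. The following minimal sketch can be added after the loginUserFromKeytab call in the sample code; UserGroupInformation is the same Hadoop class that is already imported there.
// Print the identity that the Hadoop security layer considers the current login user.
UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
System.out.println("Login user: " + loginUser.getUserName());
System.out.println("Has Kerberos credentials: " + loginUser.hasKerberosCredentials());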