Kerberos is an authentication protocol based on symmetric-key cryptography that also
supports single sign-on (SSO). After a client is authenticated, it can access multiple
services, such as HBase and Hadoop Distributed File System (HDFS), without authenticating
again. This topic describes how to use Kerberos authentication to access Hive.
Background information
The Kerberos authentication process has two stages: the Key Distribution Center (KDC)
authenticates the client's identity, and then the service authenticates the client's identity.
The process involves the following roles:

- KDC
The Kerberos server.
- Client
A user (principal) that needs to access a service. The KDC and the service authenticate
the principal's identity.
- Service
A service that is integrated with Kerberos, such as HDFS, YARN, or HBase.
- KDC identity authentication
Before a principal can access a service that is integrated with Kerberos, it must first pass
identity authentication by the KDC. After it does so, the client receives a ticket-granting
ticket (TGT), which it can use to request access to services that are integrated with Kerberos.
- Service identity authentication
After a principal receives the TGT, it can request access to a service. It presents the TGT and
the name of the service that it wants to access (such as HDFS) to the KDC to obtain a
service-granting ticket (SGT), and then presents the SGT to the service. The service uses the
information in the SGT to authenticate the client. After the client passes authentication, it
can access the service as normal.
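The two stages can be traced from a shell with standard Kerberos client tools. The following is a minimal sketch, not part of the EMR procedure: it assumes that the MIT Kerberos utilities kinit, kvno, and klist are installed on the node, and the function name two_stage_auth is hypothetical. The kvno command requests a ticket for a named service, which corresponds to the SGT described above.

```shell
# Stage 1 and stage 2 of the Kerberos flow, wrapped in a function so that
# nothing runs until you call it. The function name is illustrative only.
two_stage_auth() {
  user_principal="$1"      # for example: test
  service_principal="$2"   # for example: hive/emr-header-1@EMR.<cluster_id>.COM

  # Stage 1: the KDC authenticates the client and issues a TGT.
  kinit "$user_principal" || return 1

  # Stage 2: use the TGT to request a ticket for the named service.
  kvno "$service_principal" || return 1

  # Show the credential cache, which should now list both tickets.
  klist
}

# Usage (prompts for the principal's password):
#   two_stage_auth test "hive/emr-header-1@EMR.<cluster_id>.COM"
```

In normal use you do not need to run stage 2 by hand, because a Kerberos-aware client requests the service ticket automatically when it connects.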
Prerequisites
An E-MapReduce (EMR) Hadoop cluster is created. For more information, see Create a cluster.
The Kerberos Mode switch in the Advanced Settings section of the Software Settings step is turned on when you create the Hadoop cluster.

Procedure
- Create a principal.
- Log on to the emr-header-1 node of your cluster in SSH mode. For more information,
see Log on to a cluster.
- Run the following command to start the Kerberos administration tool:
- Run the following command to create a principal named test.
In this example, the password is 123456.
addprinc -pw 123456 test
Note: Record the username and password. You need them to create a ticket-granting
ticket (TGT). If you do not want to record the username and password, perform the
next step to generate a keytab file that stores the credentials of the principal.
- Optional: Run the following command to generate a keytab file:
ktadd -k /root/test.keytab test
To exit the Kerberos administration tool, run the quit command.
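A keytab file lets anyone who can read it authenticate as the principal, so it is common practice to restrict the file to its owner. The following is a minimal sketch under that assumption. The function name lock_down_keytab is hypothetical and is not part of the EMR tooling.

```shell
# Restrict a keytab file to its owner and print the resulting permissions.
# The function name is illustrative only.
lock_down_keytab() {
  keytab="$1"
  if [ ! -f "$keytab" ]; then
    echo "keytab not found: $keytab" >&2
    return 1
  fi
  # Owner may read and write; group and others get no access.
  chmod 600 "$keytab" || return 1
  ls -l "$keytab"
}

# Usage (on emr-header-1, as root, after you exit the administration tool):
#   lock_down_keytab /root/test.keytab
#   klist -kt /root/test.keytab   # list the principals stored in the keytab
```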
- Create a TGT.
You can create a TGT on any node on which you want to run a Hive client.
- Run the following command as the root user to create a user named test:
useradd test
- Run the following command to switch to the test user:
su test
- Create a TGT.
- Method 1: Use a username and password to create a TGT.
Enter kinit and press Enter. When prompted, enter the password of the test account, which is 123456 in this example.

- Method 2: Use a keytab file to create a TGT.
The test.keytab file that you created in Step 1 is stored in the /root/ directory of the
emr-header-1 node. Run the scp command to copy the test.keytab file to the /home/test/
directory of the current node, and then run the following command:
kinit -kt /home/test/test.keytab test
- Check whether the TGT is created.
Run the klist command. If information similar to the following is returned, the TGT is created, and you can use it to access Hive.
Ticket cache: FILE:/tmp/krb5cc_1012
Default principal: test@EMR.23****.COM
Valid starting Expires Service principal
07/24/2021 13:20:44 07/25/2021 13:20:44 krbtgt/EMR.238075.COM@EMR.238075.COM
renew until 07/25/2021 13:20:44
Notice: Record the numeric string 23**** in EMR.23****.COM in the preceding output. The numeric string is the value of the cluster_id parameter and is required when you access Hive.
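Instead of copying the numeric string by hand, you can extract it from the klist output. The following is a minimal sketch that assumes the realm has the form EMR.<cluster_id>.COM, as in the output shown above. The function name cluster_id_from_klist is hypothetical.

```shell
# Read klist output from standard input and print the numeric cluster ID
# found in the "Default principal" line. The function name is illustrative only.
cluster_id_from_klist() {
  sed -n 's/^Default principal: .*@EMR\.\([0-9][0-9]*\)\.COM[[:space:]]*$/\1/p'
}

# Usage:
#   klist | cluster_id_from_klist
```

If the function prints nothing, no TGT is cached for the current user or the realm does not follow the assumed format.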
- Access Hive.
- Run the following command to open the Hive CLI:
hive
If the following information is returned, the Hive CLI is opened:
Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
- Run the following command to access Hive by using Beeline:
/usr/lib/hive-current/bin/beeline -u "jdbc:hive2://emr-header-1:10000/default;principal=hive/emr-header-1@EMR.<cluster_id>.COM"
You can obtain the value of the cluster_id parameter from the klist output that indicates that the TGT is created. Alternatively, go to the Configure tab on the Hive service page in the EMR console and search for the hive.server2.authentication.kerberos.principal parameter in the Configuration Filter section. The numeric string in the value of this parameter is the value of the cluster_id parameter.
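The Beeline connection string can also be assembled from the host name and the cluster ID. The following is a minimal sketch that assumes the port (10000), the database (default), and the hive/<host>@EMR.<cluster_id>.COM principal format shown in the preceding command. The function name hive_jdbc_url is hypothetical.

```shell
# Print the JDBC URL for HiveServer2 on the given host and cluster ID.
# The function name is illustrative only.
hive_jdbc_url() {
  host="$1"         # for example: emr-header-1
  cluster_id="$2"   # the numeric string from the klist output
  printf 'jdbc:hive2://%s:10000/default;principal=hive/%s@EMR.%s.COM\n' \
    "$host" "$host" "$cluster_id"
}

# Usage (replace <cluster_id> with your value):
#   /usr/lib/hive-current/bin/beeline -u "$(hive_jdbc_url emr-header-1 <cluster_id>)"
```

Keep the URL in double quotation marks when you pass it to Beeline, because the semicolon would otherwise be interpreted by the shell.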

References
- For more information about how to create a principal, see Database administration.
- For more information about how to create a TGT, see kinit.