Kerberos is an identity authentication protocol that is based on symmetric-key cryptography and supports single sign-on (SSO). After a client is authenticated, it can access multiple services, such as HBase and Hadoop Distributed File System (HDFS). This topic describes how to use Kerberos authentication to access Hive.

Background information

The Kerberos authentication process consists of two stages: the KDC authenticates the identity of the client, and then the service authenticates the identity of the client.
  • KDC

    The Key Distribution Center (KDC), which is the Kerberos server.

  • Client

    A user (principal) that needs to access a service. Both the KDC and the service authenticate the identity of the principal.

  • Service

    A service that is integrated with Kerberos, such as HDFS, YARN, or HBase.

  • KDC ID authentication

    Before a principal can access a service integrated with Kerberos, it must first pass KDC ID authentication.

    After the principal passes the authentication, the client receives a ticket-granting ticket (TGT), which it uses to request access to services that are integrated with Kerberos.

  • Service ID authentication

    After a principal obtains a TGT, it can request access to a service. The principal presents the TGT and the name of the service that it wants to access, such as HDFS, to the KDC to obtain a service-granting ticket (SGT), and then uses the SGT to access the service. The service uses the information in the SGT to authenticate the identity of the client. After the client passes the authentication, it can access the service.
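
The two stages can be sketched with the standard MIT Kerberos client commands. This is a minimal sketch, assuming that kinit, kvno, and klist are installed on the node. The helper function names, the hive service on emr-header-1, and the cluster ID 123456 are illustrative assumptions, not values taken from a real cluster.

```shell
#!/bin/sh
# Sketch of the two authentication stages, assuming the MIT Kerberos client
# tools (kinit, kvno, klist) are available. Function names are hypothetical.

# Build a service principal name in the form service/host@REALM.
service_principal() {
    printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Stage 1 (KDC identity authentication): authenticate the principal to the
# KDC and store the resulting TGT in the ticket cache. Prompts for a password.
get_tgt() {
    kinit "$1"
}

# Stage 2 (service identity authentication): use the cached TGT to request a
# service ticket for the named service principal, then list cached tickets.
get_service_ticket() {
    kvno "$1" && klist
}

# Example (123456 is a placeholder cluster ID, not a real value):
#   get_tgt test
#   get_service_ticket "$(service_principal hive emr-header-1 EMR.123456.COM)"
```

Clients such as Beeline typically request the service ticket on their own, so the explicit kvno call serves only to make the second stage visible.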

Prerequisites

An E-MapReduce (EMR) Hadoop cluster is created. For more information, see Create a cluster.

The Kerberos Mode switch in the Advanced Settings section of the Software Settings step is turned on when you create the Hadoop cluster.

Procedure

  1. Create a principal.
    1. Log on to the emr-header-1 node of your cluster by using SSH. For more information, see Log on to a cluster.
    2. Run the following command to start the Kerberos administration tool:
      • EMR V3.30.0 and later V3.X.X, and EMR V4.5.1 and later V4.X.X:
        sh /usr/lib/has-current/bin/admin-local.sh /etc/ecm/has-conf -k /etc/ecm/has-conf/admin.keytab
      • EMR V3.X.X earlier than V3.30.0, and EMR V4.X.X earlier than V4.5.1:
        sh /usr/lib/has-current/bin/hadmin-local.sh /etc/ecm/has-conf -k /etc/ecm/has-conf/admin.keytab
    3. Run the following command to create a principal named test.

      In this example, the password is 123456.

      addprinc -pw 123456 test
      Note: Record the username and password. You need them to create a ticket-granting ticket (TGT). If you do not want to record the username and password, perform the next step to generate a keytab file that stores the credentials of the principal, and use the keytab file to create the TGT.
    4. Optional: Run the following command to generate a keytab file:
      ktadd -k /root/test.keytab test

      To exit the Kerberos administration tool, run the quit command.
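
Because a keytab file holds the credentials of the principal, anyone who can read the file can authenticate as that principal. The following is a minimal sketch, not part of the original procedure, that restricts the file to its owner. The function name restrict_keytab is hypothetical, and the commented klist -kt check assumes that the MIT Kerberos client tools are installed.

```shell
#!/bin/sh
# Restrict a keytab file so that only its owner can read or write it.
# restrict_keytab is a hypothetical helper name used for illustration.
restrict_keytab() {
    keytab=$1
    if [ ! -f "$keytab" ]; then
        echo "keytab not found: $keytab" >&2
        return 1
    fi
    chmod 600 "$keytab"
}

# Example, run in a regular shell after you quit the administration tool:
#   restrict_keytab /root/test.keytab
#   klist -kt /root/test.keytab   # list the principals stored in the keytab
```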

  2. Create a TGT.
    Create the TGT on the node on which you want to run the Hive client.
    1. Run the following command as the root user to create a user named test:
      useradd test
    2. Run the following command to switch to the test user:
      su test
    3. Create a TGT.
      • Method 1: Use a username and password to create a TGT.
        Run the kinit command. When you are prompted, enter the password of the test principal, which is 123456 in this example.
      • Method 2: Use a keytab file to create a TGT.
        The test.keytab file that you created in Step 1 is stored in the /root/ directory of the emr-header-1 node. You must run the scp command to copy the test.keytab file to the /home/test/ directory of the current node.
        kinit -kt /home/test/test.keytab test
    4. Check whether the TGT is created.
      Run the klist command. If the following information appears, the TGT is created. Then, you can use the TGT to access Hive.
      Ticket cache: FILE:/tmp/krb5cc_1012
      Default principal: test@EMR.23****.COM
      
      Valid starting       Expires              Service principal
      07/24/2021 13:20:44  07/25/2021 13:20:44  krbtgt/EMR.23****.COM@EMR.23****.COM
              renew until 07/25/2021 13:20:44
      Notice: Record the numeric string 23**** in EMR.23****.COM in the preceding output. The numeric string is the value of the cluster_id parameter and is required when you access Hive.
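
The cluster ID can also be read from the klist output with a short script instead of being copied by hand. This is a sketch that assumes the realm always has the form EMR.<cluster_id>.COM with a numeric ID, as in the output above. The name cluster_id_from_klist is a hypothetical helper, and CLUSTER_ID is an illustrative variable name.

```shell
#!/bin/sh
# Read klist output on standard input and print the numeric cluster ID found
# in the "Default principal" line. Assumes a realm of the form
# EMR.<cluster_id>.COM; cluster_id_from_klist is a hypothetical helper name.
cluster_id_from_klist() {
    sed -n 's/^[[:space:]]*Default principal: .*@EMR\.\([0-9][0-9]*\)\.COM[[:space:]]*$/\1/p'
}

# Example, run as the user that holds the TGT:
#   klist -s || echo "no valid TGT; run kinit first" >&2
#   CLUSTER_ID=$(klist | cluster_id_from_klist)
```

If the realm does not match the assumed pattern, the function prints nothing, so check that the result is not empty before you use it.
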
  3. Access Hive.
    1. Run the following command to open the Hive CLI:
      hive
      If the following information is returned, the Hive CLI is opened:
      Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
      Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    2. Run the following command to connect to HiveServer2 by using Beeline. Replace <cluster_id> with the cluster ID of your cluster:
      /usr/lib/hive-current/bin/beeline -u "jdbc:hive2://emr-header-1:10000/default;principal=hive/emr-header-1@EMR.<cluster_id>.COM"
      You can obtain the value of the cluster_id parameter from the klist output that is returned after the TGT is created. Alternatively, go to the Configure tab on the Hive service page in the EMR console and search for the hive.server2.authentication.kerberos.principal parameter in the Configuration Filter section. The numeric string in the value of this parameter is the value of the cluster_id parameter.
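
To avoid editing the connection string by hand, the JDBC URL can be assembled from the host name and the cluster ID. This is a minimal sketch: hive_jdbc_url is a hypothetical helper, 123456 is a placeholder cluster ID, and the port, database, and principal format are copied from the Beeline command in this step.

```shell
#!/bin/sh
# Print the HiveServer2 JDBC URL for a Kerberos-enabled cluster.
# Arguments: host name of the HiveServer2 node, numeric cluster ID.
# hive_jdbc_url is a hypothetical helper name used for illustration.
hive_jdbc_url() {
    host=$1
    cluster_id=$2
    printf 'jdbc:hive2://%s:10000/default;principal=hive/%s@EMR.%s.COM\n' \
        "$host" "$host" "$cluster_id"
}

# Example (123456 is a placeholder; use the cluster ID that you recorded):
#   /usr/lib/hive-current/bin/beeline -u "$(hive_jdbc_url emr-header-1 123456)"
```

Keep the URL in double quotation marks when you pass it to Beeline, because the semicolon would otherwise be interpreted by the shell as a command separator.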

References

  • For more information about how to create a principal, see Database administration.
  • For more information about how to create a TGT, see kinit.