This topic describes how to configure MIT Kerberos authentication when you access HDFS.

Prerequisites

An E-MapReduce (EMR) cluster is created. For more information about how to create a cluster, see Create a cluster.

Access HDFS by running the hadoop command

The following example demonstrates how to access HDFS as the test user:

  1. Run the following command on the gateway cluster that is associated with your EMR cluster to copy the krb5.conf file from the emr-header-1 node:
    scp root@emr-header-1:/etc/krb5.conf /etc/
  2. Set the hadoop.security.authentication.use.has parameter to false.
    1. Connect to the emr-header-1 node of the cluster.
      Note Even if you use a high-availability (HA) EMR cluster, you must connect to the emr-header-1 node of the cluster.

      For more information about how to connect to the node, see Connect to the master node of an EMR cluster in SSH mode.

    2. Run the following command to edit the core-site.xml file:
      vim /etc/ecm/hadoop-conf/core-site.xml
    3. Check the value of the hadoop.security.authentication.use.has parameter.
      • If the value is true, change it to false.
      • If the value is false, go to Step 3.
  3. Add a principal.
    1. Run the following command to start the Kerberos administration tool:
      • EMR V3.30.0 and later V3.X.X, and EMR V4.5.1 and later V4.X.X:
        sh /usr/lib/has-current/bin/admin-local.sh /etc/ecm/has-conf -k /etc/ecm/has-conf/admin.keytab
      • EMR V3.X.X earlier than V3.30.0, and EMR V4.X.X earlier than V4.5.1:
        sh /usr/lib/has-current/bin/hadmin-local.sh /etc/ecm/has-conf -k /etc/ecm/has-conf/admin.keytab
    2. Run the following command to add a principal named test.

      In this example, the password is 123456.

      addprinc -pw 123456 test
    3. Run the following command to export the keytab file:
      ktadd -k /root/test.keytab test
  4. Obtain a ticket.
    Run the following commands on the client where you want to run the hadoop command. In this example, the gateway cluster is used.
    1. Create a Linux account named test.
      useradd test
    2. Install the MIT Kerberos client.

      The MIT Kerberos client provides commands such as kinit and klist. For more information, see MIT Kerberos.

      yum install krb5-libs krb5-workstation -y
    3. Switch to the test account and run the kinit command.
      su test
      • If no keytab file exists, run the kinit command. When you are prompted, enter the password of the test principal. In this example, the password is 123456.
      • If a keytab file exists, run the following commands:
        # Use a specific principal in a specific keytab file for authentication.
        kinit -kt test.keytab test
        # View the lifetime of the ticket.
        klist
        The keytab file replaces the password, so no password prompt appears. After you run the klist command, information similar to the following output is returned:
        Valid starting       Expires              Service principal
        03/30/2021 10:48:47  03/31/2021 10:48:47  krbtgt/EMR.209749.COM@EMR.209749.COM
                renew until 03/31/2021 10:48:47
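    4. Optional: If you want to set the lifetime of the ticket, perform the following operations:
      1. Set the lifetime of the ticket. In this example, the lifetime is set to five days.
        kinit -l 5d
      2. Run the klist command to view the lifetime of the ticket. Information similar to the following output is returned: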
        Valid starting       Expires              Service principal
        03/30/2021 10:50:51  04/04/2021 10:50:51  krbtgt/EMR.209749.COM@EMR.209749.COM
                renew until 04/01/2021 10:50:51
  5. Run the following command on the gateway cluster to set the HADOOP_CONF_DIR environment variable:
    export HADOOP_CONF_DIR=/etc/has/hadoop-conf
  6. Run the following hadoop command:
    hadoop fs -ls /
    Information similar to the following output is returned:
    Found 6 items
    drwxr-xr-x   - hadoop    hadoop          0 2021-03-29 11:16 /apps
    drwxrwxrwx   - flowagent hadoop          0 2021-03-29 11:18 /emr-flow
    drwxr-x---   - has       hadoop          0 2021-03-29 11:16 /emr-sparksql-udf
    drwxrwxrwt   - hadoop    hadoop          0 2021-03-29 11:17 /spark-history
    drwxr-x---   - hadoop    hadoop          0 2021-03-29 11:16 /tmp
    drwxrwxrwt   - hadoop    hadoop          0 2021-03-29 11:17 /user
    Note If you want to run a YARN job, you must first add the Linux account to all nodes of the EMR cluster. For more information about how to add the account to a node, see RAM authentication.
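
The check in Step 2 can also be scripted instead of being done in an editor. The following Java sketch uses the XML parser that ships with the JDK to report the value of hadoop.security.authentication.use.has in a copy of core-site.xml. The class name CoreSiteCheck and the method propertyValue are hypothetical names introduced for illustration. The sketch assumes the usual Hadoop layout of property elements that contain name and value elements, returns only the first match, and never modifies the file.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: the class and method names are illustrative only.
class CoreSiteCheck {
    static final String KEY = "hadoop.security.authentication.use.has";

    // Returns the trimmed <value> of the first <property> whose <name> matches,
    // or null if the XML text contains no such property.
    static String propertyValue(String xml, String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Default path taken from Step 2; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "/etc/ecm/hadoop-conf/core-site.xml");
        if (!Files.isReadable(path)) {
            System.out.println("Cannot read " + path);
            return;
        }
        String xml = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        String value = propertyValue(xml, KEY);
        System.out.println(KEY + " = " + value);
        if (!"false".equals(value)) {
            System.out.println("Set the parameter to false as described in Step 2.");
        }
    }
}
```

With JDK 11 or later, the file can be run directly, for example: java CoreSiteCheck.java /etc/ecm/hadoop-conf/core-site.xml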

Access HDFS by using Java code

  • Use a local ticket cache
    Note You must run the kinit command to obtain a ticket in advance. If an application attempts to use an expired ticket, an error occurs.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    // The class name is arbitrary.
    public class TicketCacheExample {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Load the HDFS configuration files. You can copy them from the EMR cluster.
            conf.addResource(new Path("/etc/ecm/hadoop-conf/hdfs-site.xml"));
            conf.addResource(new Path("/etc/ecm/hadoop-conf/core-site.xml"));
            // Before you run this code, run the kinit command under the same Linux account to obtain a ticket.
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromSubject(null);
            FileSystem fs = FileSystem.get(conf);
            FileStatus[] fsStatus = fs.listStatus(new Path("/"));
            for (FileStatus status : fsStatus) {
                System.out.println(status.getPath().toString());
            }
        }
    }
  • (Recommended) Use the keytab file
    Note The keytab file does not expire, and its validity is independent of local tickets.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    // The class name is arbitrary.
    public class KeytabExample {
        public static void main(String[] args) throws IOException {
            // args[0]: path of the keytab file. args[1]: name of the principal.
            String keytab = args[0];
            String principal = args[1];
            Configuration conf = new Configuration();
            // Load the HDFS configuration files. You can copy them from the EMR cluster.
            conf.addResource(new Path("/etc/ecm/hadoop-conf/hdfs-site.xml"));
            conf.addResource(new Path("/etc/ecm/hadoop-conf/core-site.xml"));
            // Log on with the keytab file. You can copy the keytab file from the emr-header-1 node of the EMR cluster.
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab(principal, keytab);
            FileSystem fs = FileSystem.get(conf);
            FileStatus[] fsStatus = fs.listStatus(new Path("/"));
            for (FileStatus status : fsStatus) {
                System.out.println(status.getPath().toString());
            }
        }
    }
    Dependencies in the pom.xml file:
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>x.x.x</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>x.x.x</version>
      </dependency>
    </dependencies>
    Note x.x.x indicates the Hadoop version of the EMR cluster.
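
The local ticket cache method fails when the cached ticket has expired, so an application may want to check the expiry time before it starts. The following Java sketch parses text in the layout of the klist sample output shown earlier and extracts the Expires value of the krbtgt entry. The class name TicketExpiryCheck and its methods are hypothetical names introduced for illustration. The sketch assumes the MM/dd/yyyy HH:mm:ss timestamp format of the sample output, which may differ on systems with other locale settings.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: the class and method names are illustrative only.
class TicketExpiryCheck {
    // Assumption: timestamps use the format shown in the sample klist output.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Returns the "Expires" time of the first krbtgt/ entry, or null if none is found.
    static LocalDateTime tgtExpiry(String klistOutput) {
        for (String line : klistOutput.split("\\R")) {
            String[] t = line.trim().split("\\s+");
            // Ticket line: start date, start time, expiry date, expiry time, service principal.
            if (t.length == 5 && t[4].startsWith("krbtgt/")) {
                try {
                    return LocalDateTime.parse(t[2] + " " + t[3], FMT);
                } catch (DateTimeParseException e) {
                    return null; // Unexpected timestamp format.
                }
            }
        }
        return null;
    }

    // Treats a missing or unparseable ticket entry as expired.
    static boolean isExpired(String klistOutput, LocalDateTime now) {
        LocalDateTime expiry = tgtExpiry(klistOutput);
        return expiry == null || !expiry.isAfter(now);
    }

    public static void main(String[] args) {
        // Sample text copied from the klist output in this topic.
        String sample = String.join("\n",
                "Valid starting       Expires              Service principal",
                "03/30/2021 10:48:47  03/31/2021 10:48:47  krbtgt/EMR.209749.COM@EMR.209749.COM",
                "        renew until 03/31/2021 10:48:47");
        System.out.println("Ticket expires: " + tgtExpiry(sample));
        System.out.println("Expired now: " + isExpired(sample, LocalDateTime.now()));
    }
}
```

In a shell script, the klist -s command may be a simpler check: it produces no output and reports through its exit status whether a valid credentials cache was found.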