
Introduction to Kerberos

Last Updated: Mar 23, 2018

Starting from EMR-2.7.x and EMR-3.5.x, E-MapReduce supports creating secure clusters, in which the open source components are started in Kerberos security mode. In this mode, only authenticated clients can access cluster services such as HDFS.

Prerequisites

The components that support Kerberos in the current E-MapReduce version are listed in the following table:

Component Name    Component Version
HDFS              2.7.2
YARN              2.7.2
SPARK             2.1.1/1.6.3
HIVE              2.0.1
TEZ               0.8.4
ZOOKEEPER         3.4.6
HUE               3.12.0
ZEPPELIN          0.7.1
OOZIE             4.2.0
SQOOP             1.4.6
HBASE             1.1.1
PHOENIX           4.7.0

Note: Kafka, Presto, and Storm do not currently support Kerberos.

1. Create a secure cluster

On the software configuration tab of the cluster creation page, toggle on the Security button, as shown in the following figure:

[Figure: Create cluster]

2. Kerberos identity authentication principle

Kerberos is an identity authentication protocol based on symmetric-key cryptography. As an independent third-party authentication service, Kerberos can authenticate identities on behalf of other services, and it supports single sign-on (SSO): after a client is authenticated once, it can access multiple services, such as HBase and HDFS.

The Kerberos protocol has two main stages. In the first stage, the KDC authenticates the client's identity. In the second stage, the Service authenticates the client's identity.

[Figure: kerberos]

  • KDC

    The Kerberos server (Key Distribution Center).

  • Client

    A user (principal) that needs to access a Service. Both the KDC and the Service authenticate the principal's identity.

  • Service

    A service that is integrated with Kerberos, such as HDFS, YARN, or HBase.

2.1 KDC ID authentication

Before a client user (principal) can access a service integrated with Kerberos, it must first pass the KDC ID authentication.

After passing the KDC ID authentication, the client receives a TGT (Ticket Granting Ticket), which it can use to access services that are integrated with Kerberos.

2.2 Service ID authentication

After a principal receives the TGT in step 2.1, it can access the Service. It uses the TGT and the name of the service that it wants to access (such as HDFS) to obtain an SGT (Service Granting Ticket) from the KDC, and then presents the SGT to the Service. The Service uses the relevant information to authenticate the client. After passing this ID authentication, the client can access the Service normally.
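From a client's point of view, the two stages can be sketched with the standard MIT Kerberos command-line tools. This is a minimal illustration, not the procedure prescribed for EMR: the principal name is a hypothetical placeholder, and the `run` wrapper and `DRY_RUN` variable are helpers introduced here so that the commands are only printed unless you explicitly enable execution.

```shell
#!/bin/sh
# Sketch of the two authentication stages as seen by a client.
# PRINCIPAL is a hypothetical placeholder, not a value from this document.
PRINCIPAL="${PRINCIPAL:-test@EXAMPLE.COM}"

# Print each command instead of running it unless DRY_RUN=0 is set,
# so the sketch can be tried safely outside a Kerberos cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

kerberos_flow() {
  # Stage 1: authenticate to the KDC. On success, a TGT is stored in the
  # local credential cache (kinit prompts for the principal's password).
  run kinit "$PRINCIPAL"
  # Show the cached tickets; the TGT is listed as krbtgt/REALM@REALM.
  run klist
  # Stage 2: the Hadoop client uses the cached TGT to request a ticket
  # for HDFS from the KDC and presents that ticket to the service.
  run hadoop fs -ls /
}

kerberos_flow
```

To execute the commands for real on a node that has the Kerberos and Hadoop clients installed, run the script with `DRY_RUN=0` and a principal that exists in your KDC.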

3. EMR practice

Services in an EMR Kerberos security cluster are started in Kerberos security mode when the cluster is created.

a) The Kerberos server is HasServer

Log on to the EMR console and choose Cluster Configuration Management > HAS. From there, you can view the configuration, modify the configuration, and restart the service.

In non-HA clusters, HasServer is deployed on emr-header-1. In HA clusters, it is deployed on both the emr-header-1 and emr-header-2 nodes.

b) Supports four ID authentication methods

HasServer supports the following four ID authentication methods. The client specifies which method HasServer uses by configuring the related parameters.

  • ID authentication compatible with MIT Kerberos

    Client configuration:

    1. If you want to execute a client request on a cluster node, you must set hadoop.security.authentication.use.has in /etc/ecm/hadoop-conf/core-site.xml to false.
    2. If any jobs are run through the execution plan of the console, the values in the /etc/ecm/hadoop-conf/core-site.xml file on the master node must not be modified. Otherwise, the jobs in the execution plan fail authentication. In that case, export a temporary environment variable instead:
       export HADOOP_CONF_DIR=/etc/has/hadoop-conf
       The hadoop.security.authentication.use.has value under this path has already been set to false.

    Access method:

    You can use open source clients, such as the HDFS client, to access the Service. For more information, click here.

  • RAM ID authentication

    Client configuration:

    1. If you want to run a client request on a cluster node, you must set hadoop.security.authentication.use.has in /etc/ecm/hadoop-conf/core-site.xml to true, and auth_type in /etc/has/has-client.conf to RAM.
    2. If any jobs are run through the execution plan of the console, the values in the /etc/ecm/hadoop-conf/core-site.xml and /etc/has/has-client.conf files on the master node must not be modified. Otherwise, the jobs in the execution plan fail authentication. In that case, export temporary environment variables instead:
       export HADOOP_CONF_DIR=/etc/has/hadoop-conf; export HAS_CONF_DIR=/path/to/has-client.conf
       Then set auth_type in the has-client.conf file in the HAS_CONF_DIR folder to RAM.

    Access method: The client must use a software package of the cluster (such as Hadoop and HBase). For more information, click here.

  • LDAP ID authentication

    Client configuration:

    1. If you want to execute a client request on a cluster node, you must set hadoop.security.authentication.use.has in /etc/ecm/hadoop-conf/core-site.xml to true, and auth_type in /etc/has/has-client.conf to LDAP.
    2. If any jobs are run through the execution plan of the console, the values in the /etc/ecm/hadoop-conf/core-site.xml and /etc/has/has-client.conf files on the master node must not be modified. Otherwise, the jobs in the execution plan fail authentication. In that case, export temporary environment variables instead:
       export HADOOP_CONF_DIR=/etc/has/hadoop-conf; export HAS_CONF_DIR=/path/to/has-client.conf
       Then set auth_type in the has-client.conf file in the HAS_CONF_DIR folder to LDAP.

    Access method: The client must use a software package of the cluster (such as Hadoop and HBase). For more information, click here.

  • Execution plan authentication

    If you have jobs submitted through the execution plan of the EMR console, you must not modify the default configuration of the emr-header-1 node.

    Client configuration:

    1. Set hadoop.security.authentication.use.has in /etc/ecm/hadoop-conf/core-site.xml to true, and auth_type in /etc/has/has-client.conf on emr-header-1 to EMR.

    For more information, click here.
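The RAM and LDAP procedures above both switch a single session to a different authentication method without touching the default files on the master node. A minimal shell sketch of that pattern follows. The `set_auth_type` helper and the `$HOME/has-conf` directory are hypothetical names introduced for illustration, and the sketch assumes that has-client.conf stores the setting as an `auth_type = VALUE` line.

```shell
#!/bin/sh
# set_auth_type SRC DEST_DIR TYPE
# Copy the has-client.conf file SRC into DEST_DIR and set auth_type to
# TYPE in the copy only; SRC itself is never modified.
# Hypothetical helper: assumes the setting is an "auth_type = VALUE" line.
set_auth_type() {
  mkdir -p "$2" || return 1
  sed "s/^\([[:space:]]*auth_type[[:space:]]*=[[:space:]]*\).*/\1$3/" "$1" \
    > "$2/has-client.conf"
}

# Usage on a cluster node (the target directory is a hypothetical choice):
#   set_auth_type /etc/has/has-client.conf "$HOME/has-conf" RAM
#   export HADOOP_CONF_DIR=/etc/has/hadoop-conf
#   export HAS_CONF_DIR="$HOME/has-conf"
```

Because the exports only affect the current shell session, jobs submitted through the execution plan keep using the default configuration.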

c) Others

Log on to the master node to access the cluster

The cluster administrator can also log on to the master node to access the cluster service. The administrator can use the has account (which uses the MIT-Kerberos-compatible method by default) to log on to the master node and access the cluster service, which is convenient for troubleshooting and O&M tasks.

  sudo su has
  hadoop fs -ls /

Note: Other accounts can also be used to log on to the master node, provided that they have already passed Kerberos authentication. In addition, to use the MIT-Kerberos-compatible method on the master node, you must first export the following environment variable under that account:

  export HADOOP_CONF_DIR=/etc/has/hadoop-conf/
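
Before running commands, it can help to confirm which value of hadoop.security.authentication.use.has the current session will pick up. The sketch below reads it from the core-site.xml file in the directory that HADOOP_CONF_DIR points to, falling back to /etc/ecm/hadoop-conf. The `get_hadoop_prop` helper is a hypothetical name introduced here, and it assumes that each property's `<name>` and `<value>` elements are on the same line or on adjacent lines.

```shell
#!/bin/sh
# get_hadoop_prop FILE KEY
# Print the <value> of property KEY from a Hadoop XML configuration file.
# Hypothetical helper: assumes <name> and <value> sit on the same line
# or on adjacent lines, as in typical core-site.xml files.
get_hadoop_prop() {
  grep -F -A1 "<name>$2</name>" "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
    head -n 1
}

# Use the session's HADOOP_CONF_DIR if set, else the default directory.
conf_dir="${HADOOP_CONF_DIR:-/etc/ecm/hadoop-conf}"
conf_file="$conf_dir/core-site.xml"

if [ -f "$conf_file" ]; then
  echo "use.has = $(get_hadoop_prop "$conf_file" hadoop.security.authentication.use.has)"
else
  echo "No core-site.xml under $conf_dir (not a cluster node?)"
fi
```

A value of false corresponds to the MIT-Kerberos-compatible method. A value of true means that the auth_type setting in has-client.conf (RAM, LDAP, or EMR) selects the method.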