This topic describes the high security feature of YARN and related configurations.

Background information

If you turn on Kerberos Authentication when you create an E-MapReduce (EMR) cluster, the cluster is a high-security cluster. This allows you to use the high security feature of YARN. By default, the following features are enabled for the YARN service in a high-security cluster to limit user behavior and ensure the data security of the cluster:

Kerberos authentication

When you create a high-security cluster, EMR configures the Kerberos-related parameters for YARN. No manual configuration is required. For more information about Kerberos, see Overview.

For a high-security cluster, a client can access the remote procedure call (RPC) and HTTP services of YARN only after it passes Kerberos authentication. Sample code:
# RPC access: obtain a Kerberos ticket, call the YARN RPC service, then discard the ticket
kinit
yarn node -list
kdestroy

# HTTP access: authenticate over SPNEGO (--negotiate) by using the Kerberos ticket
kinit
curl --negotiate -u: http://master-1-1:8088/ws/v1/cluster/nodes
kdestroy

ACL-based authorization

For a high-security cluster, the access control list (ACL) feature of YARN is automatically enabled. By default, only the hadoop user group to which the service account belongs is granted the permissions to manage the YARN service and queues and to submit YARN jobs. You can configure the yarn.acl.enable parameter to specify whether to enable the ACL feature of YARN. The value true indicates that the feature is enabled.
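For reference, this setting corresponds to the following yarn-site.xml excerpt. The parameter is a standard Hadoop parameter; the file layout shown here is only a sketch:
<!-- yarn-site.xml: enable ACL checks in YARN (the default in high-security clusters) -->
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>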

Management permissions of YARN

The default configuration for the high security feature is yarn.admin.acl= hadoop (note the space before hadoop). This setting designates the hadoop user group as the service administrator. In most cases, the processes of an EMR cluster are started by Linux users that belong to the hadoop user group. By default, Hadoop maps users to user groups based on the group information of the operating system on a node.

Note The yarn.admin.acl parameter takes a value in the users user_groups format: a list of users, followed by a space, followed by a list of user groups. Separate multiple users with commas (,), and separate multiple user groups with commas (,). Example: user1,user2 group1,group2. If the value contains only user groups, the value must start with a space. If you set the yarn.admin.acl parameter to a single space, no user or user group is granted administrator permissions.
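The following yarn-site.xml excerpt illustrates the format. The users and user groups shown are placeholders, not values from an actual cluster:
<!-- yarn-site.xml: users before the space, user groups after it -->
<property>
  <name>yarn.admin.acl</name>
  <value>user1,user2 group1,group2</value>
</property>
<!-- Groups only: start the value with a space, for example " hadoop" -->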

Queue management permissions of YARN

Queue permissions of YARN include the permissions to submit jobs and the permissions to manage queues.

If you create an EMR high-security cluster, the default configurations for Capacity Scheduler in the capacity-scheduler.xml file of the YARN service are yarn.scheduler.capacity.root.acl_submit_applications= (the value is a single space, so the submit ACL itself grants access to no one) and yarn.scheduler.capacity.root.acl_administer_queue= hadoop (a space exists before hadoop). This indicates that queue management permissions are granted to the hadoop user group. Because queue administrators are also allowed to submit jobs to the queues that they manage, users that do not belong to the hadoop user group cannot submit jobs to queues. To view the user groups to which the current user belongs, run the id command.
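Expressed as a capacity-scheduler.xml excerpt, the defaults described above look as follows (the values are reproduced from this topic; the file layout is only a sketch):
<!-- capacity-scheduler.xml: default queue ACLs in a high-security cluster -->
<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <value> </value>  <!-- a single space: the submit ACL grants access to no one -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
  <value> hadoop</value>  <!-- a leading space, then the hadoop user group -->
</property>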

If your cluster is used by only a few users, and you do not require the ACL feature, you can clear the settings of the preceding two parameters. Then, perform the following steps to update the queue configurations: On the Status tab of the YARN service page in the EMR console, find the ResourceManager component, move the pointer over the More icon in the Actions column, and then select refresh_queues. In the dialog box that appears, configure the Execution Reason parameter and click OK. In the Confirm message, click OK. This way, the ACL feature is disabled.

Usage of queue management permissions
  • If you want to use the ACL feature, we recommend that you use the feature together with Ranger. This way, you can configure and manage user queue permissions in a visualized manner.
    Note Ranger supports only Capacity Scheduler. For more information, see Enable YARN in Ranger and configure related permissions.
  • To use the ACL feature, add the yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications and yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue parameters to the capacity-scheduler.xml configuration file for ACL authorization of queues. For more information, see Queue Properties and the ACL configurations of queues section in the YARN schedulers topic.

    The ACL feature can be used together with the queue mapping feature of Capacity Scheduler. To map users or user groups to queues, configure the yarn.scheduler.capacity.queue-mappings parameter in the capacity-scheduler.xml configuration file. If you also set the yarn.scheduler.capacity.queue-mappings-override.enable parameter to true, the mappings take precedence over the queue that is specified when a job is submitted. This way, jobs from the mapped users or user groups are placed in the queues to which the users or user groups are mapped. See the sample configuration after this list.
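    The following capacity-scheduler.xml excerpt sketches both the queue ACL parameters and the queue mapping parameters. The queue dev, the user user1, and the user group data_team are placeholders for illustration:
    <!-- capacity-scheduler.xml: ACLs on a hypothetical queue root.dev -->
    <property>
      <name>yarn.scheduler.capacity.root.dev.acl_submit_applications</name>
      <value>user1 data_team</value>  <!-- user1 and the data_team group can submit jobs -->
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.dev.acl_administer_queue</name>
      <value> hadoop</value>  <!-- only the hadoop user group can manage the queue -->
    </property>
    <!-- Queue mapping: place jobs from the data_team group in the dev queue -->
    <property>
      <name>yarn.scheduler.capacity.queue-mappings</name>
      <value>g:data_team:dev</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
      <value>true</value>  <!-- mappings override the queue specified at submission -->
    </property>
    After you modify these parameters, refresh the queue configurations as described in the previous section for the changes to take effect.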

Job management permissions of YARN

Job management permissions of YARN include VIEW_APP and MODIFY_APP. The VIEW_APP permission is used to view information about a job, and the MODIFY_APP permission is used to modify a job.

  • VIEW_APP: allows you to view information about jobs and the logs of YARN components. VIEW_APP covers only the checks that YARN performs; it does not include the permissions that compute engines configure on their side to further limit what users can view.

    By default, the mapred-site.xml file contains the mapreduce.job.acl-view-job=* setting. This indicates that the VIEW_APP permission on MapReduce jobs is granted to all users.

  • MODIFY_APP: allows you to modify a job, for example, to stop a job in YARN. You can manage this type of access at the queue level by using the ADMINISTER_QUEUE permission, because queue administrators can also perform these operations on the jobs in their queues. For a sample configuration of both permissions, see the excerpt after this list.
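As an illustration, the following mapred-site.xml excerpt restricts both permissions for MapReduce jobs. The user and user group names are placeholders; the job owner and cluster administrators retain access regardless of these ACLs:
<!-- mapred-site.xml: per-job ACLs for MapReduce -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <value>user1 data_team</value>  <!-- only user1 and the data_team group can view the job -->
</property>
<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value> hadoop</value>  <!-- only the hadoop group can modify (for example, kill) the job -->
</property>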

LCE

By default, the containers of the YARN service in a non-high-security cluster use the DefaultContainerExecutor implementation class. All containers then run as the account that starts NodeManager (the hadoop account), so jobs that are submitted by different users are not isolated from each other by authentication. Users can maliciously access or modify files and configurations that are related to YARN, and tenants can maliciously access the resources of each other. EMR high-security clusters use Linux Container Executor (LCE) to run secure containers. LCE uses a setuid binary to run containers as the account of the user who submits the job. This way, high-risk and unnecessary permissions on containers are revoked.
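In a high-security cluster, LCE is preconfigured. For reference, the core yarn-site.xml settings that select LCE typically look like the following. This is a sketch of the standard Hadoop parameters, not the full EMR configuration:
<!-- yarn-site.xml: run containers with LinuxContainerExecutor -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>  <!-- the group that owns the setuid container-executor binary -->
</property>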

To use LCE, make sure that the operating system on which NodeManager runs has a Linux account that corresponds to each user who submits jobs. To add a user, we recommend that you use the user management feature in the EMR console. This way, the user is added to the OpenLDAP service of the EMR cluster and mapped to a Linux account by the nslcd service on each node. If you use this method to add a user, the OpenLDAP service must be deployed in your cluster. Alternatively, you can manage Linux accounts yourself: add the Linux account that corresponds to the job user on each node of your cluster, and add the required bootstrap action scripts so that the account is automatically created on new nodes.