
E-MapReduce:Enable HDFS authorization

Last Updated:Jul 07, 2023

After Hadoop Distributed File System (HDFS) authorization is enabled, only users that have the required permissions can access HDFS and perform operations such as reading data and creating folders. This topic describes how to enable HDFS authorization.

Background information

Hadoop provides the following modes to determine user identities:

  • Simple mode: User identities are determined by the operating system of the client that is connected to HDFS. If the client runs a UNIX-like operating system, the whoami command is used to obtain the logon username.

  • Kerberos mode: The user identity of a client is determined by its Kerberos credentials.

    You can turn on Kerberos Authentication when you create an E-MapReduce (EMR) cluster. For more information about Kerberos authentication, see Overview.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Procedure

Note
  • For a cluster that is deployed in Kerberos mode, HDFS permissions are automatically specified (with umask set to 027). You do not need to configure HDFS authorization and restart the HDFS service.

  • For a cluster that is not deployed in Kerberos mode, you must perform the following steps to configure HDFS authorization and restart the HDFS service.
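The umask value of 027 mentioned in the note determines default permissions the same way as on POSIX file systems: the mask is cleared (bitwise) from the base mode 666 for files and 777 for directories. A short Python sketch, purely illustrative and not part of EMR:

```python
# Illustration of how umask 027 yields default HDFS permissions.
# HDFS applies the umask to base modes 0666 (files) and 0777 (directories),
# in the same way as POSIX file systems.

UMASK = 0o027

def default_mode(base: int, umask: int = UMASK) -> str:
    """Return the effective default mode as an octal string."""
    return oct(base & ~umask)

print(default_mode(0o666))  # files:       0o640 -> rw-r-----
print(default_mode(0o777))  # directories: 0o750 -> rwxr-x---
```

With umask 027, the owner keeps full access, the group loses write access, and other users get no access, which matches the directory listings shown later in this topic.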

  1. Go to the Services tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

  2. On the Services tab, find the HDFS service and click Configure.

  3. On the Configure tab, modify the parameters based on your business requirements.

    • dfs.permissions.enabled: Specifies whether to enable permission checks. Default value: false. To enable permission checks, set this parameter to true.

    • dfs.datanode.data.dir.perm: The permissions on the local storage directories of DataNodes.

    • fs.permissions.umask-mode: The permission mask (umask). The mask is applied by default when you create a file or a folder.

    • dfs.namenode.acls.enabled: Specifies whether to enable access control lists (ACLs). Default value: false. To enable ACLs, set this parameter to true. After ACLs are enabled, you can manage the permissions of owners, user groups, and other users by using the hadoop fs -getfacl [-R] <path> and hadoop fs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>] commands.

    • dfs.permissions.superusergroup: The name of the superuser group. Default value: hadoop. All users in this group are superusers.
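Outside the EMR console, these settings map to standard Hadoop properties. The following is a minimal sketch of the equivalent hdfs-site.xml and core-site.xml entries; the values shown are illustrative examples, not EMR defaults:

```xml
<!-- hdfs-site.xml: illustrative values, not EMR defaults -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>

<!-- core-site.xml -->
<property>
  <name>fs.permissions.umask-mode</name>
  <value>027</value>
</property>
```

On an EMR cluster, modify these values on the Configure tab of the HDFS service instead of editing the files directly, so that the console configuration remains authoritative.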

  4. Save the modifications.

    1. Click Save in the lower part of the page.

    2. In the dialog box that appears, configure the Execution Reason parameter and click Save.

  5. Restart the HDFS service.

    1. In the upper-right corner of the HDFS service page, choose More > Restart.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    3. In the Confirm message, click OK.

      You can click Operation History in the upper-right corner of the Services tab to view the task progress.

Example

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to switch to the emrtest user that you created:

    su emrtest
  3. Run the following command to use the emrtest user to create a directory:

    hadoop fs -mkdir /tmp/emrtest
  4. Run the following command to view the permissions of the created directory:

    hadoop fs -ls /tmp

    Information similar to the following output is returned:

    drwxr-x--x   - emrtest hadoop          0 2022-10-21 14:08 /tmp/emrtest
    drwxr-x--x   - hadoop  hadoop          0 2022-10-21 10:06 /tmp/hadoop-yarn
    drwx-wx-wx   - hive    hadoop          0 2022-10-21 10:13 /tmp/hive
    drwxr-x--x   - hadoop  hadoop          0 2022-10-21 10:23 /tmp/kyuubi-staging
    drwxrwxrwt   - hadoop  hadoop          0 2022-10-21 10:23 /tmp/logs                                 
  5. Run the following command to configure an ACL for the directory and grant User foo the rwx permissions:

    hadoop fs -setfacl -m user:foo:rwx /tmp/emrtest
  6. Run the following command to view the permissions of the directory:

    hadoop fs -ls /tmp/

    Information similar to the following output is returned:

    drwxrwx--x+  - emrtest hadoop          0 2022-10-21 14:08 /tmp/emrtest
    drwxr-x--x   - hadoop  hadoop          0 2022-10-21 10:06 /tmp/hadoop-yarn
    drwx-wx-wx   - hive    hadoop          0 2022-10-21 10:13 /tmp/hive
    drwxr-x--x   - hadoop  hadoop          0 2022-10-21 10:23 /tmp/kyuubi-staging
    drwxrwxrwt   - hadoop  hadoop          0 2022-10-21 10:23 /tmp/logs
    Note

    If a plus sign (+) follows the permission string, an ACL is configured for the file or directory. Example: drwxrwx--x+.

  7. Run the following command to view the ACL:

    hadoop fs -getfacl /tmp/emrtest

    Information similar to the following output is returned:

    # file: /tmp/emrtest
    # owner: emrtest
    # group: hadoop
    user::rwx
    user:foo:rwx
    group::r-x
    mask::rwx
    other::--x
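The getfacl output above has a regular structure: comment lines that describe the file, followed by ACL entries. The steps in this example can be checked programmatically, for instance in an auditing script. The following is a small sketch, not part of the EMR tooling, that parses this output into a dictionary:

```python
# Parse the output of `hadoop fs -getfacl` into metadata and ACL entries.
# The sample text below is the output shown in this topic.

GETFACL_OUTPUT = """\
# file: /tmp/emrtest
# owner: emrtest
# group: hadoop
user::rwx
user:foo:rwx
group::r-x
mask::rwx
other::--x
"""

def parse_getfacl(text: str) -> dict:
    """Split getfacl output into header metadata ('# key: value' lines)
    and plain ACL entries (everything else)."""
    meta, entries = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line.lstrip("# ").partition(": ")
            meta[key] = value
        else:
            entries.append(line)
    return {"meta": meta, "entries": entries}

acl = parse_getfacl(GETFACL_OUTPUT)
print(acl["meta"]["owner"])              # emrtest
print("user:foo:rwx" in acl["entries"])  # True
```

A script like this can confirm that the ACL entry granted in step 5 (user:foo:rwx) is present before downstream jobs run as that user.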