After HDFS authorization is enabled, only users that are granted the required permissions can access HDFS and perform operations such as reading data and creating folders. This topic describes how to enable HDFS authorization.

Prerequisites

An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.

Background information

Hadoop provides the following modes to determine user identities:
  • Simple mode: The user identity is determined by the operating system of the client that is connected to HDFS. If the client runs a UNIX-like operating system, the whoami command is run to obtain the logon username, as illustrated by the example after this list.
  • Kerberos mode: The user identity of a client is determined by its Kerberos credentials.

    You can turn on Kerberos Mode when you create an EMR cluster. For more information about Kerberos, see Introduction to Kerberos.
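
    For example, in simple mode the HDFS identity is simply the operating system user on the client and is not authenticated. The following sketch assumes that an OS user named test already exists on the client; the directory name is only an example:

    su test                      # switch to the OS user test
    whoami                       # prints: test
    hadoop fs -mkdir /tmp/demo   # the new directory is owned by the HDFS user test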

Usage notes

  • You can change the umask value based on your business requirements. The example after this list shows the permissions that result from the default umask.
  • HDFS is a basic service on which many other services, such as Hive and HBase, rely. Before you configure these upper-layer services, you must configure HDFS authorization.
  • After you enable HDFS authorization, you must make sure that each service has the required permissions on its log storage path. For example, the log storage path of Spark is /spark-history, and the log storage path of YARN is /tmp/$user/.
  • You can set a sticky bit on a folder to prevent users other than superusers, file owners, and directory owners from deleting files or subfolders in the folder, even if those users have rwx permissions on the folder.
    For example, on an HDFS client, use an account that has HDFS administrator permissions to run the following command to modify the permissions on the /user directory:
    hdfs dfs -chmod 1777 /user

    In this command, 777 is preceded by 1, which sets the sticky bit on the directory. As a result, a file or subfolder in /user can be deleted only by its owner, the owner of /user, or a superuser, as shown in the example after this list.
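
    The following example shows how you can verify both the default umask behavior and the sticky bit behavior on an HDFS client. The file and directory names are examples, and the output in the comments assumes the default simple-mode umask of 022:

    # With umask 022, new files are created with the 644 permissions and new
    # folders are created with the 755 permissions.
    hadoop fs -touchz /tmp/umask-demo-file
    hadoop fs -mkdir /tmp/umask-demo-dir
    hadoop fs -ls /tmp | grep umask-demo
    # -rw-r--r--   ... /tmp/umask-demo-file
    # drwxr-xr-x   ... /tmp/umask-demo-dir

    # After the sticky bit is set on /user, a regular user cannot delete a file
    # that belongs to another user, even though /user is world-writable. The
    # following command fails with a permission error when it is run by a user
    # other than the file owner, the directory owner, or a superuser:
    hadoop fs -rm /user/otheruser/somefile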

Procedure

Notice For a cluster deployed in Kerberos mode, HDFS permissions are automatically configured (with umask set to 027). You do not need to configure HDFS authorization or restart HDFS. For a cluster that is not deployed in Kerberos mode, you must perform the following steps to configure HDFS authorization and restart HDFS.
  1. Go to the Cluster Overview page.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  2. In the left-side navigation pane of the Cluster Overview page, choose Cluster Service > HDFS.
  3. Modify parameters.
    1. On the HDFS page, click the Configure tab.
    2. In the Configuration Filter section, enter the keyword of each parameter that you want to modify in the search box and click the search icon.
    3. Modify the parameters based on your business requirements in the Service Configuration section.
      The following parameters are available:
      • dfs.permissions.enabled: Specifies whether permission checks are enabled. Default value: false. To enable permission checks, set this parameter to true.
      • dfs.datanode.data.dir.perm: The permissions on the local storage directories of DataNodes. Default value: 755.
      • fs.permissions.umask-mode: The permission mask that is applied when a file or folder is created.
        • Default value in simple mode: 022. If the default value is used, files are created with the 644 permissions (0666 & ~022 = 644) and folders are created with the 755 permissions (0777 & ~022 = 755).
        • Default value in Kerberos mode: 027. If the default value is used, files are created with the 640 permissions and folders are created with the 750 permissions.
      • dfs.namenode.acls.enabled: Specifies whether access control lists (ACLs) are enabled. Default value: false. After you set this parameter to true, you can manage the permissions of owners, user groups, and other users.
        The commands used to configure ACLs are hadoop fs -getfacl [-R] <path> and hadoop fs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>].
      • dfs.permissions.superusergroup: The name of the superuser group. Default value: hadoop. All users in this group are superusers.
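
      After you save the configuration and restart the HDFS service in the next step, you can verify the values that a client resolves by running the hdfs getconf command on a cluster node. The expected values in the comments assume the settings described above:

      hdfs getconf -confKey dfs.permissions.enabled          # expected: true
      hdfs getconf -confKey fs.permissions.umask-mode        # for example: 022
      hdfs getconf -confKey dfs.permissions.superusergroup   # default: hadoop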
  4. Restart the HDFS service.
    1. In the upper-right corner of the HDFS service page, choose Actions > Restart All Components.
    2. In the Cluster Activities dialog box, specify Description and click OK.
    3. In the Confirm message, click OK.
      You can click History in the upper-right corner to view the task progress.

Example

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
  2. Run the following command to switch to the test user that you created:
    su test
  3. Run the following command to use the test user to create a directory:
    hadoop fs -mkdir /tmp/test
  4. Run the following command to view the permissions of the created directory:
    hadoop fs -ls /tmp
    Information similar to the following output is returned:
    Found 4 items
    drwxrwxrwx   - root   hadoop          0 2021-06-08 13:14 /tmp/hadoop-yarn
    drwx-wx-wx   - hadoop hadoop          0 2021-06-16 15:54 /tmp/hive
    drwxrwxrwt   - hadoop hadoop          0 2021-06-08 13:16 /tmp/logs
    drwxr-x--x   - test   hadoop          0 2021-06-16 17:15 /tmp/test
  5. Run the following command to configure an ACL for the directory and grant User foo the rwx permissions:
    hadoop fs -setfacl -m user:foo:rwx /tmp/test
  6. Run the following command to view the permissions of the directory:
    hadoop fs -ls /tmp/
    Information similar to the following output is returned:
    Found 4 items
    drwxrwxrwx   - root   hadoop          0 2021-06-08 13:14 /tmp/hadoop-yarn
    drwx-wx-wx   - hadoop hadoop          0 2021-06-16 15:54 /tmp/hive
    drwxrwxrwt   - hadoop hadoop          0 2021-06-08 13:16 /tmp/logs
    drwxrwx--x+  - test   hadoop          0 2021-06-16 17:15 /tmp/test
    Note If a plus sign (+) follows the permission bits, an ACL is configured on the directory. Example: drwxrwx--x+.
  7. Run the following command to view the ACL:
    hadoop fs -getfacl /tmp/test
    Information similar to the following output is returned:
    # file: /tmp/test
    # owner: test
    # group: hadoop
    user::rwx
    user:foo:rwx
    group::r-x
    mask::rwx
    other::--x
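
  If you no longer need the ACL, you can remove the entry for User foo or all extended ACL entries. The -x and -b options used here are the ones listed in the description of the dfs.namenode.acls.enabled parameter:
    hadoop fs -setfacl -x user:foo /tmp/test   # removes only the ACL entry for User foo
    hadoop fs -setfacl -b /tmp/test            # removes all extended ACL entries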