After Hadoop Distributed File System (HDFS) authorization is enabled, only users that have the required permissions can access HDFS and perform operations such as reading data and creating folders. This topic describes how to enable HDFS authorization.
Background information
Hadoop provides the following modes to determine user identities:
Simple mode: User identities are determined by the operating system of the client that is connected to HDFS. If the client runs a UNIX-like operating system, the whoami command is run to display the logon username.
Kerberos mode: The user identity of a client is determined by its Kerberos credentials.
You can turn on Kerberos Authentication when you create an E-MapReduce (EMR) cluster. For more information about Kerberos authentication, see Overview.
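To see which of the two modes applies on a given node, the configured value can be read from the client configuration. The following is a minimal sketch, not part of the EMR procedure. It assumes that the hdfs client is on the PATH and reads the standard Hadoop property hadoop.security.authentication, whose value is simple or kerberos. The variable names are illustrative.

```shell
#!/bin/sh
# In simple mode, HDFS takes the user identity from the client operating system.
os_user="$(whoami)"
echo "Operating system user: ${os_user}"

# Stays "unknown" when no HDFS client is installed on this machine.
auth_mode="unknown"

if command -v hdfs >/dev/null 2>&1; then
    # Read the authentication mode from the client configuration.
    auth_mode="$(hdfs getconf -confKey hadoop.security.authentication)"
    # In Kerberos mode, klist lists the tickets that determine the identity.
    if [ "${auth_mode}" = "kerberos" ]; then
        klist
    fi
else
    echo "hdfs client not found; run this on a cluster node." >&2
fi

echo "Authentication mode: ${auth_mode}"
```

If the reported mode is simple, the identity that HDFS sees is the operating system user printed on the first line.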
Prerequisites
An EMR cluster is created. For more information, see Create a cluster.
Procedure
For a cluster that is deployed in Kerberos mode, HDFS permissions are configured automatically (with umask set to 027). You do not need to configure HDFS authorization or restart the HDFS service.
For a cluster that is not deployed in Kerberos mode, you must perform the following steps to configure HDFS authorization and restart the HDFS service.
Go to the Services tab.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Services in the Actions column.
On the Services tab, find the HDFS service and click Configure.
On the Configure tab, modify the parameters based on your business requirements.
Parameter
    Description
dfs.permissions.enabled
    Specifies whether to enable permission check. Default value: false. To enable permission check, set this parameter to true.
dfs.datanode.data.dir.perm
    The permissions on the local directories in which DataNodes store data.
fs.permissions.umask-mode
    The permission mask. This is the default setting that is applied when you create a file or a folder.
dfs.namenode.acls.enabled
    Specifies whether to enable access control lists (ACLs). Default value: false. To enable ACLs, set this parameter to true. Then, you can manage the permissions of owners, user groups, and other users.
    The commands that are used to view and configure ACLs are:
    hadoop fs -getfacl [-R] <path>
    hadoop fs -setfacl [-R] [-b |-k -m |-x <acl_spec> <path>] |[--set <acl_spec> <path>]
dfs.permissions.superusergroup
    The name of the superuser group. Default value: hadoop. All users in this group are superusers.
Save the modifications.
Click Save in the lower part of the page.
In the dialog box that appears, configure the Execution Reason parameter and click Save.
Restart the HDFS service.
In the upper-right corner of the HDFS service page, choose .
In the dialog box that appears, configure the Execution Reason parameter and click OK.
In the Confirm message, click OK.
You can click Operation History in the upper-right corner of the Services tab to view the task progress.
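The effect of the fs.permissions.umask-mode parameter can be worked out by hand: HDFS starts from 777 for a new directory and 666 for a new file, and then clears the bits that are set in the mask. The following sketch only does this arithmetic locally and does not contact the cluster. apply_umask is a hypothetical helper written for this illustration. The mask 027 is the value mentioned for Kerberos clusters. The mask 026 is an assumption, chosen because it is consistent with the drwxr-x--x mode shown in the sample listing in the Example section.

```shell
#!/bin/sh
# apply_umask BASE MASK: print the permission bits that remain after the
# mask is applied. Both arguments are octal strings, for example 777 and 027.
# Hypothetical helper for illustration only; it does not call HDFS.
apply_umask() {
    printf '%03o\n' "$(( 0$1 & ~0$2 ))"
}

apply_umask 777 027   # new directory with mask 027: prints 750 (rwxr-x---)
apply_umask 666 027   # new file with mask 027:      prints 640 (rw-r-----)
apply_umask 777 026   # new directory with mask 026: prints 751 (rwxr-x--x)
```

On a cluster node, the mask that is currently configured can be read with hdfs getconf -confKey fs.permissions.umask-mode.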
Example
Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to switch to the created emrtest user:
su emrtest
Run the following command to use the emrtest user to create a directory:
hadoop fs -mkdir /tmp/emrtest
Run the following command to view the permissions of the created directory:
hadoop fs -ls /tmp
Information similar to the following output is returned:
drwxr-x--x - emrtest hadoop 0 2022-10-21 14:08 /tmp/emrtest
drwxr-x--x - hadoop hadoop 0 2022-10-21 10:06 /tmp/hadoop-yarn
drwx-wx-wx - hive hadoop 0 2022-10-21 10:13 /tmp/hive
drwxr-x--x - hadoop hadoop 0 2022-10-21 10:23 /tmp/kyuubi-staging
drwxrwxrwt - hadoop hadoop 0 2022-10-21 10:23 /tmp/logs
Run the following command to configure an ACL for the directory and grant User foo the rwx permissions:
hadoop fs -setfacl -m user:foo:rwx /tmp/emrtest
Run the following command to view the permissions of the directory:
hadoop fs -ls /tmp/
Information similar to the following output is returned:
drwxrwx--x+ - emrtest hadoop 0 2022-10-21 14:08 /tmp/emrtest
drwxr-x--x - hadoop hadoop 0 2022-10-21 10:06 /tmp/hadoop-yarn
drwx-wx-wx - hive hadoop 0 2022-10-21 10:13 /tmp/hive
drwxr-x--x - hadoop hadoop 0 2022-10-21 10:23 /tmp/kyuubi-staging
drwxrwxrwt - hadoop hadoop 0 2022-10-21 10:23 /tmp/logs
Note: If the plus sign (+) follows permissions, an ACL is configured. Example: drwxrwx--x+.
Run the following command to view the ACL:
hadoop fs -getfacl /tmp/emrtest
Information similar to the following output is returned:
# file: /tmp/emrtest
# owner: emrtest
# group: hadoop
user::rwx
user:foo:rwx
group::r-x
mask::rwx
other::--x
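In the listings above, the group column for /tmp/emrtest changes from r-x to rwx even though the group::r-x entry is unchanged. After an ACL is set, the group position of the listing shows the mask, and the permissions that a named user or group actually receives are its ACL entry limited by the mask. The following sketch illustrates that rule locally. effective_perm is a hypothetical helper written for this example, not a Hadoop command.

```shell
#!/bin/sh
# effective_perm ENTRY MASK: print the permissions that remain after the
# mask filters an ACL entry. Both arguments are rwx strings such as r-x.
# Hypothetical helper for illustration only; it does not call HDFS.
effective_perm() {
    entry="$1"
    mask="$2"
    result=""
    for pos in 1 2 3; do
        entry_bit="$(printf '%s' "$entry" | cut -c "$pos")"
        mask_bit="$(printf '%s' "$mask" | cut -c "$pos")"
        # A permission survives only if both the entry and the mask grant it.
        if [ "$entry_bit" = "$mask_bit" ]; then
            result="${result}${entry_bit}"
        else
            result="${result}-"
        fi
    done
    printf '%s\n' "$result"
}

effective_perm rwx rwx   # user:foo:rwx under mask::rwx prints rwx
effective_perm rwx r-x   # the same entry under a narrower mask prints r-x
```

With the mask::rwx value from the output above, User foo keeps the full rwx permissions that the ACL entry grants.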