After HDFS authorization is enabled, only users with the required permissions can access the Hadoop Distributed File System (HDFS) and perform operations such as reading data and creating directories. This topic describes how to enable HDFS authorization on a non-Kerberos EMR cluster.
Background
HDFS uses one of two modes to determine user identity:
- Simple mode: The user identity is determined by the operating system of the client connecting to HDFS. On UNIX-like systems, this is equivalent to running the whoami command.
- Kerberos mode: The user identity is determined by the client's Kerberos credentials. Enable Kerberos authentication when you create the cluster. For more information, see Overview.
For clusters deployed in Kerberos mode, HDFS permissions are automatically configured with a umask of 027. No additional configuration or service restart is required.
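To make the effect of a 027 umask concrete, the following sketch computes the permissions it yields, assuming the usual starting modes of 777 for new directories and 666 for new files. The function name apply_umask is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply a umask to a base mode (both given in octal)
# and print the resulting mode in octal.
apply_umask() {
  base="$1"
  mask="$2"
  # Clear every bit of the base mode that is set in the mask.
  printf '%o\n' $(( 0$base & ~0$mask ))
}

apply_umask 777 027   # new directories: prints 750 (rwxr-x---)
apply_umask 666 027   # new files: prints 640 (rw-r-----)
```

With this mask, the owner keeps full access, the group loses write access, and other users get no access at all.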
Prerequisites
Before you begin, ensure that you have:
- An EMR cluster (non-Kerberos mode). For more information, see Create a cluster.
Enable HDFS authorization
- Go to the Services tab.
  - Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
  - In the top navigation bar, select the region where your cluster resides and select a resource group.
  - On the EMR on ECS page, find your cluster and click Services in the Actions column.
- On the Services tab, find the HDFS service and click Configure.
- On the Configure tab, set the following parameters:

  | Parameter | Description | Default |
  | --- | --- | --- |
  | dfs.permissions.enabled | Enables permission checks. Set to true to restrict HDFS access to authorized users only. | false |
  | dfs.datanode.data.dir.perm | Permissions on local storage directories for DataNodes. | Not specified |
  | fs.permissions.umask-mode | The permission mask applied when creating files and directories. | Not specified |
  | dfs.namenode.acls.enabled | Enables access control lists (ACLs). Set to true to manage fine-grained permissions for individual users, groups, and others beyond the standard owner/group/other model. When you run hadoop fs -ls, a trailing + in the permissions string indicates that an ACL is applied to that path. | false |
  | dfs.permissions.superusergroup | The name of the superuser group. All users in this group have superuser privileges. | hadoop |

- Save the configuration.
  - Click Save at the bottom of the page.
  - In the dialog box, fill in the Execution Reason field and click Save.
- Restart the HDFS service.
  - In the upper-right corner of the HDFS service page, choose More > Restart.
  - In the dialog box, fill in the Execution Reason field and click OK.
  - In the Confirm dialog box, click OK.

  To monitor restart progress, click Operation History in the upper-right corner of the Services tab.
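After the restart completes, you may want to confirm from a cluster node that the new values are in place. The following is a minimal sketch that uses the hdfs getconf -confKey command to read a key from the node's configuration. The function name check_hdfs_conf is hypothetical, and the sketch assumes that the hdfs command is on the PATH of the node where you run it.

```shell
#!/bin/sh
# Hypothetical helper: compare the value that `hdfs getconf` reports for a
# configuration key with the value you expect.
check_hdfs_conf() {
  key="$1"
  expected="$2"
  actual="$(hdfs getconf -confKey "$key" 2>/dev/null)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $key=$actual"
  else
    echo "MISMATCH: $key=$actual (expected $expected)"
    return 1
  fi
}

# Run the checks only where the hdfs command is available.
if command -v hdfs >/dev/null 2>&1; then
  check_hdfs_conf dfs.permissions.enabled true
  check_hdfs_conf dfs.namenode.acls.enabled true
fi
```

Note that hdfs getconf reads the configuration files on the node where it runs, so it shows what has been deployed rather than proving that the NameNode has already restarted with those values. Use Operation History to confirm that the restart finished.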
Example: set and verify ACL permissions
The following example shows how to configure ACL permissions for a directory and verify the results after HDFS authorization is enabled. The example assumes that dfs.namenode.acls.enabled is set to true.
- Log on to the cluster over SSH. For more information, see Log on to a cluster.
- Switch to the emrtest user:

    su emrtest

- Create a directory:

    hadoop fs -mkdir /tmp/emrtest

- View the directory permissions:

    hadoop fs -ls /tmp

  The output is similar to:

    drwxr-x--x - emrtest hadoop 0 2022-10-21 14:08 /tmp/emrtest
    drwxr-x--x - hadoop hadoop 0 2022-10-21 10:06 /tmp/hadoop-yarn
    drwx-wx-wx - hive hadoop 0 2022-10-21 10:13 /tmp/hive
    drwxr-x--x - hadoop hadoop 0 2022-10-21 10:23 /tmp/kyuubi-staging
    drwxrwxrwt - hadoop hadoop 0 2022-10-21 10:23 /tmp/logs

- Grant user foo read, write, and execute (rwx) permissions on the directory:

    hadoop fs -setfacl -m user:foo:rwx /tmp/emrtest

- Verify that the ACL is applied:

    hadoop fs -ls /tmp/

  The + after the permission string confirms that an ACL is set on /tmp/emrtest:

    drwxrwx--x+ - emrtest hadoop 0 2022-10-21 14:08 /tmp/emrtest
    drwxr-x--x - hadoop hadoop 0 2022-10-21 10:06 /tmp/hadoop-yarn
    drwx-wx-wx - hive hadoop 0 2022-10-21 10:13 /tmp/hive
    drwxr-x--x - hadoop hadoop 0 2022-10-21 10:23 /tmp/kyuubi-staging
    drwxrwxrwt - hadoop hadoop 0 2022-10-21 10:23 /tmp/logs

- View the full ACL entries:

    hadoop fs -getfacl /tmp/emrtest

  The output is similar to:

    # file: /tmp/emrtest
    # owner: emrtest
    # group: hadoop
    user::rwx
    user:foo:rwx
    group::r-x
    mask::rwx
    other::--x

Use hadoop fs -getfacl [-R] <path> to view ACLs and hadoop fs -setfacl [-R] [-b |-k -m |-x <acl_spec> <path>] |[--set <acl_spec> <path>] to modify them.
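When a directory holds many entries, scanning the listing for the trailing + by eye becomes tedious. The following sketch filters hadoop fs -ls output down to the paths that carry an ACL. The function name list_acl_paths is hypothetical, and the sample data is two lines taken from the listing in this example.

```shell
#!/bin/sh
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print the
# paths whose permission string ends with "+", which marks an ACL.
# Assumes paths contain no spaces, because it prints only the last field.
list_acl_paths() {
  awk '$1 ~ /\+$/ { print $NF }'
}

# Sample listing copied from the example output (two entries only).
sample='drwxrwx--x+ - emrtest hadoop 0 2022-10-21 14:08 /tmp/emrtest
drwxr-x--x - hadoop hadoop 0 2022-10-21 10:06 /tmp/hadoop-yarn'

printf '%s\n' "$sample" | list_acl_paths   # prints /tmp/emrtest
```

On a cluster node, the same filter could be applied to live output, for example with hadoop fs -ls /tmp | list_acl_paths, followed by hadoop fs -getfacl on each reported path to inspect the entries.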