After HDFS authorization is enabled, only users that are granted the required permissions can access HDFS and perform operations such as reading data and creating folders. This topic describes how to enable HDFS authorization.
An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.
- Simple mode: The user identity is determined by the operating system of the client that is connected to HDFS. If the client runs a UNIX-like operating system, the logon username returned by the whoami command is used.
- Kerberos mode: The user identity of a client is determined by its Kerberos credentials.
You can turn on Kerberos Mode when you create an EMR cluster. For more information about Kerberos, see Introduction to Kerberos.
- The umask value can be changed based on your business requirements.
- HDFS is a basic service. Many services such as Hive and HBase rely on HDFS. Before you configure the upper-layer services, you must configure HDFS authorization.
- After you enable HDFS authorization, you must configure log storage paths for the upper-layer services. For example, the log storage path of Spark is /spark-history, and the log storage path of YARN is /tmp/$user/.
- You can set the sticky bit on a folder to prevent users other than superusers, file owners, and the directory owner from deleting files or sub-folders in the folder, even if these users have rwx permissions on the folder.
For example, on an HDFS client, use an account that has HDFS administrator permissions to run the following command to modify the permissions on the /user directory:
hdfs dfs -chmod 1777 /user
In this command, the 1 that precedes 777 sets the sticky bit on the directory. With the sticky bit set, a file or sub-folder in the directory can be deleted only by its owner, the directory owner, or a superuser.
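The effect of mode 1777 can be previewed on a local POSIX file system, where the same mode notation applies. This is a local demo, not an HDFS command; the scratch directory is created ad hoc:

```shell
# Local sketch of the sticky bit: mode 1777 is displayed as 'drwxrwxrwt'.
demo_dir=$(mktemp -d)    # hypothetical scratch directory
chmod 1777 "$demo_dir"
ls -ld "$demo_dir"       # the mode string ends in 't', which is the sticky bit
rmdir "$demo_dir"
```

The trailing t in the mode string is how the sticky bit appears in directory listings, both locally and in hadoop fs -ls output.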
- Go to the Cluster Overview page.
- Log on to the Alibaba Cloud EMR console.
- In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
- Click the Cluster Management tab.
- On the Cluster Management page, find your cluster and click Details in the Actions column.
- In the left-side navigation pane of the Cluster Overview page, choose .
- Modify parameters.
- On the HDFS page, click the Configure tab.
- In the Configuration Filter section, enter the keyword of each parameter that you want to modify in the search box and click the icon.
- Modify the parameters based on your business requirements in the Service Configuration section.
- dfs.permissions.enabled: Specifies whether to enable permission checks. Default value: false. To enable permission checks, set this parameter to true.
- dfs.datanode.data.dir.perm: The permissions on the local storage directories of DataNodes. Default value: 755.
- fs.permissions.umask-mode: The permission mask, which is applied by default when you create a file or a folder.
  - Default value in simple mode: 022. With this value, files are created with the permission 644 (0666 & ~022 = 0644) and folders are created with the permission 755 (0777 & ~022 = 0755).
  - Default value in Kerberos mode: 027. With this value, files are created with the permission 640 and folders are created with the permission 750.
- dfs.namenode.acls.enabled: Specifies whether to enable access control lists (ACLs). Default value: false. After you set this parameter to true, you can manage the permissions of owners, user groups, and other users. The commands used to configure ACLs are hadoop fs -getfacl [-R] <path> and hadoop fs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>] | [--set <acl_spec> <path>].
- dfs.permissions.superusergroup: The name of the superuser group. Default value: hadoop. All users in this group are superusers.
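The umask arithmetic above can be checked directly in a shell. The values below assume the simple-mode default of 022:

```shell
# Default creation modes are derived by clearing the umask bits:
# files start from 0666, directories from 0777.
umask_val=0022
printf 'file mode: %o\n' $(( 0666 & ~umask_val ))   # 644
printf 'dir mode:  %o\n' $(( 0777 & ~umask_val ))   # 755
```

Substituting the Kerberos-mode default of 027 yields 640 for files and 750 for directories, matching the values listed above.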
- Restart the HDFS service.
- In the upper-right corner of the HDFS service page, choose .
- In the Cluster Activities dialog box, specify Description and click OK.
- In the Confirm message, click OK. You can click History in the upper-right corner to view the task progress.
- Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
- Run the following command to switch to the test user that you created:
su test
- Run the following command to use the test user to create a directory:
hadoop fs -mkdir /tmp/test
- Run the following command to view the permissions of the created directory:
hadoop fs -ls /tmp
Information similar to the following output is returned:
Found 4 items
drwxrwxrwx   - root   hadoop          0 2021-06-08 13:14 /tmp/hadoop-yarn
drwx-wx-wx   - hadoop hadoop          0 2021-06-16 15:54 /tmp/hive
drwxrwxrwt   - hadoop hadoop          0 2021-06-08 13:16 /tmp/logs
drwxr-x--x   - test   hadoop          0 2021-06-16 17:15 /tmp/test
- Run the following command to configure an ACL for the directory and grant the user foo rwx permissions:
hadoop fs -setfacl -m user:foo:rwx /tmp/test
- Run the following command to view the permissions of the directory:
hadoop fs -ls /tmp/
Information similar to the following output is returned:
Found 4 items
drwxrwxrwx   - root   hadoop          0 2021-06-08 13:14 /tmp/hadoop-yarn
drwx-wx-wx   - hadoop hadoop          0 2021-06-16 15:54 /tmp/hive
drwxrwxrwt   - hadoop hadoop          0 2021-06-08 13:16 /tmp/logs
drwxrwx--x+  - test   hadoop          0 2021-06-16 17:15 /tmp/test
Note: If a plus sign (+) follows the permissions, an ACL is configured for the directory. Example: drwxrwx--x+.
- Run the following command to view the ACL:
hadoop fs -getfacl /tmp/test
Information similar to the following output is returned:
# file: /tmp/test
# owner: test
# group: hadoop
user::rwx
user:foo:rwx
group::r-x
mask::rwx
other::--x
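In this output, the mask entry caps the effective rights of every named user entry and the group entry: the effective permission is the bitwise AND of the entry and the mask. A small shell sketch illustrates this, using a hypothetical restrictive mask of r-x rather than the unrestrictive rwx shown above:

```shell
# perm_str renders an octal permission digit (0-7) in rwx notation.
perm_str() {
  local bits=$1 out=""
  (( bits & 4 )) && out="${out}r" || out="${out}-"
  (( bits & 2 )) && out="${out}w" || out="${out}-"
  (( bits & 1 )) && out="${out}x" || out="${out}-"
  printf '%s\n' "$out"
}

foo=7    # user:foo:rwx
mask=5   # hypothetical mask::r-x
perm_str $(( foo & mask ))   # effective rights for foo: r-x
```

With the mask set to rwx, as in the output above, the named entry user:foo:rwx takes full effect.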