This topic describes how to integrate Hadoop Distributed File System (HDFS) with Ranger and how to configure permissions on HDFS resources.

Background information

After you integrate HDFS with Ranger, the permissions that you configure in Ranger and the HDFS access control list (ACL)-based permissions take effect at the same time. The Ranger permissions have a lower priority than the HDFS ACL-based permissions: the system verifies the HDFS ACL-based permissions first and verifies the Ranger permissions only if the HDFS ACL-based verification does not pass. The following figure shows the permission verification process. (Figure: HDFS Config)
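The verification order described above can be sketched in a few lines of Python. This is a minimal illustration of the described order only, not the actual plugin code; the function name check_access and the two Boolean inputs are hypothetical.

```python
from typing import Tuple


def check_access(hdfs_acl_allows: bool, ranger_allows: bool) -> Tuple[bool, str]:
    """Return (allowed, decided_by) for the order described in this topic.

    hdfs_acl_allows: result of the HDFS ACL-based permission check.
    ranger_allows:   result of the Ranger policy check.
    """
    # HDFS ACL-based permissions have the higher priority and are checked first.
    if hdfs_acl_allows:
        return True, "hdfs-acl"
    # Ranger permissions are checked only if the HDFS ACL-based check does not pass.
    if ranger_allows:
        return True, "ranger"
    # Neither check granted access.
    return False, "none"


for acl, ranger in [(True, False), (False, True), (False, False)]:
    print(acl, ranger, check_access(acl, ranger))
```

In this sketch, a request that passes the HDFS ACL-based check never reaches the Ranger check, which matches the priority described above.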

Prerequisites

A cluster is created in the E-MapReduce (EMR) console, and Ranger was selected from the list of optional services during cluster creation. For more information, see Create a cluster.

Integrate HDFS with Ranger

  1. Go to the Cluster Overview page.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find your cluster and click Details in the Actions column.
  2. Enable HDFS in Ranger.
    1. In the left-side navigation pane, choose Cluster Service > RANGER.
    2. On the page that appears, select Enable HDFS from the Actions drop-down list in the upper-right corner.
    3. Perform the following operations on the cluster:
      1. In the Cluster Activities dialog box, configure the Description parameter and click OK.
      2. In the Confirm message, click OK.
      3. Click History in the upper-right corner to view the task progress.
  3. Add the HDFS service on the web UI of Ranger.
    1. Log on to Ranger. For more information, see Overview.
    2. On the Ranger web UI, click the Add icon in the row in which HDFS is located to add the HDFS service.
    3. Configure the parameters that are described in the following table.
      Parameter: Description
      Service Name: The value is emr-hdfs and you cannot change the value.
      Username: The value is hadoop and you cannot change the value.
      Password: You can specify a custom password.
      Namenode URL:
      • Enter hdfs://emr-header-1:9000 for a non-HA cluster.
      • Enter hdfs://emr-cluster for an HA cluster.
      Authorization Enabled: Select No for a common cluster and Yes for a high-security cluster.
      Authentication Type:
      • Select Simple for a common cluster.
      • Select Kerberos for a high-security cluster.
      dfs.datanode.kerberos.principal, dfs.namenode.kerberos.principal, and dfs.secondary.namenode.kerberos.principal: These three parameters are required only for a high-security cluster. Set each value to hdfs/_HOST@${REALM}.
        Note: ${REALM} indicates the Realm value for the Key Distribution Center (KDC). To obtain the Realm value for a high-security cluster, go to the /etc directory of a node. Then, view the value of kdc_realm in the krb5.conf file in the directory.
        The following figure shows a sample value of kdc_realm. (Figure: kdc_realm)
      Add New Configurations: Add the policy.download.auth.users parameter and set the value to hdfs.
        If your cluster is an HA cluster, you also need to add the following parameters:
        • dfs.nameservices: emr-cluster
        • dfs.ha.namenodes.emr-cluster
        • dfs.namenode.rpc-address.emr-cluster.nn1
        • dfs.namenode.rpc-address.emr-cluster.nn2
        • dfs.client.failover.proxy.provider.emr-cluster

        Configure the parameters based on the actual environment. You can obtain the parameter values on the Configure tab of the HDFS service in the EMR console.

    4. Click Add.
  4. Restart HDFS.
    1. In the left-side navigation pane, choose Cluster Service > HDFS.
    2. Select Restart NameNode from the Actions drop-down list in the upper-right corner.
    3. Perform the following operations on the cluster:
      1. In the Cluster Activities dialog box, configure the Description parameter and click OK.
      2. In the Confirm message, click OK.
      3. Click History in the upper-right corner to view the task progress.
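The parameter values in step 3 depend on whether the cluster is an HA cluster and whether it is a high-security cluster. The following Python sketch derives the values from those two choices and reads kdc_realm from the text of a krb5.conf file. It is a checklist helper only and does not call Ranger. The function names, the dictionary keys (which reuse the labels from the parameter descriptions), and the sample krb5.conf content with the realm EMR.EXAMPLE.COM are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

KERBEROS_PRINCIPAL_KEYS = (
    "dfs.datanode.kerberos.principal",
    "dfs.namenode.kerberos.principal",
    "dfs.secondary.namenode.kerberos.principal",
)


def extract_kdc_realm(krb5_conf_text: str) -> Optional[str]:
    """Return the kdc_realm value found in krb5.conf text, or None if absent."""
    match = re.search(r"^\s*kdc_realm\s*=\s*(\S+)", krb5_conf_text, re.MULTILINE)
    return match.group(1) if match else None


def ranger_hdfs_settings(ha: bool, high_security: bool,
                         realm: Optional[str] = None) -> Dict[str, str]:
    """Collect the values to enter on the Ranger web UI for the HDFS service."""
    settings = {
        "Service Name": "emr-hdfs",
        "Username": "hadoop",
        "Namenode URL": "hdfs://emr-cluster" if ha else "hdfs://emr-header-1:9000",
        "Authorization Enabled": "Yes" if high_security else "No",
        "Authentication Type": "Kerberos" if high_security else "Simple",
        "policy.download.auth.users": "hdfs",
    }
    if high_security:
        if not realm:
            raise ValueError("a realm is required for a high-security cluster")
        # All three principals use the same pattern: hdfs/_HOST@${REALM}.
        for key in KERBEROS_PRINCIPAL_KEYS:
            settings[key] = f"hdfs/_HOST@{realm}"
    if ha:
        # The remaining HA parameters must be copied from the Configure tab
        # of the HDFS service in the EMR console; they are not guessed here.
        settings["dfs.nameservices"] = "emr-cluster"
    return settings


# Hypothetical krb5.conf excerpt; on a real node, read /etc/krb5.conf instead.
SAMPLE_KRB5_CONF = """
[libdefaults]
    kdc_realm = EMR.EXAMPLE.COM
"""

realm = extract_kdc_realm(SAMPLE_KRB5_CONF)
for key, value in ranger_hdfs_settings(ha=True, high_security=True, realm=realm).items():
    print(f"{key} = {value}")
```

The password is left out on purpose because it is a custom value.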

Permission configuration example

You can perform the following steps to grant a user permissions on the resources in a directory. In this example, the Write and Execute permissions on the resources in the /user/foo directory are granted to the test user:

  1. Log on to Ranger. For more information, see Overview.
  2. Click emr-hdfs.
  3. Click Add New Policy in the upper-right corner.
  4. Configure the parameters that are described in the following table.
    Parameter: Description
    Policy Name: The name of the policy. You can specify a custom name.
    Resource Path: The path of the resources, such as /user/foo.
    recursive: Specifies whether the permissions take effect on subdirectories or files.
    Select Group: The user group to which you want to attach the policy.
    Select User: The user to whom you want to attach the policy, such as test.
    Permissions: The permissions that you want to grant, such as the Write and Execute permissions.
  5. Click Add.
    After you add the policy, the test user is granted the Write and Execute permissions on the HDFS path /user/foo.
    Note: After you add, remove, or modify a policy, it takes about one minute for the change to take effect.
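The policy from this example can also be written down as data, which helps when you review or document policies. The following Python sketch builds a JSON representation of the policy. The field names (service, resources, isRecursive, policyItems, accesses) are assumed to follow the policy model of the Apache Ranger public REST API and should be verified against your Ranger version. The helper build_hdfs_policy and the policy name foo-write-execute are hypothetical. The sketch only prints the JSON and does not send anything to Ranger.

```python
import json
from typing import Any, Dict, Iterable


def build_hdfs_policy(service: str, name: str, path: str,
                      users: Iterable[str], access_types: Iterable[str],
                      recursive: bool = True) -> Dict[str, Any]:
    """Build a dictionary that describes one HDFS path policy."""
    return {
        "service": service,   # Service Name on the Ranger web UI
        "name": name,         # Policy Name
        "resources": {
            # Resource Path and the recursive switch
            "path": {"values": [path], "isRecursive": recursive},
        },
        "policyItems": [
            {
                "users": list(users),  # Select User
                "accesses": [          # Permissions
                    {"type": access, "isAllowed": True}
                    for access in access_types
                ],
            }
        ],
    }


policy = build_hdfs_policy(
    service="emr-hdfs",
    name="foo-write-execute",  # hypothetical policy name
    path="/user/foo",
    users=["test"],
    access_types=["write", "execute"],
)
print(json.dumps(policy, indent=2))
```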