DLF-Auth is a component provided by Data Lake Formation (DLF) that enables the data permission feature of DLF. With DLF-Auth, you can implement fine-grained permission control on databases, tables, columns, and functions, and manage data permissions on data lakes in a centralized manner. This topic describes how to enable DLF-Auth to manage permissions.

Background information

DLF is a fully managed service that helps you build cloud-based data lakes. DLF provides centralized permission management and metadata management for cloud-based data lakes. For more information about DLF, see Overview.

Prerequisites

  • An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.
    Note When you create an EMR cluster, select DLF Unified Metadata as Metadata in the Software Configuration step.
  • The data permission management feature of DLF is enabled. To enable the feature, activate DLF and then enable permission control on the related catalogs.

Limits

  • DLF allows only RAM users to manage permissions. Therefore, you must use the user management feature to add a RAM user in the EMR console.
  • For more information about the regions in which you can use the data permission management feature of DLF, see Supported regions and endpoints.
  • If you enable Hive or Spark for DLF-Auth, you cannot enable or disable Hive or Spark for Ranger. If you enable Hive or Spark for Ranger, you cannot enable or disable Hive or Spark for DLF-Auth.
  • The following table describes the EMR versions and compute engines that are supported by DLF-Auth.
    EMR major version | EMR minor version | Hive | Spark | Presto | Impala
    EMR V3.X | EMR V3.39.0 and earlier | Not supported | Not supported | Not supported | Not supported
    EMR V3.X | EMR V3.40.0 | Supported | Supported | Supported | Not supported
    EMR V3.X | EMR V3.41.0 to EMR V3.43.1 | Supported | Supported | Not supported | Not supported
    EMR V5.X | EMR V5.5.0 and earlier | Not supported | Not supported | Not supported | Not supported
    EMR V5.X | EMR V5.6.0 | Supported | Supported | Supported | Not supported
    EMR V5.X | EMR V5.7.0 to EMR V5.9.1 | Supported | Supported | Not supported | Not supported
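Before you start, you can check the version and plug-in limits programmatically. The following is a minimal sketch that encodes the support table above and the Ranger mutual-exclusion rule. The names SUPPORT_MATRIX, dlf_auth_engines, and can_toggle are hypothetical. The helper returns None for versions that the table does not list, because the table does not cover them. can_toggle uses a conservative reading of the rule, assuming that either plug-in managing Hive or Spark blocks toggling Hive or Spark in the other plug-in.

```python
from typing import Optional, Set, Tuple

# Support matrix transcribed from the Limits table: (lowest version, highest version, engines).
# Impala is absent because no listed version supports it.
SUPPORT_MATRIX = [
    ((3, 40, 0), (3, 40, 0), {"Hive", "Spark", "Presto"}),
    ((3, 41, 0), (3, 43, 1), {"Hive", "Spark"}),
    ((5, 6, 0), (5, 6, 0), {"Hive", "Spark", "Presto"}),
    ((5, 7, 0), (5, 9, 1), {"Hive", "Spark"}),
]

# Versions below these cut-offs support no engine (V3.39.0 and earlier, V5.5.0 and earlier).
FIRST_SUPPORTED = {3: (3, 40, 0), 5: (5, 6, 0)}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse labels such as 'EMR-3.40.0', 'EMR V5.7.0', or '5.7.0' into (major, minor, patch)."""
    digits = text.upper().replace("EMR", "").strip(" -V")
    major, minor, patch = (int(part) for part in digits.split("."))
    return major, minor, patch


def dlf_auth_engines(version: str) -> Optional[Set[str]]:
    """Engines that DLF-Auth supports on this EMR version; None if the table does not list it."""
    v = parse_version(version)
    for low, high, engines in SUPPORT_MATRIX:
        if low <= v <= high:
            return set(engines)
    if v[0] in FIRST_SUPPORTED and v < FIRST_SUPPORTED[v[0]]:
        return set()
    return None  # for example EMR V5.10.0, which the table does not cover


# Hypothetical helper for the mutual-exclusion rule (conservative reading): Hive or Spark
# cannot be toggled in one plug-in while the other plug-in already manages Hive or Spark.
GUARDED = {"Hive", "Spark"}


def can_toggle(engine: str, engines_in_other_plugin: Set[str]) -> bool:
    if engine not in GUARDED:
        return True  # the rule in Limits only mentions Hive and Spark
    return not (engines_in_other_plugin & GUARDED)
```

For example, dlf_auth_engines("EMR-5.6.0") returns a set that contains Hive, Spark, and Presto. can_toggle("Hive", {"Spark"}) returns False, which models trying to toggle Hive in Ranger while DLF-Auth already manages Spark.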

Procedure

This section describes how to enable DLF-Auth to implement fully managed and centralized permission management on data lakes.

  1. Step 1: Enable DLF-Auth to manage Hive permissions
  2. Step 2: Add a RAM user
  3. Step 3: Authenticate the permissions of the RAM user
  4. (Optional) Step 4: Enable LDAP authentication for Hive
    After you enable DLF-Auth to manage permissions, we recommend that you enable Lightweight Directory Access Protocol (LDAP) authentication for Hive. This way, users who connect to Hive must pass LDAP authentication before they can run scripts.

Step 1: Enable DLF-Auth to manage Hive permissions

  1. Go to the DLF-Auth service page.
    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the EMR on ECS page, find the cluster that you want to manage and click Services in the Actions column.
    5. On the Services tab, click Status in the DLF-Auth section.
  2. Enable DLF-Auth to manage Hive permissions.
    1. In the Components section of the DLF-Auth service page, find DLFAuthRuntime and click enableHive in the Actions column.
    2. In the dialog box that appears, enter a reason in the Execution Reason field and click OK.
    3. In the Confirm message, click OK.
  3. Restart HiveServer.
    1. On the Services tab, click the HIVE service.
    2. In the Components section of the Hive service page, find HiveServer, move the pointer over the more icon in the Actions column, and then select Restart.
    3. In the dialog box that appears, enter a reason in the Execution Reason field and click OK.
    4. In the Confirm message, click OK.

Step 2: Add a RAM user

You can add a RAM user by using the user management feature.

Step 3: Authenticate the permissions of the RAM user

  1. Authenticate the permissions of the RAM user before you grant permissions to the RAM user.
    1. Log on to your EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to access HiveServer2:
      beeline -u jdbc:hive2://master-1-1:10000 -n <user> -p <password>
      Note Replace <user> and <password> with the username and password that you set in Step 2: Add a RAM user.
    3. Query the information about an existing table.
      For example, run the following statement to query the testdb.test table. Replace testdb.test with the name of your table.
      select * from testdb.test;
      If the RAM user does not have permissions on the table, the query fails and an error message indicates that the RAM user does not have the required permissions.
  2. Grant permissions to the RAM user.
    1. Log on to the DLF console.
    2. In the left-side navigation pane, choose Data Permission > Data Permissions.
    3. On the Data Permissions page, click Add Permission.
    4. On the Add Permission page, configure the parameters described in the following table.
      Section | Parameter | Description
      Principal | Principal Type | The type of the principal. Default value: RAM User/Role.
      Principal | Choose Principal | The RAM user to which you want to grant permissions. Select the user that you added in Step 2: Add a RAM user from the Choose Principal drop-down list.
      Resources | Authorization Method | The method of authorization. Default value: Resource Authorization.
      Resources | Resource Type | The type of resources. Select a type based on your business requirements. In this example, a metadata table is used.
      Permissions | Data Permission | In this example, Select is used.
      Permissions | Granted Permission | -
    5. Click OK.
  3. Authenticate the permissions that are granted to the RAM user.
    Run the query from substep 1 again. The query succeeds because the RAM user now has the Select permission on the table.
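The before-and-after check in this step can be scripted. The following is a minimal sketch that wraps the beeline command from substep 1 and adds beeline's -e option so that one statement runs non-interactively. The function names beeline_command and can_query are hypothetical. The sketch assumes that beeline exits with a non-zero status when the statement fails, for example when DLF-Auth denies access.

```python
import subprocess
from typing import List, Tuple


def beeline_command(user: str, password: str, statement: str,
                    host: str = "master-1-1", port: int = 10000) -> List[str]:
    """Build the beeline call from Step 3, plus -e so one statement runs non-interactively."""
    return [
        "beeline",
        "-u", f"jdbc:hive2://{host}:{port}",
        "-n", user,
        "-p", password,  # visible in the process list; avoid this on shared hosts
        "-e", statement,
    ]


def can_query(user: str, password: str, table: str = "testdb.test") -> Tuple[bool, str]:
    """Run the Step 3 query and report (succeeded, combined output).

    Assumes beeline exits with a non-zero status when the statement fails,
    for example because DLF-Auth denies access.
    """
    cmd = beeline_command(user, password, f"select * from {table};")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Run can_query on the master node before you grant permissions, where you would expect False and a permission error in the output, and again after you grant the Select permission, where you would expect True.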

(Optional) Step 4: Enable LDAP authentication for Hive

  1. Go to the Services tab.
    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. On the EMR on ECS page, click Services in the Actions column of the cluster that you want to manage.
  2. Enable Lightweight Directory Access Protocol (LDAP) authentication.
    1. On the Services tab, click Status in the Hive section.
    2. In the Components section, find HiveServer and click enableLDAP in the Actions column.
    3. In the dialog box that appears, configure the Execution Reason parameter and click OK.
    4. In the Confirm message, click OK.
    5. Restart HiveServer.
      1. In the Components section, find HiveServer and choose more > Restart in the Actions column.
      2. In the dialog box that appears, configure the Execution Reason parameter and click OK.
      3. In the Confirm message, click OK.
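After LDAP authentication is enabled, clients other than beeline must also send credentials. The following sketch uses PyHive, a third-party Python client that is not part of EMR, with auth="LDAP". The helper names, the default host and port (taken from the beeline command in Step 3), and the assumption that the credentials from Step 2: Add a RAM user are the ones that LDAP checks are all illustrative. PyHive's LDAP mode also depends on SASL packages that can be difficult to install.

```python
from typing import Any, Dict


def ldap_connection_args(user: str, password: str,
                         host: str = "master-1-1", port: int = 10000) -> Dict[str, Any]:
    """Keyword arguments for pyhive.hive.Connection when HiveServer2 requires LDAP."""
    # PyHive accepts a password only when auth is "LDAP" or "CUSTOM".
    return {"host": host, "port": port, "username": user,
            "password": password, "auth": "LDAP"}


def count_rows(user: str, password: str, table: str = "testdb.test") -> int:
    """Connect with LDAP credentials and count the rows of a table the user may read."""
    from pyhive import hive  # third-party: pip install 'pyhive[hive]'

    conn = hive.Connection(**ldap_connection_args(user, password))
    try:
        cursor = conn.cursor()
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]
    finally:
        conn.close()
```

The import is placed inside count_rows so that the argument builder can be reused without PyHive installed. LDAP decides who the user is. DLF-Auth still decides whether the query on the table is allowed.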

FAQ

Q: If I use multiple catalogs, how do I configure the ID of each catalog in DLF-Auth?

A: You can perform the following steps to configure the ID of a catalog in DLF-Auth.
Note The Presto compute engine is not supported in this version. Therefore, you do not need to configure the Presto compute engine.
  1. Go to the DLF-Auth service page.
    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the EMR on ECS page, find the cluster that you want to manage and click Services in the Actions column.
    5. On the Services tab, click Configure in the DLF-Auth section.
  2. Configure the Hive compute engine.
    You can configure the Hive compute engine based on your business requirements.
    1. Add a configuration item to Hive.
      1. Click the dlf-hive-security.xml tab.
      2. Click Add Configuration Item.
      3. In the dialog box that appears, set the Key parameter to dlf.catalog.id and the Value parameter to the ID of the DLF catalog that is associated with your cluster.

        To obtain the ID of the DLF catalog, view the value of the dlf.catalog.id parameter on the Configure tab of the Hive service page.

      4. Click OK.
      5. In the dialog box that appears, enter a reason in the Execution Reason field and turn on Automatically Update Configurations. Then, click Save.
    2. Restart the HiveServer service.
      1. Go to the Status tab of the Hive service.
      2. In the Components section, find HiveServer, move the pointer over the more icon in the Actions column, and then select Restart.
      3. In the dialog box that appears, enter a reason in the Execution Reason field and click OK.
      4. In the Confirm message, click OK.
  3. Configure the Spark compute engine.
    You can configure the Spark compute engine based on your business requirements.
    1. Add a configuration item to Spark.
      1. Click the dlf-spark-security.xml tab.
      2. Click Add Configuration Item.
      3. In the dialog box that appears, set the Key parameter to dlf.catalog.id and the Value parameter to the ID of the DLF catalog that is associated with your cluster.

        To obtain the ID of the DLF catalog, view the value of the dlf.catalog.id parameter on the Configure tab of the Spark service page.

      4. Click OK.
      5. In the dialog box that appears, enter a reason in the Execution Reason field and turn on Automatically Update Configurations. Then, click Save.
    2. Restart the Spark Thrift Server service.
      1. Go to the Status tab of the Spark service.
      2. In the Components section, find Spark Thrift Server, move the pointer over the more icon in the Actions column, and then select Restart.
      3. In the dialog box that appears, enter a reason in the Execution Reason field and click OK.
      4. In the Confirm message, click OK.
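The configuration item that you add to dlf-hive-security.xml or dlf-spark-security.xml is a single key-value pair. The following sketch assumes that these files use the standard Hadoop <configuration>/<property> layout. This is an assumption, and the console manages the files for you. The sketch shows what the item amounts to and lets you read the value back from a copy of a file to confirm which catalog each engine points to. The helpers set_property and get_property are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

KEY = "dlf.catalog.id"


def set_property(xml_text: str, name: str, value: str) -> str:
    """Add or update a <property> in a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # tolerate a property that has no <value> element
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing property with this name: append a new one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

Apply the helpers to each file separately, for example set_property(text, KEY, "your-catalog-id"), because the Hive and Spark engines each read dlf.catalog.id from their own file.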