
Data Lake Formation: Use DLF and EMR to manage permissions

Last Updated: Mar 26, 2026

This tutorial shows how to use Data Lake Formation (DLF) and E-MapReduce (EMR) to enforce role-based, fine-grained access control across a multi-engine data lake — covering database-level, table-level, and column-level permissions.

Key concepts

  • DLF Catalog: The top-level entity in the DLF metadata architecture. A catalog groups databases and tables and serves as the permission boundary. For more information, see Data Catalog.

  • DLF data permissions: DLF's built-in access control system, supporting fine-grained access control across four dimensions: databases, tables, columns, and functions. For more information, see Overview.

  • EMR: Alibaba Cloud E-MapReduce, a managed big data platform. For more information, see What is EMR on ECS?.

Scenario

A company runs an EMR cluster with multiple query engines — Hive, Spark, Presto, and Impala — and needs a single permission system that enforces consistent access rules for all users across all engines.

The following table shows the four roles in this scenario and the permissions each role requires:

Role | RAM user | Scope | Permissions
Super administrator | dlf_data_admin | All databases and tables | Full access; can grant permissions to any user
Business A data administrator | dlf_dba_admin | db_a and all its tables | Full access; can grant db_a permissions to other users
Business A data developer | dlf_dba_dev | db_a and all its tables | Read and write access
Business A data analyst | dlf_dba_analyst | db_a.table1 (col1, col2 only) | Read access to specific columns

Prerequisites

Before you begin, make sure you have:

  • An Alibaba Cloud account with permissions to create EMR clusters and RAM users

  • Familiarity with HiveQL and the Beeline CLI

Step 1: Create an EMR cluster with DLF metadata

  1. Log on to the E-MapReduce console.

  2. Create an EMR cluster with the following settings:

    Setting | Value
    Business scenario | Data Lake
    Optional services | Select at least Hive and DLF-Auth
    Metadata | DLF Unified Metadata
    DLF Catalog | Select an existing catalog or create one. This tutorial uses catalog_test.
  3. Complete the remaining configuration and create the cluster.

If you already have an EMR cluster without the DLF-Auth component, add it through Add Service in the cluster management page. If your cluster uses Hive metadata that is not connected to DLF, migrate the metadata first. For migration assistance, contact support via DingTalk group 33719678.

Step 2: Initialize databases and tables

  1. Log on to the EMR cluster. For instructions, see Log on to a cluster.

  2. Connect to HiveServer2 using Beeline:

    beeline -u jdbc:hive2://<primary-node-name>:10000
  3. Run the following HiveQL statements to create the databases, tables, and test data:

    -- Create databases and tables
    CREATE DATABASE db_a;
    CREATE TABLE db_a.table1 (col1 STRING, col2 STRING, col3 STRING);
    CREATE TABLE db_a.table2 (col1 STRING, col2 STRING, col3 STRING);
    
    CREATE DATABASE db_b;
    CREATE TABLE db_b.table1 (col1 STRING, col2 STRING, col3 STRING);
    
    -- Insert test data
    INSERT OVERWRITE TABLE db_a.table1 VALUES ('1','aliyun','emrA1'), ('2','aliyun','dlfA1');
    INSERT OVERWRITE TABLE db_a.table2 VALUES ('1','aliyun','emrA2'), ('2','aliyun','dlfA2');
    INSERT OVERWRITE TABLE db_b.table1 VALUES ('1','aliyun','emrB1'), ('2','aliyun','dlfB1');
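    After the inserts complete, you can run a quick sanity check in the same Beeline session. Each INSERT OVERWRITE statement above writes exactly two rows, so each table should report a count of 2:

    -- Sanity check: confirm the databases exist and each table holds the two rows inserted above
    SHOW DATABASES;                       -- should list db_a and db_b
    SELECT COUNT(*) FROM db_a.table1;     -- expected: 2
    SELECT COUNT(*) FROM db_a.table2;     -- expected: 2
    SELECT COUNT(*) FROM db_b.table1;     -- expected: 2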

Step 3: Create RAM users for each role

In the RAM console, create the following RAM users:

Role | RAM user
Super administrator | dlf_data_admin
Business A data administrator | dlf_dba_admin
Business A data developer | dlf_dba_dev
Business A data analyst | dlf_dba_analyst

Step 4: Enable access control

Complete both of the following steps to activate access control for the cluster:

  1. Enable DLF-Auth in the EMR cluster. For instructions, see DLF-Auth.

  2. Enable permission control for the catalog in DLF. For instructions, see Configure permissions.

For production environments, enable LDAP authentication to verify user identities. This tutorial skips LDAP authentication, so no password is required when connecting with Beeline.
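With LDAP authentication enabled, each user must supply a password when connecting. As a sketch, the Beeline connection command then takes the following form, where <ldap-user> and <ldap-password> are placeholders for the user's LDAP credentials:

    # Connect with LDAP credentials (placeholders shown in angle brackets)
    beeline -u "jdbc:hive2://<primary-node-name>:10000" -n <ldap-user> -p <ldap-password>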

Step 5: Grant the super administrator full access

Permissions required: Alibaba Cloud account owner or a user with AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.

  1. Log on to the Data Lake Formation console.

  2. In the left navigation pane, choose Data Permission > Role.

  3. Add dlf_data_admin to the admin role. This grants the user administrator access to all data in DLF, including the ability to grant permissions to other users.

  4. To allow dlf_data_admin to manage permissions for RAM users in the DLF console, attach the following policies in the RAM console: AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.

  5. Verify that dlf_data_admin can access all databases. Connect to Hive as dlf_data_admin:

    beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_data_admin

    Run the following queries:

    SELECT * FROM db_a.table1;
    SELECT * FROM db_b.table1;

    Both queries should succeed. The super administrator has access to all databases and tables.
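
    For reference, the first query should return output similar to the following (the exact column headers depend on your Beeline settings; the rows are those inserted in Step 2):

    +--------------+--------------+--------------+
    | table1.col1  | table1.col2  | table1.col3  |
    +--------------+--------------+--------------+
    | 1            | aliyun       | emrA1        |
    | 2            | aliyun       | dlfA1        |
    +--------------+--------------+--------------+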

Step 6: Grant the business A data administrator access to db_a

Permissions required: Log on as dlf_data_admin (super administrator).

  1. Log on to the Data Lake Formation console.

  2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.

  3. Fill in the following fields:

    Field | Value
    Principal type | RAM User/Role
    Choose principal | dlf_dba_admin
    Resources | Resource Authorization
    Select catalog | catalog_test
    Database | db_a
    Database – Data permission | ALL
    Database – Granted permission | ALL
    All objects in database – Data permission | ALL
    All objects in database – Granted permission | ALL
  4. Click OK.

  5. To allow dlf_dba_admin to manage permissions for other RAM users in the DLF console, attach the following policies in the RAM console: AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.

  6. Verify the permissions. Connect to Hive as dlf_dba_admin:

    beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_admin

    Run the following queries:

    SELECT * FROM db_a.table1;   -- Expected: success
    SELECT * FROM db_b.table1;   -- Expected: permission denied

    The first query succeeds because dlf_dba_admin has full access to db_a. The second query fails because dlf_dba_admin has no access to db_b.

Step 7: Grant the business A data developer access to db_a

Permissions required: Log on as dlf_dba_admin (business A data administrator).

  1. Log on to the Data Lake Formation console.

  2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.

  3. Fill in the following fields:

    Field | Value
    Principal type | RAM User/Role
    Choose principal | dlf_dba_dev
    Resources | Resource Authorization
    Select catalog | catalog_test
    Database | db_a
    Database – Data permission | ALL
    Database – Granted permission | ALL
    All objects in database – Data permission | ALL
    All objects in database – Granted permission | ALL
  4. Click OK.

  5. Verify the permissions. Connect to Hive as dlf_dba_dev:

    beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_dev

    Run the following queries:

    SELECT * FROM db_a.table1;                                              -- Expected: success
    INSERT INTO TABLE db_a.table1 VALUES ('3','aliyun','emrA1'), ('4','aliyun','dlfA1');  -- Expected: success
    SELECT * FROM db_b.table1;                                              -- Expected: permission denied
    INSERT INTO TABLE db_b.table1 VALUES ('3','aliyun','emrA1'), ('4','aliyun','dlfA1');  -- Expected: permission denied

    The queries against db_a succeed. The queries against db_b fail because dlf_dba_dev only has access to db_a.

Step 8: Grant the business A data analyst column-level access

Permissions required: Log on as dlf_dba_admin (business A data administrator).

  1. Log on to the Data Lake Formation console.

  2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.

  3. Fill in the following fields:

    Field | Value
    Principal type | RAM User/Role
    Choose principal | dlf_dba_analyst
    Resources | Resource Authorization
    Resource type | Column
    Select catalog | catalog_test
    Select database | db_a
    Select table | table1
    DataColumn – Data permission | ALL
  4. Click OK.

  5. Verify the permissions. Connect to Hive as dlf_dba_analyst:

    beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_analyst

    Run the following queries:

    SELECT col1, col2 FROM db_a.table1;   -- Expected: success
    SELECT * FROM db_a.table1;            -- Expected: permission denied (no access to col3)
    INSERT INTO TABLE db_a.table1 VALUES ('5','aliyun','emrA1'), ('6','aliyun','dlfA1');  -- Expected: permission denied
    SELECT * FROM db_b.table1;            -- Expected: permission denied

    dlf_dba_analyst can only read col1 and col2 from db_a.table1. All other access — including col3, write operations, and cross-database queries — is denied.

What's next

  • Enable LDAP authentication to enforce identity verification in production. See DLF-Auth.

  • Learn how to revoke or modify permissions in the DLF console. See Configure permissions.

  • Explore DLF's function-level permission controls to restrict UDF access by role.