This tutorial shows how to use Data Lake Formation (DLF) and E-MapReduce (EMR) to enforce role-based, fine-grained access control across a multi-engine data lake — covering database-level, table-level, and column-level permissions.
Key concepts
- DLF Catalog: The top-level entity in the DLF metadata architecture. A catalog groups databases and tables and serves as the permission boundary. For more information, see Data Catalog.
- DLF data permissions: DLF's built-in access control system, supporting fine-grained access control across four dimensions: databases, tables, columns, and functions. For more information, see Overview.
- EMR: Alibaba Cloud E-MapReduce, a managed big data platform. For more information, see What is EMR on ECS?.
Scenario
A company runs an EMR cluster with multiple query engines — Hive, Spark, Presto, and Impala — and needs a single permission system that enforces consistent access rules for all users across all engines.
The following table shows the four roles in this scenario and the permissions each role requires:
| Role | RAM user | Scope | Permissions |
|---|---|---|---|
| Super administrator | `dlf_data_admin` | All databases and tables | Full access; can grant permissions to any user |
| Business A data administrator | `dlf_dba_admin` | `db_a` and all its tables | Full access; can grant `db_a` permissions to other users |
| Business A data developer | `dlf_dba_dev` | `db_a` and all its tables | Read and write access |
| Business A data analyst | `dlf_dba_analyst` | `db_a.table1` (`col1`, `col2` only) | Read access to specific columns |
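The role table above can be read as a small permission matrix. The following Python sketch models it locally so you can reason about which queries each role should pass or fail before you configure anything. It is an illustration of the intended rules only, not DLF's actual authorization engine; all role, database, table, and column names come from the table above.

```python
# Local model of the tutorial's role/permission matrix (illustrative only).
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    database: str                      # "*" matches any database
    table: str = "*"                   # "*" matches any table in the database
    columns: frozenset = frozenset()   # empty frozenset means all columns
    writable: bool = False

ROLE_GRANTS = {
    "dlf_data_admin":  [Grant("*", writable=True)],
    "dlf_dba_admin":   [Grant("db_a", writable=True)],
    "dlf_dba_dev":     [Grant("db_a", writable=True)],
    "dlf_dba_analyst": [Grant("db_a", "table1", frozenset({"col1", "col2"}))],
}

def can_read(user, database, table, columns):
    """True if some grant covers the database, table, and every requested column."""
    for g in ROLE_GRANTS.get(user, []):
        if g.database in ("*", database) and g.table in ("*", table):
            if not g.columns or set(columns) <= g.columns:
                return True
    return False

def can_write(user, database, table):
    """True if some writable grant covers the database and table."""
    return any(
        g.writable and g.database in ("*", database) and g.table in ("*", table)
        for g in ROLE_GRANTS.get(user, [])
    )

print(can_read("dlf_dba_analyst", "db_a", "table1", ["col1", "col2"]))  # True
print(can_read("dlf_dba_analyst", "db_a", "table1", ["col3"]))          # False
print(can_write("dlf_dba_dev", "db_b", "table1"))                       # False
```

The verification queries in Steps 5 through 8 exercise exactly these combinations against the real cluster.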
Prerequisites
Before you begin, make sure you have:
- An Alibaba Cloud account with permissions to create EMR clusters and RAM users
- Familiarity with HiveQL and the Beeline CLI
Step 1: Create an EMR cluster with DLF metadata
1. Log on to the E-MapReduce console.
2. Create an EMR cluster with the following settings:

   | Setting | Value |
   |---|---|
   | Business scenario | Data Lake |
   | Optional services | Select at least Hive and DLF-Auth |
   | Metadata | DLF Unified Metadata |
   | DLF Catalog | Select an existing catalog or create one. This tutorial uses `catalog_test`. |

3. Complete the remaining configuration and create the cluster.
If you already have an EMR cluster without the DLF-Auth component, add it through Add Service in the cluster management page. If your cluster uses Hive metadata that is not connected to DLF, migrate the metadata first. For migration assistance, contact support via DingTalk group 33719678.
Step 2: Initialize databases and tables
1. Log on to the EMR cluster. For instructions, see Log on to a cluster.
2. Connect to HiveServer2 using Beeline:

   ```shell
   beeline -u jdbc:hive2://<primary-node-name>:10000
   ```

3. Run the following HiveQL statements to create the databases, tables, and test data:

   ```sql
   -- Create databases and tables
   CREATE DATABASE db_a;
   CREATE TABLE db_a.table1 (col1 STRING, col2 STRING, col3 STRING);
   CREATE TABLE db_a.table2 (col1 STRING, col2 STRING, col3 STRING);
   CREATE DATABASE db_b;
   CREATE TABLE db_b.table1 (col1 STRING, col2 STRING, col3 STRING);

   -- Insert test data
   INSERT OVERWRITE TABLE db_a.table1 VALUES ('1','aliyun','emrA1'), ('2','aliyun','dlfA1');
   INSERT OVERWRITE TABLE db_a.table2 VALUES ('1','aliyun','emrA2'), ('2','aliyun','dlfA2');
   INSERT OVERWRITE TABLE db_b.table1 VALUES ('1','aliyun','emrB1'), ('2','aliyun','dlfB1');
   ```
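Before moving on, you can optionally confirm that the test data landed. Assuming the statements above succeeded, each table contains two rows:

```sql
-- Optional sanity check: each query should return 2
SELECT COUNT(*) FROM db_a.table1;
SELECT COUNT(*) FROM db_a.table2;
SELECT COUNT(*) FROM db_b.table1;
```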
Step 3: Create RAM users for each role
In the RAM console, create the following RAM users:
| Role | RAM user |
|---|---|
| Super administrator | dlf_data_admin |
| Business A data administrator | dlf_dba_admin |
| Business A data developer | dlf_dba_dev |
| Business A data analyst | dlf_dba_analyst |
Step 4: Enable access control
Complete both of the following steps to activate access control for the cluster:
1. Enable DLF-Auth in the EMR cluster. For instructions, see DLF-Auth.
2. Enable permission control for the catalog in DLF. For instructions, see Configure permissions.
For production environments, enable LDAP authentication to verify user identities. This tutorial skips LDAP authentication, so no password is required when connecting with Beeline.
Step 5: Grant the super administrator full access
Permissions required: Alibaba Cloud account owner or a user with AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.
1. Log on to the Data Lake Formation console.
2. In the left navigation pane, choose Data Permission > Role.
3. In the admin role, add `dlf_data_admin`. This grants the user administrator access to all data in DLF, including the ability to grant permissions to other users.
4. To allow `dlf_data_admin` to manage permissions for RAM users in the DLF console, attach the following policies in the RAM console: AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.
5. Verify that `dlf_data_admin` can access all databases. Connect to Hive as `dlf_data_admin`:

   ```shell
   beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_data_admin
   ```

   Run the following queries:

   ```sql
   SELECT * FROM db_a.table1;
   SELECT * FROM db_b.table1;
   ```

   Both queries should succeed. The super administrator has access to all databases and tables.
Step 6: Grant the business A data administrator access to db_a
Permissions required: Log on as `dlf_data_admin` (super administrator).
1. Log on to the Data Lake Formation console.
2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.
3. Fill in the following fields:

   | Field | Value |
   |---|---|
   | Principal type | RAM User/Role |
   | Choose principal | `dlf_dba_admin` |
   | Resources | Resource Authorization |
   | Select catalog | `catalog_test` |
   | Database | `db_a` |
   | Database – Data permission | ALL |
   | Database – Granted permission | ALL |
   | All objects in database – Data permission | ALL |
   | All objects in database – Granted permission | ALL |

4. Click OK.
5. To allow `dlf_dba_admin` to manage permissions for other RAM users in the DLF console, attach the following policies in the RAM console: AliyunDLFFullAccess and AliyunRAMReadOnlyAccess.
6. Verify the permissions. Connect to Hive as `dlf_dba_admin`:

   ```shell
   beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_admin
   ```

   Run the following queries:

   ```sql
   SELECT * FROM db_a.table1; -- Expected: success
   SELECT * FROM db_b.table1; -- Expected: permission denied
   ```

   The first query succeeds because `dlf_dba_admin` has full access to `db_a`. The second query fails because `dlf_dba_admin` has no access to `db_b`.
Step 7: Grant the business A data developer access to db_a
Permissions required: Log on as `dlf_dba_admin` (business A data administrator).
1. Log on to the Data Lake Formation console.
2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.
3. Fill in the following fields:

   | Field | Value |
   |---|---|
   | Principal type | RAM User/Role |
   | Choose principal | `dlf_dba_dev` |
   | Resources | Resource Authorization |
   | Select catalog | `catalog_test` |
   | Database | `db_a` |
   | Database – Data permission | ALL |
   | Database – Granted permission | ALL |
   | All objects in database – Data permission | ALL |
   | All objects in database – Granted permission | ALL |

4. Click OK.
5. Verify the permissions. Connect to Hive as `dlf_dba_dev`:

   ```shell
   beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_dev
   ```

   Run the following queries:

   ```sql
   SELECT * FROM db_a.table1; -- Expected: success
   INSERT INTO TABLE db_a.table1 VALUES ('3','aliyun','emrA1'), ('4','aliyun','dlfA1'); -- Expected: success
   SELECT * FROM db_b.table1; -- Expected: permission denied
   INSERT INTO TABLE db_b.table1 VALUES ('3','aliyun','emrA1'), ('4','aliyun','dlfA1'); -- Expected: permission denied
   ```

   The queries against `db_a` succeed. The queries against `db_b` fail because `dlf_dba_dev` only has access to `db_a`.
Step 8: Grant the business A data analyst column-level access
Permissions required: Log on as `dlf_dba_admin` (business A data administrator).
1. Log on to the Data Lake Formation console.
2. In the left navigation pane, choose Data Permission > Data Permissions, then click Add Permission.
3. Fill in the following fields:

   | Field | Value |
   |---|---|
   | Principal type | RAM User/Role |
   | Choose principal | `dlf_dba_analyst` |
   | Resources | Resource Authorization |
   | Resource type | Column |
   | Select catalog | `catalog_test` |
   | Select database | `db_a` |
   | Select table | `table1` |
   | Data Column – Data permission | ALL |

4. Click OK.
5. Verify the permissions. Connect to Hive as `dlf_dba_analyst`:

   ```shell
   beeline -u jdbc:hive2://<primary-node-name>:10000 -n dlf_dba_analyst
   ```

   Run the following queries:

   ```sql
   SELECT col1, col2 FROM db_a.table1; -- Expected: success
   SELECT * FROM db_a.table1; -- Expected: permission denied (no access to col3)
   INSERT INTO TABLE db_a.table1 VALUES ('5','aliyun','emrA1'), ('6','aliyun','dlfA1'); -- Expected: permission denied
   SELECT * FROM db_b.table1; -- Expected: permission denied
   ```

   `dlf_dba_analyst` can only read `col1` and `col2` from `db_a.table1`. All other access, including `col3`, write operations, and cross-database queries, is denied.
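The reason `SELECT *` fails for the analyst is that the star expands to every column of the table before authorization runs, so the denied `col3` is part of the request. The following Python sketch illustrates that expansion-then-check order under the tutorial's column grants; it is a local illustration of the behavior, not DLF's implementation.

```python
# Why "SELECT *" is denied for dlf_dba_analyst: "*" expands to all columns
# first, then every column in the projection is checked (illustrative model).
TABLE_COLUMNS = {"db_a.table1": ["col1", "col2", "col3"]}
ANALYST_COLUMNS = {"db_a.table1": {"col1", "col2"}}  # columns granted in Step 8

def authorize_select(user_columns, table, projection):
    """Return ('ALLOWED', columns) or ('DENIED', offending_columns)."""
    # "*" expands to the table's full column list before the check runs
    requested = TABLE_COLUMNS[table] if projection == ["*"] else projection
    allowed = user_columns.get(table, set())
    denied = [c for c in requested if c not in allowed]
    return ("ALLOWED", requested) if not denied else ("DENIED", denied)

print(authorize_select(ANALYST_COLUMNS, "db_a.table1", ["col1", "col2"]))
# ('ALLOWED', ['col1', 'col2'])
print(authorize_select(ANALYST_COLUMNS, "db_a.table1", ["*"]))
# ('DENIED', ['col3'])
```

In practice this means column-restricted users should always name their columns explicitly instead of using `SELECT *`.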
What's next
- Enable LDAP authentication to enforce identity verification in production. See DLF-Auth.
- Learn how to revoke or modify permissions in the DLF console. See Configure permissions.
- Explore DLF's function-level permission controls to restrict UDF access by role.