Running E-MapReduce (EMR) tasks in DataWorks requires setup on both the EMR side and the DataWorks side. Without proper configuration, either tasks fail outright or all workspace members share the same cluster permissions, which prevents user-level data isolation.
Who needs to do what
| Role | What to do |
|---|---|
| Workspace administrator | Register the EMR cluster to DataWorks, configure account mappings in SettingCenter |
| Data developer / task owner | Get added to LDAP users and the DataWorks workspace (required only for OpenLDAP account mapping) |
How it works
DataWorks maps workspace members to accounts on the registered EMR cluster. When a member runs an EMR task, DataWorks authenticates with the cluster using the mapped account, and the task runs with that account's data permissions. This gives each user their own data access boundaries.
Two account types are supported for mapping: system accounts and OpenLDAP accounts. Only one authentication method can be active per cluster.
Authentication method support varies by compute engine. Check whether your EMR cluster supports LDAP authentication in the EMR console before choosing a mapping type.
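The mapping described above can be pictured as a per-cluster lookup table. The following is a minimal sketch for illustration only; `MappingType`, `ClusterAccount`, and `AccountMapping` are hypothetical names and are not part of any DataWorks or EMR API.

```python
from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    """The two account types that can be mapped (hypothetical model)."""
    SYSTEM = "system"
    OPENLDAP = "openldap"


@dataclass(frozen=True)
class ClusterAccount:
    name: str
    mapping_type: MappingType


class AccountMapping:
    """Illustrative per-cluster mapping of workspace members to cluster accounts."""

    def __init__(self, mapping_type):
        # Only one authentication method can be active per cluster,
        # so the type is fixed for the whole mapping table.
        self.mapping_type = mapping_type
        self._by_member = {}  # workspace member name -> ClusterAccount

    def map_member(self, member, account_name):
        self._by_member[member] = ClusterAccount(account_name, self.mapping_type)

    def resolve(self, member):
        # A task runs with the data permissions of the account returned here.
        try:
            return self._by_member[member]
        except KeyError:
            raise LookupError(f"no cluster account mapped for member {member!r}") from None


mapping = AccountMapping(MappingType.OPENLDAP)
mapping.map_member("alice", "ldap_alice")
print(mapping.resolve("alice").name)
```

In this model, two members mapped to different cluster accounts get different data access boundaries, while members mapped to the same account share them.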
Before you begin
Confirm the following before starting configuration:
- Mappings can only be configured at the cluster level.
- The EMR cluster accounts and passwords used in mappings match the actual accounts and passwords of the EMR cluster registered to DataWorks. Mismatched credentials cause task failures.
- If you use Mapping to OpenLDAP Account, LDAP authentication must be enabled both for the cluster and for each service that runs SQL tasks (Hive, Impala, Presto). If LDAP authentication is disabled for a service, SQL tasks on that service fail even when cluster-level mapping is configured.
- If you use Mapping to System Account, LDAP authentication is not required.
Permission management at the EMR side
Enable LDAP authentication
To use an OpenLDAP account for identity authentication, enable LDAP authentication for the cluster and add the relevant accounts to LDAP users.
- Enable LDAP authentication for the cluster. See Enable LDAP authentication.
- Add the accounts that will create, test, commit, and deploy EMR tasks in DataStudio to LDAP users and the related DataWorks workspace. For details on adding accounts to a workspace, see Overview of users, roles, and permissions.
Manage data permissions
To isolate data permissions at the cluster level, use EMR Ranger to manage the permissions granted to each EMR cluster account that maps to an Alibaba Cloud account.
Permission management at the DataWorks side
The DataWorks side involves two distinct layers of permissions: platform permissions (which modules a user can access) and data permissions (which data a user can read or write based on account mapping). Both must be configured.
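As a rough illustration of why both layers matter, the sketch below treats them as two independent checks. The function name and both dictionaries are hypothetical and only model the idea; they do not reflect how DataWorks stores or evaluates permissions.

```python
def can_run_emr_task(member, module, module_grants, account_mapping):
    """Return True only if both permission layers are in place (illustrative)."""
    # Layer 1: platform permission, i.e. may the member use this service module?
    has_platform_access = module in module_grants.get(member, set())
    # Layer 2: data permission, i.e. is the member mapped to a cluster account?
    has_data_identity = member in account_mapping
    return has_platform_access and has_data_identity


# Hypothetical sample data: carol has module access but no mapped account.
module_grants = {"alice": {"DataStudio"}, "carol": {"DataStudio"}}
account_mapping = {"alice": "ldap_alice"}

print(can_run_emr_task("alice", "DataStudio", module_grants, account_mapping))
print(can_run_emr_task("carol", "DataStudio", module_grants, account_mapping))
```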
Register an EMR cluster to DataWorks
Register the EMR cluster to DataWorks so it can be used as a compute engine instance. Only accounts with the AliyunEMRFullAccess policy attached can perform this operation. For details on attaching this policy, see Overview of users, roles, and permissions.
Grant service module permissions
To run EMR tasks in DataWorks, each user needs permissions on the relevant service modules: DataStudio, Data Map, Data Quality, and intelligent monitoring. With these permissions, users can develop EMR tasks, perform O&M operations on the tasks, and monitor the data quality of the tasks. These permissions control access to platform features only; they do not determine what data a user can read or write. For details, see Overview of users, roles, and permissions.
Configure account mappings
After registering the EMR cluster in security mode, configure mappings between DataWorks workspace members and EMR cluster accounts. Members inherit the same data permissions as their mapped accounts.
- In DataWorks, go to SettingCenter > Cluster Management.
- Configure the mappings between workspace members and the accounts of the registered EMR cluster.
For step-by-step instructions, see Configure DataWorks.
Failure scenarios
| Mapping type | When tasks fail |
|---|---|
| Mapping to System Account | Accounts or passwords in the mapping do not match the actual EMR cluster credentials |
| Mapping to OpenLDAP Account | LDAP authentication is enabled for the cluster but no account mapping is configured in DataWorks |
| Mapping to OpenLDAP Account | OpenLDAP account mapping is configured in DataWorks but LDAP authentication is disabled for the specific service (such as Hive, Impala, or Presto) in the EMR cluster |
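The failure scenarios in the table can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `ClusterState` and `failure_reasons` are hypothetical, and the field values would have to be gathered manually from the EMR console and SettingCenter, since no API for reading them is assumed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterState:
    """Hypothetical snapshot of the settings the table refers to."""
    mapping_type: Optional[str]     # "system", "openldap", or None when no mapping exists
    credentials_match: bool = True  # mapped account/password equal the real cluster credentials
    cluster_ldap_enabled: bool = False
    service_ldap_enabled: Dict[str, bool] = field(default_factory=dict)


def failure_reasons(state: ClusterState, service: str) -> List[str]:
    """List the table rows that apply to a task on the given service."""
    reasons = []
    if state.mapping_type == "system" and not state.credentials_match:
        reasons.append("mapped credentials do not match the EMR cluster credentials")
    if state.cluster_ldap_enabled and state.mapping_type is None:
        reasons.append("LDAP is enabled for the cluster but no account mapping is configured")
    if state.mapping_type == "openldap" and not state.service_ldap_enabled.get(service, False):
        reasons.append(f"LDAP authentication is disabled for {service}")
    return reasons


state = ClusterState(
    mapping_type="openldap",
    cluster_ldap_enabled=True,
    service_ldap_enabled={"Hive": True, "Presto": False},
)
print(failure_reasons(state, "Hive"))
print(failure_reasons(state, "Presto"))
```

An empty list means none of the listed scenarios applies; it does not guarantee that the task succeeds.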