Map a DataWorks tenant member account to an E-MapReduce (EMR) cluster account so that tasks submitted from DataWorks run under the correct cluster identity.
How it works
When DataWorks submits a task to an EMR cluster, it authenticates using a cluster account. The account used depends on the access identity configured when you registered the EMR cluster:
- Cluster Account Mapped to Account of Task Owner or Cluster Account Mapped to RAM User: tasks run as a RAM user
- Cluster Account Mapped to Alibaba Cloud Account: tasks run as an Alibaba Cloud account
Without a mapping, DataWorks falls back to default behavior, which works only in limited cases:
- RAM users: DataWorks runs tasks as the EMR cluster system account that has the same name as the RAM user. This works only if such an account exists and neither LDAP nor Kerberos authentication is enabled for the EMR cluster. Otherwise, you must configure a mapping, or tasks fail.
- Alibaba Cloud accounts: there is no default. You must always configure a mapping manually, regardless of whether LDAP or Kerberos authentication is enabled.
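The fallback rules above can be summarized as a small decision function. This is a hypothetical Python sketch for illustration only, not DataWorks code: `resolve_cluster_account`, `IdentityType`, `MappingRequiredError`, and all parameters are invented names. It also assumes that an explicitly configured mapping is used whenever one exists, and it does not model wrong names or passwords inside a mapping.

```python
from enum import Enum
from typing import Optional


class IdentityType(Enum):
    """Hypothetical labels for the identity a task runs as."""
    RAM_USER = "ram_user"
    ALIBABA_CLOUD_ACCOUNT = "alibaba_cloud_account"


class MappingRequiredError(Exception):
    """Raised when tasks would fail because no usable cluster account exists."""


def resolve_cluster_account(
    identity: IdentityType,
    name: str,
    mappings: dict[str, str],
    system_accounts: set[str],
    ldap_enabled: bool,
    kerberos_enabled: bool,
) -> str:
    """Return the EMR cluster account a task would run as (illustrative only)."""
    # Assumption: an explicitly configured mapping is used whenever one exists.
    mapped: Optional[str] = mappings.get(name)
    if mapped is not None:
        return mapped

    # Alibaba Cloud accounts have no default fallback.
    if identity is IdentityType.ALIBABA_CLOUD_ACCOUNT:
        raise MappingRequiredError(f"{name}: Alibaba Cloud accounts always need a mapping")

    # RAM users fall back to a same-name system account, which only works
    # when neither LDAP nor Kerberos authentication is enabled.
    if ldap_enabled or kerberos_enabled:
        raise MappingRequiredError(f"{name}: LDAP or Kerberos is enabled, so a mapping is required")
    if name not in system_accounts:
        raise MappingRequiredError(f"{name}: no system account with the same name")
    return name
```

For example, `resolve_cluster_account(IdentityType.RAM_USER, "alice", {}, {"alice"}, False, False)` returns `"alice"`, while the same call with `ldap_enabled=True` raises `MappingRequiredError`. The account name `alice` is a placeholder.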
Prerequisites
Before you begin, make sure you have:
- An EMR cluster registered as a computing resource in your DataWorks workspace
- One of the roles or permissions listed in Who can configure mappings
Who can configure mappings
Your ability to configure mappings for other members depends on your role.
| Role | Can configure mappings for |
|---|---|
| Alibaba Cloud account | All workspace members |
| RAM user or RAM role with AliyunDataWorksFullAccess and AliyunEMRFullAccess policies | All workspace members |
| RAM user or RAM role assigned the Workspace Administrator role and the AliyunEMRFullAccess policy | All workspace members |
| Any other member | Themselves only |
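The permission table can be read as a single check. The Python sketch below is hypothetical: `Member` and `can_configure_mapping` are invented for illustration, and role and policy names are matched as plain strings taken from the table rather than through any DataWorks or RAM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical model of a workspace member, for illustration only."""
    name: str
    is_alibaba_cloud_account: bool = False
    policies: frozenset[str] = frozenset()
    roles: frozenset[str] = frozenset()


def can_configure_mapping(actor: Member, target: Member) -> bool:
    """Return True if `actor` may configure the account mapping for `target`."""
    if actor.name == target.name:
        return True  # Every member can configure their own mapping.
    if actor.is_alibaba_cloud_account:
        return True
    has_emr_full = "AliyunEMRFullAccess" in actor.policies
    if has_emr_full and "AliyunDataWorksFullAccess" in actor.policies:
        return True
    if has_emr_full and "Workspace Administrator" in actor.roles:
        return True
    return False  # Any other member: themselves only.
```

Note that the Workspace Administrator role alone is not enough in this reading of the table; the member also needs the AliyunEMRFullAccess policy.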
Usage notes
Authentication constraints
Do not configure a mapping for an EMR cluster that has both LDAP authentication and Kerberos authentication enabled. Tasks will fail if you do.
Ranger authorization
If Ranger authorization is enabled for an EMR cluster, add DataWorks to the cluster whitelist before you develop EMR tasks. Otherwise, tasks fail with the error `Cannot modify spark.yarn.queue at runtime` or `Cannot modify SKYNET_BIZDATE at runtime`. For details, see Add DataWorks to the EMR cluster whitelist.
Kerberos user management
If you use Kerberos authentication, enable the Kerberos authentication service on the EMR cluster and add the task development account to the service. For details, see Configure Kerberos authentication.
Data permissions
You can use service-level permissions on the EMR cluster to control which data operations DataWorks users can perform. For example, use Ranger to restrict the operations that the mapped cluster account can perform.
If Data Lake Formation (DLF) is configured as the metadata storage service and the DLF-Auth component is used for DLF data permission management, request data permissions from Security Center in the DataWorks console. For details, see DLF data access control.
Failure scenarios
Tasks fail in the following scenarios. Use this table to diagnose misconfiguration.
| Mapping type | Scenario | Why tasks fail |
|---|---|---|
| System account mapping | A RAM user runs tasks, but the EMR cluster has no system account with the same name | DataWorks cannot find a matching account |
| System account mapping | A RAM user is mapped, but the account name or password does not match the actual EMR cluster account | Authentication fails |
| System account mapping | An Alibaba Cloud account runs tasks, but no mapping exists | Alibaba Cloud accounts have no default fallback |
| LDAP account mapping | LDAP authentication is enabled on the EMR cluster, but the mapping is missing or misconfigured in DataWorks | DataWorks does not supply valid LDAP credentials |
| LDAP account mapping | LDAP mapping is configured in DataWorks, but LDAP authentication is not enabled for the relevant component in the EMR cluster | SQL tasks (Hive, Impala, Presto, Trino) fail at authentication |
| Kerberos account mapping | Kerberos authentication is enabled on the EMR cluster, but the mapping is missing or misconfigured in DataWorks | DataWorks does not supply valid Kerberos credentials |
| Kerberos account mapping | Kerberos mapping is configured in DataWorks, but the Kerberos authentication service is not enabled on the EMR cluster | The Kerberos service is unavailable |
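The mismatch rows in the table can be checked mechanically from two facts: which mapping type is configured and which authentication methods the cluster has enabled. This is a hypothetical Python sketch; `diagnose` and its string labels are invented, and it cannot detect wrong names or passwords inside a mapping.

```python
from typing import Optional


def diagnose(mapping_type: Optional[str], ldap_enabled: bool, kerberos_enabled: bool) -> list[str]:
    """List likely causes of task failure, following the failure-scenario table.

    mapping_type is a hypothetical label: "system", "ldap", "kerberos", or
    None when no mapping is configured. Wrong names or passwords inside a
    mapping cannot be detected from these inputs and are not checked.
    """
    problems: list[str] = []
    if ldap_enabled and kerberos_enabled and mapping_type is not None:
        # Mappings must not be configured when both methods are enabled.
        problems.append("LDAP and Kerberos are both enabled; do not configure an account mapping")
        return problems
    if ldap_enabled and mapping_type != "ldap":
        problems.append("LDAP is enabled on the cluster, but no LDAP account mapping is configured")
    if mapping_type == "ldap" and not ldap_enabled:
        problems.append("LDAP mapping is configured, but LDAP is not enabled for the component")
    if kerberos_enabled and mapping_type != "kerberos":
        problems.append("Kerberos is enabled on the cluster, but no Kerberos account mapping is configured")
    if mapping_type == "kerberos" and not kerberos_enabled:
        problems.append("Kerberos mapping is configured, but Kerberos is not enabled on the cluster")
    return problems
```

An empty list means only that the mapping type matches the cluster's authentication method; the account credentials may still be wrong.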
Open the account mapping editor
- Log on to the DataWorks console. In the top navigation bar, select the target region. In the left navigation pane, choose More > Management Center.
- On the Management Center page, select the target workspace from the drop-down list and click Go to Management Center.
- In the left navigation pane, click Computing Resources.
- In the computing resource list, find the target EMR cluster and click Account Mappings. On the page that appears, click Edit Account Mappings in the upper-right corner.

Configure a mapping
On the cluster account mapping editing page, choose one of the following mapping types based on the authentication method enabled for your EMR cluster.
A mapping applies to every workspace in which the EMR cluster is registered, so modify the configuration only when your business requires it.
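The choice of mapping type depends only on which authentication methods are enabled, so it can be written as a short function. This is a hypothetical Python sketch: `choose_mapping_type` and `MappingType` are invented names, and the enum values simply reuse the Mapping Type labels from the options below.

```python
from enum import Enum


class MappingType(Enum):
    """Values mirror the Mapping Type labels shown on the editing page."""
    SYSTEM = "System Account Mapping"
    LDAP = "OPEN LDAP Account Mapping"
    KERBEROS = "Kerberos Account Mapping"


def choose_mapping_type(ldap_enabled: bool, kerberos_enabled: bool) -> MappingType:
    """Pick the Mapping Type that matches the cluster's authentication method."""
    if ldap_enabled and kerberos_enabled:
        # Account mappings must not be configured for such clusters.
        raise ValueError("do not configure a mapping when both LDAP and Kerberos are enabled")
    if kerberos_enabled:
        return MappingType.KERBEROS
    if ldap_enabled:
        return MappingType.LDAP
    return MappingType.SYSTEM
```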
Option 1: System account mapping
Use this option when LDAP and Kerberos authentication are not enabled for the EMR cluster.
- Set Configuration Mode to either:
  - Custom: define the mapping for this cluster only
  - Reference Configurations of Another Cluster: reuse an existing cluster's mapping configuration
- Set Mapping Type to System Account Mapping.
- Click Confirm.
Option 2: LDAP account mapping
Use this option when Lightweight Directory Access Protocol (LDAP) authentication is enabled for the relevant component in the EMR cluster (such as Hive, Impala, Presto, or Trino).
If LDAP authentication is not enabled for the component, SQL tasks that use the mapped account will fail at authentication.
Before you configure this mapping, enable the LDAP authentication service for the relevant component in the EMR cluster.
- Set Configuration Mode to either:
  - Custom: define the mapping for this cluster only
  - Reference Configurations of Another Cluster: reuse an existing cluster's mapping configuration
- Set Mapping Type to OPEN LDAP Account Mapping.
- Click Confirm.
Option 3: Kerberos account mapping
Use this option when Kerberos authentication is enabled for the EMR cluster.
Before you configure this mapping, enable Kerberos on the EMR cluster.
- Download the authentication credentials from the EMR cluster.
- Click Upload Keystore File and upload the downloaded credentials. This ensures that EMR Trino and EMR Presto tasks run correctly.
- Set Configuration Mode to either:
  - Custom: define the mapping for this cluster only
  - Reference Configurations of Another Cluster: reuse an existing cluster's mapping configuration
- Set Mapping Type to Kerberos Account Mapping.
- Click Confirm.
Add DataWorks to the EMR cluster whitelist
If Ranger authorization is enabled for an EMR cluster, add DataWorks to the cluster whitelist and restart the Hive service before running EMR tasks. Without this step, tasks fail with the error `Cannot modify spark.yarn.queue at runtime` or `Cannot modify SKYNET_BIZDATE at runtime`.
- Add the following custom parameter to the Hive service configuration in the EMR cluster:
  `hive.security.authorization.sqlstd.confwhitelist.append=tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*`
  `ALISA.*` and `SKYNET.*` are DataWorks-specific prefixes and are required for DataWorks tasks to run.
- Restart the Hive service for the configuration to take effect.
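Before restarting Hive, you can check locally which parameter names the appended value lets through. This Python sketch assumes that Hive treats the value as a single regular expression that must match the whole parameter name, which Python's `re.fullmatch` reproduces for this simple pattern. It checks only the appended value, not Hive's built-in default whitelist. `is_whitelisted` is a hypothetical helper, not part of Hive or DataWorks.

```python
import re

# Value of hive.security.authorization.sqlstd.confwhitelist.append from the step above.
WHITELIST_APPEND = r"tez.*|spark.*|mapred.*|mapreduce.*|ALISA.*|SKYNET.*"


def is_whitelisted(param: str, pattern: str = WHITELIST_APPEND) -> bool:
    """Return True if `param` fully matches the appended whitelist regex.

    Assumption: the pattern must match the entire parameter name. Only the
    appended value is checked here, not Hive's built-in default whitelist.
    """
    return re.fullmatch(pattern, param) is not None


# The two parameters named in the Ranger error messages.
for name in ("spark.yarn.queue", "SKYNET_BIZDATE"):
    print(name, is_whitelisted(name))
```

Both names match, because `spark.*` and `SKYNET.*` are regular expressions in which `.*` matches any remaining characters, so `SKYNET.*` also covers `SKYNET_BIZDATE`.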
What's next
- Bind an EMR computing resource: configure the default access identity when registering an EMR cluster
- Configure Kerberos authentication: set up Kerberos on an EMR cluster
- DLF data access control: manage data permissions using DLF