Before you run E-MapReduce (EMR) nodes in DataWorks, you must complete authentication and authorization configurations on both the EMR side and the DataWorks side so that the nodes run as expected. This topic describes how to manage permissions on DataWorks and EMR.

Background information

In DataWorks, you can map the members in a workspace to the accounts of the EMR cluster that is associated with the workspace. Each member then obtains the permissions of the mapped account on the EMR cluster. This way, Alibaba Cloud accounts, node owners, and RAM users can have different data permissions when they run EMR nodes in DataWorks, and data permissions are isolated. For more information about permission configurations that are required to run EMR nodes in DataWorks, see Permission management at the EMR side and Permission management at the DataWorks side.

Limits

DataWorks supports only two account types for mappings between the members in a workspace and the accounts of the EMR cluster associated with the workspace: system accounts and OpenLDAP accounts. When you configure the mappings, take note of the following items:
  • You can configure mappings only at the cluster level. Only one authentication method can be used.
  • The EMR cluster accounts and passwords in the mappings must be the same as the actual accounts and passwords of the EMR cluster associated with the workspace.
If the EMR cluster accounts and passwords in the mappings are inconsistent with the actual accounts and passwords, or authentication is not enabled for the cluster, EMR nodes fail to run in DataWorks. The following table describes the details.
Value of the Mapping Type parameter | Description
System Account | If the accounts or passwords are inconsistent, EMR nodes fail to run in DataWorks.
OpenLDAP Account | EMR nodes fail to run in DataWorks in either of the following scenarios:
  • LDAP authentication is enabled for the desired cluster but no account mapping is configured in DataWorks.
  • LDAP authentication is enabled for DataWorks but is disabled for the desired service in the EMR cluster.
    Note If you use an OpenLDAP account to configure the mappings, SQL nodes such as Hive, Impala, and Presto nodes use this account for authentication by default. In this case, if LDAP authentication is not enabled for the desired service in the EMR cluster, the SQL nodes fail to run.
Note The authentication method for EMR clusters varies with the compute engine. You can check whether an EMR cluster supports LDAP authentication in the EMR console.
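The failure conditions in the preceding table can be sketched as a small check. This is only an illustrative model of the rules described in this topic, not a DataWorks or EMR API: the names ClusterState, Mapping, and failure_reasons are hypothetical, and the plaintext passwords exist only to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SUPPORTED_TYPES = ("System Account", "OpenLDAP Account")


@dataclass
class ClusterState:
    """Hypothetical snapshot of the EMR side (not an EMR API object)."""
    ldap_enabled: bool         # LDAP authentication enabled for the desired service
    accounts: Dict[str, str]   # account name -> password actually set on the cluster


@dataclass
class Mapping:
    """Hypothetical DataWorks-side mapping for one workspace member."""
    mapping_type: str          # one of SUPPORTED_TYPES
    account: str
    password: str


def failure_reasons(cluster: ClusterState, mapping: Optional[Mapping]) -> List[str]:
    """Return the reasons, per the table above, why EMR nodes would fail to run."""
    reasons: List[str] = []
    if mapping is None:
        # LDAP is enabled on the cluster but DataWorks has no account mapping.
        if cluster.ldap_enabled:
            reasons.append("LDAP is enabled for the cluster but no mapping is configured")
        return reasons
    if mapping.mapping_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported mapping type: {mapping.mapping_type}")
    # An OpenLDAP mapping requires LDAP to be enabled for the service in the cluster.
    if mapping.mapping_type == "OpenLDAP Account" and not cluster.ldap_enabled:
        reasons.append("OpenLDAP mapping is used but LDAP is disabled in the cluster")
    # The mapped account and password must match the actual cluster values.
    if cluster.accounts.get(mapping.account) != mapping.password:
        reasons.append("account or password does not match the cluster")
    return reasons
```

An empty list means that none of the documented failure conditions applies. It does not guarantee that a node succeeds, because other factors, such as data permissions, are outside this sketch.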

Permission management at the EMR side

  • Enable LDAP authentication
    If you want to use a non-system account for identity authentication in an EMR cluster, you must enable LDAP authentication for the cluster and add the account that is used to develop EMR nodes in DataWorks as an LDAP user. To do so, perform the following steps:
    1. Enable LDAP authentication for the cluster.

      To use LDAP for identity authentication, you must enable LDAP authentication for the cluster. For more information, see Enable LDAP authentication.

    2. Prepare the account that is used to run EMR nodes and add the account to LDAP users and the related DataWorks workspace.

      We recommend that you add users who need to create, test, commit, and deploy EMR nodes in DataStudio to LDAP users and the related DataWorks workspace. For more information about how to add an account to a DataWorks workspace, see Users, roles, and permissions.

  • Manage data permissions

    You can manage the services in an EMR cluster to isolate data permissions. For example, you can use EMR Ranger to manage the permissions of users in an EMR cluster.

Permission management at the DataWorks side

  • Associate an EMR cluster with a DataWorks workspace

    Before you run EMR nodes in DataWorks, you must associate an EMR cluster with a DataWorks workspace. This way, the cluster can be used as a compute engine instance in DataWorks. Only accounts to which the AliyunEMRFullAccess policy is attached can be used to perform this operation. For more information about how to attach the AliyunEMRFullAccess policy to an account, see Overview of users, roles, and permissions.

  • Grant permissions on DataWorks service modules to an account

    If you want to run EMR nodes in DataWorks, you must be granted the permissions on DataWorks service modules such as DataStudio, Data Map, Data Quality, and intelligent monitoring. After you obtain the permissions, you can develop EMR nodes, perform O&M operations on the nodes, and monitor the data quality of the nodes. For more information about the permissions on service modules, see Users, roles, and permissions.

  • Configure account mappings
    After you associate an EMR cluster with a workspace by using the security mode, go to the EMR Cluster Configure page of DataWorks. On this page, configure mappings between the members in a DataWorks workspace and the accounts of the EMR cluster associated with the workspace. This way, the members in the DataWorks workspace have the same permissions as the mapped accounts.
    Note For more information about how to associate an EMR cluster with a DataWorks workspace and configure the mappings, see Configure DataWorks.
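To illustrate how account mappings isolate data permissions, the following minimal sketch models a mapping as a lookup from a workspace member to a cluster account. All member names, account names, and permission strings are hypothetical, and the code does not call any DataWorks or EMR API. In a real cluster, the permissions of each account are managed on the EMR side, for example by using EMR Ranger.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping: DataWorks workspace member -> EMR cluster account.
MEMBER_TO_CLUSTER_ACCOUNT: Dict[str, str] = {
    "ram_user_dev": "ldap_dev",
    "ram_user_ops": "ldap_ops",
}

# Hypothetical permissions granted to each cluster account on the EMR side.
CLUSTER_ACCOUNT_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "ldap_dev": frozenset({"select:sales_db"}),
    "ldap_ops": frozenset({"select:sales_db", "insert:sales_db"}),
}


def effective_permissions(member: str) -> FrozenSet[str]:
    """Return the permissions of the cluster account that the member maps to."""
    account = MEMBER_TO_CLUSTER_ACCOUNT.get(member)
    if account is None:
        # How unmapped members are handled is not modeled in this sketch.
        raise LookupError(f"no account mapping configured for {member!r}")
    return CLUSTER_ACCOUNT_PERMISSIONS.get(account, frozenset())
```

In this model, two members who run the same node can see different data because each member inherits only the permissions of the mapped cluster account.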