After you bind an E-MapReduce cluster to a DataWorks workspace, you can create E-MapReduce nodes such as EMR Hive, EMR MR, EMR Presto, and EMR Spark SQL nodes. You can then add the nodes to workflows and configure scheduling policies for them. This simplifies metadata management and makes it easier for E-MapReduce users to produce data.

DataWorks allows you to bind an E-MapReduce cluster in Shortcut mode or Security mode to meet different security requirements of enterprises. In Shortcut mode, you create and run E-MapReduce nodes only to produce data. In Security mode, you can manage permissions on the data to ensure higher security.

Shortcut mode

Assume that an E-MapReduce cluster in Shortcut mode is bound to a DataWorks workspace. The code of E-MapReduce nodes is delivered to the E-MapReduce cluster, regardless of whether the nodes are scheduled by DataWorks or run by using an Alibaba Cloud account or a RAM user. Then, a Hadoop user in the E-MapReduce cluster is used to run the code.
Notice
  • The Hadoop user has all permissions on the E-MapReduce cluster. Proceed with caution.
  • In Shortcut mode, you must attach the AliyunEMRDevelopAccess policy to workspace members such as developers and administrators, so that they can create and run E-MapReduce nodes in DataStudio.
    • The AliyunEMRDevelopAccess policy is already attached to Alibaba Cloud accounts.
    • To run E-MapReduce nodes as a RAM user, you must attach the AliyunEMRDevelopAccess policy to the RAM user. For more information, see Authorize RAM users.
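
The policy attachment described in the notice can also be scripted. The following is a minimal sketch, assuming that the Alibaba Cloud CLI (`aliyun`) is installed and configured with credentials that are allowed to manage RAM. It only builds the command that corresponds to the RAM AttachPolicyToUser operation and does not run it. The function name and the RAM user name `emr-dev-01` are hypothetical. The Authorize RAM users topic remains the reference procedure.

```python
import shlex
from typing import List

# Name of the system policy, taken from the notice above.
EMR_DEVELOP_POLICY = "AliyunEMRDevelopAccess"


def build_attach_policy_command(ram_user_name: str,
                                policy_name: str = EMR_DEVELOP_POLICY) -> List[str]:
    """Return the argument list that attaches a system policy to a RAM user.

    The command is only built, not executed. Assumption: the `aliyun` CLI is
    installed and its active profile may call RAM operations.
    """
    if not ram_user_name:
        raise ValueError("ram_user_name must not be empty")
    return [
        "aliyun", "ram", "AttachPolicyToUser",
        "--PolicyType", "System",
        "--PolicyName", policy_name,
        "--UserName", ram_user_name,
    ]


if __name__ == "__main__":
    # "emr-dev-01" is a hypothetical RAM user name.
    print(shlex.join(build_attach_policy_command("emr-dev-01")))
```

Review the printed command before you run it, for example by passing the list to `subprocess.run`.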

Shortcut mode suits workspaces that do not need data permission management or isolation between node executors.

To bind an E-MapReduce cluster in Shortcut mode, perform the following steps:
  1. Go to the Workspace Management page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides. Then, find the workspace and click Workspace Settings in the Actions column. In the Workspace Settings panel, click More. The Workspace Management page appears.
      You can also go to the Workspace Management page by using the following method: Find the workspace to which you want to bind an E-MapReduce cluster and click Data Analytics in the Actions column. On the DataStudio page, click the Workspace Management page icon in the upper-right corner. The Workspace Management page appears.
  2. In the Computing Engine information section, click the E-MapReduce tab.
  3. On the E-MapReduce tab, click Add instances.
  4. In the New EMR cluster dialog box, set the parameters as required.
    Parameters in the New EMR cluster dialog box vary based on the mode of your DataWorks workspace. The following table describes the parameters for a DataWorks workspace in standard mode. You must set the parameters for both the production environment and the development environment.
    Parameter: Description
    Instance display name: The display name of the E-MapReduce cluster that you want to bind.
    Region: The region of the current workspace. You cannot modify this parameter.
    Access Mode: The access mode of the E-MapReduce cluster. Select Shortcut mode from the drop-down list.
    Scheduling access identity: The identity that is used to deliver the code of an E-MapReduce node to the E-MapReduce cluster. The code is delivered after the node is committed to the scheduling system of DataWorks in the production environment. Valid values: Alibaba Cloud primary account and Alibaba Cloud sub-account.
      Note
      • This parameter is used only in the production environment.
      • In Shortcut mode, you must attach the AliyunEMRDevelopAccess policy to workspace members such as developers and administrators, so that they can create and run E-MapReduce nodes in DataStudio.
        • The AliyunEMRDevelopAccess policy is already attached to Alibaba Cloud accounts.
        • To run E-MapReduce nodes as a RAM user, you must attach the AliyunEMRDevelopAccess policy to the RAM user. For more information, see Authorize RAM users.
    Access identity: The identity that is used to deliver the code of an E-MapReduce node in the development environment to the E-MapReduce cluster. Default value: Task owner.
      Note
      • This parameter is used only in the development environment of a workspace in standard mode.
      • The node owner can be an Alibaba Cloud account or a RAM user.
        In Shortcut mode, you must attach the AliyunEMRDevelopAccess policy to workspace members such as developers and administrators, so that they can create and run E-MapReduce nodes in DataStudio.
        • The AliyunEMRDevelopAccess policy is already attached to Alibaba Cloud accounts.
        • To run E-MapReduce nodes as a RAM user, you must attach the AliyunEMRDevelopAccess policy to the RAM user. For more information, see Authorize RAM users.
    Cluster ID: The ID of the E-MapReduce cluster that you want to bind. Select an ID from the drop-down list. The selected E-MapReduce cluster is used as the runtime environment of E-MapReduce nodes.
    Project ID: The ID of the E-MapReduce project that you want to bind. Select an ID from the drop-down list. The selected E-MapReduce project is used as the runtime environment of E-MapReduce nodes.
      Note: E-MapReduce projects in Security mode are unavailable.
    YARN resource queue: The name of the resource queue in the E-MapReduce cluster. Unless otherwise specified, set the parameter to default.
    Endpoint: The endpoint of E-MapReduce. You cannot modify this parameter.
  5. Click Confirm.
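
Before you fill in the dialog box, it can help to write down the values for each environment and check them against the parameter descriptions. The following is a minimal sketch of such a check for Shortcut mode. The class, its field names, and the placeholder IDs in the usage example are hypothetical. They mirror the parameters in the dialog box and are not part of any DataWorks API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Valid values of "Scheduling access identity" in Shortcut mode (from the table).
SHORTCUT_SCHEDULING_IDENTITIES = (
    "Alibaba Cloud primary account",
    "Alibaba Cloud sub-account",
)


@dataclass
class ShortcutBinding:
    """Values entered in the New EMR cluster dialog box for one environment."""
    environment: str                      # "production" or "development"
    instance_display_name: str
    cluster_id: str
    project_id: str
    access_mode: str = "Shortcut mode"
    scheduling_access_identity: Optional[str] = None  # production only
    access_identity: str = "Task owner"               # development only
    yarn_resource_queue: str = "default"

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty means consistent."""
        found = []
        if self.environment not in ("production", "development"):
            found.append(f"unknown environment: {self.environment!r}")
        if self.access_mode != "Shortcut mode":
            found.append("Access Mode must be 'Shortcut mode'")
        for label, value in (("Instance display name", self.instance_display_name),
                             ("Cluster ID", self.cluster_id),
                             ("Project ID", self.project_id),
                             ("YARN resource queue", self.yarn_resource_queue)):
            if not value:
                found.append(f"{label} is empty")
        if (self.environment == "production"
                and self.scheduling_access_identity not in SHORTCUT_SCHEDULING_IDENTITIES):
            found.append("Scheduling access identity must be one of: "
                         + ", ".join(SHORTCUT_SCHEDULING_IDENTITIES))
        return found


if __name__ == "__main__":
    # The IDs below are placeholders, not real cluster or project IDs.
    prod = ShortcutBinding("production", "emr_prod", "C-EXAMPLE", "FP-EXAMPLE",
                           scheduling_access_identity="Alibaba Cloud sub-account")
    print(prod.problems())
```

A workspace in standard mode needs one such record for the production environment and one for the development environment.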

Security mode

Assume that an E-MapReduce cluster in Security mode is bound to a DataWorks workspace. The code of E-MapReduce nodes is delivered to the E-MapReduce cluster, regardless of whether the nodes are scheduled by DataWorks or run by using an Alibaba Cloud account or a RAM user. Then, a Hadoop user in the E-MapReduce cluster with the same name as the Alibaba Cloud account or RAM user is used to run the code. E-MapReduce Ranger can be used to manage permissions of each Hadoop user in the E-MapReduce cluster. This ensures that different Alibaba Cloud accounts, node owners, or RAM users have different data permissions when they run E-MapReduce nodes in DataWorks.
Note
In Security mode, you must add the credentials of workspace members such as developers and administrators to the Lightweight Directory Access Protocol (LDAP) directory of the E-MapReduce cluster. In addition, you must attach the AliyunEMRDevelopAccess or AliyunEMRFullAccess policy and grant relevant data permissions to the workspace members. This way, the workspace members can create and run E-MapReduce nodes in DataStudio.
  • The credentials of Alibaba Cloud accounts are already in the LDAP directory of the E-MapReduce cluster. The AliyunEMRDevelopAccess and AliyunEMRFullAccess policies are already attached to Alibaba Cloud accounts.
  • To run E-MapReduce nodes as a RAM user, you must add the credential of the RAM user to the LDAP directory of the E-MapReduce cluster. For more information, see the Add the credentials of specific RAM users to the LDAP directory of the E-MapReduce cluster step. In addition, you must attach the AliyunEMRDevelopAccess or AliyunEMRFullAccess policy to the RAM user. For more information, see Authorize RAM users.
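
The two prerequisites in this note can be expressed as a simple check. The following is a minimal sketch that works on plain Python collections. It does not query LDAP or RAM, and the function and parameter names are hypothetical. Data permissions that are granted in E-MapReduce Ranger are not modeled.

```python
from typing import Dict, List, Set

# Either of these system policies satisfies the requirement in Security mode.
ACCEPTED_POLICIES = {"AliyunEMRDevelopAccess", "AliyunEMRFullAccess"}


def missing_prerequisites(ram_user: str,
                          ldap_users: Set[str],
                          attached_policies: Dict[str, Set[str]]) -> List[str]:
    """List what a RAM user still lacks before it can run nodes in Security mode.

    ldap_users: names whose credentials are in the LDAP directory of the cluster.
    attached_policies: policy names attached to each RAM user.
    """
    missing = []
    if ram_user not in ldap_users:
        missing.append("credential is not in the LDAP directory of the cluster")
    if not ACCEPTED_POLICIES & attached_policies.get(ram_user, set()):
        missing.append("neither AliyunEMRDevelopAccess nor AliyunEMRFullAccess is attached")
    return missing
```

An empty list means that both prerequisites from the note are met for that RAM user.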

Security mode suits workspaces that need data permission management and isolation between node executors.
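
The practical difference between the two modes is which Hadoop user runs the delivered code. The following is a minimal sketch of that rule. The function name is hypothetical, and the name `hadoop` for the shared user in Shortcut mode is an assumption, because this topic refers to that user only as a Hadoop user.

```python
SHORTCUT_MODE = "Shortcut mode"
SECURITY_MODE = "Security mode"

# Assumption: the shared, fully privileged user in Shortcut mode is named "hadoop".
SHARED_HADOOP_USER = "hadoop"


def hadoop_user_for(access_mode: str, account_name: str) -> str:
    """Return the Hadoop user that runs code delivered by the given account."""
    if access_mode == SHORTCUT_MODE:
        # Every account shares one Hadoop user, so data permissions are not isolated.
        return SHARED_HADOOP_USER
    if access_mode == SECURITY_MODE:
        # The Hadoop user has the same name as the Alibaba Cloud account or RAM user.
        return account_name
    raise ValueError(f"unknown access mode: {access_mode!r}")
```

Because each account maps to its own Hadoop user in Security mode, E-MapReduce Ranger can grant different data permissions to different accounts.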

To bind an E-MapReduce cluster in Security mode, perform the following steps:
  1. Enable Security mode for the E-MapReduce project.
    1. Log on to the E-MapReduce console.
    2. Click Data Platform in the top navigation bar.
    3. In the Projects section, find the project for which you want to enable Security mode and click Edit Job in the Actions column.
    4. On the page that appears, click the Projects tab in the top navigation bar.
    5. In the left-side navigation pane, click General Configuration. On the General Configuration page, turn on Security Mode.
  2. Add the credentials of specific RAM users to the LDAP directory of the E-MapReduce cluster.
    1. Go back to the E-MapReduce console. In the top navigation bar, click Cluster Management.
    2. Find the cluster and click Details in the Actions column.
    3. In the left-side navigation pane, click Users.
    4. On the Users page, click Add User.
    5. In the Add User dialog box, set the parameters as required.
      We recommend that you add the credentials of the following RAM users to the LDAP directory of the E-MapReduce cluster:
      • RAM users that are used to create, test, and run E-MapReduce nodes in DataStudio.
      • RAM users that are used to create, commit, and deploy E-MapReduce nodes in DataStudio.
    6. Click OK.
  3. Configure E-MapReduce Ranger and manage the permissions of the Hadoop users that correspond to your Alibaba Cloud account and RAM users. For more information, see Integrate Ranger UserSync with an LDAP server and Overview.
  4. Bind the E-MapReduce cluster to the current DataWorks workspace.
    1. Go to the Workspace Management page.
    2. In the Computing Engine information section, click the E-MapReduce tab.
    3. On the E-MapReduce tab, click Add instances.
    4. In the New EMR cluster dialog box, set the parameters as required.
      Parameters in the New EMR cluster dialog box vary based on the mode of your DataWorks workspace. The following table describes the parameters for a DataWorks workspace in standard mode. You must set the parameters for both the production environment and the development environment.
      Parameter: Description
      Instance display name: The display name of the E-MapReduce cluster that you want to bind.
      Region: The region of the current workspace.
      Access Mode: The access mode of the E-MapReduce cluster. Select Security mode from the drop-down list and click Confirm in the Please note message.
      Scheduling access identity: The identity that is used to deliver the code of an E-MapReduce node to the E-MapReduce cluster. The code is delivered after the node is committed and deployed to the scheduling system of DataWorks in the production environment. The Hadoop user that corresponds to this identity is used to run the code.
        Valid values: Task owner, Alibaba Cloud primary account, and Alibaba Cloud sub-account.
        • Task owner: uses the node owner to deliver and run the code. If you select this option, data permissions of Hadoop users are isolated. The node owner can be an Alibaba Cloud account or a RAM user.
        • Alibaba Cloud primary account: uses an Alibaba Cloud account to deliver the code to the E-MapReduce cluster.
        • Alibaba Cloud sub-account: uses a RAM user to deliver the code to the E-MapReduce cluster.
        Note
        • This parameter is used only in the production environment.
        • The credentials of Alibaba Cloud accounts are already in the LDAP directory of the E-MapReduce cluster. The AliyunEMRDevelopAccess and AliyunEMRFullAccess policies are already attached to Alibaba Cloud accounts.
        • To run E-MapReduce nodes as a RAM user, you must add the credential of the RAM user to the LDAP directory of the E-MapReduce cluster. For more information, see the Add the credentials of specific RAM users to the LDAP directory of the E-MapReduce cluster step. In addition, you must attach the AliyunEMRDevelopAccess or AliyunEMRFullAccess policy to the RAM user. For more information, see Authorize RAM users.
      Access identity: The identity that is used to deliver the code of an E-MapReduce node in the development environment to the E-MapReduce cluster. Default value: Task owner. The corresponding Hadoop user in the E-MapReduce cluster is used to run the code.
        Note
        • This parameter is used only in the development environment of a workspace in standard mode.
        • Make sure that the credential of the node executor is added to the LDAP directory of the E-MapReduce cluster. In addition, make sure that the AliyunEMRDevelopAccess or AliyunEMRFullAccess policy is attached to the node executor and relevant data permissions are granted to the node executor. This way, the node executor can create and run E-MapReduce nodes in DataStudio. The node owner can be an Alibaba Cloud account or a RAM user.
          • The credentials of Alibaba Cloud accounts are already in the LDAP directory of the E-MapReduce cluster. The AliyunEMRDevelopAccess and AliyunEMRFullAccess policies are already attached to Alibaba Cloud accounts.
          • To run E-MapReduce nodes as a RAM user, you must add the credential of the RAM user to the LDAP directory of the E-MapReduce cluster. For more information, see the Add the credentials of specific RAM users to the LDAP directory of the E-MapReduce cluster step. In addition, you must attach the AliyunEMRDevelopAccess or AliyunEMRFullAccess policy to the RAM user. For more information, see Authorize RAM users.
      Cluster ID: The ID of the E-MapReduce cluster that you want to bind. Select an ID from the drop-down list. The selected E-MapReduce cluster is used as the runtime environment of E-MapReduce nodes.
      Project ID: The ID of the E-MapReduce project that you want to bind. Select the ID of an E-MapReduce project in Security mode from the drop-down list.
        Note: E-MapReduce projects that are not in Security mode are unavailable.
      YARN resource queue: The name of the resource queue in the E-MapReduce cluster. Unless otherwise specified, set the parameter to default.
      Endpoint: The endpoint of E-MapReduce. You cannot modify this parameter.
    5. Click Confirm.
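
The effect of the Scheduling access identity parameter on data isolation in the production environment can be illustrated with a short model. The following is a minimal sketch. The function names, node names, and account names are hypothetical, and the `sub_account` argument stands for whichever RAM user is selected for the Alibaba Cloud sub-account option.

```python
from typing import Dict


def scheduling_account(option: str, node_owner: str,
                       primary_account: str, sub_account: str) -> str:
    """Return the account that delivers the code of a production node."""
    if option == "Task owner":
        return node_owner
    if option == "Alibaba Cloud primary account":
        return primary_account
    if option == "Alibaba Cloud sub-account":
        return sub_account
    raise ValueError(f"unknown scheduling access identity: {option!r}")


def hadoop_users_by_node(option: str, node_owners: Dict[str, str],
                         primary_account: str, sub_account: str) -> Dict[str, str]:
    """Map each node name to the Hadoop user that runs it in Security mode.

    In Security mode the Hadoop user has the same name as the delivering
    account, so the account name is returned unchanged.
    """
    return {
        node: scheduling_account(option, owner, primary_account, sub_account)
        for node, owner in node_owners.items()
    }


if __name__ == "__main__":
    owners = {"hive_daily": "alice", "spark_hourly": "bob"}  # node -> node owner
    print(hadoop_users_by_node("Task owner", owners, "main_account", "etl_user"))
    print(hadoop_users_by_node("Alibaba Cloud sub-account", owners,
                               "main_account", "etl_user"))
```

With Task owner, each node runs as the Hadoop user of its own owner, so Ranger policies apply per owner. With either of the other two options, all scheduled nodes run as the same Hadoop user.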