Scenario: Register a cross-account EMR cluster

Last Updated: Oct 18, 2023

You can register an E-MapReduce (EMR) cluster across Alibaba Cloud accounts by using a RAM role. This topic describes how Alibaba Cloud Account A can use a RAM role to register, in DataWorks, an EMR cluster that belongs to Alibaba Cloud Account B. This enables cross-account access to EMR data.

Prerequisites

  • Alibaba Cloud Account A and Alibaba Cloud Account B are created. For information about how to create an Alibaba Cloud account, see Sign up with Alibaba Cloud.

    • Alibaba Cloud Account A: used to register an EMR cluster that belongs to Alibaba Cloud Account B in DataWorks.

    • Alibaba Cloud Account B: used to provide an EMR cluster.

  • An EMR cluster is created by using Alibaba Cloud Account B. For information about how to create an EMR cluster, see Create a cluster.

Limits

  • Only EMR Hadoop clusters of V3.38.3 or V3.38.2 for which the Metadata parameter is not set to DLF Unified Metadata can be used.

  • Kerberos authentication is not supported.

  • For Spark, table lineages of SQL nodes are supported, but field lineages of SQL nodes are not.
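
The first two limits can be expressed as a pre-registration check. The following Python code is a minimal sketch, not part of any DataWorks or EMR SDK: the ClusterInfo class, its field names, and the check_limits function are hypothetical, and the cluster details are assumed to be entered by hand from the EMR console.

```python
from dataclasses import dataclass
from typing import List

# Versions listed in the Limits section of this topic.
SUPPORTED_VERSIONS = {"3.38.2", "3.38.3"}


@dataclass
class ClusterInfo:
    """Hypothetical container for cluster details copied from the EMR console."""
    cluster_type: str       # for example, "Hadoop"
    version: str            # for example, "V3.38.3"
    metadata_type: str      # value of the Metadata parameter
    kerberos_enabled: bool


def check_limits(cluster: ClusterInfo) -> List[str]:
    """Return the list of limits that the cluster violates (empty if none)."""
    problems = []
    if cluster.cluster_type.strip().lower() != "hadoop":
        problems.append("Only EMR Hadoop clusters can be used.")
    # Accept both "V3.38.3" and "3.38.3".
    version = cluster.version.strip().lstrip("Vv")
    if version not in SUPPORTED_VERSIONS:
        problems.append("Only V3.38.3 or V3.38.2 can be used.")
    if cluster.metadata_type.strip().lower() == "dlf unified metadata":
        problems.append("The Metadata parameter must not be DLF Unified Metadata.")
    if cluster.kerberos_enabled:
        problems.append("Kerberos authentication is not supported.")
    return problems


if __name__ == "__main__":
    candidate = ClusterInfo("Hadoop", "V3.38.3", "other", False)
    for problem in check_limits(candidate):
        print(problem)
```

An empty result means that the cluster does not violate the limits checked here. The lineage limit is not covered because it describes feature behavior rather than a cluster property.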

Alibaba Cloud Account B: Create a RAM role and authorize Alibaba Cloud Account A to assume the RAM role

Alibaba Cloud Account B creates a RAM role that has permissions to access EMR resources, and then authorizes Alibaba Cloud Account A to assume the role to access those resources.

  1. Create a RAM role.

    Log on to the RAM console by using Alibaba Cloud Account B. Create a RAM role and add Alibaba Cloud Account A as a trusted Alibaba Cloud account for the role. Then, Alibaba Cloud Account A can assume the role to access the authorized resources. For information about how to create a RAM role, see Create a RAM role for a trusted Alibaba Cloud account.

    Sample key configurations of a RAM role:

    • Set the RAM Role Name parameter to EMRRole.

    • Set the Select Trusted Alibaba Cloud Account parameter to Other Alibaba Cloud Account, and enter the ID of Alibaba Cloud Account A in the field that appears. You can log on to the RAM console by using Alibaba Cloud Account A, and move the pointer over the profile picture in the top navigation bar to obtain the ID of Alibaba Cloud Account A.

    After the configuration is complete, Alibaba Cloud Account A can assume the EMRRole role and access the authorized resources.

  2. Modify the trust policy of the EMRRole role.

    Go to the details page of the EMRRole role and modify its trust policy to authorize Alibaba Cloud Account A to access EMR clusters that belong to Alibaba Cloud Account B. For information about how to modify the trust policy of a RAM role, see Edit the trust policy of a RAM role. The following code shows the trust policy document. Replace ID of Alibaba Cloud Account A with the actual account ID:

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "ID of Alibaba Cloud Account A@emr.dataworks.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }

  3. Attach the AliyunDataWorksAccessingEMRReadOnlyPolicy policy to the EMRRole role.
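
To avoid typing errors when you fill in the account ID, the trust policy document used when you modify the trust policy of the EMRRole role can be generated rather than edited by hand. The following Python code is a minimal sketch that uses only the standard library. The build_trust_policy function name and the sample account ID are hypothetical, and the digits-only check is an assumption about the format of Alibaba Cloud account IDs.

```python
import json


def build_trust_policy(account_a_id: str) -> dict:
    """Build the trust policy document shown in this topic for a given account ID."""
    account_a_id = account_a_id.strip()
    # Assumption: an Alibaba Cloud account ID consists of digits only.
    if not account_a_id.isdigit():
        raise ValueError("The account ID must contain digits only.")
    return {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        f"{account_a_id}@emr.dataworks.aliyuncs.com"
                    ]
                },
            }
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Placeholder ID for illustration; replace it with the ID of Alibaba Cloud Account A.
    print(json.dumps(build_trust_policy("1234567890123456"), indent=2))
```

You can paste the printed JSON into the trust policy editor of the EMRRole role in the RAM console.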

Alibaba Cloud Account A: Register an EMR cluster that belongs to Alibaba Cloud Account B

Note

In this step, use Alibaba Cloud Account A to register an EMR cluster that belongs to Alibaba Cloud Account B in a workspace of Alibaba Cloud Account A. Before you perform the following steps, you must obtain the ID of Alibaba Cloud Account B.

  1. Go to the SettingCenter page.

    Log on to the DataWorks console. In the left-side navigation pane, click Management Center. On the Management Center page, select the desired workspace from the drop-down list and click Go to Management Center.

  2. Configure information about an EMR cluster.

    1. Configure basic information about the EMR cluster.

      Configure the parameters that are shown in the following figure as prompted. If you use a workspace in standard mode, you must register EMR clusters in the development and production environments. For information about workspaces in different modes, see Differences between workspaces in basic mode and workspaces in standard mode.

      image.png

      Configuration descriptions of key parameters:

      • Set the Alibaba Cloud Primary Account UID parameter to the ID of the Alibaba Cloud account to which the EMR cluster belongs. In this example, set the parameter to the ID of Alibaba Cloud Account B.

      • Set the Opposite RAM Role parameter to the RAM role that can be assumed by Alibaba Cloud Account A to access the EMR resources of Alibaba Cloud Account B. In this example, set the parameter to EMRRole.

      • Set the Peer EMR Cluster parameter to the EMR cluster that you want to register in DataWorks. In this example, you can select only EMR Hadoop clusters of V3.38.3 or V3.38.2 for which the Metadata parameter is not set to DLF Unified Metadata.

      For more information about how to register an EMR cluster, see Register an EMR cluster in DataWorks.

    2. Initialize the resource group that you want to use.

      You must initialize the resource group that you use in the following cases: the first time you register an EMR cluster to DataWorks, when the service configurations of your EMR cluster change, or when the version of a component in your EMR cluster is updated. For example, if you modify the core-site.xml configuration file of your EMR cluster, you must initialize the resource group. Initialization ensures that the resource group can access the EMR cluster and that EMR tasks run as expected with the current environment configurations of the resource group. To initialize the resource group, go to the EMR cluster page in SettingCenter, find the EMR cluster that is registered to DataWorks, and then click Initialize Resource Group in the section that displays the information of the EMR cluster.

      Note

      DataWorks allows you to use only exclusive resource groups for scheduling to run EMR tasks. Therefore, you can select only an exclusive resource group for scheduling when you initialize a resource group.
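
The conditions that require resource group initialization can be summarized as a simple rule. The following Python code is a minimal sketch for use in a change checklist or runbook. The function and parameter names are hypothetical, and the code does not call any DataWorks API.

```python
def needs_resource_group_init(
    first_registration: bool,
    service_config_changed: bool,
    component_version_updated: bool,
) -> bool:
    """Return True if any condition described in this topic applies."""
    return (
        first_registration
        or service_config_changed
        or component_version_updated
    )


if __name__ == "__main__":
    # Example: core-site.xml was modified on a cluster that is already registered.
    if needs_resource_group_init(False, True, False):
        print("Initialize the resource group in SettingCenter.")
```

If the function returns True, click Initialize Resource Group for the registered EMR cluster before you run EMR tasks.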

What to do next

After you register the EMR cluster, you can perform the following operations: