E-MapReduce: Use EMR-CLI to deploy a gateway

Last Updated: Apr 08, 2024

A gateway is used to submit jobs to E-MapReduce (EMR) clusters and isolate clusters in a secure manner. EMR provides the EMR-CLI tool that you can use to deploy a gateway on an Alibaba Cloud Elastic Compute Service (ECS) instance. You can perform operations in this topic to deploy a gateway for DataLake, Dataflow, or online analytical processing (OLAP) clusters.

Prerequisites

A DataLake cluster, a Dataflow cluster, or an OLAP cluster is created, and the cluster is in the Running state. For more information about how to create a cluster, see Create a cluster.

Limits

  • This topic applies only to deploying gateways for DataLake, Dataflow, and OLAP clusters.

    For more information about how to deploy gateways for Hadoop and Kafka clusters, see Create a gateway cluster.

    Note

    If you create your first EMR cluster after 17:00 (UTC+8) on December 19, 2022, you cannot create Hadoop or Kafka clusters.

  • We recommend that you do not deploy a gateway on an ECS instance that hosts an EMR cluster. Otherwise, the gateway affects the environment in which the EMR cluster runs.

  • A gateway is deployed in overwrite mode by using EMR-CLI. If you deploy a gateway on an ECS instance on which another gateway already exists, the new gateway is installed in the same directory as the original gateway, and the original gateway is overwritten.

  • You can perform operations described in this topic to deploy a gateway for the following services: HDFS, YARN, HBase, Hive, Spark 2, Spark 3, JindoSDK, Flink, Sqoop, Impala, Presto, Hudi, Iceberg, Tez, and Delta Lake.

Deploy a gateway

  1. Create an ECS instance in the ECS console. For more information, see Create an instance on the Custom Launch tab.

    Note

    The created ECS instance does not need to be accessible over the Internet.

    The following table describes the parameter settings when you create an ECS instance.

    | Parameter | Description |
    | --- | --- |
    | Region and Zone | You must create an ECS instance in the same region and zone as your EMR cluster. |
    | Image | We recommend that you use Alibaba Cloud Linux 2.1903 LTS 64-bit. |
    | System Disk | We recommend that you use an enhanced SSD (ESSD) that has a storage capacity of at least 60 GiB. |
    | Network Type | You must select the virtual private cloud (VPC) in which the EMR cluster resides. |
    | Security Group | We recommend that you select the security group to which the master node group of the EMR cluster belongs. This ensures that a network connection between the ECS instance and the EMR cluster is established. |

  2. Create a dedicated ECS RAM role.

    1. Log on to the RAM console with an Alibaba Cloud account or a RAM user who has administrative rights.

    2. In the left-side navigation pane, choose Identities > Roles.

    3. On the Roles page, click Create Role.

    4. In the Select Role Type step of the Create Role panel, select Alibaba Cloud Service for Select Trusted Entity and click Next. In the Configure Role step, configure the RAM Role Name parameter, select Elastic Compute Service from the Select Trusted Service drop-down list, and then click OK. For example, you can set RAM Role Name to ECSForEMRGatewayRole.

  3. Attach policies to the RAM role.

    1. In the Finish step of the Create Role panel, click Add Permissions to RAM Role.

    2. On the page that appears, click Grant Permission.

    3. In the Grant Permission panel, select System Policy, select the AliyunEMRFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess policies, and then click OK.

    4. Click Complete.

  4. Assign the RAM role to the ECS instance.

    1. Log on to the ECS console.

    2. In the left-side navigation pane, choose Instances & Images > Instances.

    3. In the top navigation bar, select the region where the ECS instance resides.

    4. Find the ECS instance, click the icon in the Actions column, and then click Attach/Detach RAM Role in the Instance Settings section.

    5. In the Attach/Detach RAM Role dialog box, select ECSForEMRGatewayRole from the RAM Role drop-down list and click Confirm.

  5. Connect to the ECS instance. For more information, see Connect to an instance.

  6. Run the following command to install EMR-CLI:

    regionId=`curl http://100.100.100.200/latest/meta-data/region-id`; curl https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh -o /tmp/emrcli.sh; chmod 755 /tmp/emrcli.sh; sh /tmp/emrcli.sh install ${regionId}

    If EMR-CLI is successfully installed, the following result is returned:

    install emrcli success

  7. Run the following command to deploy the gateway:

    emrcli gateway deploy \
      --clusterId <ClusterId> \
      --appNames <ApplicationName>

    Configure the parameters based on your business requirements. The following table describes the parameters.

    | Parameter | Required | Description |
    | --- | --- | --- |
    | clusterId | Yes | The ID of the EMR cluster. |
    | appNames | No | The name of a service. Separate multiple services with commas (,), such as HDFS,YARN. If no service is specified, the gateway is deployed for all supported services of the EMR cluster, such as Hive and HDFS. |

    If the gateway is successfully deployed, the following result is returned:

    deployGateway success

    Important

    After the gateway is deployed, the value of the JAVA_HOME system environment variable is changed to /usr/lib/jvm/java-1.8.0. You can change the value of the variable in the /etc/profile.d/emr_env.sh file. However, the change may affect the features of the gateway. Proceed with caution.

  8. Log on to the ECS instance again so that the updated system environment variables take effect.

  9. (Optional) Configure domain name resolution for the gateway.

    Note

    This step is required when the gateway contains the Spark service.

    1. Add a zone. For more information, see Add a built-in authoritative zone.

    2. Add a DNS record. For more information, see Add DNS records.

      The following table describes the parameters.

      | Parameter | Description |
      | --- | --- |
      | Record Type | The type of the DNS record. Use the default value A. |
      | Hostname | The hostname of the gateway, such as iZ2zea8r0aht2vzbqci****. You can run the hostname command to obtain the hostname of the gateway. |
      | Record Value | The internal IP address of the gateway. You can view the internal IP address of the gateway on the Nodes tab of the new EMR console. |
      | TTL Period | The time-to-live period. Use the default value. |
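
The one-line installation command in Step 6 chains its parts with semicolons, so a failed download does not stop the later parts from running. The following sketch performs the same actions with explicit error handling. The helper names emrcli_script_url and install_emrcli are hypothetical and are not part of EMR-CLI. The URL pattern and the metadata address are taken from Step 6.

```shell
#!/usr/bin/env bash
# Sketch of Step 6 with explicit error handling.
# emrcli_script_url and install_emrcli are hypothetical helper names.

# Build the installer URL for a region, following the pattern used in Step 6.
emrcli_script_url() {
  local region_id="$1"
  echo "https://ecm-repo-${region_id}.oss-${region_id}-internal.aliyuncs.com/emrcli/emrcli.sh"
}

install_emrcli() {
  local region_id
  local script=/tmp/emrcli.sh

  # The ECS metadata service returns the region ID of the current instance.
  region_id="$(curl -fsS http://100.100.100.200/latest/meta-data/region-id)" || {
    echo "cannot read the region ID from the metadata service" >&2
    return 1
  }

  # -f makes curl fail on HTTP errors instead of saving an error page.
  curl -fsS "$(emrcli_script_url "$region_id")" -o "$script" || return 1
  chmod 755 "$script" || return 1
  sh "$script" install "$region_id"
}
```

To use the sketch, load the functions on the ECS instance and run install_emrcli. The installer itself is unchanged, so a successful run is expected to end with the same install emrcli success message.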

Manage the gateway

If new services are added to the EMR cluster that is associated with the gateway or service configurations of the EMR cluster are modified, you can add the services or synchronize the most recent configurations of services to the gateway.

Add a service to the gateway

If a new service is added to the EMR cluster, you can use EMR-CLI to add the service to the gateway.

The command that is used to add a service to the gateway is similar to the command that is used to deploy the gateway. You need to specify the name of the service that you want to add to the gateway in the appNames parameter. Existing services are not affected.

emrcli gateway deploy \
  --clusterId <ClusterId> \
  --appNames <ApplicationName>

If the service is successfully added to the gateway, the following result is returned:

deployGateway success
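
Because the same command handles both the initial deployment and the addition of a service, a small wrapper can run it and confirm the result. The following sketch treats a run as successful only if the output contains the deployGateway success message described above. The function name deploy_gateway is hypothetical. The sketch assumes that emrcli is on the PATH and that the message appears in standard output or standard error.

```shell
#!/usr/bin/env bash
# Sketch: run "emrcli gateway deploy" and check for the documented success message.
# deploy_gateway is a hypothetical helper name, not part of EMR-CLI.

deploy_gateway() {
  local cluster_id="$1"   # ID of the EMR cluster (required)
  local app_names="$2"    # optional comma-separated service list, such as HDFS,YARN
  local output

  if [ -z "$cluster_id" ]; then
    echo "usage: deploy_gateway <ClusterId> [<ApplicationName>[,...]]" >&2
    return 2
  fi

  if [ -n "$app_names" ]; then
    output="$(emrcli gateway deploy --clusterId "$cluster_id" --appNames "$app_names" 2>&1)"
  else
    # Without --appNames, the gateway is deployed for all supported services.
    output="$(emrcli gateway deploy --clusterId "$cluster_id" 2>&1)"
  fi

  printf '%s\n' "$output"

  # Succeed only if the documented success message is present.
  case "$output" in
    *"deployGateway success"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, deploy_gateway "<ClusterId>" Flink would add the Flink service, provided that Flink is the service name that EMR-CLI expects for that service.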

Synchronize the modified configurations of a service for the EMR cluster to the gateway

If you modify the configurations of a service for the EMR cluster, such as the configurations of the core-site.xml file, you can use EMR-CLI to synchronize the modified configurations to the gateway.

Important

When the modified configurations are synchronized, the original configurations of the service for the gateway are overwritten. Proceed with caution.

emrcli gateway refreshConfigs \
  --clusterId <ClusterId> \
  --appNames <ApplicationName>

If the modified configurations are successfully synchronized to the gateway, the following result is returned:

refreshConfiguration success
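
Because synchronization overwrites the configurations on the gateway, you may want to keep a copy of the current configurations first. The following sketch copies the configuration directory before it runs refreshConfigs. It assumes that the gateway stores service configurations in /etc/taihao-apps, which is one of the directories named in the FAQ of this topic. The function name refresh_with_backup and the default backup location /var/backups/emr-gateway are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy the gateway configurations, then synchronize them from the cluster.
# refresh_with_backup and the default paths below are assumptions for illustration.

refresh_with_backup() {
  local cluster_id="$1"
  local app_names="$2"
  local conf_dir="${3:-/etc/taihao-apps}"             # assumed configuration directory
  local backup_root="${4:-/var/backups/emr-gateway}"  # hypothetical backup location
  local backup_dir

  if [ -z "$cluster_id" ] || [ -z "$app_names" ]; then
    echo "usage: refresh_with_backup <ClusterId> <ApplicationName> [conf_dir] [backup_root]" >&2
    return 2
  fi

  # Name the copy after the source directory plus a timestamp.
  backup_dir="${backup_root}/$(basename "$conf_dir").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup_root" || return 1
  cp -a "$conf_dir" "$backup_dir" || return 1
  echo "backup written to $backup_dir"

  # Overwrites the service configurations on the gateway.
  emrcli gateway refreshConfigs --clusterId "$cluster_id" --appNames "$app_names"
}
```

If the copy fails, the function returns before refreshConfigs runs, so the existing configurations are left untouched.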

Manage EMR-CLI

View the version of EMR-CLI

You can run the following command to view the version of EMR-CLI:

emrcli version

If the command is successfully run, a result similar to the following is returned:

2.0.0

Update EMR-CLI

You can perform Step 6 in Deploy a gateway to update EMR-CLI to the latest version.

FAQ

Q: How do I switch to another cluster and deploy a gateway for the cluster?

A: Perform the following steps to switch to another cluster and deploy a gateway for the cluster:

  1. Run the mv command to back up the files that were deployed for the original cluster to prevent data loss. The files include the files in the /opt/apps and /etc/taihao-apps directories, and the /etc/profile.d/yarn.sh file.

  2. Perform the operations in this topic again to deploy a gateway for the new cluster.
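
The backup in the first step can be sketched as follows. The function moves the three paths listed above into a backup directory and skips any path that does not exist. The function name backup_gateway_files, the flattened file names, and the optional root prefix (used only to try the function against a test directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: move the files of the existing gateway into a backup directory.
# backup_gateway_files is a hypothetical helper name.

backup_gateway_files() {
  local backup_dir="$1"
  local root="${2:-}"   # optional path prefix for dry runs; empty means the real paths
  local path

  if [ -z "$backup_dir" ]; then
    echo "usage: backup_gateway_files <backup_dir> [root_prefix]" >&2
    return 2
  fi
  mkdir -p "$backup_dir" || return 1

  for path in /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh; do
    if [ -e "${root}${path}" ]; then
      # Flatten the path into a single name, for example /opt/apps -> opt_apps.
      mv "${root}${path}" "${backup_dir}/$(echo "${path#/}" | tr '/' '_')" || return 1
    fi
  done
}
```

On the gateway, the function typically needs root permissions because the paths are system directories. After the files are moved, deploy the gateway for the new cluster as described in this topic.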