E-MapReduce:Use EMR-CLI to deploy a custom gateway environment

Last Updated:Mar 26, 2026

A gateway has the same client tools installed and configured as the master node but runs no cluster services. Use it to submit jobs — such as spark-submit, hive -f, and yarn application — without adding load to the master or ResourceManager nodes.

E-MapReduce (EMR) provides EMR-CLI to automate gateway deployment on an Elastic Compute Service (ECS) instance. This topic covers how to deploy, manage, and upgrade a gateway environment for DataLake, Dataflow, OLAP, or Custom clusters.

Choose a gateway mode

EMR offers three gateway modes. Select the one that matches your cluster type and version.

Mode | Supported clusters | How it works | When to use
Gateway node group (recommended) | DataLake and Dataflow: EMR-5.10.1 and later; Custom: EMR-5.17.1 and later | Add a node group directly to an existing cluster. Client configurations are synchronized automatically. See Manage node groups. | Best choice for DataLake and Dataflow clusters. Lowest O&M overhead and highest configuration consistency.
Gateway environment | DataLake, Dataflow, OLAP, and Custom clusters | Deploy manually on an ECS instance (this topic). Provides an independent file system and runtime environment. Client configurations must be synchronized manually. | Use when your cluster type or version does not support gateway node groups.
Gateway cluster | Hadoop and Kafka clusters only | Create a separate EMR cluster that contains only gateway nodes. Client configurations are synchronized automatically. See Create a gateway cluster. | Use for Hadoop and Kafka clusters.

Prerequisites

Before you begin, ensure that you have:

  • A DataLake, Dataflow, OLAP, or Custom compute cluster in E-MapReduce that is in the Running state. To create one, see Create a cluster.

Limitations

  • Supported cluster types: Only DataLake, Dataflow, OLAP, or Custom clusters. For Hadoop and Kafka clusters, see Create a gateway cluster.

    You can only create a Hadoop or Kafka cluster if your Alibaba Cloud account was used to create such clusters at or before 17:00 on December 19, 2022 (UTC+8).
  • Overwrite mode: EMR-CLI deploys the gateway client in overwrite mode. Redeploying on an ECS instance that already has a gateway overwrites the existing client in the same directory.

  • Dedicated ECS instance: Use a separate ECS instance — not an existing Master, Core, or Task node from the EMR cluster. Sharing a node risks client environment interference with cluster services.

  • Supported services: You can deploy clients for: HDFS, YARN, HBase, HIVE, SPARK2, SPARK3, JINDOSDK, FLINK, SQOOP, IMPALA, PRESTO, HUDI, ICEBERG, TEZ, and DELTALAKE.
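
Before you run a deployment, it can help to check the application names you plan to pass against the list above. The following is a minimal sketch, not an EMR-CLI feature: validate_app_names and SUPPORTED_APPS are hypothetical names, and the case-sensitive match assumes that the names must be spelled exactly as listed.

```shell
# Supported service names, copied from the Limitations list above.
SUPPORTED_APPS="HDFS YARN HBase HIVE SPARK2 SPARK3 JINDOSDK FLINK SQOOP IMPALA PRESTO HUDI ICEBERG TEZ DELTALAKE"

# validate_app_names is a hypothetical helper, not part of EMR-CLI.
# $1: comma-separated application names.
# Prints each unsupported name and returns non-zero if any is found.
validate_app_names() (
  status=0
  # Split the comma-separated argument into words.
  for app in $(printf '%s' "$1" | tr ',' ' '); do
    case " $SUPPORTED_APPS " in
      *" $app "*) ;;                                # name is in the list
      *) echo "unsupported: $app"; status=1 ;;      # name is not in the list
    esac
  done
  exit "$status"
)
```

For example, validate_app_names "HDFS,YARN,FLINK" succeeds silently, while a list that contains a name outside the supported set prints that name and fails.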

Deploy a gateway

Step 1: Create an ECS instance

Create an ECS instance in the ECS console. See Create an instance using the wizard.

The instance does not require public network access. Use the following settings:

Parameter | Setting
Region and zone | Select the same region and zone as your EMR cluster
Image | Select an image that matches the operating system of the EMR instance
System disk | Select an enterprise SSD (ESSD) of at least 60 GiB
Network | Select the same Virtual Private Cloud (VPC) as your EMR cluster
Security group | Select the same security group as the master instance group of your EMR cluster; this ensures network connectivity between the ECS instance and the cluster

Step 2: Create a RAM role for the gateway

  1. Log on to the RAM console as a Resource Access Management (RAM) administrator.

  2. In the left navigation pane, choose Identity Management > Roles.

  3. On the Roles page, click Create Role.

  4. In the Create Role panel, set Trusted Entity Type to Alibaba Cloud Service and Trusted Service to ECS, then click OK.

  5. Enter a Role Name — for example, ECSForEMRGatewayRole — then click OK.

Step 3: Grant permissions to the RAM role

  1. On the Permission Management tab, click Grant Permission.

  2. In the Grant Permission panel, select the following policies from System Policy, then click OK:

    • AliyunEMRFullAccess

    • AliyunOSSFullAccess

    • AliyunDLFFullAccess

  3. Click Close.

Step 4: Attach the RAM role to the ECS instance

  1. Log on to the ECS console.

  2. In the left navigation pane, choose Instances & Images > Instances.

  3. In the top navigation bar, select the region of your EMR cluster.

  4. Find the ECS instance, open its more-actions menu, and choose Instance Settings > Grant/Revoke RAM Role.

  5. In the dialog box, select the role that you created in Step 2 (for example, ECSForEMRGatewayRole) and click OK.

Step 5: Connect to the ECS instance

Connect to the ECS instance. See Connect to an ECS instance.
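
Once connected, you may want to confirm that the RAM role from Step 4 is attached before you continue. The following sketch is an illustration only, not part of EMR-CLI. It queries the ECS instance metadata service at 100.100.100.200 (the address that the install command in Step 6 also uses) and compares the result with the example role name. The function names are hypothetical, and the check assumes that the metadata service answers plain requests without a token.

```shell
# fetch_attached_role_name asks the ECS instance metadata service for the
# name of the RAM role attached to this instance.
fetch_attached_role_name() {
  curl -sf --max-time 3 http://100.100.100.200/latest/meta-data/ram/security-credentials/
}

# check_attached_role is a hypothetical helper.
# $1: expected role name. $2: actual role name (optional; fetched if omitted).
check_attached_role() {
  expected=$1
  actual=${2:-$(fetch_attached_role_name)}
  if [ "$actual" = "$expected" ]; then
    echo "role attached: $actual"
  else
    echo "expected role '$expected', found '${actual:-none}'" >&2
    return 1
  fi
}
```

On the gateway instance, check_attached_role ECSForEMRGatewayRole would report the attached role or fail with the name it found instead.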

Step 6: Install EMR-CLI

Run the following commands to install EMR-CLI:

regionId=$(curl http://100.100.100.200/latest/meta-data/region-id)
curl https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh -o /tmp/emrcli.sh
chmod 755 /tmp/emrcli.sh
sh /tmp/emrcli.sh install ${regionId}

A successful installation returns:

install emrcli success
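
The install command runs its steps unconditionally, so a failed metadata lookup or download can go unnoticed. As a hedged alternative, the sketch below wraps the same steps in a function that stops at the first failure. The function names emrcli_installer_url and install_emrcli are hypothetical. The URL pattern and the install arguments are taken from the command above.

```shell
# emrcli_installer_url is a hypothetical helper: it builds the installer URL
# used by the install command for a given region ID.
emrcli_installer_url() {
  printf 'https://ecm-repo-%s.oss-%s-internal.aliyuncs.com/emrcli/emrcli.sh\n' "$1" "$1"
}

# install_emrcli mirrors the documented steps, chained with && so that each
# step runs only if the previous one succeeded.
install_emrcli() {
  region_id=$(curl -sf --max-time 5 http://100.100.100.200/latest/meta-data/region-id) &&
  curl -sf "$(emrcli_installer_url "$region_id")" -o /tmp/emrcli.sh &&
  chmod 755 /tmp/emrcli.sh &&
  sh /tmp/emrcli.sh install "$region_id"
}
```

Calling install_emrcli on the ECS instance performs the installation. Calling emrcli_installer_url with a region ID only prints the URL, which is useful for checking connectivity first.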

Step 7: Deploy the gateway client

emrcli gateway deploy \
  --clusterId <ClusterId> \
  --appNames <ApplicationName>

Replace the following placeholders:

Parameter | Required | Description
--clusterId | Yes | The ID of your EMR cluster
--appNames | No | A comma-separated list of applications to deploy, for example, HDFS,YARN. If omitted, all supported applications in the cluster are deployed by default.
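
Because --appNames is optional, a script that calls the deploy command has to handle both forms. The following sketch shows one way to build the argument list. gateway_deploy_args is a hypothetical helper, not part of EMR-CLI, and c-example is a placeholder rather than a real cluster ID.

```shell
# gateway_deploy_args is a hypothetical helper, not part of EMR-CLI.
# $1: cluster ID (required). $2: comma-separated application names (optional).
# Prints the arguments to append to "emrcli gateway deploy".
gateway_deploy_args() {
  if [ -z "${1:-}" ]; then
    echo "cluster ID is required" >&2
    return 1
  fi
  if [ -n "${2:-}" ]; then
    printf '%s %s %s %s\n' --clusterId "$1" --appNames "$2"
  else
    # No application list: the CLI then deploys all supported applications.
    printf '%s %s\n' --clusterId "$1"
  fi
}
```

With your own cluster ID in place of the placeholder, the output can be passed straight to the CLI, for example emrcli gateway deploy $(gateway_deploy_args c-example HDFS,YARN).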

A successful deployment returns:

deployGateway success
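
In an automated setup, you can check for the success message instead of reading the output by hand. The following is a minimal sketch under the assumption that the message appears verbatim in the command output. run_and_expect is a hypothetical wrapper, and this topic does not document the exit codes of emrcli, so the sketch relies only on the message.

```shell
# run_and_expect is a hypothetical wrapper, not an EMR-CLI feature.
# Usage: run_and_expect "<success marker>" <command> [args...]
# Runs the command, echoes its combined output, and succeeds only if the
# marker string appears in that output.
run_and_expect() (
  marker=$1
  shift
  output=$("$@" 2>&1)
  printf '%s\n' "$output"
  printf '%s\n' "$output" | grep -qF -- "$marker"
)
```

For example, run_and_expect "deployGateway success" emrcli gateway deploy followed by your usual arguments returns non-zero when the message is missing.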
Important

After installation, the JAVA_HOME environment variable is set to /usr/lib/jvm/java-1.8.0. To change it, edit /etc/profile.d/emr_env.sh. Modifying JAVA_HOME may affect gateway functionality — proceed with caution.

Step 8: Re-log on to the ECS instance

Log on to the ECS instance again to apply the updated environment variables.
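
After you log on again, a quick check can confirm that the new session picked up the environment. The sketch below compares JAVA_HOME with the path stated in Step 7. check_java_home is a hypothetical helper, and the default path applies only if you have not edited /etc/profile.d/emr_env.sh.

```shell
# check_java_home is a hypothetical helper. The default expected path comes
# from the note in Step 7; pass a different path as $1 if you changed it.
check_java_home() {
  expected=${1:-/usr/lib/jvm/java-1.8.0}
  if [ "${JAVA_HOME:-}" = "$expected" ]; then
    echo "JAVA_HOME ok: $JAVA_HOME"
  else
    echo "JAVA_HOME is '${JAVA_HOME:-}', expected '$expected'" >&2
    return 1
  fi
}
```

If the check fails in a fresh session, the updated profile scripts were probably not loaded, and logging on again is the first thing to retry.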

Step 9 (optional): Configure domain name resolution

Important

This step is required if the gateway includes the Spark service. Otherwise, you can skip it.

  1. Add a zone. See Add a built-in authoritative domain name.

  2. Add a DNS record. See Add a DNS record. Use the following settings:

    Parameter | Setting
    Record Type | Use the default value A
    Host Record | Enter the hostname of the gateway machine (run the hostname command to get it), for example, iZ2zea8r0aht2vzbqci****
    Record Value | Enter the private IP address of the gateway machine
    TTL Value | Use the default value
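
To collect the values for the DNS record in one place, you could run something like the following on the gateway machine. This is a sketch only. print_dns_record_fields is a hypothetical helper, and it assumes a Linux hostname command that supports the -I option and that the first address it reports is the private IP address.

```shell
# print_dns_record_fields is a hypothetical helper that prints the values to
# enter in the DNS record form. With no arguments it reads them from the host.
# $1: hostname (optional). $2: private IP address (optional).
print_dns_record_fields() {
  host=${1:-$(hostname)}
  # Assumption: the first address reported by `hostname -I` is the private IP.
  ip=${2:-$(hostname -I | awk '{print $1}')}
  printf 'Record Type: A\nHost Record: %s\nRecord Value: %s\n' "$host" "$ip"
}
```

Check the printed address against the private IP address shown for the instance in the ECS console before you save the record.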

Manage the gateway environment

After the gateway is set up, use the following commands to keep client components and configurations in sync with changes made to the associated EMR cluster.

Update client components

The deploy command overwrites configurations for installed applications and incrementally installs new ones. To add a new service client — for example, Flink — to a gateway that already has HDFS and YARN:

# Add the FLINK client to an environment that already has HDFS and YARN
emrcli gateway deploy \
  --clusterId <ClusterId> \
  --appNames HDFS,YARN,FLINK

A successful update returns:

deployGateway success
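
The example above passes the already-installed application names together with the new one. If you track the installed list in a script, a small helper can build the combined value for --appNames. The following is a sketch. merge_app_names is a hypothetical helper, not part of EMR-CLI.

```shell
# merge_app_names is a hypothetical helper: it joins two comma-separated
# lists, keeps first-seen order, and drops duplicates and empty entries.
# $1: names already on the gateway. $2: names to add.
merge_app_names() {
  printf '%s,%s\n' "$1" "$2" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}
```

For instance, merging "HDFS,YARN" with "FLINK" yields HDFS,YARN,FLINK, which matches the value used in the command above.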

Sync configurations from the EMR cluster

When service configurations change in the EMR cluster — for example, if you modify core-site.xml in the EMR console — sync the changes to the gateway:

Important

Syncing configurations overwrites the existing configurations on the gateway. Proceed with caution.

emrcli gateway refreshConfigs \
  --clusterId <ClusterId> \
  --appNames <ApplicationName>   # Optional. Omit to sync all applications.

A successful sync returns:

refreshConfiguration success
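
Because a sync overwrites the gateway's configurations, you may want a copy of the current files first. The sketch below archives a directory to a timestamped tarball. backup_dir is a hypothetical helper. Which directory to archive is an assumption: the FAQ lists /etc/taihao-apps among the paths to back up, but this topic does not state that every synced file lives there.

```shell
# backup_dir is a hypothetical helper: it archives a directory to a
# timestamped tarball and prints the archive path.
# $1: directory to archive. $2: destination directory (default: /tmp).
backup_dir() (
  src=$1
  dest_dir=${2:-/tmp}
  archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz"
  # -C switches to the parent so the archive holds a relative path.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" && echo "$archive"
)
```

Running backup_dir /etc/taihao-apps before emrcli gateway refreshConfigs leaves an archive in /tmp that you can inspect with tar -tzf if a sync produces unexpected results.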

Manage EMR-CLI

Check the EMR-CLI version

emrcli version

Example output:

2.0.0
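
If a script depends on a minimum EMR-CLI version, it can compare the reported version with the one it needs. The following is a sketch that assumes emrcli version prints only the version number, as in the example output. version_ge is a hypothetical helper, and it relies on the -V option of GNU sort.

```shell
# version_ge is a hypothetical helper: succeeds if version $1 >= version $2.
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_ge() {
  # The smaller of the two versions sorts first; $1 >= $2 when that is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A call such as version_ge "$(emrcli version)" 2.0.0 then tells the script whether an upgrade is needed.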

Upgrade EMR-CLI

To upgrade EMR-CLI to the latest version, repeat Step 6: Install EMR-CLI. The install command overwrites the existing version in place.

FAQ

How do I switch to a different compute cluster?

Before you switch compute clusters, use the mv command to back up the following directories and files that were deployed for the old cluster. Because EMR-CLI deploys in overwrite mode, this prevents data loss:

  • /opt/apps directory

  • /etc/taihao-apps directory

  • /etc/profile.d/yarn.sh file

Then repeat the steps in the Deploy a gateway section for the new cluster.
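
The backup described above can be sketched as a small function. backup_old_gateway_files is a hypothetical helper, and the timestamped .bak suffix is one possible naming choice, not a documented convention. The optional root prefix exists only so that you can try the function against a scratch directory before you run it on the real paths.

```shell
# backup_old_gateway_files is a hypothetical helper. It renames each path
# from the FAQ list by appending a timestamped suffix.
# $1: optional root prefix (default: empty, meaning the real file system).
backup_old_gateway_files() (
  root=${1:-}
  suffix=".bak.$(date +%Y%m%d%H%M%S)"
  for path in /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh; do
    if [ -e "$root$path" ]; then
      mv "$root$path" "$root$path$suffix" || exit 1
      echo "moved $root$path -> $root$path$suffix"
    fi
  done
)
```

Run it without arguments, with sufficient privileges, on the gateway instance. The renamed yarn.sh no longer ends in .sh, so login shells that source only *.sh files from /etc/profile.d will skip it.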
