A gateway has the same client tools installed and configured as the master node but runs no cluster services. Use it to submit jobs — such as spark-submit, hive -f, and yarn application — without adding load to the master or ResourceManager nodes.
E-MapReduce (EMR) provides EMR-CLI to automate gateway deployment on an Elastic Compute Service (ECS) instance. This topic covers how to deploy, manage, and upgrade a gateway environment for DataLake, Dataflow, OLAP, or Custom clusters.
Choose a gateway mode
EMR offers three gateway modes. Select the one that matches your cluster type and version.
| Mode | Supported clusters | How it works | When to use |
|---|---|---|---|
| Gateway node group (recommended) | DataLake and Dataflow: EMR-5.10.1 and later; Custom: EMR-5.17.1 and later | Add a node group directly to an existing cluster. Client configurations are synchronized automatically. See Manage node groups. | Best choice for DataLake and Dataflow clusters. Lowest O&M overhead and highest configuration consistency. |
| Gateway environment | DataLake, Dataflow, OLAP, and Custom clusters | Deploy manually on an ECS instance (this topic). Provides an independent file system and runtime environment. Client configurations must be synchronized manually. | Use when your cluster type or version does not support gateway node groups. |
| Gateway cluster | Hadoop and Kafka clusters only | Create a separate EMR cluster that contains only gateway nodes. Client configurations are synchronized automatically. See Create a gateway cluster. | Use for Hadoop and Kafka clusters. |
Prerequisites
Before you begin, ensure that you have:
-
A DataLake, Dataflow, OLAP, or Custom compute cluster in E-MapReduce that is in the Running state. To create one, see Create a cluster.
Limitations
-
Supported cluster types: Only DataLake, Dataflow, OLAP, or Custom clusters. For Hadoop and Kafka clusters, see Create a gateway cluster.
You can only create a Hadoop or Kafka cluster if your Alibaba Cloud account was used to create such clusters at or before 17:00 on December 19, 2022 (UTC+8).
-
Overwrite mode: EMR-CLI deploys the gateway client in overwrite mode. Redeploying on an ECS instance that already has a gateway overwrites the existing client in the same directory.
-
Dedicated ECS instance: Use a separate ECS instance — not an existing Master, Core, or Task node from the EMR cluster. Sharing a node risks client environment interference with cluster services.
-
Supported services: You can deploy clients for: HDFS, YARN, HBase, HIVE, SPARK2, SPARK3, JINDOSDK, FLINK, SQOOP, IMPALA, PRESTO, HUDI, ICEBERG, TEZ, and DELTALAKE.
Deploy a gateway
Step 1: Create an ECS instance
Create an ECS instance in the ECS console. See Create an instance using the wizard.
The instance does not require public network access. Use the following settings:
| Parameter | Setting |
|---|---|
| Region and zone | Select the same region and zone as your EMR cluster |
| Image | Select an image that matches the operating system of the EMR instance |
| System disk | Select an enterprise SSD (ESSD) of at least 60 GiB |
| Network | Select the same Virtual Private Cloud (VPC) as your EMR cluster |
| Security group | Select the same security group as the master instance group of your EMR cluster — this ensures network connectivity between the ECS instance and the cluster |
Step 2: Create a RAM role for the gateway
-
Log on to the RAM console as a Resource Access Management (RAM) administrator.
-
In the left navigation pane, choose Identity Management > Roles.
-
On the Roles page, click Create Role.
-
In the Create Role panel, set Trusted Entity Type to Alibaba Cloud Service and Trusted Service to ECS, then click OK.
-
Enter a Role Name, for example, ECSForEMRGatewayRole, then click OK.
Step 3: Grant permissions to the RAM role
-
On the Permission Management tab, click Grant Permission.
-
In the Grant Permission panel, select the following policies from System Policy, then click OK:

-
AliyunEMRFullAccess
-
AliyunOSSFullAccess
-
AliyunDLFFullAccess
-
Click Close.
Step 4: Attach the RAM role to the ECS instance
-
Log on to the ECS console.
-
In the left navigation pane, choose Instances & Images > Instances.
-
In the top navigation bar, select the region of your EMR cluster.
-
Find the ECS instance and choose Instance Settings > Grant/Revoke RAM Role.
-
In the dialog box, select ECSForEMRGatewayRole and click OK.
Step 5: Connect to the ECS instance
Connect to the ECS instance. See Connect to an ECS instance.
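Once connected, it can help to confirm that the instance is in the expected region and that the RAM role from Step 4 is attached before installing anything. The following is a minimal sketch that queries the ECS metadata service at 100.100.100.200, the same endpoint that the install command in Step 6 uses. The helper names (meta_url, meta_get, check_instance) are hypothetical, and the sketch assumes that listing ram/security-credentials/ returns the name of the attached role.

```shell
#!/bin/sh
# Base URL of the ECS instance metadata service (same endpoint as in Step 6).
META_BASE="http://100.100.100.200/latest/meta-data"

# Build the metadata URL for one item, for example region-id.
meta_url() {
    printf '%s/%s\n' "$META_BASE" "$1"
}

# Fetch one metadata item; fails if the service does not answer within 5 seconds.
meta_get() {
    curl -sf --max-time 5 "$(meta_url "$1")"
}

# Hypothetical pre-flight check: compare the instance region with the expected
# one and confirm that the expected RAM role is attached.
check_instance() {
    expected_region="$1"
    expected_role="$2"
    actual_region="$(meta_get region-id)" || { echo "metadata service unreachable" >&2; return 1; }
    [ "$actual_region" = "$expected_region" ] || { echo "region mismatch: $actual_region" >&2; return 1; }
    # Assumption: this listing returns the name of the attached role.
    actual_role="$(meta_get ram/security-credentials/)" || { echo "no RAM role attached" >&2; return 1; }
    [ "$actual_role" = "$expected_role" ] || { echo "unexpected role: $actual_role" >&2; return 1; }
    echo "instance OK"
}
```

For example, `check_instance cn-hangzhou ECSForEMRGatewayRole` prints `instance OK` only when both values match. The region cn-hangzhou is only an example; substitute the region of your EMR cluster.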
Step 6: Install EMR-CLI
Run the following command to install EMR-CLI:
regionId=`curl http://100.100.100.200/latest/meta-data/region-id`; curl https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh -o /tmp/emrcli.sh; chmod 755 /tmp/emrcli.sh; sh /tmp/emrcli.sh install ${regionId}
A successful installation returns:
install emrcli success
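The one-line command above keeps running after a failed step, so a failed download can surface only as a confusing error later. As a hedged alternative, the same four steps can be wrapped in a function that stops at the first failure. The function names are hypothetical. The URLs and the install arguments are taken from the command above.

```shell
#!/bin/sh
# Build the download URL of the EMR-CLI installer for a given region.
emrcli_installer_url() {
    printf 'https://ecm-repo-%s.oss-%s-internal.aliyuncs.com/emrcli/emrcli.sh\n' "$1" "$1"
}

# Same steps as the one-liner, stopping at the first failure.
install_emrcli() {
    # Look up the region of this instance from the metadata service.
    region_id="$(curl -sf http://100.100.100.200/latest/meta-data/region-id)" || return 1
    [ -n "$region_id" ] || { echo "could not determine region" >&2; return 1; }
    # Download the installer, make it executable, and run it.
    curl -sf "$(emrcli_installer_url "$region_id")" -o /tmp/emrcli.sh || return 1
    chmod 755 /tmp/emrcli.sh || return 1
    sh /tmp/emrcli.sh install "$region_id"
}
```

Call `install_emrcli` on the ECS instance. The URL uses the internal OSS endpoint, so the download works only from inside Alibaba Cloud.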
Step 7: Deploy the gateway client
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames <ApplicationName>
Replace the following placeholders:
| Parameter | Required | Description |
|---|---|---|
| --clusterId | Yes | The ID of your EMR cluster |
| --appNames | No | A comma-separated list of applications to deploy, for example, HDFS,YARN. If omitted, all supported applications in the cluster are deployed by default. |
A successful deployment returns:
deployGateway success
After installation, the JAVA_HOME environment variable is set to /usr/lib/jvm/java-1.8.0. To change it, edit /etc/profile.d/emr_env.sh. Modifying JAVA_HOME may affect gateway functionality — proceed with caution.
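A typo in --appNames is easier to catch before the deploy command runs. The sketch below checks a comma-separated list against the service names listed under Limitations and then calls the deploy command shown in this step. The function names are hypothetical, and the spelling and case of each name follow this topic; the exact form that EMR-CLI accepts has not been verified here.

```shell
#!/bin/sh
# Service names as listed under "Supported services" in this topic.
SUPPORTED_APPS="HDFS YARN HBase HIVE SPARK2 SPARK3 JINDOSDK FLINK SQOOP IMPALA PRESTO HUDI ICEBERG TEZ DELTALAKE"

# Return 0 if every name in a comma-separated list is in SUPPORTED_APPS.
validate_app_names() {
    for app in $(printf '%s' "$1" | tr ',' ' '); do
        case " $SUPPORTED_APPS " in
            *" $app "*) ;;
            *) echo "unsupported application: $app" >&2; return 1 ;;
        esac
    done
}

# Validate first, then run the deploy command from this step.
deploy_gateway() {
    cluster_id="$1"
    app_names="$2"
    validate_app_names "$app_names" || return 1
    emrcli gateway deploy --clusterId "$cluster_id" --appNames "$app_names"
}
```

For example, `deploy_gateway "<ClusterId>" "HDFS,YARN"` runs the deployment, whereas a list that contains an unlisted name such as KAFKA is rejected before EMR-CLI is called.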
Step 8: Re-log on to the ECS instance
Log on to the ECS instance again to apply the updated environment variables.
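After logging on again, a quick check can confirm that the new environment is active. The sketch below compares JAVA_HOME with the value stated in Step 7 and reports client commands that are not on the PATH. The function names are hypothetical, and which commands exist depends on the applications you deployed.

```shell
#!/bin/sh
# JAVA_HOME value that the deployment sets, according to Step 7.
EXPECTED_JAVA_HOME="/usr/lib/jvm/java-1.8.0"

# Return 0 if the given value matches the expected JAVA_HOME.
java_home_ok() {
    [ "$1" = "$EXPECTED_JAVA_HOME" ]
}

# Print each of the given commands that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

For example, run `java_home_ok "$JAVA_HOME" || echo "JAVA_HOME is $JAVA_HOME"` and then `missing_commands yarn hive spark-submit`. Empty output from the second command means that all three clients are available. If you changed JAVA_HOME on purpose in /etc/profile.d/emr_env.sh, the first check is expected to fail.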
Step 9 (optional): Configure domain name resolution
This step is required only if the gateway includes the Spark service. Otherwise, skip it.
-
Add a zone. See Add a built-in authoritative domain name.
-
Add a DNS record. See Add a DNS record. Use the following settings:
| Parameter | Setting |
|---|---|
| Record Type | Use the default value A |
| Host Record | Enter the hostname of the gateway machine. Run the hostname command to get it, for example, iZ2zea8r0aht2vzbqci**** |
| Record Value | Enter the private IP address of the gateway machine |
| TTL Value | Use the default value |
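The two values that the DNS record needs can be collected on the gateway machine itself. The sketch below is one way to do that on a Linux image. It assumes that the first address printed by `hostname -I` is the private IP address, which holds for an instance with a single network interface, and that `getent` is available. All function names are hypothetical.

```shell
#!/bin/sh
# Pick the first address from a space-separated list such as `hostname -I` prints.
first_address() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# Print the Host Record and Record Value to enter in the DNS record form.
print_dns_record_values() {
    echo "Host Record:  $(hostname)"
    echo "Record Value: $(first_address "$(hostname -I)")"
}

# After the record is added, check that the hostname resolves to an address.
check_resolution() {
    getent hosts "$(hostname)"
}
```

Run `print_dns_record_values` before you add the record and `check_resolution` afterwards. If `check_resolution` prints nothing, the record is not yet visible from the gateway.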
Manage the gateway environment
After the gateway is set up, use the following commands to keep client components and configurations in sync with changes made to the associated EMR cluster.
Update client components
The deploy command overwrites configurations for installed applications and incrementally installs new ones. To add a new service client — for example, Flink — to a gateway that already has HDFS and YARN:
# Add the FLINK client to an environment that already has HDFS and YARN
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames HDFS,YARN,FLINK
A successful update returns:
deployGateway success
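The example above passes the already installed applications together with the new one. If you track the installed list yourself, a small helper can build that argument without duplicates. This is a sketch only: merge_app_names is a hypothetical helper, and this topic does not state whether the installed applications must be repeated, so the sketch mirrors the example.

```shell
#!/bin/sh
# Append new application names to an existing comma-separated list,
# skipping names that are already present and keeping the original order.
merge_app_names() {
    merged="$1"
    for app in $(printf '%s' "$2" | tr ',' ' '); do
        case ",$merged," in
            *",$app,"*) ;;                           # already in the list
            *) merged="${merged:+$merged,}$app" ;;   # append with a comma if needed
        esac
    done
    printf '%s\n' "$merged"
}
```

For example, `emrcli gateway deploy --clusterId "<ClusterId>" --appNames "$(merge_app_names "HDFS,YARN" "FLINK")"` passes HDFS,YARN,FLINK to the deploy command.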
Sync configurations from the EMR cluster
When service configurations change in the EMR cluster, for example, if you modify core-site.xml in the EMR console, sync the changes to the gateway with the following command.
Syncing configurations overwrites the existing configurations on the gateway. Proceed with caution.
emrcli gateway refreshConfigs \
--clusterId <ClusterId> \
--appNames <ApplicationName> # Optional. Omit to sync all applications.
A successful sync returns:
refreshConfiguration success
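Because a sync overwrites the configurations on the gateway, you may want a copy of the current files first. The sketch below copies a directory to a timestamped backup and then runs the sync command from this section. It assumes that the client configurations are stored in /etc/taihao-apps, which this topic lists only as a path to back up, and it uses /root/gateway-backups as a hypothetical backup location. The function names are also hypothetical.

```shell
#!/bin/sh
# Copy a directory to <dest_root>/<name>-<timestamp> and print the new path.
backup_dir() {
    src="$1"
    dest_root="$2"
    dest="$dest_root/$(basename "$src")-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest_root" && cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Back up the configuration directory, then sync all applications.
# Assumption: /etc/taihao-apps holds the client configurations.
refresh_with_backup() {
    cluster_id="$1"
    backup_dir /etc/taihao-apps /root/gateway-backups || return 1
    emrcli gateway refreshConfigs --clusterId "$cluster_id"
}
```

For example, `refresh_with_backup "<ClusterId>"` prints the backup path and then runs the sync. If the backup fails, the sync does not run.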
Manage EMR-CLI
Check the EMR-CLI version
emrcli version
Example output:
2.0.0
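In a script, the version string can be compared with a minimum version before other commands run. The sketch below assumes that `emrcli version` prints only a dotted numeric version, as in the example output. The helper version_ge is hypothetical.

```shell
#!/bin/sh
# Return 0 if dotted version $1 is greater than or equal to $2.
# Assumes purely numeric components such as 2.0.0.
version_ge() {
    a="$1"; b="$2"
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading component of each version; a missing one counts as 0.
        x="${a%%.*}"; y="${b%%.*}"
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the component just compared; empty the string after the last one.
        case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
        case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
    done
    return 0
}
```

For example, `version_ge "$(emrcli version)" 2.0.0 || echo "consider upgrading EMR-CLI"`. The threshold 2.0.0 is only the value from the example output, not a documented minimum.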
Upgrade EMR-CLI
To upgrade EMR-CLI to the latest version, repeat Step 6: Install EMR-CLI. The install command overwrites the existing version in place.
FAQ
How do I switch to a different compute cluster?
To switch compute clusters, first use the mv command to back up the following directories and file that were deployed for the old cluster, so that their contents are not lost when the gateway is redeployed:
-
/opt/apps directory
-
/etc/taihao-apps directory
-
/etc/profile.d/yarn.sh file
Then repeat the steps in the Deploy a gateway section for the new cluster.
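The backup with mv can be scripted so that all three paths get the same suffix. The following is a minimal sketch, not a prescribed procedure. The helper backup_paths and the suffix format are hypothetical, and the paths are the ones listed above.

```shell
#!/bin/sh
# Rename each existing path to <path>.<suffix>; skip paths that do not exist.
backup_paths() {
    suffix="$1"
    shift
    for item in "$@"; do
        if [ -e "$item" ]; then
            mv "$item" "$item.$suffix" || return 1
            echo "moved $item -> $item.$suffix"
        else
            echo "skipped $item (not found)"
        fi
    done
}
```

For example, run `backup_paths "bak-$(date +%Y%m%d)" /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh` with sufficient privileges before you redeploy for the new cluster.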
What's next
-
To add gateway nodes directly to an existing cluster without manual ECS setup, see Manage node groups.
-
To deploy a gateway for Hadoop or Kafka clusters, see Create a gateway cluster.