Use EMR Gateway Nodes to Isolate & Submit Jobs - E-MapReduce

A Gateway node provides a unified, isolated entry point for submitting jobs from multiple users or applications. It keeps client workloads (such as spark-submit, hive -f, and yarn application) off the master nodes, which protects the stability of the YARN ResourceManager and the Hadoop Distributed File System (HDFS) NameNode.

Choose a Gateway mode

EMR offers three Gateway modes. Select the one that matches your cluster type and version.

Mode	Supported clusters	Deployment	When to use
Gateway node group (recommended)	DataLake and DataFlow clusters: EMR-5.10.1 and later; Custom clusters: EMR-5.17.1 and later	Add a node group to an existing cluster. Client configurations are automatically synchronized. See Manage node groups.	Best for existing DataLake or DataFlow clusters. Offers the lowest O&M cost and high configuration consistency.
Gateway environment	DataLake, DataFlow, Custom, and OLAP clusters	Manually deploy on an ECS instance with an independent file system. Client configurations must be manually synchronized. See Use the EMR command-line interface (CLI) to customize a Gateway environment deployment.	Use when your cluster does not support Gateway node groups.
Gateway cluster	Hadoop and Kafka clusters only	Create a separate EMR cluster that contains only Gateway nodes. Client configurations are automatically synchronized. See Create a Gateway cluster.	Use for Hadoop and Kafka clusters.
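The selection rules in the table can be sketched as a small shell helper. This is only an illustration: `choose_gateway_mode` and `version_ge` are hypothetical names, not part of any EMR tooling, and the sketch assumes the version thresholds apply exactly as listed in the table and that `sort -V` (GNU coreutils) is available for version comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps (cluster type, EMR version) to a Gateway mode
# using only the rules from the table above. Not an official EMR tool.

# Succeeds if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

choose_gateway_mode() {
  local cluster_type="$1"
  local version="${2#EMR-}"   # accepts "EMR-5.10.1" or "5.10.1"

  case "$cluster_type" in
    DataLake|DataFlow)
      # Node groups are supported from EMR-5.10.1 on these cluster types.
      if version_ge "$version" "5.10.1"; then
        echo "Gateway node group"
      else
        echo "Gateway environment"
      fi
      ;;
    Custom)
      # Custom clusters need EMR-5.17.1 or later for node groups.
      if version_ge "$version" "5.17.1"; then
        echo "Gateway node group"
      else
        echo "Gateway environment"
      fi
      ;;
    OLAP)
      echo "Gateway environment"
      ;;
    Hadoop|Kafka)
      echo "Gateway cluster"
      ;;
    *)
      echo "unknown cluster type: $cluster_type" >&2
      return 1
      ;;
  esac
}
```

For example, `choose_gateway_mode Custom EMR-5.12.0` prints `Gateway environment`, because Custom clusters below EMR-5.17.1 do not support Gateway node groups.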

Submit a job from a Gateway node

Prerequisites

Before you begin, make sure that you have:

A Gateway node deployed (see Choose a Gateway mode above)
SSH access to the Gateway node (port 22 open in your security group)
A Knox account for accessing the YARN web UI (required for job monitoring)

Submit a Spark job

Connect to the Gateway node using Secure Shell (SSH). For instructions, see Log on to a cluster.
Run the following command to submit a job. This example runs the SparkPi application using Spark 3.1.1 in client deploy mode, where the driver runs directly on the Gateway node. The trailing argument 10 is the number of partitions that SparkPi splits the computation into:
```
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-memory 512m \
  --num-executors 1 \
  --executor-memory 1g \
  --executor-cores 2 \
  /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10
```
Note
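spark-examples_2.12-3.1.1.jar is the examples JAR that ships with the cluster. The version in the file name (Scala 2.12, Spark 3.1.1) depends on the Spark version of your cluster, so confirm the actual file name by logging on to the cluster and listing /opt/apps/SPARK3/spark-current/examples/jars.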
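Because the JAR file name changes with the Spark version, it can help to look the file up rather than hardcode it. The following is a minimal sketch, assuming the examples directory shown above; `find_examples_jar` is a hypothetical helper name, not a command provided by EMR or Spark.

```shell
#!/usr/bin/env bash
# Sketch: locate the Spark examples JAR instead of hardcoding its version.
# find_examples_jar is a hypothetical helper, not part of EMR or Spark.

find_examples_jar() {
  # Default directory is the path given in this document; pass another
  # directory as the first argument to override it.
  local jars_dir="${1:-/opt/apps/SPARK3/spark-current/examples/jars}"
  local jar
  for jar in "$jars_dir"/spark-examples_*.jar; do
    # If the glob matches nothing it stays literal, so check the file exists.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no spark-examples JAR found in $jars_dir" >&2
  return 1
}

# Usage on a Gateway node (commented out so sourcing this file has no side effects):
# examples_jar="$(find_examples_jar)" || exit 1
# spark-submit --class org.apache.spark.examples.SparkPi \
#   --master yarn --deploy-mode client "$examples_jar" 10
```

If the directory holds more than one matching file, the helper returns the first one in glob order, which may not be the one you want; in that case pass the exact path to spark-submit.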

View job details

After submitting a job, monitor it on the YARN web UI.

Open port 8443 in the security group of the cluster. For instructions, see Manage security groups.
Add a Knox user. For instructions, see OpenLDAP user management. Note the Knox username and password; you need them to log in to the YARN web UI.
On the EMR on ECS page, click Cluster Services in the row of the target cluster.
Click the Access Links and Ports tab.
Click the public link in the YARN UI row. Log in with your Knox credentials.
On the All Applications page, click the job ID to view its details.
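As a complement to the web UI, the job can also be checked from the Gateway node with the YARN command-line client. The sketch below assumes that spark-submit logs the application ID (in the form application_<timestamp>_<sequence>) at the cluster's log level, which may not hold if logging is set below INFO; `extract_app_id` and the `submit.log` file name are hypothetical and used for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: pull the YARN application ID out of spark-submit output.
# extract_app_id is a hypothetical helper, not part of YARN or Spark.

extract_app_id() {
  # Reads text on stdin and prints the first token that looks like a
  # YARN application ID; prints nothing if no ID is present.
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Usage on a Gateway node (commented out so sourcing this file has no side effects):
# spark-submit --class org.apache.spark.examples.SparkPi \
#   --master yarn --deploy-mode client \
#   /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10 \
#   2>&1 | tee submit.log
# app_id="$(extract_app_id < submit.log)"
# yarn application -status "$app_id"    # state, final status, tracking URL
# yarn logs -applicationId "$app_id"    # aggregated logs, if log aggregation is enabled
```

If the ID does not appear in the submission output, `yarn application -list` shows the applications that YARN currently knows about, along with their IDs.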