A Gateway node provides a unified, isolated entry point for submitting jobs from multiple users or applications. It separates client workloads (such as spark-submit, hive -f, and yarn application) from master nodes, protecting the stability of the YARN ResourceManager and the Hadoop Distributed File System (HDFS) NameNode.
Choose a Gateway mode
EMR offers three Gateway modes. Select the one that matches your cluster type and version.
| Mode | Supported clusters | Deployment | When to use |
|---|---|---|---|
| Gateway node group (recommended) | DataLake and DataFlow clusters: EMR-5.10.1 and later; Custom clusters: EMR-5.17.1 and later | Add a node group to an existing cluster. Client configurations are automatically synchronized. See Manage node groups. | Best for existing DataLake or DataFlow clusters. Offers the lowest O&M cost and high configuration consistency. |
| Gateway environment | DataLake, DataFlow, Custom, and OLAP clusters | Manually deploy on an ECS instance with an independent file system. Client configurations must be manually synchronized. See Use the EMR command-line interface (CLI) to customize a Gateway environment deployment. | Use when your cluster does not support Gateway node groups. |
| Gateway cluster | Hadoop and Kafka clusters only | Create a separate EMR cluster that contains only Gateway nodes. Client configurations are automatically synchronized. See Create a Gateway cluster. | Use for Hadoop and Kafka clusters. |
Submit a job from a Gateway node
Prerequisites
Before you begin, make sure that you have:
- A Gateway node deployed (see Choose a Gateway mode above)
- SSH access to the Gateway node (port 22 open in your security group)
- A Knox account for accessing the YARN web UI (required for job monitoring)
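Once the prerequisites are in place, a typical first step is an SSH connection test from your workstation to the Gateway node. This is a minimal sketch; the placeholder address and the logon user are assumptions (the default user depends on how the node was provisioned), so substitute the values for your environment:

```shell
# Replace <gateway-node-ip> with the public or internal IP address
# of your Gateway node; the logon user may differ in your setup.
ssh root@<gateway-node-ip>

# Once connected, confirm the Hadoop client tools are on the PATH.
hadoop version
```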
Submit a Spark job
- Connect to the Gateway node using Secure Shell (SSH). For instructions, see Log on to a cluster.
- Run the following command to submit a job. This example runs the SparkPi application using Spark 3.1.1 in client deploy mode, where the driver runs directly on the Gateway node:

  ```shell
  spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 512m \
    --num-executors 1 \
    --executor-memory 1g \
    --executor-cores 2 \
    /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10
  ```

  Note: spark-examples_2.12-3.1.1.jar is the JAR file included in the cluster. To find it, log on to the cluster and look in /opt/apps/SPARK3/spark-current/examples/jars.
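After the job is submitted, you can confirm from the same Gateway shell that YARN accepted it, without opening the web UI. The commands below are a sketch that assumes the Hadoop client is on the Gateway node's PATH; the application ID shown is a hypothetical placeholder, so replace it with the ID printed by spark-submit:

```shell
# List applications currently accepted or running on YARN;
# the SparkPi job should appear here with its application ID.
yarn application -list

# Show the state and final status of one application
# (application_1700000000000_0001 is a placeholder ID).
yarn application -status application_1700000000000_0001
```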
View job details
After submitting a job, monitor it on the YARN web UI.
- Open port 8443. For instructions, see Manage security groups.
- Add a Knox user. For instructions, see OpenLDAP user management. Obtain the Knox username and password, because you will need them to log in to the YARN web UI.
- On the EMR on ECS page, click Cluster Services in the row of the target cluster.
- Click the Access Links and Ports tab.
- Click the public link in the YARN UI row. Log in with your Knox credentials.
- On the All Applications page, click the job ID to view its details.
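As an alternative to the web UI, the same information is available from the YARN CLI on the Gateway node. This sketch assumes log aggregation is enabled on the cluster, and the application ID is a hypothetical placeholder:

```shell
# List applications in all states, including finished ones.
yarn application -list -appStates ALL

# Fetch the aggregated logs of a completed application
# (replace the placeholder with your job's application ID).
yarn logs -applicationId application_1700000000000_0001
```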
