Use a Gateway node when you need a unified and isolated entry point for multiple users or applications to submit jobs without affecting the stability of core EMR cluster services. A Gateway node separates client workloads from the cluster's master nodes. This separation protects core components and allows you to configure independent environments for different users.
Gateway deployment options
A Gateway is an isolation layer that EMR provides for job submission. Its core benefits are:
- Decouple client workloads from core cluster services
  Offloads client operations such as spark-submit, hive -f, and yarn application from the master nodes.
- Enable isolation in a multi-tenant environment
  Supports separate runtime environments for different users or departments.
- Improve cluster stability and maintainability
  Prevents frequent job submissions, script debugging, environment conflicts, and resource contention from affecting critical services such as YARN ResourceManager and HDFS NameNode.
EMR offers three Gateway options to suit different cluster types, versions, and architectural requirements.
| Option | Supported cluster types and versions | Deployment and key features | Use cases and recommendations |
|---|---|---|---|
| Gateway node group | Supports only the following clusters: | Add a new node group to an existing cluster. For more information, see Manage node groups. | Highly recommended: Use this option to quickly add a secure, isolated submission entry point to an existing DataLake or DataFlow cluster. It offers the lowest maintenance cost and ensures high configuration consistency. |
| Gateway environment | Supports DataLake, DataFlow, Custom, and OLAP clusters. | Manually deploy on an ECS instance. For more information, see Use the EMR CLI to customize a Gateway environment deployment. | A standard alternative when your cluster does not support a Gateway node group. |
| Gateway cluster | Supports only Hadoop and Kafka clusters. | | Suitable for Hadoop and Kafka clusters. |
Procedure
- Connect to the Gateway instance using SSH. For more information, see Log on to a cluster.
- After you connect to the Gateway node using SSH, run the following command to submit a job. This example uses Spark 3.1.1.
  spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    --driver-memory 512m --num-executors 1 \
    --executor-memory 1g --executor-cores 2 \
    /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10
  Because the command uses --deploy-mode client, the Spark driver runs on the Gateway node itself. The trailing argument 10 is the number of partitions (slices) that SparkPi uses to estimate pi.
  Note: The JAR file name spark-examples_2.12-3.1.1.jar may vary by cluster. To find the actual name, log on to the cluster and list the files in the /opt/apps/SPARK3/spark-current/examples/jars directory.
- After you submit the job, view its status in the YARN UI:
  - Open port 8443 in the security group of the cluster. For more information, see Manage security groups.
  - Add a user. For more information, see OpenLDAP user management. You need this user's Knox username and password to access the YARN UI.
  - On the EMR on ECS page, find your target cluster and click Cluster Services in its row.
  - Click the Access Links and Ports tab.
  - Click the public link in the YARN UI row, and then log on with the Knox user credentials to open the YARN UI.
  - On the All Applications page, click the ID of the target job to view its details.
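The examples JAR name in the spark-submit step depends on the Spark version installed on the cluster. Instead of hard-coding it, you can resolve it with a shell glob. The following is a minimal sketch. It assumes the JAR matches spark-examples_*.jar in /opt/apps/SPARK3/spark-current/examples/jars. The function name find_examples_jar is hypothetical and not part of EMR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an EMR tool): resolve the Spark examples JAR instead
# of hard-coding its version. Assumes the JAR matches spark-examples_*.jar.

find_examples_jar() {
  # Default to the directory used in this procedure; pass another to override.
  local dir="${1:-/opt/apps/SPARK3/spark-current/examples/jars}"
  local jar
  # Bash expands the glob in sorted order; the first regular file wins.
  for jar in "$dir"/spark-examples_*.jar; do
    # With no match, the glob is left unexpanded, so confirm the file exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "find_examples_jar: no spark-examples_*.jar in $dir" >&2
  return 1
}

# Usage on the Gateway node (commented out so the sketch runs anywhere):
#   JAR=$(find_examples_jar) || exit 1
#   spark-submit --class org.apache.spark.examples.SparkPi \
#     --master yarn --deploy-mode client \
#     --driver-memory 512m --num-executors 1 \
#     --executor-memory 1g --executor-cores 2 \
#     "$JAR" 10
```

If the directory contains more than one matching JAR, this sketch picks the first one in glob order. Pass an explicit directory or pin the file name if you need a specific build.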
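As an alternative to the YARN UI, you can check the job from the Gateway node itself with the yarn CLI, which is one of the client operations a Gateway is meant to host. The sketch below makes several assumptions. You capture the spark-submit output to a file, and that output contains the YARN application ID in the form application_<timestamp>_<sequence>. The function name extract_app_id and the file name submit.log are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first YARN application ID found on stdin.
# Assumes IDs look like application_<clusterTimestamp>_<sequence>.

extract_app_id() {
  local id
  # -o prints only the matched text; -E enables extended regular expressions.
  id=$(grep -oE 'application_[0-9]+_[0-9]+' | head -n 1)
  if [ -z "$id" ]; then
    echo "extract_app_id: no application ID found" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage on the Gateway node (commented out so the sketch runs anywhere):
#   spark-submit ... 2>&1 | tee submit.log    # keep driver logs in submit.log
#   APP_ID=$(extract_app_id < submit.log) || exit 1
#   yarn application -status "$APP_ID"        # state, progress, tracking URL
#   yarn application -list -appStates ALL     # all applications on the cluster
```

The application ID that this sketch extracts is the same ID shown on the All Applications page of the YARN UI.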