A Gateway node provides a unified, isolated entry point through which multiple users or applications submit jobs. It separates client workloads, such as job submissions, from the cluster's master nodes. This separation protects the stability of core E-MapReduce (EMR) services and lets you configure independent environments for different users.
Three Gateway deployment modes and selection guide
A Gateway is an EMR job submission isolation layer that provides the following core benefits:
Decoupling client workloads from core cluster services
It separates client operations, such as spark-submit, hive -f, and yarn application, from the master or Resource Manager nodes.
Implementing multi-tenant environment isolation
It lets you configure independent runtime environments for different users or departments.
Improving cluster stability and maintainability
It prevents issues such as high-frequency submissions, script debugging, environment conflicts, or resource contention from affecting key services such as YARN ResourceManager and Hadoop Distributed File System (HDFS) NameNode.
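One generic way to give each user or department an independent client environment is to point the standard SPARK_CONF_DIR and HADOOP_CONF_DIR variables at separate configuration directories. The following is a minimal sketch under stated assumptions: the GATEWAY_ENV_ROOT directory layout and the helper names conf_dir_for and use_team_env are hypothetical, and the sketch does not describe how EMR itself provisions Gateway environments.

```shell
#!/usr/bin/env bash
# Sketch only: GATEWAY_ENV_ROOT and its layout are hypothetical, not an EMR default.
GATEWAY_ENV_ROOT="${GATEWAY_ENV_ROOT:-/etc/gateway-envs}"

# conf_dir_for TEAM COMPONENT: print the configuration directory for one team and component.
conf_dir_for() {
  printf '%s/%s/%s-conf\n' "$GATEWAY_ENV_ROOT" "$1" "$2"
}

# use_team_env TEAM: export client configuration paths for the current shell only.
# SPARK_CONF_DIR and HADOOP_CONF_DIR are the standard variables read by the Spark and Hadoop clients.
use_team_env() {
  SPARK_CONF_DIR="$(conf_dir_for "$1" spark)"
  HADOOP_CONF_DIR="$(conf_dir_for "$1" hadoop)"
  export SPARK_CONF_DIR HADOOP_CONF_DIR
}
```

For example, running use_team_env analytics before spark-submit would make the client read its configuration from the analytics directories, while other users' shells are unaffected.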
EMR offers three Gateway modes. Each mode is suitable for different cluster types, versions, and architectural requirements.
Type | Supported cluster types and version requirements | Deployment method and key features | Scenarios and recommendations |
Gateway node group | Only the following clusters are supported: | • Add a node group directly to an existing cluster. For more information, see Manage node groups. | Recommended: Best for quickly adding a secure, isolated submission entry point to existing DataLake or DataFlow clusters. This option offers the lowest O&M costs and ensures high configuration consistency. |
Gateway environment | Supports DataLake, DataFlow, Custom, and OLAP clusters | • Manually deploy on an ECS instance. For more information, see Use the EMR command-line interface (CLI) to customize a Gateway environment deployment. | A standard alternative when a cluster does not support Gateway node groups. |
Gateway cluster | Supports only Hadoop and Kafka clusters | | Suitable for Hadoop and Kafka clusters. |
Procedure
Connect to the Gateway instance using Secure Shell (SSH). For more information, see Log on to a cluster.
After you connect to the node, run the following command to submit and run a job. This example uses Spark 3.1.1:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 512m --num-executors 1 --executor-memory 1g --executor-cores 2 /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10
Note: spark-examples_2.12-3.1.1.jar is the name of the JAR package in the cluster. You can log on to the cluster and find the package in the /opt/apps/SPARK3/spark-current/examples/jars path.
View the job details. After submitting a job, you can view its details on the YARN web UI. The following steps provide a brief description:
Enable port 8443. For more information, see Manage security groups.
Add a user. For more information, see OpenLDAP user management.
To access the YARN web UI using your Knox account, you must obtain the username and password of the Knox account.
On the EMR on ECS page, click Cluster Services in the row of the target cluster.
Click the Access Links and Ports tab.
Click the public link in the YARN UI row.
Log on as the user that you added to access the YARN web UI.
On the All Applications page, click the ID of the target job to view its details.
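The submission and lookup steps can also be scripted from the Gateway node. The following is a minimal sketch, not an official EMR tool. The helper names find_examples_jar, extract_app_id, and submit_and_report and the default log path are hypothetical. The sketch also assumes that the spark-submit client log contains the YARN application ID (for example, in a "Submitted application application_..." line), which depends on your log level.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default log path are hypothetical.

# Directory from the note in the Procedure section; override it if your layout differs.
EXAMPLES_DIR="${EXAMPLES_DIR:-/opt/apps/SPARK3/spark-current/examples/jars}"

# find_examples_jar DIR: print the first spark-examples JAR in DIR, or fail.
find_examples_jar() {
  for candidate in "$1"/spark-examples_*.jar; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# extract_app_id: read log text on stdin and print the first YARN application ID.
extract_app_id() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# submit_and_report [LOGFILE]: run SparkPi, keep the client log, then query YARN.
submit_and_report() {
  log_file="${1:-/tmp/sparkpi-submit.log}"
  jar_path="$(find_examples_jar "$EXAMPLES_DIR")" || {
    echo "No spark-examples JAR found in $EXAMPLES_DIR" >&2
    return 1
  }
  # Same options as the spark-submit command in the Procedure section.
  spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    --driver-memory 512m --num-executors 1 \
    --executor-memory 1g --executor-cores 2 \
    "$jar_path" 10 2>&1 | tee "$log_file"
  app_id="$(extract_app_id < "$log_file")"
  if [ -z "$app_id" ]; then
    echo "No application ID found in $log_file" >&2
    return 1
  fi
  # Print the application report for the submitted job.
  yarn application -status "$app_id"
}
```

After sourcing the script on the Gateway node, run submit_and_report. Resolving the JAR by pattern avoids hardcoding the Spark version in the file name. If log aggregation is enabled, yarn logs -applicationId followed by the application ID retrieves the job's logs.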
