E-MapReduce: Submit jobs through a cluster Gateway node

Last Updated: Jun 21, 2026

Use a Gateway node when you need a unified and isolated entry point for multiple users or applications to submit jobs without affecting the stability of core EMR cluster services. A Gateway node separates client workloads from the cluster's master nodes. This separation protects core components and allows you to configure independent environments for different users.

Gateway deployment options

A Gateway is an isolation layer for job submission provided by EMR. Its core benefits include:

  • Decouple client workloads from core cluster services

    Offloads client operations such as spark-submit, hive -f, and yarn application from the master nodes.

  • Enable isolation in a multi-tenant environment

    Supports separate runtime environments for users or departments.

  • Improve cluster stability and maintainability

    Prevents frequent job submissions, script debugging, environment conflicts, or resource contention from affecting critical services like YARN ResourceManager and HDFS NameNode.

EMR offers three Gateway options to suit different cluster types, versions, and architectural requirements.

Option: Gateway node group (Recommended)

  • Supported cluster types and versions: Supports only DataLake and DataFlow clusters of EMR-5.10.1 and later, and Custom clusters of EMR-5.17.1 and later.

  • Deployment and key features: Add a new node group to an existing cluster. For more information, see Manage node groups. Automatically synchronizes client configurations from the cluster's main version.

  • Use cases and recommendations: Highly recommended. Use this option to quickly add a secure, isolated submission entry point to an existing DataLake or DataFlow cluster. It offers the lowest maintenance cost and ensures high configuration consistency.

Option: Gateway environment

  • Supported cluster types and versions: Supports DataLake, DataFlow, Custom, and OLAP clusters.

  • Deployment and key features: Manually deploy on an ECS instance. For details, see Use the EMR CLI to customize a Gateway environment deployment. Provides a fully independent file system and runtime environment. You must manually synchronize client configurations from the cluster's main version.

  • Use cases and recommendations: A standard alternative when your cluster does not support a Gateway node group.

Option: Gateway cluster

  • Supported cluster types and versions: Supports only Hadoop and Kafka clusters.

  • Deployment and key features: Create a separate EMR cluster that contains only Gateway nodes. For more information, see Create a Gateway cluster. Automatically synchronizes client configurations from the cluster's main version.

  • Use cases and recommendations: Suitable for Hadoop and Kafka clusters.
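
The selection rules in the comparison above can be sketched as a small shell helper. This is an illustration only, not an EMR tool: the function names recommend_gateway and version_ge are hypothetical, the version check assumes that release strings have the form EMR-x.y.z and can be compared numerically, and it relies on the -V (version sort) option of GNU sort.

```shell
#!/usr/bin/env bash
# Sketch only: recommend_gateway and version_ge are hypothetical helpers that
# encode the comparison above. They are not part of EMR.

# Return success if version $1 is greater than or equal to version $2.
# Assumes GNU sort with -V (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the Gateway option that matches a cluster type and EMR version.
# Usage: recommend_gateway <cluster-type> <EMR-x.y.z>
recommend_gateway() {
  local type="$1"
  local version="${2#EMR-}"   # strip the "EMR-" prefix if present

  case "$type" in
    DataLake|DataFlow)
      if version_ge "$version" "5.10.1"; then
        echo "Gateway node group"
      else
        echo "Gateway environment"
      fi
      ;;
    Custom)
      if version_ge "$version" "5.17.1"; then
        echo "Gateway node group"
      else
        echo "Gateway environment"
      fi
      ;;
    OLAP)
      echo "Gateway environment"
      ;;
    Hadoop|Kafka)
      echo "Gateway cluster"
      ;;
    *)
      echo "Unknown cluster type: $type" >&2
      return 1
      ;;
  esac
}
```

For example, recommend_gateway Custom EMR-5.16.0 prints Gateway environment, because Custom clusters support a Gateway node group only from EMR-5.17.1.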

Procedure

  1. Connect to the Gateway instance using SSH. For more information, see Log on to a cluster.

  2. On the Gateway node, run the following command to submit a job. This example uses Spark 3.1.1.

    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      --deploy-mode client \
      --driver-memory 512m \
      --num-executors 1 \
      --executor-memory 1g \
      --executor-cores 2 \
      /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10

    Note

    The JAR file name spark-examples_2.12-3.1.1.jar may vary by cluster. To find the exact name, log on to the cluster and list the files in the /opt/apps/SPARK3/spark-current/examples/jars directory.

  3. After you submit the job, you can view its status in the YARN UI. Follow these steps:

    1. Open port 8443 in the security group. For more information, see Manage security groups.

    2. Add a user. For more information, see OpenLDAP user management.

      You need a Knox account username and password to access the YARN UI.

    3. On the EMR on ECS page, click Cluster Services in the row of your target cluster.

    4. Click the Access Links and Ports tab.

    5. Click the public link in the YARN UI row.

      Log in with your user credentials to access the YARN UI page.

    6. On the All Applications page, click the ID of the target job to view its details.
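
Because the examples JAR file name in step 2 may vary by cluster, the submission can be scripted so that the name is resolved by pattern instead of being hard-coded. The following is a minimal sketch under stated assumptions: the helper names find_examples_jar, submit_spark_pi, and list_running_apps are hypothetical, the JAR is assumed to match spark-examples_*.jar in the directory named in the note, and list_running_apps uses the standard yarn application -list command as a command-line complement to the YARN UI in step 3.

```shell
#!/usr/bin/env bash
# Sketch only: find_examples_jar, submit_spark_pi, and list_running_apps are
# hypothetical helper names. They are not part of EMR.

# Directory from the note in step 2. Override JARS_DIR if your cluster differs.
JARS_DIR="${JARS_DIR:-/opt/apps/SPARK3/spark-current/examples/jars}"

# Print the path of the first spark-examples JAR found in the given directory.
# Returns non-zero if no matching file exists.
find_examples_jar() {
  local dir="$1"
  local jar
  for jar in "$dir"/spark-examples_*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "No spark-examples JAR found in $dir" >&2
  return 1
}

# Submit the SparkPi example with the same options as the command in step 2.
submit_spark_pi() {
  local jar
  jar="$(find_examples_jar "$JARS_DIR")" || return 1
  spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 512m \
    --num-executors 1 \
    --executor-memory 1g \
    --executor-cores 2 \
    "$jar" 10
}

# List applications that YARN currently reports as running.
list_running_apps() {
  yarn application -list -appStates RUNNING
}
```

To use the sketch on a Gateway node, source the file and call submit_spark_pi. While the job runs, list_running_apps shows its application ID, which you can match against the All Applications page in the YARN UI.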