
E-MapReduce:Submit jobs through a cluster Gateway node

Last Updated:Jan 05, 2026

A Gateway node provides a unified, isolated entry point through which multiple users or applications submit jobs. It separates client workloads, such as job submissions, from the cluster's master nodes. This separation protects the stability of core E-MapReduce (EMR) services and lets you configure independent environments for different users.

Three Gateway deployment modes and selection guide

A Gateway is an EMR job submission isolation layer that provides the following core benefits:

  • Decoupling client workloads from core cluster services

    It separates client operations, such as spark-submit, hive -f, and yarn application, from the master or Resource Manager nodes.

  • Implementing multi-tenant environment isolation

    It lets you configure independent runtime environments for different users or departments.

  • Improving cluster stability and maintainability

    It prevents issues such as high-frequency submissions, script debugging, environment conflicts, or resource contention from affecting key services such as YARN ResourceManager and Hadoop Distributed File System (HDFS) NameNode.

EMR offers three Gateway modes. Each mode is suitable for different cluster types, versions, and architectural requirements.

Gateway node group (recommended)

  • Supported cluster types and versions: DataLake and DataFlow clusters (EMR-5.10.1 and later), and Custom clusters (EMR-5.17.1 and later).

  • Deployment method and key features: Add a node group directly to an existing cluster. For more information, see Manage node groups. Client configurations are automatically synchronized from the associated cluster.

  • Scenarios and recommendations: Recommended. Best for quickly adding a secure, isolated submission entry point to an existing DataLake or DataFlow cluster. This option has the lowest O&M cost and the highest configuration consistency.

Gateway environment

  • Supported cluster types and versions: DataLake, DataFlow, Custom, and OLAP clusters.

  • Deployment method and key features: Manually deploy on an ECS instance. For more information, see Use the EMR command-line interface (CLI) to customize a Gateway environment deployment. Provides a completely independent file system and runtime environment. You must manually synchronize client configurations from the associated cluster.

  • Scenarios and recommendations: A standard alternative when a cluster does not support Gateway node groups.

Gateway cluster

  • Supported cluster types and versions: Hadoop and Kafka clusters only.

  • Deployment method and key features: Create a separate EMR cluster that contains only Gateway nodes. For more information, see Create a Gateway cluster. Client configurations are automatically synchronized from the associated cluster.

  • Scenarios and recommendations: Suitable for Hadoop and Kafka clusters.

Procedure

  1. Connect to the Gateway instance using Secure Shell (SSH). For more information, see Log on to a cluster.

  2. Run the following command to submit and run a job. In this example, Spark 3.1.1 is used:

    spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn \
      --deploy-mode client \
      --driver-memory 512m \
      --num-executors 1 \
      --executor-memory 1g \
      --executor-cores 2 \
      /opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar 10
    Note

    spark-examples_2.12-3.1.1.jar is the name of the example JAR file in the cluster. You can log on to the cluster and find it in the /opt/apps/SPARK3/spark-current/examples/jars directory.
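
    The fixed command above can also be wrapped in a small shell function so that the resource settings are tunable through environment variables. This is only a sketch, not part of EMR; the JAR path and the default values simply mirror the example in this article:

    ```shell
    #!/bin/sh
    # Hypothetical wrapper around the spark-submit call shown above.
    # The JAR path is the one used in this article's example.
    SPARK_EXAMPLES_JAR=/opt/apps/SPARK3/spark-current/examples/jars/spark-examples_2.12-3.1.1.jar

    build_submit_cmd() {
      # Assemble the same spark-submit invocation as the example, with
      # resource settings overridable via environment variables.
      echo "spark-submit" \
        "--class org.apache.spark.examples.SparkPi" \
        "--master yarn" \
        "--deploy-mode client" \
        "--driver-memory ${DRIVER_MEM:-512m}" \
        "--num-executors ${NUM_EXECUTORS:-1}" \
        "--executor-memory ${EXECUTOR_MEM:-1g}" \
        "--executor-cores ${EXECUTOR_CORES:-2}" \
        "$SPARK_EXAMPLES_JAR" "${PI_SLICES:-10}"
    }

    # Print the command; replace `echo` inside the function (or pipe the
    # output to `sh`) to actually submit the job on a Gateway node.
    build_submit_cmd
    ```

    For example, `EXECUTOR_MEM=2g NUM_EXECUTORS=4 ./submit_pi.sh` would submit the same job with more resources, without editing the script.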

  3. View the job details. After submitting a job, you can view its details on the YARN web UI. The following steps provide a brief description:

    1. Enable port 8443. For more information, see Manage security groups.

    2. Add a user. For more information, see OpenLDAP user management.

      The YARN web UI is accessed through Knox, so you need the username and password of a Knox account.

    3. On the EMR on ECS page, click Cluster Services in the row of the target cluster.

    4. Click the Access Links and Ports tab.

    5. Click the public link in the YARN UI row.

      Log on with the user that you added to access the YARN web UI.

    6. On the All Applications page, click the ID of the target job to view its details.
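
    As an alternative to the web UI, you can also check job status from the command line on the Gateway node. In client mode, spark-submit prints a log line that contains the YARN application ID, which can be extracted and passed to the standard `yarn application -status` and `yarn logs` commands. The log line below is a made-up sample for illustration; only the `application_<clusterTimestamp>_<sequence>` ID format is assumed:

    ```shell
    # Sample client-mode log line (made up for illustration); in practice,
    # capture the output of spark-submit instead.
    log_line="INFO yarn.Client: Submitted application application_1700000000000_0001"

    # YARN application IDs always match application_<clusterTimestamp>_<sequence>.
    app_id=$(echo "$log_line" | grep -oE 'application_[0-9]+_[0-9]+')
    echo "$app_id"

    # With the ID in hand, the standard YARN CLI shows status and logs:
    #   yarn application -status "$app_id"
    #   yarn logs -applicationId "$app_id"
    ```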
