E-MapReduce: Job submission

Last Updated: Oct 17, 2025

Alibaba Cloud EMR clusters support three job submission methods for different roles and stages of work: master nodes for development and debugging, Gateway nodes for production management, and DataWorks for automated scheduling. This topic describes the advantages, disadvantages, and applicable scenarios of each method.

Submission methods

Each submission method is listed below with its advantages, disadvantages, and scenarios.

Submit jobs through cluster Gateway nodes (recommended)

Advantages:

  • Network isolation: The Gateway node acts as a jump server, so users can access the cluster without exposing master nodes.

  • Elastic scaling: Dynamically adjust Gateway instance resources based on cluster load.

  • Convenient operation: No need for additional client environment configuration. You can directly use pre-installed command line interfaces (such as spark-submit) to submit jobs.

Disadvantages:

  • Increased cost: The ECS instances for Gateway nodes incur additional cost.

Scenarios:

  • Enterprise production environment job submission.

  • Cross-VPC and hybrid cloud architecture.
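As a rough illustration of submitting a job from a Gateway node with the pre-installed spark-submit CLI, the following Python sketch assembles a command line and prints it. The helper names, class name, JAR path, and arguments are hypothetical placeholders. `--master`, `--deploy-mode`, `--class`, and `--conf` are standard spark-submit options. The sketch assumes the job runs on YARN in cluster mode, which may not match every EMR setup.

```python
import shlex
import subprocess


def build_spark_submit_command(main_class, app_jar, app_args=(), conf=None,
                               deploy_mode="cluster"):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    command = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
    ]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        command += ["--conf", f"{key}={value}"]
    command.append(app_jar)
    command.extend(app_args)
    return command


def submit(command, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    print(shlex.join(command))
    if dry_run:
        return 0
    return subprocess.run(command, check=False).returncode


example_command = build_spark_submit_command(
    main_class="com.example.WordCount",   # placeholder class name
    app_jar="/path/to/wordcount.jar",     # placeholder JAR path
    app_args=["input_dir", "output_dir"],
    conf={"spark.executor.memory": "2g"},
)
submit(example_command)  # dry run: only prints the command
```

Calling `submit(example_command, dry_run=False)` on a node where spark-submit is on the PATH would actually launch the job. The dry run is the default so that the command can be reviewed first.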

Submit jobs through Alibaba Cloud DataWorks (recommended)

Advantages:

  • Automated O&M: Visual task orchestration, monitoring, and alerting.

  • Enterprise-level features: Support for task lineage analysis and cost optimization.

  • Good compatibility: Unified integration with other Alibaba Cloud products.

Disadvantages:

  • Learning curve: Requires familiarity with DataWorks development standards.

  • Increased cost: Additional fees for using DataWorks.

Scenarios:

  • Periodic ETL task management.

  • DAG workflows that require complex dependency management.
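To make the idea of DAG dependency management concrete, the following Python sketch derives a valid execution order for a small workflow using the standard library's `graphlib.TopologicalSorter`. It does not use any DataWorks API. The task names and the `execution_order` helper are hypothetical, and the sketch only illustrates the kind of ordering a scheduler has to resolve.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names ordered so each task follows its upstream tasks."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical ETL workflow: each key depends on the tasks in its set.
etl_dependencies = {
    "load_report": {"join_orders_users"},
    "join_orders_users": {"clean_orders", "clean_users"},
    "clean_orders": {"ingest_orders"},
    "clean_users": {"ingest_users"},
}

order = execution_order(etl_dependencies)
print(order)
```

Several orderings are valid for this graph, so the exact output can vary. What is guaranteed is that every task appears after all of its upstream tasks, and that a circular dependency raises `graphlib.CycleError`.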

Submit jobs through cluster master nodes

Advantages:

  • Convenient operation: No need for additional client environment configuration. You can directly use pre-installed command line interfaces (such as spark-submit) to submit jobs.

  • Lowest cost: No additional resource expenses.

Disadvantages:

  • Security risks: Master nodes typically have high permissions. An incorrect operation, such as accidentally deleting HDFS metadata, may crash the cluster.

  • Limited extensibility: The master node is a single submission point that cannot scale horizontally and can become a bottleneck.

  • Resource contention: Frequent submission of large jobs may consume the CPU and memory of the master node and affect cluster management services such as ZooKeeper and HMaster.

Scenarios:

  • Quick validation in development and test environments.

  • Quick debugging of temporary tasks.
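Because jobs submitted from a master node compete with cluster management services for CPU and memory, a simple pre-submission check can help during debugging. The following Python sketch compares the 1-minute load average with the number of CPU cores. The `has_headroom` helper and the 0.7 threshold are hypothetical choices for illustration, not EMR recommendations.

```python
import os


def has_headroom(load_avg_1m, cpu_count, max_load_per_cpu=0.7):
    """Return True when the 1-minute load per CPU is below the threshold."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return (load_avg_1m / cpu_count) < max_load_per_cpu


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    load_1m, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    if has_headroom(load_1m, cpus):
        print("Load looks acceptable; proceeding with a test submission.")
    else:
        print("Master node is busy; consider a Gateway node instead.")
```

The check is advisory only. It does not reserve resources, so a large job can still affect the master node after submission.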