E-MapReduce: Job submission

Last Updated: Jun 17, 2026

Alibaba Cloud EMR clusters support multiple job submission methods that cover development and debugging (master node), production management (Gateway node), and automated scheduling (DataWorks). Each method suits different roles and use cases. The following table compares the advantages, disadvantages, and applicable scenarios of the three methods.

Submission methods

Submission method | Advantages and disadvantages | Scenarios

Submit jobs through cluster Gateway nodes (recommended)

Advantages:

  • Network isolation: Gateway nodes act as jump servers, so you can access clusters without exposing master nodes directly.

  • Elastic scaling: Dynamically adjust Gateway instance resources based on cluster load.

  • Convenient operation: Use pre-installed CLIs such as spark-submit to submit jobs without additional client environment configuration.

Disadvantages:

  • Increased cost: Requires additional ECS instances for Gateway nodes.

Scenarios:

  • Enterprise production job submission.

  • Cross-VPC and hybrid cloud architectures.

Submit jobs through Alibaba Cloud DataWorks (recommended)

Advantages:

  • Automated O&M: Visual task orchestration and monitoring alerts.

  • Enterprise-level features: Task lineage analysis and cost optimization.

  • Good compatibility: Unified integration with other Alibaba Cloud services.

Disadvantages:

  • Learning curve: Requires familiarity with DataWorks development standards.

  • Increased cost: Additional fees for DataWorks.

Scenarios:

  • Periodic ETL task management.

  • DAG workflows with complex dependency management.

Submit jobs through cluster master nodes

Advantages:

  • Convenient operation: Use pre-installed CLIs such as spark-submit to submit jobs without additional client environment configuration.

  • Lowest cost: No additional resource expenses.

Disadvantages:

  • Security risks: Master nodes typically have elevated permissions. Incorrect operations, such as accidentally deleting HDFS metadata, can cause cluster failures.

  • Limited scalability: The master node is a single submission point and cannot be scaled horizontally.

  • Resource contention: Frequently submitting large jobs can consume CPU and memory on the master node and degrade cluster management services such as ZooKeeper and HMaster.

Scenarios:

  • Quick validation in development and test environments.

  • Quick debugging of temporary tasks.
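
Both the Gateway and master node methods rely on the pre-installed spark-submit CLI. The following is a minimal sketch, not an official EMR tool, of how such a command could be assembled in Python before running it on either node. It assumes that Spark on the cluster runs on YARN. The helper name build_spark_submit_command, the job path, and the configuration value are hypothetical. The flags --master, --deploy-mode, --class, and --conf are standard spark-submit options. The sketch defaults to cluster deploy mode because, on YARN, the driver then runs inside the cluster rather than on the submitting node, which may reduce the resource contention described for master nodes.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def build_spark_submit_command(
    app_path: str,
    app_args: Sequence[str] = (),
    *,
    main_class: Optional[str] = None,
    deploy_mode: str = "cluster",
    conf: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper).

    Assumes the cluster runs Spark on YARN; adjust --master otherwise.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")

    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if main_class:
        # Only needed for JVM applications packaged as a JAR.
        cmd += ["--class", main_class]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    command = build_spark_submit_command(
        "/path/to/your_job.py",  # hypothetical job location
        conf={"spark.executor.memory": "2g"},  # illustrative value only
    )
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(command))
```

The helper only builds and prints the command. To run it on a Gateway or master node, pass the returned list to subprocess.run, which avoids shell quoting issues.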