E-MapReduce: Submit Spark jobs using DolphinScheduler

Last Updated: Mar 26, 2026

Apache DolphinScheduler is a distributed, extensible, open-source workflow orchestration platform with a visual interface for building Directed Acyclic Graph (DAG) workflows. This guide walks you through connecting DolphinScheduler to E-MapReduce (EMR) Serverless Spark and submitting Java Archive (JAR), SQL, and PySpark jobs from the DolphinScheduler web UI.

Background

The AliyunServerlessSpark Task Plugin has been merged into the main branch of Apache DolphinScheduler and will ship in a future official release. Until then, install it using one of the methods described in the prerequisites below.

Prerequisites

Before you begin, ensure that you have:

  • Java Development Kit (JDK) 1.8 or later installed

  • AliyunServerlessSpark Task Plugin installed using one of the following methods:

Step 1: Create a data source

  1. Open the DolphinScheduler web UI and click Datasource in the top navigation bar.

  2. Click Create DataSource. In the Choose DataSource Type dialog box, select ALIYUN_SERVERLESS_SPARK.

  3. In the CreateDataSource dialog box, configure the following parameters:

    | Parameter | Description |
    | --- | --- |
    | Datasource Name | A name for the data source |
    | Access Key Id | Your Alibaba Cloud AccessKey ID |
    | Access Key Secret | Your Alibaba Cloud AccessKey secret |
    | Region Id | The region where your EMR Serverless Spark workspace resides, for example, cn-beijing. For supported regions, see Supported regions. |

  4. Click Test Connect. After the connectivity test passes, click Confirm.

Step 2: Create a project

  1. Click Project in the top navigation bar.

  2. Click Create Project.

  3. In the Create Project dialog box, set Project Name, User, and any other required fields. For details, see Project.

Step 3: Create a workflow

  1. Click the project name. In the left navigation pane, choose Workflow > Workflow Definition.

  2. Click Create Workflow. The workflow DAG edit page opens.

  3. In the left navigation pane, drag ALIYUN_SERVERLESS_SPARK onto the canvas.

  4. In the Current node settings dialog box, configure the node parameters based on your job type, then click Confirm. The following sections list the parameters for each job type. Parameters shared across all three job types are listed first; job-specific parameters follow in each section.

Shared parameters

These parameters apply to JAR, SQL, and PySpark jobs.

| Parameter | Description |
| --- | --- |
| Datasource types | Select ALIYUN_SERVERLESS_SPARK |
| Datasource instances | Select the data source created in Step 1 |
| workspace id | The ID of your EMR Serverless Spark workspace |
| resource queue id | The ID of the resource queue in the EMR Serverless Spark workspace. Default: root_queue |
| is production | Enable this toggle if the job runs in a production environment |
| engine release version | The engine version. Default: esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime) |
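
As a rough illustration of how the shared parameters and their defaults fit together, the following Python sketch models them as a dataclass. The class name, field names, and the placeholder IDs are hypothetical and are not part of DolphinScheduler or the plugin; only the default values come from the table above. The `is_production` default of `False` is an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass
class SharedSparkNodeSettings:
    """Hypothetical container mirroring the shared fields of the node dialog box."""

    datasource_instance: str               # data source created in Step 1
    workspace_id: str                      # ID of the EMR Serverless Spark workspace
    datasource_type: str = "ALIYUN_SERVERLESS_SPARK"
    resource_queue_id: str = "root_queue"  # default listed in the table
    is_production: bool = False            # assumption: toggle is off unless enabled
    engine_release_version: str = (
        "esr-2.1-native (Spark 3.3.1, Scala 2.12, Native Runtime)"
    )


# Placeholder values for illustration only.
settings = SharedSparkNodeSettings(
    datasource_instance="my-serverless-spark",
    workspace_id="w-example",
)

for field_name, value in asdict(settings).items():
    print(f"{field_name}: {value}")
```

Keeping the defaults in one place like this can make it easier to see which fields you actually need to supply (here, the data source instance and the workspace ID) before filling in the dialog box.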

Submit a JAR job

Set code type to JAR, then configure the following parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| code type | Job type | JAR |
| job name | Name of the EMR Serverless Spark job | ds-emr-spark-jar |
| entry point | Path to the JAR file in OSS | oss://<yourBucketName>/spark-resource/examples/jars/spark-examples_2.12-3.3.1.jar |
| entry point arguments | Arguments passed to the job. Use # as the delimiter between arguments. | |
| spark submit parameters | Spark configuration flags passed to spark-submit | See the example below |

Example spark submit parameters for a JAR job:

--class org.apache.spark.examples.SparkPi --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
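
If you keep Spark settings in code or configuration files, a small helper can assemble the parameter string shown above so that it can be pasted into the spark submit parameters field. This is a minimal sketch: the function name `build_spark_submit_parameters` is hypothetical and is not part of DolphinScheduler or the plugin, and it handles only the `--class` and `--conf` flags used in this guide.

```python
def build_spark_submit_parameters(conf, main_class=None):
    """Join an optional main class and Spark conf entries into one string."""
    parts = []
    if main_class:
        parts += ["--class", main_class]
    # dicts preserve insertion order, so flags appear in the order given.
    for key, value in conf.items():
        parts += ["--conf", f"{key}={value}"]
    return " ".join(parts)


jar_conf = {
    "spark.executor.cores": 4,
    "spark.executor.memory": "20g",
    "spark.driver.cores": 4,
    "spark.driver.memory": "8g",
    "spark.executor.instances": 1,
}

jar_params = build_spark_submit_parameters(
    jar_conf, main_class="org.apache.spark.examples.SparkPi"
)
print(jar_params)
```

The printed string matches the JAR example above; changing a value in `jar_conf` and rerunning the script regenerates the string without hand-editing the flags.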

Submit an SQL job

Set code type to SQL, then configure the following parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| code type | Job type | SQL |
| job name | Name of the EMR Serverless Spark job | ds-emr-spark-sql |
| entry point | A valid file path | |
| entry point arguments | The SQL script to run. Use # as the delimiter. | See the examples below |
| spark submit parameters | Spark configuration flags passed to spark-submit | See the example below |

Example entry point arguments for an SQL job:

  • Submit an inline SQL script: -e#show tables;show tables;

  • Submit an SQL script stored in OSS: -f#oss://<yourBucketName>/spark-resource/examples/sql/show_db.sql
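
The two forms above follow the same pattern: a flag (`-e` or `-f`), the `#` delimiter, then the value. The following Python sketch joins arguments with `#` to produce strings for the entry point arguments field. The function name is hypothetical. Because this guide does not describe a way to escape `#` inside an argument, the sketch simply rejects such arguments; that behavior is an assumption of the sketch, not documented plugin behavior.

```python
def join_entry_point_arguments(*args):
    """Join arguments with '#', the delimiter used by the entry point arguments field."""
    for arg in args:
        # Assumption: no escape mechanism is documented, so refuse ambiguous input.
        if "#" in arg:
            raise ValueError(f"argument contains the '#' delimiter: {arg!r}")
    return "#".join(args)


# Inline SQL script, as in the first example.
inline_sql = join_entry_point_arguments("-e", "show tables;show tables;")

# SQL script stored in OSS, as in the second example (bucket name is a placeholder).
sql_file = join_entry_point_arguments(
    "-f", "oss://<yourBucketName>/spark-resource/examples/sql/show_db.sql"
)

print(inline_sql)
print(sql_file)
```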

Example spark submit parameters for an SQL job:

--class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1

Submit a PySpark job

Set code type to PYTHON, then configure the following parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| code type | Job type | PYTHON |
| job name | Name of the EMR Serverless Spark job | ds-emr-spark-jar |
| entry point | Path to the Python script in OSS | oss://<yourBucketName>/spark-resource/examples/src/main/python/pi.py |
| entry point arguments | Arguments passed to the script. Use # as the delimiter. | 1 |
| spark submit parameters | Spark configuration flags passed to spark-submit | See the example below |

Example spark submit parameters for a PySpark job:

--conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1
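
Before pasting a long parameter string into the node settings, it can help to check that every `--conf` entry has the `key=value` shape. The sketch below parses a string such as the one above using Python's standard `shlex` module. The function name is hypothetical, and the sketch deliberately accepts only the `--class` and `--conf` flags that appear in this guide; it is not a description of how the plugin itself parses the field.

```python
import shlex


def parse_spark_submit_parameters(text):
    """Split a parameter string into (main_class, conf dict); raise on malformed input."""
    tokens = shlex.split(text)
    main_class = None
    conf = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if i + 1 >= len(tokens):
            raise ValueError(f"flag {flag!r} has no value")
        value = tokens[i + 1]
        if flag == "--class":
            main_class = value
        elif flag == "--conf":
            key, sep, setting = value.partition("=")
            if not sep or not key:
                raise ValueError(f"--conf expects key=value, got {value!r}")
            conf[key] = setting
        else:
            # Only the two flags used in this guide are handled here.
            raise ValueError(f"unsupported flag in this sketch: {flag!r}")
        i += 2
    return main_class, conf


pyspark_params = (
    "--conf spark.executor.cores=4 --conf spark.executor.memory=20g "
    "--conf spark.driver.cores=4 --conf spark.driver.memory=8g "
    "--conf spark.executor.instances=1"
)

main_class, conf = parse_spark_submit_parameters(pyspark_params)
print(main_class)                     # None: the PySpark example has no --class flag
print(conf["spark.executor.memory"])  # 20g
```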

What's next

For more information about DolphinScheduler workflows, task types, and scheduling options, see the Apache DolphinScheduler documentation.