
AnalyticDB: Use DolphinScheduler to schedule Spark jobs

Last Updated: Mar 28, 2026

DolphinScheduler is a distributed, extensible open source workflow orchestration platform with a visual Directed Acyclic Graph (DAG) editor. Use it to create, schedule, and monitor Spark jobs for AnalyticDB for MySQL clusters.

How it works

DolphinScheduler connects to AnalyticDB for MySQL Spark in two ways, depending on execution mode:

  • Batch mode and JAR jobs: DolphinScheduler uses a SHELL task to invoke the spark-submit command-line tool, which submits the job to an AnalyticDB for MySQL job resource group.

  • Interactive mode: DolphinScheduler uses a SQL task to connect to a Spark interactive resource group over JDBC (port 10000), and sends SQL statements directly.
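
For orientation, the two paths look roughly like this (the toolkit path, class name, and hostname below are placeholders, not values from your environment):

    # Batch mode and JAR jobs: a SHELL task shells out to the spark-submit toolkit.
    /root/adb-spark-toolkit-submit/bin/spark-submit --class <main_class> <artifact> [args...]

    # Interactive mode: a SQL task speaks the Hive2 JDBC protocol on port 10000.
    jdbc:hive2://amv-****.ads.aliyuncs.com:10000/<database>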

Prerequisites

Before you begin, make sure you have:

Schedule Spark SQL jobs

AnalyticDB for MySQL supports Spark SQL in batch or interactive mode. The steps differ by mode.

Batch mode

In batch mode, DolphinScheduler runs a SHELL task that calls the spark-submit tool to submit Spark SQL to a job resource group.

Steps in this section:

  1. Install and configure spark-submit

  2. Create a project

  3. Create a workflow with a SHELL task

  4. Run the workflow

  5. View execution results

Step 1: Install and configure spark-submit

Install the spark-submit command-line tool and configure the required parameters.

For Spark SQL batch jobs, configure only these parameters: keyId, secretId, regionId, clusterId, and rgName.
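
As a sketch, the toolkit reads these values from its configuration file under the installation directory (the exact file name and syntax can vary by toolkit version); all values below are placeholders:

    keyId      <your AccessKey ID>
    secretId   <your AccessKey secret>
    regionId   cn-hangzhou
    clusterId  amv-bp1e********
    rgName     test_group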

Step 2: Create a project

  1. Open the DolphinScheduler web interface. In the top navigation bar, click Project.

  2. Click Create Project.

  3. In the Create Project dialog box, enter a Project Name and configure Owned Users.

Step 3: Create a workflow

  1. Click the project name. In the left-side navigation pane, choose Workflow > Workflow Definition.

  2. Click Create Workflow to open the workflow DAG edit page.

  3. In the left-side list, select SHELL and drag it onto the canvas.

  4. In the Current node settings dialog box, configure the following parameters.

    Important

    Always specify the full installation path of spark-submit in the script. If the path is omitted, the scheduling task cannot find the spark-submit command.

    For other SHELL task parameters, see DolphinScheduler Task Parameters Appendix.
    Node name: A name for the workflow node.

    Script: The full installation path of spark-submit, followed by the job arguments (see the annotated version of this command after this procedure). Example: /root/adb-spark-toolkit-submit/bin/spark-submit --class com.aliyun.adb.spark.sql.OfflineSqlTemplate local:///opt/spark/jars/offline-sql.jar "show databases" "select 100"
  5. Click Confirm.

  6. Click Save in the upper-right corner. In the Basic Information dialog box, enter a Workflow Name and click Confirm.
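
For reference, the example Script value reformatted with line continuations and comments (same illustrative path, class, and SQL statements as in the table above):

    # Use the full toolkit path so the DolphinScheduler worker can locate spark-submit.
    /root/adb-spark-toolkit-submit/bin/spark-submit \
      --class com.aliyun.adb.spark.sql.OfflineSqlTemplate \
      local:///opt/spark/jars/offline-sql.jar \
      "show databases" "select 100"    # each quoted argument is one SQL statement to run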

Step 4: Run the workflow

  1. Find the workflow in the list and click the publish icon in the Operation column to bring it online.

  2. Click the run icon in the Operation column.

  3. In the Please set the parameters before starting dialog box, configure the parameters.

  4. Click Confirm to start the workflow.

Step 5: View execution results

  1. In the left-side navigation pane, choose Task > Task Instance.

  2. Find the task and click the view log icon in the Operation column to view the execution results and logs.

Interactive mode

In interactive mode, DolphinScheduler uses a SQL task that connects to a Spark interactive resource group over JDBC. This approach lets you send SQL statements without managing the spark-submit command.
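
Before wiring this into DolphinScheduler, you can verify the endpoint with any Hive2-compatible JDBC client. A minimal sketch using beeline, assuming it is installed (the hostname, database, account, and password are placeholders):

    beeline -u "jdbc:hive2://amv-****sparkwho.ads.aliyuncs.com:10000/adb_demo" \
            -n <database_account> -p <password> \
            -e "SELECT 100;"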

Steps in this section:

  1. Get the connection URL of the Spark interactive resource group

  2. Create a data source in DolphinScheduler

  3. Create a project

  4. Create a workflow with a SQL task

  5. Run the workflow

  6. View execution results

Step 1: Get the connection URL

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters.

  2. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find your cluster and click the cluster ID.

  3. In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab.

  4. Find the Spark interactive resource group and click Details in the Actions column. Click the copy icon next to the port number to copy the internal or public connection URL.

Apply for a public endpoint if either of the following conditions is true:

  • The client tool is deployed on premises.

  • The client tool runs on an Elastic Compute Service (ECS) instance in a different virtual private cloud (VPC) from the cluster.

To apply, click Apply for Endpoint next to Public Endpoint.

Step 2: Create a data source

  1. In the DolphinScheduler top navigation bar, click Datasource.

  2. Click Create DataSource.

  3. In the Create DataSource dialog box, configure the following parameters.

    For other optional parameters, see MySQL.
    DataSource: The data source type. Select SPARK.

    Datasource name: A name for the data source.

    IP: The JDBC endpoint from Step 1, modified as follows: replace default with the actual database name, and remove the resource_group=<resource group name> suffix (see the before-and-after example after this procedure). Example: jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo

    Port: The port number for Spark interactive resource groups. Enter 10000.

    User name: The database account name for the AnalyticDB for MySQL cluster.

    Database name: The name of the database in the cluster.
  4. Click Test Connect. After the test succeeds, click Confirm.
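
To illustrate the IP value, here is a hypothetical before-and-after based on the transformation rules above (the hostname, database, and resource group names are placeholders, and the exact format of the console URL may differ):

    # As copied from the console:
    jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/default;resource_group=test_group

    # As entered in the IP field (database name substituted, resource_group suffix removed):
    jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo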

Step 3: Create a project

  1. In the top navigation bar, click Project.

  2. Click Create Project.

  3. In the Create Project dialog box, enter a Project Name and configure Owned Users.

Step 4: Create a workflow

  1. Click the project name. In the left-side navigation pane, choose Workflow > Workflow Definition.

  2. Click Create Workflow to open the workflow DAG edit page.

  3. In the left-side list, select SQL and drag it onto the canvas.

  4. In the Current node settings dialog box, configure the following parameters.

    Datasource types: The data source type. Select SPARK.

    Datasource instances: The data source created in Step 2.

    SQL type: The type of SQL job. Valid values: Query and Non Query.

    SQL statement: The SQL statement to run (see the examples after this procedure).
  5. Click Confirm.

  6. Click Save in the upper-right corner. In the Basic Information dialog box, enter a Workflow Name and click Confirm.
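
For example (the database and table names below are placeholders):

    -- Query type: returns a result set.
    SELECT * FROM adb_demo.orders LIMIT 10;

    -- Non Query type: statements such as DDL or INSERT that return no rows.
    INSERT INTO adb_demo.orders_summary
    SELECT order_date, COUNT(*) FROM adb_demo.orders GROUP BY order_date;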

Step 5: Run the workflow

  1. Find the workflow and click the publish icon in the Operation column to bring it online.

  2. Click the run icon in the Operation column.

  3. In the Please set the parameters before starting dialog box, configure the parameters.

  4. Click Confirm to start the workflow.

Step 6: View execution results

  1. In the left-side navigation pane, choose Task > Task Instance.

  2. Find the task and click the view log icon in the Operation column to view the execution results and logs.

Schedule Spark JAR jobs

Spark JAR jobs follow the same SHELL task pattern as Spark SQL batch mode, with spark-submit invoking your JAR file directly.

Steps in this section:

  1. Install and configure spark-submit

  2. Create a project

  3. Create a workflow with a SHELL task

  4. Run the workflow

  5. View execution results

Step 1: Install and configure spark-submit

Install the spark-submit command-line tool and configure the required parameters.

Configure at minimum: keyId, secretId, regionId, clusterId, and rgName. If the JAR package is stored on your local device rather than in Object Storage Service (OSS), also specify OSS parameters such as ossUploadPath.
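
Extending the earlier configuration sketch, a setup for a locally stored JAR might look like this (all values are placeholders; the exact file name and syntax can vary by toolkit version):

    keyId          <your AccessKey ID>
    secretId       <your AccessKey secret>
    regionId       cn-hangzhou
    clusterId      amv-bp1e********
    rgName         test_group
    ossUploadPath  oss://<your-bucket>/spark-jars/    # local JARs are uploaded here before submission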

Step 2: Create a project

  1. Open the DolphinScheduler web interface. In the top navigation bar, click Project.

  2. Click Create Project.

  3. In the Create Project dialog box, enter a Project Name and configure Owned Users.

Step 3: Create a workflow

  1. Click the project name. In the left-side navigation pane, choose Workflow > Workflow Definition.

  2. Click Create Workflow to open the workflow DAG edit page.

  3. In the left-side list, select SHELL and drag it onto the canvas.

  4. In the Current node settings dialog box, configure the following parameters.

    Important

    Always specify the full installation path of spark-submit in the script. If the path is omitted, the scheduling task cannot find the spark-submit command.

    For other SHELL task parameters, see DolphinScheduler Task Parameters Appendix.
    Node name: A name for the workflow node.

    Script: The full installation path of spark-submit, followed by the JAR job arguments (see the annotated version of this command after this procedure). Example: /root/adb-spark-toolkit-submit/bin/spark-submit --class org.apache.spark.examples.SparkPi --name SparkPi --conf spark.driver.resourceSpec=medium --conf spark.executor.instances=2 --conf spark.executor.resourceSpec=medium local:///tmp/spark-examples.jar 1000
  5. Click Confirm.

  6. Click Save in the upper-right corner. In the Basic Information dialog box, enter a Workflow Name and click Confirm.
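
For reference, the example Script value reformatted with line continuations (same illustrative values as in the table above; the resourceSpec settings size the driver and executors, and 1000 is the application argument passed to SparkPi):

    # Use the full toolkit path so the DolphinScheduler worker can locate spark-submit.
    /root/adb-spark-toolkit-submit/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --name SparkPi \
      --conf spark.driver.resourceSpec=medium \
      --conf spark.executor.instances=2 \
      --conf spark.executor.resourceSpec=medium \
      local:///tmp/spark-examples.jar 1000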

Step 4: Run the workflow

  1. Find the workflow and click the publish icon in the Operation column to bring it online.

  2. Click the run icon in the Operation column.

  3. In the Please set the parameters before starting dialog box, configure the parameters.

  4. Click Confirm to start the workflow.

Step 5: View execution results

  1. In the left-side navigation pane, choose Task > Task Instance.

  2. Find the task and click the view log icon in the Operation column to view the execution results and logs.