
AnalyticDB: Use DolphinScheduler to schedule Spark jobs

Last Updated: Apr 22, 2025

DolphinScheduler is a distributed, extensible, open source workflow orchestration platform with a powerful visual Directed Acyclic Graph (DAG) interface. It helps you efficiently run and manage workflows that process large amounts of data. You can create, edit, and schedule AnalyticDB for MySQL Spark jobs on the DolphinScheduler web interface.

Prerequisites

Schedule Spark SQL jobs

AnalyticDB for MySQL allows you to execute Spark SQL statements in batch or interactive mode. The scheduling procedure varies based on the execution mode.

Batch mode

  1. Install the spark-submit command-line tool and specify the relevant parameters.

    Note

    You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName. For an illustrative configuration, see the sketch at the end of this procedure.

  2. Create a project.

    1. Access the DolphinScheduler web interface. In the top navigation bar, click Project.

    2. Click Create Project.

    3. In the Create Project dialog box, configure the parameters such as Project Name and Owned Users.

  3. Create a workflow.

    1. Click the name of the created project. In the left-side navigation pane, choose Workflow > Workflow Definition to go to the Workflow Definition page.

    2. Click Create Workflow to go to the workflow DAG edit page.

    3. In the left-side list of the page, select SHELL and drag it to the right-side canvas.

    4. In the Current node settings dialog box, configure the parameters that are described in the following list.

      • Node Name: the name of the workflow node.

      • Script: the installation path of the spark-submit tool followed by the business code of the Spark job. Example:

        /root/adb-spark-toolkit-submit/bin/spark-submit --class com.aliyun.adb.spark.sql.OfflineSqlTemplate local:///opt/spark/jars/offline-sql.jar "show databases" "select 100"

        A wrapped, easier-to-read version of this command appears at the end of this procedure.

      Important

      When you use the spark-submit tool to schedule Spark jobs, you must specify the installation path of the spark-submit tool in the script. Otherwise, the scheduling task may fail to find the spark-submit command.

      Note

      For information about other parameters, see DolphinScheduler Task Parameters Appendix.

    5. Click Confirm.

    6. In the upper-right corner of the page, click Save. In the Basic Information dialog box, configure the parameters such as Workflow Name. Click Confirm.

  4. Run the workflow.

    1. Find the created workflow and click the publish icon in the Operation column to publish the workflow.

    2. Click the run icon in the Operation column.

    3. In the Please set the parameters before starting dialog box, configure the parameters.

    4. Click Confirm to run the workflow.

  5. View the details about the workflow.

    1. In the left-side navigation pane, choose Task > Task Instance.

    2. Find the task instances of the workflow and click the view log icon in the Operation column to view the execution results and logs of the workflow.
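
The following sketch ties the preceding steps together. It is illustrative only: it assumes that the spark-submit tool is installed in /root/adb-spark-toolkit-submit and that the parameters from step 1 are set in the tool's configuration file (for example, conf/spark-defaults.conf in the installation directory); all values are placeholders, not real credentials.

  # Parameters from step 1 (placeholder values)
  keyId       <your-AccessKey-ID>
  secretId    <your-AccessKey-secret>
  regionId    cn-hangzhou
  clusterId   amv-bp1234567890******
  rgName      test_spark_rg

  # Script of the SHELL node: the same command as the Script example above,
  # wrapped across lines for readability
  /root/adb-spark-toolkit-submit/bin/spark-submit \
    --class com.aliyun.adb.spark.sql.OfflineSqlTemplate \
    local:///opt/spark/jars/offline-sql.jar \
    "show databases" "select 100"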

Interactive mode

  1. Obtain the connection URL of the Spark interactive resource group.

    1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.

    2. In the left-side navigation pane, choose Cluster Management > Resource Management. On the page that appears, click the Resource Groups tab.

    3. Find the Spark interactive resource group that you created and click Details in the Actions column to view the internal or public connection URL of the resource group. You can click the copy icon within the parentheses next to the corresponding port number to copy the connection URL.

      You must click Apply for Endpoint next to Public Endpoint to manually apply for a public endpoint in the following scenarios:

      • The client tool that is used to submit a Spark SQL job is deployed on premises.

      • The client tool that is used to submit a Spark SQL job is deployed on an Elastic Compute Service (ECS) instance that resides in a different virtual private cloud (VPC) from your AnalyticDB for MySQL cluster.

  2. Create a data source.

    1. Access the DolphinScheduler web interface. In the top navigation bar, click Datasource.

    2. Click Create DataSource.

    3. In the Create DataSource dialog box, configure the parameters that are described in the following list.

      • DataSource: the type of the data source. Select SPARK.

      • Datasource Name: the name of the data source.

      • IP: the endpoint obtained in Step 1. Replace default in the endpoint with the actual name of the database and delete the resource_group=<resource group name> suffix from the endpoint. Example: jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo

      • Port: the port number for Spark interactive resource groups. Set the value to 10000.

      • User Name: the name of the database account of the AnalyticDB for MySQL cluster.

      • Database Name: the name of the database in the AnalyticDB for MySQL cluster.

      Note

      For information about other optional parameters, see MySQL.

    4. Click Test Connect. After the test is successful, click Confirm.

  3. Create a project.

    1. Access the DolphinScheduler web interface. In the top navigation bar, click Project.

    2. Click Create Project.

    3. In the Create Project dialog box, configure the parameters such as Project Name and Owned Users.

  4. Create a workflow.

    1. Click the name of the created project. In the left-side navigation pane, choose Workflow > Workflow Definition to go to the Workflow Definition page.

    2. Click Create Workflow to go to the workflow DAG edit page.

    3. In the left-side list of the page, select SQL and drag it to the right-side canvas.

    4. In the Current node settings dialog box, configure the parameters that are described in the following list.

      • Datasource types: the type of the data source. Select SPARK.

      • Datasource instances: the data source created in Step 2.

      • SQL Type: the type of the SQL job. Valid values: Query and Non Query.

      • SQL Statement: the SQL statement to execute. For a sample statement, see the sketch at the end of this procedure.

    5. Click Confirm.

    6. In the upper-right corner of the page, click Save. In the Basic Information dialog box, configure the parameters such as Workflow Name. Click Confirm.

  5. Run the workflow.

    1. Find the created workflow and click the publish icon in the Operation column to publish the workflow.

    2. Click the run icon in the Operation column.

    3. In the Please set the parameters before starting dialog box, configure the parameters.

    4. Click Confirm to run the workflow.

  6. View the details about the workflow.

    1. In the left-side navigation pane, choose Task > Task Instance.

    2. Find the task instances of the workflow and click the view log icon in the Operation column to view the execution results and logs of the workflow.
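
The SQL node sends the statement configured in SQL Statement to AnalyticDB for MySQL through the Spark interactive resource group. The following minimal examples show what such a statement might look like; the database and table names are placeholders.

  -- SQL Type: Query
  SHOW TABLES IN adb_demo;

  -- SQL Type: Non Query
  INSERT INTO adb_demo.target_table
  SELECT * FROM adb_demo.source_table;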

Schedule Spark JAR jobs

  1. Install the spark-submit command-line tool and specify the relevant parameters.

    Note

    You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName. If your Spark JAR package is stored on your on-premises device, you must also specify Object Storage Service (OSS) parameters such as ossUploadPath. For an illustrative configuration, see the sketch at the end of this procedure.

  2. Create a project.

    1. Access the DolphinScheduler web interface. In the top navigation bar, click Project.

    2. Click Create Project.

    3. In the Create Project dialog box, configure the parameters such as Project Name and Owned Users.

  3. Create a workflow.

    1. Click the name of the created project. In the left-side navigation pane, choose Workflow > Workflow Definition to go to the Workflow Definition page.

    2. Click Create Workflow to go to the workflow DAG edit page.

    3. In the left-side list of the page, select SHELL and drag it to the right-side canvas.

    4. In the Current node settings dialog box, configure the parameters that are described in the following list.

      • Node Name: the name of the workflow node.

      • Script: the installation path of the spark-submit tool followed by the business code of the Spark job. Example:

        /root/adb-spark-toolkit-submit/bin/spark-submit --class org.apache.spark.examples.SparkPi --name SparkPi --conf spark.driver.resourceSpec=medium --conf spark.executor.instances=2 --conf spark.executor.resourceSpec=medium local:///tmp/spark-examples.jar 1000

        A wrapped, easier-to-read version of this command appears at the end of this procedure.

      Important

      When you schedule Spark jobs, you must specify the installation path of the spark-submit tool in the script. Otherwise, the scheduling task may fail to find the spark-submit command.

      Note

      For information about other parameters, see DolphinScheduler Task Parameters Appendix.

    5. Click Confirm.

    6. In the upper-right corner of the page, click Save. In the Basic Information dialog box, configure the parameters such as Workflow Name. Click Confirm.

  4. Run the workflow.

    1. Find the created workflow and click the publish icon in the Operation column to publish the workflow.

    2. Click the run icon in the Operation column.

    3. In the Please set the parameters before starting dialog box, configure the parameters.

    4. Click Confirm to run the workflow.

  5. View the details about the workflow.

    1. In the left-side navigation pane, choose Task > Task Instance.

    2. Find the task instances of the workflow and click the view log icon in the Operation column to view the execution results and logs of the workflow.
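
The following sketch is illustrative only. It assumes the same installation path and configuration as in the Spark SQL batch procedure; the OSS path is a placeholder. The ossUploadPath parameter is needed only if the Spark JAR package is stored on your on-premises device.

  # Additional parameter from step 1 for an on-premises JAR package (placeholder value)
  ossUploadPath   oss://<your-bucket>/spark-jars/

  # Script of the SHELL node: the same command as the Script example above,
  # wrapped across lines for readability
  /root/adb-spark-toolkit-submit/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --name SparkPi \
    --conf spark.driver.resourceSpec=medium \
    --conf spark.executor.instances=2 \
    --conf spark.executor.resourceSpec=medium \
    local:///tmp/spark-examples.jar \
    1000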