
AnalyticDB: Use Azkaban to schedule Spark jobs

Last Updated: Mar 28, 2026

Azkaban is an open-source batch workflow job scheduler for creating, executing, and managing workflows with complex dependencies. Use it to schedule AnalyticDB for MySQL Spark jobs from the Azkaban web interface.

Prerequisites

Before you begin, make sure you have:

- An AnalyticDB for MySQL cluster with a Spark resource group.
- A deployed Azkaban instance whose web interface you can access.

Schedule Spark SQL jobs

AnalyticDB for MySQL supports Spark SQL in batch and interactive mode. The steps differ based on which mode you use.

Batch mode

In batch mode, submit Spark SQL through the spark-submit command-line tool from the adb-spark-toolkit-submit package.

  1. Install the spark-submit tool and configure the required parameters.

    Configure only the following parameters: keyId, secretId, regionId, clusterId, and rgName.
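These parameters are typically set in the configuration file shipped with the adb-spark-toolkit-submit package; the exact file path and the placeholder values in this sketch are assumptions, so check the file in your installation:

```
# Configuration file inside the adb-spark-toolkit-submit installation
# (exact path varies by version; all values below are placeholders).
keyId = <your AccessKey ID>
secretId = <your AccessKey secret>
regionId = cn-hangzhou
clusterId = amv-************
rgName = <your resource group name>
```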
  2. Write a workflow file. Azkaban uses the Flow 2.0 YAML format: each entry under nodes is a job with name, type, and config.command fields. Use dependsOn to define dependencies between jobs. Compress the workflow folder as a ZIP file.

    Important

    - Replace <your path> with the actual installation path of the spark-submit tool.
    - Do not use backslashes (\) in the command.

    nodes:
      - name: SparkPi
        type: command
        config:
          command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit
                    --class com.aliyun.adb.spark.sql.OfflineSqlTemplate
                    local:///opt/spark/jars/offline-sql.jar
                    "show databases"
                    "select 100"
        dependsOn:
          - jobA
          - jobB
    
      - name: jobA
        type: command
        config:
          command: echo "This is an echoed text."
    
      - name: jobB
        type: command
        config:
          command: pwd
  3. Create a project and upload the workflow file.

    1. Open the Azkaban web interface. In the top navigation bar, click Projects.

    2. Click Create Project in the upper-right corner.

    3. In the Create Project dialog, fill in the Name and Description fields, then click Create Project.

    4. Click Upload in the upper-right corner.

    5. In the Upload Project Files dialog, select the ZIP file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View workflow details.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow, then click the Job List tab to view details for each job.

    4. Click Logs to view job logs.

Interactive mode

In interactive mode, submit Spark SQL through the Beeline client, which connects to a Spark interactive resource group via JDBC.

  1. Get the connection URL of the Spark interactive resource group.

    1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters.

    2. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster and click the cluster ID.

    3. In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab.

    4. Find the Spark interactive resource group, then click Details in the Actions column to view the internal or public connection URL.

    Apply for a public endpoint by clicking Apply for Endpoint next to Public Endpoint in either of the following cases:

    - The Beeline client runs on-premises.
    - The Beeline client runs on an Elastic Compute Service (ECS) instance in a different virtual private cloud (VPC) from your AnalyticDB for MySQL cluster.
  2. Write a workflow file and compress the workflow folder as a ZIP file. Each node runs a Beeline command that connects to the resource group and executes SQL statements. Use dependsOn to define dependencies between nodes.

    | Parameter | Description | Example |
    | --- | --- | --- |
    | <path> | Path to the Beeline client | /path/to/spark/bin/beeline |
    | -u | JDBC connection URL from step 1. Replace default in the URL with your database name and remove the resource_group=<resource group name> suffix. | jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo |
    | -n | Database account and resource group in the format resource_group_name/database_account_name | spark_interactive_prod/spark_user |
    | -p | Password of the database account | |
    | -e | SQL statements to run. Separate multiple statements with semicolons (;). | show databases;show tables; |
    nodes:
      - name: jobB
        type: command
        config:
          command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show databases;show tables;"
        dependsOn:
          - jobA
    
      - name: jobA
        type: command
        config:
          command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show tables;"

    Replace the placeholders with your actual values as described in the parameter table above.
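Because every node repeats the same connection string, one option is a small helper that builds the Beeline command in one place. This is only a sketch: the function name and the BEELINE_PATH, ADB_JDBC_URL, ADB_USER, and ADB_PASSWORD variable names are illustrative, not part of the product.

```shell
# Illustrative helper: builds the full Beeline invocation from variables
# so the URL, account, and password are defined once, not in every node.
build_beeline_cmd() {
  printf '%s -u "%s" -n %s -p "%s" -e "%s"\n' \
    "$BEELINE_PATH" "$ADB_JDBC_URL" "$ADB_USER" "$ADB_PASSWORD" "$1"
}

# Placeholder values matching the parameter table above:
BEELINE_PATH=/path/to/spark/bin/beeline
ADB_JDBC_URL='jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo'
ADB_USER=spark_interactive_prod/spark_user
ADB_PASSWORD=spark_password

build_beeline_cmd "show tables;"
```

Each node's config.command can then invoke a wrapper script built around such a helper instead of repeating the full connection string.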

  3. Create a project and upload the workflow file.

    1. Open the Azkaban web interface. In the top navigation bar, click Projects.

    2. Click Create Project in the upper-right corner.

    3. In the Create Project dialog, fill in the Name and Description fields, then click Create Project.

    4. Click Upload in the upper-right corner.

    5. In the Upload Project Files dialog, select the ZIP file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View workflow details.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow, then click the Job List tab to view details for each job.

    4. Click Logs to view job logs.

Schedule Spark JAR jobs

Submit Spark JAR jobs using the spark-submit tool. The workflow structure is the same as batch mode Spark SQL, with the class and JAR path pointing to your application.

  1. Install the spark-submit tool and configure the required parameters.

    Configure only the following parameters: keyId, secretId, regionId, clusterId, and rgName. If the Spark JAR package is stored on your local machine, also specify Object Storage Service (OSS) parameters such as ossUploadPath.
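For a locally stored JAR, the tool uploads it to OSS before submission. A minimal sketch of the extra setting, added next to the five required parameters in the tool's configuration file (the bucket path is a placeholder):

```
# Local JARs are uploaded under this OSS path before the job is submitted.
ossUploadPath = oss://<your-bucket>/spark-jars/
```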
  2. Write a workflow file and compress the workflow folder as a ZIP file.

    Important

    - Replace <your path> with the actual installation path of the spark-submit tool.
    - Do not use backslashes (\) in the command.

    | Parameter | Description |
    | --- | --- |
    | --class | The fully qualified main class of your application |
    | --name | A display name for the Spark job |
    | --conf spark.driver.resourceSpec | Resource size for the Spark driver (e.g., medium) |
    | --conf spark.executor.instances | Number of Spark executors to launch |
    | --conf spark.executor.resourceSpec | Resource size for each Spark executor (e.g., medium) |
    nodes:
      - name: SparkPi
        type: command
        config:
          command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit
                    --class org.apache.spark.examples.SparkPi
                    --name SparkPi
                    --conf spark.driver.resourceSpec=medium
                    --conf spark.executor.instances=2
                    --conf spark.executor.resourceSpec=medium
                    local:///tmp/spark-examples.jar 1000
        dependsOn:
          - jobA
          - jobB
    
      - name: jobA
        type: command
        config:
          command: echo "This is an echoed text."
    
      - name: jobB
        type: command
        config:
          command: pwd

    The key spark-submit parameters are described in the table above.
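Before uploading the ZIP, a quick sanity check is to confirm that the JAR really contains the class passed to --class. This sketch uses the standard unzip tool; the helper name jar_has_class is made up for illustration:

```shell
# Check whether a JAR contains the compiled class passed to --class.
jar_has_class() {
  jar_path="$1"; class_name="$2"
  # Class files are stored with '/' separators and a .class suffix.
  entry="$(printf '%s' "$class_name" | tr '.' '/').class"
  unzip -l "$jar_path" | grep -q "$entry"
}

# Usage with the example values from the workflow above:
#   jar_has_class /tmp/spark-examples.jar org.apache.spark.examples.SparkPi \
#     && echo "main class present"
```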

  3. Create a project and upload the workflow file.

    1. Open the Azkaban web interface. In the top navigation bar, click Projects.

    2. Click Create Project in the upper-right corner.

    3. In the Create Project dialog, fill in the Name and Description fields, then click Create Project.

    4. Click Upload in the upper-right corner.

    5. In the Upload Project Files dialog, select the ZIP file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View workflow details.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow, then click the Job List tab to view details for each job.

    4. Click Logs to view job logs.