AnalyticDB: Use Azkaban to schedule Spark jobs

Last Updated: Apr 24, 2025

Azkaban is a batch workflow job scheduler that can be used to create, execute, and manage workflows that contain complex dependencies. You can schedule AnalyticDB for MySQL Spark jobs on the Azkaban web interface.

Prerequisites

  • Azkaban is deployed, and you can access the Azkaban web interface.

  • An AnalyticDB for MySQL cluster and a Spark resource group are created.

Schedule Spark SQL jobs

AnalyticDB for MySQL allows you to execute Spark SQL statements in batch or interactive mode. The scheduling procedure varies based on the execution mode.

Batch mode

  1. Install the spark-submit command-line tool and specify the relevant parameters.

    Note

    You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName.
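
    The exact location and syntax of these parameters depend on the version of the spark-submit tool. As a rough sketch only, assuming the tool reads its defaults from a conf/spark-defaults.conf file in the installation directory (verify the file name and syntax against the tool's documentation), the entries might look like the following, with all values as placeholders:

      # Assumed configuration file: /<your path>/adb-spark-toolkit-submit/conf/spark-defaults.conf
      # Only the parameters listed in the note above need to be set.
      keyId      <your AccessKey ID>
      secretId   <your AccessKey secret>
      regionId   <region ID of the cluster, for example cn-hangzhou>
      clusterId  <AnalyticDB for MySQL cluster ID>
      rgName     <name of the Spark resource group>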

  2. Write a workflow file and compress the workflow folder in the ZIP format.

    nodes:
      - name: SparkPi
        type: command
        config:
          command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit 
                    --class com.aliyun.adb.spark.sql.OfflineSqlTemplate 
                    local:///opt/spark/jars/offline-sql.jar 
                    "show databases" 
                    "select 100"
        dependsOn:
          - jobA
          - jobB
    
      - name: jobA
        type: command
        config:
          command: echo "This is an echoed text."
    
      - name: jobB
        type: command
        config:
          command: pwd
    Important
    • Replace the <your path> parameter with the actual installation path of the spark-submit tool.

    • Do not use backslashes (\) in the command.
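
    The YAML syntax above is Azkaban's Flow 2.0 format, which also requires a .project file that declares the flow version in the project ZIP file. A minimal packaging sketch, with assumed file names (spark_batch.flow for the node definitions above and flow20.project for the version declaration):

      # Assumed folder layout; file names are examples.
      # spark_batch_flow/
      #   flow20.project      contains the single line: azkaban-flow-version: 2.0
      #   spark_batch.flow    contains the node definitions shown above
      cd spark_batch_flow
      echo "azkaban-flow-version: 2.0" > flow20.project
      zip -r ../spark_batch_flow.zip .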

  3. Create a project and upload the workflow file that is created in Step 2.

    1. Access the Azkaban web interface. In the top navigation bar, click Projects.

    2. In the upper-right corner of the page, click Create Project.

    3. In the Create Project dialog box, configure the Name and Description parameters and click Create Project.

    4. In the upper-right corner of the page, click Upload.

    5. In the Upload Project Files dialog box, select the workflow file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View the details about the workflow.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow. Click the Job List tab to view details about each job.

    4. Click Logs to view the job logs.

Interactive mode

  1. Obtain the connection URL of the Spark interactive resource group.

    1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.

    2. In the left-side navigation pane, choose Cluster Management > Resource Management. On the page that appears, click the Resource Groups tab.

    3. Find the Spark interactive resource group that you created and click Details in the Actions column to view the internal or public connection URL of the resource group. You can click the copy icon next to the corresponding port number to copy the connection URL.

      You must click Apply for Endpoint next to Public Endpoint to manually apply for a public endpoint in the following scenarios:

      • The client tool that is used to submit a Spark SQL job is deployed on premises.

      • The client tool that is used to submit a Spark SQL job is deployed on an Elastic Compute Service (ECS) instance that resides in a different virtual private cloud (VPC) from your AnalyticDB for MySQL cluster.

  2. Write a workflow file and compress the workflow folder in the ZIP format.

    nodes:
      - name: jobB
        type: command
        config:
          command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show databases;show tables;"
    
        dependsOn:
          - jobA
    
      - name: jobA
        type: command
        config:
          command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show tables;"

    The following list describes the parameters:

    • path: the path of the Beeline client. Example: /path/to/spark/bin/beeline.

    • -u: the endpoint obtained in Step 1. Replace default in the endpoint with the actual database name and delete the resource_group=<resource group name> suffix from the endpoint. Example: jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo.

    • -n: the names of the resource group and the database account of the AnalyticDB for MySQL cluster. Format: resource_group_name/database_account_name. Example: spark_interactive_prod/spark_user.

    • -p: the password of the database account of the AnalyticDB for MySQL cluster.

    • -e: the SQL statements to execute. Separate multiple SQL statements with semicolons (;).
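
    Before scheduling the workflow, you can run the same Beeline command once from the command line to confirm that the endpoint, account, and password are correct. A sketch that reuses the placeholder values from the example above:

      # Manual connectivity check; replace the path, endpoint, account, and password with your own values.
      /path/to/spark/bin/beeline -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show databases;"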

  3. Create a project and upload the workflow file that is created in Step 2.

    1. Access the Azkaban web interface. In the top navigation bar, click Projects.

    2. In the upper-right corner of the page, click Create Project.

    3. In the Create Project dialog box, configure the Name and Description parameters and click Create Project.

    4. In the upper-right corner of the page, click Upload.

    5. In the Upload Project Files dialog box, select the workflow file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View the details about the workflow.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow. Click the Job List tab to view details about each job.

    4. Click Logs to view the job logs.

Schedule Spark JAR jobs

  1. Install the spark-submit command-line tool and specify the relevant parameters.

    Note

    You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName. If your Spark JAR package is stored on your on-premises device, you must specify Object Storage Service (OSS) parameters such as ossUploadPath.
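
    As in the Spark SQL section, these parameters are set in the configuration of the spark-submit tool. A sketch of the additional OSS entry, assuming the same configuration file as above and using placeholder values:

      # Assumed extra entry when the Spark JAR package is stored on an on-premises device:
      # the tool uploads local JAR files to this OSS path before submitting the job.
      ossUploadPath  oss://<your bucket>/<directory for uploaded JAR files>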

  2. Write a workflow file and compress the workflow folder in the ZIP format.

    nodes:
      - name: SparkPi
        type: command
        config:
          command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit 
                    --class org.apache.spark.examples.SparkPi 
                    --name SparkPi 
                    --conf spark.driver.resourceSpec=medium 
                    --conf spark.executor.instances=2 
                    --conf spark.executor.resourceSpec=medium 
                    local:///tmp/spark-examples.jar 1000
        dependsOn:
          - jobA
          - jobB
    
      - name: jobA
        type: command
        config:
          command: echo "This is an echoed text."
    
      - name: jobB
        type: command
        config:
          command: pwd
    Important
    • Replace the <your path> parameter with the actual installation path of the spark-submit tool.

    • Do not use backslashes (\) in the command.

  3. Create a project and upload the workflow file that is created in Step 2.

    1. Access the Azkaban web interface. In the top navigation bar, click Projects.

    2. In the upper-right corner of the page, click Create Project.

    3. In the Create Project dialog box, configure the Name and Description parameters and click Create Project.

    4. In the upper-right corner of the page, click Upload.

    5. In the Upload Project Files dialog box, select the workflow file and click Upload.

  4. Run the workflow.

    1. On the Projects page, click the Flows tab.

    2. Click Execute Flow.

    3. Click Execute.

    4. In the Flow submitted message, click Continue.

  5. View the details about the workflow.

    1. In the top navigation bar, click Executing.

    2. Click the Recently Finished tab.

    3. Click the execution ID of the workflow. Click the Job List tab to view details about each job.

    4. Click Logs to view the job logs.