Azkaban is an open-source batch workflow job scheduler for creating, executing, and managing workflows with complex dependencies. Use it to schedule AnalyticDB for MySQL Spark jobs from the Azkaban web interface.
Prerequisites
Before you begin, make sure you have:
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster
A job resource group or a Spark interactive resource group created for the cluster
Beeline installed (required only for Spark SQL in interactive mode)
The IP address of the Azkaban server added to the cluster's IP address whitelist
Schedule Spark SQL jobs
AnalyticDB for MySQL supports Spark SQL in batch and interactive mode. The steps differ based on which mode you use.
Batch mode
In batch mode, submit Spark SQL through the spark-submit command-line tool from the adb-spark-toolkit-submit package.
Install the spark-submit tool and configure the required parameters.
Configure only the following parameters: keyId, secretId, regionId, clusterId, and rgName.
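How these parameters are set depends on the toolkit release. As a minimal sketch, assuming the adb-spark-toolkit-submit package reads its defaults from conf/spark-defaults.conf (verify the file name and property keys against your installation), the configuration could look like the following; every value is a placeholder:

```bash
# Minimal sketch: set the required AnalyticDB for MySQL connection parameters.
# Assumes the toolkit reads defaults from conf/spark-defaults.conf; the file name,
# property keys, and all values below are placeholders to adapt to your setup.
cd /<your path>/adb-spark-toolkit-submit
cat >> conf/spark-defaults.conf <<'EOF'
keyId = <yourAccessKeyId>
secretId = <yourAccessKeySecret>
regionId = <yourRegionId>
clusterId = <yourClusterId>
rgName = <yourResourceGroupName>
EOF
```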
Write a workflow file. Azkaban uses the Flow 2.0 YAML format: each entry under nodes is a job with name, type, and config.command fields. Use dependsOn to define dependencies between jobs. Compress the workflow folder as a ZIP file (a packaging sketch follows the example).
Important:
- Replace <your path> with the actual installation path of the spark-submit tool.
- Do not use backslashes (\) in the command.
```yaml
nodes:
  - name: SparkPi
    type: command
    config:
      command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit --class com.aliyun.adb.spark.sql.OfflineSqlTemplate local:///opt/spark/jars/offline-sql.jar "show databases" "select 100"
    dependsOn:
      - jobA
      - jobB
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
  - name: jobB
    type: command
    config:
      command: pwd
```
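An Azkaban Flow 2.0 project is uploaded as a ZIP archive that contains the .flow file plus a .project file declaring the flow version. A minimal packaging sketch, assuming the workflow above is saved as spark.flow (all file and archive names here are placeholders):

```bash
# Minimal packaging sketch for an Azkaban Flow 2.0 project.
# File and archive names below are placeholders.
mkdir spark-project
# The .project file declares the Flow 2.0 format and must be included in the ZIP.
echo "azkaban-flow-version: 2.0" > spark-project/flow20.project
# Copy the workflow definition written above into the project folder as a .flow file.
cp spark.flow spark-project/
# Package the folder contents; upload the resulting ZIP in the Azkaban web interface.
cd spark-project && zip -r ../spark-project.zip .
```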
Create a project and upload the workflow file.
Open the Azkaban web interface. In the top navigation bar, click Projects.
Click Create Project in the upper-right corner.
In the Create Project dialog, fill in the Name and Description fields, then click Create Project.
Click Upload in the upper-right corner.
In the Upload Project Files dialog, select the ZIP file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View workflow details.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow, then click the Job List tab to view details for each job.
Click Logs to view job logs.
Interactive mode
In interactive mode, submit Spark SQL through the Beeline client, which connects to a Spark interactive resource group via JDBC.
Get the connection URL of the Spark interactive resource group.
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters.
On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster and click the cluster ID.
In the left-side navigation pane, choose Cluster Management > Resource Management. Click the Resource Groups tab.
Find the Spark interactive resource group, then click Details in the Actions column to view the internal or public connection URL.
Apply for a public endpoint by clicking Apply for Endpoint next to Public Endpoint when:
- The Beeline client runs on-premises.
- The Beeline client runs on an Elastic Compute Service (ECS) instance in a different virtual private cloud (VPC) from your AnalyticDB for MySQL cluster.
Write a workflow file and compress the workflow folder as a ZIP file. Each node runs a Beeline command that connects to the resource group and executes SQL statements. Use dependsOn to define dependencies between nodes.
```yaml
nodes:
  - name: jobB
    type: command
    config:
      command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show databases;show tables;"
    dependsOn:
      - jobA
  - name: jobA
    type: command
    config:
      command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show tables;"
```
Replace the placeholders with your actual values:

| Parameter | Description | Example |
| --- | --- | --- |
| <path> | Path to the Beeline client | /path/to/spark/bin/beeline |
| -u | JDBC connection URL from step 1. Replace default in the URL with your database name and remove the resource_group=<resource group name> suffix. | jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo |
| -n | Database account and resource group in the format resource_group_name/database_account_name | spark_interactive_prod/spark_user |
| -p | Password of the database account | |
| -e | SQL statements to run. Separate multiple statements with semicolons (;). | show databases;show tables; |
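Before scheduling the flow, it can help to run the same Beeline command once from a shell to confirm that the endpoint, account, and resource group are correct. A sketch using the placeholder values from the example above:

```bash
# Optional connectivity check before scheduling: run the same Beeline command
# that the workflow node will execute. All values are placeholders taken from
# the example above; replace them with your endpoint, account, and password.
/path/to/spark/bin/beeline \
  -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" \
  -n spark_interactive_prod/spark_user \
  -p "spark_password" \
  -e "show databases;"
```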
Create a project and upload the workflow file.
Open the Azkaban web interface. In the top navigation bar, click Projects.
Click Create Project in the upper-right corner.
In the Create Project dialog, fill in the Name and Description fields, then click Create Project.
Click Upload in the upper-right corner.
In the Upload Project Files dialog, select the ZIP file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View workflow details.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow, then click the Job List tab to view details for each job.
Click Logs to view job logs.
Schedule Spark JAR jobs
Submit Spark JAR jobs using the spark-submit tool. The workflow structure is the same as for Spark SQL jobs in batch mode, with --class and the JAR path pointing to your application.
Install the spark-submit tool and configure the required parameters.
Configure only the following parameters: keyId, secretId, regionId, clusterId, and rgName. If the Spark JAR package is stored on your local machine, also specify Object Storage Service (OSS) parameters such as ossUploadPath.
Write a workflow file and compress the workflow folder as a ZIP file.
Important:
- Replace <your path> with the actual installation path of the spark-submit tool.
- Do not use backslashes (\) in the command.
```yaml
nodes:
  - name: SparkPi
    type: command
    config:
      command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit --class org.apache.spark.examples.SparkPi --name SparkPi --conf spark.driver.resourceSpec=medium --conf spark.executor.instances=2 --conf spark.executor.resourceSpec=medium local:///tmp/spark-examples.jar 1000
    dependsOn:
      - jobA
      - jobB
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
  - name: jobB
    type: command
    config:
      command: pwd
```
Key spark-submit parameters:

| Parameter | Description |
| --- | --- |
| --class | The fully qualified main class of your application |
| --name | A display name for the Spark job |
| --conf spark.driver.resourceSpec | Resource size for the Spark driver (e.g., medium) |
| --conf spark.executor.instances | Number of Spark executors to launch |
| --conf spark.executor.resourceSpec | Resource size for each Spark executor (e.g., medium) |
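Before scheduling the flow, the same spark-submit command can be run once from a shell to confirm the toolkit configuration and the JAR path. A sketch using the placeholder values from the example above:

```bash
# Optional check before scheduling: submit the job once from the shell to verify
# the toolkit configuration and the JAR location. <your path> and the JAR path
# are placeholders taken from the example above.
/<your path>/adb-spark-toolkit-submit/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --name SparkPi \
  --conf spark.driver.resourceSpec=medium \
  --conf spark.executor.instances=2 \
  --conf spark.executor.resourceSpec=medium \
  local:///tmp/spark-examples.jar 1000
```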
Create a project and upload the workflow file.
Open the Azkaban web interface. In the top navigation bar, click Projects.
Click Create Project in the upper-right corner.
In the Create Project dialog, fill in the Name and Description fields, then click Create Project.
Click Upload in the upper-right corner.
In the Upload Project Files dialog, select the ZIP file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View workflow details.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow, then click the Job List tab to view details for each job.
Click Logs to view job logs.