Azkaban is a batch workflow job scheduler that can be used to create, execute, and manage workflows that contain complex dependencies. You can schedule AnalyticDB for MySQL Spark jobs on the Azkaban web interface.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
A job resource group or a Spark interactive resource group is created for the AnalyticDB for MySQL cluster.
Beeline is installed.
The IP address of the server that runs Azkaban is added to an IP address whitelist of the AnalyticDB for MySQL cluster.
Schedule Spark SQL jobs
AnalyticDB for MySQL allows you to execute Spark SQL in batch or interactive mode. The scheduling procedure varies based on the execution mode.
Batch mode
Install the spark-submit command-line tool and specify the relevant parameters.
Note: You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName.
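These parameters identify your Alibaba Cloud account, region, cluster, and resource group, and are typically set once in the configuration file that ships with the adb-spark-toolkit-submit package. The following snippet is only a sketch; it assumes the tool reads its defaults from conf/spark-defaults.conf, and the exact file name and key syntax may differ in the version that you download:
# Illustrative configuration values; replace the placeholders with your own information
keyId       <your AccessKey ID>
secretId    <your AccessKey secret>
regionId    <the region ID of the cluster, for example cn-hangzhou>
clusterId   <the ID of the AnalyticDB for MySQL cluster>
rgName      <the name of the job resource group>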
Write a workflow file and compress the workflow folder in the ZIP format.
nodes:
  - name: SparkPi
    type: command
    config:
      command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit --class com.aliyun.adb.spark.sql.OfflineSqlTemplate local:///opt/spark/jars/offline-sql.jar "show databases" "select 100"
    dependsOn:
      - jobA
      - jobB
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
  - name: jobB
    type: command
    config:
      command: pwd
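If your Azkaban deployment uses the YAML-based Flow 2.0 format shown above, the folder that you compress typically needs two files: a .project file that declares the flow version and a .flow file that contains the YAML. The file names below are illustrative:
flow20.project      # contains the single line: azkaban-flow-version: 2.0
spark_batch.flow    # contains the YAML workflow shown above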
Important: Replace the <your path> parameter with the actual installation path of the spark-submit tool. Do not use backslashes (\) in the command.
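Optionally, before you package the flow, you can run the same command directly in a shell on the Azkaban server to confirm that the spark-submit tool is configured correctly. This check is not part of the Azkaban setup; the command below simply repeats the one used in the SparkPi node:
# Run the batch SQL job once from the command line before scheduling it
/<your path>/adb-spark-toolkit-submit/bin/spark-submit \
  --class com.aliyun.adb.spark.sql.OfflineSqlTemplate \
  local:///opt/spark/jars/offline-sql.jar \
  "show databases" "select 100"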
Create a project and upload the workflow file that is created in Step 2.
Access the Azkaban web interface. In the top navigation bar, click Projects.
In the upper-right corner of the page, click Create Project.
In the Create Project dialog box, configure the Name and Description parameters and click Create Project.
In the upper-right corner of the page, click Upload.
In the Upload Project Files dialog box, select the workflow file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View the details about the workflow.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow. Click the Job List tab to view details about each job.
Click Logs to view the job logs.
Interactive mode
Obtain the connection URL of the Spark interactive resource group.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose . On the page that appears, click the Resource Groups tab.
Find the Spark interactive resource group that you created and click Details in the Actions column to view the internal or public connection URL of the resource group. You can click the copy icon within the parentheses next to the corresponding port number to copy the connection URL.
You must click Apply for Endpoint next to Public Endpoint to manually apply for a public endpoint in the following scenarios:
The client tool that is used to submit a Spark SQL job is deployed on premises.
The client tool that is used to submit a Spark SQL job is deployed on an Elastic Compute Service (ECS) instance that resides in a different virtual private cloud (VPC) from your AnalyticDB for MySQL cluster.
Write a workflow file and compress the workflow folder in the ZIP format.
nodes:
  - name: jobB
    type: command
    config:
      command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show databases;show tables;"
    dependsOn:
      - jobA
  - name: jobA
    type: command
    config:
      command: <path> -u "jdbc:hive2://amv-t4n83e67n7b****sparkwho.ads.aliyuncs.com:10000/adb_demo" -n spark_interactive_prod/spark_user -p "spark_password" -e "show tables;"
The following parameters are used in the command:
path: The path of the Beeline client. Example: /path/to/spark/bin/beeline.
-u: The endpoint that you obtained in Step 1. Replace default in the endpoint with the actual name of the database and delete the resource_group=<resource group name> suffix from the endpoint. Example: jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo.
-n: The names of the resource group and the database account of the AnalyticDB for MySQL cluster, in the resource_group_name/database_account_name format. Example: spark_interactive_prod/spark_user.
-p: The password of the database account of the AnalyticDB for MySQL cluster.
-e: The SQL statements to execute. Separate multiple SQL statements with semicolons (;).
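You can also run the Beeline command on its own to verify that the endpoint, account name, and password are correct before wiring it into a flow. The following command is only an illustrative check that reuses the example values from the parameter descriptions above:
# Optional connectivity check from the shell
/path/to/spark/bin/beeline \
  -u "jdbc:hive2://amv-t4naxpqk****sparkwho.ads.aliyuncs.com:10000/adb_demo" \
  -n spark_interactive_prod/spark_user \
  -p "spark_password" \
  -e "show databases;"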
Create a project and upload the workflow file that is created in Step 2.
Access the Azkaban web interface. In the top navigation bar, click Projects.
In the upper-right corner of the page, click Create Project.
In the Create Project dialog box, configure the Name and Description parameters and click Create Project.
In the upper-right corner of the page, click Upload.
In the Upload Project Files dialog box, select the workflow file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View the details about the workflow.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow. Click the Job List tab to view details about each job.
Click Logs to view the job logs.
Schedule Spark JAR jobs
Install the spark-submit command-line tool and specify the relevant parameters.
Note: You need to specify only the following parameters: keyId, secretId, regionId, clusterId, and rgName. If your Spark JAR package is stored on your on-premises device, you must specify Object Storage Service (OSS) parameters such as ossUploadPath.
Write a workflow file and compress the workflow folder in the ZIP format.
nodes:
  - name: SparkPi
    type: command
    config:
      command: /<your path>/adb-spark-toolkit-submit/bin/spark-submit --class org.apache.spark.examples.SparkPi --name SparkPi --conf spark.driver.resourceSpec=medium --conf spark.executor.instances=2 --conf spark.executor.resourceSpec=medium local:///tmp/spark-examples.jar 1000
    dependsOn:
      - jobA
      - jobB
  - name: jobA
    type: command
    config:
      command: echo "This is an echoed text."
  - name: jobB
    type: command
    config:
      command: pwd
Important: Replace the <your path> parameter with the actual installation path of the spark-submit tool. Do not use backslashes (\) in the command.
Create a project and upload the workflow file that is created in Step 2.
Access the Azkaban web interface. In the top navigation bar, click Projects.
In the upper-right corner of the page, click Create Project.
In the Create Project dialog box, configure the Name and Description parameters and click Create Project.
In the upper-right corner of the page, click Upload.
In the Upload Project Files dialog box, select the workflow file and click Upload.
Run the workflow.
On the Projects page, click the Flows tab.
Click Execute Flow.
Click Execute.
In the Flow submitted message, click Continue.
View the details about the workflow.
In the top navigation bar, click Executing.
Click the Recently Finished tab.
Click the execution ID of the workflow. Click the Job List tab to view details about each job.
Click Logs to view the job logs.