The AnalyticDB for MySQL spark-submit command-line tool lets you submit and manage Spark JAR applications from a client. This topic explains how to install the tool, configure it, submit a Spark JAR application, and manage running jobs.
Limitations
- The spark-submit tool supports only Spark JAR applications. Spark SQL applications cannot be submitted.
- Submission commands are not retried on failure because submission is not an idempotent operation. A command that appears to fail due to a network timeout may have already succeeded. To verify whether a submission succeeded, use `--list` or check the AnalyticDB for MySQL console.
Prerequisites
Before you begin, ensure that you have:
- An AnalyticDB for MySQL Data Lakehouse Edition cluster
- A job resource group. For details, see Create and manage a resource group
- An Object Storage Service (OSS) bucket in the same region as the cluster
- Java Development Kit (JDK) 1.8 or later installed on the client machine
Install spark-submit
- Download the installation package.

  ```shell
  wget https://dla003.oss-cn-hangzhou.aliyuncs.com/adb-spark-toolkit-submit-0.0.1.tar.gz
  ```

- Decompress the package.

  ```shell
  tar zxvf adb-spark-toolkit-submit-0.0.1.tar.gz
  ```
Configure spark-submit
After decompressing the package, open adb-spark-toolkit-submit/conf/spark-defaults.conf. The spark-submit script reads this file automatically, and the settings apply to all Spark applications.
vim adb-spark-toolkit-submit/conf/spark-defaults.conf
The following table describes the configuration parameters.
| Parameter | Required | Description |
|---|---|---|
| keyId | Yes | The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user with permissions to access AnalyticDB for MySQL. For details, see Accounts and permissions. |
| secretId | Yes | The AccessKey secret of the Alibaba Cloud account or RAM user. For details, see Accounts and permissions. |
| regionId | Yes | The region ID of the AnalyticDB for MySQL cluster. |
| clusterId | Yes | The ID of the AnalyticDB for MySQL cluster. |
| rgName | Yes | The name of the job resource group used to run Spark applications. |
| ossKeyId | No | The AccessKey ID for OSS access. Required when JAR files stored on-premises need to be uploaded to OSS. The RAM user must have the AliyunOSSFullAccess permission. |
| ossSecretId | No | The AccessKey secret for OSS access. Required together with ossKeyId. |
| ossUploadPath | No | The OSS path for uploading on-premises JAR files. Required together with ossKeyId. |
| conf | No | Additional Spark configuration parameters in key:value format. Separate multiple parameters with commas. For available parameters, see Spark application configuration parameters. |
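As a sketch, a filled-in spark-defaults.conf might look like the following. Every value is a placeholder, and the exact key/value syntax should be checked against the comments in the template file shipped with the toolkit:

```
keyId = <your-AccessKey-ID>
secretId = <your-AccessKey-secret>
regionId = cn-hangzhou
clusterId = amv-<cluster-ID>
rgName = <job-resource-group-name>
ossUploadPath = oss://<bucket-name>/sparkjars/
```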
Compatibility
To maintain compatibility with open source spark-submit, you can also set the following parameters in the AnalyticDB for MySQL Spark Conf: keyId, secretId, regionId, clusterId, rgName, ossKeyId, ossSecretId, and ossUploadPath. These parameters are not part of the open source Spark configuration. When you configure these parameters using the Conf format, you must use the names shown in the following code.
```shell
--conf spark.adb.access.key.id=<value>
--conf spark.adb.access.secret.id=<value>
--conf spark.adb.regionId=<value>
--conf spark.adb.clusterId=<value>
--conf spark.adb.rgName=<value>
--conf spark.adb.oss.akId=<value>
--conf spark.adb.oss.akSec=<value>
--conf spark.adb.oss.endpoint=<value>
--conf spark.adb.oss.uploadPath=<value>
```
Submit a Spark application
- Upload the JAR files for your Spark application to OSS. For details, see Simple upload.
- Go to the spark-submit directory.

  ```shell
  cd adb-spark-toolkit-submit
  ```

- Submit the Spark application. Replace the placeholder values with your actual OSS paths and resource names.

  ```shell
  ./bin/spark-submit \
    --class com.aliyun.spark.oss.SparkReadOss \
    --verbose \
    --name Job1 \
    --jars oss://<bucket-name>/jars/test.jar,oss://<bucket-name>/jars/search.jar \
    --conf spark.driver.resourceSpec=medium \
    --conf spark.executor.instances=1 \
    --conf spark.executor.resourceSpec=medium \
    oss://<bucket-name>/jars/test1.jar args0 args1
  ```

  Replace the placeholders as follows:

  | Placeholder | Description |
  |---|---|
  | <bucket-name> | Name of the OSS bucket containing your JAR files |
  | args0 args1 | Arguments for the JAR files, separated by spaces |
After you submit the application, one of the following return codes is returned:
| Return code | Meaning |
|---|---|
| 0 | Application ran successfully |
| 255 | Application failed |
| 143 | Application was terminated |
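In automation scripts, you can branch on these return codes. The following sketch wraps the mapping in a hypothetical helper function, `describe_spark_exit`, which is not part of the toolkit; the spark-submit invocation itself is shown only as a commented placeholder:

```shell
#!/bin/sh
# Hypothetical helper: translate a spark-submit exit code into a label.
# The codes 0 / 255 / 143 are the ones documented in the table above.
describe_spark_exit() {
  case "$1" in
    0)   echo "success" ;;
    255) echo "failed" ;;
    143) echo "terminated" ;;
    *)   echo "unknown exit code: $1" ;;
  esac
}

# Usage sketch (placeholder command; substitute your real submission):
#   ./bin/spark-submit ... ; describe_spark_exit "$?"
```

A wrapper like this is useful in CI pipelines, where a 143 (terminated) may warrant different handling than a 255 (failed).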
The following table describes the spark-submit parameters used in the command above.
| Parameter | Description |
|---|---|
| --class | The main class of the Java or Scala application. Not required for Python applications. |
| --verbose | Prints logs generated during submission. |
| --name | The name of the Spark application. |
| --jars | Comma-separated absolute OSS paths of additional JAR files the application depends on. If you specify on-premises paths, the tool uploads them to the OSS path configured in ossUploadPath. While an on-premises JAR file is being uploaded, the system verifies the file using its MD5 checksum: if a file with the same name and MD5 value already exists in OSS, the upload is canceled. If you have manually updated a JAR file in OSS, delete its corresponding MD5 checksum file so the updated file is uploaded. The RAM user must have the AliyunOSSFullAccess permission. |
| --conf | Spark configuration parameters. Specify multiple parameters as separate flags: --conf key1=value1 --conf key2=value2. For parameters specific to AnalyticDB for MySQL, see Differences from open source spark-submit. |
| Main file path | The absolute OSS path of the main JAR file (for Java/Scala) or the entry point script (for Python). All main files must be stored in OSS. |
| args | Arguments for the JAR files, separated by spaces. |
Manage Spark applications
After submitting an application, follow this typical workflow:
- Run `--list` to get the application ID.
- Use the application ID with `--status`, `--get-log`, or `--detail` to monitor progress.
- Run `--kill` to stop the application if needed.
List applications
The command returns application IDs; use them with all other management commands.
```shell
./bin/spark-submit --list \
  --clusterId <cluster-ID> \
  --rgName <resource-group-name> \
  --pagenumber 1 \
  --pagesize 3
```
Replace <cluster-ID> with the ID of your AnalyticDB for MySQL cluster, and <resource-group-name> with the name of the job resource group.
Check application status
```shell
./bin/spark-submit --status <application-ID>
```
View submission details and Spark UI URL
```shell
./bin/spark-submit --detail <application-ID>
```
The output includes a Spark WEB UI field with the URL of the Spark UI for that application.
View application logs
```shell
./bin/spark-submit --get-log <application-ID>
```
Terminate an application
```shell
./bin/spark-submit --kill <application-ID>
```
Differences from open source spark-submit
Parameters specific to AnalyticDB for MySQL spark-submit
The following parameters are available only in the AnalyticDB for MySQL spark-submit tool.
| Parameter | Default | Description |
|---|---|---|
| --api-retry-times | 3 | Maximum number of retries for failed commands. Submission commands are not retried. See Limitations for details. |
| --time-out-seconds | 10 | Timeout period in seconds before retrying a failed command. |
| --enable-inner-endpoint | None | Enables internal network access within a virtual private cloud (VPC). Use this when submitting applications from an Elastic Compute Service (ECS) instance. |
| --list | None | Lists submitted applications. Use with --pagenumber and --pagesize to paginate results. |
| --pagenumber | 1 | Page number for list results. |
| --pagesize | 10 | Number of applications returned per page. |
| --kill | None | Terminates the specified application. |
| --get-log | None | Retrieves logs for the specified application. |
| --status | None | Shows the status of the specified application. |
Parameters not supported by AnalyticDB for MySQL spark-submit
Some open source spark-submit parameters are not supported. For the full list, see the Spark application configuration parameters topic.