
AnalyticDB: Introduction to Spark application development

Last Updated: Mar 30, 2026

AnalyticDB for MySQL uses the same development method for Spark batch applications and streaming applications. This topic describes the available development tools, configuration parameters, and language-specific parameters for Java, Scala, and Python applications.

Prerequisites

Before you begin, make sure that:

  • You have an AnalyticDB for MySQL cluster.

  • You have an Object Storage Service (OSS) bucket in the same region as the cluster.

All application files, including JARs, Python files, dependencies, and compressed packages, must be stored in OSS.

Development tools

You can use one of the following tools to develop Spark batch applications and streaming applications:

Application configuration

Spark applications in AnalyticDB for MySQL are configured using JSON. The following example shows a Java application that reads data from OSS, including the common parameters (name, file, conf) and the Java-specific parameters (args, className).

{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}

For language-specific parameter details, see Java application parameters, Scala application parameters, or Python application parameters.

Common parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| name | No | The name of the Spark application. | "name": "spark-oss-test" |
| file | Yes (Java, Scala, Python) | The absolute OSS path of the application's main file. For Java and Scala, this is the JAR file that contains the entry point. For Python, this is the executable entry point. The OSS bucket must be in the same region as the cluster. | "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar" |
| files | No | OSS paths of additional files to download to the driver and executor working directories. Supports aliases using # (for example, oss://<testBucketName>/test/test1.txt#test1 makes the file accessible as ./test1 or ./test1.txt). Separate multiple paths with commas. If you include log4j.properties, Spark uses it as the log configuration file. | "files": ["oss://<testBucketName>/path/to/file1", "oss://<testBucketName>/path/to/file2"] |
| archives | No | OSS paths of TAR.GZ compressed packages to decompress into the Spark process working directory. Supports aliases using # (for example, oss://<testBucketName>/test/test1.tar.gz#test1 makes test2.txt inside the package accessible as ./test1/test2.txt or ./test1.tar.gz/test2.txt). Separate multiple paths with commas. If a package fails to decompress, the job fails. | "archives": ["oss://<testBucketName>/path/to/archive1.tar.gz", "oss://<testBucketName>/path/to/archive2.tar.gz"] |
| conf | Yes | Spark configuration in key: value format, similar to Apache Spark. Separate multiple entries with commas. For parameters specific to AnalyticDB for MySQL, see Spark application configuration parameters. | "conf": {"spark.driver.resourceSpec": "medium", "spark.executor.resourceSpec": "medium", "spark.executor.instances": 2, "spark.adb.connectors": "oss"} |
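The files and archives parameters can be combined in a single configuration. The following sketch is illustrative only: the bucket, paths, alias names, and main class (com.example.Demo) are hypothetical. With these settings, the aliased text file would be readable as ./lookup, and the archive contents would be extracted under ./env/.

{
  "name": "spark-alias-test",
  "file": "oss://<testBucketName>/jars/demo.jar",
  "className": "com.example.Demo",
  "files": [
    "oss://<testBucketName>/conf/log4j.properties",
    "oss://<testBucketName>/data/lookup.txt#lookup"
  ],
  "archives": [
    "oss://<testBucketName>/deps/env.tar.gz#env"
  ],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2
  }
}

Because log4j.properties appears in files, Spark would use it as the log configuration file for this job.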

Java application parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| args | No | Arguments passed to the JAR. Separate multiple arguments with commas. | "args": ["args0", "args1"] |
| className | Yes | The main class of the Java application. | "className": "com.aliyun.spark.oss.SparkReadOss" |
| jars | No | Absolute OSS paths of additional JAR files added to the driver and executor JVM (Java Virtual Machine) classpaths at runtime. The OSS bucket must be in the same region as the cluster. Separate multiple paths with commas. | "jars": ["oss://<testBucketName>/path/to/app.jar", "oss://<testBucketName>/path/to/lib.jar"] |
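Combining the Java-specific parameters with the common parameters yields a complete configuration. The sketch below reuses the JAR and main class from the earlier example; the lib.jar path is hypothetical.

{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "jars": ["oss://<testBucketName>/jars/test/lib.jar"],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}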

Scala application parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| className | Yes | The main class of the Scala application. | "className": "com.aliyun.spark.oss.SparkReadOss" |
| jars | No | Absolute OSS paths of additional JAR files added to the driver and executor JVM classpaths at runtime. The OSS bucket must be in the same region as the cluster. Separate multiple paths with commas. | "jars": ["oss://<testBucketName>/path/to/app.jar", "oss://<testBucketName>/path/to/lib.jar"] |

Python application parameters

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| pyFiles | Yes | OSS paths of Python files for the PySpark application. Supported formats: ZIP, PY, or EGG. For multiple files, use ZIP or EGG format. Python files can be imported as modules in your code. Separate multiple paths with commas. | "pyFiles": ["oss://<testBucketName>/path/to/app.zip", "oss://<testBucketName>/path/to/lib.egg"] |
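For a PySpark application, the file parameter still points to the executable entry point (a .py file), while pyFiles supplies the importable modules. The following sketch is illustrative; the main.py and deps.zip names are hypothetical.

{
  "name": "spark-python-test",
  "file": "oss://<testBucketName>/python/main.py",
  "pyFiles": ["oss://<testBucketName>/python/deps.zip"],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2
  }
}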