AnalyticDB for MySQL uses the same development method for Spark batch applications and streaming applications. This topic describes the available development tools, configuration parameters, and language-specific parameters for Java, Scala, and Python applications.
Prerequisites
Before you begin, make sure that:
You have an AnalyticDB for MySQL cluster
You have an Object Storage Service (OSS) bucket in the same region as the cluster
You have uploaded all application files (JAR packages, Python files, dependencies, and compressed packages) to OSS.
Development tools
Several development tools are available for Spark batch applications and streaming applications.
Application configuration
Spark applications in AnalyticDB for MySQL are configured using JSON. The following example shows a Java application that reads data from OSS, including the common parameters (name, file, conf) and the Java-specific parameters (args, className).
```json
{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}
```

For language-specific parameter details, see Java application parameters, Scala application parameters, or Python application parameters.
Common parameters
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| name | No | The name of the Spark application. | "spark-oss-test" |
| file | Yes (Java, Scala, and Python) | The absolute OSS path of the application's main file. For Java and Scala, this is the JAR file that contains the entry point. For Python, this is the executable entry point. The OSS bucket must be in the same region as the cluster. | "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar" |
| files | No | OSS paths of additional files to download to the driver and executor working directories. A number sign (#) appended to a path specifies an alias for the file. Separate multiple paths with commas. | |
| archives | No | OSS paths of TAR.GZ compressed packages to decompress into the Spark process working directory. A number sign (#) appended to a path specifies an alias for the package. Separate multiple paths with commas. | |
| conf | Yes | Spark configuration parameters, specified as "key": "value" pairs similar to those in the spark-defaults.conf file. | See the preceding JSON example. |
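To show how the files and archives parameters fit into a configuration, the following sketch downloads a configuration file under an alias and decompresses a package into the working directory. The bucket name, file names, and main class here are hypothetical placeholders, not values from your environment:

```json
{
  "name": "spark-files-demo",
  "file": "oss://<yourBucketName>/jars/demo.jar",
  "className": "com.example.Demo",
  "files": "oss://<yourBucketName>/conf/app.conf#app.conf",
  "archives": "oss://<yourBucketName>/pkgs/env.tar.gz#env",
  "conf": {
    "spark.driver.resourceSpec": "small",
    "spark.executor.resourceSpec": "small",
    "spark.executor.instances": 1
  }
}
```

With this configuration, the application can read app.conf and the contents of the env directory from its working directory at runtime.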
Java application parameters
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| args | No | Arguments passed to the JAR. Separate multiple arguments with commas. | "args0", "args1" |
| className | Yes | The entry-point main class of the Java application. | com.aliyun.spark.oss.SparkReadOss |
| jars | No | Absolute OSS paths of additional JAR files added to the driver and executor JVM (Java Virtual Machine) classpaths at runtime. The OSS bucket must be in the same region as the cluster. Separate multiple paths with commas. | |
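A minimal sketch of a Java application configuration that uses all three parameters, including dependency JARs on the classpath. The bucket name, JAR names, and main class are hypothetical placeholders:

```json
{
  "args": ["arg0", "arg1"],
  "name": "spark-java-demo",
  "file": "oss://<yourBucketName>/jars/app.jar",
  "className": "com.example.Main",
  "jars": "oss://<yourBucketName>/jars/dep1.jar,oss://<yourBucketName>/jars/dep2.jar",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2
  }
}
```

At runtime, dep1.jar and dep2.jar are available on both the driver and executor classpaths, so com.example.Main can reference classes from them.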
Scala application parameters
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| className | Yes | The entry-point main class of the Scala application. | |
| jars | No | Absolute OSS paths of additional JAR files added to the driver and executor JVM classpaths at runtime. The OSS bucket must be in the same region as the cluster. Separate multiple paths with commas. | |
Python application parameters
| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| pyFiles | Yes | OSS paths of the Python files for the PySpark application, in ZIP, PY, or EGG format. Use ZIP or EGG format for multiple files. The Python files can be imported as modules in application code. Separate multiple paths with commas. | |
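A minimal sketch of a Python application configuration: the entry-point script is referenced by file, and a ZIP package of supporting modules is made importable through pyFiles. The bucket name and file names are hypothetical placeholders:

```json
{
  "name": "spark-python-demo",
  "file": "oss://<yourBucketName>/python/main.py",
  "pyFiles": "oss://<yourBucketName>/python/deps.zip",
  "conf": {
    "spark.driver.resourceSpec": "small",
    "spark.executor.resourceSpec": "small",
    "spark.executor.instances": 1
  }
}
```

Inside main.py, modules packaged in deps.zip can then be imported with ordinary import statements.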