AnalyticDB: Spark application configuration parameters

Last Updated: Mar 28, 2026

AnalyticDB for MySQL Spark uses configuration parameters that extend or replace those of Apache Spark. This topic covers only the parameters that differ from standard Apache Spark.

Parameter format by development tool

The format for specifying parameters depends on which tool you use to submit Spark jobs.

| Development tool | Format | Example |
| --- | --- | --- |
| SQL editor | set key=value; | set spark.sql.hive.metastore.version=adb; |
| Spark Jar editor | "key": "value" | "spark.sql.hive.metastore.version":"adb" |
| Notebook editor | "key": "value" | "spark.sql.hive.metastore.version":"adb" |
| spark-submit CLI | key=value | spark.sql.hive.metastore.version=adb |
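
As a minimal sketch of the table above, the following Python helper renders one key/value pair in each of the four formats. The function name and the tool labels ("sql", "jar", "notebook", "spark-submit") are hypothetical names chosen for this illustration; they are not part of AnalyticDB for MySQL.

```python
def format_parameter(tool: str, key: str, value: str) -> str:
    """Render one parameter in the format listed for the given tool.

    The tool labels are hypothetical names used only in this sketch.
    """
    if tool == "sql":
        # SQL editor: set key=value;
        return f"set {key}={value};"
    if tool in ("jar", "notebook"):
        # Spark Jar editor and Notebook editor: "key": "value"
        return f'"{key}": "{value}"'
    if tool == "spark-submit":
        # spark-submit CLI: key=value
        return f"{key}={value}"
    raise ValueError(f"unknown tool: {tool}")


for tool in ("sql", "jar", "notebook", "spark-submit"):
    print(format_parameter(tool, "spark.sql.hive.metastore.version", "adb"))
```

The two JSON-style editors share one branch because the table lists the same format for both.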

Specify driver and executor resources

| Parameter | Required | Default | Description | Corresponding Apache Spark parameter |
| --- | --- | --- | --- | --- |
| spark.adb.acuPerApp | No | None | The number of AnalyticDB compute units (ACUs) for a single Spark job. Valid values: [2, maximum computing resources of the job resource group]. When set, the system automatically calculates driver specifications, executor specifications, and the number of executor nodes. See the priority rules below. | N/A |
| spark.driver.resourceSpec | Yes | medium | The resource specification for the Spark driver. Each type maps to specific CPU and memory allocations. See the resource specifications table below. Example: CONF spark.driver.resourceSpec = c.small; sets the driver to 1 core and 2 GB memory. | spark.driver.cores and spark.driver.memory |
| spark.executor.resourceSpec | Yes | medium | The resource specification for each Spark executor. Each type maps to specific CPU and memory allocations. See the resource specifications table below. Example: CONF spark.executor.resourceSpec = c.small; sets each executor to 1 core and 2 GB memory. | spark.executor.cores and spark.executor.memory |
| spark.executor.instances | No | Maximum computing resources of the job resource group / 5 | The number of Spark executors to start. | spark.executor.instances |
| spark.adb.executor.cpu-vcores-ratio | No | None | The ratio of virtual cores to actual CPU cores for the executor. When the parameter is not set, the ratio is 1. Use this parameter to improve CPU utilization when a single task uses little CPU. For example, if the executor uses the medium specification (2 cores, 8 GB) and this parameter is set to 2, the executor process performs concurrency control based on 4 cores and schedules 4 concurrent tasks, which is equivalent to spark.executor.cores=4. | N/A |
| spark.adb.driver.cpu-vcores-ratio | No | None | The ratio of virtual cores to actual CPU cores for the driver. When the parameter is not set, the ratio is 1. For example, if the driver uses the medium specification (2 cores, 8 GB) and this parameter is set to 2, the driver process performs concurrency control based on 4 cores, which is equivalent to spark.driver.cores=4. | N/A |
| spark.adb.driverDiskSize | No | None | Additional disk storage for the Spark driver, mounted at /user_data_dir. Unit: GiB. Valid values: (0, 100]. Example: spark.adb.driverDiskSize=50Gi. | N/A |
| spark.adb.executorDiskSize | No | None | Additional disk storage for each Spark executor, mounted at /shuffle_volume for shuffle operations. Unit: GiB. Valid values: (0, 100]. Example: spark.adb.executorDiskSize=50Gi. | N/A |
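
The arithmetic behind the cpu-vcores-ratio and disk-size parameters can be sketched as follows. This is an illustration only: the function names are hypothetical, and the code simply restates the descriptions above (cores multiplied by the ratio, and a disk size in the range (0, 100] rendered with the Gi suffix).

```python
def task_slots(physical_cores: int, vcores_ratio: int = 1) -> int:
    """Cores used for concurrency control: physical cores times the ratio."""
    return physical_cores * vcores_ratio


def disk_size_value(gib: int) -> str:
    """Check the (0, 100] GiB range and render the value, for example 50Gi."""
    if not 0 < gib <= 100:
        raise ValueError("disk size must be in the range (0, 100] GiB")
    return f"{gib}Gi"


# A medium executor has 2 cores; a ratio of 2 yields 4 concurrent task slots.
print(task_slots(2, vcores_ratio=2))
print(disk_size_value(50))
```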

spark.adb.acuPerApp priority rules

When spark.adb.acuPerApp is combined with other resource parameters, the following rules apply:

  • If spark.adb.acuPerApp and all other resource parameters (spark.driver.resourceSpec, spark.executor.resourceSpec, spark.executor.instances) are all set, spark.adb.acuPerApp is invalid and the explicitly set values take effect.

  • If only spark.adb.acuPerApp is set, it is valid and all other resource parameters are auto-calculated.

  • In any other combination, spark.adb.acuPerApp is valid and auto-calculates only the resource parameters that are not explicitly set.
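
The three rules above can be sketched as a small decision function. This is an illustrative reading of the rules, not the service's implementation; the function name is hypothetical, and the behavior when spark.adb.acuPerApp is absent (nothing is auto-calculated from it) is an assumption, because the rules do not cover that case.

```python
from typing import Dict, List, Tuple

RESOURCE_KEYS = (
    "spark.driver.resourceSpec",
    "spark.executor.resourceSpec",
    "spark.executor.instances",
)


def resolve_acu_per_app(conf: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (whether acuPerApp takes effect, keys it auto-calculates)."""
    if "spark.adb.acuPerApp" not in conf:
        # Assumption: without acuPerApp there is nothing to resolve.
        return False, []
    missing = [key for key in RESOURCE_KEYS if key not in conf]
    if not missing:
        # Rule 1: every resource parameter is explicit, so acuPerApp is invalid.
        return False, []
    # Rules 2 and 3: acuPerApp fills in only what is not explicitly set.
    return True, missing


print(resolve_acu_per_app({"spark.adb.acuPerApp": "16"}))
print(resolve_acu_per_app({
    "spark.adb.acuPerApp": "16",
    "spark.driver.resourceSpec": "small",
}))
```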

Spark resource specifications

Important

The following ACU calculations apply when using on-demand elastic resources in a job resource group:

  • 1:2 CPU-to-memory ratio: ACUs = CPU cores × 0.8

  • 1:4 CPU-to-memory ratio: ACUs = CPU cores × 1

  • 1:8 CPU-to-memory ratio: ACUs = CPU cores × 1.5

For pricing details, see Pricing for Data Lakehouse Edition.

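| Type | CPU cores | Memory (GB) | Disk storage (GB) | Used ACUs |
| --- | --- | --- | --- | --- |
| c.small | 1 | 2 | 20 | 0.8 |
| small | 1 | 4 | 20 | 1 |
| m.small | 1 | 8 | 20 | 1.5 |
| c.medium | 2 | 4 | 20 | 1.6 |
| medium | 2 | 8 | 20 | 2 |
| m.medium | 2 | 16 | 20 | 3 |
| c.large | 4 | 8 | 20 | 3.2 |
| large | 4 | 16 | 20 | 4 |
| m.large | 4 | 32 | 20 | 6 |
| c.xlarge | 8 | 16 | 20 | 6.4 |
| xlarge | 8 | 32 | 20 | 8 |
| m.xlarge | 8 | 64 | 20 | 12 |
| c.2xlarge | 16 | 32 | 20 | 12.8 |
| 2xlarge | 16 | 64 | 20 | 16 |
| m.2xlarge | 16 | 128 | 20 | 24 |
| m.4xlarge | 32 | 256 | 20 | 48 |
| m.8xlarge | 64 | 512 | 20 | 96 |

The system reserves approximately 1% of disk storage. The actual available disk space may be less than 20 GB.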
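
The "Used ACUs" column follows from the three ratio rules listed under Important. As a minimal sketch (the function name is hypothetical), the calculation looks like this:

```python
# ACUs charged per CPU core, keyed by GB of memory per core (1:2, 1:4, 1:8).
ACU_PER_CORE = {2: 0.8, 4: 1.0, 8: 1.5}


def used_acus(cpu_cores: int, memory_gb: int) -> float:
    """Compute used ACUs from a specification's CPU cores and memory."""
    if memory_gb % cpu_cores != 0 or memory_gb // cpu_cores not in ACU_PER_CORE:
        raise ValueError("only 1:2, 1:4, and 1:8 CPU-to-memory ratios are listed")
    # Round to one decimal place to avoid floating-point noise.
    return round(cpu_cores * ACU_PER_CORE[memory_gb // cpu_cores], 1)


print(used_acus(1, 2))     # c.small
print(used_acus(2, 8))     # medium
print(used_acus(16, 32))   # c.2xlarge
```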

Example

The following configuration allocates 32 executors with medium specification (2 cores, 8 GB each) and a driver with small specification (1 core, 4 GB), totaling 65 ACUs.

{
  "spark.driver.resourceSpec": "small",
  "spark.executor.resourceSpec": "medium",
  "spark.executor.instances": "32",
  "spark.adb.executorDiskSize": "100Gi"
}
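
To check the 65 ACU total, the following sketch looks up each specification in the resource specifications table and adds the driver to the executors. The function name is hypothetical; the lookup values are copied from the "Used ACUs" column.

```python
# Used ACUs per specification type, copied from the resource specifications table.
USED_ACUS = {
    "c.small": 0.8, "small": 1, "m.small": 1.5,
    "c.medium": 1.6, "medium": 2, "m.medium": 3,
    "c.large": 3.2, "large": 4, "m.large": 6,
    "c.xlarge": 6.4, "xlarge": 8, "m.xlarge": 12,
    "c.2xlarge": 12.8, "2xlarge": 16, "m.2xlarge": 24,
    "m.4xlarge": 48, "m.8xlarge": 96,
}


def total_acus(conf: dict) -> float:
    """Sum the driver ACUs and the ACUs of all executors."""
    driver = USED_ACUS[conf["spark.driver.resourceSpec"]]
    executor = USED_ACUS[conf["spark.executor.resourceSpec"]]
    instances = int(conf["spark.executor.instances"])
    return driver + executor * instances


conf = {
    "spark.driver.resourceSpec": "small",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": "32",
    "spark.adb.executorDiskSize": "100Gi",
}
print(total_acus(conf))  # 1 + 2 * 32 = 65
```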

Set job priority

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.priority | No | NORMAL | The priority of a Spark job. When resources are insufficient, higher-priority jobs in the queue run first. Valid values: HIGH, NORMAL, LOW, LOWEST. |

Important

For long-running streaming Spark jobs, set this parameter to HIGH.

Access metadata

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.sql.catalogImplementation | No | hive (Spark SQL jobs); in-memory (non-Spark SQL jobs) | The metadata source. hive: uses the built-in Hive Metastore of Apache Spark. in-memory: uses the temporary directory. |
| spark.sql.hive.metastore.version | No | adb (Spark SQL jobs); <hive_version> (non-Spark SQL jobs) | The metastore version. adb: connects to AnalyticDB for MySQL metadata. <hive_version>: specifies a Hive Metastore version. For supported Hive versions and self-managed Hive Metastore configuration, see Spark Configuration. |

Examples

Access AnalyticDB for MySQL metadata:

spark.sql.hive.metastore.version=adb;

Access the built-in Hive Metastore of Apache Spark:

spark.sql.catalogImplementation=hive;
spark.sql.hive.metastore.version=2.1.3;

Access metadata in the temporary directory:

spark.sql.catalogImplementation=in-memory;
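
The three cases above can be collected into one helper that returns the parameters for a chosen metadata source. This is a sketch only: the function name and the source labels are hypothetical, and the default Hive version is simply the value used in the example above.

```python
def metadata_conf(source: str, hive_version: str = "2.1.3") -> dict:
    """Return the metadata parameters for one of the three documented cases."""
    if source == "adb":
        # AnalyticDB for MySQL metadata
        return {"spark.sql.hive.metastore.version": "adb"}
    if source == "hive":
        # Built-in Hive Metastore of Apache Spark
        return {
            "spark.sql.catalogImplementation": "hive",
            "spark.sql.hive.metastore.version": hive_version,
        }
    if source == "in-memory":
        # Metadata in the temporary directory
        return {"spark.sql.catalogImplementation": "in-memory"}
    raise ValueError(f"unknown metadata source: {source}")


for key, value in metadata_conf("hive").items():
    print(f"{key}={value};")
```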

Configure the Spark UI

All the following parameters are optional.

| Parameter | Default | Description |
| --- | --- | --- |
| spark.app.log.rootPath | oss://<aliyun-oa-adb-spark-{Account ID}-oss-{Zone ID}>/<Cluster ID>/<Spark app ID> | The OSS directory for Spark job logs and Linux OS output. The folder named after the Spark application ID contains: the event log file (Spark app ID-000X) for Spark UI rendering, driver and numbered node log folders, and stdout/stderr folders for OS output. |
| spark.adb.event.logUploadDuration | false | Specifies whether to record the duration of each event log upload. |
| spark.adb.buffer.maxNumEvents | 1000 | Maximum number of events cached by the driver. |
| spark.adb.payload.maxNumEvents | 10000 | Maximum number of events uploaded to Object Storage Service (OSS) per batch. |
| spark.adb.event.pollingIntervalSecs | 0.5 | Interval between event uploads to OSS, in seconds. |
| spark.adb.event.maxPollingIntervalSecs | 60 | Maximum retry interval after a failed upload to OSS, in seconds. The retry interval stays within the range of spark.adb.event.pollingIntervalSecs to spark.adb.event.maxPollingIntervalSecs. |
| spark.adb.event.maxWaitOnEndSecs | 10 | Maximum wait time for an upload to complete, in seconds. If the upload does not complete within this time, it is retried. |
| spark.adb.event.waitForPendingPayloadsSleepIntervalSecs | 1 | Wait time before retrying an upload that exceeded spark.adb.event.maxWaitOnEndSecs, in seconds. |
| spark.adb.eventLog.rolling.maxFileSize | 209715200 | Maximum size of each event log file in OSS, in bytes. Event logs are split into multiple files (for example, Eventlog.0, Eventlog.1). |
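
The default of spark.adb.eventLog.rolling.maxFileSize, 209715200 bytes, equals 200 MiB. The sketch below estimates how many rolled files an event log of a given size would produce. It assumes each file is filled up to the limit before the next one starts, which the description does not state explicitly; the function name is hypothetical, and the file names follow the Eventlog.N pattern from the table.

```python
import math

DEFAULT_MAX_FILE_SIZE = 209715200  # bytes, equal to 200 * 1024 * 1024


def rolled_file_names(total_bytes: int, max_file_size: int = DEFAULT_MAX_FILE_SIZE) -> list:
    """Estimate rolled event log file names, assuming each file fills to the limit."""
    count = math.ceil(total_bytes / max_file_size)
    return [f"Eventlog.{index}" for index in range(count)]


# A 500 MiB event log would roll into three files under this assumption.
print(rolled_file_names(500 * 1024 * 1024))
```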

Grant permissions to RAM users

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.roleArn | No | N/A | The Alibaba Cloud Resource Name (ARN) of the Resource Access Management (RAM) role to attach to the RAM user, granting permission to submit Spark applications. Required only when submitting Spark applications as a RAM user. Not required when submitting with an Alibaba Cloud account or when permissions are already granted in the RAM console. For more information, see RAM role overview and Account authorization. |

Enable built-in data source connectors

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.connectors | No | N/A | The built-in AnalyticDB for MySQL Spark connectors to enable. Separate multiple values with commas. Valid values: oss, hudi, delta, adb, odps, external_hive, jindo, default. |
| spark.hadoop.io.compression.codec.snappy.native | No | false | Specifies whether to treat Snappy files as standard Snappy format. false: uses the Hadoop Snappy library. true: uses the standard Snappy library for decompression. |
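
A value for spark.adb.connectors can be assembled and checked against the listed valid values before submission. The helper below is a hypothetical client-side check, not something the service provides.

```python
VALID_CONNECTORS = {
    "oss", "hudi", "delta", "adb", "odps", "external_hive", "jindo", "default",
}


def connectors_value(names) -> str:
    """Validate connector names and join them with commas."""
    unknown = [name for name in names if name not in VALID_CONNECTORS]
    if unknown:
        raise ValueError(f"unsupported connectors: {unknown}")
    return ",".join(names)


print({"spark.adb.connectors": connectors_value(["oss", "hudi"])})
```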

Enable VPC and data source access

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.eni.enabled | No | false | Specifies whether to enable Elastic Network Interface (ENI). Set to true when using external tables to access external data sources. |
| spark.adb.eni.vswitchId | No | N/A | The vSwitch ID associated with the ENI. Required when connecting to AnalyticDB for MySQL from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). Requires spark.adb.eni.enabled=true. |
| spark.adb.eni.securityGroupId | No | N/A | The security group ID associated with the ENI. Required when connecting to AnalyticDB for MySQL from an ECS instance over a VPC. Requires spark.adb.eni.enabled=true. |
| spark.adb.eni.extraHosts | No | N/A | IP-to-hostname mappings that allow Spark to resolve data source hostnames. Required for accessing a self-managed Hive data source. Format: ip0 master0,ip1 master1. Requires spark.adb.eni.enabled=true. |
| spark.adb.eni.adbHostAlias.enabled | No | false | Specifies whether to automatically write AnalyticDB for MySQL domain name resolution entries to the hostname-to-IP mapping table. Set to true when reading from or writing to E-MapReduce (EMR) Hive via ENI. |
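
The following sketch builds a set of ENI parameters, including an extraHosts value in the "ip0 master0,ip1 master1" format. The function name is hypothetical, and the vSwitch ID, security group ID, IP addresses, and hostnames are placeholders, not real resources.

```python
def eni_conf(vswitch_id: str, security_group_id: str, hosts: dict) -> dict:
    """Build ENI parameters; hosts maps each IP address to its hostname."""
    conf = {
        # The other ENI parameters require ENI to be enabled.
        "spark.adb.eni.enabled": "true",
        "spark.adb.eni.vswitchId": vswitch_id,
        "spark.adb.eni.securityGroupId": security_group_id,
    }
    if hosts:
        # Each entry is "ip hostname"; entries are separated by commas.
        conf["spark.adb.eni.extraHosts"] = ",".join(
            f"{ip} {hostname}" for ip, hostname in hosts.items()
        )
    return conf


# Placeholder values for illustration only.
print(eni_conf("vsw-example", "sg-example", {
    "192.168.0.1": "master0",
    "192.168.0.2": "master1",
}))
```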

Configure application retries

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.maxAttempts | No | 1 | Maximum number of attempts to run an application. The default value of 1 means no retries. For example, setting this to 3 allows the system to attempt the application up to three times within the sliding window. |
| spark.adb.attemptFailuresValidityInterval | No | Integer.MAX | Duration of the sliding window for retry counting, in seconds. For example, setting this to 6000 causes the system to count failed attempts within the last 6,000 seconds. If the count is below spark.adb.maxAttempts, the system retries. |
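
The interaction between the two parameters can be sketched as a sliding-window check. This illustrates the description above and is not the service's scheduler: the function name is hypothetical, the window boundary is treated as inclusive by assumption, and Integer.MAX is assumed to mean the largest 32-bit signed integer.

```python
INTEGER_MAX = 2**31 - 1  # assumed meaning of the Integer.MAX default


def should_retry(failure_times, now, max_attempts=1, validity_interval=INTEGER_MAX):
    """Return True if failed attempts inside the window are below max_attempts.

    failure_times and now are timestamps in seconds.
    """
    recent = [t for t in failure_times if now - t <= validity_interval]
    return len(recent) < max_attempts


# With the defaults, a single failure already exhausts the one allowed attempt.
print(should_retry([100], now=200))
# With maxAttempts=3 and a 6,000-second window, two old failures have expired.
print(should_retry([0, 100, 7000], now=7000, max_attempts=3, validity_interval=6000))
```

A long validity interval makes old failures count for longer, while a short one lets a long-running application recover from occasional failures spread over time.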

Specify a Python runtime environment

Use spark.pyspark.python with virtual environment packaging to submit PySpark jobs.

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.pyspark.python | No | N/A | Path to the Python interpreter on the local device. |

Specify the Spark version

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.version | No | 3.2 | The Spark version. Valid values: 2.4, 3.2, 3.3, 3.5, 4.0. |

Enable the vectorized execution engine

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.native.enabled | No | false | Specifies whether to enable the high-performance vectorized execution engine built into AnalyticDB for MySQL Spark. The engine is fully compatible with open source Spark and requires no code changes. |

Enable lake storage acceleration

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.lakecache.enabled | No | false | Specifies whether to enable LakeCache for lake storage acceleration. |

Read and write C-Store data with Spark SQL

The following configuration parameters are supported only when you read and write C-Store tables through Spark SQL:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| spark.adb.write.batchSize | No | 600 | The number of records to write in a single batch. Valid values: positive integers. |
| spark.adb.write.arrow.maxMemoryBufferSize | No | 1024 | The maximum memory buffer size for writing. Unit: MB. Valid values: positive integers. |
| spark.adb.write.arrow.maxRecordSizePerBatch | No | 500 | The maximum number of records to write in a single batch. Valid values: positive integers. |
| spark.adb.createSnapshot | No | false | Specifies whether to create a snapshot after data is written using the INSERT OVERWRITE statement. Valid values: true (create a snapshot), false (default; do not create a snapshot). |
| spark.adb.readDataVersion | No | LATEST_BUILD | The version of data to read. Valid values: CURRENT (the latest data), LATEST_BUILD (the latest built full data), LATEST_SNAPSHOT (the latest snapshot data). |

Unsupported parameters

The following Apache Spark parameters are not supported by AnalyticDB for MySQL Spark and are ignored if specified. AnalyticDB for MySQL manages these settings automatically in its hosted environment. Where an alternative exists, it is noted.

--deploy-mode
--master
--packages             # Use --jars instead
--exclude-packages
--proxy-user
--repositories
--keytab
--principal
--queue
--total-executor-cores
--driver-library-path
--driver-class-path
--supervise
-S, --silent
-i <filename>
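
Because these options are ignored rather than rejected, a client-side check can help catch them before submission. The sketch below scans a spark-submit style argument list for the flags listed above. The function name is hypothetical, and the sample arguments (including the package coordinate and JAR name) are placeholders.

```python
# Unsupported flags from the list above, mapped to the documented alternative (if any).
UNSUPPORTED_FLAGS = {
    "--deploy-mode": None,
    "--master": None,
    "--packages": "--jars",
    "--exclude-packages": None,
    "--proxy-user": None,
    "--repositories": None,
    "--keytab": None,
    "--principal": None,
    "--queue": None,
    "--total-executor-cores": None,
    "--driver-library-path": None,
    "--driver-class-path": None,
    "--supervise": None,
    "-S": None,
    "--silent": None,
    "-i": None,
}


def find_unsupported(argv):
    """Return (flag, alternative) pairs for unsupported flags found in argv."""
    found = []
    for arg in argv:
        # Accept both "--flag value" and "--flag=value" forms.
        flag = arg.split("=", 1)[0]
        if flag in UNSUPPORTED_FLAGS:
            found.append((flag, UNSUPPORTED_FLAGS[flag]))
    return found


args = ["--master", "yarn", "--packages=org.example:demo:1.0",
        "--jars", "demo.jar", "--conf", "spark.adb.version=3.5"]
for flag, alternative in find_unsupported(args):
    hint = f"use {alternative} instead" if alternative else "no alternative listed"
    print(f"{flag} is ignored ({hint})")
```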