Lindorm Distributed Processing System (LDPS) runs Spark jobs on Kubernetes-managed elastic resource pools. This page lists all configurable parameters for LDPS Spark jobs and explains how to pass them for each submission method.
Spark parameters
Restricted parameters
The following parameters are set by the system and cannot be customized.
| Parameter | Description |
|---|---|
spark.master | Endpoint of the cluster management system. |
spark.submit.deployMode | Deployment mode of the Spark driver. |
Resource parameters
LDPS runs on elastic resource pools billed on a pay-as-you-go basis. By default, there is no upper limit on the resources a job can request. To set a maximum, see Modify the configurations of LDPS.
Resource parameters apply to all JDBC, JAR, and Python jobs submitted to LDPS. They are divided into specification parameters and capacity parameters.
Specification parameters
Basic specification parameters
| Parameter | Description | Default |
|---|---|---|
spark.driver.memory | Heap memory of the driver. Unit: mebibytes. | 8192m |
spark.driver.memoryOverhead | Off-heap memory of the driver. Unit: mebibytes. | 8192m |
spark.kubernetes.driver.disk.size | Local disk size of the driver. Unit: GB. | 50 |
spark.executor.cores | CPU cores per executor. | 4 |
spark.executor.memory | Heap memory per executor. Unit: mebibytes. | 8192m |
spark.executor.memoryOverhead | Off-heap memory per executor. Unit: mebibytes. | 8192m |
spark.kubernetes.executor.disk.size | Local disk size per executor. Unit: GB. | 50 |
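The heap and off-heap settings are additive: the memory a driver or executor pod actually requests from the elastic resource pool is approximately the sum of the two. A minimal sketch (the helper name is ours, not an LDPS API):

```python
def pod_memory_mib(heap_mib: int, overhead_mib: int) -> int:
    """Approximate total memory a driver or executor pod requests:
    JVM heap (spark.{driver/executor}.memory) plus off-heap overhead
    (spark.{driver/executor}.memoryOverhead)."""
    return heap_mib + overhead_mib

# With the defaults above (8192m heap + 8192m overhead), each executor
# requests roughly 16 GiB from the resource pool.
print(pod_memory_mib(8192, 8192))  # 16384
```

Keep this sum in mind when capping pool resources, since billing follows what pods request rather than heap size alone.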
Advanced specification parameters
| Parameter | Description | Default |
|---|---|---|
spark.{driver/executor}.resourceTag | Predefined resource specification set. When set, LDPS automatically applies the corresponding CPU, memory, and disk values. Valid values: xlarge, 2xlarge, 4xlarge, 8xlarge, 16xlarge. | None |
spark.kubernetes.{driver/executor}.ecsModelPreference | Preferred compute node models, listed in priority order. LDPS tries each model in sequence; if all are unavailable, it selects an available model that matches the resource specification. Specify up to four models, separated by commas. Example: hfg6,g6. | None |
spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs | GPU specification of the Elastic Container Instance (ECI). For supported GPU instance types, see Specify ECS instance types to create pods. | ecs.gn7i-c8g1.2xlarge |
spark.{driver/executor}.resource.gpu.vendor | GPU vendor. Must match the GPU specification set in eci-use-specs. | nvidia.com |
spark.{driver/executor}.resource.gpu.amount | Number of GPUs. Set to 1. | 1 |
spark.{driver/executor}.resource.gpu.discoveryScript | Path to the GPU discovery script used to identify and bind GPU resources at driver or executor startup. Set to /opt/spark/examples/src/main/scripts/getGpusResources.sh. | /opt/spark/examples/src/main/scripts/getGpusResources.sh |
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs | Executor instance specification with expanded local disk capacity. See the supported values below. | None |
The spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs parameter supports the following executor specifications:
| Value | CPU cores | Memory |
|---|---|---|
ecs.d1ne.2xlarge | 8 | 32 GB |
ecs.d1ne.4xlarge | 16 | 64 GB |
ecs.d1ne.6xlarge | 24 | 96 GB |
ecs.d1ne.8xlarge | 32 | 128 GB |
ecs.d1ne.14xlarge | 56 | 224 GB |
When using spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs, also set the following two parameters:
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0
If the specified executor instance type is unavailable, contact Lindorm technical support (DingTalk ID: s0s3eg3).
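Putting the expanded-local-disk settings together, a submission might include the following (the instance type is one of the supported values above; all values are illustrative):

```properties
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.d1ne.2xlarge
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0
```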
resourceTag specification mapping
| resourceTag value | spark.{driver/executor}.cores | spark.{driver/executor}.memory | spark.{driver/executor}.memoryOverhead | spark.kubernetes.{driver/executor}.disk.size |
|---|---|---|---|---|
xlarge | 4 | 8192m | 8192m | 50 GB |
2xlarge | 8 | 16384m | 16384m | 100 GB |
4xlarge | 16 | 32768m | 32768m | 200 GB |
8xlarge | 32 | 65536m | 65536m | 400 GB |
16xlarge | 64 | 131072m | 131072m | 400 GB |
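The mapping above can be captured as a small lookup table. The helper below is a hypothetical illustration of how a resourceTag expands into the individual properties it implies; it is not an LDPS API:

```python
# Mapping of spark.{driver/executor}.resourceTag values to the concrete
# specifications LDPS applies (taken from the table above).
RESOURCE_TAGS = {
    "xlarge":   {"cores": 4,  "memory": "8192m",   "overhead": "8192m",   "disk_gb": 50},
    "2xlarge":  {"cores": 8,  "memory": "16384m",  "overhead": "16384m",  "disk_gb": 100},
    "4xlarge":  {"cores": 16, "memory": "32768m",  "overhead": "32768m",  "disk_gb": 200},
    "8xlarge":  {"cores": 32, "memory": "65536m",  "overhead": "65536m",  "disk_gb": 400},
    "16xlarge": {"cores": 64, "memory": "131072m", "overhead": "131072m", "disk_gb": 400},
}

def expand_resource_tag(role: str, tag: str) -> dict:
    """Expand a resourceTag into the Spark properties it implies.
    `role` is "driver" or "executor"."""
    spec = RESOURCE_TAGS[tag]
    return {
        f"spark.{role}.cores": str(spec["cores"]),
        f"spark.{role}.memory": spec["memory"],
        f"spark.{role}.memoryOverhead": spec["overhead"],
        f"spark.kubernetes.{role}.disk.size": str(spec["disk_gb"]),
    }

print(expand_resource_tag("executor", "2xlarge")["spark.executor.memory"])  # 16384m
```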
Capacity parameters
| Parameter | Description | Default |
|---|---|---|
spark.executor.instances | Number of executors allocated for the job. | 2 |
spark.dynamicAllocation.enabled | Enables dynamic resource allocation. When enabled, LDPS automatically requests and releases executors based on real-time job workload. Valid values: true, false. | true |
spark.dynamicAllocation.minExecutors | Minimum number of executors when dynamic resource allocation is enabled. | 0 |
spark.dynamicAllocation.maxExecutors | Maximum number of executors when dynamic resource allocation is enabled. Together with spark.executor.cores, this bounds the number of tasks that can run concurrently. | Infinity
spark.dynamicAllocation.executorIdleTimeout | How long an idle executor is kept before being released. Unit: seconds. | 600s |
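As an illustration, the following capacity settings (the values are arbitrary) keep a pay-as-you-go job fully elastic while capping it at 20 executors and releasing idle executors after five minutes:

```properties
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=300s
```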
Execution parameters
| Parameter | Description | Default |
|---|---|---|
spark.speculation | Enables speculative execution. When enabled, the driver re-submits tasks that are running significantly slower than other tasks in the same stage (long tails), to avoid job delays. Valid values: true, false. | true |
spark.task.maxFailures | Maximum number of task failures allowed before the job fails. | 4 |
spark.dfsLog.executor.enabled | Stores executor logs to LindormDFS. Set to false for large-scale jobs to reduce DFS load from log streams. Valid values: true, false. | true |
spark.jars | Path to the JAR package required for the job. Accepts OSS or Hadoop Distributed File System (HDFS) paths. If you use JDBC, set this to an HDFS path only. If you use an OSS path, also configure the OSS parameters below. | None |
spark.hadoop.fs.oss.endpoint | OSS endpoint. For endpoint values by region, see Regions and endpoints. | None |
spark.hadoop.fs.oss.accessKeyId | AccessKey ID of your Alibaba Cloud account or RAM user. See Obtain an AccessKey pair. | None |
spark.hadoop.fs.oss.accessKeySecret | AccessKey secret of your Alibaba Cloud account or RAM user. See Obtain an AccessKey pair. | None |
spark.hadoop.fs.oss.impl | File system implementation class for OSS. Set to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem. | None |
spark.default.parallelism | Default parallelism for non-SQL tasks, including data source reads and shuffle stages. | None |
spark.sql.shuffle.partitions | Number of shuffle partitions for SQL tasks. | 200 |
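For example, to load a job JAR from OSS, set spark.jars together with the four OSS parameters. The bucket name, endpoint, and credentials below are placeholders:

```properties
spark.jars=oss://<yourBucket>/path/to/job.jar
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=<yourAccessKeyId>
spark.hadoop.fs.oss.accessKeySecret=<yourAccessKeySecret>
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
```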
Monitoring parameters
| Parameter | Description | Default |
|---|---|---|
spark.monitor.cmd | Monitoring commands to run at regular intervals. Separate multiple commands with semicolons (;). Results are written to job logs. Note: This parameter cannot be configured when submitting jobs via Beeline or JDBC. | None
spark.monitor.interval | Interval between monitoring command executions. Unit: seconds. | 60 |
spark.monitor.timeout | Timeout for each monitoring command. If a command exceeds this limit, it is skipped and the next command runs. Unit: seconds. | 2 |
Example monitoring commands:
# Single command
"spark.monitor.cmd": "top -b -n 1"
# Multiple commands
"spark.monitor.cmd": "top -b -n 1; vmstat; free -m; iostat -d -x -c -k; df -h; sar -n DEV 1 1; netstat"Common monitoring commands by category:
| Category | Commands |
|---|---|
| System status | top -b -n 1, vmstat |
| Memory | free -m |
| Disk I/O | iostat -d -x -c -k |
| Disk usage | df -h |
| Network | sar -n DEV 1 1, netstat |
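To make the timeout semantics concrete, here is a rough Python sketch of one monitoring round: each semicolon-separated command in spark.monitor.cmd runs in turn, and any command that exceeds spark.monitor.timeout is skipped. This approximates the described behavior and is not LDPS code:

```python
import subprocess

def run_monitor_commands(cmd: str, timeout_s: int = 2) -> list:
    """Run each semicolon-separated monitoring command; if one exceeds
    the timeout, skip it and continue with the next."""
    results = []
    for command in cmd.split(";"):
        command = command.strip()
        if not command:
            continue
        try:
            proc = subprocess.run(command, shell=True, capture_output=True,
                                  text=True, timeout=timeout_s)
            results.append((command, proc.stdout))
        except subprocess.TimeoutExpired:
            results.append((command, None))  # timed out: skipped

    return results

for command, output in run_monitor_commands("echo ok; sleep 2", timeout_s=1):
    print(command, "->", "skipped" if output is None else output.strip())
```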
Log parameters
| Parameter | Description | Default |
|---|---|---|
spark.log.level | Log output level for the job. | INFO |
Valid values for spark.log.level, from most to least verbose:
| Level | Description |
|---|---|
ALL | All log output, including the most granular debug information. |
TRACE | More detailed than DEBUG; records fine-grained execution steps. |
DEBUG | Debug-level logs including detailed runtime status. |
INFO | General informational logs for normal execution events. |
WARN | Warnings about potential issues that do not stop execution. |
ERROR | Errors that occur during execution. |
FATAL | Critical errors that prevent the program from continuing. |
OFF | Disables all log output. |
Open-source Spark parameters
For parameters inherited from open-source Spark, see Spark configuration.
Configure parameters by submission method
When you submit a job to LDPS, the method you use determines how parameters are passed.
Beeline
Edit conf/beeline.conf in the Spark package directory where the Beeline command line tool is located.
# LDPS endpoint
# Format: jdbc:hive2://<host>:<port>/;?token=<token>
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****
# Connection credentials (default: root/root)
user=root
password=root
# Set to false to use a dedicated Spark session for this connection
shareResource=false
# Spark parameters
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3

For the full Beeline setup, see Getting started.
JDBC
Append Spark parameters to the JDBC connection string as key-value pairs after the token.
jdbc:hive2://<host>:<port>/;?token=<token>;spark.executor.memory=8g;spark.sql.shuffle.partitions=2

For the full JDBC URL format, see Use JDBC in application development.
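For instance, a small helper (hypothetical, not part of any Lindorm SDK) that assembles such a URL from a dict of Spark parameters:

```python
def build_ldps_jdbc_url(host: str, port: int, token: str, spark_conf: dict) -> str:
    """Append Spark parameters to an LDPS JDBC connection string as
    key=value pairs after the token."""
    url = f"jdbc:hive2://{host}:{port}/;?token={token}"
    params = ";".join(f"{k}={v}" for k, v in spark_conf.items())
    return f"{url};{params}" if params else url

print(build_ldps_jdbc_url("example-host", 10009, "my-token",
                          {"spark.executor.memory": "8g",
                           "spark.sql.shuffle.partitions": "2"}))
# jdbc:hive2://example-host:10009/;?token=my-token;spark.executor.memory=8g;spark.sql.shuffle.partitions=2
```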
spark.jars cannot be set to an OSS path when submitting jobs via JDBC. Use an HDFS path instead.

JAR jobs
Configure parameters in the job template when submitting a Java job:
Lindorm console: Configure in the job content template. See Manage jobs in the Lindorm console.
Data Management Service (DMS): Configure in the Job configuration section of the job node page. See Use DMS to manage jobs.
Python jobs
Configure parameters in the job template when submitting a Python job:
Lindorm console: Configure in the job content template. See Manage jobs in the Lindorm console.
Data Management Service (DMS): Configure in the Job configuration section of the job node page. See Use DMS to manage jobs.