Lindorm Distributed Processing System (LDPS) runs Spark jobs on Kubernetes-managed elastic resource pools. This page lists all configurable parameters for LDPS Spark jobs and explains how to pass them for each submission method.
Spark parameters
Restricted parameters
The following parameters are set by the system and cannot be customized.
| Parameter | Description |
|---|---|
spark.master | Endpoint of the cluster management system. |
spark.submit.deployMode | Deployment mode of the Spark driver. |
Resource parameters
LDPS runs on elastic resource pools billed on a pay-as-you-go basis. By default, there is no upper limit on the resources a job can request. To set a maximum, see Modify the configurations of LDPS.
Resource parameters apply to all JDBC, JAR, and Python jobs submitted to LDPS. They are divided into specification parameters and capacity parameters.
Specification parameters
Basic specification parameters
| Parameter | Description | Default |
|---|---|---|
spark.driver.memory | Heap memory of the driver. Unit: mebibytes. | 8192m |
spark.driver.memoryOverhead | Off-heap memory of the driver. Unit: mebibytes. | 8192m |
spark.kubernetes.driver.disk.size | Local disk size of the driver. Unit: GB. | 50 |
spark.executor.cores | CPU cores per executor. | 4 |
spark.executor.memory | Heap memory per executor. Unit: mebibytes. | 8192m |
spark.executor.memoryOverhead | Off-heap memory per executor. Unit: mebibytes. | 8192m |
spark.kubernetes.executor.disk.size | Local disk size per executor. Unit: GB. | 50 |
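The heap and off-heap settings are additive: the memory a driver or executor pod actually requests from the elastic resource pool is approximately the sum of the two. A minimal sketch (the helper name is ours, not an LDPS API):

```python
def pod_memory_mib(heap_mib: int, overhead_mib: int) -> int:
    """Approximate total memory a driver or executor pod requests:
    JVM heap (spark.{driver/executor}.memory) plus off-heap overhead
    (spark.{driver/executor}.memoryOverhead)."""
    return heap_mib + overhead_mib

# With the defaults above (8192m heap + 8192m overhead), each executor
# requests roughly 16 GiB from the resource pool.
print(pod_memory_mib(8192, 8192))  # 16384
```

Keep this sum in mind when capping pool resources, since billing follows what pods request rather than heap size alone.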
Advanced specification parameters
| Parameter | Description | Default |
|---|---|---|
spark.{driver/executor}.resourceTag | Predefined resource specification set. When set, LDPS automatically applies the corresponding CPU, memory, and disk values. Valid values: xlarge, 2xlarge, 4xlarge, 8xlarge, 16xlarge. | None |
spark.kubernetes.{driver/executor}.ecsModelPreference | Preferred compute node models, listed in priority order. LDPS tries each model in sequence; if all are unavailable, it selects an available model that matches the resource specification. Specify up to four models, separated by commas. Example: hfg6,g6. | None |
spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs | GPU specification of the Elastic Container Instance (ECI). For supported GPU instance types, see Specify ECS instance types to create pods. | ecs.gn7i-c8g1.2xlarge |
spark.{driver/executor}.resource.gpu.vendor | GPU vendor. Must match the GPU specification set in eci-use-specs. | nvidia.com |
spark.{driver/executor}.resource.gpu.amount | Number of GPUs. Set to 1. | 1 |
spark.{driver/executor}.resource.gpu.discoveryScript | Path to the GPU discovery script used to identify and bind GPU resources at driver or executor startup. Set to /opt/spark/examples/src/main/scripts/getGpusResources.sh. | /opt/spark/examples/src/main/scripts/getGpusResources.sh |
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs | Executor instance specification with expanded local disk capacity. See the supported values below. | None |
The spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs parameter supports the following executor specifications:
| Value | CPU cores | Memory |
|---|---|---|
ecs.d1ne.2xlarge | 8 | 32 GB |
ecs.d1ne.4xlarge | 16 | 64 GB |
ecs.d1ne.6xlarge | 24 | 96 GB |
ecs.d1ne.8xlarge | 32 | 128 GB |
ecs.d1ne.14xlarge | 56 | 224 GB |
When using spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs, also set the following two parameters:
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0
If the specified executor instance type is unavailable, contact Lindorm technical support (DingTalk ID: s0s3eg3).
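Putting the expanded-local-disk settings together, a submission might include the following (the instance type is one of the supported values above; all values are illustrative):

```properties
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.d1ne.2xlarge
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0
```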
resourceTag specification mapping
| resourceTag value | spark.{driver/executor}.cores | spark.{driver/executor}.memory | spark.{driver/executor}.memoryOverhead | spark.kubernetes.{driver/executor}.disk.size |
|---|---|---|---|---|
xlarge | 4 | 8192m | 8192m | 50 GB |
2xlarge | 8 | 16384m | 16384m | 100 GB |
4xlarge | 16 | 32768m | 32768m | 200 GB |
8xlarge | 32 | 65536m | 65536m | 400 GB |
16xlarge | 64 | 131072m | 131072m | 400 GB |
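The mapping above can be captured as a small lookup table. The helper below is a hypothetical illustration of how a resourceTag expands into the individual properties it implies; it is not an LDPS API:

```python
# Mapping of spark.{driver/executor}.resourceTag values to the concrete
# specifications LDPS applies (taken from the table above).
RESOURCE_TAGS = {
    "xlarge":   {"cores": 4,  "memory": "8192m",   "overhead": "8192m",   "disk_gb": 50},
    "2xlarge":  {"cores": 8,  "memory": "16384m",  "overhead": "16384m",  "disk_gb": 100},
    "4xlarge":  {"cores": 16, "memory": "32768m",  "overhead": "32768m",  "disk_gb": 200},
    "8xlarge":  {"cores": 32, "memory": "65536m",  "overhead": "65536m",  "disk_gb": 400},
    "16xlarge": {"cores": 64, "memory": "131072m", "overhead": "131072m", "disk_gb": 400},
}

def expand_resource_tag(role: str, tag: str) -> dict:
    """Expand a resourceTag into the Spark properties it implies.
    `role` is "driver" or "executor"."""
    spec = RESOURCE_TAGS[tag]
    return {
        f"spark.{role}.cores": str(spec["cores"]),
        f"spark.{role}.memory": spec["memory"],
        f"spark.{role}.memoryOverhead": spec["overhead"],
        f"spark.kubernetes.{role}.disk.size": str(spec["disk_gb"]),
    }

print(expand_resource_tag("executor", "2xlarge")["spark.executor.memory"])  # 16384m
```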
Capacity parameters
| Parameter | Description | Default |
|---|---|---|
spark.executor.instances | Number of executors allocated for the job. | 2 |
spark.dynamicAllocation.enabled | Enables dynamic resource allocation. When enabled, LDPS automatically requests and releases executors based on real-time job workload. Valid values: true, false. | true |
spark.dynamicAllocation.minExecutors | Minimum number of executors when dynamic resource allocation is enabled. | 0 |
spark.dynamicAllocation.maxExecutors | Maximum number of executors when dynamic resource allocation is enabled. Together with spark.executor.cores, this bounds the number of tasks that can run concurrently. | Infinity
spark.dynamicAllocation.executorIdleTimeout | How long an idle executor is kept before being released. Unit: seconds. | 600s |
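As an illustration, the following capacity settings (the values are arbitrary) keep a pay-as-you-go job fully elastic while capping it at 20 executors and releasing idle executors after five minutes:

```properties
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=300s
```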
Execution parameters
| Parameter | Description | Default |
|---|---|---|
spark.speculation | Enables speculative execution. When enabled, the driver re-submits tasks that are running significantly slower than other tasks in the same stage (long tails), to avoid job delays. Valid values: true, false. | true |
spark.task.maxFailures | Maximum number of task failures allowed before the job fails. | 4 |
spark.dfsLog.executor.enabled | Stores executor logs to LindormDFS. Set to false for large-scale jobs to reduce DFS load from log streams. Valid values: true, false. | true |
spark.jars | Path to the JAR package required for the job. Accepts OSS or Hadoop Distributed File System (HDFS) paths. If you use JDBC, set this to an HDFS path only. If you use an OSS path, also configure the OSS parameters below. | None |
spark.hadoop.fs.oss.endpoint | OSS endpoint. For endpoint values by region, see Regions and endpoints. | None |
spark.hadoop.fs.oss.accessKeyId | AccessKey ID of your Alibaba Cloud account or RAM user. See Obtain an AccessKey pair. | None |
spark.hadoop.fs.oss.accessKeySecret | AccessKey secret of your Alibaba Cloud account or RAM user. See Obtain an AccessKey pair. | None |
spark.hadoop.fs.oss.impl | File system implementation class for OSS. Set to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem. | None |
spark.default.parallelism | Default parallelism for non-SQL tasks, including data source reads and shuffle stages. | None |
spark.sql.shuffle.partitions | Number of shuffle partitions for SQL tasks. | 200 |
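For example, to load a job JAR from OSS, set spark.jars together with the four OSS parameters. The bucket name, endpoint, and credentials below are placeholders:

```properties
spark.jars=oss://<yourBucket>/path/to/job.jar
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=<yourAccessKeyId>
spark.hadoop.fs.oss.accessKeySecret=<yourAccessKeySecret>
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
```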
Monitoring parameters
| Parameter | Description | Default |
|---|---|---|
spark.monitor.cmd | Monitoring commands to run at regular intervals. Separate multiple commands with semicolons (;). Results are written to job logs. Note: This parameter cannot be configured when submitting jobs via Beeline or JDBC. | None
spark.monitor.interval | Interval between monitoring command executions. Unit: seconds. | 60 |
spark.monitor.timeout | Timeout for each monitoring command. If a command exceeds this limit, it is skipped and the next command runs. Unit: seconds. | 2 |
Example monitoring commands:
# Single command
"spark.monitor.cmd": "top -b -n 1"
# Multiple commands
"spark.monitor.cmd": "top -b -n 1; vmstat; free -m; iostat -d -x -c -k; df -h; sar -n DEV 1 1; netstat"Common monitoring commands by category:
| Category | Commands |
|---|---|
| System status | top -b -n 1, vmstat |
| Memory | free -m |
| Disk I/O | iostat -d -x -c -k |
| Disk usage | df -h |
| Network | sar -n DEV 1 1, netstat |
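To make the timeout semantics concrete, here is a rough Python sketch of one monitoring round: each semicolon-separated command in spark.monitor.cmd runs in turn, and any command that exceeds spark.monitor.timeout is skipped. This approximates the described behavior and is not LDPS code:

```python
import subprocess

def run_monitor_commands(cmd: str, timeout_s: int = 2) -> list:
    """Run each semicolon-separated monitoring command; if one exceeds
    the timeout, skip it and continue with the next."""
    results = []
    for command in cmd.split(";"):
        command = command.strip()
        if not command:
            continue
        try:
            proc = subprocess.run(command, shell=True, capture_output=True,
                                  text=True, timeout=timeout_s)
            results.append((command, proc.stdout))
        except subprocess.TimeoutExpired:
            results.append((command, None))  # timed out: skipped

    return results

for command, output in run_monitor_commands("echo ok; sleep 2", timeout_s=1):
    print(command, "->", "skipped" if output is None else output.strip())
```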
Log parameters
| Parameter | Description | Default |
|---|---|---|
spark.log.level | Log output level for the job. | INFO |
Valid values for spark.log.level, from most to least verbose:
| Level | Description |
|---|---|
ALL | All log output, including the most granular debug information. |
TRACE | More detailed than DEBUG; records fine-grained execution steps. |
DEBUG | Debug-level logs including detailed runtime status. |
INFO | General informational logs for normal execution events. |
WARN | Warnings about potential issues that do not stop execution. |
ERROR | Errors that occur during execution. |
FATAL | Critical errors that prevent the program from continuing. |
OFF | Disables all log output. |
Open-source Spark parameters
For parameters inherited from open-source Spark, see Spark configuration.
Configure parameters by submission method
When you submit a job to LDPS, the method you use determines how parameters are passed.
Beeline
Edit conf/beeline.conf in the Spark package directory where the Beeline command line tool is located.
# LDPS endpoint
# Format: jdbc:hive2://<host>:<port>/;?token=<token>
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****
# Connection credentials (default: root/root)
user=root
password=root
# Set to false to use a dedicated Spark session for this connection
shareResource=false
# Spark parameters
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3

For the full Beeline setup, see Getting started.
JDBC
Append Spark parameters to the JDBC connection string as key-value pairs after the token.
jdbc:hive2://<host>:<port>/;?token=<token>;spark.executor.memory=8g;spark.sql.shuffle.partitions=2

For the full JDBC URL format, see Use JDBC in application development.
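For instance, a small helper (hypothetical, not part of any Lindorm SDK) that assembles such a URL from a dict of Spark parameters:

```python
def build_ldps_jdbc_url(host: str, port: int, token: str, spark_conf: dict) -> str:
    """Append Spark parameters to an LDPS JDBC connection string as
    key=value pairs after the token."""
    url = f"jdbc:hive2://{host}:{port}/;?token={token}"
    params = ";".join(f"{k}={v}" for k, v in spark_conf.items())
    return f"{url};{params}" if params else url

print(build_ldps_jdbc_url("example-host", 10009, "my-token",
                          {"spark.executor.memory": "8g",
                           "spark.sql.shuffle.partitions": "2"}))
# jdbc:hive2://example-host:10009/;?token=my-token;spark.executor.memory=8g;spark.sql.shuffle.partitions=2
```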
spark.jars cannot be set to an OSS path when submitting jobs via JDBC. Use an HDFS path instead.

JAR jobs
Configure parameters in the job template when submitting a Java job:
Lindorm console: Configure in the job content template. See Manage jobs in the Lindorm console.
Data Management Service (DMS): Configure in the Job configuration section of the job node page. See Use DMS to manage jobs.
Python jobs
Configure parameters in the job template when submitting a Python job:
Lindorm console: Configure in the job content template. See Manage jobs in the Lindorm console.
Data Management Service (DMS): Configure in the Job configuration section of the job node page. See Use DMS to manage jobs.