Lindorm: Configure parameters for jobs

Last Updated: Aug 28, 2023

This topic describes how to configure parameters for the Spark jobs of Lindorm Distributed Processing System (LDPS).

Configure parameters for Spark jobs

LDPS allows you to configure common parameters for Spark jobs, including parameters related to resources, execution, and monitoring.

Restricted parameters

The spark.master and spark.submit.deployMode parameters are system parameters and cannot be customized.

  • spark.master: The endpoint of the cluster management system.

  • spark.submit.deployMode: The mode in which the Spark driver is deployed.

Resource parameters

LDPS provides services based on elastic resource pools. By default, the maximum amount of resources that you can use is not limited. These resources are billed on a pay-as-you-go basis. For more information about how to modify the maximum amount of resources that you can use, see Modify the configurations of LDPS.

You can configure resource parameters for each job that you submit to LDPS, such as a JDBC, JAR, or Python job. Resource parameters include specification parameters and capacity parameters.

Table 2. Specification parameters

spark.driver.memory

The size of heap memory of the driver. Unit: MB.

Default value: 11264m

spark.driver.memoryOverhead

The size of off-heap memory of the driver. Unit: MB.

Default value: 5120m

spark.kubernetes.driver.disk.size

The size of the local disk of the driver. Unit: GB.

Default value: 20

spark.executor.cores

The number of CPU cores of a single executor node.

Default value: 4

spark.executor.memory

The size of heap memory of a single executor. Unit: MB.

Default value: 11264m

spark.executor.memoryOverhead

The size of off-heap memory of a single executor. Unit: MB.

Default value: 5120m

spark.kubernetes.executor.disk.size

The size of the local disk of a single executor. Unit: GB.

Default value: 40

spark.executor.resourceTag

The specification tag of executor resources. You can configure this parameter to simplify the configuration of executors. Each value of this parameter corresponds to a different executor specification.

Important

This parameter cannot be configured together with the spark.executor.cores, spark.executor.memory, and spark.executor.memoryOverhead parameters. Otherwise, errors are reported.

You can set this parameter to the following values:

  • large-mem: This value is applicable to jobs that require a large amount of memory. This value corresponds to the following executor specification:

    spark.executor.cores=2

    spark.executor.memory=11264m

    spark.executor.memoryOverhead=5120m

  • xlarge-mem: This value is applicable to jobs that require an extremely large amount of memory. This value corresponds to the following executor specification:

    spark.executor.cores=1

    spark.executor.memory=11264m

    spark.executor.memoryOverhead=5120m

  • large-memory-overhead: This value is applicable to jobs that require a large amount of off-heap memory, such as PySpark jobs. This value corresponds to the following executor specification:

    spark.executor.cores=2

    spark.executor.memory=8192m

    spark.executor.memoryOverhead=8192m

  • xlarge-memory-overhead: This value is applicable to jobs that require an extremely large amount of off-heap memory, such as PySpark jobs. This value corresponds to the following executor specification:

    spark.executor.cores=1

    spark.executor.memory=8192m

    spark.executor.memoryOverhead=8192m

Default value: None

spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs

The specification and model of the GPU. For more information, see GPU-accelerated instance specifications.

Default value: ecs.gn7i-c8g1.2xlarge

spark.{driver/executor}.resource.gpu.vendor

The manufacturer of the GPU.

Note

The value of this parameter must correspond to the specified GPU specification.

Default value: nvidia.com

spark.{driver/executor}.resource.gpu.amount

The number of GPUs.

Note

Set this parameter to 1.

Default value: 1

spark.{driver/executor}.resource.gpu.discoveryScript

The path of the GPU discovery script file.

Note

The script file specified by this parameter is used to discover and associate GPU resources when the Spark driver or executor is started. Set this parameter to /opt/spark/examples/src/main/scripts/getGpusResources.sh.

Default value: /opt/spark/examples/src/main/scripts/getGpusResources.sh

spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs

The specification of executor instances. You can configure this parameter to expand the local disks of executors and ensure sufficient disk capacity.

The following specifications are supported:

  • ecs.d1ne.2xlarge: 8 cores and 32 GB of memory.

  • ecs.d1ne.4xlarge: 16 cores and 64 GB of memory.

  • ecs.d1ne.6xlarge: 24 cores and 96 GB of memory.

  • ecs.d1ne.8xlarge: 32 cores and 128 GB of memory.

  • ecs.d1ne.14xlarge: 56 cores and 224 GB of memory.

    Note
    • Select the specification of executor instances based on your requirements.

    • You must configure the following two parameters together with this parameter, as shown in the sample configuration after this table:

      • spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var

      • spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0

    • In some cases, executors with the specified specification may be unavailable. If an error occurs during configuration, contact the technical support of Lindorm (DingTalk ID: s0s3eg3).

Default value: None
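
The following sample shows how these specification parameters might be combined when you submit a job, in the same key=value format as the beeline.conf example in the Configuration method section of this topic. This is a sketch only: the values are taken from the table above as illustrations, not as recommendations for your workload.

# Option 1: Use a preset executor specification.
# Do not configure spark.executor.cores, spark.executor.memory, or
# spark.executor.memoryOverhead together with spark.executor.resourceTag.
spark.executor.resourceTag=large-mem

# Option 2: Expand the local disks of executors.
# The two volume parameters must be configured together with the instance specification.
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.d1ne.2xlarge
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0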

Table 3. Capacity parameters

spark.executor.instances

The number of executors that are requested for the job.

Default value: 2

spark.dynamicAllocation.enabled

Specifies whether to enable dynamic resource allocation. Valid values:

  • true: Enable dynamic resource allocation.

  • false: Disable dynamic resource allocation.

After dynamic resource allocation is enabled, LDPS applies for and releases executors based on the real-time workload of the job.

Default value: true

spark.dynamicAllocation.minExecutors

The minimum number of executors when dynamic resource allocation is enabled.

Default value: 0

spark.dynamicAllocation.maxExecutors

The maximum number of executors when dynamic resource allocation is enabled.

Note

The maximum number of executors is the same as the specified number of concurrent tasks.

Default value: Infinity

spark.dynamicAllocation.executorIdleTimeout

The maximum idle period for executors when dynamic resource allocation is enabled. If an executor is idle for a time period longer than the specified value, the executor is released. Unit: seconds.

Default value: 600s
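
For example, the following sample configuration (in the same key=value format as the beeline.conf example in the Configuration method section) enables dynamic resource allocation and bounds the number of executors. The upper limit of 20 executors is an illustrative value, not a recommendation.

# Enable dynamic resource allocation and bound the number of executors.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
# Release an executor after it has been idle for 600 seconds.
spark.dynamicAllocation.executorIdleTimeout=600s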

Execution parameters

spark.speculation

Specifies whether to enable speculative execution. Valid values:

  • true: Enable speculative execution.

  • false: Disable speculative execution.

If the execution of a task takes a large amount of time, the driver resubmits the task to avoid long tails.

Note

Long tails indicate that the execution periods of some tasks are significantly longer than those of other tasks.

Default value: true

spark.task.maxFailures

The maximum number of failures that are allowed for a single task. If the number of times that a task fails exceeds this value, the job to which the task belongs fails.

Default value: 4

spark.dfsLog.executor.enabled

Specifies whether to store the logs of executors to LindormDFS. Valid values:

  • true: Store the logs of executors to LindormDFS.

  • false: Do not store the logs of executors to LindormDFS.

If your jobs in LDPS are large in scale, you can set this parameter to false to prevent excessive load on LindormDFS caused by log streams.

Default value: true

spark.jars

The path of the JAR package that is required when you submit a task. The value of this parameter can be a path in OSS or HDFS.

If you set this parameter to an OSS path, you must also configure the following parameters: spark.hadoop.fs.oss.endpoint, spark.hadoop.fs.oss.accessKeyId, spark.hadoop.fs.oss.accessKeySecret, and spark.hadoop.fs.oss.impl.

Important

If you use JDBC to connect to LDPS, this parameter can be set only to an HDFS path.

Default value: None

spark.hadoop.fs.oss.endpoint

The endpoint of OSS. For more information about how to obtain the endpoint, see Regions and OSS endpoints in the public cloud.

Default value: None

spark.hadoop.fs.oss.accessKeyId

The AccessKey ID of your Alibaba Cloud account or a RAM user of your Alibaba Cloud account.

For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair.

Default value: None

spark.hadoop.fs.oss.accessKeySecret

The AccessKey secret of your Alibaba Cloud account or a RAM user of your Alibaba Cloud account.

For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair.

Default value: None

spark.hadoop.fs.oss.impl

The class that is used to access OSS.

Set this parameter to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.

Default value: None

spark.default.parallelism

The default concurrency of non-SQL tasks, including the data source concurrency and shuffle concurrency.

Default value: None

spark.sql.shuffle.partitions

The default shuffle concurrency of SQL tasks.

Default value: 200
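
The following sample shows how spark.jars and the OSS-related parameters described in this table might be configured together, in the same key=value format as the beeline.conf example in the Configuration method section. This is a sketch only: the bucket path, endpoint, and AccessKey values are placeholders that you must replace with your own.

# JAR package stored in OSS. Replace the bucket and object path with your own.
spark.jars=oss://<yourBucketName>/path/to/your-job.jar
# OSS access configuration. Replace the endpoint and AccessKey pair with your own.
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=<yourAccessKeyId>
spark.hadoop.fs.oss.accessKeySecret=<yourAccessKeySecret>
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem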

Monitoring parameters

LDPS allows you to use custom parameters to monitor the status of the instance. You can configure these parameters to record the status of the drivers and executors in job logs.

spark.monitor.cmd

The command group for job monitoring. Separate multiple commands with semicolons (;). The commands specified by this parameter are executed in sequence at regular intervals. The execution results of the commands are recorded in job logs.

Sample monitoring commands:

  • Monitor the system status: top -b -n 1, vmstat.

  • Monitor the memory status: free -m.

  • Monitor the I/O status: iostat -d -x -c -k.

  • Monitor the disk status: df -h.

  • Monitor the network status: sar -n DEV 1 1, netstat.

Sample statements:

  • Configure a single monitoring command:

    "spark.monitor.cmd": "top -b -n 1"
  • Configure multiple monitoring commands:

    "spark.monitor.cmd": "top -b -n 1; vmstat; free -m; iostat -d -x -c -k; df -h; sar -n DEV 1 1; netstat"

Important

If you use Beeline or JDBC to submit a job, this parameter cannot be configured.

Default value: None

spark.monitor.interval

The interval at which the commands in the group are executed. Unit: seconds.

The commands specified by the spark.monitor.cmd parameter are executed at the interval specified by this parameter.

Default value: 60

spark.monitor.timeout

The timeout period for the monitoring commands. Unit: seconds.

If the execution time of a monitoring command in the command group specified by the spark.monitor.cmd parameter exceeds the specified timeout period, the command is skipped and the subsequent commands are executed. This way, monitoring information can be recorded in logs without being blocked.

Default value: 2
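
The following sample shows how the three monitoring parameters might be combined, in the quoted key-value style used in the sample statements for spark.monitor.cmd. The commands, interval, and timeout are illustrative values only; as noted above, these parameters cannot be configured if you submit the job by using Beeline or JDBC.

"spark.monitor.cmd": "free -m; df -h"
"spark.monitor.interval": "30"
"spark.monitor.timeout": "5"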

Parameters related to open source Spark

For more information about parameters related to open source Spark, see Spark Configuration.

Configuration method

When you submit jobs to LDPS, you can configure custom parameters. The configuration method varies based on the method that you use to submit jobs.

Beeline

You can configure parameters by modifying the conf/beeline.conf configuration file in the Spark package in which the Beeline command-line tool is located. For more information, see Getting started.

The following code shows a sample configuration file:

# Endpoint of Lindorm Compute Engine, e.g. jdbc:hive2://123.456.XX.XX:10009/;?token=bb8a15-jaksdj-sdfjsd-ak****
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****

# Username for connection, by default root.
user=root

# Password for connection, by default root.
password=root

# Whether to share Spark resource between different sessions, by default true.
shareResource=false

# Normal Spark configurations
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3

JDBC

You can configure parameters by using the JDBC connection string. For more information about the JDBC URL, see Use JDBC in application development.

For example, you can use a JDBC connection string to set the default number of Spark shuffle partitions to 2 and the memory space used by the executor to 8 GB.

jdbc:hive2://${host}:${port}/;?token=${token};spark.executor.memory=8g;spark.sql.shuffle.partitions=2

JAR

  • You can configure parameters for a Java job based on a job content template when you submit the Java job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.

  • When you use DMS to submit a Java job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.

Python

  • You can configure parameters for a Python job based on a job content template when you submit the Python job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.

  • When you use DMS to submit a Python job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.