
MaxCompute:Spark 3.4.2 configuration

Last Updated:Mar 12, 2026

This topic describes the configurations required to use Spark 3.4.2 and Spark 3.5.2.

Submit tasks

Use cluster mode

  • Submit tasks using the Spark client.

    Add the following parameters to specify the version. Download the client: Download Spark 3.4.2 or Download Spark 3.5.2.

    # Enable kube mode and event log
    spark.hadoop.odps.kube.mode=true
    spark.hadoop.odps.cupid.data.proxy.enable=true
    spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
    
    ## for spark 3.4.2
    spark.hadoop.odps.spark.version=spark-3.4.2-odps0.48.0
    
    ## for spark 3.5.2
    spark.hadoop.odps.spark.version=spark-3.5.2-odps0.49.0
    spark.hadoop.odps.spark.libs.public.enable=true
    spark.eventLog.enabled=true
    spark.eventLog.dir=/workdir/eventlog/
    
    # For reading and writing to MaxCompute
    spark.sql.defaultCatalog=odps
    spark.sql.catalog.odps=org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
    spark.sql.sources.partitionOverwriteMode=dynamic
    spark.sql.extensions=org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
  • Submit tasks using a DataWorks node. Add the following parameters to specify the version.

    ## for spark 3.4.2
    spark.hadoop.odps.spark.version=spark-3.4.2-odps0.48.0
    
    ## for spark 3.5.2
    spark.hadoop.odps.spark.version=spark-3.5.2-odps0.49.0
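To avoid mixing the flags for the two versions, the version-specific settings above can be assembled programmatically before submission. A minimal Python sketch (the property names and version strings come from this topic; the helper itself and the `--conf` argument formatting are illustrative, assuming submission via a spark-submit-style command line):

```python
# Common client-side flags, as listed above.
BASE_CONF = {
    "spark.hadoop.odps.kube.mode": "true",
    "spark.hadoop.odps.cupid.data.proxy.enable": "true",
    "spark.hadoop.odps.cupid.fuxi.shuffle.enable": "true",
}

# Version-specific flags. libs.public.enable appears only in the
# 3.5.2 configuration above.
VERSION_CONF = {
    "3.4.2": {"spark.hadoop.odps.spark.version": "spark-3.4.2-odps0.48.0"},
    "3.5.2": {
        "spark.hadoop.odps.spark.version": "spark-3.5.2-odps0.49.0",
        "spark.hadoop.odps.spark.libs.public.enable": "true",
    },
}

def submit_args(version: str) -> list[str]:
    """Return the combined settings as spark-submit --conf flags."""
    conf = {**BASE_CONF, **VERSION_CONF[version]}
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

print(" ".join(submit_args("3.5.2")))
```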

Parameter settings

| Parameter Name | Value | Description |
| --- | --- | --- |
| spark.sql.defaultCatalog | Set the value to odps. | N/A |
| spark.sql.catalog.odps | Set the value to org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog. | N/A |
| spark.sql.sources.partitionOverwriteMode | Set the value to dynamic. | N/A |
| spark.sql.extensions | Set the value to org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions. | N/A |
| spark.sql.catalog.odps.enableNamespaceSchema | The default value is false. | If the schema-level syntax switch is enabled for the MaxCompute project, set this parameter to true. |
| spark.sql.catalog.odps.enableVectorizedReader | The default value is true. | Enables vectorized reading. |
| spark.sql.catalog.odps.enableVectorizedWriter | The default value is true. | Enables vectorized writing. |
| spark.sql.catalog.odps.splitSizeInMB | The default value is 256. | Adjusts the concurrency for reading MaxCompute tables. By default, each split reads 256 MB. |
| spark.sql.catalog.odps.tableReadProvider | The default value is v1. | Set this parameter to tunnel when using local mode. |
| spark.sql.catalog.odps.tableWriteProvider | The default value is v1. | Set this parameter to tunnel when using local mode. |
| spark.hadoop.odps.spark.alinux3.enabled | The default value is false. | If set to true, cluster mode uses the Alibaba Cloud Linux 3 (Alinux 3) base runtime image and Python 3.11. |
| spark.hadoop.odps.native.engine.enable | The default value is false. | If set to true, Native Engine is used to accelerate computation in cluster mode. Native Engine uses the Alinux 3 base image by default. |
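As a sketch of how the parameters above map to session configuration, the snippet below collects the read/write settings into key-value pairs and shows a local-mode variant that switches the providers to tunnel. The PySpark builder calls are shown only as comments since they require a pyspark installation; everything outside the documented property names and values is illustrative:

```python
# Catalog parameters from the table above. Defaults are spelled out
# for clarity; override them only when the description calls for it.
CATALOG_CONF = {
    "spark.sql.defaultCatalog": "odps",
    "spark.sql.catalog.odps": (
        "org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog"
    ),
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.sql.extensions": (
        "org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions"
    ),
    "spark.sql.catalog.odps.enableVectorizedReader": "true",
    "spark.sql.catalog.odps.enableVectorizedWriter": "true",
    "spark.sql.catalog.odps.splitSizeInMB": "256",
}

def local_mode(conf: dict) -> dict:
    """Return a copy adjusted for local mode (tunnel providers)."""
    out = dict(conf)
    out["spark.sql.catalog.odps.tableReadProvider"] = "tunnel"
    out["spark.sql.catalog.odps.tableWriteProvider"] = "tunnel"
    return out

# Usage with PySpark (requires a pyspark installation):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("odps-demo")
# for k, v in local_mode(CATALOG_CONF).items():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
```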