
MaxCompute:Spark 2.3.0 Usage

Last Updated: Mar 12, 2026

This topic describes the configuration required to use Spark 2.3.0.

Important

We recommend that you use Spark 3 or later whenever possible.

Submit Tasks

  • When you submit tasks by using the Spark client, download the client if you have not already done so, and then add the following parameters to specify the version.

    # Enable kube mode and event log
    spark.hadoop.odps.kube.mode=true
    spark.hadoop.odps.cupid.data.proxy.enable=true
    spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
    spark.hadoop.odps.spark.version=spark-2.3.0-odps0.47.0
    spark.hadoop.odps.spark.libs.public.enable=true
    spark.eventLog.enabled=true
    spark.eventLog.dir=/workdir/eventlog/
    
    # Read and write MaxCompute
    spark.sql.catalogImplementation=odps
  • When you submit tasks by using a DataWorks node, select Spark 2.x and add the following parameter to specify the version.

    spark.hadoop.odps.spark.version=spark-2.3.0-odps0.47.0
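Taken together, a client-side submission might look like the following sketch. The master setting, main class, and JAR name are placeholders of ours, not from this topic; only the `--conf` values mirror the parameters listed above.

```shell
# Hypothetical spark-submit invocation from the MaxCompute Spark client.
# --master, --class, and the JAR name are placeholders; adjust for your job.
./bin/spark-submit \
  --master yarn-cluster \
  --class com.example.SparkApp \
  --conf spark.hadoop.odps.kube.mode=true \
  --conf spark.hadoop.odps.cupid.data.proxy.enable=true \
  --conf spark.hadoop.odps.cupid.fuxi.shuffle.enable=true \
  --conf spark.hadoop.odps.spark.version=spark-2.3.0-odps0.47.0 \
  --conf spark.hadoop.odps.spark.libs.public.enable=true \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/workdir/eventlog/ \
  --conf spark.sql.catalogImplementation=odps \
  your-app.jar
```

The same key=value pairs can instead be placed in the client's `spark-defaults.conf` so that every submission picks them up automatically.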

Parameter Settings

  • spark.sql.catalogImplementation

    Value: odps

    Description: Enables Spark SQL to read and write MaxCompute tables.

  • spark.hadoop.odps.cupid.vectorization.enable

    Value: true

    Description: When set to true, batch read/write optimization is used.

  • spark.hadoop.odps.input.split.size

    Value: 256 (default)

    Description: Adjusts the concurrency for reading MaxCompute tables. Each input split (partition) is 256 MB by default.
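To illustrate how the split size drives read concurrency, the arithmetic below estimates the number of input splits (and hence read tasks) produced when scanning a table of a given size. The helper name and the 1 TB example figure are ours for illustration; only the 256 MB default comes from the parameter description above.

```python
import math

def estimated_read_splits(table_size_mb: int, split_size_mb: int = 256) -> int:
    """Estimate how many input splits (read tasks) a table scan produces,
    assuming one split per split_size_mb of data (256 MB default)."""
    return math.ceil(table_size_mb / split_size_mb)

# A 1 TB (1,048,576 MB) table with the default 256 MB split size:
print(estimated_read_splits(1_048_576))        # → 4096 splits
# Halving the split size doubles the read concurrency:
print(estimated_read_splits(1_048_576, 128))   # → 8192 splits
```

A smaller split size raises parallelism at the cost of more task-scheduling overhead; a larger one does the opposite.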