This topic describes the configuration settings required when using Spark version 2.4.5.
We recommend that you use Spark version 3 or later.
Submit tasks
Download the Spark client before you submit tasks. When you submit tasks by using the Spark client, specify the version by adding the following parameters.
```
# Enable kube mode and event logging
spark.hadoop.odps.kube.mode=true
spark.hadoop.odps.cupid.data.proxy.enable=true
spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
spark.hadoop.odps.spark.version=spark-2.4.5-odps0.47.0
spark.hadoop.odps.spark.libs.public.enable=true
spark.eventLog.enabled=true
spark.eventLog.dir=/workdir/eventlog/
# Read from and write to MaxCompute
spark.sql.catalogImplementation=hive
spark.sql.sources.default=hive
```

When you submit tasks by using a DataWorks node, specify the version by adding the following parameter.

```
spark.hadoop.odps.spark.version=spark-2.4.5-odps0.47.0
```
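For reference, the client-mode settings above can be assembled into `--conf` flags for a `spark-submit` invocation. A minimal Python sketch; the job file name `your_job.py` is a placeholder, not part of the original configuration:

```python
# Sketch: build spark-submit --conf flags from the Spark 2.4.5 settings above.
confs = {
    "spark.hadoop.odps.kube.mode": "true",
    "spark.hadoop.odps.cupid.data.proxy.enable": "true",
    "spark.hadoop.odps.cupid.fuxi.shuffle.enable": "true",
    "spark.hadoop.odps.spark.version": "spark-2.4.5-odps0.47.0",
    "spark.hadoop.odps.spark.libs.public.enable": "true",
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "/workdir/eventlog/",
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.sources.default": "hive",
}

cmd = ["spark-submit"]
for key, value in confs.items():
    cmd += ["--conf", f"{key}={value}"]
cmd.append("your_job.py")  # placeholder job file, not from the original doc

print(" ".join(cmd))
```

Keeping the settings in one dictionary makes it easy to diff them against a DataWorks node configuration, which needs only the `spark.hadoop.odps.spark.version` entry.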
Parameter settings
| Parameter name | Value | Description |
| --- | --- | --- |
|  | Set to  |  |
|  | Set to  |  |
|  | Default value is  | Number of rows in each batch for vectorized reading. |
|  | Default value is  | Specifies whether to enable vectorized reading. |
|  | Default value is  | Specifies whether to enable vectorized writing. |
|  | Default value is  | Controls the concurrency level when reading MaxCompute tables. By default, each partition is 256 MB. |
|  | Default value is  |  |
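To make the 256 MB default split size concrete: read concurrency is roughly the table size divided by the partition size, rounded up. A minimal sketch; the helper name is ours for illustration, not an API of the product:

```python
# Sketch: estimate how many read tasks are spawned for a MaxCompute table,
# assuming the default 256 MB split size described in the table above.
def estimate_read_tasks(table_size_mb: int, split_size_mb: int = 256) -> int:
    # Ceiling division: every started split becomes one partition/task.
    return max(1, -(-table_size_mb // split_size_mb))

print(estimate_read_tasks(1024))  # a 1 GB table yields 4 read tasks
```

Lowering the split size increases concurrency (more, smaller tasks); raising it does the opposite, which is the usual lever when a read stage has too many tiny tasks.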