This topic describes the configuration required to use Spark 3.1.1.
Submit jobs
Spark client: To submit a job, you must add the following parameters to specify the version. You can download the client.
```
# Enable kube mode and the event log
spark.hadoop.odps.kube.mode=true
spark.hadoop.odps.cupid.data.proxy.enable=true
spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
spark.hadoop.odps.spark.version=spark-3.1.1-odps0.47.0
spark.hadoop.odps.spark.libs.public.enable=true
spark.eventLog.enabled=true
spark.eventLog.dir=/workdir/eventlog/
# For reading from and writing to MaxCompute
spark.sql.defaultCatalog=odps
spark.sql.catalog.odps=org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
spark.sql.sources.partitionOverwriteMode=dynamic
spark.sql.extensions=org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
```
Cluster mode: To run a PySpark job, you must add the following parameters to use Python 3.
```
spark.hadoop.odps.cupid.resources=public.python-3.7.9-ucs4.tar.gz
spark.pyspark.python=./public.python-3.7.9-ucs4.tar.gz/python-3.7.9-ucs4/bin/python3
```
DataWorks node: To submit a job, you can simply select Spark 3.x.
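Instead of placing the properties in a configuration file, they can also be passed at submission time with `--conf`. The following is only a sketch: the application JAR, main class, and cluster-manager flag are placeholders and are not part of this topic.

```shell
# Hypothetical submit command. The --class value and JAR path are
# placeholders; the actual cluster-manager flag depends on your client setup.
./bin/spark-submit \
  --conf spark.hadoop.odps.kube.mode=true \
  --conf spark.hadoop.odps.spark.version=spark-3.1.1-odps0.47.0 \
  --conf spark.hadoop.odps.spark.libs.public.enable=true \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/workdir/eventlog/ \
  --class com.example.SparkApp \
  /path/to/your-spark-app.jar
```

Properties given on the command line override those in the client's configuration file, which is convenient for testing a version switch without editing shared settings.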
Parameter settings
| Parameter Name | Value | Description |
| --- | --- | --- |
|  | Set this parameter to |  |
|  | Set this parameter to |  |
|  | Set this parameter to |  |
|  | Set this parameter to |  |
|  | Default value: | If schema-level syntax is enabled for the MaxCompute project, set this parameter to true. |
|  | Default value: | Enables vectorized reads. |
|  | Default value: | Enables vectorized writes. |
|  | Default value: | Adjusts the concurrency for reading MaxCompute tables. The default partition size is 256 MB. |
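With `spark.sql.defaultCatalog=odps` and `spark.sql.sources.partitionOverwriteMode=dynamic` configured as shown earlier, SQL statements address MaxCompute tables directly, and `INSERT OVERWRITE` replaces only the partitions produced by the query. A sketch using the `spark-sql` CLI; the project and table names are hypothetical placeholders.

```shell
# Hypothetical session: table and partition names below are placeholders.
# Read a MaxCompute table through the configured odps catalog.
spark-sql -e "SELECT * FROM my_project.sales LIMIT 10;"

# With dynamic partition overwrite, only the dt partitions present in the
# query result are replaced; other partitions of the target table are kept.
spark-sql -e "INSERT OVERWRITE TABLE my_project.sales PARTITION (dt)
              SELECT amount, dt FROM my_project.staging_sales;"
```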