
MaxCompute:Use Spark 3.1.1

Last Updated:Mar 16, 2026

This topic describes the configurations required to use Spark 3.1.1.

Submit jobs

  • Spark client: To submit a job, add the following parameters to the client configuration to specify the Spark version. You can download the client.

    # Enables kube mode and eventlog
    spark.hadoop.odps.kube.mode=true
    spark.hadoop.odps.cupid.data.proxy.enable=true
    spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
    spark.hadoop.odps.spark.version=spark-3.1.1-odps0.47.0
    spark.hadoop.odps.spark.libs.public.enable=true
    spark.eventLog.enabled=true
    spark.eventLog.dir=/workdir/eventlog/
    
    # For reading from and writing to MaxCompute
    spark.sql.defaultCatalog=odps
    spark.sql.catalog.odps=org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
    spark.sql.sources.partitionOverwriteMode=dynamic
    spark.sql.extensions=org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
  • Cluster mode: To run a PySpark job, you must add the following parameters to use Python 3.

    spark.hadoop.odps.cupid.resources = public.python-3.7.9-ucs4.tar.gz
    spark.pyspark.python = ./public.python-3.7.9-ucs4.tar.gz/python-3.7.9-ucs4/bin/python3
  • DataWorks node: To submit a job from DataWorks, select Spark 3.x as the Spark version.
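The parameters above can also be passed directly on the spark-submit command line instead of being placed in spark-defaults.conf. The following sketch illustrates this under stated assumptions: the master and deploy-mode settings, the entry class com.example.SparkApp, and all file paths are placeholders that depend on your environment, not values from this topic.

```shell
# Sketch: submit a JAR job with the Spark 3.1.1 client.
# Master, class name, and paths are placeholders; adjust to your setup.
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.odps.kube.mode=true \
  --conf spark.hadoop.odps.spark.version=spark-3.1.1-odps0.47.0 \
  --conf spark.sql.defaultCatalog=odps \
  --conf spark.sql.catalog.odps=org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog \
  --conf spark.sql.extensions=org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions \
  --class com.example.SparkApp \
  /path/to/your-spark-app.jar

# Sketch: submit a PySpark job in cluster mode with Python 3,
# using the public Python resource described above.
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.odps.cupid.resources=public.python-3.7.9-ucs4.tar.gz \
  --conf spark.pyspark.python=./public.python-3.7.9-ucs4.tar.gz/python-3.7.9-ucs4/bin/python3 \
  /path/to/your_job.py
```

Command-line --conf values override entries in spark-defaults.conf, which can be convenient when testing a new Spark version without changing the shared client configuration.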

Parameter settings

  • spark.sql.defaultCatalog: Set this parameter to odps.
  • spark.sql.catalog.odps: Set this parameter to org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog.
  • spark.sql.sources.partitionOverwriteMode: Set this parameter to dynamic.
  • spark.sql.extensions: Set this parameter to org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions.
  • spark.sql.catalog.odps.enableNamespaceSchema: Default value: false. If schema-level syntax is enabled for the MaxCompute project, set this parameter to true.
  • spark.sql.catalog.odps.enableVectorizedReader: Default value: true. Enables vectorized reads.
  • spark.sql.catalog.odps.enableVectorizedWriter: Default value: true. Enables vectorized writes.
  • spark.sql.catalog.odps.splitSizeInMB: Default value: 256. Controls the concurrency for reading MaxCompute tables by setting the split size. The default split size is 256 MB; a smaller value produces more splits and higher read concurrency.
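To illustrate the effect of setting spark.sql.sources.partitionOverwriteMode to dynamic: an INSERT OVERWRITE into a partitioned table then replaces only the partitions that the job actually writes, rather than truncating the whole table. A minimal sketch follows; the table names sales and staging_sales and the partition column dt are hypothetical and assume the tables already exist in the odps catalog.

```shell
# Sketch: dynamic partition overwrite (table and column names are hypothetical).
./bin/spark-sql \
  --conf spark.sql.sources.partitionOverwriteMode=dynamic \
  -e "INSERT OVERWRITE TABLE sales PARTITION (dt) SELECT amount, dt FROM staging_sales"
# Only the dt partitions present in staging_sales are replaced;
# other existing partitions of sales are left untouched.
```

With the default static mode, the same statement would first clear every partition matched by the PARTITION clause, which is rarely what incremental jobs want.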