
MaxCompute:Use Spark 2.4.5

Last Updated:Mar 13, 2026

This topic describes the configuration settings required when using Spark version 2.4.5.

Important

We recommend that you use Spark version 3 or later.

Submit tasks

  • When you submit tasks using the Spark client, specify the version by adding the following parameters. Download the client from here.

    # Enable kube mode and event logging
    spark.hadoop.odps.kube.mode=true
    spark.hadoop.odps.cupid.data.proxy.enable=true
    spark.hadoop.odps.cupid.fuxi.shuffle.enable=true
    spark.hadoop.odps.spark.version=spark-2.4.5-odps0.47.0
    spark.hadoop.odps.spark.libs.public.enable=true
    spark.eventLog.enabled=true
    spark.eventLog.dir=/workdir/eventlog/
    
    # Read from and write to MaxCompute
    spark.sql.catalogImplementation=hive
    spark.sql.sources.default=hive
  • When you submit tasks using a DataWorks node, specify the version by adding the following parameter.

    spark.hadoop.odps.spark.version=spark-2.4.5-odps0.47.0
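As an alternative to editing spark-defaults.conf, the version parameters above can be passed inline with --conf at submission time. The following is a sketch only: the application JAR path and main class are placeholders, not values from this topic.

```shell
# Sketch of a client-side submission that pins Spark 2.4.5 via --conf.
# The JAR path and main class below are placeholders for your application.
./bin/spark-submit \
  --master yarn-cluster \
  --conf spark.hadoop.odps.spark.version=spark-2.4.5-odps0.47.0 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.sources.default=hive \
  --class com.example.SparkApp \
  /path/to/your-spark-app.jar
```

Parameters set with --conf take effect for that submission only; settings you want to apply to every job belong in spark-defaults.conf.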

Parameter settings

  • spark.sql.catalogImplementation

    Set to hive. Required to read from and write to MaxCompute tables.

  • spark.sql.sources.default

    Set to hive. Required to read from and write to MaxCompute tables.

  • spark.sql.odps.columnarReaderBatchSize

    Default value is 4096. The number of rows in each batch for vectorized reading.

  • spark.sql.odps.enableVectorizedReader

    Default value is true. Specifies whether vectorized reading is enabled.

  • spark.sql.odps.enableVectorizedWriter

    Default value is true. Specifies whether vectorized writing is enabled.

  • spark.sql.odps.split.size

    Default value is 256m. This parameter controls the concurrency level when reading MaxCompute tables. By default, each split is 256 MB.

  • spark.hadoop.odps.cupid.vnet.capacity

    Default value is 256. This parameter sets the maximum number of instances. Set it to spark.executor.instances + 2. Otherwise, you might encounter a "create virtual net failed" error. Add this parameter to spark-defaults.conf or the DataWorks configuration item.
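To see how spark.sql.odps.split.size maps to read concurrency, the following is a rough sketch. It assumes splits are cut purely by size, so the real split count produced by the reader may differ; the function name is illustrative, not part of any API.

```python
import math

def estimated_splits(table_size_mb: float, split_size_mb: int = 256) -> int:
    """Rough estimate of read tasks for a MaxCompute table:
    one split per split_size_mb of data (actual splitting may differ)."""
    return max(1, math.ceil(table_size_mb / split_size_mb))

# A 1 TB table with the default 256 MB split size yields about 4096 read tasks.
print(estimated_splits(1024 * 1024))       # -> 4096
# Halving the split size roughly doubles the read concurrency.
print(estimated_splits(1024 * 1024, 128))  # -> 8192
```

Smaller splits increase parallelism at the cost of more scheduling overhead; larger splits do the opposite.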