This topic describes the general parameter configurations for Spark clients across different versions.
MaxCompute account parameter configurations
| Parameter | Description |
| --- | --- |
| | The MaxCompute project name. If you submit jobs through DataWorks, the default value is used and no configuration is required. |
| | The AccessKey ID that has access permissions on the target MaxCompute project. You can obtain it on the AccessKey Management page. If you submit jobs through DataWorks, the default value is used and no configuration is required. |
| | The AccessKey secret that corresponds to the AccessKey ID. If you submit jobs through DataWorks, the default value is used and no configuration is required. |
| | The STS token for the MaxCompute project. If you submit jobs through DataWorks, the default value is used and no configuration is required. |
| | The Cloud Product Interconnection endpoint for the region where MaxCompute resides (for example, the service interconnection endpoint for the China (Hangzhou) region). |
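As an illustration, the account parameters above are typically placed in spark-defaults.conf. The property names in this sketch (spark.hadoop.odps.project.name, spark.hadoop.odps.access.id, spark.hadoop.odps.access.key, and spark.hadoop.odps.end.point) are the commonly documented MaxCompute Spark client keys; verify them against your client version, and treat the values as placeholders:

```properties
# Sketch of account settings in spark-defaults.conf.
# Property names are the commonly used MaxCompute Spark client keys;
# all values below are placeholders.
spark.hadoop.odps.project.name = my_project
spark.hadoop.odps.access.id    = <your-AccessKey-ID>
spark.hadoop.odps.access.key   = <your-AccessKey-secret>
spark.hadoop.odps.end.point    = <MaxCompute-endpoint-for-your-region>
```

When jobs are submitted through DataWorks, these keys can be left unset, as described in the table above.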
MaxCompute Spark job submission, version, and log configurations
| Parameter | Description |
| --- | --- |
Spark resource allocation configurations
| Parameter | Description |
| --- | --- |
| | The total number of Executor processes that the Spark application launches in the cluster. Default value: 1. |
| | The number of CPU cores available to each Executor process. Default value: 1. |
| | The total memory per Executor process, including heap and off-heap memory. Default value: 2g. |
| | The number of CPU cores used by the Driver process. Default value: 1. |
| | The total memory for the Driver process. Default value: 2g. |
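As an illustration, a spark-defaults.conf fragment that overrides the defaults above might look like the following. These are standard Spark resource-allocation properties; the values are arbitrary examples and should be tuned to your workload:

```properties
# Sketch: resource allocation for a modest job; values are examples only.
spark.executor.instances = 4
spark.executor.cores     = 2
spark.executor.memory    = 4g
spark.driver.cores       = 1
spark.driver.memory      = 2g
```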
MaxCompute read and write configurations
The following configurations that start with spark.sql.catalog.odps apply only to Spark 3.x versions.
| Parameter | Description |
| --- | --- |
MaxCompute data interoperability configurations
spark.hadoop.odps.cupid.resources
You must configure this parameter in the spark-defaults.conf file or as a DataWorks configuration item. Do not configure it in your code.
Description:
Specifies the MaxCompute resources that a job requires at run time, in the format <projectname>.<resourcename>. To specify multiple resources, separate them with commas. The specified resources are downloaded to the current working directory (/workdir) of the Driver and each Executor. After the download completes, the default filename is <projectname>.<resourcename>. Compressed resources are extracted automatically, and the name of the top-level directory matches the name of the original archive. For example, if a resource is named examples.tar.gz and is not renamed, its contents are extracted to the /workdir/examples.tar.gz/sub/... path. If you rename the resource to examples, its contents are extracted to the /workdir/examples/sub/... path. The exact path depends on the name of the archive and its internal directory structure.
Example:
spark.hadoop.odps.cupid.resources = public.python-python-2.7-ucs4.zip,public.myjar.jar
Rename resources: To rename a resource during configuration, use the format <projectname>.<resourcename>:<newresourcename>.
Example of renaming:
spark.hadoop.odps.cupid.resources = public.myjar.jar:myjar.jar
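The naming rule above can be summarized in a small helper. This is an illustrative sketch (the function name is ours, not part of any MaxCompute API): it returns the filename under the working directory where a resource lands, applying the rename rule when a :<newresourcename> suffix is present.

```python
def resource_local_name(resource_spec: str) -> str:
    """Return the filename under the working directory (/workdir) where a
    resource listed in spark.hadoop.odps.cupid.resources is placed.

    Per the rules above: the default filename is "<projectname>.<resourcename>";
    a "<projectname>.<resourcename>:<newresourcename>" spec uses the new name.
    This helper is illustrative only, not part of the MaxCompute client API.
    """
    spec, _, new_name = resource_spec.partition(":")
    return new_name if new_name else spec


# The document's two examples:
print(resource_local_name("public.myjar.jar"))            # public.myjar.jar
print(resource_local_name("public.myjar.jar:myjar.jar"))  # myjar.jar
```

For an archive such as examples.tar.gz, the same name is also the top-level extraction directory, so its contents appear under /workdir/examples.tar.gz/... unless the resource is renamed.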
Other MaxCompute configurations
| Parameter | Description |
| --- | --- |
| | Configures VPC settings. For details, see Accessing Alibaba Cloud VPC. |
| | No default value. Configure this parameter if your Spark cluster cannot reach Alibaba Cloud service interconnection endpoints over the network. For details, see Accessing Alibaba Cloud OSS. |