【MaxCompute FAQ】MaxCompute Spark

Common configuration and usage issues of Spark on MaxCompute

1. In MaxCompute, how do I pass a node task's scheduling parameters, such as bizdate, to a Spark program as input arguments?
Yes, you can reference the scheduling parameters directly in the Spark node's arguments; refer to the documentation.
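
For illustration, a minimal Scala sketch of the receiving side, assuming the Spark node's arguments field is set to the scheduling parameter (for example ${bizdate}); the object name is hypothetical:

    object SparkArgsDemo {
      def main(args: Array[String]): Unit = {
        // With the node's arguments configured as "${bizdate}", the resolved
        // value (e.g. 20240101) arrives here as an ordinary program argument.
        val bizdate = args(0)
        println(s"bizdate = $bizdate")
      }
    }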

2. Is there any reference documentation or sample code for MaxCompute Spark reading streaming data from DataHub and writing it to MaxCompute?
Yes, documentation is available.
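
The overall shape is roughly as follows. This is only a structural sketch: a socket stream stands in for the DataHub source (the real input DStream comes from the connector described in the documentation), and target_table is a hypothetical, pre-created MaxCompute table:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamToMaxComputeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StreamToMaxComputeSketch")
          .getOrCreate()
        val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

        // Stand-in source; replace with the DataHub stream from the connector.
        val lines = ssc.socketTextStream("localhost", 9999)

        lines.foreachRDD { rdd =>
          import spark.implicits._
          // Append each micro-batch to the (hypothetical) target table.
          rdd.toDF("value").write.insertInto("target_table")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }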

3. How do I debug MaxCompute Spark locally?
You can debug MaxCompute Spark locally in IntelliJ IDEA; refer to the documentation.
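
A minimal sketch of a local-mode session for debugging in the IDE, with placeholder project and credential values (the full configuration your job needs is in the documentation):

    import org.apache.spark.sql.SparkSession

    object LocalDebugSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LocalDebugSketch")
          .master("local[4]") // run inside the IDE instead of the cluster
          // Placeholders: fill in your own project, credentials and endpoint.
          .config("spark.hadoop.odps.project.name", "your_project")
          .config("spark.hadoop.odps.access.id", "your_access_key_id")
          .config("spark.hadoop.odps.access.key", "your_access_key_secret")
          .config("spark.hadoop.odps.end.point", "http://service.cn.maxcompute.aliyun.com/api")
          .getOrCreate()

        spark.sql("SELECT 1").show()
        spark.stop()
      }
    }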

4. Can Spark programs process table data on MaxCompute?
Yes. MaxCompute Spark currently supports three run modes: Local mode, Cluster mode, and execution in DataWorks.
The three modes require different configurations; please refer to the official documentation.
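
Once the configuration for your chosen mode is in place, MaxCompute tables can be read through Spark SQL. A minimal sketch, where my_table is a hypothetical table in the current project:

    import org.apache.spark.sql.SparkSession

    object ReadTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ReadTableSketch")
          .getOrCreate()
        // "my_table" stands for an existing table in your MaxCompute project.
        spark.sql("SELECT * FROM my_table LIMIT 10").show()
        spark.stop()
      }
    }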

5. Which versions of native Spark does MaxCompute Spark currently support?
Currently, Spark-1.6.3, Spark-2.3.0, and Spark-2.4.5 are supported. For how to use Spark on MaxCompute, please refer to the community article.

6. How do I migrate open source Spark code to Spark on MaxCompute? There are three situations:
•The job does not need to access MaxCompute tables or OSS. Your Jar package can be run directly. For details, see Setting Up a Development Environment. Note that dependencies on Spark or Hadoop must be set to provided (see the sketch after this list).
•The job needs to access MaxCompute tables. Configure the relevant dependencies and repackage the Jar. For the steps, see Setting Up a Development Environment.
•The job needs to access OSS. Configure the relevant dependencies and repackage the Jar. For the steps, see Setting Up a Development Environment.
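
As an example of the provided requirement: if you build with sbt instead of the Maven setup shown in the documentation, the dependency scopes look roughly like this (versions are illustrative only):

    // build.sbt: mark Spark artifacts as "provided" so they are not bundled
    // into the job Jar; the platform supplies them at runtime.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided"
    )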

7. The AccessKey ID and AccessKey Secret provided in spark-defaults.conf are wrong. Stack:
com.aliyun.odps.OdpsException: ODPS-0410042: Invalid signature value - User signature dose not match


Check whether the AccessKey ID and AccessKey Secret configured in spark-defaults.conf match the AccessKey ID and AccessKey Secret shown in user information management in the Alibaba Cloud console (https://account.aliyun.com/login/login.htm?oauth_callback=https://usercenter.console.aliyun.com/).
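
For reference, the credential-related entries in spark-defaults.conf look like the following sketch (all values are placeholders):

    # spark-defaults.conf
    spark.hadoop.odps.project.name  your_project
    spark.hadoop.odps.access.id     your_access_key_id
    spark.hadoop.odps.access.key    your_access_key_secret
    spark.hadoop.odps.end.point     http://service.cn.maxcompute.aliyun.com/api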

8. Error stack: com.aliyun.odps.OdpsException: ODPS-0420095: Access Denied - Authorization Failed [4019], You have NO privilege 'odps:CreateResource' on {acs:odps:*:projects/*}


Ask the project owner to grant you the Read and Create permissions on resources in the project.
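
For example, the project owner could run something like the following in odpscmd (project and account names are placeholders, and the exact action list may vary with your scenario):

    -- Grant project-level Read and CreateResource to the failing account.
    grant Read, CreateResource ON PROJECT your_project TO USER ALIYUN$user@example.com;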

9. Running error: No space left on device

Spark uses network disks for local storage; both Shuffle data and the data that BlockManager spills to disk are stored there. The disk size is controlled by the parameter spark.hadoop.odps.cupid.disk.driver.device_size; the default is 20 GB and the maximum is 100 GB.
If the error is still reported after increasing the size to 100 GB, analyze the root cause. A common one is data skew: data is concentrated in a few blocks during Shuffle or Cache. In that case, reduce the concurrency of a single Executor (spark.executor.cores) and increase the number of Executors (spark.executor.instances), for example as sketched below.
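
A sketch of the relevant spark-defaults.conf entries (the exact numbers depend on your job):

    # Enlarge the local network disk (default 20g, maximum 100g).
    spark.hadoop.odps.cupid.disk.driver.device_size  100g
    # Mitigate skew: fewer cores per executor, more executor instances.
    spark.executor.cores  2
    spark.executor.instances  20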
