This topic provides answers to frequently asked questions about Spark on MaxCompute.

How should I migrate open-source Spark code to Spark on MaxCompute?

How you migrate open-source Spark code to Spark on MaxCompute depends on the access requirements of your Spark on MaxCompute jobs. Specifically, consider the following:
  • If your Spark on MaxCompute jobs require access to MaxCompute tables or OSS, add the required dependencies, repackage your code, and then upload the packages to Spark on MaxCompute.
  • If your Spark on MaxCompute jobs do not require such access, migrate the code by running the JAR packages directly on Spark on MaxCompute. For this method, you must set the scope parameter to provided for the corresponding Spark or Hadoop modules so that they are not bundled into your JAR.
For more information on migrating Spark code, see Set up a Spark on MaxCompute development environment.
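For the second method, the Spark and Hadoop dependencies can be marked as provided in your Maven pom.xml so that the build excludes them from the packaged JAR and the Spark on MaxCompute runtime supplies them instead. A minimal sketch (the artifact and version values are illustrative placeholders, not prescribed by this topic):

```xml
<!-- Sketch of a pom.xml dependency entry; adjust artifactId and version to your build. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <!-- "provided": the dependency is available at compile time
         but is NOT packaged into the JAR you upload. -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```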

How do I use Spark on MaxCompute to access services in a VPC?

If you want to access services in a VPC, open a ticket.

What can I do if the ID and key in the spark-defaults.conf file are incorrect?

If you receive an error message similar to the following:

Stack:
com.aliyun.odps.OdpsException: ODPS-0410042:
Invalid signature value - User signature dose not match

This error indicates that the AccessKey ID or AccessKey secret configured in the spark-defaults.conf file is invalid. Log on to the Alibaba Cloud Management Console, obtain your AccessKey ID and AccessKey secret from the AccessKey Management page, and then update the values in the spark-defaults.conf file accordingly.
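The credential-related section of spark-defaults.conf typically looks like the following sketch (the project name, AccessKey values, and endpoint are placeholders that you must replace with your own):

```
# Replace the placeholder values with your own account and project information.
spark.hadoop.odps.project.name = your_project_name
spark.hadoop.odps.access.id = your_accesskey_id
spark.hadoop.odps.access.key = your_accesskey_secret
spark.hadoop.odps.end.point = http://service.cn.maxcompute.aliyun.com/api
```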

What can I do if I do not have the permissions to operate a project?

If you receive an error message similar to the following:

Stack:
com.aliyun.odps.OdpsException: ODPS-0420095: 
Access Denied - Authorization Failed [4019], You have NO privilege 'odps:CreateResource' on {acs:odps:*:projects/*}

You can ask the project owner to grant you the permissions to read and create resources in the project.
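As a sketch, the project owner can grant these permissions by running MaxCompute security commands on the MaxCompute client; the project name and account below are hypothetical placeholders:

```sql
-- Run by the project owner on the MaxCompute client (odpscmd).
-- Grants the permissions needed to read the project and create resources in it.
grant Read, CreateResource ON PROJECT your_project_name TO USER ALIYUN$user@example.com;
```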

What can I do if Spark on MaxCompute tasks cannot run in a project?

If you receive an error message similar to the following:

Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: com.aliyun.odps.OdpsException: ODPS-0420095: Access Denied - The task is not in release range: CUPID

You can check whether the Spark on MaxCompute service is enabled for the region to which the project belongs. In addition, check whether the spark-defaults.conf file is correctly configured according to the MaxCompute product documentation. If the Spark on MaxCompute service is enabled and the spark-defaults.conf file is correctly configured, open a ticket or join our DingTalk group 21969532 for technical support.

What can I do if the system reports a No space left on device error?

This error indicates that the local disk space is insufficient. You can increase the disk size, which is determined by the spark.hadoop.odps.cupid.disk.driver.device_size parameter. The default disk size is 20 GB and the maximum is 100 GB. If the error persists even after you increase the disk size to 100 GB, check whether your data, including shuffled data and the data spilled from BlockManager, is skewed among blocks during the shuffle or cache process. If data skew is found, set the spark.executor.cores parameter to a smaller value to decrease the number of cores that run concurrently in each executor, and set the spark.executor.instances parameter to a greater value to increase the number of executors.
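As a sketch, the corresponding entries in spark-defaults.conf might look like the following; the specific values are illustrative and should be tuned for your workload:

```
# Raise the local disk size above the 20 GB default (maximum 100g).
spark.hadoop.odps.cupid.disk.driver.device_size = 50g
# Use fewer cores per executor but more executors to spread out skewed data.
spark.executor.cores = 2
spark.executor.instances = 8
```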