MaxCompute Spark supports three running modes: Local, Cluster, and DataWorks.

Local mode

The Local mode is used to facilitate code debugging for applications. In Local mode, you can use MaxCompute Spark in the same way as you use native Spark in the community. In addition, you can use Tunnel to read data from and write data to MaxCompute tables. In this mode, you can run MaxCompute Spark from either an IDE or the command line. If you use this mode, you must add the spark.master=local[N] configuration, where N specifies the number of CPU cores to use. To use Tunnel to read data from and write data to tables in Local mode, you must add the Tunnel configuration item to spark-defaults.conf. Set the endpoint based on the region and network environment where the MaxCompute project is located. For more information about how to obtain the endpoint, see Configure endpoints. The following code provides an example on how to run MaxCompute Spark from the command line in this mode:
bin/spark-submit --master local[4] \
--class com.aliyun.odps.spark.examples.SparkPi \
${path to aliyun-cupid-sdk}/spark/spark-2.x/spark-examples/target/spark-examples_2.11-version-shaded.jar
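For reference, the following is a minimal spark-defaults.conf sketch for Local mode with Tunnel access. The values are placeholders, and the exact key names, in particular the Tunnel endpoint item, should be taken from the spark-defaults.conf template that ships with the MaxCompute Spark client:

# Project and account information (placeholders)
spark.hadoop.odps.project.name = your_project_name
spark.hadoop.odps.access.id = your_access_key_id
spark.hadoop.odps.access.key = your_access_key_secret
# MaxCompute endpoint of the region where the project is located (placeholder)
spark.hadoop.odps.end.point = http://service.<region>.maxcompute.aliyun.com/api
# Tunnel endpoint used to read and write table data in Local mode (assumed key name)
spark.hadoop.odps.tunnel.end.point = http://dt.<region>.maxcompute.aliyun.com
# Required for Local mode
spark.master = local[4]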

Cluster mode

In Cluster mode, you must specify a Main method as the entry point of your custom application. A Spark job ends when the Main method succeeds or fails. This mode is suitable for offline jobs. You can use MaxCompute Spark in this mode together with DataWorks to schedule jobs. The following code provides an example on how to run MaxCompute Spark from the command line in this mode:
bin/spark-submit --master yarn-cluster \
--class com.aliyun.odps.spark.examples.SparkPi \
${ProjectRoot}/spark/spark-2.x/spark-examples/target/spark-examples_2.11-version-shaded.jar
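Beyond the minimal command, you typically also specify driver and executor resources when you submit a job in Cluster mode. The following sketch uses standard spark-submit options; the class name, resource values, and JAR path are illustrative:

bin/spark-submit --master yarn-cluster \
--class com.aliyun.odps.spark.examples.SparkPi \
--driver-memory 2g \
--executor-memory 4g \
--executor-cores 2 \
--num-executors 4 \
${ProjectRoot}/spark/spark-2.x/spark-examples/target/spark-examples_2.11-version-shaded.jar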

DataWorks mode

You can run offline MaxCompute Spark jobs (in Cluster mode) in DataWorks so that they can be integrated and scheduled with other types of nodes.
Note DataWorks supports the Spark node in the following regions: China (Hangzhou), China (Beijing), China (Shanghai), China (Shenzhen), China (Hong Kong), US (Silicon Valley), Germany (Frankfurt), India (Mumbai), and Singapore.
To use MaxCompute Spark in this mode, follow these steps:
  1. Upload the resources in the DataWorks business flow and click Submit.

  2. In the created business flow, select ODPS Spark from Data Analytics.
  3. Double-click the Spark node and define the Spark job. Select a Spark version, a development language, and a resource file. The resource file is the file that you uploaded and committed in the business flow. You can specify configuration items, such as the number of executors and the memory size, for the job to be submitted, as shown in the sample configuration after these steps. You must also set spark.hadoop.odps.cupid.webproxy.endpoint to the endpoint of the region where the project is located, for example, http://service.cn.maxcompute.aliyun-inc.com/api.
  4. Run the Spark node, view the run logs of the task, and obtain the LogView and JobView URLs from the logs for further analysis and diagnosis.

    After you have defined the Spark job, you can orchestrate and schedule nodes of different types in the business flow as required.
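For reference, the following is a sketch of configuration items that you might enter for the Spark node in step 3. The keys are standard Spark properties plus the webproxy endpoint mentioned above, and the values are illustrative:

# Executor and driver resources for the submitted job (example values)
spark.executor.instances = 4
spark.executor.cores = 2
spark.executor.memory = 4g
spark.driver.memory = 2g
# Endpoint of the region where the project is located
spark.hadoop.odps.cupid.webproxy.endpoint = http://service.cn.maxcompute.aliyun-inc.com/api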