This topic describes how to configure dependencies for Spark-1.x applications and provides several examples.
Configure dependencies for Spark-1.x
If you want to submit your application by using the Spark on MaxCompute client, you
must add the following dependencies to the pom.xml file.
<properties>
    <spark.version>1.6.3</spark.version>
    <cupid.sdk.version>3.3.3-public</cupid.sdk.version>
    <scala.version>2.10.4</scala.version>
    <scala.binary.version>2.10</scala.binary.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>cupid-sdk</artifactId>
        <version>${cupid.sdk.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>hadoop-fs-oss</artifactId>
        <version>${cupid.sdk.version}</version>
    </dependency>
    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>odps-spark-datasource_${scala.binary.version}</artifactId>
        <version>${cupid.sdk.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-actors</artifactId>
        <version>${scala.version}</version>
    </dependency>
</dependencies>
In the preceding code, set the scope parameter as follows:
- Set it to provided for all packages that are released by the Apache Spark community, such as spark-core and spark-sql.
- Set it to compile for the odps-spark-datasource module. In the preceding snippet no scope is declared for it, which is equivalent, because Maven's default scope is compile.
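Every submission walkthrough below asks you to properly set spark-defaults.conf in the conf folder of the Spark on MaxCompute client. As a minimal sketch, assuming the standard MaxCompute account properties (every value here is a placeholder; check the client documentation for the full property list for your region and client version):

spark.hadoop.odps.project.name = your_project_name
spark.hadoop.odps.access.id = your_access_key_id
spark.hadoop.odps.access.key = your_access_key_secret
spark.hadoop.odps.end.point = http://service.odps.aliyun.com/api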
WordCount example
- Detailed code: see com.aliyun.odps.spark.examples.WordCount in the aliyun-cupid-sdk repository; a rough sketch follows the submission steps below.
- How to submit:
  Step 1. Build aliyun-cupid-sdk.
  Step 2. Properly set spark-defaults.conf.
  Step 3. Run:
  bin/spark-submit --master yarn-cluster --class \
  com.aliyun.odps.spark.examples.WordCount \
  ${path to aliyun-cupid-sdk}/spark/spark-1.x/spark-examples/target/spark-examples_2.10-version-shaded.jar
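The detailed code ships with the repository; the following is only a rough sketch of a Spark-1.x word count, with a tiny hard-coded input for illustration (a real job would read a MaxCompute table or an OSS path instead):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    try {
      // Split each line into words, then count occurrences per word.
      sc.parallelize(Seq("hello maxcompute", "hello spark"))
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collect()
        .foreach { case (word, n) => println(s"$word\t$n") }
    } finally {
      sc.stop()
    }
  }
}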
Spark SQL on MaxCompute table example
- Detailed code: see com.aliyun.odps.spark.examples.sparksql.SparkSQL in the aliyun-cupid-sdk repository; a rough sketch follows the submission steps below.
- How to submit:
  Note: If the table that you specify in the code cannot be found in your MaxCompute project, a Table Not Found error is returned. You can develop a Spark SQL application for your own table by referring to the APIs used in the example code.
  Step 1. Build aliyun-cupid-sdk.
  Step 2. Properly set spark-defaults.conf.
  Step 3. Run:
  bin/spark-submit --master yarn-cluster --class \
  com.aliyun.odps.spark.examples.sparksql.SparkSQL \
  ${path to aliyun-cupid-sdk}/spark/spark-1.x/spark-examples/target/spark-examples_2.10-version-shaded.jar
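The repository example shows the exact MaxCompute-specific SQL entry point. The sketch below assumes an OdpsContext class exposed by the odps-spark-datasource module and a placeholder table name mc_test_table; verify both against the SparkSQL example in the repository before relying on them:

import org.apache.spark.{SparkConf, SparkContext}
// Assumption: odps-spark-datasource for Spark 1.x exposes an OdpsContext;
// check the repository example for the exact import path.
import org.apache.spark.sql.OdpsContext

object SparkSQLSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSQL-on-MaxCompute"))
    // OdpsContext resolves table names against the MaxCompute project
    // configured in spark-defaults.conf.
    val sqlContext = new OdpsContext(sc)
    // mc_test_table is a placeholder; querying a table that does not exist
    // in your project fails with Table Not Found.
    sqlContext.sql("SELECT COUNT(1) FROM mc_test_table").collect().foreach(println)
    sc.stop()
  }
}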
GraphX PageRank example
- Detailed code: see com.aliyun.odps.spark.examples.graphx.PageRank in the aliyun-cupid-sdk repository; a rough sketch follows the submission steps below.
- How to submit:
  Step 1. Build aliyun-cupid-sdk.
  Step 2. Properly set spark-defaults.conf.
  Step 3. Run:
  bin/spark-submit --master yarn-cluster --class \
  com.aliyun.odps.spark.examples.graphx.PageRank \
  ${path to aliyun-cupid-sdk}/spark/spark-1.x/spark-examples/target/spark-examples_2.10-version-shaded.jar
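The detailed code lives in the repository; the following is a minimal GraphX sketch of the same idea, using a tiny hard-coded edge list rather than the repository's input:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PageRank"))
    // Three vertices in a cycle: 1 -> 2 -> 3 -> 1.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 1)
    // Iterate until the ranks change by less than the given tolerance.
    graph.pageRank(0.001).vertices
      .collect()
      .foreach { case (id, rank) => println(s"$id\t$rank") }
    sc.stop()
  }
}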
MLlib KMeans-on-OSS example
- Detailed code: see com.aliyun.odps.spark.examples.mllib.KmeansModelSaveToOss in the aliyun-cupid-sdk repository; a rough sketch follows the submission steps below.
- How to submit:
  Enter your OSS account information in the code before you compile it:
  conf.set("spark.hadoop.fs.oss.accessKeyId", "***")
  conf.set("spark.hadoop.fs.oss.accessKeySecret", "***")
  conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-hangzhou-zmf.aliyuncs.com")
  Step 1. Build aliyun-cupid-sdk.
  Step 2. Properly set spark-defaults.conf.
  Step 3. Run:
  bin/spark-submit --master yarn-cluster --class \
  com.aliyun.odps.spark.examples.mllib.KmeansModelSaveToOss \
  ${path to aliyun-cupid-sdk}/spark/spark-1.x/spark-examples/target/spark-examples_2.10-version-shaded.jar
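The repository class trains a k-means model and writes it to OSS. A minimal sketch of that flow, assuming placeholder credentials, bucket, and data (hadoop-fs-oss from the pom above provides the oss:// filesystem):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KmeansOssSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KmeansModelSaveToOss")
    // Fill in your own OSS account information, as noted above.
    conf.set("spark.hadoop.fs.oss.accessKeyId", "***")
    conf.set("spark.hadoop.fs.oss.accessKeySecret", "***")
    conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-hangzhou-zmf.aliyuncs.com")
    val sc = new SparkContext(conf)
    // Two obvious clusters of 2-D points.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
    val model = KMeans.train(points, k = 2, maxIterations = 20)
    // The bucket and path are placeholders.
    model.save(sc, "oss://your-bucket/kmeans-model")
    sc.stop()
  }
}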
OSS UnstructuredData example
- Detailed code: see com.aliyun.odps.spark.examples.oss.SparkUnstructuredDataCompute in the aliyun-cupid-sdk repository; a rough sketch follows the submission steps below.
- How to submit:
  Enter your OSS account information in the code before you compile it:
  conf.set("spark.hadoop.fs.oss.accessKeyId", "***")
  conf.set("spark.hadoop.fs.oss.accessKeySecret", "***")
  conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-hangzhou-zmf.aliyuncs.com")
  Step 1. Build aliyun-cupid-sdk.
  Step 2. Properly set spark-defaults.conf.
  Step 3. Run:
  bin/spark-submit --master yarn-cluster --class \
  com.aliyun.odps.spark.examples.oss.SparkUnstructuredDataCompute \
  ${path to aliyun-cupid-sdk}/spark/spark-1.x/spark-examples/target/spark-examples_2.10-version-shaded.jar
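The repository class computes over raw OSS data. A minimal sketch under the same assumptions (placeholder credentials, bucket, and path):

import org.apache.spark.{SparkConf, SparkContext}

object OssDataSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkUnstructuredDataCompute")
    // Fill in your own OSS account information, as noted above.
    conf.set("spark.hadoop.fs.oss.accessKeyId", "***")
    conf.set("spark.hadoop.fs.oss.accessKeySecret", "***")
    conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-hangzhou-zmf.aliyuncs.com")
    val sc = new SparkContext(conf)
    // Read a text file straight from OSS through the hadoop-fs-oss filesystem.
    val lines = sc.textFile("oss://your-bucket/path/to/input.txt")
    println(s"Line count: ${lines.count()}")
    sc.stop()
  }
}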