This topic describes how to connect Spark to OSS.
EMR provides the following features for accessing OSS:
The following example shows how Spark reads data from OSS and writes the processed data back to OSS without configuring an AccessKey pair in the code.
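import org.apache.spark.{SparkConf, SparkContext}

// Create the Spark context. No AccessKey pair is set in the code.
val conf = new SparkConf().setAppName("Test OSS")
val sc = new SparkContext(conf)

// Read text files from OSS and count the lines.
val pathIn = "oss://bucket/path/to/read"
val inputData = sc.textFile(pathIn)
val cnt = inputData.count()
println(s"count: $cnt")

// Append a suffix to each line and write the result back to OSS.
val outputPath = "oss://bucket/path/to/write"
val outputData = inputData.map(e => s"$e has been processed.")
outputData.saveAsTextFile(outputPath)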
Note: PySpark reads data from and writes data to OSS in the same way as Spark. Pass oss:// paths to the corresponding read and write methods.
For the complete sample code, see Use Spark to access OSS.