This topic describes how to access Object Storage Service (OSS) data from Spark and Hadoop in E-MapReduce (EMR).

Background information

In EMR, Spark and Hadoop work with OSS through the standard Hadoop FileSystem API, so you can manage OSS files in the same way as you manage files in HDFS, addressing them with an oss:// path. You can use one of the following methods to access OSS data:
  • (Recommended) Access OSS data without an AccessKey pair
  • Explicitly enter an AccessKey pair

Access OSS data without an AccessKey pair

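[Scala]
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}

   // Replace "bucket" and "dir" with your own bucket name and directory.
   val dir = "oss://bucket/dir"
   val path = new Path(dir)
   val conf = new Configuration()
   // Bind the oss:// scheme to the JindoOssFileSystem class.
   conf.set("fs.oss.impl", "com.aliyun.emr.fs.oss.JindoOssFileSystem")
   // Resolve the file system from the URI, then list the directory contents.
   val fs = FileSystem.get(path.toUri, conf)
   val fileList = fs.listStatus(path)
   ...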
[Java]
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   // Replace "bucket" and "dir" with your own bucket name and directory.
   String dir = "oss://bucket/dir";
   Path path = new Path(dir);
   Configuration conf = new Configuration();
   // Bind the oss:// scheme to the JindoOssFileSystem class.
   conf.set("fs.oss.impl", "com.aliyun.emr.fs.oss.JindoOssFileSystem");
   // Resolve the file system from the URI, then list the directory contents.
   FileSystem fs = FileSystem.get(path.toUri(), conf);
   FileStatus[] fileList = fs.listStatus(path);
   ...
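
Both samples pass the path's URI to FileSystem.get, which picks the file system implementation from the URI scheme. The following is a minimal sketch of that lookup using only the JDK. The class OssUriSketch and its helper methods are hypothetical names introduced for illustration and are not part of Hadoop or EMR. It shows how an oss:// location splits into a scheme, a bucket, and an object path, and how the scheme corresponds to the fs.oss.impl key set in the samples.

```java
import java.net.URI;

// Illustrative only: OssUriSketch is a hypothetical helper, not a Hadoop or EMR class.
public class OssUriSketch {

    // Builds the configuration key that maps a URI scheme to an implementation class,
    // for example "oss" -> "fs.oss.impl".
    static String implKey(String location) {
        URI uri = URI.create(location);
        return "fs." + uri.getScheme() + ".impl";
    }

    // Returns the authority part of the URI, which names the bucket in oss://bucket/dir.
    static String bucket(String location) {
        return URI.create(location).getAuthority();
    }

    // Returns the path part of the URI, which locates the directory or object in the bucket.
    static String objectPath(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String dir = "oss://bucket/dir";
        System.out.println(implKey(dir));     // prints "fs.oss.impl"
        System.out.println(bucket(dir));      // prints "bucket"
        System.out.println(objectPath(dir));  // prints "/dir"
    }
}
```

Because the key is derived from the scheme, a path that uses a different scheme, such as hdfs://, is not affected by the fs.oss.impl setting.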