Spark processes data in JindoFileSystem (JindoFS) by using one of the following methods: call methods and use Spark SQL to read data from tables stored in JindoFS.
JindoFS configuration
For example, a namespace named emr-jfs is created with the following configuration:
- jfs.namespaces=emr-jfs
- jfs.namespaces.emr-jfs.oss.uri=oss://oss-bucket/oss-dir
- jfs.namespaces.emr-jfs.mode=block
Process data in JindoFS
- Call methods
The read and write operations performed by Spark in JindoFS are similar to those in other file systems. For example, to access data in JindoFS, use a directory with the jfs prefix in the following Resilient Distributed Dataset (RDD) operation:
val a = sc.textFile("jfs://emr-jfs/README.md")
To write data to JindoFS, call the following method:
scala> a.collect().saveAsTextFile("jfs://emr-jfs/output")
- Use Spark SQL
Configure the parameter that sets the storage location to a directory in JindoFS when you create databases, tables, or partitions. For more information, see Use Hive to query data in JindoFS. Then, you can query data from tables stored in JindoFS.