Spark can process data in JindoFileSystem (JindoFS) in two ways: by calling Spark methods on JindoFS paths, or by using Spark SQL to query tables stored in JindoFS.
The examples in this topic assume that a namespace named emr-jfs has already been created and configured.
Process data in JindoFS
- Call methods
Spark reads data from and writes data to JindoFS in the same way as it does with other file systems. For example, to read data from JindoFS, pass a path with the jfs:// prefix to a resilient distributed dataset (RDD) operation:
val a = sc.textFile("jfs://emr-jfs/README.md")
To write data to JindoFS, call an RDD write method such as saveAsTextFile and pass an output directory with the jfs:// prefix.
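The read and write steps can be sketched as a small standalone application. This is a minimal sketch, not a definitive implementation: it assumes that the emr-jfs namespace is configured on the cluster, and the object name JindoFsReadWrite and the output directory jfs://emr-jfs/output are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application name; any name works.
object JindoFsReadWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JindoFsReadWrite")
      .getOrCreate()
    val sc = spark.sparkContext

    // Read a text file from the emr-jfs namespace into an RDD of lines.
    val lines = sc.textFile("jfs://emr-jfs/README.md")

    // Write the RDD back to JindoFS. The output path is an assumption;
    // Spark writes one part file per partition under this directory,
    // and by default the job fails if the directory already exists.
    lines.saveAsTextFile("jfs://emr-jfs/output")

    spark.stop()
  }
}
```

In a spark-shell session, sc is already defined, so only the textFile and saveAsTextFile calls are needed.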
- Use Spark SQL
When you create databases, tables, and partitions, set the parameter that specifies the storage location to a directory in JindoFS. For more information, see Use Hive to query data in JindoFS. You can then use Spark SQL to query data from the tables stored in JindoFS.
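The Spark SQL approach can be sketched as follows. This is an illustrative sketch under stated assumptions: the cluster has Hive support enabled and the emr-jfs namespace configured, and the database name jfs_demo, the table name events, its columns, and the warehouse paths are all hypothetical. The storage location is set with the LOCATION clause of the standard Spark SQL CREATE DATABASE and CREATE TABLE statements.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application name; any name works.
object JindoFsSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JindoFsSparkSql")
      .enableHiveSupport() // persist table metadata in the Hive metastore
      .getOrCreate()

    // Create a database whose storage location is a JindoFS directory.
    // The database name and path are assumptions for illustration.
    spark.sql(
      "CREATE DATABASE IF NOT EXISTS jfs_demo " +
      "LOCATION 'jfs://emr-jfs/warehouse/jfs_demo.db'")

    // Create a table stored under the same namespace.
    spark.sql(
      "CREATE TABLE IF NOT EXISTS jfs_demo.events (id INT, name STRING) " +
      "USING parquet " +
      "LOCATION 'jfs://emr-jfs/warehouse/jfs_demo.db/events'")

    // Insert sample rows, then query the table stored in JindoFS.
    spark.sql("INSERT INTO jfs_demo.events VALUES (1, 'a'), (2, 'b')")
    spark.sql("SELECT id, name FROM jfs_demo.events").show()

    spark.stop()
  }
}
```

If the database already has a JindoFS location, the table-level LOCATION clause can be omitted and the table is stored under the database directory.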