Available in E-MapReduce (EMR) V3.30, JindoFS tiered storage lets you move data between Object Storage Service (OSS) storage classes — Standard, Infrequent Access (IA), Archive, and Cold Archive — and local disk cache, so you can balance access speed against storage cost.
All tiered storage commands are asynchronous. Each command starts a background task rather than completing the operation immediately. Use the status command to monitor progress.
Commands at a glance
| Command | What it does |
|---|---|
| cache | Copies data to local disk for faster reads |
| uncache | Removes the local copy; data stays in OSS Standard |
| archive | Moves data to OSS IA, Archive, or Cold Archive |
| unarchive | Restores archived data to OSS IA or Standard |
| status | Shows tiered storage task progress |
| ls2 | Lists files with their current storage class |
To see built-in help, run:
jindo jfs -help archive
Cache
Copies data at the specified path to local disk. Subsequent reads are served from local disk instead of OSS, which reduces read latency for frequently accessed data.
jindo jfs -cache -p <path>
The -p flag pins the local copy so it is not evicted based on disk usage.
Use cache for datasets that are read repeatedly, such as fact tables in join-heavy queries or the working set of high-priority jobs.
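As a minimal sketch, the function below caches and pins several paths in one pass. The `cache_pinned` name and the `JINDO` variable are illustrative assumptions, not part of JindoFS; setting `JINDO="echo jindo"` prints the commands instead of running them, which is a convenient way to preview the calls before touching a cluster.

```shell
#!/usr/bin/env bash
# Sketch: cache and pin a list of frequently read paths.
# JINDO defaults to the real CLI; set JINDO="echo jindo" for a dry run.
JINDO="${JINDO:-jindo}"

cache_pinned() {
  # Each argument is a path to cache; -p pins the local copy.
  local path
  for path in "$@"; do
    # $JINDO is intentionally unquoted so "echo jindo" splits into two words.
    $JINDO jfs -cache -p "$path"
  done
}

# Example call (placeholder paths; replace with your own):
# cache_pinned <path-to-fact-table> <path-to-working-set>
```

Because every tiered storage command is asynchronous, each call returns once the background task has started, so the loop itself finishes quickly even for large datasets.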
Uncache
Removes the local disk copy. Data remains accessible from OSS Standard storage.
jindo jfs -uncache <path>
Use this command when a dataset is no longer accessed frequently and local disk space should be reclaimed.
Archive
Removes the local disk copy and moves data to a lower-cost OSS storage class. Use this for cold data that is rarely or never accessed.
jindo jfs -archive -i|-a|-c <path>
| Option | Target storage class |
|---|---|
| -i | OSS Infrequent Access (IA) |
| -a | OSS Archive |
| -c | OSS Cold Archive |
For a description of each storage class, see OSS storage classes overview.
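The single-letter options are easy to mix up in scripts. One possible approach is a small wrapper that takes a readable storage class name and maps it to the matching option. The `archive_to` function, its class names (`ia`, `archive`, `coldarchive`), and the `JINDO` variable are hypothetical helpers introduced here for illustration; only the `-i`, `-a`, and `-c` options come from the table above.

```shell
#!/usr/bin/env bash
# Sketch: map a readable storage class name to the -archive option.
# JINDO defaults to the real CLI; set JINDO="echo jindo" for a dry run.
JINDO="${JINDO:-jindo}"

archive_to() {
  # Usage: archive_to <ia|archive|coldarchive> <path>
  local flag
  case "$1" in
    ia)          flag="-i" ;;  # OSS Infrequent Access
    archive)     flag="-a" ;;  # OSS Archive
    coldarchive) flag="-c" ;;  # OSS Cold Archive
    *)
      echo "unknown storage class: $1" >&2
      return 2
      ;;
  esac
  $JINDO jfs -archive "$flag" "$2"
}

# Example call (placeholder path):
# archive_to archive <path>
```

Rejecting unknown names up front keeps a typo from being passed to the CLI as an unintended option.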
Unarchive
Converts data from Archive storage back to a more accessible storage class. By default (no option specified), data is restored to OSS Standard storage.
jindo jfs -unarchive [-i|-o] <path>
| Option | Behavior |
|---|---|
| (none) | Restores data to OSS Standard storage |
| -i | Restores data to OSS Infrequent Access (IA) storage |
| -o | Temporarily unarchives data from OSS Archive storage; data becomes readable for one day |
Status
Shows the progress of a tiered storage task for the specified path. By default, the command reports the number of files in the directory that are targeted for tiered storage and the data to which tiered storage has already been applied.
jindo jfs -status [-detail|-sync] <path>
| Option | Behavior |
|---|---|
| -detail | Shows per-file storage progress |
| -sync | Blocks until the tiered storage task completes |
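Because the tiering commands only start a background task, a script that depends on the result typically needs to wait. The sketch below pairs any tiering command with a `-status -sync` call on the same path. The `tier_and_wait` function and the `JINDO` variable are hypothetical names used for illustration, and the sketch assumes that a failed submission is reported through a non-zero exit code.

```shell
#!/usr/bin/env bash
# Sketch: start a tiered storage task, then block until it completes.
# JINDO defaults to the real CLI; set JINDO="echo jindo" for a dry run.
JINDO="${JINDO:-jindo}"

tier_and_wait() {
  # Usage: tier_and_wait <path> <tiering command and options...>
  local path="$1"
  shift
  # Start the asynchronous task; stop here if it could not be submitted.
  $JINDO jfs "$@" "$path" || return 1
  # Block until the background task for this path completes.
  $JINDO jfs -status -sync "$path"
}

# Example call (placeholder path): archive to OSS Archive, then wait.
# tier_and_wait <path> -archive -a
```

For long-running tasks, running `jindo jfs -status -detail <path>` from another terminal shows per-file progress while the `-sync` call is still blocking.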
ls2
Lists files at the specified path along with their current storage class, extending the standard hadoop fs -ls output.
hadoop fs -ls2 <path>
Example output:
drwxrwxrwx - - 0 2020-06-05 04:27 oss://xxxx/warehouse
-rw-rw-rw- 1 Archive 1484 2020-09-23 16:40 oss://xxxx/wikipedia_data.csv
-rw-rw-rw- 1 Standard 1676 2020-06-07 20:04 oss://xxxx/wikipedia_data.json
The third column shows the storage class for each file.
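Since the storage class sits in a fixed column, the listing can be filtered with standard text tools. The following sketch prints the paths of all entries in a given storage class. The `files_in_class` function is a hypothetical helper; it assumes the column layout shown in the example output and paths without spaces.

```shell
#!/usr/bin/env bash
# Sketch: filter ls2 output by storage class (third column).
files_in_class() {
  # Usage: hadoop fs -ls2 <path> | files_in_class <storage-class>
  # Prints the last field (the path) of every line whose third field matches.
  awk -v cls="$1" '$3 == cls { print $NF }'
}

# Example call (placeholder path): list everything currently in Archive.
# hadoop fs -ls2 <path> | files_in_class Archive
```

Directories show `-` in the storage class column, so they never match a class name and are skipped automatically.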