Starting from E-MapReduce (EMR) V3.30, JindoFS provides tiered storage. Use tiered storage commands to move data between local disks and OSS storage classes to balance access speed against storage cost.
All tiered storage commands are asynchronous: each one submits a task and returns immediately. To monitor progress or wait for completion, use the Status command.
Prerequisites
- EMR cluster version V3.30 or later
Storage classes
OSS provides three storage classes. Choose a class based on how frequently data is accessed:
| Storage class | Access speed | Cost | Best for |
|---|---|---|---|
| Standard | Fastest | Highest | Hot data accessed frequently |
| Infrequent Access (IA) | Fast | Lower | Warm data accessed less than once a month |
| Archive | Requires restore (up to 1 day) | Lowest | Cold data accessed rarely |
For more information about OSS storage classes, see Overview.
Storage class transitions
| From | To | Command |
|---|---|---|
| Local disk | OSS Standard | uncache |
| Local disk | OSS IA | archive -i |
| Local disk | OSS Archive | archive -a |
| OSS Archive | OSS Standard | unarchive |
| OSS Archive | OSS IA | unarchive -i |
| OSS Archive | Temporarily readable | unarchive -o |
| OSS Standard | Local disk | cache |
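The transition table above can be read as a simple lookup. The following is a minimal sketch of that lookup as a shell function. The function name `tier_command` and the labels `local`, `standard`, `ia`, `archive`, and `readable` are hypothetical names chosen for this illustration and are not part of the jindo CLI. The function only prints the command from the table; it does not run it.

```shell
# Hypothetical helper: print the tiered storage command for a transition.
# The from/to labels are made up for this sketch; the commands come from the table above.
tier_command() {
  local from="$1" to="$2" path="$3"
  case "${from}:${to}" in
    local:standard)   echo "jindo jfs -uncache ${path}" ;;
    local:ia)         echo "jindo jfs -archive -i ${path}" ;;
    local:archive)    echo "jindo jfs -archive -a ${path}" ;;
    archive:standard) echo "jindo jfs -unarchive ${path}" ;;
    archive:ia)       echo "jindo jfs -unarchive -i ${path}" ;;
    archive:readable) echo "jindo jfs -unarchive -o ${path}" ;;
    # Uses the documented cache syntax, which includes -p (pin local data).
    standard:local)   echo "jindo jfs -cache -p ${path}" ;;
    *) echo "unsupported transition: ${from} -> ${to}" >&2; return 1 ;;
  esac
}
```

For example, `tier_command local ia <path>` prints the `archive -i` command for that path. Transitions that the table does not list, such as IA to Archive, are rejected with a non-zero return status.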
Commands
Cache
Back up data at a path to local disks. After caching, reads are served from local disks instead of OSS.
jindo jfs -cache -p <path>
| Option | Description |
|---|---|
| -p | Pin local data so it is not evicted based on disk usage |
Uncache
Remove the local disk backup for a path. Data is retained only in OSS Standard storage.
jindo jfs -uncache <path>
Archive
Move data off local disks and into OSS Infrequent Access (IA) or Archive storage. The local disk backup is deleted after the data is moved.
jindo jfs -archive -i|-a <path>
| Option | Target storage class |
|---|---|
| -i | Infrequent Access (IA) |
| -a | Archive |
Unarchive
Convert data from Archive storage to a more accessible storage class, or temporarily restore it for reading.
jindo jfs -unarchive [-i|-o] <path>
| Option | Target storage class | Notes |
|---|---|---|
| *(none)* | Standard | Default behavior |
| -i | Infrequent Access (IA) | |
| -o | Temporarily readable | Data becomes readable within one day; not a permanent storage class change |
Status
View the progress of a tiered storage task.
jindo jfs -status [-detail|-sync] <path>
| Option | Behavior |
|---|---|
| *(none)* | Show the number of files targeted for tiered storage in the directory and the data to which tiered storage has been applied |
| -detail | Show per-file storage progress |
| -sync | Block until the tiered storage task completes, then exit |
Because all tiered storage commands are asynchronous, use -sync in scripts that must wait for a task to finish before proceeding, such as a script that archives data and then validates the result.
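A minimal sketch of such a script follows. It submits an archive task, blocks on -sync, and then lists the path with ls2 so the resulting storage class can be checked. The function name `archive_and_wait` and the `JINDO` and `HADOOP` override variables are hypothetical and exist only so the sketch can be dry-run. The sketch also assumes that both CLIs return a non-zero exit status on failure, which this document does not state.

```shell
# Sketch: archive a path, wait for the asynchronous task, then list the result.
# JINDO and HADOOP are hypothetical overrides; they default to the real CLIs.
archive_and_wait() {
  local path="$1"
  local jindo="${JINDO:-jindo}"
  local hadoop="${HADOOP:-hadoop}"

  # Submit the archive task (-a targets the Archive storage class).
  # The command returns immediately because it is asynchronous.
  "$jindo" jfs -archive -a "$path" || return 1

  # Block until the tiered storage task for this path completes.
  # Assumption: a non-zero exit status signals failure.
  "$jindo" jfs -status -sync "$path" || return 1

  # List the path with the storage class column for validation.
  "$hadoop" fs -ls2 "$path"
}
```

Call it as `archive_and_wait <path>`. Without the -sync step, the ls2 listing could run while the archive task is still in progress.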
ls2
View the storage class of files at a path. The ls2 command extends the standard Hadoop ls command with a storage class column.
hadoop fs -ls2 <path>
Example output:
drwxrwxrwx - - 0 2020-06-05 04:27 oss://xxxx/warehouse
-rw-rw-rw- 1 Archive 1484 2020-09-23 16:40 oss://xxxx/wikipedia_data.csv
-rw-rw-rw- 1 Standard 1676 2020-06-07 20:04 oss://xxxx/wikipedia_data.json
The third column shows the storage class of each file. Possible values: Standard, Archive.
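Because the storage class sits in a fixed column, the listing can be filtered with standard text tools. The sketch below prints the paths of entries in a given storage class. The function name `paths_in_class` is hypothetical, and the sketch assumes that the output has the column layout shown above and that paths contain no whitespace.

```shell
# Print the paths of entries whose storage class (third column) equals $1.
# Reads `hadoop fs -ls2` output on stdin; assumes paths contain no whitespace.
paths_in_class() {
  awk -v cls="$1" '$3 == cls { print $NF }'
}
```

For example, `hadoop fs -ls2 <path> | paths_in_class Archive` lists only the files currently in Archive storage. Directory entries show `-` in the third column, so they are skipped.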
Get help
Run the following command to view help information for tiered storage commands:
jindo jfs -help archive