In E-MapReduce (EMR) V3.30, JindoFS provides a tiered storage feature. This feature lets you keep hot and cold data on different storage media, so that you can reduce data storage costs or accelerate data access.

Use jindo jfs

Run the following command to obtain the help information:
[root@emr-header-1 ~]# jindo jfs -help archive
-archive -i/a <path> ... :
  Archive commands.

Tiered storage commands in JindoFS are asynchronous: a command only starts the related task and does not wait for the task to finish.

Cache

Use this command to cache a copy of the data in a specified path on local disks. Subsequent reads are then served from the local disks instead of Object Storage Service (OSS).
jindo jfs -cache -p <path>

-p prevents the locally cached data from being cleared when local disk usage is high.

Uncache

Use this command to delete the cached copy from local disks so that the data is stored only in OSS, in the Standard storage class.
jindo jfs -uncache <path>

Archive

Use this command to delete the cached copy from local disks and store the data in OSS in the Infrequent Access (IA) or Archive storage class. For information about the storage classes, see Overview.
jindo jfs -archive -i|-a <path>

-i stores the data in the IA storage class. -a stores the data in the Archive storage class.
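
The Uncache and Archive commands above can be wrapped in a small helper so that scripts refer to a storage class by name. The following is a minimal sketch, not part of JindoFS: the function name set_tier, the tier names, and the JINDO override variable are assumptions made for illustration. Only flags documented above are used.

```shell
# JINDO is a hypothetical override for the executable; set JINDO="echo jindo"
# to print the commands instead of running them (dry run).
JINDO="${JINDO:-jindo}"

# set_tier <tier> <path>: hypothetical wrapper that maps a tier name to the
# matching jindo jfs command. Each command only starts an asynchronous task.
set_tier() {
    tier="$1"
    path="$2"
    case "$tier" in
        standard) $JINDO jfs -uncache "$path" ;;      # OSS Standard only
        ia)       $JINDO jfs -archive -i "$path" ;;   # Infrequent Access
        archive)  $JINDO jfs -archive -a "$path" ;;   # Archive
        *)
            echo "unknown tier: $tier" >&2
            return 2
            ;;
    esac
}
```

For example, set_tier ia oss://xxxx/warehouse would start a task that moves the directory to the IA storage class.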

Unarchive

Use this command to restore data from the Archive storage class to the IA or Standard storage class. You can also temporarily restore data in the Archive storage class so that the data is readable for one day.
jindo jfs -unarchive -i|-o <path>

-i restores the data to the IA storage class. -o temporarily restores data in the Archive storage class so that the data can be read.

Status

Use this command to view task details. By default, the command reports the number of files in the specified directory that are waiting for tiered storage and the data that has already been processed.
jindo jfs -status -detail/-sync <path>

-detail shows the storage progress of the file data. -sync makes the command wait and exit only after the tiered storage task is complete.
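
Because tiered storage commands only start a task, a script that needs the data to be fully moved before it continues can follow the command with a -status -sync call. The following is a minimal sketch under that assumption: the function name archive_and_wait and the JINDO override variable are hypothetical, and the exit codes of jindo are assumed to follow the usual convention of zero on success.

```shell
# JINDO is a hypothetical override for the executable; set JINDO="echo jindo"
# to print the commands instead of running them (dry run).
JINDO="${JINDO:-jindo}"

# archive_and_wait <path>: hypothetical helper that starts an IA archive task
# and then blocks until the tiered storage task for <path> has completed.
archive_and_wait() {
    path="$1"
    # Start the asynchronous task (-i = Infrequent Access).
    $JINDO jfs -archive -i "$path" || return 1
    # -sync makes the status command exit only after the task is complete.
    $JINDO jfs -status -sync "$path"
}
```

A caller could run archive_and_wait oss://xxxx/warehouse and continue with later steps only after the function returns.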

ls2 command

JindoFS provides the ls2 command, which extends the Hadoop ls command to also show the storage status of each file:
hadoop fs -ls2 <path>
Example of command output, which includes the file storage class:
drwxrwxrwx  - -         0    2020-06-05 04:27 oss://xxxx/warehouse
-rw-rw-rw-  1 Archive   1484 2020-09-23 16:40 oss://xxxx/wikipedia_data.csv
-rw-rw-rw-  1 Standard  1676 2020-06-07 20:04 oss://xxxx/wikipedia_data.json
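
The ls2 output can be summarized with standard text tools. The following is a minimal sketch that counts files per storage class. It assumes the column layout of the sample output above (permissions in the first field, storage class in the third); the function name count_storage_classes is hypothetical.

```shell
# count_storage_classes: hypothetical helper that reads `hadoop fs -ls2`
# output on stdin and prints "<storage class> <file count>" per class,
# sorted by class name.
count_storage_classes() {
    # Skip directories (permission string starts with "d"; their class is "-")
    # and any line with fewer than 7 fields, such as a summary header.
    awk '$1 !~ /^d/ && NF >= 7 { count[$3]++ }
         END { for (c in count) print c, count[c] }' | sort
}
```

It can be used as hadoop fs -ls2 <path> | count_storage_classes.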