Use the trash bin feature of OSS-HDFS to restore deleted data within a specific period of time - Object Storage Service

The trash bin feature of OSS-HDFS relies on both a client and the server: the client moves the files that you want to delete to a trash directory, and the server periodically deletes the files in that directory.

How the trash bin feature works

If you delete files from OSS-HDFS without bypassing the trash bin, the files are not deleted immediately. Instead, they are moved to the /user/<username>/.Trash/Current directory.
After 30 minutes, the files in the /user/<username>/.Trash/Current directory are transferred to the /user/<username>/.Trash/<timestamp> directory.
Note
<timestamp> is a UNIX timestamp: the number of seconds that have elapsed since 00:00:00 UTC on Thursday, January 1, 1970. Files that are deleted within the same period of time are grouped into the same timestamp directory.
After the retention period elapses, the /user/<username>/.Trash/<timestamp> directory is permanently deleted.
By default, data is retained in the trash bin for 3 days. You can specify a custom retention period of 1 to 14 days. For example, if you set the retention period to 5 days, you can restore deleted files from the corresponding timestamp directory at any time within 5 days after you delete them. To specify the number of days for which data is retained in the trash bin, perform the following steps:
1. Log on to the OSS console.
2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the desired bucket.
3. In the left-side navigation tree, choose Data Lake > OSS-HDFS.
4. On the OSS-HDFS tab, modify the retention period of deleted data in the Trash Bin section.
5. Click OK.
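
The retention arithmetic described above can be sketched in shell. This is a minimal illustration, not part of OSS-HDFS: the function names checkpoint_expiry and is_restorable are hypothetical, and the sketch assumes that the retention period is counted from the <timestamp> in the directory name, which this topic does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helpers. Assumption: retention is counted from <timestamp>.

# Print the UNIX time (in seconds) at which a timestamp directory
# passes its retention period.
# $1: <timestamp> from the directory name
# $2: retention period in days (1 to 14, default in the console is 3)
checkpoint_expiry() {
  echo $(( $1 + $2 * 86400 ))
}

# Return exit status 0 if the timestamp directory is still within the
# retention period, and a non-zero status otherwise.
# $1: <timestamp>, $2: retention period in days
# $3: current UNIX time (optional, defaults to the system clock)
is_restorable() {
  now="${3:-$(date +%s)}"
  [ "$now" -lt "$(checkpoint_expiry "$1" "$2")" ]
}
```

For example, with a retention period of 5 days, checkpoint_expiry 1700000000 5 prints 1700432000. Because the server clears the trash periodically, treat the result as the earliest time at which the directory may be removed rather than as an exact deletion time.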

Note

The trash feature depends on the cooperation between the client and the server. The client moves the files that you delete to the .Trash directory. By default, the server periodically deletes the files from the /user/<username>/.Trash directory.

Use the trash feature in Hadoop FileSystem Shell

Sample command:

hadoop fs -rm oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c

In Hadoop FileSystem Shell, the trash feature is enabled on the client by default. When you run the preceding rm command, the client automatically converts it into the following mv command: hadoop fs -mv oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c /user/<username>/.Trash/Current/a/b/c. You do not need to move the files to the trash directory yourself. The server periodically clears files in the /user/<username>/.Trash directory.
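
To check where a deleted file ends up, the path mapping in the preceding conversion can be reproduced with a small shell function. This is a sketch only: trash_path is a hypothetical helper, not a Hadoop command, and it simply mirrors the example above by dropping the scheme and bucket endpoint from the URI and prefixing the trash directory.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the rm-to-mv path mapping shown above.
# $1: username
# $2: full URI of the file that is being deleted
trash_path() {
  rest="${2#*://}"               # drop the scheme: <bucket-endpoint>/a/b/c
  case "$rest" in
    */*) path="/${rest#*/}" ;;   # drop the bucket endpoint: /a/b/c
    *)   path="" ;;              # the URI has no path component
  esac
  echo "/user/$1/.Trash/Current$path"
}
```

For example, trash_path alice oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c prints /user/alice/.Trash/Current/a/b/c, where alice is a placeholder username.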

If you want to immediately delete a file to free up storage space, add the -skipTrash parameter to the remove command. In this case, the file is directly deleted from the file system instead of being moved to the trash directory. Sample command:

hadoop fs -rm -skipTrash oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c

Use the trash feature in Hadoop ecosystem services

Services such as Hive, Spark, and Flink are not aware of the trash feature of OSS-HDFS. When such a service calls the delete interface of HDFS to delete a file, the file is deleted immediately and is not moved to the trash directory.

OSS-HDFS adopts a policy similar to that of open source Hadoop. To use the trash feature in Hadoop ecosystem services, you must explicitly call the rename interface of HDFS to move the files that you want to delete to the /user/<username>/.Trash/Current directory. The OSS-HDFS server then periodically clears files from the trash directory.
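
The explicit move described above can be sketched at the shell level, for example in a cleanup script that runs alongside a job. This is a minimal sketch under stated assumptions: move_to_trash and the HADOOP variable are hypothetical names, the trash directory is assumed to be in the same bucket as the file, and the sketch uses hadoop fs -mv as the command-line counterpart of the rename interface. Code that runs inside Hive, Spark, or Flink would call the rename interface directly instead.

```shell
#!/bin/sh
# Command used to reach the file system. Set HADOOP=echo for a dry run
# that prints the commands instead of running them.
HADOOP="${HADOOP:-hadoop}"

# Hypothetical helper: move a file to the trash directory instead of
# deleting it.
# $1: username
# $2: full URI of the file to delete
move_to_trash() {
  scheme="${2%%://*}"        # oss
  rest="${2#*://}"           # <bucket-endpoint>/a/b/c
  authority="${rest%%/*}"    # <bucket-endpoint>
  path="/${rest#*/}"         # /a/b/c
  # Assumption: the trash directory is in the same bucket as the file.
  dest="$scheme://$authority/user/$1/.Trash/Current$path"
  # Create the parent directory first, then move the file into the trash.
  $HADOOP fs -mkdir -p "${dest%/*}" && $HADOOP fs -mv "$2" "$dest"
}
```

With HADOOP=echo, calling move_to_trash alice oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c prints the mkdir and mv commands without touching any data, which makes it easy to review the destination path before you run the script for real.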