Accidental file deletion in OSS-HDFS can result in permanent data loss. The trash bin feature intercepts delete operations and moves files to a .Trash directory instead of removing them immediately. Files remain recoverable for the duration of the configured retention period, after which the OSS-HDFS server permanently removes them.
The trash bin relies on cooperation between the client and the server. The client moves deleted files to .Trash. The server periodically purges files from that directory based on the configured retention period.
How it works
Deleting a file without -skipTrash triggers a three-stage lifecycle:
The file moves to /user/<username>/.Trash/Current.
After 30 minutes, files in .Trash/Current are grouped into a timestamped checkpoint directory: /user/<username>/.Trash/<timestamp>. The <timestamp> value is a Unix timestamp (the number of seconds elapsed since 00:00:00 UTC on Thursday, January 1, 1970). Files deleted within the same 30-minute interval are placed in the same timestamped directory.
After the retention period expires, the timestamped directory is permanently deleted.
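Because a checkpoint directory name is a Unix timestamp, the time at which a checkpoint becomes eligible for purging can be estimated with simple arithmetic. The following is a minimal sketch, assuming the server counts the retention period from the checkpoint timestamp, which this page does not state explicitly. The function names checkpoint_expiry and to_utc are hypothetical.

```shell
#!/bin/sh
# Sketch only: estimate when a trash checkpoint becomes eligible for purging.
# Assumption: retention is counted from the checkpoint timestamp (in seconds).

# $1 = checkpoint directory name (Unix timestamp), $2 = retention period in days
checkpoint_expiry() {
  echo $(( $1 + $2 * 86400 ))
}

# Render a Unix timestamp as UTC text.
# GNU date syntax; on BSD/macOS use: date -u -r "$1"
to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example (not run here): to_utc "$(checkpoint_expiry 1700000000 3)"
```

Treat the result as an estimate only, because the server purges the directory periodically rather than at an exact instant.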
The default retention period is 3 days. To restore a file, move it out of .Trash/Current or the corresponding timestamped directory before the retention period expires.
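Restoring a file amounts to moving it from the trash back to its original path. The following is a minimal sketch, assuming the hadoop CLI is on PATH and that the trash directory lives in the same bucket as the original file. The function names, the user alice, and the timestamp 1700000000 are placeholders.

```shell
#!/bin/sh
# Sketch only: restore a file from the trash bin before it is purged.
# Assumptions: the hadoop CLI is on PATH, and the trash directory is in
# the same bucket as the original file.

# Print where a deleted file sits inside the trash.
# $1 = user name, $2 = "Current" or a checkpoint timestamp, $3 = original path
trash_location() {
  printf '/user/%s/.Trash/%s/%s\n' "$1" "$2" "${3#/}"
}

# Move the file back to its original path.
# $1 = bucket URI without a trailing slash, $2 = user name,
# $3 = "Current" or a checkpoint timestamp, $4 = original absolute path
restore_from_trash() {
  hadoop fs -mv "$1$(trash_location "$2" "$3" "$4")" "$1$4"
}

# Example (placeholders, not run here):
#   BUCKET=oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com
#   hadoop fs -ls "$BUCKET/user/alice/.Trash"   # lists Current and checkpoints
#   restore_from_trash "$BUCKET" alice 1700000000 /a/b/c
```

If the original parent directory no longer exists, create it with hadoop fs -mkdir -p before moving the file back.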
Set the retention period
The retention period ranges from 1 to 14 days. The default is 3 days.
Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the target bucket.
In the left-side navigation tree, choose Data Lake > OSS-HDFS.
On the OSS-HDFS tab, click the icon in the Trash Bin section.
Set the retention period, then click OK.
Use the trash bin in Hadoop FileSystem Shell
The trash bin is enabled by default in Hadoop FileSystem Shell.
Delete a file (trash-enabled)
When you run hadoop fs -rm, the client automatically intercepts the delete and converts it to a move operation:
hadoop fs -rm oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c
The client converts this to:
hadoop fs -mv oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c /user/<username>/.Trash/Current/a/b/c
The OSS-HDFS server then periodically purges the .Trash directory based on the configured retention period.
Skip the trash bin
To delete a file immediately without moving it to trash, add -skipTrash:
hadoop fs -rm -skipTrash oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/a/b/c
Use -skipTrash in automated pipelines (Spark, Flink, or Hive batch jobs) where large volumes of intermediate data are deleted frequently, to prevent unintended storage costs from trash accumulation.
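Because -skipTrash makes a delete unrecoverable, a pipeline may want to restrict it to a dedicated scratch location. The following is a minimal sketch of such a guard. The SCRATCH_PREFIX value and both function names are hypothetical and not part of OSS-HDFS or Hadoop.

```shell
#!/bin/sh
# Sketch only: permanently delete intermediate data, but refuse anything
# outside a designated scratch prefix. SCRATCH_PREFIX is a hypothetical value.
SCRATCH_PREFIX="oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/"

# Succeeds only for paths strictly under SCRATCH_PREFIX.
is_scratch_path() {
  case "$1" in
    "$SCRATCH_PREFIX"?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Recursively delete a scratch directory without using the trash bin.
purge_scratch() {
  if is_scratch_path "$1"; then
    hadoop fs -rm -r -skipTrash "$1"
  else
    echo "refusing to skip trash for $1" >&2
    return 1
  fi
}

# Example (not run here): purge_scratch "${SCRATCH_PREFIX}job-output"
```

The check is a plain prefix match, so it does not normalize segments such as "..". Pass only paths that the pipeline itself generated.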
Use the trash bin in Hadoop ecosystem services
Hadoop ecosystem services — Hive, Spark, and Flink — are not trash-aware. When these services call the HDFS delete interface directly, OSS-HDFS permanently deletes the file immediately, bypassing the trash bin.
OSS-HDFS adopts a policy similar to that of open source Hadoop. To protect files deleted by these services, explicitly call the HDFS rename interface to move the files to /user/<username>/.Trash/Current before the delete operation. The OSS-HDFS server then purges those files according to the configured retention period.
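For jobs that are driven from a shell wrapper, the rename step can be approximated with hadoop fs -mv, which issues a rename on the file system. The following is a minimal sketch, assuming the hadoop CLI is on PATH and the trash directory lives in the same bucket as the file. The function names trash_target and move_to_trash are hypothetical.

```shell
#!/bin/sh
# Sketch only: move a path into the trash bin instead of deleting it, for
# jobs whose delete calls bypass the trash. Function names are hypothetical.

# Map a URI or absolute path to its location under .Trash/Current.
# $1 = user name
# $2 = URI that includes a path (oss://bucket/a/b/c) or an absolute path
trash_target() (
  path="$2"
  case "$path" in
    # Drop the scheme and authority, keeping only the path component.
    *://*) path="${path#*://}"; path="/${path#*/}" ;;
  esac
  printf '/user/%s/.Trash/Current%s\n' "$1" "$path"
)

# Create the parent directory in the trash, then rename the path into it.
# $1 = bucket URI without a trailing slash, $2 = user name,
# $3 = absolute path inside the bucket
move_to_trash() {
  target="$1$(trash_target "$2" "$3")"
  hadoop fs -mkdir -p "${target%/*}" && hadoop fs -mv "$1$3" "$target"
}

# Example (placeholders, not run here):
#   move_to_trash oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com alice /a/b/c
```

This sketch does not handle name collisions. If a file with the same path is already in .Trash/Current, choose a different target name before moving.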