Hadoop Distributed File System (HDFS) snapshots are read-only, point-in-time copies of the file system. Snapshots can be taken of a subtree of the file system or of the entire file system. They are commonly used to back up data so that it can be restored after user errors such as accidental deletion. This topic describes snapshot directories, snapshot paths, and snapshot operations.
Background information
- Snapshot creation is instantaneous. The time complexity is O(1), excluding the inode lookup time.
- Additional memory is used only when files or directories are modified after a snapshot is taken. The memory usage is O(M), where M is the number of modified files or directories.
- Data blocks managed by DataNodes are not copied. Snapshot files record only the block list and the file size.
- Snapshots do not adversely affect regular HDFS operations. Modifications are recorded in reverse chronological order so that the current data can be accessed directly. Snapshot data is computed by subtracting the modifications from the current data.
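The background points above can be illustrated with a toy model. The Python sketch below is a hypothetical simplification, not the HDFS implementation: the class ToySnapshotDir and its methods are illustrative names, and a directory is modeled as a dict that maps file names to sizes. Taking a snapshot only opens an empty diff (O(1)), each modified entry is recorded at most once per snapshot (memory proportional to M), and a snapshot is read by undoing later modifications, newest first, starting from the current data.

```python
"""Toy model of how snapshot reads subtract later modifications.

Hypothetical illustration only; this is not how HDFS stores snapshots.
"""

_ABSENT = object()  # marks an entry that did not exist at snapshot time


class ToySnapshotDir:
    def __init__(self):
        self.current = {}  # latest data (name -> file size), read directly
        self.names = []    # snapshot names, oldest first
        self.diffs = []    # one dict per snapshot: name -> value at that time

    def create_snapshot(self, name):
        # O(1): nothing is copied; an empty diff is opened for new changes.
        self.names.append(name)
        self.diffs.append({})

    def _record(self, entry):
        # Save the old value only on the first change since the latest
        # snapshot, so extra memory grows with the modified entries (M).
        if self.diffs and entry not in self.diffs[-1]:
            self.diffs[-1][entry] = self.current.get(entry, _ABSENT)

    def write(self, entry, size):
        self._record(entry)
        self.current[entry] = size

    def delete(self, entry):
        if entry not in self.current:
            raise KeyError(entry)
        self._record(entry)
        del self.current[entry]

    def read_snapshot(self, name):
        # Start from the current data and undo the diffs of this snapshot
        # and every later one, newest first (reverse chronological order).
        data = dict(self.current)
        for diff in reversed(self.diffs[self.names.index(name):]):
            for entry, old in diff.items():
                if old is _ABSENT:
                    data.pop(entry, None)
                else:
                    data[entry] = old
        return data


if __name__ == "__main__":
    d = ToySnapshotDir()
    d.write("a", 10)
    d.create_snapshot("s0")
    d.write("a", 20)
    d.write("b", 5)
    d.create_snapshot("s1")
    d.delete("a")
    print(d.read_snapshot("s0"))  # {'a': 10}
    print(d.current)              # {'b': 5}
```

In this model the current data is never rewritten to serve a snapshot read, which mirrors why reads of the latest data are not slowed down by the existence of snapshots.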
Snapshot directories
Snapshots can be taken of a directory only after the directory is set as snapshottable. In this topic, such a directory is called a snapshot directory. A snapshot directory can hold a maximum of 65,536 snapshots at the same time, and no limit is imposed on the number of snapshot directories. Administrators can set any directory as snapshottable. A snapshot directory that contains snapshots cannot be deleted or renamed until all of its snapshots are deleted.
Nested snapshot directories are not allowed: a directory cannot be set as snapshottable if any of its ancestor or descendant directories is already a snapshot directory.
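The two rules above, no nesting and at most 65,536 snapshots per directory, can be expressed as a client-side pre-check. This Python sketch is hypothetical: can_allow_snapshot, can_create_snapshot, and MAX_SNAPSHOTS_PER_DIR are illustrative names, and HDFS performs its own checks when the commands run. It rejects a path that is an ancestor or a descendant of an existing snapshot directory.

```python
from pathlib import PurePosixPath

# Limit stated in this topic; HDFS enforces it on its own.
MAX_SNAPSHOTS_PER_DIR = 65536


def can_allow_snapshot(path, snapshot_dirs):
    """Hypothetical pre-check: may `path` be set as snapshottable?

    Nested snapshot directories are not allowed, so `path` must not be an
    ancestor or a descendant of any existing snapshot directory.
    """
    target = PurePosixPath(path)
    for existing in map(PurePosixPath, snapshot_dirs):
        if existing in target.parents or target in existing.parents:
            return False
    return True


def can_create_snapshot(existing_snapshots):
    """Hypothetical pre-check for the per-directory snapshot limit."""
    return existing_snapshots < MAX_SNAPSHOTS_PER_DIR


if __name__ == "__main__":
    dirs = ["/user/alice", "/data/warehouse"]
    print(can_allow_snapshot("/data", dirs))            # False: ancestor
    print(can_allow_snapshot("/user/alice/tmp", dirs))  # False: descendant
    print(can_allow_snapshot("/user/bob", dirs))        # True
```

Comparing PurePosixPath objects rather than raw strings avoids treating /foobar as a descendant of /foo.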
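Snapshot paths
The snapshots of a snapshot directory are accessed through the .snapshot path component. For example, if /foo is a snapshot directory that has a snapshot named s0, the path /foo/.snapshot/s0/bar refers to the snapshot copy of /foo/bar.
- List all snapshots in a snapshot directory.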
hdfs dfs -ls /foo/.snapshot
- List all files in snapshot s0.
hdfs dfs -ls /foo/.snapshot/s0
- Copy a file from snapshot s0.
hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
Note: The -ptopax option preserves timestamps, ownership, permissions, access control lists (ACLs), and XAttrs.
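The path rule above is mechanical, so scripts that restore files from a snapshot can build the paths instead of hand-writing them. The Python helpers below are hypothetical (snapshot_path and restore_command are illustrative names); they only assemble strings and an argument list that reproduces the copy command shown above, and do not talk to HDFS.

```python
import posixpath


def snapshot_path(snapshot_dir, snapshot_name, relative=""):
    """Return the .snapshot path of `relative` as of `snapshot_name`.

    With no `relative` part, this is the root of the snapshot itself.
    """
    parts = [snapshot_dir, ".snapshot", snapshot_name]
    if relative:
        # Strip leading slashes so posixpath.join does not discard the prefix.
        parts.append(relative.lstrip("/"))
    return posixpath.join(*parts)


def restore_command(snapshot_dir, snapshot_name, relative, target):
    """Build the argv that copies a path out of a snapshot.

    -ptopax preserves timestamps, ownership, permissions, ACLs, and XAttrs.
    """
    source = snapshot_path(snapshot_dir, snapshot_name, relative)
    return ["hdfs", "dfs", "-cp", "-ptopax", source, target]


if __name__ == "__main__":
    print(snapshot_path("/foo", "s0", "bar"))  # /foo/.snapshot/s0/bar
    # hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
    print(" ".join(restore_command("/foo", "s0", "bar", "/tmp")))
```

On a host with an HDFS client, the list returned by restore_command could be passed to subprocess.run(..., check=True); returning a list rather than a shell string avoids quoting problems with unusual file names.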
Snapshot operations
- Administrator operations
Note: Superuser permissions are required to perform administrator operations.
- Allow snapshots: Allow snapshots to be created for a directory. If the operation succeeds, the directory becomes snapshottable.
hdfs dfsadmin -allowSnapshot <path>
- Disallow snapshots: Prevent snapshots from being created for a directory. All snapshots of the directory must be deleted before this operation can be performed.
hdfs dfsadmin -disallowSnapshot <path>
- User operations
Note: An HDFS superuser can perform all user operations.
- Create a snapshot: Create a snapshot of a snapshot directory. Only the owner of the snapshot directory can perform this operation.
hdfs dfs -createSnapshot <path> [snapshotName]
Note: The [snapshotName] parameter is optional and specifies the name of the snapshot. If you omit it, a default name is generated in the 's'yyyyMMdd-HHmmss.SSS format, for example, s20130412-151029.033.
- Delete a snapshot: Delete a snapshot from a snapshot directory. Only the owner of the snapshot directory can delete its snapshots.
hdfs dfs -deleteSnapshot <path> <snapshotName>
- Rename a snapshot: Rename a snapshot. Only the owner of the snapshot directory can rename its snapshots.
hdfs dfs -renameSnapshot <path> <oldName> <newName>
Note: The <oldName> parameter specifies the current name of the snapshot, and the <newName> parameter specifies the new name.
- Query all snapshot directories: List all snapshot directories in which the current user has permission to take snapshots.
hdfs lsSnapshottableDir
- Compare snapshots: Obtain the differences between two snapshots. This operation requires read permission on all files and directories in both snapshots.
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
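Note: The <fromSnapshot> parameter specifies the starting snapshot, and the <toSnapshot> parameter specifies the snapshot to compare with it. You can also specify . (a period) as either snapshot name to represent the current state of the directory, which compares a snapshot with the live data.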
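Two of the user operations above produce text that scripts often need to handle: the default snapshot name and the hdfs snapshotDiff report. The Python below is a hypothetical sketch, not part of HDFS or its client libraries. default_snapshot_name formats a given timestamp with the 's'yyyyMMdd-HHmmss.SSS pattern (HDFS picks the timestamp itself when it generates a name). parse_snapshot_diff assumes that each report entry is a single line that starts with one of the labels +, -, M, or R (created, deleted, modified, or renamed), followed by whitespace and a path, with renames written as old -> new; check this assumed layout against the report printed by your Hadoop version before relying on it.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional


def default_snapshot_name(ts: datetime) -> str:
    """Format `ts` like the 's'yyyyMMdd-HHmmss.SSS default snapshot name."""
    # SSS is milliseconds, so microseconds are truncated to 3 digits.
    return ts.strftime("s%Y%m%d-%H%M%S.") + f"{ts.microsecond // 1000:03d}"


class DiffEntry(NamedTuple):
    kind: str                     # "+", "-", "M", or "R"
    path: str                     # path relative to the snapshot directory
    target: Optional[str] = None  # new path, set only for renames


# Assumed entry layout: a label, whitespace, a path, and "-> new" for renames.
_ENTRY = re.compile(r"^([+\-MR])\s+(\S.*?)(?:\s+->\s+(\S.*?))?\s*$")


def parse_snapshot_diff(report: str) -> List[DiffEntry]:
    """Collect the entries of a snapshotDiff report, skipping other lines."""
    entries = []
    for line in report.splitlines():
        match = _ENTRY.match(line)
        if match:
            entries.append(DiffEntry(*match.groups()))
    return entries


if __name__ == "__main__":
    # s20130412-151029.033
    print(default_snapshot_name(datetime(2013, 4, 12, 15, 10, 29, 33000)))
    sample = "M\t.\n+\t./bar\nR\t./baz -> ./qux\n-\t./quux"
    for entry in parse_snapshot_diff(sample):
        print(entry)
```

The first call reproduces the example name given in this topic. Lines that do not start with a known label, such as a report header, are skipped rather than rejected, which keeps the parser tolerant of small differences in the surrounding text.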