Hadoop Distributed File System (HDFS) snapshots are read-only point-in-time copies of the file system. A snapshot can cover a subtree of the file system or the entire file system. Snapshots are typically used for data backup, so that data can be restored after an accidental deletion or modification. This topic describes snapshot directories, snapshot paths, and the related snapshot operations.

Background information

HDFS snapshots have the following features:
  • Snapshot creation is instantaneous. The time complexity is O(1), excluding the inode lookup time.
  • Additional memory is used only when modifications are made relative to a snapshot, because snapshots themselves are read-only. Memory usage is O(M), where M is the number of modified files or directories.
  • Data blocks managed by DataNodes are not copied. Snapshot files record only the block list and the file size.
  • Snapshots do not adversely affect regular HDFS operations. Modifications are recorded in reverse chronological order so that the current data can be accessed directly. Snapshot data is computed by subtracting the modifications from the current data.

Snapshot directories

Snapshots can be taken on a directory only after the directory is set as snapshottable. Such a directory is called a snapshot directory in this topic. A snapshot directory can hold a maximum of 65,536 snapshots at the same time. No limit is imposed on the number of snapshot directories. Administrators can set any directory to be snapshottable. A snapshot directory that contains snapshots cannot be deleted or renamed until all of its snapshots are deleted.

Nested snapshot directories are not allowed. A directory cannot be set as snapshottable if any of its ancestor or descendant directories is already a snapshot directory.
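
The nesting rule can be checked on the client before a directory is set as snapshottable. The following is a minimal sketch, not part of the HDFS CLI: the function name snapshot_nesting_conflict is hypothetical, both arguments are assumed to be absolute paths, and the function only compares path strings without contacting HDFS.

```shell
# Hypothetical helper: succeeds (exit status 0) when CANDIDATE is an ancestor
# or a descendant of EXISTING, a directory that is already snapshottable.
# Pure string comparison on absolute paths; it never contacts HDFS.
snapshot_nesting_conflict() {
  local candidate="${1%/}" existing="${2%/}"
  # The same directory is not a nesting conflict.
  [ "$candidate" = "$existing" ] && return 1
  # After the trailing slash is stripped, the root directory "/" becomes "".
  # Appending "/" before matching keeps the test on path boundaries,
  # so /foo is not treated as an ancestor of /foobar.
  case "$existing/" in
    "$candidate"/*) return 0 ;;   # candidate is an ancestor of existing
  esac
  case "$candidate/" in
    "$existing"/*) return 0 ;;    # candidate is a descendant of existing
  esac
  return 1
}
```

For example, snapshot_nesting_conflict /foo /foo/bar succeeds because /foo is an ancestor of /foo/bar, whereas snapshot_nesting_conflict /foobar /foo fails because the two directories are unrelated.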

Snapshot paths

To access the snapshots of a snapshot directory, append /.snapshot to the directory path. For example, if /foo is a snapshot directory that has a snapshot named s0, and /foo/bar is a file or directory in /foo, then /foo/.snapshot/s0/bar is the snapshot copy of /foo/bar. Snapshot paths can be used in both API calls and CLI commands. Sample commands:
  • Query all snapshots in a snapshot directory.
    hdfs dfs -ls /foo/.snapshot
  • Query all files in snapshot s0.
    hdfs dfs -ls /foo/.snapshot/s0
  • Copy a file from snapshot s0.
    hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
    Note -ptopax is used to retain timestamps, ownership, permissions, access control lists (ACLs), and XAttrs.
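
The mapping from a live path to its snapshot copy can be wrapped in a small helper. This is a sketch under stated assumptions: the function name snapshot_path is hypothetical, all paths are absolute, and the function only rewrites the path string without checking that the snapshot or the file exists.

```shell
# Hypothetical helper: print the snapshot copy of FILE_PATH, where FILE_PATH
# lies under the snapshot directory SNAP_DIR and SNAPSHOT is a snapshot name.
# String manipulation only; nothing is looked up in HDFS.
snapshot_path() {
  local snap_dir="${1%/}" snapshot="$2" file_path="$3"
  case "$file_path" in
    "$snap_dir"/*)
      # Insert /.snapshot/<name> between the directory and the relative path.
      printf '%s/.snapshot/%s/%s\n' "$snap_dir" "$snapshot" "${file_path#"$snap_dir"/}"
      ;;
    *)
      echo "snapshot_path: $file_path is not under $1" >&2
      return 1
      ;;
  esac
}
```

For example, snapshot_path /foo s0 /foo/bar prints /foo/.snapshot/s0/bar, which can be passed as the source of the hdfs dfs -cp -ptopax command shown above.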

Snapshot operations

  • Administrator operations
    Note Superuser permissions are required to perform administrator operations.
    • Allow snapshots
      Allow snapshots to be created for a directory. If the operation succeeds, the directory becomes snapshottable.
      hdfs dfsadmin -allowSnapshot <path>
    • Prohibit snapshots
      Prohibit snapshots from being created for a directory. All snapshots of the directory must be deleted before the operation is performed.
      hdfs dfsadmin -disallowSnapshot <path>
  • User operations
    Note An HDFS superuser has the permissions to perform all user operations.
    • Create a snapshot
      Create a snapshot for a snapshot directory. Only the owner of a snapshot directory is allowed to create a snapshot for the directory.
      hdfs dfs -createSnapshot <path> [snapshotName]
      Note The [snapshotName] parameter specifies the name of the snapshot. This parameter is optional. If you do not specify this parameter, a default name in the 's'yyyyMMdd-HHmmss.SSS format is generated from the current timestamp. Example: s20130412-151029.033.
    • Delete a snapshot
      Delete a snapshot from a snapshot directory. Only the owner of a snapshot directory is allowed to delete a snapshot from the directory.
      hdfs dfs -deleteSnapshot <path> <snapshotName>
    • Rename a snapshot
      Rename a snapshot. Only the owner of a snapshot directory is allowed to rename a snapshot in the directory.
      hdfs dfs -renameSnapshot <path> <oldName> <newName>
      Note The <oldName> parameter specifies the original snapshot name. The <newName> parameter specifies the new snapshot name.
    • Query all snapshot directories
      Query all snapshot directories in which the current user has the permissions to take snapshots.
      hdfs lsSnapshottableDir
    • Compare snapshots
      Obtain the differences between two snapshots. This operation requires read permissions on all files and directories in both snapshots.
      hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
      Note The <fromSnapshot> parameter specifies the original snapshot. The <toSnapshot> parameter specifies the snapshot that you want to compare with the original snapshot.
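
The operations above are often combined into a recurring job: take a new snapshot, compare it with the previous one, and eventually remove old snapshots. The following is a minimal sketch of such a job, not a definitive implementation. The function names run, snapshot_and_diff, and snapshot_teardown and the DRY_RUN variable are hypothetical. Only the hdfs subcommands described in this topic are used. The -disallowSnapshot step still requires superuser permissions.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Take a new snapshot CURRENT of DIR, then list what changed since the
# existing snapshot PREVIOUS. Stops at the first failing command.
snapshot_and_diff() {
  local dir="$1" previous="$2" current="$3"
  run hdfs dfs -createSnapshot "$dir" "$current" || return 1
  run hdfs snapshotDiff "$dir" "$previous" "$current" || return 1
}

# Delete every listed snapshot of DIR, then make DIR non-snapshottable again.
# -disallowSnapshot is run last because it requires that no snapshots remain.
snapshot_teardown() {
  local dir="$1"
  shift
  local name
  for name in "$@"; do
    run hdfs dfs -deleteSnapshot "$dir" "$name" || return 1
  done
  run hdfs dfsadmin -disallowSnapshot "$dir"
}
```

With DRY_RUN=1 set, snapshot_teardown /foo s0 s1 prints the two -deleteSnapshot commands followed by the -disallowSnapshot command without contacting the cluster, which is a convenient way to review the sequence before running it.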