Hadoop Distributed File System (HDFS) snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system. In most cases, snapshots are used to back up data so that the data can be restored after user errors such as accidental deletion or overwriting. This topic describes snapshot directories, snapshot paths, and the related snapshot operations.

Background information

HDFS snapshots have the following features:
  • Snapshot creation is instantaneous. The time complexity is O(1), excluding the inode lookup time.
  • Additional memory is used only when data is modified relative to a snapshot. Snapshots themselves are read-only. The memory usage complexity is O(M), where M is the number of modified files or directories.
  • Data blocks managed by DataNodes are not copied. Snapshot files record only the block list and the file size.
  • Snapshots do not have a negative impact on regular HDFS operations. Modifications are recorded in reverse chronological order so that the latest data can be directly accessed. The snapshot data is obtained by subtracting the modifications from the current data.

Snapshot directories

Snapshots can be taken on a directory only after the directory is set as snapshottable. A snapshottable directory, referred to as a snapshot directory in this topic, can hold a maximum of 65,536 snapshots at the same time. No limit is imposed on the number of snapshot directories. Administrators can set any directory to be snapshottable. If a snapshot directory contains snapshots, the directory cannot be deleted or renamed until all of its snapshots are deleted.

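Nested snapshot directories are not allowed. A directory cannot be set as snapshottable if any of its ancestors or descendants is already a snapshot directory. This restriction applies at every level of the tree, not only to the immediate parent directory and direct subdirectories.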
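
The nesting rule can be checked before -allowSnapshot is requested. The following is a minimal sketch, not part of HDFS: can_allow_snapshot is a hypothetical helper, and the list of existing snapshot directories is passed in by hand. It treats a repeated request on the same directory as acceptable, which is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an HDFS command): returns 0 if CANDIDATE could be
# set as snapshottable, given the existing snapshot directories passed as the
# remaining arguments; returns 1 if the request would create nesting.
can_allow_snapshot() {
  local candidate="${1%/}" dir
  shift
  for dir in "$@"; do
    dir="${dir%/}"
    # Assumption: the same directory is not treated as nesting.
    [ "$candidate" = "$dir" ] && continue
    # An existing snapshot directory is an ancestor of the candidate.
    case "$candidate/" in "$dir"/*) return 1 ;; esac
    # An existing snapshot directory is a descendant of the candidate.
    case "$dir/" in "$candidate"/*) return 1 ;; esac
  done
  return 0
}

# Example paths are illustrative only.
if can_allow_snapshot /data/logs /data; then echo "allowed"; else echo "nested: refused"; fi
if can_allow_snapshot /database /data; then echo "allowed"; else echo "nested: refused"; fi
```

The first call prints "nested: refused" because /data is an ancestor of /data/logs. The second prints "allowed" because /database only shares a name prefix with /data and is not inside it.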

Snapshot paths

To access the snapshots of a snapshot directory, append /.snapshot to the directory path. For example, assume that /foo is a snapshot directory, /foo/bar is a file or directory in /foo, and /foo has a snapshot named s0. In this case, /foo/.snapshot/s0/bar is the snapshot copy of /foo/bar. You can use .snapshot paths in common APIs and CLIs to view snapshot data. Sample commands:
  • Query all snapshots in a snapshot directory.
    hdfs dfs -ls /foo/.snapshot
  • Query all files in snapshot s0.
    hdfs dfs -ls /foo/.snapshot/s0
  • Copy a file from snapshot s0.
    hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
    Note The -ptopax option preserves timestamps, ownership, permissions, access control lists (ACLs), and extended attributes (XAttrs).
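
The path convention above can be wrapped in a small helper. The following is a minimal sketch: snapshot_path and restore_command are hypothetical function names, and restore_command only prints the copy command instead of running it. It assumes that the original entry no longer exists in the live directory, for example after an accidental deletion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path of an entry inside a snapshot.
# Arguments: snapshot directory, snapshot name, optional relative path.
snapshot_path() {
  local dir="${1%/}" snapshot="$2" rel="${3:-}"
  rel="${rel#/}"
  printf '%s/.snapshot/%s%s\n' "$dir" "$snapshot" "${rel:+/$rel}"
}

# Hypothetical helper: print (not run) the command that would copy an entry
# from a snapshot back to its original location, preserving attributes.
# Assumption: the target path does not currently exist.
restore_command() {
  local dir="${1%/}" snapshot="$2" rel="${3#/}" src
  src="$(snapshot_path "$dir" "$snapshot" "$rel")"
  printf 'hdfs dfs -cp -ptopax %s %s\n' "$src" "$dir/$rel"
}

snapshot_path /foo s0 bar     # prints /foo/.snapshot/s0/bar
restore_command /foo s0 bar   # prints hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /foo/bar
```

Printing the command first makes it possible to review the source and target paths before any data is copied.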

Snapshot operations

  • Administrator operations
    Note Superuser permissions are required to perform the following operations.
    • Allow snapshots to be created for a directory
      If the operation succeeds, the directory becomes snapshottable.
      hdfs dfsadmin -allowSnapshot <path>
    • Prohibit snapshots from being created for a directory
      All snapshots of the directory must be deleted before the operation is performed.
      hdfs dfsadmin -disallowSnapshot <path>
  • User operations
    Note The HDFS superuser can perform all of the following operations without meeting the permission requirements of the individual operations.
    • Create a snapshot for a snapshot directory
      Only the owner of a snapshot directory is allowed to create a snapshot for the directory.
      hdfs dfs -createSnapshot <path> [snapshotName]
      Note The [snapshotName] parameter specifies the name of the snapshot. This parameter is optional. If you do not specify it, a default name is generated in the syyyyMMdd-HHmmss.SSS format, which is the letter s followed by a timestamp with millisecond precision. Example: s20130412-151029.033.
    • Delete a snapshot from a snapshot directory
      Only the owner of a snapshot directory is allowed to delete a snapshot from the directory.
      hdfs dfs -deleteSnapshot <path> <snapshotName>
    • Rename a snapshot
      Only the owner of a snapshot directory is allowed to rename a snapshot in the directory.
      hdfs dfs -renameSnapshot <path> <oldName> <newName>
      Note <oldName> indicates the original snapshot name. <newName> indicates the new snapshot name.
    • Obtain all snapshot directories in which the current user has the permissions to take snapshots
      hdfs lsSnapshottableDir
    • Obtain differences between two snapshots
      This operation requires the read permissions on all files or directories in both snapshots.
      hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
      Note <fromSnapshot> indicates the original snapshot. <toSnapshot> indicates the snapshot you want to compare with the original snapshot.
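
The operations above can be chained into a typical snapshot lifecycle. The following is a minimal sketch that uses only the commands listed in this topic. The run wrapper, the DRY_RUN variable, and the snapshot_workflow function are hypothetical names introduced for illustration. By default the sketch prints each command instead of running it, so it can be reviewed on a machine without an HDFS client.

```shell
#!/usr/bin/env bash
# Assumption: DRY_RUN=1 (the default in this sketch) prints each command
# instead of running it. Set DRY_RUN=0 on a host with an HDFS client.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical workflow: enable snapshots on a directory, take a snapshot
# before and after a change, compare the two, then clean up.
snapshot_workflow() {
  local dir="$1" before="$2" after="$3"
  run hdfs dfsadmin -allowSnapshot "$dir"        # superuser only
  run hdfs dfs -createSnapshot "$dir" "$before"  # directory owner or superuser
  # ... files under the directory are changed here ...
  run hdfs dfs -createSnapshot "$dir" "$after"
  run hdfs snapshotDiff "$dir" "$before" "$after"
  run hdfs dfs -renameSnapshot "$dir" "$after" "${after}-checked"
  run hdfs dfs -deleteSnapshot "$dir" "$before"
  run hdfs dfs -deleteSnapshot "$dir" "${after}-checked"
  # All snapshots must be deleted before snapshots can be disallowed.
  run hdfs dfsadmin -disallowSnapshot "$dir"     # superuser only
}

snapshot_workflow /foo s0 s1
```

The order of the last three commands matters: -disallowSnapshot is only accepted after every snapshot of the directory has been deleted, and a renamed snapshot must be deleted under its new name.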