This post gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager.
Overview of HDFS Snapshots
- Snapshots are used for data backup, protection against user errors, and disaster recovery.
- Snapshots can be taken on a sub-tree of the file system or the entire file system.
- Snapshots allow administrators to create point-in-time backups of directories or the entire file system without actually cloning the data.
- Snapshots appear on the file system as read-only directories that can be accessed just like ordinary directories.
- Blocks in DataNodes are not copied: the snapshot files record only the block list and the file size.
- Administrators can create snapshots using Cloudera Manager or by using the command line.
- Snapshots can be taken on any directory once the directory has been set as snapshottable.
- A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
- There is no limit on the number of snapshottable directories.
- Administrators may set any directory to be snapshottable.
- If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.
- Nested snapshottable directories are currently not allowed.
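The properties listed above can be illustrated with a small toy model. This is only a sketch: `FileMeta`, `ToyDirectory`, and `would_nest` are hypothetical names invented for illustration, and the real NameNode bookkeeping is considerably more involved. The model shows that a snapshot records metadata (block list and file size) rather than copying blocks, that a directory holding snapshots cannot be deleted, and that nested snapshottable directories are rejected.

```python
import posixpath
from dataclasses import dataclass, field

MAX_SNAPSHOTS = 65536  # per-directory limit quoted in the list above


@dataclass(frozen=True)
class FileMeta:
    """What a snapshot records for a file: block IDs and length, not the data."""
    block_ids: tuple
    size: int


@dataclass
class ToyDirectory:
    """Hypothetical in-memory stand-in for one snapshottable directory."""
    files: dict = field(default_factory=dict)      # file name -> FileMeta
    snapshots: dict = field(default_factory=dict)  # snapshot name -> {file name: FileMeta}

    def create_snapshot(self, name):
        if len(self.snapshots) >= MAX_SNAPSHOTS:
            raise RuntimeError("snapshot limit reached for this directory")
        # Shallow copy: only the name -> metadata mapping is duplicated.
        self.snapshots[name] = dict(self.files)

    def can_delete(self):
        # A directory with snapshots can be neither deleted nor renamed.
        return not self.snapshots


def _is_strict_ancestor(a, b):
    """True if path a is a proper ancestor of path b."""
    a, b = posixpath.normpath(a), posixpath.normpath(b)
    return a != b and b.startswith(a.rstrip("/") + "/")


def would_nest(candidate, snapshottable_dirs):
    """True if making candidate snapshottable would nest it with an existing one."""
    return any(
        _is_strict_ancestor(candidate, d) or _is_strict_ancestor(d, candidate)
        for d in snapshottable_dirs
    )
```

In this model, deleting a file from `files` after `create_snapshot("s0")` leaves `snapshots["s0"]` pointing at the same `FileMeta` object, which mirrors the "no data copying" point in the list.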
Example of Snapshottable Directory
For a snapshottable directory, the path component “.snapshot” is used for accessing its snapshots. Suppose:
- /foo is a snapshottable directory
- /foo/bar is a file/directory in /foo
- /foo has a snapshot s0
Then, the snapshot copy of /foo/bar is /foo/.snapshot/s0/bar.
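The path mapping in this example can be written as a short helper. This is a minimal sketch; `snapshot_path` is a hypothetical function name, not part of any Hadoop API, and it only manipulates strings without contacting a cluster.

```python
import posixpath

SNAPSHOT_COMPONENT = ".snapshot"


def snapshot_path(snapshottable_dir, snapshot_name, live_path):
    """Map a live path to its location inside a named snapshot."""
    root = posixpath.normpath(snapshottable_dir)
    path = posixpath.normpath(live_path)
    rel = posixpath.relpath(path, root)
    # Reject paths that are not inside the snapshottable directory.
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{live_path} is not under {snapshottable_dir}")
    base = posixpath.join(root, SNAPSHOT_COMPONENT, snapshot_name)
    # The snapshottable directory itself maps to the snapshot root.
    return base if rel == "." else posixpath.join(base, rel)
```

For the example above, `snapshot_path("/foo", "s0", "/foo/bar")` gives `/foo/.snapshot/s0/bar`.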
The following snapshot operations are available:
- Enabling and Disabling HDFS Snapshots: Designate HDFS directories as “snapshottable” so that snapshots can be created for those directories.
- Taking Snapshots: Initiate immediate (unscheduled) snapshots of a directory.
- Deleting Snapshots: Delete a saved snapshot.
- Restoring Snapshots: Restore an HDFS directory or file from a saved snapshot, or create a new directory or file using “Restore As”.
- Viewing Snapshots: View the list of saved snapshots currently being maintained.
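On the command line, these operations correspond roughly to the standard `hdfs dfsadmin` and `hdfs dfs` snapshot subcommands. The sketch below only builds the argument lists; the Python function names are hypothetical, and treating "restore" as a plain `hdfs dfs -cp` out of the `.snapshot` path is an assumption about one common approach rather than a description of what Cloudera Manager does internally. Enabling and disabling snapshots typically requires HDFS superuser privileges.

```python
import shlex


def _snap_root(path):
    # "/foo" -> "/foo/.snapshot"; also handles "/" and trailing slashes.
    return path.rstrip("/") + "/.snapshot"


def allow_snapshot(path):
    """Enable snapshots on a directory (make it snapshottable)."""
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def disallow_snapshot(path):
    """Disable snapshots; all existing snapshots must be deleted first."""
    return ["hdfs", "dfsadmin", "-disallowSnapshot", path]


def create_snapshot(path, name=None):
    """Take a snapshot; HDFS generates a name if none is given."""
    return ["hdfs", "dfs", "-createSnapshot", path] + ([name] if name else [])


def delete_snapshot(path, name):
    """Delete a saved snapshot."""
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def list_snapshots(path):
    """List the snapshots kept under a snapshottable directory."""
    return ["hdfs", "dfs", "-ls", _snap_root(path)]


def restore_from_snapshot(path, name, relative_path, destination):
    """Assumed restore approach: copy a file or directory out of the snapshot."""
    source = f"{_snap_root(path)}/{name}/{relative_path}"
    return ["hdfs", "dfs", "-cp", source, destination]


def as_shell(command):
    """Render an argument list as a single shell-safe string."""
    return shlex.join(command)
```

On a host with the `hdfs` client installed, any of these lists can be passed to `subprocess.run(command, check=True)`; `as_shell` is only there for printing or logging the command.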
Managing HDFS Snapshots using Cloudera Manager
- Snapshot management is available in the Cloudera Manager Enterprise version only (CDH 5).
- For the HDFS service, use the File Browser tab to view the HDFS directories associated with your cluster.
- Each directory in the File Browser has a drop-down menu next to the full file path.
- The menu has multiple options for snapshot operations.