This post gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also covers snapshot management through Cloudera Manager.

Overview of HDFS Snapshots

  • Snapshots serve as data backups, protecting against user errors and supporting disaster recovery.
  • Snapshots can be taken on a sub-tree of the file system or the entire file system.
  • Snapshots allow administrators to create point-in-time backups of directories or the entire file system without actually cloning the data.
  • Snapshots appear on the file system as read-only directories that can be accessed just like ordinary directories.
  • Blocks in DataNodes are not copied: the snapshot files record only the block list and the file size.
  • Administrators can create snapshots using Cloudera Manager or by using the command line.

Reference # http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

 

Snapshottable Directories

  • Snapshots can be taken on any directory once the directory has been set as snapshottable.
  • A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
  • There is no limit on the number of snapshottable directories.
  • Administrators may set any directory to be snapshottable.
  • If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.
  • Nested snapshottable directories are currently not allowed.
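
The no-nesting rule can be pictured as a simple path check: a directory cannot be made snapshottable if one of its ancestors or descendants already is. The following is a minimal sketch of that rule as a toy model, not HDFS's own implementation; the function name can_allow_snapshot and the example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable


def can_allow_snapshot(candidate: str, snapshottable: Iterable[str]) -> bool:
    """Return True if making `candidate` snapshottable would not create nesting.

    Toy model of the rule only; it does not talk to a NameNode.
    """
    new = PurePosixPath(candidate)
    for existing in map(PurePosixPath, snapshottable):
        if new == existing:
            continue  # same directory, so not a nesting case
        # Reject if the candidate lies below or above an existing snapshottable directory.
        if existing in new.parents or new in existing.parents:
            return False
    return True


# Hypothetical set of directories that are already snapshottable.
existing_dirs = {"/data/warehouse", "/user/alice"}

print(can_allow_snapshot("/data/logs", existing_dirs))             # unrelated directory
print(can_allow_snapshot("/data/warehouse/sales", existing_dirs))  # descendant of /data/warehouse
print(can_allow_snapshot("/data", existing_dirs))                  # ancestor of /data/warehouse
```

Under this model the first call is allowed and the other two are rejected, since each would nest one snapshottable directory inside another.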

Example of Snapshottable Directory

For a snapshottable directory, the path component “.snapshot” is used for accessing its snapshots. Suppose that:

  • /foo is a snapshottable directory
  • /foo/bar is a file/directory in /foo
  • /foo has a snapshot s0

Then, the snapshot copy of /foo/bar is /foo/.snapshot/s0/bar.
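
The mapping from a live path to its snapshot copy is mechanical, so it can be sketched as a small helper. This is only an illustration of the path layout described above; the function name snapshot_path is hypothetical and the code does no I/O against HDFS.

```python
from pathlib import PurePosixPath


def snapshot_path(snapshottable_dir: str, snapshot_name: str, path: str) -> str:
    """Build the path of `path` as seen inside snapshot `snapshot_name`.

    `path` must lie under `snapshottable_dir`; otherwise ValueError is raised.
    """
    root = PurePosixPath(snapshottable_dir)
    relative = PurePosixPath(path).relative_to(root)
    return str(root / ".snapshot" / snapshot_name / relative)


print(snapshot_path("/foo", "s0", "/foo/bar"))  # /foo/.snapshot/s0/bar
```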

 

Snapshot Operations

  • Enabling and Disabling HDFS Snapshots # Designate HDFS directories to be “snapshottable” so snapshots can be created for those directories.
  • Taking Snapshots # Initiate immediate (unscheduled) snapshots of a directory.
  • Deleting Snapshots # Delete a saved snapshot.
  • Restoring Snapshots # Restore an HDFS directory or file from a saved snapshot or create a new directory or file using “Restore As”.
  • Viewing Snapshots # View the list of saved snapshots currently being maintained.
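
On the command line, these operations map roughly onto the hdfs commands below. The following is a minimal sketch that only builds and prints the command lines rather than running them. The helper function names are hypothetical, and /foo and s0 are reused from the example above. The "restore" helper is simply a copy out of the .snapshot path, which is an assumption about how a restore might be done by hand, not a description of what Cloudera Manager does internally.

```python
import shlex
from typing import List


def allow_snapshot(path: str) -> List[str]:
    # Mark a directory as snapshottable (administrator operation).
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def disallow_snapshot(path: str) -> List[str]:
    # Remove the snapshottable flag; existing snapshots must be deleted first.
    return ["hdfs", "dfsadmin", "-disallowSnapshot", path]


def create_snapshot(path: str, name: str) -> List[str]:
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def delete_snapshot(path: str, name: str) -> List[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def list_snapshots(path: str) -> List[str]:
    # Snapshots show up as sub-directories of <path>/.snapshot.
    return ["hdfs", "dfs", "-ls", path.rstrip("/") + "/.snapshot"]


def restore_from_snapshot(path: str, name: str, item: str, target: str) -> List[str]:
    # Assumption: "restore" here is a plain copy out of the read-only snapshot;
    # -ptopax asks the copy to preserve timestamps, ownership, permission, ACLs and XAttrs.
    source = f"{path.rstrip('/')}/.snapshot/{name}/{item}"
    return ["hdfs", "dfs", "-cp", "-ptopax", source, target]


commands = [
    allow_snapshot("/foo"),
    create_snapshot("/foo", "s0"),
    list_snapshots("/foo"),
    restore_from_snapshot("/foo", "s0", "bar", "/foo/bar"),
    delete_snapshot("/foo", "s0"),
    disallow_snapshot("/foo"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

On a machine with a configured Hadoop client, each list could be passed to subprocess.run(cmd, check=True) instead of being printed.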

 

Managing HDFS Snapshots using Cloudera Manager

  • Available only in the Enterprise edition of Cloudera Manager (CDH 5).
  • For the HDFS service, use the File Browser tab to view the HDFS directories associated with your cluster.
  • Each directory in the File Browser has a drop-down menu next to the full file path.
  • The menu has multiple options for snapshot operations.

Figure: Manage HDFS Snapshots in Cloudera Manager

Reference # http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_bdr_managing_hdfs_snapshots.html
