HDFS knowbouts

Viewing 1 post (of 1 total)
  • #869
    ProTechSkills
    Keymaster

    q. What are the FsImage and EditLog in HDFS?
    FsImage:
    The entire filesystem namespace, including the mapping of blocks to files and the filesystem properties, is stored in a file called the FsImage. It is kept in the Namenode’s local filesystem.
    EditLog:
    The Namenode uses a transaction log called the EditLog to record every change made to the filesystem metadata (that is, every change to the state captured in the FsImage).
    Examples of such changes are creating a new file or changing the replication factor of a file.
    The EditLog is also stored in the Namenode’s local filesystem.

    When the Namenode starts up, it reads the FsImage and EditLog from its local filesystem, applies the EditLog transactions to the FsImage, and then stores a new copy of the FsImage on the filesystem as a checkpoint.
    FsImage and EditLog are central data structures of HDFS.
    Corruption of these files can make an HDFS instance non-functional.
    For this reason, a Namenode can be configured to maintain multiple copies of the FsImage and EditLog.
    Multiple copies of the FsImage and EditLog files are updated synchronously.
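
    The startup merge described above can be sketched as a toy model. This is not Hadoop code: the dict-based namespace, the operation names ("create", "set_replication", "delete") and the functions apply_edit and checkpoint are all hypothetical, chosen only to show how replaying a log over a snapshot yields a new checkpoint.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged transaction to an in-memory namespace (a plain dict)."""
    op, path = edit["op"], edit["path"]
    if op == "create":
        namespace[path] = {"replication": edit["replication"], "blocks": []}
    elif op == "set_replication":
        namespace[path]["replication"] = edit["replication"]
    elif op == "delete":
        del namespace[path]
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, editlog):
    """Replay the edit log over a copy of the image.

    Returns the merged image and a fresh, empty edit log.
    """
    namespace = copy.deepcopy(fsimage)  # leave the old image untouched
    for edit in editlog:
        apply_edit(namespace, edit)
    return namespace, []


# Hypothetical starting state: one file in the image, two pending edits.
fsimage = {"/data/a.txt": {"replication": 3, "blocks": ["blk_1"]}}
editlog = [
    {"op": "create", "path": "/data/b.txt", "replication": 3},
    {"op": "set_replication", "path": "/data/a.txt", "replication": 2},
]

new_image, new_log = checkpoint(fsimage, editlog)
print(sorted(new_image))
print(new_image["/data/a.txt"]["replication"])
```

    The old image is copied rather than modified in place, so a failure halfway through the replay would still leave the previous checkpoint intact.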

    FSImage

    Image of entire file system namespace
    Mappings of blocks to files
    File system properties
    Stored in a file on the local OS file system
    Editlog

    Transaction log
    Records all changes to filesystem metadata

    Every file transaction in the filesystem (HDFS) is recorded in the EditLog. The EditLog is merged into the FsImage whenever the Namenode restarts, and the merged image is then loaded into RAM for further use.

    q. What is the Secondary Namenode?
    ans. If the EditLog is merged into the FsImage only when the Namenode restarts, the following problems can arise in an HDFS environment:
    1. The EditLog file may grow very large, which makes it hard to manage.
    2. Namenode restarts take longer because a lot of changes need to be merged.
    3. In the case of a crash, a huge amount of metadata may be lost, since the FsImage is very old.

    The Secondary Namenode is the solution to these issues. It is another machine with connectivity to the Namenode. It periodically copies the FsImage and EditLog from the Namenode, merges the EditLog into the FsImage, and moves the updated FsImage back to the Namenode. The Secondary Namenode is not meant to provide a highly available Namenode. The high-level tasks performed by the Secondary Namenode are:

    Receives the edit logs from the Namenode and merges them into the FsImage
    Copies the updated FsImage back to the Namenode
    The updated FsImage reduces the Namenode startup time

    The whole purpose of the Secondary Namenode is to create checkpoints in HDFS.
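
    When "periodically" fires can be sketched as a simple policy check. The function should_checkpoint is hypothetical. Its two default thresholds are meant to mirror the dfs.namenode.checkpoint.period (seconds) and dfs.namenode.checkpoint.txns properties as I understand their defaults in Hadoop 2.x and later, so treat the numbers as assumptions and check hdfs-default.xml for your version.

```python
def should_checkpoint(seconds_since_last, pending_txns,
                      period_s=3600, txn_limit=1_000_000):
    """Return True when a new checkpoint is due.

    A checkpoint is due when either enough time has passed since the
    last one, or enough un-checkpointed transactions have piled up in
    the edit log. Both thresholds are assumed defaults, not guarantees.
    """
    return seconds_since_last >= period_s or pending_txns >= txn_limit


# Ten minutes since the last checkpoint, few edits: nothing to do yet.
print(should_checkpoint(seconds_since_last=600, pending_txns=5_000))

# Ten minutes, but a burst of edits crossed the transaction limit.
print(should_checkpoint(seconds_since_last=600, pending_txns=1_200_000))

# Quiet cluster, but a full hour has passed.
print(should_checkpoint(seconds_since_last=3600, pending_txns=10))
```

    Bounding the number of pending transactions is what keeps the EditLog small and the restart merge short, which addresses the first two problems listed above.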

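    q. Which tools convert EditLog and FsImage contents into a human-readable format?
    ans. The Offline Image Viewer (hdfs oiv) does this for the FsImage. The EditLog has its own counterpart, the Offline Edits Viewer (hdfs oev).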
    The simplest usage of the Offline Image Viewer is to provide just an input and output file, via the -i and -o command-line switches:

    bash$ bin/hdfs oiv -i fsimage -o fsimage.txt

    This will create a file named fsimage.txt in the current directory using the Ls output processor. For very large image files, this process may take several minutes.

    One can specify which output processor via the command-line switch -p. For instance:

    bash$ bin/hdfs oiv -i fsimage -o fsimage.xml -p XML

    or

    bash$ bin/hdfs oiv -i fsimage -o fsimage.txt -p Indented

    This will run the tool using either the XML or Indented output processor, respectively.

    machine:hadoop-0.21.0-dev theuser$ bin/hdfs oiv -i fsimagedemo -p Indented -o fsimage.txt

    FSImage
      ImageVersion = -19
      NamespaceID = 2109123098
      GenerationStamp = 1003
      INodes [NumInodes = 12]
        Inode
          INodePath =
          Replication = 0
          ModificationTime = 2009-03-16 14:16
          AccessTime = 1969-12-31 16:00
          BlockSize = 0
          Blocks [NumBlocks = -1]
          NSQuota = 2147483647
          DSQuota = -1
          Permissions
            Username = theuser
            GroupName = supergroup
            PermString = rwxr-xr-x
    …remaining output omitted…
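
    Once the image is in text form, it can be post-processed with ordinary scripting. The sketch below pulls the "Key = Value" fields out of a fragment of Indented output like the one above. The helper parse_fields is hypothetical and deliberately naive: it skips bracketed section headers such as "INodes [NumInodes = 12]", and because later keys overwrite earlier ones, it only suits a fragment covering a single inode.

```python
def parse_fields(text):
    """Collect 'Key = Value' pairs from Indented-processor output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines without '=' and bracketed section headers.
        if not sep or "[" in key:
            continue
        fields[key.strip()] = value.strip()
    return fields


# Fragment copied from the sample output in this post.
sample = """\
FSImage
  ImageVersion = -19
  NamespaceID = 2109123098
  INodes [NumInodes = 12]
    Inode
      INodePath =
      Replication = 0
      Permissions
        Username = theuser
        PermString = rwxr-xr-x
"""

fields = parse_fields(sample)
print(fields["ImageVersion"])
print(fields["Username"])
```

    For anything beyond a quick look, the XML processor output is probably the easier format to parse reliably.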

    q. What is decommissioning of nodes?
    ans. Decommissioning a host decommissions and stops all roles on that host, without having to go to each service and decommission the roles individually. Decommissioning applies only to the HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. If the host has other roles running on it, those roles are simply stopped.

    Once all roles on the host have been decommissioned and stopped, the host can be removed from service. You can decommission multiple hosts in parallel.
