Comparison Between Hadoop 2.x vs Hadoop 3.x

Posted on May 3, 2020 (updated May 9, 2020) by ProTechSkills

Hadoop has changed considerably across its three major versions. Hadoop 3 combines the efforts of hundreds of contributors over the six years since Hadoop 2 launched. In this tutorial, we compare Hadoop 2.x and Hadoop 3.x. Let’s first see the comparison in tabular format: Hadoop […]

Working with HDFS Snapshots

Posted on June 7, 2016 by ProTechSkills

This blog gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots: Snapshots are data backups that protect against user errors and support disaster recovery. Snapshots can be taken on a sub-tree of […]

Introduction to Hadoop

Posted on October 18, 2015 by ProTechSkills

The document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN). It also explains the architecture of Hadoop, how the Hadoop Distributed File System works, and the MapReduce programming model.

Start HDFS High Availability Cluster

Posted on September 22, 2014 (updated April 10, 2016) by ProTechSkills

In the previous blog, we discussed the HDFS high availability configuration. This blog describes the steps to start an HDFS high availability cluster. Prerequisites: Before starting an HDFS high availability cluster, make sure that the cluster meets the following prerequisites: a) If you have enabled Automatic Failover for hot backup during NameNode failover, then before starting the HDFS high availability cluster, […]

HDFS High Availability Configuration

Posted on August 29, 2014 (updated November 29, 2015) by ProTechSkills

In the previous blog, we discussed the HDFS high availability architecture. This blog describes the configuration for HDFS high availability in a Hadoop cluster. Prerequisites: Before configuring HDFS high availability, make sure that your Hadoop cluster meets the following prerequisites: a) You must have at least two nodes to enable HDFS high availability. b) If you want to configure […]

HDFS High Availability Architecture

Posted on August 6, 2014 (updated September 22, 2014) by ProTechSkills

In the previous blog, we discussed the need for and design goals of HDFS high availability. In this blog, we will talk about its architecture. HDFS High Availability Architecture: To provide a hot backup and a consistent solution for NameNode failure, a concept of using two […]

HDFS High Availability Overview

Posted on July 18, 2014 (updated September 22, 2014) by ProTechSkills

Background. Single Point of Failure (SPOF) in HDFS: Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. Ecosystem Dependency: The Hadoop ecosystem components like […]
