How Combiner works in Hadoop MapReduce

Hadoop is a framework for handling Big Data. It uses HDFS as its distributed storage mechanism and MapReduce as the parallel processing paradigm for data residing in HDFS. The key components of MapReduce are the Mapper and the Reducer. When a MapReduce job runs on a large dataset, Mappers generate large […]

Working with HDFS Snapshots

This blog gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots: Snapshots are data backups that protect against user errors and support disaster recovery. Snapshots can be taken on a sub-tree of […]

Big Data – Hadoop Interview Questions

To crack an interview, you must be clear on the basic concepts of the frameworks involved. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Start HDFS High Availability Cluster

In the previous blog, we discussed the HDFS high availability configuration. This blog describes the steps to start an HDFS high availability cluster. Pre-requisites: Before starting an HDFS high availability cluster, make sure that the cluster meets the following pre-requisites: a) If you have enabled Automatic Failover for hot backup during NameNode failover, then before starting the HDFS high availability cluster, […]

HDFS High Availability Configuration

In the previous blog, we discussed the HDFS high availability architecture. This blog describes the configurations for HDFS high availability in a Hadoop cluster. Pre-requisites: Before configuring HDFS high availability, make sure that your Hadoop cluster meets the following pre-requisites: a) You must have at least two nodes to enable HDFS high availability. b) If you want to configure […]

HDFS High Availability Architecture

In the previous blog, we discussed the need for HDFS High Availability and its design goals. In this blog, we will talk about the architecture of HDFS high availability. HDFS High Availability Architecture: In order to provide a hot backup and a consistent solution for NameNode failure, a concept of using two […]

Hadoop 2 Single Node Installation with YARN

Pre-requisites: Before starting the Hadoop 2 single node installation, make sure that the node meets the following pre-requisites: a) Any Linux operating system. b) Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps […]