MapReduce Introduction

Hadoop MapReduce is a software framework for developing applications that process large datasets in parallel in a reliable, fault-tolerant manner. A MapReduce application splits the input dataset into chunks and processes them in parallel on multiple nodes. The diagram below shows the different phases of a MapReduce application: There are two […]
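To make the idea of processing chunks in parallel a little more concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API: the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_groups`, `word_count`) are hypothetical, threads stand in for cluster nodes, and word counting is simply an assumed example job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split the input into fixed-size chunks (stand-ins for input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key across every mapped chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2, workers=4):
    """Run the whole toy pipeline: split, map in parallel, shuffle, reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    # Threads stand in for cluster nodes; each chunk is mapped independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node", "big node"]
    print(word_count(sample))
```

Because each chunk is mapped without looking at any other chunk, a failed chunk could in principle be re-run on its own, which is the property a real framework relies on for fault tolerance.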

Zookeeper Standalone Installation

Pre-requisites: Before starting with Zookeeper standalone installation, make sure that the node meets the following pre-requisites: a) Supported Platforms: GNU/Linux, Win32, MacOSX, FreeBSD and Sun Solaris. This blog describes the installation steps for Linux. b) Sun Java 1.6 or above should already be installed. To install Java, you can refer to the installation steps mentioned in the blog. […]

Zookeeper Clustered Mode Installation

Pre-requisites: Before starting with Zookeeper cluster mode installation, make sure that the node meets the following pre-requisites: a) Supported Platforms: GNU/Linux, Win32, MacOSX, FreeBSD and Sun Solaris. This blog describes the installation steps for Linux. b) Sun Java 1.6 or above should already be installed. To install Java, you can refer to the installation steps […]

HDFS High Availability Architecture

In the previous blog, we discussed the need for and the design goals of HDFS High Availability. In this blog, we will talk about the architecture of HDFS High Availability. HDFS High Availability Architecture: To provide a hot backup and a consistent solution for NameNode failure, a concept of using two […]

Flume Installation

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. In this post, we will discuss Flume installation. The use of Apache Flume is not restricted to log data aggregation. […]

Hadoop Single Node Installation

Pre-requisites: Before starting with Hadoop single node installation, make sure that the node meets the following pre-requisites: a) Any Linux operating system. b) Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps […]

Hadoop 2 Single Node Installation with YARN

Pre-requisites: Before starting with Hadoop 2 single node installation, make sure that the node meets the following pre-requisites: a) Any Linux operating system. b) Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps […]