Many distributed frameworks require passwordless SSH (Secure Shell) between machines. It lets the host machine open a secure shell connection to a remote machine without a password prompt. Follow the steps below to configure passwordless SSH between two Linux machines. Prerequisites: 1. Install the OpenSSH server package on […]
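Passwordless SSH works because the remote machine's `~/.ssh/authorized_keys` file lists the host's public key. The SSH server then accepts key-based authentication instead of prompting for a password. Normally `ssh-keygen` creates the key pair and `ssh-copy-id` installs the public key on the remote machine. The sketch below is a minimal Python illustration of that install step only. It assumes you already have the public key as a string. The function name `install_authorized_key` is hypothetical and is not part of any SSH tool.

```python
import os
from pathlib import Path


def install_authorized_key(ssh_dir: Path, public_key: str) -> bool:
    """Hypothetical helper: add public_key to ssh_dir/authorized_keys once.

    Returns True if the key was appended, False if it was already present.
    """
    key = public_key.strip()
    if not key:
        raise ValueError("public key is empty")

    # Keep the .ssh directory private to its owner (mode 700).
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)

    auth_file = ssh_dir / "authorized_keys"
    content = auth_file.read_text() if auth_file.exists() else ""

    if key not in (line.strip() for line in content.splitlines()):
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if content == "" or content.endswith("\n") else "\n"
        with auth_file.open("a") as f:
            f.write(prefix + key + "\n")
        added = True
    else:
        added = False

    # authorized_keys should be readable and writable by the owner only (mode 600).
    os.chmod(auth_file, 0o600)
    return added
```

The permissions matter. With `StrictModes` enabled, which is the sshd default, the server ignores an `authorized_keys` file or `.ssh` directory that other users can write to.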
Hadoop
A guide for professionals to start working with Hadoop, understand its architecture and explore the power of Hadoop.
Introduction to Hadoop
The document starts with an introduction to Hadoop and covers the Hadoop 1.x and 2.x services (HDFS, MapReduce and YARN). It also explains the Hadoop architecture, how the Hadoop Distributed File System (HDFS) works, and the MapReduce programming model.
Apache Hadoop YARN: Best Practices
Found a useful presentation on Apache Hadoop YARN best practices by Hortonworks.
Start HDFS High Availability Cluster
In the previous post, we discussed the HDFS high availability configuration. This post describes the steps to start an HDFS high availability cluster. Prerequisites: Before starting an HDFS high availability cluster, make sure that the cluster meets the following prerequisites: a) If you have enabled automatic failover for a hot backup during NameNode failover, then before starting the HDFS high availability cluster, […]
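The excerpt stops before listing the steps. For orientation only, here is a sketch of a typical start-up order for a Hadoop 2.x HA cluster that uses the Quorum Journal Manager. It is based on the Apache HDFS HA documentation rather than this post, so the post's own steps may differ. It assumes a ZooKeeper ensemble is already running when automatic failover is enabled, and that the NameNode IDs are `nn1` and `nn2`. The function and host names are hypothetical. The function only builds a list of (host, command) pairs and does not run anything.

```python
def ha_startup_commands(nn1_host, nn2_host, journal_hosts, automatic_failover, new_cluster=True):
    """Hypothetical helper: (host, command) pairs in a typical QJM-based HA start-up order."""
    steps = []
    if automatic_failover:
        # Create the ZooKeeper znode used by the failover controllers (run once, on one NameNode).
        steps.append((nn1_host, "hdfs zkfc -formatZK"))
    # JournalNodes must be up before a NameNode can format or write the shared edit log.
    for jn in journal_hosts:
        steps.append((jn, "hadoop-daemon.sh start journalnode"))
    if new_cluster:
        steps.append((nn1_host, "hdfs namenode -format"))
    steps.append((nn1_host, "hadoop-daemon.sh start namenode"))
    if new_cluster:
        # Copy the freshly formatted metadata from the first NameNode to the second one.
        steps.append((nn2_host, "hdfs namenode -bootstrapStandby"))
    steps.append((nn2_host, "hadoop-daemon.sh start namenode"))
    if automatic_failover:
        # A ZKFC next to each NameNode elects the active one.
        for host in (nn1_host, nn2_host):
            steps.append((host, "hadoop-daemon.sh start zkfc"))
    else:
        # Without automatic failover, both NameNodes start in standby; promote one manually.
        steps.append((nn1_host, "hdfs haadmin -transitionToActive nn1"))
    return steps


for host, command in ha_startup_commands("host1", "host2", ["jn1", "jn2", "jn3"], automatic_failover=True):
    print(f"{host}: {command}")
```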
HDFS High Availability Configuration
In the previous post, we discussed the HDFS high availability architecture. This post describes the configuration for HDFS high availability in a Hadoop cluster. Prerequisites: Before configuring HDFS high availability, make sure that your Hadoop cluster meets the following prerequisites: a) You must have at least two nodes to enable HDFS high availability. b) If you want to configure […]
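As a rough illustration of what the configuration involves, the sketch below builds the core HA properties for `hdfs-site.xml` for one nameservice and enforces the two-NameNode prerequisite. The property names are the standard Hadoop 2.x HA keys. The nameservice ID, host names and JournalNode URI in the usage lines are assumptions. A working setup also needs settings omitted here, such as `dfs.namenode.http-address.*`, `dfs.ha.fencing.methods` and `fs.defaultFS` in `core-site.xml`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def ha_site_properties(nameservice, namenodes, shared_edits_dir):
    """Hypothetical helper: core HDFS HA properties for one nameservice.

    namenodes maps a NameNode ID (e.g. "nn1") to its "host:port" RPC address.
    """
    if len(namenodes) < 2:
        raise ValueError("HDFS high availability needs at least two NameNodes")
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": shared_edits_dir,
        # Lets HDFS clients find whichever NameNode is currently active.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, rpc_address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_address
    return props


def to_configuration_xml(props):
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = ha_site_properties(
    "mycluster",
    {"nn1": "namenode1.example.com:8020", "nn2": "namenode2.example.com:8020"},
    "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster",
)
print(to_configuration_xml(props))
```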
Hadoop HDFS Concepts
This presentation from tutorialvillage gives an overview of Hadoop HDFS concepts such as blocks, rack awareness and safe mode.
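HDFS stores each file as a sequence of fixed-size blocks (64 MB by default in Hadoop 1.x and 128 MB in Hadoop 2.x), and only the last block may be smaller. The sketch below is a minimal Python illustration that computes the block layout of a file. The helper name is hypothetical, and the code does not talk to HDFS.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Hypothetical helper: (offset, length) of each HDFS block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


for offset, length in split_into_blocks(300 * MB):
    print(f"block at {offset // MB} MB, {length // MB} MB long")
```

For example, a 300 MB file with 128 MB blocks occupies three blocks of 128 MB, 128 MB and 44 MB. The last block uses only 44 MB of disk per replica, not a full block.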
Hadoop Cluster Setup
If you want a Hadoop single-node setup instead, follow the post here. Prerequisites: Before starting the Hadoop cluster setup, make sure that each node meets the following prerequisites: a) Any Linux operating system b) Sun Java 1.6 or above already installed, with the same version across all […]
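The Java prerequisite can be checked before installation. The sketch below assumes you have already collected each node's Java version string (for example, from `java -version`) into a dictionary. The node names and helper functions are hypothetical, and the parser handles only plain numeric versions such as `1.6.0_45` or `11.0.2`.

```python
def java_major(version):
    """Map a Java version string to its major release: "1.6.0_45" -> 6, "11.0.2" -> 11."""
    parts = version.split(".")
    major = int(parts[0])
    # Java 8 and earlier report versions as "1.x".
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major


def check_java_versions(node_versions, minimum_major=6):
    """Hypothetical check: return a list of problems (empty if every node qualifies)."""
    problems = []
    for node, version in sorted(node_versions.items()):
        if java_major(version) < minimum_major:
            problems.append(f"{node}: Java {version} is below the required minimum")
    distinct = sorted(set(node_versions.values()))
    if len(distinct) > 1:
        problems.append("Java versions differ across nodes: " + ", ".join(distinct))
    return problems


print(check_java_versions({"master": "1.7.0_80", "slave1": "1.7.0_80", "slave2": "1.6.0_45"}))
```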
MapReduce Introduction
Hadoop MapReduce is a software framework for writing applications that process large datasets in parallel in a reliable, fault-tolerant manner. A MapReduce application splits the input dataset into chunks and processes them in parallel on multiple nodes. The diagram below shows the different phases of a MapReduce application: There are two […]
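The excerpt stops before naming the phases. As an assumption, this sketch uses the classic word-count example with the usual map, shuffle/sort and reduce steps. It is plain Python that runs the phases one after another in a single process, not the Hadoop Java API (`Mapper`/`Reducer`). The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group values by key, sorted by key, as the framework does before reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def run_word_count(splits):
    # In Hadoop each input split would go to its own map task, possibly on a different node;
    # here the splits are simply processed sequentially.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


splits = [["Hadoop stores data", "Hadoop processes data"], ["MapReduce processes data in parallel"]]
print(run_word_count(splits))
```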
HDFS High Availability Architecture
In the previous post, we discussed the need for, and design goals of, HDFS high availability. In this post, we discuss the architecture of HDFS high availability. HDFS High Availability Architecture: To provide a hot backup and a consistent solution for NameNode failure, a concept of using two […]
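In Hadoop 2.x HA, one NameNode is active and the other is a standby. The standby applies the active node's edit log from shared storage so that it can take over quickly, and fencing ensures that the old active stops writing. The toy model below illustrates only that edit-sharing and failover idea under those assumptions. The class names and states are hypothetical, and nothing here is Hadoop code. For example, block reports from DataNodes are omitted.

```python
class NameNode:
    """Toy NameNode: a set of paths plus a cursor into the shared edit log."""

    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.namespace = set()  # paths known to this NameNode
        self.applied = 0        # number of shared-log entries already applied

    def catch_up(self, shared_edits):
        """Apply every shared-log entry this node has not applied yet."""
        for op, path in shared_edits[self.applied:]:
            if op == "create":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
        self.applied = len(shared_edits)


class HAPair:
    """Toy model: one active and one standby NameNode sharing an edit log."""

    def __init__(self):
        self.shared_edits = []
        self.active = NameNode("nn1")
        self.standby = NameNode("nn2")
        self.active.state = "active"

    def write(self, op, path):
        # Only the active NameNode records namespace changes in the shared log.
        self.shared_edits.append((op, path))
        self.active.catch_up(self.shared_edits)

    def failover(self):
        # Fence the old active so it can no longer write, then let the standby
        # apply every remaining edit before it starts serving requests.
        self.active.state = "fenced"
        self.standby.catch_up(self.shared_edits)
        self.standby.state = "active"
        self.active, self.standby = self.standby, self.active


pair = HAPair()
pair.write("create", "/a")
pair.write("create", "/b")
pair.write("delete", "/a")
pair.failover()
print(pair.active.name, pair.active.state, sorted(pair.active.namespace))
```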
Hadoop Single Node Installation
Prerequisites: Before starting the Hadoop single-node installation, make sure that the node meets the following prerequisites: a) Any Linux operating system b) Sun Java 1.6 or above already installed, with the same version across all the nodes. To install Java, you can refer to the installation steps […]