This document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN). It also explains the architecture of Hadoop and how the Hadoop Distributed File System and the MapReduce programming model work. Hadoop Introduction
Big Data
All frameworks and technologies in the Big Data domain.
Install Hive with local metastore
Because Hive is a data-warehousing framework, restricting it to a single session is undesirable. To overcome this limitation of the embedded metastore, support for a local metastore was developed. A separate database service runs as a process on the same or a remote machine. The metastore service still runs in the same JVM within hive […]
Install Hive with embedded metastore
The Hive package ships with Derby as the default embedded metastore. Follow the steps below to install Hive with the embedded metastore: 1. Download the latest version of Hive from here. 2. Uncompress the package on Linux: tar -xzvf apache-hive-0.13.1-bin.tar.gz 3. Add the following to ~/.bash_profile: sudo nano ~/.bash_profile export HIVE_HOME=/home/hduser/hive-0.13.1 export PATH=$PATH:$HIVE_HOME/bin Where […]
Apache Hadoop YARN: Best Practices
Found a nice presentation on YARN best practices by Hortonworks.
Start HDFS High Availability Cluster
In the previous blog, we discussed the HDFS high availability configuration. This blog describes the steps to start an HDFS high availability cluster. Pre-requisites Before starting an HDFS high availability cluster, make sure that the cluster meets the following pre-requisites: a) If you have enabled Automatic Failover for Hot-BackUp during NameNode failover, then before starting with HDFS high availability cluster, […]
HDFS High Availability Configuration
In the previous blog, we discussed the HDFS high availability architecture. This blog describes the configuration for HDFS high availability in a Hadoop cluster. Pre-requisites Before configuring HDFS high availability, make sure that your Hadoop cluster meets the following pre-requisites: a) You must have at least two nodes to enable HDFS high availability. b) If you want to configure […]
Hadoop HDFS Concepts
This presentation gives an overview of Hadoop HDFS concepts such as Blocks, Rack Awareness, and Safe Mode. Hadoop HDFS Concepts from tutorialvillage
Hadoop Cluster Setup
If you wish to deploy a Hadoop single-node setup, please follow the blog here. Pre-requisites Before starting the Hadoop cluster setup, make sure that each node meets the following pre-requisites: a) Any Linux operating system b) Sun Java 1.6 or above should already be installed and the version should be the same across all […]
MongoDB installation from tar distribution
Follow the steps below to install MongoDB from the tar distribution: 1. Download the stable release of MongoDB from here. 2. Extract the distribution: tar -xzvf mongodb-linux 3. Create a directory for MongoDB in /opt: mkdir /opt/mongodb 4. Move the distribution files to the mongodb directory: mv mongolinux/* /opt/mongodb 5. Add […]
MapReduce Introduction
Hadoop MapReduce is a software framework for developing applications that process large datasets in parallel in a reliable and fault-tolerant manner. A MapReduce application splits the input dataset into chunks that are processed in parallel on multiple nodes. The below diagram shows the different phases for a MapReduce application: There are two […]
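To make the programming model a little more concrete, here is a minimal word-count sketch in plain Python. It is not the Hadoop API and it runs in a single process; it only mimics the idea of mapping over chunks, grouping by key, and reducing each group. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks."""
    mapped = []
    # Assumption: on a real cluster each chunk would be handled by a
    # separate map task on a different node; here we simply loop.
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two chunks standing in for two input splits.
chunks = ["hadoop stores data", "hadoop processes data"]
counts = word_count(chunks)
print(counts)
```

Each chunk is mapped independently of the others, which is what lets a framework like Hadoop spread the map work across nodes; only the grouping step needs to see pairs from every chunk.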