Prerequisites

Before starting with the Hadoop 2 single node installation, make sure that the node meets the following prerequisites:

a)    Any Linux operating system.
b)    Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps mentioned in the blog.
c)    Secure Shell (SSH) must already be installed and its service (sshd) should be running. Passwordless SSH should be configured from the node to its own IP address or hostname. To set up passwordless SSH across your nodes, you can refer to the blog here, or see the sketch after this list.
d)    Make sure that the node can resolve its hostname to its IP address. For a single node installation, you should be able to ping localhost (the loopback hostname).
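
If you still need to configure passwordless SSH to the node itself, a minimal sketch (assuming OpenSSH with its default key paths and an RSA key) looks like this:

# Generate a key pair with an empty passphrase (skip if ~/.ssh/id_rsa exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the key for logins back into this same node
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Both checks should now succeed without a password prompt
ssh localhost exit
ping -c 1 localhost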

To prepare a node for Hadoop, you can follow the blog here.

Hadoop 2 Single Node Installation Steps

The Hadoop 2 single node installation can be summarized as a simple three-step process.

a)   Download and extract the Hadoop 2 tarball bundle from the Apache Hadoop repository.
b)   Configure the Hadoop environment variables and configuration files.
c)   Format the NameNode and start the DFS and YARN services using the Hadoop scripts.

Follow the steps below for Hadoop single node installation on a Linux machine:

1. Log in to your system and download the Hadoop 2.x bundle (tar.gz) file from the Apache archive (link for Hadoop 2.4.1).

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz
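
Optionally, you can sanity-check the download against the checksum file published in the same archive directory (the .mds file name below is an assumption based on the usual Apache release layout):

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz.mds
# Compare this digest against the SHA256 entry in the .mds file
sha256sum hadoop-2.4.1.tar.gz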

 

2. You can use your home directory (/home/<username>) for the Hadoop installation. Move the downloaded bundle there.

mv hadoop-2.4.1.tar.gz /home/hduser/

 

3. Extract the contents of the tar file.

cd /home/hduser
tar -xzvf hadoop-2.4.1.tar.gz
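
As a quick sanity check, list the extracted directory. Optionally, a version-independent symlink (purely a convenience; this guide keeps using the versioned path) makes future upgrades easier:

ls /home/hduser/hadoop-2.4.1
# Optional: a stable path that survives version changes
ln -s /home/hduser/hadoop-2.4.1 /home/hduser/hadoop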

 

4. Configure the Hadoop environment variables in ~/.bashrc (for Ubuntu) or ~/.bash_profile (for CentOS) using any text editor.

nano ~/.bashrc

Append the following variables to this file:

export HADOOP_PREFIX="/home/hduser/hadoop-2.4.1"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

After saving the file, run the source command to reload the environment variables in the current shell.

source ~/.bashrc
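
To confirm the variables took effect:

echo $HADOOP_PREFIX
# Prints the Hadoop version banner if PATH now includes the bin directory
hadoop version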

 

5. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/core-site.xml file and add the HDFS URI (the NameNode host and its port) property under the <configuration> tag.

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
    </property> 

 

6. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/hdfs-site.xml file and add the following HDFS properties:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hadoop-2.4.1/hadoop_data/dfs/name</value>
    </property>
  
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hadoop-2.4.1/hadoop_data/dfs/data</value>
    </property>
</configuration>
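
Once core-site.xml and hdfs-site.xml are saved, you can read the effective values back with hdfs getconf, which resolves the configuration files on the classpath:

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication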

 

7. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/mapred-site.xml file and specify the MapReduce framework as YARN.
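
Note that the stock Hadoop 2.x bundle usually ships only a template for this file; if mapred-site.xml does not exist yet, create it from the template first:

cd /home/hduser/hadoop-2.4.1/etc/hadoop
cp mapred-site.xml.template mapred-site.xml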

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 

8. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/yarn-site.xml file and specify the YARN properties.

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
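
A malformed XML file is a common cause of startup failures. If xmllint is installed (it is a general-purpose XML checker, not part of Hadoop), you can validate all the edited files in one pass:

cd /home/hduser/hadoop-2.4.1/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml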

9. Hadoop requires the JAVA_HOME environment variable. You can check the value of JAVA_HOME on your system using the following command:

echo $JAVA_HOME

Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/hadoop-env.sh file and specify the JAVA_HOME for Hadoop.

export JAVA_HOME=<value for JAVA_HOME variable>
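
If echo $JAVA_HOME prints nothing, a common way to locate the JDK directory on Linux (assuming java is on your PATH) is:

# Prints a directory that can be used as JAVA_HOME
readlink -f "$(which java)" | sed "s:/bin/java::"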

 

10. The last step before running the Hadoop services is to format the NameNode. Before executing the format command, make sure that the dfs.namenode.name.dir and dfs.datanode.data.dir directories specified in the hdfs-site.xml file do not exist; the format step and the daemons will create them. The hdfs command is located in the /home/hduser/hadoop-2.4.1/bin directory (already on your PATH from step 4).

hdfs namenode -format
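
If you want to keep the format output for later inspection, you can pipe it through tee instead (format.log is just an illustrative file name); a successful run logs that the storage directory has been successfully formatted:

hdfs namenode -format 2>&1 | tee format.log
grep -i "successfully formatted" format.log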

 

11. Start the Hadoop services using the Hadoop 2 scripts in the /home/hduser/hadoop-2.4.1/sbin/ directory.

Service              Command
NameNode             hadoop-daemon.sh start namenode
DataNode             hadoop-daemon.sh start datanode
ResourceManager      yarn-daemon.sh start resourcemanager
NodeManager          yarn-daemon.sh start nodemanager
Job History Server   mr-jobhistory-daemon.sh start historyserver

The output of the jps command (which lists the Java processes running on the system) should include each of these service names. If a service is missing, check its logs under the /home/hduser/hadoop-2.4.1/logs directory.
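
Putting it together, a typical start-up sequence from the sbin directory followed by the jps check looks like this (the PIDs shown are illustrative):

cd /home/hduser/hadoop-2.4.1/sbin
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
./yarn-daemon.sh start resourcemanager
./yarn-daemon.sh start nodemanager
./mr-jobhistory-daemon.sh start historyserver

jps
# 12001 NameNode
# 12145 DataNode
# 12350 ResourceManager
# 12502 NodeManager
# 12688 JobHistoryServer
# 12710 Jps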

 

12. Browse to the Hadoop 2 HDFS and ResourceManager web UIs.

Service            Link
Hadoop HDFS        http://<NameNode machine IP>:50070/
ResourceManager    http://<ResourceManager machine IP>:8088/
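
From the node itself, you can also probe the web UIs from the shell before opening a browser; an HTTP 200 response means the service is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/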

 
