Prerequisites

Before starting with the Hadoop 2 single node installation, make sure that the node meets the following prerequisites:

a)    Any Linux operating system.
b)    Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps mentioned in the blog.
c)    Secure Shell (SSH) must already be installed and its service (sshd) should be running. Passwordless SSH should be configured from the node to its own IP address or hostname. To set up passwordless SSH across your nodes, you can refer to the blog here, or see the sketch after this list.
d)    Make sure that the node can resolve its hostname to its IP address. For a single node installation, you should be able to ping localhost (the loopback hostname).
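
If you still need to configure passwordless SSH to the node itself, a minimal sketch (assuming OpenSSH with its default key paths and an RSA key) looks like this:

# Generate a key pair with an empty passphrase (skip if ~/.ssh/id_rsa exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the key for logins back into this same node
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Both checks should now succeed without a password prompt
ssh localhost exit
ping -c 1 localhost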

To prepare a node for Hadoop, you can follow the blog here.

Hadoop 2 Single Node Installation Steps

The Hadoop 2 single node installation can be summarized as a simple three-step process.

a)   Download and extract the Hadoop 2 tarball bundle from the Apache Hadoop repository.
b)   Configure the Hadoop environment variables and configuration files.
c)   Format the NameNode and start the DFS and YARN services using the Hadoop scripts.

Follow the steps below for Hadoop single node installation on a Linux machine:

1. Log in to your system and download the Hadoop 2.x bundle (tar.gz) file from the Apache archive (link for Hadoop 2.4.1).

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz
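
Optionally, you can sanity-check the download against the checksum file published in the same archive directory (the .mds file name below is an assumption based on the usual Apache release layout):

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.4.1/hadoop-2.4.1.tar.gz.mds
# Compare this digest against the SHA256 entry in the .mds file
sha256sum hadoop-2.4.1.tar.gz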

 

2. You can use your home directory (/home/<username>) for the Hadoop installation. Move the downloaded bundle there.

mv hadoop-2.4.1.tar.gz /home/hduser/

 

3. Extract the contents of the tar file.

cd /home/hduser
tar -xzvf hadoop-2.4.1.tar.gz
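
As a quick sanity check, list the extracted directory. Optionally, a version-independent symlink (purely a convenience; this guide keeps using the versioned path) makes future upgrades easier:

ls /home/hduser/hadoop-2.4.1
# Optional: a stable path that survives version changes
ln -s /home/hduser/hadoop-2.4.1 /home/hduser/hadoop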

 

4. Configure the Hadoop environment variables in ~/.bashrc (for Ubuntu) or ~/.bash_profile (for CentOS) using any text editor.

nano ~/.bashrc

Append the following variables to this file:

export HADOOP_PREFIX="/home/hduser/hadoop-2.4.1"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

After saving the file, run the source command to reload the environment variables in the current shell.

source ~/.bashrc
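
To confirm the variables took effect:

echo $HADOOP_PREFIX
# Prints the Hadoop version banner if PATH now includes the bin directory
hadoop version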

 

5. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/core-site.xml file and add the HDFS URI (the NameNode host and its port) property under the <configuration> tag.

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
    </property> 

 

6. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/hdfs-site.xml file and add the following HDFS properties:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hadoop-2.4.1/hadoop_data/dfs/name</value>
    </property>
  
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hadoop-2.4.1/hadoop_data/dfs/data</value>
    </property>
</configuration>
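
Once core-site.xml and hdfs-site.xml are saved, you can read the effective values back with hdfs getconf, which resolves the configuration files on the classpath:

hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication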

 

7. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/mapred-site.xml file and specify the MapReduce framework as YARN.
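
Note that the stock Hadoop 2.x bundle usually ships only a template for this file; if mapred-site.xml does not exist yet, create it from the template first:

cd /home/hduser/hadoop-2.4.1/etc/hadoop
cp mapred-site.xml.template mapred-site.xml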

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

 

8. Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/yarn-site.xml file and specify the YARN properties.

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
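
A malformed XML file is a common cause of startup failures. If xmllint is installed (it is a general-purpose XML checker, not part of Hadoop), you can validate all the edited files in one pass:

cd /home/hduser/hadoop-2.4.1/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml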

9. Hadoop requires the JAVA_HOME environment variable. You can check the value of JAVA_HOME on your system using the following command:

echo $JAVA_HOME

Edit the /home/hduser/hadoop-2.4.1/etc/hadoop/hadoop-env.sh file and specify the JAVA_HOME for Hadoop.

export JAVA_HOME=<value for JAVA_HOME variable>
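
If echo $JAVA_HOME prints nothing, a common way to locate the JDK directory on Linux (assuming java is on your PATH) is:

# Prints a directory that can be used as JAVA_HOME
readlink -f "$(which java)" | sed "s:/bin/java::"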

 

10. The last step before running the Hadoop services is to format the NameNode. Before executing the format command, make sure that the dfs.namenode.name.dir and dfs.datanode.data.dir directories specified in the hdfs-site.xml file do not exist; the format step and the daemons will create them. The hdfs command is located in the /home/hduser/hadoop-2.4.1/bin directory (already on your PATH from step 4).

hdfs namenode -format
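
If you want to keep the format output for later inspection, you can pipe it through tee instead (format.log is just an illustrative file name); a successful run logs that the storage directory has been successfully formatted:

hdfs namenode -format 2>&1 | tee format.log
grep -i "successfully formatted" format.log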

 

11. Start the Hadoop services using the Hadoop 2 scripts in the /home/hduser/hadoop-2.4.1/sbin/ directory.

Service              Command
NameNode             hadoop-daemon.sh start namenode
DataNode             hadoop-daemon.sh start datanode
ResourceManager      yarn-daemon.sh start resourcemanager
NodeManager          yarn-daemon.sh start nodemanager
Job History Server   mr-jobhistory-daemon.sh start historyserver

The output of the jps command (which lists the Java processes running on the system) should include each of these service names. If a service is missing, check its logs under the /home/hduser/hadoop-2.4.1/logs directory.
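
Putting it together, a typical start-up sequence from the sbin directory followed by the jps check looks like this (the PIDs shown are illustrative):

cd /home/hduser/hadoop-2.4.1/sbin
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
./yarn-daemon.sh start resourcemanager
./yarn-daemon.sh start nodemanager
./mr-jobhistory-daemon.sh start historyserver

jps
# 12001 NameNode
# 12145 DataNode
# 12350 ResourceManager
# 12502 NodeManager
# 12688 JobHistoryServer
# 12710 Jps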

 

12. Browse to the Hadoop 2 HDFS and ResourceManager web UIs.

Service            Link
Hadoop HDFS        http://<NameNode machine IP>:50070/
ResourceManager    http://<ResourceManager machine IP>:8088/
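
From the node itself, you can also probe the web UIs from the shell before opening a browser; an HTTP 200 response means the service is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/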

 
