Prerequisites

Before starting with the Hadoop single node installation, make sure that the node meets the following prerequisites:

a)    Any Linux operating system.
b)    Sun Java 1.6 or above should already be installed, and the version should be the same across all the nodes. To install Java, you can refer to the installation steps mentioned in the blog.
c)    Secure Shell (ssh) must already be installed and its service (sshd) should be running. Passwordless SSH should be configured from the node to its own IP address or hostname, as shown below.
d)    Make sure that the node can resolve its hostname to its IP address. For a single node installation, you should be able to ping localhost (the loopback hostname).
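
A quick way to verify these prerequisites and to set up passwordless SSH (the service name and key paths below are common defaults and may differ on your distribution):

java -version                      # should report Java 1.6 or above
service sshd status                # the service is called ssh on Ubuntu
ping -c 1 localhost                # hostname resolution check

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                      # should log in without prompting for a password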

To prepare a node for Hadoop, you can follow the blog here.

Hadoop Single Node Installation Steps

The Hadoop single node installation can be summarized as a simple three-step process.

a)   Download and extract the Hadoop tarball from the Apache Hadoop repository.
b)   Configure the Hadoop environment variables and configuration files.
c)   Format the NameNode and start the DFS and MapReduce services using the Hadoop scripts.

Follow the steps below for a Hadoop single node installation on a Linux machine:

1. Log in to your system and download the Hadoop 1.x bundle (tar.gz) from the Apache archive (the link below is for Hadoop 1.2.1).

wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
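
Optionally, you can verify that the archive downloaded intact by listing its contents:

tar -tzf hadoop-1.2.1.tar.gz | head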


2. Move the tar file to the home directory of hduser.

mv hadoop-1.2.1.tar.gz /home/hduser


3. Extract the contents of the tar file.

cd /home/hduser
tar -xzvf hadoop-1.2.1.tar.gz
cd hadoop-1.2.1


4. Configure the Hadoop environment variables in ~/.bashrc (for Ubuntu) or ~/.bash_profile (for CentOS) using any text editor.

nano ~/.bashrc

Append the following lines to this file:

export HADOOP_HOME="/home/hduser/hadoop-1.2.1"
export PATH=$PATH:$HADOOP_HOME/bin

After saving the file, run the source command to reload the environment variables in the current shell.

source ~/.bashrc
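
To confirm that the variables took effect:

echo $HADOOP_HOME                  # should print /home/hduser/hadoop-1.2.1
which hadoop                       # should resolve to the bin directory above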


5. Edit the /home/hduser/hadoop-1.2.1/conf/core-site.xml file and specify the Hadoop HDFS URI (the NameNode host and its port). This is the default filesystem URI that HDFS clients and the DataNode use to reach the NameNode:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


6. Edit the /home/hduser/hadoop-1.2.1/conf/hdfs-site.xml file and add the HDFS properties. The replication factor is set to 1 because a single node cluster has only one DataNode to hold each block; dfs.name.dir and dfs.data.dir point to the local directories where the NameNode metadata and the DataNode blocks will be stored:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>/home/hduser/hadoop-1.2.1/hadoop_data/dfs/name</value>
    </property>
  
    <property>
        <name>dfs.data.dir</name>
        <value>/home/hduser/hadoop-1.2.1/hadoop_data/dfs/data</value>
    </property>
</configuration>


7. Edit the /home/hduser/hadoop-1.2.1/conf/mapred-site.xml file and specify the host and port for the JobTracker daemon:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>


8. Hadoop requires the JAVA_HOME environment variable. You can check its current value with the following command:

echo $JAVA_HOME

Edit the /home/hduser/hadoop-1.2.1/conf/hadoop-env.sh file and specify the JAVA_HOME for Hadoop.

export JAVA_HOME=<value for JAVA_HOME variable>
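
If echo $JAVA_HOME printed nothing, one common way to locate the Java installation directory on Linux (assuming java is on the PATH) is:

readlink -f $(which java) | sed 's:/bin/java::'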


9. The last step before starting the Hadoop services is to format the NameNode. Before executing the format command, make sure that the dfs.name.dir and dfs.data.dir directories specified in the hdfs-site.xml file do not exist; if they are left over from a previous attempt, remove them as shown below.
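
With the paths configured in step 6, the leftover directories can be removed in one go (destructive, so double-check the path first):

rm -rf /home/hduser/hadoop-1.2.1/hadoop_data

Then format the NameNode: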

hadoop namenode -format


10. Start the Hadoop services using the Hadoop scripts in the /home/hduser/hadoop-1.2.1/bin/ directory:

Service        Command
NameNode       hadoop-daemon.sh start namenode
DataNode       hadoop-daemon.sh start datanode
JobTracker     hadoop-daemon.sh start jobtracker
TaskTracker    hadoop-daemon.sh start tasktracker
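
Alternatively, Hadoop 1.x ships convenience scripts in the same bin/ directory that start several daemons at once:

start-dfs.sh                       # starts NameNode, DataNode and SecondaryNameNode
start-mapred.sh                    # starts JobTracker and TaskTracker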


The output of the jps command (which lists the Java processes running on the system) should include all four services. If any of them is missing, check its log under the /home/hduser/hadoop-1.2.1/logs directory.
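
Running jps should produce output along these lines (the process IDs will differ on your machine):

jps
2401 NameNode
2532 DataNode
2701 JobTracker
2843 TaskTracker
2950 Jps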


11. Browse to the Hadoop HDFS and JobTracker web UIs:

Service        Link
Hadoop HDFS    http://<NameNode machine IP>:50070/
JobTracker     http://<JobTracker machine IP>:50030/
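
As a final smoke test, you can create a directory in HDFS and list it (the path here is arbitrary, chosen only for illustration):

hadoop fs -mkdir /user/hduser/input
hadoop fs -ls /user/hduser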