Hadoop is a distributed processing framework in which multiple nodes are connected to each other over a network. Before a node can be used as part of a Hadoop cluster, an administrator needs to prepare it. This blog describes the prerequisites and how to configure them before adding a node to your Hadoop cluster.

Prepare Node for Hadoop

1.  Install Java: Sun Java 1.6 or above should be installed on all nodes in a Hadoop cluster. The Java version and JAVA_HOME should be consistent across all the nodes. To install and configure Java on Linux, you can refer to the blog here.
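A quick way to verify the Java setup on each node is sketched below; the JDK path is an assumed example, so substitute the actual install directory on your nodes:

```shell
# Print the installed Java version, if any -- it should match on every node.
if command -v java >/dev/null; then
    java -version 2>&1 | head -n 1
fi

# Point JAVA_HOME at the JDK (the path below is an assumed example)
# and put its bin directory on the PATH.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
```

To make the setting permanent, add the two export lines to a file such as /etc/profile.d/java.sh on every node.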

2.  Enable Passwordless SSH: The Hadoop master requires a passwordless SSH connection to all the slave nodes in the cluster. As an administrator, you need to start the sshd service on all nodes and configure passwordless SSH from the master to the slave nodes. To do so, you can refer to the blog here.
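The usual steps on the master node are sketched below; "user" and "node1" are placeholder names for your actual login and slave hostname:

```shell
# On the master: generate a passphrase-less RSA key pair if none exists.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q

# Copy the public key to each slave node (placeholders shown):
#   ssh-copy-id user@node1

# Verify: this should log in without prompting for a password.
#   ssh user@node1 hostname
```

Repeat the ssh-copy-id step once for every slave node in the cluster.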

3.  Disable Firewall or Open Hadoop Ports: Hadoop daemons use TCP and RPC ports for internal communication, and other ports serve the Web UIs of individual services. The administrator has to either disable the firewall on the nodes or allow traffic on the ports Hadoop requires. Hadoop ships with a list of default ports, but you can change them as per your need. The documentation for both Cloudera and Hortonworks lists the ports used and their purpose; you can refer to either of them. For firewall options in Ubuntu, you can refer to the blog here. If you are using CentOS, you can refer to the blog here.
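As an illustration, the rules below open two common Hadoop 2.x default ports (NameNode RPC on 8020 and NameNode Web UI on 50070); these numbers are assumptions, so check your distribution's port list and repeat for every port your daemons use:

```shell
# Ubuntu (ufw): allow the NameNode RPC and Web UI ports.
sudo ufw allow 8020/tcp     # NameNode RPC (fs.defaultFS)
sudo ufw allow 50070/tcp    # NameNode Web UI (Hadoop 2.x default)

# CentOS (firewalld): the equivalent rules.
sudo firewall-cmd --permanent --add-port=8020/tcp
sudo firewall-cmd --permanent --add-port=50070/tcp
sudo firewall-cmd --reload
```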

4.  Configure Machine HostName and IP Address: A cluster administrator needs to ensure that every node in the cluster has a unique hostname and that each node can resolve the hostnames of all other nodes. You can either configure a DNS server or add entries to the /etc/hosts file on every node, for example:

192.168.56.101 node1
192.168.56.102 node2

To view the system hostname, execute the hostname command. To modify the system hostname in Linux, you can refer to the blogs here (CentOS / Ubuntu). After editing the file, you need to reboot the system.
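The checks can be sketched as follows; "node1" and "node2" are the placeholder names from the /etc/hosts example above:

```shell
# Show the current hostname.
hostname

# On a systemd-based distro (CentOS 7+, recent Ubuntu), the hostname
# can also be set persistently with hostnamectl:
#   sudo hostnamectl set-hostname node1

# Confirm that a peer node resolves via /etc/hosts or DNS.
getent hosts node2 || echo "node2 does not resolve; check /etc/hosts"
```

Run the resolution check on every node, against every other node, before starting the Hadoop daemons.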
