Scenario: Clustered setup of Hadoop > Exception: java.io.IOException: Incompatible clusterIDs in /home/hadoop/dfs/data: namenode clusterID …..
I had created a multi-node clustered Hadoop environment with one namenode (master) and three datanodes (master, slave1, slave2), with all configuration in place (/etc/hosts, /etc/hostname, hdfs-site.xml, yarn-site.xml, core-site.xml, mapred-site.xml, hadoop-env.sh, passwordless SSH setup, scp of the Hadoop distribution to all nodes, and so on).
Issue: As soon as I started all the nodes using start-all.sh, jps showed all the expected services (ResourceManager, NodeManager, SecondaryNameNode, NameNode) running on their respective nodes, except the DataNode service, which was missing on every datanode.
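For reference, this is roughly how I checked which daemons were up on each node (the hostnames are the ones from my setup):

    # On the master: list running Hadoop daemons (run as the hadoop user)
    jps
    # Output listed NameNode, SecondaryNameNode, ResourceManager, NodeManager
    # (and Jps itself), but no DataNode line.

    # On each slave, over SSH:
    ssh slave1 jps
    ssh slave2 jps
    # Output listed NodeManager, but again no DataNode line.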
Solution:
> I cross-checked all the files and found that every file and setting was correct.
> I then checked the DataNode logs in /opt/hadoop-2.2.0/logs/ (e.g. hadoop-hadoop-datanode-master.out) and found that whenever I tried to start a DataNode on a slave machine I was getting:
> java.io.IOException: Incompatible clusterIDs in /home/hadoop/dfs/data: namenode clusterID …..
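A quick way to spot this error across the daemon logs (Hadoop writes a .log file alongside each .out file, and the mismatch is usually reported there too):

    # Search all DataNode logs for the clusterID mismatch (log dir from this setup)
    grep -i "Incompatible clusterIDs" \
        /opt/hadoop-2.2.0/logs/hadoop-hadoop-datanode-*.out \
        /opt/hadoop-2.2.0/logs/hadoop-hadoop-datanode-*.log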
> It was because, after I set up my cluster, for some reason I had reformatted my namenode (using the hdfs namenode -format command).
Reformatting assigns the namenode a fresh clusterID, but my DataNodes on the slaves were still bearing the clusterID of the old namenode.
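You can see the mismatch directly: each HDFS storage directory keeps its clusterID in a VERSION file. (I am assuming the namenode's metadata directory is /home/hadoop/dfs/name here, to mirror the data directory from this setup; check dfs.namenode.name.dir in hdfs-site.xml for the real path.)

    # On the master: the clusterID assigned by the latest format
    # (assumes dfs.namenode.name.dir = /home/hadoop/dfs/name)
    grep clusterID /home/hadoop/dfs/name/current/VERSION

    # On each datanode: the clusterID the DataNode still remembers
    grep clusterID /home/hadoop/dfs/data/current/VERSION

    # If the two values differ, the DataNode refuses to start.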
> To resolve this, I deleted and recreated the data folder on all datanode machines in the local Linux FS, namely /home/hadoop/dfs/data.
Note: either do this manually or just restart the DataNode daemon on that machine; it will recreate the data/ folder's contents on startup (see the sketch below).
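Roughly, the full sequence looked like this (paths and hostnames as in my setup; run as the hadoop user, and note there is no need to format the namenode again):

    # On the master: stop all daemons first
    stop-all.sh

    # On every datanode (master, slave1, slave2): wipe the stale data directory
    rm -rf /home/hadoop/dfs/data/*

    # On the master: start everything again; each DataNode recreates its
    # data/ contents on startup and picks up the namenode's new clusterID
    start-all.sh

    # Verify that DataNode now shows up on every node
    jps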
Finally, the problem was resolved. 🙂