MapReduce programming

In this article, you will learn to write a MapReduce program using the Java programming language. The program exists purely to illustrate the MapReduce programming model: it takes an input file and passes the data unchanged through the Mapper and Reducer to produce the final output.

Prerequisites:

1. Eclipse IDE
2. Hadoop Libraries:

  1. hadoop-mapreduce-client-core-2.6.0.jar
  2. hadoop-common-2.6.0.jar

Step-by-Step Guide to Writing a MapReduce Program

Follow the steps below to create your first MapReduce application.

Step-1: Create Java Project

1. Open Eclipse IDE
2. Create a new Java project (name it, say, “FirstMapReduceApplication”).
3. Create a new folder named “lib” under the project.
4. Add the following two JARs to the lib folder:

  1. hadoop-mapreduce-client-core-2.6.0.jar
  2. hadoop-common-2.6.0.jar

5. Right-click on the project and open its “Properties”.
6. Click on “Java Build Path”.
7. Move to the “Libraries” tab.
8. Click on “Add JARs”.
9. Select and add the JARs from the lib folder.

Step-2: Write Mapper Class

1. Right-click on the project and select New -> Class.
2. Create a class named “FirstMapReduceMapper”.
3. Copy the code below into that class:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FirstMapReduceMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Identity mapper: pass each record through unchanged, with the
        // record's byte offset as the key and the full record as the value.
        context.write(key, value);
    }

}

With the text input format, this map method simply passes each record’s byte offset as the key and the complete record as the value, for every record in the split. To filter or search, apply the logic to the record “value” inside map and write only the relevant records to the context. Also make sure the key and value data types declared in the Mapper class match what you emit.
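
For instance, here is a minimal sketch of such a filtering mapper: it keeps only the records containing a given substring. The search term used here is purely illustrative.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative filtering variant of the mapper above. The search term
// "10.0.0.1" is a made-up example value.
public class FilteringMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private static final String FILTER = "10.0.0.1"; // hypothetical search term

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the record only when it matches; everything else is dropped.
        if (value.toString().contains(FILTER)) {
            context.write(key, value);
        }
    }
}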

Step-3: Write Reducer Class

1. Right-click on the project and select New -> Class.
2. Create a class named “FirstMapReduceReducer”.
3. Copy the code below into that class:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FirstMapReduceReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Identity reducer: write every value back out under its key.
        for (Text record : values) {
            context.write(key, record);
        }
    }
}

In this case the reducer receives the offset as the key and the complete record as the value. Because the offsets are unique within the input, the shuffle/sort and merge phase after the mapper has no duplicate keys to group, so each iterable holds a single record. We simply write the record back out with its offset as the key. Again, make sure the key and value data types declared in the Reducer class match the Mapper’s output.
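
If the Mapper instead emits the same key for many records, the iterable really does carry multiple values, and reduce is where aggregation happens. Here is a minimal sketch, assuming a hypothetical mapper that emits a Text key per record (say, a product name or an IP); the job setup would also have to declare these changed key/value types:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative aggregating reducer: counts how many values were grouped
// under each key. Assumes a (hypothetical) mapper emitting Text keys.
public class CountingReducer extends Reducer<Text, Text, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (Text ignored : values) {
            count++; // one increment per record grouped under this key
        }
        context.write(key, new LongWritable(count));
    }
}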

Step-4: Write Main Job Class

1. Right-click on the project and select New -> Class.
2. Create a class named “FirstMapReduceMain”.
3. Copy the code below into that class:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FirstMapReduceMain {
    
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: FirstMapReduceMain <input path> <output path>");
            System.exit(-1);
        }
        Job job = Job.getInstance();
        job.setJarByClass(FirstMapReduceMain.class);
        job.setJobName("FirstMapReduceApplication");
        // HDFS input file/folder and output folder, taken from the arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(FirstMapReduceMapper.class);
        job.setReducerClass(FirstMapReduceReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // Submit the job and wait; exit 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

The main method holds the job description: the HDFS input file/folder path, the HDFS output folder path, the Mapper and Reducer classes, and so on. Also declare appropriate data types for the output key and value classes.
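
The Job API also exposes further optional settings. A short sketch of a few commonly used ones, all optional here and shown with their effective defaults or illustrative values (the format classes need imports from org.apache.hadoop.mapreduce.lib.input and org.apache.hadoop.mapreduce.lib.output):

        // Optional settings, placed before job.waitForCompletion(true):
        job.setNumReduceTasks(1);                          // how many reducers to run
        job.setInputFormatClass(TextInputFormat.class);    // default for line-oriented text input
        job.setOutputFormatClass(TextOutputFormat.class);  // default key<TAB>value text output
        // Declare the map output types explicitly if they differ from the job's output types:
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);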

Step-5: Compile and Build a Runnable JAR

Follow these steps to create and run the MapReduce application JAR:

1. Right-click on the “FirstMapReduceMain” class and then click on “Run As” => “Java Application”.
It will compile the source code and print the usage message, since no program arguments were supplied.

2. Right-click on the project => click on “Export…”.
3. Search for and select “Runnable JAR file”, then click the “Next” button.
4. Select the launch configuration for the main class under “Launch Configuration”.
5. Provide a path in “Export Destination” to store the JAR file.
6. Copy the JAR file to any Hadoop node in the cluster.
7. Run one of the commands below to execute the MapReduce application on the cluster:

yarn jar <path-to-firstjob.jar> <input-file-hdfs-path> <hdfs-output-folder-path>

OR

hadoop jar <path-to-firstjob.jar> <input-file-hdfs-path> <hdfs-output-folder-path>
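
For example, with purely illustrative paths (substitute your own JAR location and HDFS paths):

yarn jar /home/hadoop/FirstMapReduceApplication.jar /user/hadoop/input/sample.txt /user/hadoop/output

Note that the HDFS output folder must not already exist; FileOutputFormat fails the job if it does.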

This is how you can write a MapReduce application tailored to your requirements: filter-style operations go in the Mapper class (as in the filtering sketch earlier), and aggregation-style operations go in the Reducer class (as in the counting sketch earlier).
