Module10-BigData Guide v1.0

The document discusses developing and running a MapReduce program on AWS Elastic MapReduce. It provides steps to create a MapReduce Java project in Eclipse with Map and Reduce classes, package it into a JAR file, and then run the JAR on EMR by specifying the input and output locations.

Uploaded by

srinubasani

AWS Development Certification Training www.edureka.co/aws-development

Module 10: Big Data and Analytics


Hands-on

© Brain4ce Education Solutions Pvt. Ltd.


Module 10: Big Data and Analytics www.edureka.co/aws-development

Table of Contents
Installing and developing a MapReduce program
Running an Elastic MapReduce job


Installing and developing a MapReduce program

Step 1: Create a custom JAR as below

• In Eclipse (or whichever IDE you are using), create a simple Java project named
"WordCount".
• Create a Java class named Map and override the map method as below:

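Map.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every whitespace-separated token in an input line.
public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}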
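To see what the map step produces without a Hadoop cluster, the tokenizing logic can be tried in plain Java. The following is only a minimal sketch: the class name MapSketch and the use of a List of entries in place of Hadoop's Context are assumptions made for illustration, not part of the guide's project.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.StringTokenizer;

// Hypothetical stand-in for the Hadoop mapper; no Hadoop classes are used.
class MapSketch {

    // Collects one (word, 1) pair per whitespace-separated token,
    // which is what Map.map writes to its Context.
    static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> emitted = new ArrayList<Entry<String, Integer>>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add(new SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Repeated words are emitted once per occurrence; nothing is merged yet.
        for (Entry<String, Integer> pair : map("to be or not to be")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The mapper does no counting of its own: every occurrence of a word yields a separate pair with the value 1, and the summing is left to the reduce step.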


Create a Java class named Reduce and override the reduce method as below:

Reduce.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted for each word and writes (word, total).
public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
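The reduce step can be tried the same way in plain Java. This is a rough sketch under stated assumptions: ReduceSketch and its group method are hypothetical names, and group only imitates the grouping by key that Hadoop performs between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Hadoop reducer; no Hadoop classes are used.
class ReduceSketch {

    // Mirrors Reduce.reduce: adds up every count that arrived for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Imitates the grouping between map and reduce: collects a 1 per
    // occurrence under each word, with keys kept in sorted order.
    static Map<String, List<Integer>> group(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<Integer>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        for (Map.Entry<String, List<Integer>> e : group(words).entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Each call to reduce sees a single word together with all of its counts, so a running total per call is enough; no state has to be shared between keys.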

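Create a Java class named WordCount and define the main method as below:

WordCount.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount"); // deprecated in Hadoop 2; Job.getInstance(conf, "wordcount") replaces it there
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: input location
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: output location
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // non-zero exit if the job fails
    }
}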


Export the WordCount program as a JAR file (WordCount.jar) from Eclipse and save it to a
location on disk. Make sure that you specify the main class (WordCount) during the export.


Running an Elastic MapReduce job

Step 2: Run a MapReduce job with a custom JAR

• Sign in to the AWS Management Console and open the Amazon Elastic MapReduce
console at https://console.aws.amazon.com/elasticmapreduce/
• Click Create New Job Flow.
• On the DEFINE JOB FLOW page, enter the following details:
» Job Flow Name = WordCountJob
» Select Run your own application
» Select Custom JAR in the drop-down list
» Click Continue
• On the SPECIFY PARAMETERS page, enter values in the boxes using the following
values as a guide, and then click Continue.
» JAR Location = bucketName/jarFileLocation
» JAR Arguments = s3n://bucketName/inputFileLocation, s3n://bucketName/outputpath
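
The two JAR arguments are what WordCount.main reads as args[0] (input) and args[1] (output). A small guard such as the one below could catch a missing argument before the job is configured. It is only a sketch: JobArgsSketch and parse are hypothetical names, and it assumes the arguments reach main as two separate strings in the order shown.

```java
// Hypothetical helper; not part of the guide's WordCount project.
class JobArgsSketch {

    // Returns {inputPath, outputPath}, or throws if the count is wrong.
    static String[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: WordCount <input path> <output path>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        // Placeholder locations copied from the guide; substitute a real bucket.
        String[] parsed = parse(new String[] {
            "s3n://bucketName/inputFileLocation",
            "s3n://bucketName/outputpath"
        });
        System.out.println("input  = " + parsed[0]);
        System.out.println("output = " + parsed[1]);
    }
}
```

Hadoop's FileOutputFormat refuses to write to an output directory that already exists, so the second argument should name a location that has not been created yet.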
