In Hadoop, MapReduce is a computation model that decomposes large data
manipulation jobs into individual tasks that can be executed in parallel
across a cluster of servers. The results of those tasks are then combined
to produce the final result.
MapReduce consists of two steps:
Map Function – It takes a set of data and converts it into another set
of data, where individual elements are broken down into tuples (key-value
pairs).
Example – (Map function in Word Count)
Input (set of data):
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN
Output (converted into another set of (Key, Value) data):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
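To make the Map step concrete before we get to Hadoop, here is a minimal plain-Java sketch (the class name MapSketch and the hard-coded input line are made up for illustration): it splits the sample line on commas and emits a (word, 1) pair for every word, just like the table above.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Conceptual sketch of the Map step: split the input line on commas and
// emit a (word, 1) pair for every word, as in the table above.
public class MapSketch {
    public static void main(String[] args) {
        String input = "Bus, Car, bus, car, train, car, bus, car, train, "
                     + "bus, TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN";
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : input.split(",")) {
            pairs.add(new SimpleEntry<>(word.trim(), 1));
        }
        System.out.println(pairs);   // [Bus=1, Car=1, bus=1, car=1, ...]
    }
}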
Reduce Function – It takes the output of the Map function as input and
combines those data tuples into a smaller set of tuples.
Example – (Reduce function in Word Count)
Input (set of tuples – the output of the Map function):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Output (converted into a smaller set of tuples):
(BUS,7), (CAR,7), (TRAIN,4)
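Likewise, here is a minimal plain-Java sketch of the Reduce step (the class name ReduceSketch and the hard-coded array are again made up for illustration). It groups the pairs by key and sums the counts; case is normalized with toUpperCase() so that Bus, bus, and BUS collapse into one key, which is exactly what the Hadoop mapper further below does.

import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of the Reduce step: group the (word, 1) pairs by key
// and sum the counts for each key.
public class ReduceSketch {
    public static void main(String[] args) {
        String[] mapOutputKeys = {"Bus", "Car", "bus", "car", "train", "car",
                "bus", "car", "train", "bus", "TRAIN", "BUS", "buS", "caR",
                "CAR", "car", "BUS", "TRAIN"};
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            counts.merge(key.toUpperCase(), 1, Integer::sum);
        }
        System.out.println(counts);  // {BUS=7, CAR=7, TRAIN=4}
    }
}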
Workflow of the Program
The MapReduce workflow consists of five steps:
1. Splitting – The splitting parameter can be anything, e.g. splitting
by space, comma, semicolon, or even by a new line ('\n').
2. Mapping – as explained above.
3. Intermediate splitting – the whole process runs in parallel on different
nodes of the cluster. To group records correctly in the Reduce phase, all
data with the same KEY must end up on the same node (see the sketch after
this list).
4. Reduce – essentially a "group by" phase that aggregates the values for
each key.
5. Combining – the last phase, where all the data (the individual result
sets from each node) is combined together to form the final result.
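Here is the sketch referred to in step 3: a small plain-Java illustration (the class name ShuffleSketch and the already-normalized key array are made up for illustration) of what intermediate splitting does conceptually. Every value emitted under the same key is gathered into one list, and that list is what a single Reduce call later receives.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of the intermediate splitting (shuffle) step: all values
// emitted under the same key are collected into one list per key.
public class ShuffleSketch {
    public static void main(String[] args) {
        String[] mapOutputKeys = {"BUS", "CAR", "BUS", "CAR", "TRAIN", "CAR",
                "BUS", "CAR", "TRAIN", "BUS", "TRAIN", "BUS", "BUS", "CAR",
                "CAR", "CAR", "BUS", "TRAIN"};
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        System.out.println(grouped);
        // {BUS=[1, 1, 1, 1, 1, 1, 1], CAR=[1, 1, 1, 1, 1, 1, 1], TRAIN=[1, 1, 1, 1]}
    }
}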
Now Let’s See the Word Count Program in Java
Fortunately, we don't have to write all of the above steps ourselves; we only
need to write the splitting parameter, the Map function logic, and the Reduce
function logic. The framework executes the remaining steps automatically.
Make sure that Hadoop is installed on your system along with the Java JDK.
Steps
Step 1. Open Eclipse > File > New > Java Project > (Name it – MRProgramsDemo) > Finish
Step 2. Right Click > New > Package (Name it – PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it – WordCount)
Step 4. Add the following reference libraries:
Right Click on Project > Build Path > Add External JARs
/usr/lib/hadoop-0.20/hadoop-core.jar
/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar
Step 5. Type the following program:
package PackageDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Driver: configures the job and submits it to the cluster.
    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);    // input file
        Path output = new Path(files[1]);   // output directory (must not already exist)
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: splits each input line on commas and emits (WORD, 1) for every word.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sums the 1s emitted for each word and writes (WORD, total count).
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
Explanation
The program consists of three classes:
Driver class (the public static void main method – the entry point), which
configures and submits the job.
Map class, which extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> and
implements the Map function.
Reduce class, which extends Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> and
implements the Reduce function.
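For reference, here is a compact, compile-only sketch showing how those generic parameters line up with the Word Count program above. The class names are made up as a reading aid; nothing here needs to be typed into the project.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// How the generic parameters map onto the Word Count program above.
public class TypeParameters {

    // KEYIN  = LongWritable (byte offset of the line in the input file)
    // VALUEIN  = Text (the line itself)
    // KEYOUT = Text (a single word)
    // VALUEOUT = IntWritable (the count 1 emitted for that word)
    public static class WordCountMapperTypes
            extends Mapper<LongWritable, Text, Text, IntWritable> { }

    // KEYIN  = Text (a word), VALUEIN  = IntWritable (each 1 emitted for it)
    // KEYOUT = Text (the word), VALUEOUT = IntWritable (its total count)
    public static class WordCountReducerTypes
            extends Reducer<Text, IntWritable, Text, IntWritable> { }
}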
Step 6. Make the JAR file
Right Click on Project > Export > Select export destination as JAR file >
Next > Finish