Map Reduce Algorithm

The MapReduce algorithm consists of two main tasks: Map and Reduce, executed by the Mapper and Reducer classes, respectively. It divides tasks into smaller parts for processing across multiple systems, implementing algorithms for sorting, searching, indexing, and TF-IDF. The document provides examples of how these algorithms function within the MapReduce framework, particularly in processing employee data and creating inverted indexes.

Uploaded by

ramyatech25


MapReduce - Algorithm

The MapReduce algorithm contains two important tasks, namely Map and Reduce.

The map task is performed by the Mapper class.

The reduce task is performed by the Reducer class.

The Mapper class takes the input, tokenizes it, and maps it to key-value pairs, which are
then sorted by key. The output of the Mapper class is used as input by the Reducer class,
which in turn groups the pairs with matching keys and reduces them.

MapReduce implements various mathematical algorithms to divide a task into small parts
and assign them to multiple systems. In technical terms, the MapReduce algorithm helps in
sending the Map and Reduce tasks to appropriate servers in a cluster.
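
The flow described above (tokenize and map, sort by key, then reduce matching pairs) can be sketched in a few lines of plain Python. This is only a single-process illustration, not Hadoop code: the word-count task, the function names, and the input lines are assumptions chosen for the example.

```python
from collections import defaultdict

def mapper(line):
    """Tokenize one input line and emit a (word, 1) pair per token."""
    return [(word, 1) for word in line.split()]

def shuffle_and_sort(pairs):
    """Group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Collapse all values that share a key into a single result."""
    return key, sum(values)

# Hypothetical input lines standing in for an input split.
lines = ["it is what it is", "what is it"]

mapped = [pair for line in lines for pair in mapper(line)]
word_counts = [reducer(key, values) for key, values in shuffle_and_sort(mapped)]
print(word_counts)
```

In a real cluster the three stages run on different machines; here they are ordinary function calls so that the data flow is easy to follow.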

These mathematical algorithms may include the following:

Sorting

Searching

Indexing

TF-IDF

Sorting
Sorting is one of the basic MapReduce algorithms to process and analyze data.
MapReduce implements a sorting algorithm to automatically sort the output key-value
pairs from the mapper by their keys.

Sorting is applied to the mapper's output within the map task itself.

In the Shuffle and Sort phase, after the values are tokenized in the mapper class,
the Context class collects the values with matching keys as a collection.

To collect similar key-value pairs (intermediate keys), the framework uses a
RawComparator class to sort the key-value pairs.

The set of intermediate key-value pairs for a given Reducer is automatically
sorted by Hadoop to form key-values (K2, {V2, V2, …}) before they are presented
to the Reducer.
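
As a rough illustration of how sorted intermediate pairs become (K2, {V2, V2, …}) groups, the following Python sketch sorts a list of pairs by key and then groups adjacent pairs. It imitates the idea only; the sample pairs are hypothetical, and Hadoop's own sort works on serialized keys across machines.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical intermediate (K2, V2) pairs, in the order mappers emitted them.
intermediate = [
    ("kiran", 45000),
    ("gopal", 50000),
    ("satish", 26000),
    ("gopal", 50000),
    ("kiran", 45000),
]

# Sort by key only; Python's sort is stable, so values keep their emission order.
sorted_pairs = sorted(intermediate, key=itemgetter(0))

# groupby needs sorted input: it merges only adjacent pairs that share a key.
grouped = [
    (key, [value for _, value in group])
    for key, group in groupby(sorted_pairs, key=itemgetter(0))
]
print(grouped)
```

Each element of `grouped` corresponds to one reducer call: a key together with every value emitted for it.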

Searching
Searching plays an important role in the MapReduce algorithm. It helps in the combiner
phase (optional) and in the Reducer phase. Let us try to understand how Searching
works with the help of an example.

Example

The following example shows how MapReduce employs a searching algorithm to find
the details of the employee who draws the highest salary in a given employee dataset.

Let us assume we have employee data in four different files: A, B, C, and D. Let
us also assume there are duplicate employee records in all four files, because the
employee data was imported repeatedly from all database tables.

The Map phase processes each input file and provides the employee data as
key-value pairs (<k, v> : <emp name, salary>).

The combiner phase (searching technique) accepts the input from the Map
phase as key-value pairs of employee name and salary. Using the searching
technique, the combiner checks all the employee salaries to find the highest
salaried employee in each file. See the following snippet.

<k: employee name, v: salary>

Max = salary of the first employee   // treated as the max salary so far

for each remaining employee {
    if (v.salary > Max) {
        Max = v.salary;   // a higher salary was found
    }
    else {
        continue checking;
    }
}
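
The combiner logic above can be written as a small runnable Python function. This is a minimal sketch that assumes each file is a non-empty list of (name, salary) pairs; the function name `combine_max` and the records in `file_a` are hypothetical, chosen so that the result matches the first pair of the expected output.

```python
def combine_max(records):
    """Return the (name, salary) pair with the highest salary in one file."""
    records = iter(records)
    # The first employee is treated as the maximum so far (assumes non-empty input).
    max_name, max_salary = next(records)
    for name, salary in records:
        if salary > max_salary:
            max_name, max_salary = name, salary
        # Otherwise keep the current maximum and continue checking.
    return max_name, max_salary

# Hypothetical contents of file A, including one duplicate record.
file_a = [("ravi", 21000), ("satish", 26000), ("ravi", 21000), ("anil", 15000)]

print(combine_max(file_a))
```

Because the comparison is strict, the first employee encountered wins when two employees share the highest salary.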

The expected result is as follows −

<satish, 26000>   <gopal, 50000>   <kiran, 45000>   <manisha, 45000>

Reducer phase: From each file, you will find the highest salaried employee. To
avoid redundancy, check all the <k, v> pairs and eliminate duplicate entries, if
any. The same algorithm is then applied to the four <k, v> pairs coming from
the four input files. The final output should be as follows −

<gopal, 50000>
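
The reducer step can be sketched in Python as follows. The four input pairs are the per-file results listed above; the function name `reduce_max` and the use of `dict.fromkeys` for duplicate elimination are illustrative choices, not part of any MapReduce API.

```python
# Per-file maxima produced by the combiner phase (taken from the example above).
combiner_output = [
    ("satish", 26000),
    ("gopal", 50000),
    ("kiran", 45000),
    ("manisha", 45000),
]

def reduce_max(pairs):
    """Drop exact duplicate pairs, then return the pair with the highest salary."""
    unique_pairs = list(dict.fromkeys(pairs))  # keeps first occurrence, preserves order
    return max(unique_pairs, key=lambda pair: pair[1])

print(reduce_max(combiner_output))
```

Deduplication does not change the maximum itself; it only mirrors the redundancy check described in the reducer phase.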

Indexing
Indexing is normally used to point to a particular piece of data and its address. In
MapReduce, batch indexing is performed on the input files for a particular Mapper.

The indexing technique that is normally used in MapReduce is known as an inverted index.
Search engines like Google and Bing use the inverted indexing technique. Let us try to
understand how Indexing works with the help of a simple example.

Example

The following text is the input for inverted indexing. Here T[0], T[1], and T[2] are the file
names and their contents are in double quotes.

T[0] = "it is what it is"

T[1] = "what is it"
T[2] = "it is a banana"

After applying the Indexing algorithm, we get the following output −

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Here "a": {2} implies the term "a" appears in the T[2] file. Similarly, "is": {0, 1, 2}
implies the term "is" appears in the files T[0], T[1], and T[2].
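
A minimal Python sketch of this inverted index follows. The map step emits a (term, file number) pair for every word and the reduce step collects the file numbers per term. The function names are hypothetical, and splitting on whitespace is an assumed tokenization that suits this input.

```python
from collections import defaultdict

# The three input files from the example, keyed by file number.
documents = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

def map_terms(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in the file."""
    return [(term, doc_id) for term in text.split()]

def build_inverted_index(docs):
    """Reduce step: collect, for each term, the set of files it appears in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term, emitted_id in map_terms(doc_id, text):
            index[term].add(emitted_id)
    # Sort terms and file numbers so the output is deterministic.
    return {term: sorted(ids) for term, ids in sorted(index.items())}

inverted_index = build_inverted_index(documents)
for term, doc_ids in inverted_index.items():
    print(f'"{term}": {doc_ids}')
```

Using a set per term means a word that occurs twice in one file, such as "it" in T[0], is still listed once for that file.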

TF-IDF
TF-IDF is a text processing algorithm which is short for Term Frequency − Inverse
Document Frequency. It is one of the common web analysis algorithms. Here, the term
'frequency' refers to the number of times a term appears in a document.

Term Frequency (TF)

It measures how frequently a particular term occurs in a document. It is calculated as
the number of times a word appears in a document divided by the total number of words
in that document.

TF(the) = (Number of times the term "the" appears in a document) / (Total number of words in the document)

Inverse Document Frequency (IDF)

It measures the importance of a term. It is calculated as the logarithm of the number of
documents in the text database divided by the number of documents where a specific term
appears.

While computing TF, all the terms are considered equally important. That means TF
counts the term frequency for common words like is, a, what, etc. Thus we need to weigh
down the frequent terms while scaling up the rare ones, by computing the following −

IDF(the) = log_e(Total number of documents / Number of documents with the term "the" in them)
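
The two definitions can be sketched directly in Python. The function names are hypothetical, the corpus reuses the three short texts from the indexing example, and the IDF function assumes the term occurs in at least one document (otherwise it would divide by zero).

```python
import math

def term_frequency(term, document):
    """TF: occurrences of the term divided by the total number of words."""
    words = document.split()
    return words.count(term) / len(words)

def inverse_document_frequency(term, documents):
    """IDF: natural log of (total documents / documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / containing)

corpus = ["it is what it is", "what is it", "it is a banana"]

tf_it = term_frequency("it", corpus[0])                    # 2 of 5 words
idf_it = inverse_document_frequency("it", corpus)          # appears in all 3 documents
idf_banana = inverse_document_frequency("banana", corpus)  # appears in 1 document

print(tf_it, idf_it, idf_banana)
```

A term that appears in every document, such as "it", gets an IDF of zero, while a rare term such as "banana" gets a positive weight.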

The algorithm is explained below with the help of a small example.

Example

Consider a document containing 1000 words, wherein the word hive appears 50 times.
The TF for hive is then (50 / 1000) = 0.05.

Now, assume we have 10 million documents and the word hive appears in 1000 of
these. Using a base-10 logarithm, the IDF is calculated as log10(10,000,000 / 1,000) = 4.
(With the natural logarithm used in the formula above, the value would be about 9.21.)

The TF-IDF weight is the product of these quantities − 0.05 × 4 = 0.20.
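
The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative. The sketch computes the weight with both logarithm bases, since the choice of base changes the numeric value but not the ranking of terms.

```python
import math

# Term frequency: 50 occurrences of "hive" in a 1000-word document.
tf = 50 / 1000

# Inverse document frequency: 10 million documents, 1000 of which contain "hive".
ratio = 10_000_000 / 1_000
idf_base10 = math.log10(ratio)   # base-10 logarithm, as in the worked example
idf_natural = math.log(ratio)    # natural logarithm, as in the IDF formula

tfidf_base10 = tf * idf_base10
tfidf_natural = tf * idf_natural

print(f"TF = {tf:.2f}")
print(f"IDF (log10) = {idf_base10:.2f}, TF-IDF = {tfidf_base10:.2f}")
print(f"IDF (ln)    = {idf_natural:.2f}, TF-IDF = {tfidf_natural:.2f}")
```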


