
BigData MapReduce

MapReduce is a programming model used to process large datasets across clusters of computers. It works by having a master node divide input data into smaller subproblems and distribute them to worker nodes. Each worker node then processes its subset and returns results to the master node, which combines the results into the final output. Key aspects of MapReduce include mapping functions to divide the work, a partitioning function to group output data, and reduce functions to combine results from each partition.

Uploaded by arjuncchaudhary

Big Data

Map Reduce

Table of Contents
Key approach to work with Big Data
Mapping
    The Map Step
Applying the Reduce Step
    Reduce Step
Map Reduce Data Flow
A Closer Look at the map and partition Step


Key approach to work with Big Data


• MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google.

• MapReduce is typically used to perform distributed computing on large datasets across clusters of computers.

[Figure: the Master Node maps the problem data to Worker Node 1, Worker Node 2, and Worker Node 3.]

Mapping

The Map Step

• The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes.

• A worker node may repeat this process in turn, which leads to a multi-level tree structure.


• Each worker node processes its sub-problem and hands the result back to its parent node.

[Figure: Input List → Mapping Function → Output List]
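As a rough illustration of the map step described above, the following Python sketch simulates a master that splits an input list into sub-problems and workers that each apply a mapping function to their share. The names split_input and run_map_step and the squaring function are hypothetical, chosen only for this example; a real cluster would send the chunks to separate machines instead of looping locally.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_input(data: List[T], num_workers: int) -> List[List[T]]:
    """Master side: divide the input into roughly equal sub-problems."""
    chunk = max(1, -(-len(data) // num_workers))  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def run_map_step(data: List[T], mapper: Callable[[T], R], num_workers: int) -> List[List[R]]:
    """Each simulated worker applies the mapping function to its own chunk."""
    return [[mapper(item) for item in chunk] for chunk in split_input(data, num_workers)]


# Hypothetical job: square every number, using three simulated workers.
partial_results = run_map_step([1, 2, 3, 4, 5, 6, 7], lambda x: x * x, num_workers=3)
print(partial_results)  # [[1, 4, 9], [16, 25, 36], [49]]
```

Each inner list stands for the result one worker hands back to its parent node.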

Applying the Reduce Step


Reduce Step

The master node then collects the answers from all of its child nodes and combines them in a meaningful way to form the final output, which is the answer to the problem that was originally put to the system.
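The combining step can be sketched in a few lines of Python. This is only an illustration under assumed inputs: run_reduce_step is a hypothetical name, the worker outputs are made-up lists of numbers, and addition stands in for whatever "meaningful" combination a real job would use.

```python
from functools import reduce
from typing import Callable, List, TypeVar

R = TypeVar("R")


def run_reduce_step(partial_results: List[List[R]], combine: Callable[[R, R], R], initial: R) -> R:
    """Master side: fold every worker's partial result into one answer."""
    answer = initial
    for worker_output in partial_results:
        answer = reduce(combine, worker_output, answer)
    return answer


# Hypothetical partial results handed back by three workers.
worker_outputs = [[1, 4, 9], [16, 25, 36], [49]]
total = run_reduce_step(worker_outputs, lambda a, b: a + b, initial=0)
print(total)  # 140
```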

[Figure: Input List → Mapping Function → Output List]


Map Reduce Data Flow

[Figure: MapReduce data flow. Input Format → File Splits → RR → Map → Partitioner (Sort) → Reduce → Output Format]

• If we zoom in on the MapReduce framework, we see that at its core it is a large distributed sort. The most important steps, which the application defines, are as follows.

• An input function

• A map function

• A partition function

• A compare/sort function


• A reduce function

• An output writer
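To make the six pieces listed above concrete, here is a minimal single-process sketch of a word-count job in Python. Every function name (input_reader, map_fn, partition_fn, reduce_fn, output_writer, run_job) is hypothetical and chosen for this example, and the byte-sum partitioner is an assumption made to keep the sketch deterministic; it is not how any particular framework implements these steps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def input_reader(text: str) -> Iterator[Tuple[int, str]]:
    """Input function: yield (line number, line) records."""
    for number, line in enumerate(text.splitlines()):
        yield number, line


def map_fn(_key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def partition_fn(key: str, num_reducers: int) -> int:
    """Partition function: choose a reducer index for the key."""
    return sum(key.encode("utf-8")) % num_reducers


def reduce_fn(key: str, values: List[int]) -> Iterator[Tuple[str, int]]:
    """Reduce function: combine all values seen for one key."""
    yield key, sum(values)


def output_writer(results: Iterable[Tuple[str, int]]) -> str:
    """Output writer: format results as tab-separated lines."""
    return "\n".join(f"{key}\t{value}" for key, value in results)


def run_job(text: str, num_reducers: int = 2) -> str:
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in input_reader(text):
        for out_key, out_value in map_fn(key, value):
            partitions[partition_fn(out_key, num_reducers)][out_key].append(out_value)
    results: List[Tuple[str, int]] = []
    for partition in partitions:
        # Compare/sort function: plain string ordering of the keys.
        for key in sorted(partition):
            results.extend(reduce_fn(key, partition[key]))
    return output_writer(results)


print(run_job("a b a\nb c a"))
```

Keys are sorted within each reducer's partition, so the combined output is ordered per partition rather than globally.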

A Closer Look at the map and partition Step


• The map function takes a series of key/value pairs, processes each pair, and emits zero or more output key/value pairs.

• Each map node's output is assigned to a particular reducer by the application's partition function, for sharding purposes.

• The partition function is given the key and the number of reducers, and returns the index of the reducer that should receive that key.

• The input for each reduce is pulled from the machine where the map ran and is sorted using the application's comparison function.

• The framework calls the application's reduce function once for each unique key, in sorted order. The reduce can iterate through the values that are associated with that key and produce zero or more outputs.

• The output writer writes the output of the reduce to stable storage, usually a distributed file system.
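The partition, sort, and per-key reduce behaviour described in these points can be sketched as follows. The function names (partition, shuffle, reduce_bucket), the sample map outputs, and the use of zlib.crc32 as the hash are all assumptions made for this illustration; crc32 is chosen here only because it gives the same index on every run.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    """Given a key and the number of reducers, return the reducer index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[List[Pair]], num_reducers: int) -> Dict[int, List[Pair]]:
    """Route every (key, value) pair from every map node to its reducer."""
    buckets: Dict[int, List[Pair]] = {i: [] for i in range(num_reducers)}
    for node_output in map_outputs:
        for key, value in node_output:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def reduce_bucket(pairs: List[Pair]) -> List[Pair]:
    """Sort one reducer's input by key, then reduce once per unique key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, sum(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]


# Hypothetical outputs of two map nodes.
map_outputs = [[("apple", 1), ("pear", 1)], [("apple", 1), ("fig", 1)]]
buckets = shuffle(map_outputs, num_reducers=3)
for index, pairs in buckets.items():
    print(index, reduce_bucket(pairs))
```

Because the partition function depends only on the key, every value for a given key reaches the same reducer, which is what allows the reduce to be called once per unique key.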
