BigData MapReduce
Map Reduce
Table of Contents
Key approach to work with Big Data
Mapping
The Map Step
Applying the Reduce Step
Reduce Step
Map Reduce Data Flow
A Closer Look at the map and partition Step
Big Data
Map Reduce
computers.
[Figure: Problem Data, Map, Worker Node 1]
Mapping
The master node takes the input, divides it into smaller sub-problems, and distributes them to
worker nodes.
Each worker node processes its sub-problem and hands the result back to its parent (master) node.
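The map step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-count task, the thread pool standing in for worker nodes, and the names `split_input`, `map_words`, and `run_map_step` are all hypothetical and do not come from any particular framework.

```python
from concurrent.futures import ThreadPoolExecutor

def split_input(records, num_workers):
    """Master: divide the input into roughly equal sub-problems."""
    return [records[i::num_workers] for i in range(num_workers)]

def map_words(chunk):
    """Worker: emit a (word, 1) pair for every word in its chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def run_map_step(records, num_workers=3):
    chunks = split_input(records, num_workers)
    # Threads stand in for worker nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker hands its partial result back to the master.
        return list(pool.map(map_words, chunks))

lines = ["big data map reduce", "map step", "reduce step"]
partial_results = run_map_step(lines)
```

Here `partial_results` holds one list of pairs per worker; nothing has been combined yet.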
[Figure: Input List, Mapping Function, Output List]
The master node then collects the answers from all the child nodes and combines them in a meaningful
way to form the primary output, which is the answer to the problem that was put to the system.
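As a minimal sketch of this combining step, assume each worker returned a list of (key, value) pairs and that "combining in a meaningful way" means summing the values per key. Both the data and the function name `combine` are hypothetical.

```python
from collections import defaultdict

def combine(partial_results):
    """Master: merge the workers' (key, value) pairs into one answer."""
    totals = defaultdict(int)
    for worker_output in partial_results:
        for key, value in worker_output:
            totals[key] += value
    return dict(totals)

partial_results = [
    [("map", 1), ("reduce", 1)],  # returned by one worker
    [("map", 1), ("step", 1)],    # returned by another worker
]
answer = combine(partial_results)
```

Other problems would use a different combining rule (for example, taking a maximum or concatenating lists); summation is just the simplest case to show.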
[Figure: Input List, Mapping Function, Output List]
[Figure: MapReduce data flow: Input Format, File, RR, Partitioner, (Sort), Reduce, Output Format]
If we zoom in on each part of the MapReduce framework, we see this is a large distributed sort.
The most important steps are defined as follows.
An input function
A map function
A Partition function
A compare/sort function
A reduce function
An output writer
full structure.
Each map node's output is assigned to a particular reducer by the application's partition function for
sharding purposes.
The partition function is given the key and the number of reducers, and returns the index of the reducer that should receive that key.
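A common way to write such a partition function is to hash the key and take the result modulo the number of reducers. The sketch below assumes string keys and uses CRC32 as the hash; both choices, and the name `partition`, are illustrative rather than prescribed by the text.

```python
import zlib

def partition(key, num_reducers):
    """Return the index of the reducer that should receive this key."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# Group some map output into one bucket per reducer.
num_reducers = 4
map_output = [("map", 1), ("reduce", 1), ("map", 1), ("step", 1)]
buckets = [[] for _ in range(num_reducers)]
for key, value in map_output:
    buckets[partition(key, num_reducers)].append((key, value))
```

Because the index depends only on the key, every pair with the same key lands in the same bucket, no matter which map node produced it.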
The input for each reducer is pulled from the machine where the map ran and sorted using the application's compare/sort function.
The framework calls the application's reduce function once for each unique key, in sorted
order. The reduce function can iterate through the values that are associated with the key and produce zero or more outputs.
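The sort-then-reduce behaviour can be illustrated as follows. This is a sketch, assuming the pairs for one reducer are already in memory and that plain key ordering serves as the compare function; `reduce_counts` and `run_reducer` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter

def reduce_counts(key, values):
    """Hypothetical reduce function: sum the values for one key."""
    yield key, sum(values)

def run_reducer(pairs, reduce_fn):
    # Sort by key so that equal keys become adjacent (the compare/sort step).
    ordered = sorted(pairs, key=itemgetter(0))
    output = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        # reduce_fn is called exactly once per unique key, in sorted order.
        values = (value for _, value in group)
        output.extend(reduce_fn(key, values))
    return output

pairs = [("step", 1), ("map", 1), ("reduce", 1), ("map", 1)]
result = run_reducer(pairs, reduce_counts)
```

Writing the reduce function as a generator lets it emit any number of outputs for a key, including none.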
The output writer writes the output of the reduce to stable storage, usually a distributed file
system.
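A minimal sketch of an output writer is shown below. It assumes a local temporary directory as a stand-in for the distributed file system, one tab-separated file per reducer, and a `part-NNNNN` file name; all of these, along with the function name `write_output`, are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def write_output(reducer_index, results, out_dir):
    """Write one reducer's (key, value) results as tab-separated lines."""
    path = Path(out_dir) / f"part-{reducer_index:05d}"
    with path.open("w", encoding="utf-8") as f:
        for key, value in results:
            f.write(f"{key}\t{value}\n")
    return path

# A local temp directory stands in for stable storage in this sketch.
out_dir = tempfile.mkdtemp()
written = write_output(0, [("map", 2), ("reduce", 1)], out_dir)
```

Giving each reducer its own file means reducers never need to coordinate writes with one another.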