Developing a MapReduce Application
Oguzhan Gencoglu
TIE 12206 - Apache Hadoop
Tampere University of Technology, Finland
November, 2014
Outline
1 MapReduce Paradigm
What is MapReduce
MapReduce Workflow
2 Job Tracker
Hadoop Default Ports
3 Example
Word Count
Job Tracker
Key Points
What is MapReduce
MapReduce is a software framework for processing (large) data
sets in a distributed fashion over several machines.
Core idea
< key, value > pairs
Almost all data can be mapped into < key, value > pairs.
Keys and values may be of any type.
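As a small illustration of mapping raw data into < key, value > pairs, the sketch below turns temperature log lines into (year, temperature) pairs. The record format, the sample records and the function name `to_key_value` are assumptions made up for this example; they do not come from the lecture.

```python
from typing import Tuple

def to_key_value(line: str) -> Tuple[str, int]:
    """Turn one 'YYYY-MM-DD,temperature' record into a (year, temperature) pair."""
    date, temperature = line.strip().split(",")
    year = date.split("-")[0]
    return year, int(temperature)

# Hypothetical input records, one per line of a text file.
records = ["2013-07-01,24", "2013-12-24,-11", "2014-07-01,27"]

pairs = [to_key_value(record) for record in records]
print(pairs)
```

Here the key is a string and the value is an integer, but any other pairing (for example a word and a list of file names) would fit the same model.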
MapReduce Workflow
Write your map and reduce functions
Test with a small subset of data
If it fails, use your IDE’s debugger to find the problem
Run on full dataset
If it fails, Hadoop provides some debugging tools
e.g. IsolationRunner: reruns a failed task over the same input on
which it failed.
Profile the job to tune its performance
Hadoop Default Ports
Hadoop uses a handful of ports over TCP.
Some are used by Hadoop itself (to schedule jobs, replicate
blocks, etc.).
Some are meant directly for users (either via an interposed Java
client or via plain old HTTP).
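One quick way to see which user-facing ports are reachable is to try opening a TCP connection to each of them. The sketch below assumes the web UI defaults of a Hadoop 1.x installation (50070, 50030 and 50060); these numbers are not given in the slides, a cluster's configuration may override them, and the helper `is_port_open` is a hypothetical name introduced for this example.

```python
import socket

# Assumed Hadoop 1.x web UI defaults; a cluster's configuration may differ.
ASSUMED_WEB_PORTS = {
    "NameNode web UI": 50070,
    "JobTracker web UI": 50030,
    "TaskTracker web UI": 50060,
}

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in ASSUMED_WEB_PORTS.items():
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening there; opening the corresponding address in a browser confirms whether it is actually the Hadoop web interface.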
Word Count
Task: Counting the word occurrences (frequencies) in a text file (or
set of files).
< word, count > as < key, value > pair
Mapper: Emits < word, 1 > for each word (no counting at this
stage).
Shuffle in between: pairs with the same key are grouped together and
passed to a single machine.
Reducer: Sums up the values (1s) that share the same key.
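The three stages above can be imitated in plain Python on a single machine. This is a minimal sketch of the idea, not Hadoop code: the function names `mapper`, `shuffle` and `reducer` and the sample lines are assumptions for illustration, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) pair for each word; no counting happens here."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Sum the 1s collected for a single word."""
    return word, sum(counts)

# Hypothetical input: three lines of a text file.
lines = ["the quick brown fox", "the lazy dog", "the fox"]

mapped = [pair for line in lines for pair in mapper(line)]
grouped = shuffle(mapped)
word_counts = dict(reducer(word, counts) for word, counts in grouped.items())
print(word_counts)
# prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because the reducer only ever sees the values of one key at a time, each key can be handled independently, which is what allows the reduce step to be spread over several machines.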
Job Tracker
Tasks
Name Node
Key Points
Test the mapper and reducer outside Hadoop.
Copy your MapReduce functions and files to the DFS.
Test the mapper and reducer with Hadoop using a small portion of
the data.
Track the jobs, debug, and profile.
Questions/Comments