HADOOP
i. NameNode
It is also known as the master node. The NameNode
does not store the actual data or dataset. Instead, it
stores metadata, i.e. the number of blocks, their locations,
the rack and DataNode on which each block is stored,
and other details. The namespace it manages consists of files and directories.
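To make the split between metadata and data concrete, here is a minimal sketch of the kind of lookup table a NameNode keeps: file path, then its blocks, then the (rack, DataNode) pairs holding each replica. The class and method names (NameNodeMetadata, add_file, locate) are hypothetical and are not the real HDFS internals; the 128 MB block size is assumed from the usual HDFS default and may be configured differently on a real cluster.

```python
import math
from dataclasses import dataclass, field

# Assumption: 128 MB is the default HDFS block size (dfs.blocksize) in Hadoop 2 and later;
# a cluster can be configured with a different value.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class BlockInfo:
    block_id: int
    replicas: list  # (rack, datanode) pairs that hold a copy of this block


@dataclass
class NameNodeMetadata:
    """Hypothetical toy model of NameNode metadata: path -> blocks -> locations.
    Only metadata is kept here; the block contents themselves live on the DataNodes."""
    files: dict = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path, size_bytes, placements):
        """Record a file; placements[i] lists the (rack, datanode) pairs for block i."""
        n_blocks = math.ceil(size_bytes / BLOCK_SIZE)
        if len(placements) != n_blocks:
            raise ValueError(f"expected {n_blocks} placement lists, got {len(placements)}")
        blocks = []
        for replicas in placements:
            blocks.append(BlockInfo(self.next_block_id, list(replicas)))
            self.next_block_id += 1
        self.files[path] = blocks

    def block_count(self, path):
        return len(self.files[path])

    def locate(self, path):
        """For each block of the file, return the DataNodes that store a replica."""
        return [[dn for _rack, dn in b.replicas] for b in self.files[path]]


# A 300 MB file needs ceil(300 / 128) = 3 blocks, each placed on 3 DataNodes in this example.
meta = NameNodeMetadata()
meta.add_file("/data/logs.txt", 300 * 1024 * 1024, [
    [("rack1", "dn1"), ("rack1", "dn2"), ("rack2", "dn4")],
    [("rack1", "dn2"), ("rack2", "dn4"), ("rack2", "dn5")],
    [("rack2", "dn5"), ("rack1", "dn1"), ("rack1", "dn3")],
])
print(meta.block_count("/data/logs.txt"))  # 3
print(meta.locate("/data/logs.txt")[0])    # ['dn1', 'dn2', 'dn4']
```

A client asking to read /data/logs.txt would first get this block-to-DataNode mapping and then fetch the block contents directly from the DataNodes.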
The Mapper
Reads data as key/value pairs
◦ The key (e.g. the byte offset of a line in a text file) is often discarded
Outputs zero or more key/value pairs
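A minimal sketch of a word count mapper, written as a plain Python generator rather than against the Hadoop Java API. It takes one (key, value) record, ignores the key (assumed here to be a byte offset), and emits zero pairs for a blank line or one (word, 1) pair per word otherwise. The function name and the lower-casing of words are illustrative choices, not part of the slides.

```python
def word_count_mapper(key, value):
    """Map function for word count (hypothetical name).

    key:   e.g. the byte offset of the line in the input file; ignored here
    value: one line of text
    Yields no pairs for a blank line, otherwise one (word, 1) pair per word.
    """
    for word in value.split():
        yield (word.lower(), 1)


# Input records as (byte offset, line) pairs; the mapper discards the offsets.
records = [(0, "the cat sat"), (12, ""), (13, "The end")]
for offset, line in records:
    print(offset, list(word_count_mapper(offset, line)))
# 0 [('the', 1), ('cat', 1), ('sat', 1)]
# 12 []
# 13 [('the', 1), ('end', 1)]
```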
Shuffle and Sort
Output from the mapper is sorted by key
All values with the same key are guaranteed to go to
the same reducer (and therefore the same machine)
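The grouping guarantee comes from partitioning by key. The sketch below simulates it locally: each key is assigned to a partition with hash(key) % num_reducers, loosely mirroring the idea behind Hadoop's default HashPartitioner, and each partition is then sorted by key. The function name is hypothetical. Because Python randomizes string hashes per process, which partition a key lands in can change between runs, but within a run every value for a given key always ends up in the same partition.

```python
from collections import defaultdict


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Simulate shuffle and sort for (key, value) pairs collected from all mappers.

    Each key goes to partition hash(key) % num_reducers, so all values for one key
    end up in the same partition (i.e. on the same reducer). Each partition is then
    returned as a list of (key, [values]) sorted by key.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapper_outputs:
        partitions[hash(key) % num_reducers][key].append(value)
    # dict keys are unique, so sorting the items orders them by key alone
    return [sorted(p.items()) for p in partitions]


pairs = [("the", 1), ("cat", 1), ("sat", 1), ("the", 1), ("end", 1)]
for reducer_id, partition in enumerate(shuffle_and_sort(pairs, num_reducers=2)):
    print(reducer_id, partition)
```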
The Reducer
Called once for each unique key
Gets a list of all values associated with a key as input
The reducer outputs zero or more final key/value
pairs
◦ Usually just one output per input key
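A minimal sketch of two reducers for the word count example, again as plain Python generators with hypothetical names. The first shows the usual case of exactly one output per input key; the second is an illustrative variant that drops rare words, so it emits zero or one pair per key.

```python
def word_count_reducer(key, values):
    """Called once per unique key with all of that key's values.
    Emits exactly one (key, total) pair: the usual one-output-per-key case."""
    yield (key, sum(values))


def frequent_word_reducer(key, values, min_count=2):
    """Hypothetical variant emitting zero or one pair: words below min_count are dropped."""
    total = sum(values)
    if total >= min_count:
        yield (key, total)


# Grouped input as the shuffle and sort phase would deliver it: (key, [values]), sorted by key.
grouped = [("cat", [1]), ("end", [1]), ("the", [1, 1])]
print([out for k, vs in grouped for out in word_count_reducer(k, vs)])
# [('cat', 1), ('end', 1), ('the', 2)]
print([out for k, vs in grouped for out in frequent_word_reducer(k, vs)])
# [('the', 2)]
```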
MapReduce: Word Count
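The full word count can be sketched in the line-oriented style used by Hadoop Streaming, where the mapper writes "key<TAB>value" text lines and the reducer reads those lines already sorted by key. In this sketch, run_locally is a hypothetical stand-in for the framework: Python's sorted() plays the role of the shuffle and sort, and a single reducer receives everything. On a real cluster the mapper and reducer would instead be separate scripts reading standard input.

```python
from itertools import groupby


def mapper(lines):
    """Streaming-style mapper: one input line in, one 'word<TAB>1' line out per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Streaming-style reducer: the input is sorted, so all lines for one word are
    adjacent and can be grouped with itertools.groupby."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Local stand-in for the framework: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(run_locally(["the cat sat", "The end"]))
# ['cat\t1', 'end\t1', 'sat\t1', 'the\t2']
```

Sorting whole "word<TAB>1" strings is enough here because identical keys produce identical lines, which a sort always places next to each other.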
Features of MapReduce
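Simplicity – MapReduce jobs are easy to write and run: the programmer supplies only the map and reduce functions, and the framework handles splitting the input, scheduling the tasks, and the shuffle and sort.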
Other Tools
Hive
◦ Hadoop processing with SQL
Pig
◦ Hadoop processing with scripting
Cascading
◦ Pipe and Filter processing model
HBase
◦ Database model built on top of Hadoop
Flume
◦ Designed for large-scale data movement
Matrix Multiplication
https://www.youtube.com/watch?v=RIMA4rvNpI8
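One common way to express C = A × B as a single MapReduce pass, with A of size m × n and B of size n × p, is sketched below; it is not necessarily the method presented in the linked video. The mapper sends each a_ij to every output cell (i, k) and each b_jk to every output cell (i, k), tagging values with their matrix name and the shared index j. The reducer for cell (i, k) joins A and B entries on j and sums the products. All function names are hypothetical, and the shuffle is simulated with a dictionary.

```python
from collections import defaultdict


def mm_map(entry, m, p):
    """entry = (matrix_name, row, col, value) for one element of A (m x n) or B (n x p)."""
    name, r, c, v = entry
    if name == "A":      # a_ij contributes to C[i][k] for every k = 0..p-1
        for k in range(p):
            yield (r, k), ("A", c, v)
    else:                # b_jk contributes to C[i][k] for every i = 0..m-1
        for i in range(m):
            yield (i, c), ("B", r, v)


def mm_reduce(key, values):
    """Join the A and B values for one output cell on index j and sum the products."""
    a, b = {}, {}
    for name, j, v in values:
        (a if name == "A" else b)[j] = v
    yield key, sum(a[j] * b[j] for j in a.keys() & b.keys())


def matmul_mapreduce(A, B):
    """Run map, shuffle (group by key) and reduce locally on dense lists of lists."""
    m, n, p = len(A), len(B), len(B[0])
    entries = [("A", i, j, A[i][j]) for i in range(m) for j in range(n)]
    entries += [("B", j, k, B[j][k]) for j in range(n) for k in range(p)]
    groups = defaultdict(list)  # simulated shuffle: all values for cell (i, k) together
    for e in entries:
        for key, val in mm_map(e, m, p):
            groups[key].append(val)
    C = [[0] * p for _ in range(m)]
    for key in sorted(groups):
        for (i, k), total in mm_reduce(key, groups[key]):
            C[i][k] = total
    return C


print(matmul_mapreduce([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The price of doing this in one pass is replication: every element of A is emitted p times and every element of B m times, which is why multi-step formulations are sometimes preferred for large matrices.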