Hadoop
Hadoop is an open-source framework for storing and processing large datasets across
distributed clusters of computers. It uses a simple programming model and provides a scalable,
fault-tolerant environment for big data processing.
YARN
Function: YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management
layer. It allocates the cluster's compute resources (memory and CPU) to applications and
schedules where their tasks run.
Components: a cluster-wide ResourceManager, a NodeManager on each worker node, and a
per-application ApplicationMaster that negotiates containers for its tasks.
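As a small illustration of how an application declares its resource needs to YARN, the sketch below sets per-task memory and CPU requests on a MapReduce job's configuration. The property names are standard Hadoop keys; the specific values and the job name are assumptions for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: per-task resource requests that YARN's ResourceManager uses when
// placing containers on NodeManagers. Values are illustrative assumptions.
public class ResourceHints {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 2048);      // memory per map container
    conf.setInt("mapreduce.reduce.memory.mb", 4096);   // memory per reduce container
    conf.setInt("mapreduce.map.cpu.vcores", 1);        // vcores per map container
    conf.setInt("mapreduce.reduce.cpu.vcores", 2);     // vcores per reduce container

    Job job = Job.getInstance(conf, "resource-hints-demo");
    // ... set the mapper, reducer, and input/output paths here before submitting.
  }
}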
MapReduce
Phases: a MapReduce job runs in three phases (see the word-count sketch after this list):
o Map Phase: The input data is split into independent chunks, and each chunk is
processed by the mapper function to produce intermediate key-value pairs.
o Shuffle and Sort Phase: The intermediate data is shuffled and sorted to prepare for
the reduce phase.
o Reduce Phase: The reducer function processes the intermediate data, aggregates
results, and produces the final output.
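To make the three phases concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API; the class name, job name, and the input/output paths taken from the command-line arguments are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in an input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: after shuffle/sort groups the pairs by word, sum the counts.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper emits one (word, 1) pair per token, the shuffle-and-sort step groups all pairs with the same word, and the reducer sums the counts to produce the final output.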
Hadoop's architecture is designed to handle big data efficiently with its distributed storage
(HDFS, the Hadoop Distributed File System) and processing (MapReduce) layers. YARN manages
the cluster's resources, while components such as HBase, Hive, Pig, ZooKeeper, and Oozie add
further functionality on top of this core to form a robust big data ecosystem. This modular
architecture lets Hadoop scale horizontally over large datasets, making it a powerful platform
for big data analytics.
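To round out the storage side, here is a minimal sketch of using the HDFS Java client (the FileSystem API) to write a file and read it back; the NameNode address and the file path are assumptions for the example.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: write a small file to HDFS, then read it back.
public class HdfsRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");   // assumed NameNode address

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/hello.txt");             // hypothetical path

    // Write: the client asks the NameNode for metadata, then streams the
    // data to DataNodes, which replicate the blocks.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello from HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back from the DataNodes.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
    fs.close();
  }
}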