Hadoop



INTRODUCTION
Hadoop is an open-source software framework for storing and processing large data sets. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs.
Hadoop is a framework written in Java that uses large clusters of commodity hardware to store and maintain very large data sets. It is based on the MapReduce programming model introduced by Google. Today many large companies use Hadoop to deal with big data, e.g. Facebook, Yahoo, Netflix, and eBay.

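The Hadoop architecture consists mainly of four components:

1. MapReduce
2. HDFS (Hadoop Distributed File System)
3. YARN (Yet Another Resource Negotiator)
4. Hadoop Common (the common utilities and libraries that support the other modules)

Figure: Hadoop architecture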
1. MapReduce
MapReduce is essentially a programming model (an algorithm) for processing data. Its main feature is that it performs distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop fast. When dealing with big data, serial processing is no longer practical.
MapReduce work is divided into two phases: the Map phase runs first and transforms the input into intermediate key-value pairs, and the Reduce phase then aggregates the values that share a key into the final result.
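The two phases can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation, not Hadoop's actual Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the grouping step stands in for the shuffle-and-sort that Hadoop performs between Map and Reduce.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (stands in for Hadoop's shuffle and sort)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a cluster, each line (input split) would be mapped on a different node in parallel.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "Hadoop processes big data"]
    print(word_count(docs))
    # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

On a real cluster, each Map task would run on the node holding its part of the input, and several Reduce tasks would run in parallel, each handling a subset of the keys.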
2. HDFS
- HDFS (Hadoop Distributed File System) is used for storage. It is designed to run on commodity hardware (inexpensive devices) and follows a distributed file system design. HDFS prefers storing data in large blocks rather than in many small blocks.
- HDFS provides fault tolerance and high availability to the storage layer and the other devices in the Hadoop cluster. HDFS has two types of data storage nodes:
1. Name Node (Master)
2. Data Node (Slave)
File blocks in HDFS: data in HDFS is always stored as blocks. A single file is divided into multiple blocks of 128 MB each; 128 MB is the default block size, and it can be changed manually.
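As a worked example of the block arithmetic, a 500 MB file with the default 128 MB block size is split into three full blocks and one 116 MB block (500 = 3 × 128 + 116). The helper `split_into_blocks` below is a hypothetical illustration of that calculation, not part of HDFS, and it assumes sizes in whole megabytes.

```python
BLOCK_SIZE_MB = 128  # HDFS default block size; configurable via dfs.blocksize


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the sizes (in MB) of the blocks a file of the given size is split into."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data; it does not occupy a full block on disk.
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    print(split_into_blocks(500))  # [128, 128, 128, 116]
```

Because a partially filled last block uses only as much disk as its data, large blocks cost little space while keeping the number of blocks the Name Node must track small.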
Name Node:
- It is the single master server in the HDFS cluster.
- Because it is a single node, it can become a single point of failure.
- It manages the file system namespace by executing operations such as opening, renaming, and closing files.
- It simplifies the architecture of the system.

Data Node:
- An HDFS cluster contains multiple Data Nodes.
- Each Data Node contains multiple data blocks, which are used to store data.
- Data Nodes are responsible for serving read and write requests from the file system's clients.
- They perform block creation, deletion, and replication upon instruction from the Name Node.
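The division of labour between the two node types can be sketched as a toy in-memory model. Everything here (`NameNode`, `DataNode`, the round-robin placement, and the `alive` flag) is a hypothetical simplification: the Name Node keeps only metadata, the Data Nodes hold the block contents, and each block is copied to three nodes to mirror HDFS's default replication factor of 3, so a read still succeeds when one Data Node fails.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REPLICATION = 3  # mirrors the HDFS default replication factor (dfs.replication)


@dataclass
class DataNode:
    """Slave: holds the actual block contents and serves reads."""
    name: str
    alive: bool = True
    blocks: dict[str, bytes] = field(default_factory=dict)

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Master: keeps only metadata (namespace and block locations), never file data."""
    datanodes: list[DataNode]
    namespace: dict[str, list[str]] = field(default_factory=dict)  # path -> block ids
    locations: dict[str, list[DataNode]] = field(default_factory=dict)  # block id -> replicas
    next_id: int = 0

    def write(self, path: str, data: bytes, block_size: int) -> None:
        block_ids = []
        for offset in range(0, len(data), block_size):
            block_id = f"blk_{self.next_id}"
            self.next_id += 1
            # Toy placement: consecutive nodes, round-robin (real HDFS is rack-aware).
            count = min(REPLICATION, len(self.datanodes))
            targets = [self.datanodes[(self.next_id + i) % len(self.datanodes)]
                       for i in range(count)]
            for node in targets:
                # Simplification: real clients stream the data to the Data Nodes themselves.
                node.store(block_id, data[offset:offset + block_size])
            self.locations[block_id] = targets
            block_ids.append(block_id)
        self.namespace[path] = block_ids

    def rename(self, old: str, new: str) -> None:
        # Namespace operation: only metadata changes, no block data moves.
        self.namespace[new] = self.namespace.pop(old)

    def read(self, path: str) -> bytes:
        chunks = []
        for block_id in self.namespace[path]:
            # Fault tolerance: use the first replica whose Data Node is still alive.
            replica = next(n for n in self.locations[block_id] if n.alive)
            chunks.append(replica.read(block_id))
        return b"".join(chunks)


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    nn = NameNode(nodes)
    nn.write("/logs/a.txt", b"hello hadoop world", block_size=8)
    nodes[1].alive = False  # simulate a failed Data Node
    print(nn.read("/logs/a.txt"))  # b'hello hadoop world'
```

Real HDFS places replicas with a rack-aware policy, and clients write to the chosen Data Nodes directly rather than through the Name Node; the sketch keeps only the metadata/data split and the replica fallback.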
3. YARN (Yet Another Resource Negotiator)

YARN is the framework on which MapReduce runs. It handles job requests and manages cluster resources.
YARN consists of:
1. Resource Manager: manages all the resources made available for running applications on the Hadoop cluster.
2. Node Manager: manages an individual node and monitors the resources used on it.
3. Application Manager: works as an interface between the Resource Manager and the Node Manager.
4. Container: a collection of physical resources, such as RAM, CPU cores, and disk, on a single node.
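How the Resource Manager, Node Managers, and containers fit together can be sketched as follows. This is a hypothetical toy model, not YARN's API: each `NodeManager` tracks its free memory and CPU cores, and the `ResourceManager` grants a `Container` on the first node with enough room. Calls to `allocate` stand in for an application requesting containers; real YARN delegates the placement decision to a pluggable scheduler such as the Capacity Scheduler or the Fair Scheduler.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Container:
    """A bundle of resources (memory, CPU cores) on one node."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Tracks the free resources of a single node and launches containers on it."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list[Container] = field(default_factory=list)

    def launch(self, memory_mb: int, vcores: int) -> Container | None:
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            return None  # not enough room on this node
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


@dataclass
class ResourceManager:
    """Keeps the cluster-wide view and hands out containers on request."""
    nodes: list[NodeManager]

    def allocate(self, memory_mb: int, vcores: int) -> Container | None:
        # Toy first-fit policy: try the nodes in order.
        for node in self.nodes:
            container = node.launch(memory_mb, vcores)
            if container is not None:
                return container
        return None  # the request must wait until resources are freed


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 4096, 4), NodeManager("node2", 8192, 8)])
    placed = [rm.allocate(3072, 2) for _ in range(3)]
    print([c.node for c in placed])  # ['node1', 'node2', 'node2']
    print(rm.allocate(3072, 2))  # None
```

First-fit is chosen here only because it is short; it shows why the Resource Manager needs each Node Manager's resource reports before it can place work.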
