Seminar Umera Hadoop


Dr. Babasaheb Ambedkar Technological University


Institute of Petrochemical Engineering

Department of Computer

Seminar On
HADOOP
• A DISTRIBUTED FRAMEWORK FOR BIG DATA

Presented By: Umera Abdul Aziz Rawoot
Roll No:- 2201860
PRN No:- 2030408245032

Under the guidance of: Prof. S.M Sable
Content
• Introduction to Hadoop
• Hadoop’s Developers
• Core-Components of Hadoop
• Hadoop Architecture
• Hadoop Framework Tools
• Features of Hadoop
• Famous Hadoop Users
• Big data vs Hadoop
INTRODUCTION TO HADOOP
• Hadoop is an open-source software framework for
storing large amounts of data and running
computations on that data.
• It is written mainly in Java, with some native
code in C and shell scripts.
• Instead of using one large computer to store and
process the data, Hadoop clusters multiple
computers so that massive datasets can be analyzed
in parallel, and therefore more quickly.
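
The idea of splitting one large job across several machines can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: threads stand in for cluster machines, and the names split_into_chunks, process_chunk and parallel_total are hypothetical helpers invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_workers):
    """Split records into at most num_workers contiguous chunks."""
    chunk_size = max(1, -(-len(records) // num_workers))  # ceiling division
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Stand-in for the work done on one machine: sum the numbers in its chunk."""
    return sum(chunk)


def parallel_total(records, num_workers=4):
    """Process every chunk in parallel, then combine the partial results."""
    chunks = split_into_chunks(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(parallel_total(list(range(1, 101))))
```

Each worker only ever sees its own chunk, which is the property that lets a real cluster keep the data spread over many machines.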
Hadoop’s Developers

• 2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support
distribution for the Nutch search engine project.

• The project was funded by Yahoo.

• 2006: Yahoo gave the project to the Apache Software Foundation.
Core-Components Of Hadoop

• There are three components of Hadoop:


• Hadoop HDFS - Hadoop Distributed File System (HDFS) is
the storage unit.
• Hadoop MapReduce - Hadoop MapReduce is the
processing unit.
• Hadoop YARN - Yet Another Resource Negotiator (YARN)
is a resource management unit
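
The MapReduce processing model can be illustrated with the classic word-count example. The sketch below simulates the map, shuffle and reduce steps in plain Python on a single machine; it does not use the Hadoop API, and the function names are assumptions chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map calls run on the nodes that store the input blocks and the reduce calls run after the framework has moved each key's values to one place; here everything happens in one process for clarity.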
Hadoop’s Architecture
• Distributed, with some centralization
• The main nodes of the cluster hold most of the computational
power and storage of the system
• A central control node runs the NameNode, which keeps track of HDFS
directories and files, and the JobTracker, which dispatches compute
tasks to the TaskTrackers on the other nodes
• Written in Java; jobs can also be written in other languages such as
Python and Ruby (for example through Hadoop Streaming)
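
The division of labour between the central control node and the other nodes can be sketched as a toy model. The code below is a simplified illustration under stated assumptions, not Hadoop's implementation: the NameNode class, the dispatch_tasks function, the round-robin replica placement and the path /logs/a.txt are all hypothetical.

```python
class NameNode:
    """Toy stand-in for the NameNode: records which nodes hold each block of a file."""

    def __init__(self, data_nodes, replication=3):
        self.data_nodes = list(data_nodes)
        # Cannot keep more replicas than there are nodes.
        self.replication = min(replication, len(self.data_nodes))
        self.block_map = {}  # file path -> one list of replica nodes per block

    def add_file(self, path, num_blocks):
        """Place replicas round-robin (an illustrative policy, not HDFS's real one)."""
        n = len(self.data_nodes)
        self.block_map[path] = [
            [self.data_nodes[(block + r) % n] for r in range(self.replication)]
            for block in range(num_blocks)
        ]

    def locate(self, path):
        """Return the replica locations for every block of the file."""
        return self.block_map[path]


def dispatch_tasks(name_node, path, live_nodes):
    """Toy JobTracker: send each block's task to the first live node holding a replica."""
    assignments = []
    for block_id, replicas in enumerate(name_node.locate(path)):
        candidates = [node for node in replicas if node in live_nodes]
        if not candidates:
            raise RuntimeError(f"no live replica for block {block_id} of {path}")
        assignments.append((block_id, candidates[0]))
    return assignments


if __name__ == "__main__":
    nn = NameNode(["node1", "node2", "node3", "node4"], replication=2)
    nn.add_file("/logs/a.txt", 3)
    print(dispatch_tasks(nn, "/logs/a.txt", {"node1", "node3", "node4"}))
```

Sending a task to a node that already stores the block is the idea behind data locality, and falling back to another replica when a node is down is a simplified view of failover.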
Features Of Hadoop
• Cost Effective System
• Large Cluster of Nodes
• Parallel Processing
• Distributed Data
• Automatic Failover Management
• Data Locality Optimization
• Heterogeneous Cluster
• Scalability
Famous Hadoop Users

The World's Top Companies Using Hadoop:


• #1 Amazon EMR
• #2 Cloudera.
• #3 Pivotal.
• #4 IBM InfoSphere BigInsights.
• #5 Microsoft.
BIG DATA VS HADOOP
BIG DATA: Big Data is a group of technologies. It is a collection of huge data that keeps multiplying continuously.
HADOOP: Hadoop is an open-source, Java-based framework that applies some of the big data principles.

BIG DATA: It is a collection of assets that is quite complex, complicated and ambiguous.
HADOOP: It achieves a set of goals and objectives for dealing with that collection of assets.

BIG DATA: It is a complicated problem, i.e. a huge amount of raw data.
HADOOP: It is a solution, i.e. the machinery for processing that data.

BIG DATA: Big data has a wide range of applications in fields such as telecommunications, the banking sector, healthcare, etc.
HADOOP: Hadoop is used for cluster resource management, parallel processing and data storage.
Application Of Hadoop
• Website Tracking.
• Geographical Data.
• Retail Industry.
• Financial Industry.
• Digital Marketing.
THANK YOU!
