
INTRODUCTION TO HADOOP 2020
By Mr. Virendra

Introduction to Hadoop
What is Hadoop?
 Apache Hadoop is a framework that allows for the distributed processing of
large data sets across clusters of commodity computers using a simple
programming model.

 It is an open-source data management framework with scale-out storage and
distributed processing.
Hadoop Key Characteristics
Economical:
1. It is open source and freely available.

2. No license is required.
Reliable:
1. High availability of data.

2. Data lost due to a node failure can be recovered, because HDFS keeps
replicas on other nodes.



Flexible:
1. The number of nodes is not fixed; you can add any number of nodes to the
cluster.
Scalable:
1. You can process large data sets.

2. Your data may range from Kilobytes (KB) and Megabytes (MB) through
Gigabytes (GB), Terabytes (TB), Petabytes (PB), Exabytes (EB), Zettabytes
(ZB), and Yottabytes (YB).


Apache Hadoop Ecosystem

[Diagram: components of the Apache Hadoop ecosystem (HDFS, MapReduce, Pig,
Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, and Ambari), each described in
the sections that follow.]

HDFS (Hadoop distributed file system)


 The Hadoop Distributed File System (HDFS) is a distributed file system
designed to run on commodity hardware.

 It has many similarities with existing distributed file systems.


 HDFS is highly fault-tolerant and is designed to be deployed on low-cost
hardware.

 HDFS provides high-throughput access to application data and is suitable
for applications that have large data sets.

 The default HDFS block size is 64 MB (128 MB in newer Hadoop releases),
and it is configurable.

 HDFS was originally built as infrastructure for the Apache Nutch web
search engine project.

 HDFS is now an Apache Hadoop subproject.
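
As a concrete illustration (not part of the original notes), the sketch below
writes and reads a small file through Hadoop's Java FileSystem API. The file
path is hypothetical, and it assumes fs.defaultFS in the configuration points
at the cluster's NameNode.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at the cluster, e.g. hdfs://namenode:9000
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        // Splitting into blocks and replicating them is handled by HDFS,
        // not by the application.
        System.out.println("Block size: " + fs.getFileStatus(file).getBlockSize());
    }
}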

5
By Mr. Virendra
INTRODUCTION TO HADOOP 2020

Distributed Processing (MapReduce)


 Hadoop MapReduce is a software framework for easily writing applications
that process vast amounts of data (multi-terabyte data sets) in parallel on
large clusters (thousands of nodes) of commodity hardware in a reliable,
fault-tolerant manner.

 A MapReduce job usually splits the input data-set into independent chunks
which are processed by the map tasks in a completely parallel manner.

 The framework sorts the outputs of the maps, which are then input to the
reduce tasks.

 Typically both the input and the output of the job are stored in a file-system.
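
The canonical example of this model is word count. The sketch below follows
the standard Hadoop WordCount tutorial: map tasks emit (word, 1) pairs from
their input split, the framework sorts and groups the pairs by word, and
reduce tasks sum the counts. The input and output HDFS paths are passed as
command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word (the framework has already
    // sorted and grouped the map output by key).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}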


Pig
 Apache Pig is a high-level data-flow platform for executing MapReduce
programs on Hadoop.

 The language for Pig is Pig Latin.

 Pig scripts are internally converted to MapReduce jobs and executed on
data stored in HDFS.

 Every task that can be achieved using Pig can also be achieved by writing
MapReduce programs directly in Java.

 Its key strengths are ease of programming, optimization opportunities, and
extensibility.
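
To illustrate how a few lines of Pig Latin replace hand-written MapReduce
code, here is a minimal word-count sketch using Pig's embedded PigServer API;
the input file and output directory names are hypothetical.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
    public static void main(String[] args) throws Exception {
        // LOCAL mode for demonstration; on a cluster, ExecType.MAPREDUCE
        // compiles the same data flow into MapReduce jobs.
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Each registerQuery line is one Pig Latin statement.
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        // Storing the result triggers compilation and execution of the flow.
        pig.store("counts", "wordcount_out");
    }
}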


Hive
 Hive is a data warehouse infrastructure tool to process structured data in
Hadoop.
 Hive was initially developed by Facebook; later, the Apache Software
Foundation took it up and developed it further as open source under the
name Apache Hive.

Hive is not
 A relational database
 A design for OnLine Transaction Processing (OLTP)
 A language for real-time queries and row-level updates

Features of Hive

 It stores the schema in a database and the processed data in HDFS.
 It is designed for OLAP (OnLine Analytical Processing).
 It provides SQL type language for querying called HiveQL or HQL.
 It is familiar, fast, scalable, and extensible.
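
As an illustrative sketch (not from the original notes), HiveQL can be
submitted from Java through the HiveServer2 JDBC driver; the host, port,
table name, and credentials below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlExample {
    public static void main(String[] args) throws Exception {
        // Requires the hive-jdbc driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {
            // HiveQL looks like SQL but compiles to batch (OLAP-style) jobs,
            // not row-level OLTP operations.
            stmt.execute("CREATE TABLE IF NOT EXISTS pageviews (url STRING, hits INT)");
            try (ResultSet rs = stmt.executeQuery(
                         "SELECT url, SUM(hits) AS total FROM pageviews GROUP BY url")) {
                while (rs.next()) {
                    System.out.println(rs.getString("url") + " -> " + rs.getLong("total"));
                }
            }
        }
    }
}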


HBase
 HBase is known as the Hadoop database.

 HBase is a column-oriented database management system that runs on top of
the Hadoop Distributed File System (HDFS).

 It is well suited for sparse data sets, which are common in many big data
use cases.

 HBase does not support a structured query language like SQL; data is
accessed by row key through a client API (see the sketch below).

 HBase supports writing applications in Apache Avro, REST, and Thrift.
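
A minimal sketch of that client API follows; it assumes an existing table
named "users" with a column family "info" (both hypothetical).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Write one cell: row key -> column family:qualifier -> value.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("Alice"));
            table.put(put);

            // Read the cell back by row key; no SQL is involved.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}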


Sqoop
Its name combines SQL + Hadoop.
 Sqoop is a tool designed to transfer data between Hadoop and relational
database servers.

 It is used to import data from relational databases such as MySQL or
Oracle into Hadoop HDFS, and to export data from the Hadoop file system
back to relational databases.

 Sqoop occupies a place in the Hadoop ecosystem by providing convenient
interaction between relational database servers and Hadoop's HDFS.
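
For illustration, a typical import/export round trip looks like the commands
below; the database URL, credentials, directories, and table names are
placeholders.

# Import a MySQL table into HDFS using 4 parallel map tasks
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Export an HDFS directory back into a relational table
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table order_summary \
  --export-dir /user/etl/summary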


Flume (Data streaming)


 Apache Flume is a system used for moving massive quantities of streaming
data into HDFS.

 Collecting log data from web-server log files and aggregating it in HDFS
for analysis is one common use case of Flume; a sample agent configuration
is sketched below.
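
Flume agents are wired together in a properties file. This minimal
single-agent sketch (the agent name, log path, and HDFS path are assumptions)
tails a web-server log and lands the events in HDFS; it would be started with
"flume-ng agent --conf-file flume.conf --name a1".

# flume.conf: one agent wiring an exec source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:9000/flume/weblogs
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1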

Oozie (scheduler system)


 Apache Oozie is a scheduler system to run and manage Hadoop jobs in a
distributed environment.

 It allows multiple complex jobs to be combined and run in sequential
order to achieve a bigger task.

 Within a sequence of tasks, two or more jobs can also be programmed to
run in parallel with each other, as the workflow sketch below shows.
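
Oozie workflows are defined in XML. The fork/join sketch below (node names
and paths are hypothetical) runs two filesystem actions in parallel before
continuing, matching the behavior described above.

<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-wf">
  <start to="parallel-step"/>
  <!-- fork starts both branches; join waits for both to finish -->
  <fork name="parallel-step">
    <path start="prep-a"/>
    <path start="prep-b"/>
  </fork>
  <action name="prep-a">
    <fs><mkdir path="${nameNode}/tmp/demo/a"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="prep-b">
    <fs><mkdir path="${nameNode}/tmp/demo/b"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Workflow failed at ${wf:lastErrorNode()}</message>
  </kill>
  <end name="end"/>
</workflow-app>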


ZooKeeper (reliable cluster coordination service)

 The ZooKeeper framework was originally built at Yahoo! for accessing
their applications in an easy and robust manner.

 Later, Apache ZooKeeper became a standard coordination service used by
Hadoop, HBase, and other distributed frameworks.

 Apache ZooKeeper is an open-source project that deals with maintaining
configuration information, naming, and providing distributed
synchronization and group services for various distributed applications.
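
A minimal sketch of the configuration-maintenance use case through
ZooKeeper's Java client follows; the ensemble address and znode name are
placeholders.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigExample {
    public static void main(String[] args) throws Exception {
        // Connect to the ensemble with a 5-second session timeout; the
        // watcher callback is left empty for brevity.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });

        String path = "/demo-config"; // hypothetical znode
        byte[] data = "jdbc:mysql://dbhost/sales".getBytes(StandardCharsets.UTF_8);

        // Create the znode once; afterwards any client in the cluster can read it.
        if (zk.exists(path, false) == null) {
            zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        byte[] stored = zk.getData(path, false, null);
        System.out.println(new String(stored, StandardCharsets.UTF_8));
        zk.close();
    }
}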

Ambari (Hadoop cluster manager)

 A completely open-source management platform for provisioning, managing,
monitoring, and securing Apache Hadoop clusters.

 Ambari enables system administrators to provision, manage, and monitor a
Hadoop cluster, and also to integrate Hadoop with the existing enterprise
infrastructure.