BigData MapReduce

MapReduce is a programming model used to process large datasets across clusters of computers. It works by having a master node divide input data into smaller subproblems and distribute them to worker nodes. Each worker node then processes its subset and returns results to the master node, which combines the results into the final output. Key aspects of MapReduce include mapping functions to divide the work, a partitioning function to group output data, and reduce functions to combine results from each partition.

Uploaded by arjuncchaudhary


Big Data

Map Reduce

Table of Contents
Key approach to work with Big Data
Mapping
The Map Step
Applying the Reduce Step
Reduce Step
Map Reduce Data Flow
A Closer Look at the map and partition Step


Key approach to work with Big Data

• MapReduce is a programming model for processing large data sets, and also the name of Google's implementation of that model.

• MapReduce is typically used for distributed computing over large data sets on clusters of computers.

[Figure: in the map step, the master node distributes the problem data to Worker Node 1, Worker Node 2, and Worker Node 3.]

Mapping

The Map Step

• The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes.

• A worker node may repeat this division in turn, which leads to a multi-level tree structure.


• Each worker node processes its sub-problem and hands the result back to its parent node.

[Figure: an input list passes through the mapping function to produce an output list.]
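The map step described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a distributed implementation: the word-count task and the names `split_input` and `map_words` are assumptions made for the example, and the "workers" are simply iterations of a list comprehension.

```python
def split_input(lines: list[str], num_workers: int) -> list[list[str]]:
    """Divide the input into roughly equal sub-problems, one per worker (hypothetical helper)."""
    return [lines[i::num_workers] for i in range(num_workers)]


def map_words(chunk: list[str]) -> list[tuple[str, int]]:
    """Mapping function: turn every word in a chunk into a (word, 1) pair."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


lines = ["big data", "map reduce", "big map"]
chunks = split_input(lines, num_workers=2)       # the master divides the input
mapped = [map_words(chunk) for chunk in chunks]  # each worker maps its own chunk
```

Each element of `mapped` stands for the output list that one worker would hand back to its parent node.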

Applying the Reduce Step

Reduce Step

The master node then collects the answers from all of its child nodes and combines them in a meaningful way to form the primary output, which is the answer to the problem that was originally put to the system.

[Figure: Input List, Mapping Function, Output List.]
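As a minimal sketch of this combining step, assume each worker has returned a list of `(word, 1)` pairs. The sample data and the name `reduce_counts` are illustrative and do not come from the original document.

```python
from collections import defaultdict


def reduce_counts(partial_results: list[list[tuple[str, int]]]) -> dict[str, int]:
    """Combine the workers' (key, value) pairs into one total per key."""
    totals: dict[str, int] = defaultdict(int)
    for worker_output in partial_results:
        for key, value in worker_output:
            totals[key] += value
    return dict(totals)


# Hypothetical results handed back by two worker nodes.
partial_results = [
    [("big", 1), ("data", 1), ("big", 1)],
    [("map", 1), ("big", 1)],
]
final_output = reduce_counts(partial_results)
```

Summation is only one possible way of combining results; any operation that merges the values for a key could take its place.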


Map Reduce Data Flow

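[Figure: MapReduce data flow. The Input Format divides each file into splits; each split is read by an RR (record reader) and passed to a Map task; map output goes through the Partitioner and a sort before Reduce; results are written through the Output Format.]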

• On closer inspection, the core of the MapReduce framework is a large distributed sort. Its most important steps are the following:

• An input function

• A map function

• A partition function

• A compare/sort function


• A reduce function

• An output writer
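To make the six steps concrete, here is a minimal single-process sketch in which each step is an ordinary Python function. All names, the word-count task, and the toy partitioning rule are assumptions made for illustration; a real framework would run these steps on many machines.

```python
from itertools import groupby
from operator import itemgetter


def input_reader(text):
    """Input function: yield (line_number, line) records."""
    yield from enumerate(text.splitlines())


def map_fn(_key, line):
    """Map function: emit a (word, 1) pair for every word."""
    for word in line.split():
        yield word, 1


def partition_fn(key, num_reducers):
    """Partition function: choose a reducer index for a key."""
    return len(key) % num_reducers  # toy rule, chosen only because it is deterministic


def reduce_fn(key, values):
    """Reduce function: sum the values collected for one key."""
    yield key, sum(values)


def output_writer(pairs):
    """Output writer: collect results in a dict instead of stable storage."""
    return dict(pairs)


def run_job(text, num_reducers=2):
    buckets = [[] for _ in range(num_reducers)]
    for key, line in input_reader(text):
        for out_key, out_value in map_fn(key, line):
            buckets[partition_fn(out_key, num_reducers)].append((out_key, out_value))
    results = []
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))  # compare/sort function: order pairs by key
        for key, group in groupby(bucket, key=itemgetter(0)):
            results.extend(reduce_fn(key, (value for _, value in group)))
    return output_writer(results)


counts = run_job("big data\nmap reduce\nbig map")
```

The sort in the middle of `run_job` is what lets the reduce function see all values for a key together.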

A Closer Look at the map and partition Step

• The map function takes a series of key/value pairs and processes each one, emitting zero or more output key/value pairs.

• Each map node's output is assigned to a particular reducer by the application's partition function for sharding purposes.

• The partition function is given the key and the number of reducers, and returns the index of the reducer to use.

• The input for each reducer is pulled from the machines where the map tasks ran and is sorted using the application's comparison function.

• The framework calls the application's reduce function once for each unique key, in sorted order. The reduce function can iterate through the values associated with that key and produce zero or more outputs.

• The output writer writes the output of the reduce step to stable storage, usually a distributed file system.
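The partitioning and per-key reduce behaviour in the list above can be sketched as follows. This is an illustration under stated assumptions, not any framework's actual API: the function names are hypothetical, and `zlib.crc32` is just one convenient deterministic hash, used here because Python's built-in `hash()` for strings varies between processes.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key: str, num_reducers: int) -> int:
    """Given a key and the number of reducers, return the reducer index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to its reducer, then sort each reducer's input by key."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for node_output in map_outputs:
        for key, value in node_output:
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    return [sorted(pairs, key=itemgetter(0)) for pairs in reducer_inputs]


def reduce_frequent(key, values, min_count=2):
    """Reduce function that may emit nothing: keep only keys seen at least min_count times."""
    total = sum(values)
    if total >= min_count:
        yield key, total


# Hypothetical output of two map nodes.
map_outputs = [[("big", 1), ("data", 1)], [("big", 1), ("map", 1)]]
output = []
for pairs in shuffle(map_outputs, num_reducers=3):
    for key, group in groupby(pairs, key=itemgetter(0)):  # called once per unique key
        output.extend(reduce_frequent(key, [value for _, value in group]))
```

Because every map node computes the same index for the same key, all values for `"big"` reach one reducer, while keys that fall below `min_count` produce no output at all.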


[Figure: Input List, Mapping Function, Output List.]
