Hadoop MapReduce Join & Counter With Example
In a distributed MapReduce join, either the Mapper or the Reducer uses the smaller dataset to
perform a lookup for matching records from the larger dataset and then combines those records
to form the output records.
Types of Join
Depending upon the place where the actual join is performed, joins in Hadoop are classified
into:
1. Map-side join - When the join is performed by the mapper, it is called a map-side join. In this
type, the join is performed before the data is actually consumed by the map function. It is
mandatory that the input to each map is in the form of a partition and is in sorted order. There
must also be an equal number of partitions, and each must be sorted by the join key.
2. Reduce-side join - When the join is performed by the reducer, it is called a reduce-side join.
This join does not require the datasets to be in a structured (or partitioned) form.
Here, map-side processing emits the join key and the corresponding tuples of both tables. As a
result of this processing, all tuples with the same join key fall into the same reducer, which then
joins the records sharing that key.
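The reduce-side join described above can be sketched in plain Java without a Hadoop dependency. This is a minimal simulation, assuming each input file holds comma-separated dept_id,value lines (the class name and record layout are illustrative, not taken from the tutorial): the "map phase" tags each record with its source file, the shuffle groups tagged records by join key, and the "reduce phase" combines the tuples that share a key.

```java
import java.util.*;

public class ReduceSideJoinSketch {
    // Map phase: emit (joinKey, taggedRecord) for every line of both files.
    // "N:" marks a record from the names file, "S:" one from the strengths file.
    // The TreeMap stands in for the shuffle, which groups values by key.
    static Map<String, List<String>> mapPhase(List<String> deptNames,
                                              List<String> deptStrengths) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : deptNames) {
            String[] parts = line.split(",");   // e.g. "001,HR"
            shuffled.computeIfAbsent(parts[0], k -> new ArrayList<>()).add("N:" + parts[1]);
        }
        for (String line : deptStrengths) {
            String[] parts = line.split(",");   // e.g. "001,25"
            shuffled.computeIfAbsent(parts[0], k -> new ArrayList<>()).add("S:" + parts[1]);
        }
        return shuffled;
    }

    // Reduce phase: for each join key, combine the name and strength tuples
    // into one joined output record.
    static List<String> reducePhase(Map<String, List<String>> shuffled) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            String name = null, strength = null;
            for (String tagged : e.getValue()) {
                if (tagged.startsWith("N:")) name = tagged.substring(2);
                else strength = tagged.substring(2);
            }
            if (name != null && strength != null)
                joined.add(e.getKey() + "," + name + "," + strength);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("001,HR", "002,Sales");
        List<String> strengths = Arrays.asList("001,25", "002,40");
        System.out.println(reducePhase(mapPhase(names, strengths)));
        // prints [001,HR,25, 002,Sales,40]
    }
}
```

Note that no sorting or partitioning of the inputs is required here; grouping by key happens in the shuffle, which is exactly why the reduce-side join tolerates unstructured input.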
Ensure you have Hadoop installed. Before you start with the actual MapReduce Join example
process, switch to the 'hduser_' account (the user id used during your Hadoop
configuration):
su - hduser_
cd MapReduceJoin/
Step 4) Start Hadoop
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
Step 5) DeptStrength.txt and DeptName.txt are the input files used for this MapReduce Join
example program.
Step 7) After execution, the output file (named 'part-r-00000') will be stored in the directory
/output_mapreducejoin on HDFS.
Open part-r-00000
NOTE: Before running this program again, you will need to delete the output directory
/output_mapreducejoin (for example, with hdfs dfs -rm -r /output_mapreducejoin).
Hadoop Counters are similar to putting a log message in the code for a map or reduce task. This
information can be useful for diagnosing problems in MapReduce job processing.
Typically, these counters are defined in a program (map or reduce) and incremented during
execution when a particular event or condition (specific to that counter) occurs. A very good
application of Hadoop counters is to track valid and invalid records in an input dataset.
1. Hadoop Built-In counters: There are some built-in Hadoop counters which exist per job.
Below are the built-in counter groups-
MapReduce Task Counters - Collect task-specific information (e.g., number of input
records) during execution.
FileSystem Counters - Collect information such as the number of bytes read or written by a
task.
FileInputFormat Counters - Collect information on the number of bytes read through
FileInputFormat.
FileOutputFormat Counters - Collect information on the number of bytes written
through FileOutputFormat.
Job Counters - These counters are used by the JobTracker. Statistics collected by them
include, e.g., the number of tasks launched for a job.
2. User Defined Counters
In addition to the built-in counters, a user can define their own counters using similar facilities
provided by programming languages (/best-programming-language.html). For example, in
Java (/java-tutorial.html), 'enum's are used to define user-defined counters.
Counters Example
An example MapClass with Counters to count the number of missing and invalid values. The
input data set used in this tutorial is a CSV file, SalesJan2009.csv
(https://drive.google.com/uc?export=download&id=0B_vqvT0ovzHccGJ1VjVic1AwbGc)
public static class MapClass
        extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    static enum SalesCounters { MISSING, INVALID };

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        // Split the CSV record and pull out the fields to validate
        // (column positions assumed from the SalesJan2009.csv layout)
        String[] fields = value.toString().split(",", -1);
        String country = fields[7];
        String sales = fields[2];

        if (country.length() == 0) {
            // Country value is missing from this record
            reporter.incrCounter(SalesCounters.MISSING, 1);
        } else if (sales.startsWith("\"")) {
            // A quoted sales value marks the record as invalid
            reporter.incrCounter(SalesCounters.INVALID, 1);
        } else {
            output.collect(new Text(country), new Text(sales + ",1"));
        }
    }
}
The above code snippet shows an example implementation of counters in Hadoop MapReduce.
Here, if the 'country' field has zero length, its value is missing, and the
corresponding counter SalesCounters.MISSING is incremented.
Next, if the 'sales' field starts with a ", the record is considered INVALID. This is indicated by
incrementing the counter SalesCounters.INVALID.
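The same MISSING/INVALID classification can be simulated outside Hadoop with a plain Java enum and an EnumMap. This is an illustrative sketch, not Hadoop's Reporter API: in a real job each task's incrCounter calls are aggregated by the framework into job-level totals, whereas here the tallies are kept locally. The class name and the two-field record layout are assumptions.

```java
import java.util.*;

public class SalesCounterSketch {
    enum SalesCounters { MISSING, INVALID, VALID }

    // Apply the same validity rules as the MapClass above, tallying
    // counters locally in an EnumMap instead of via a Reporter.
    static EnumMap<SalesCounters, Long> classify(List<String[]> records) {
        EnumMap<SalesCounters, Long> counters = new EnumMap<>(SalesCounters.class);
        for (SalesCounters c : SalesCounters.values()) counters.put(c, 0L);
        for (String[] record : records) {
            String country = record[0], sales = record[1];
            if (country.length() == 0) {
                counters.merge(SalesCounters.MISSING, 1L, Long::sum);   // no country
            } else if (sales.startsWith("\"")) {
                counters.merge(SalesCounters.INVALID, 1L, Long::sum);   // quoted sales
            } else {
                counters.merge(SalesCounters.VALID, 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[]{"", "1200"},        // missing country
            new String[]{"UK", "\"1200\""},  // quoted (invalid) sales
            new String[]{"US", "1200"});     // valid record
        System.out.println(classify(records));
        // prints {MISSING=1, INVALID=1, VALID=1}
    }
}
```

After a real job finishes, these totals would appear in the job's counter report alongside the built-in counter groups, which is what makes counters useful for spotting bad input at a glance.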