!python Seminar

BIG FAT DATA AND
MAPREDUCE
&
WORKING IN THE CLOUD
• MAPREDUCE ARCHITECTURE - COMPONENTS OF MAPREDUCE ARCHITECTURE
• ADVANTAGES OF WORD CLOUD - DRAWBACKS OF WORD CLOUD
BIG FAT DATA AND MAPREDUCE
BIG FAT DATA :
• Big data refers to data sets too large or complex to process on a single
machine. Users write data-driven parallel programs to execute in
large-scale, distributed environments.
MapReduce :
• MapReduce is a programming model or pattern within the Hadoop
framework that is used to process big data stored in the Hadoop
Distributed File System (HDFS). It is a core component, integral to the
functioning of the Hadoop framework.
MAPREDUCE AND PHASES
Why is it called MapReduce?
• MapReduce is directly inspired by functional programming, because the
Map and Reduce functions are the basic functions used in functional
programming.
TWO PHASES: MAP TASK and REDUCE TASK
• Map task: Splits input data into tuples (key/value pairs)
• Reduce task: Merges the tuples from the map task into a smaller
collection of tuples
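The two phases can be illustrated with a small word count in plain Python. This is only a sketch of the idea, not Hadoop code: the function names `map_task`, `shuffle`, `reduce_task` and `word_count` are made up for this example, and the grouping step between the phases is an assumption about how tuples reach the reducer.

```python
from collections import defaultdict
from functools import reduce


def map_task(line):
    """Map phase: split one input line into (word, 1) key/value tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key so each reduce call sees one key's values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce phase: merge one key's values into a single tuple."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    # Run the map task on every line and flatten the results.
    mapped = [pair for line in lines for pair in map_task(line)]
    # Reduce each group to a single (word, count) tuple.
    return dict(reduce_task(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data cluster data"]
    print(word_count(sample))
```

The map step produces many small tuples and the reduce step merges them into a smaller collection, one tuple per word.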
MAPREDUCE ARCHITECTURE
COMPONENTS OF MAPREDUCE ARCHITECTURE
1. Client: The MapReduce client is the one who brings the Job to
MapReduce for processing. There can be multiple clients that
continuously send jobs for processing to the Hadoop MapReduce Manager.
2. Job: The MapReduce Job is the actual work that the client wants done.
It is made up of many smaller tasks that the client wants to process or
execute.
3. Hadoop MapReduce Master: It divides the particular job into subsequent
job-parts.
4. Job-Parts: The tasks or sub-jobs that are obtained after dividing the
main job. The results of all the job-parts are combined to produce the
final output.
5. Input Data: The data set that is fed to MapReduce for processing.
6. Output Data: The final result obtained after the processing.
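The flow from input data through job-parts to output data might be sketched as below. This is a toy model under stated assumptions: threads stand in for cluster nodes, the job is a simple sum, and `split_job`, `run_job_part` and `run_job` are hypothetical names, not Hadoop APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(input_data, part_size):
    """Master: divide the input data into job-parts of at most part_size items."""
    return [input_data[i:i + part_size]
            for i in range(0, len(input_data), part_size)]


def run_job_part(part):
    """One job-part: here, simply sum the numbers in its slice."""
    return sum(part)


def run_job(input_data, part_size=3):
    parts = split_job(input_data, part_size)
    # Each job-part runs independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(run_job_part, parts))
    # Combine the results of all job-parts into the final output.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_job(list(range(1, 11))))  # sum of the numbers 1..10
```

Because every job-part depends only on its own slice of the input, the parts can run in any order before their results are combined.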
MAPREDUCE
• MapReduce is a programming model that can be used from Python. It
enables the processing and generation of large amounts of data by
separating work into discrete jobs. It also allows work to be performed
in parallel across a cluster of machines.
• Job Tracker : The master that schedules jobs and tracks tasks assigned to
Task Trackers.
• Task Trackers : The slaves that track their tasks and report their status
to the Job Tracker. A Task Tracker is deployed on each of the nodes
available in the cluster and executes the Map and Reduce tasks as
instructed by the Job Tracker.
• Job History Server : Another important component of the MapReduce
architecture. It keeps historical information about completed jobs.
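The master and slave roles can be pictured with a toy simulation. The classes below are hypothetical stand-ins written for this slide, not the Hadoop implementation, and the round-robin assignment is an assumption chosen for simplicity.

```python
class TaskTracker:
    """Hypothetical worker: runs tasks on one node and records their status."""

    def __init__(self, node):
        self.node = node
        self.status = {}  # task id -> "done" or "failed"

    def run(self, task_id, func, arg):
        try:
            result = func(arg)
            self.status[task_id] = "done"
            return result
        except Exception:
            self.status[task_id] = "failed"
            return None


class JobTracker:
    """Hypothetical master: assigns tasks to trackers and collects status."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.assignments = {}  # task id -> node name

    def submit(self, func, args):
        results = []
        for task_id, arg in enumerate(args):
            # Round-robin scheduling across the available Task Trackers.
            tracker = self.trackers[task_id % len(self.trackers)]
            self.assignments[task_id] = tracker.node
            results.append(tracker.run(task_id, func, arg))
        return results

    def report(self):
        # Merge the status reported by every Task Tracker.
        merged = {}
        for tracker in self.trackers:
            merged.update(tracker.status)
        return merged


if __name__ == "__main__":
    master = JobTracker([TaskTracker("node-1"), TaskTracker("node-2")])
    print(master.submit(len, ["map", "reduce", "shuffle"]))
    print(master.assignments)
    print(master.report())
```

In a real cluster the trackers run on separate machines and report status over the network; here everything runs in one process to keep the roles visible.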
WORKING IN THE CLOUD
WORKING IN THE WORD CLOUD
• A Python word cloud or Tag Cloud is a visualization technique commonly
used to display tags or keywords from websites. These single words
reflect the webpage's context and are clustered together in the Word
Cloud.
• Significant textual data points can be highlighted using a word cloud.
• Word clouds are widely used for analyzing data from social network
websites.
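A word cloud draws frequent words larger, so the data behind it is essentially a table of word counts. The sketch below shows that counting step with the standard library only; the tiny stopword set and the function name `word_frequencies` are assumptions made for this example.

```python
from collections import Counter


def word_frequencies(text, stopwords=frozenset({"the", "a", "is", "and"})):
    """Count lowercase words, skipping a small hypothetical stopword set."""
    words = [w.lower() for w in text.split()]
    return Counter(w for w in words if w not in stopwords)


if __name__ == "__main__":
    comments = "The song is great and the video is great"
    for word, count in word_frequencies(comments).most_common(3):
        print(word, count)
```

The words with the highest counts are the ones that would stand out in the rendered cloud.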
• For generating a word cloud in Python, the modules needed are:
matplotlib, pandas and wordcloud.
• To install these packages, run the following commands :
• pip install matplotlib
• pip install pandas
• pip install wordcloud
• The dataset used for generating the word cloud is collected from the UCI
Machine Learning Repository. It consists of YouTube comments on videos of
popular artists.
• Dataset Link :
https://archive.ics.uci.edu/ml/machine-learning-databases/00380/
• The word cloud below has been generated using the Youtube04-Eminem.csv
file in the dataset. One interesting task might be generating word clouds
using the other csv files available in the dataset.
# Python program to generate WordCloud
# importing all necessary modules
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import pandas as pd

# Reads 'Youtube04-Eminem.csv' file
df = pd.read_csv(r"Youtube04-Eminem.csv", encoding="latin-1")

comment_words = ''
stopwords = set(STOPWORDS)

# iterate through the csv file
for val in df.CONTENT:
    # typecast each val to string
    val = str(val)
    # split the value
    tokens = val.split()
    # Converts each token into lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    comment_words += " ".join(tokens) + " "

wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
OUTPUT: WORD CLOUD
(Figure: word cloud image generated from Youtube04-Eminem.csv)
ADVANTAGES OF WORD CLOUD
1. Analyzing customer and employee feedback.
2. More flexibility and reliability, increased performance and
efficiency, and lower IT costs.
-----------
DRAWBACKS OF WORD CLOUD
1. Word clouds are not perfect for every situation.
2. Data should be optimized for context.
3. A word cloud may not accurately reflect the content of the text if
slightly different words are used for the same idea.
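One possible way to soften the problem of different words for the same idea is to map known variants to a shared form before counting. The sketch below assumes a small hand-written mapping; `CANONICAL` and `normalized_counts` are hypothetical names, and a real project would need a much larger mapping or a proper stemming step.

```python
from collections import Counter

# Hypothetical mapping from word variants to one canonical form.
CANONICAL = {
    "awesome": "great",
    "amazing": "great",
    "songs": "song",
    "tracks": "song",
    "track": "song",
}


def normalized_counts(text, canonical=CANONICAL):
    """Count words after mapping known variants to a shared form."""
    words = (w.lower().strip(".,!?") for w in text.split())
    return Counter(canonical.get(w, w) for w in words if w)


if __name__ == "__main__":
    text = "Great song! Awesome track, amazing songs."
    print(normalized_counts(text))
```

Counts merged this way could then be handed to the wordcloud package, for example through `WordCloud.generate_from_frequencies`, so that related words appear as one larger entry.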
THANK YOU
FOR LISTENING
