Data Visualization

CSDS3202
Contents
• Basics of data visualization
• Importance of visualization
• Design principles
• Introduction to data visualization libraries in Python – Matplotlib,
Seaborn
• Generate basic graphs such as bar graphs, histograms, line graphs,
scatter plots
• Generate statistical visualizations of data such as distribution plots, pie
charts, bar charts, heat maps
• Generate visual maps and images
Data visualization
• Data visualization is a fundamental aspect of data science: it
represents data in a graphical format.
• It is the process of creating visual elements such as charts, graphs,
maps, and diagrams to communicate complex information in an easy
and understandable manner.
• The goal of data visualization is to tell a story and present data in a
way that helps users (data experts and non-experts) make sense of
the data and identify patterns, trends, and insights.
Basics of data visualization
• Choosing the right type of
chart or graph
• Designing for clarity and
simplicity
• Using appropriate scales
• Highlighting important
information
Importance of visualization
• Data visualization will help
• simplify complex data and make it more accessible to a wide range of audiences.

• identify hidden patterns and trends in large datasets.

• decision-makers make more informed decisions by surfacing insights from the
data.

• enhance data quality by making it easier to spot errors and anomalies in the
data.

• save time by presenting data in a way that is easy to understand and analyze.
Design principles
• Clarity - be clear and easy to understand; avoid clutter
• Simplicity - keep it simple and focused on the most important information
• Consistency - use consistent colors, fonts, and other design elements
throughout the visualization
• Context - provide context for the data by including labels, annotations, and
other relevant information
• Accuracy - ensure that the data is accurate and transparent
• Functionality - be functional and interactive, with features such as
zooming, filtering, and sorting
• Aesthetics - be visually appealing and engaging, with pleasing colors,
fonts, and other design elements
Data visualization libraries
• Popular Python libraries for visualization include:
1. Matplotlib
2. Seaborn
3. Bokeh
4. Altair
• This chapter focuses mainly on Matplotlib and Seaborn.
Why matplotlib?
• Matplotlib produces publication-quality figures in a variety of
formats
• Supports interactive environments across Python platforms.
• Pandas comes equipped with useful wrappers around several
matplotlib plotting routines
• Quick and handy plotting of Series and DataFrame objects.
• Before using Matplotlib, you need to import the library into
your Python script or notebook
import matplotlib.pyplot as plt
Dataset used
• Consider the following DataFrame
'df' for creating various plots

import pandas as pd
import matplotlib.pyplot as plt

# Yearly sales and profit figures with a categorical rating
dic = {'year': [2010, 2011, 2012, 2013, 2014, 2015],
       'sales': [50, 70, 90, 80, 100, 120],
       'profit': [20, 24, 30, 15, 35, 50],
       'rating': ['B', 'B', 'A', 'B', 'A', 'A']}
df = pd.DataFrame(dic)
Line Plot
• Create a line plot to show the sales
and profit for all years
# Solid line with triangle markers for sales
plt.plot(df['year'], df['sales'], label='Sales', linestyle='-', marker='>')
# Dashed red line for profit
plt.plot(df['year'], df['profit'], label='Profit', linestyle='--', color='r')

plt.xlabel('Year')
plt.ylabel('Amount')
plt.title('Sales and Profit')
plt.legend()
plt.show()
Line Plot - changing limits, ticks and
figure size
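Try these on the plot above:
• plt.xlim(low, high) - set the range of the x-axis
• plt.ylim(low, high) - set the range of the y-axis
• plt.xticks([list of points]) - set the tick positions on the x-axis
• plt.yticks([list of points]) - set the tick positions on the y-axis
• plt.figure(figsize=(width, height)) - set the figure size in inches; call it before the plotting commands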
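The calls listed above can be combined as in the following minimal sketch. The limit and tick values are illustrative assumptions chosen to fit the sales data, and the Agg backend line is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
sales = [50, 70, 90, 80, 100, 120]

# Create the figure with the desired size (in inches) before plotting
fig = plt.figure(figsize=(8, 4))
plt.plot(years, sales, marker='>')

# Illustrative axis ranges and tick positions
plt.xlim(2009, 2016)
plt.ylim(0, 150)
plt.xticks(years)              # one tick per year
plt.yticks(range(0, 151, 50))  # ticks at 0, 50, 100, 150

ax = plt.gca()
print(ax.get_xlim(), ax.get_ylim())
# In an interactive session, call plt.show() here to display the figure
```

Setting the x ticks to the year values avoids fractional tick labels such as 2010.5, which matplotlib may otherwise choose for integer-valued data.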
Scatter Plot
• Used to observe the relationship between
two numeric variables
• A scatter plot helps identify
patterns, trends, clusters, outliers,
and anomalies in data.

plt.scatter(df['sales'],df['profit'],c='g')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.show()
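As a numeric companion to the scatter plot, the strength of the linear relationship between the two columns can be summarized with a correlation coefficient. This is a supplementary sketch, not part of the original slides; it rebuilds only the two columns it needs.

```python
import pandas as pd

df = pd.DataFrame({'sales': [50, 70, 90, 80, 100, 120],
                   'profit': [20, 24, 30, 15, 35, 50]})

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r = df['sales'].corr(df['profit'])
print(f"Correlation between sales and profit: {r:.2f}")
```

A value close to 1 agrees with the upward trend visible in the scatter plot, while the scatter plot itself is still needed to spot clusters and outliers that a single number hides.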
Histogram
• Used to represent the frequency of
occurrence of ranges of values (bins) in
a dataset using bars of different
heights.
• Shows the distributional
features (peaks, outliers, skewness)
of a variable

# Group the sales values into 4 equal-width bins
plt.hist(df['sales'], bins=4)
plt.show()
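To see what the histogram bars represent, the same binning can be computed directly. The sketch below uses numpy.histogram, which applies the same equal-width binning as plt.hist for an integer bins argument; the variable names are illustrative.

```python
import numpy as np

sales = [50, 70, 90, 80, 100, 120]

# Split the range [50, 120] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(sales, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count} value(s)")
```

Each bar height in the plot corresponds to one of these counts, and the bar edges correspond to the bin edges.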
Bar Plot - Vertical
• Used to represent data associated
with a categorical variable.
• Used to compare the values of
different categories or groups

plt.bar(df['rating'],df['profit'])
plt.show()

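Note: Each rating occurs in several rows, so the bars for the same rating are
drawn on top of each other. Only the highest profit for each rating remains visible.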
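The note about the bar plot can be checked numerically. The following sketch computes, for each rating, the maximum profit (the height that stays visible when bars overlap) alongside the median and mean, which are usually more meaningful summaries to plot.

```python
import pandas as pd

df = pd.DataFrame({'profit': [20, 24, 30, 15, 35, 50],
                   'rating': ['B', 'B', 'A', 'B', 'A', 'A']})

# One row per rating, one column per summary statistic
summary = df.groupby('rating')['profit'].agg(['max', 'median', 'mean'])
print(summary)
```

Aggregating first and then plotting the result makes the meaning of each bar explicit instead of relying on how overlapping bars happen to be drawn.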
Bar Plot - Vertical
• Plot the median value of
the profit column for each rating

# Aggregate first, then plot one bar per rating
df.groupby('rating')['profit'].median().plot(kind='bar')
plt.show()
Bar Plot - Horizontal
• Plot the mean value of
the profit column for each rating

# kind='barh' draws horizontal bars
df.groupby('rating')['profit'].mean().plot(kind='barh', color='red')
plt.show()
Box Plot
• It is a graphical representation
of the distribution of a dataset.
It displays the median,
quartiles, and outliers of the
data.

plt.boxplot(df['profit'])
plt.show()
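The values a box plot displays can also be computed by hand. The sketch below assumes matplotlib's default settings, where the box spans the first to third quartile and points beyond 1.5 times the interquartile range from the box are drawn as outliers.

```python
import numpy as np

profit = [20, 24, 30, 15, 35, 50]

# Quartiles: the box spans q1 to q3, with a line at the median
q1, median, q3 = np.percentile(profit, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [p for p in profit if p < lower_fence or p > upper_fence]

print(q1, median, q3, outliers)
```

For this small dataset no value falls outside the fences, so the whiskers extend to the minimum and maximum profit.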
Pie Chart
• A pie chart is a circular statistical
chart divided into slices to show
numerical proportions.
• Each slice of the pie chart
represents a category or value, and
the size of each slice corresponds to
its percentage of the whole.
# autopct prints the percentage on each slice; explode offsets the first slice
df.groupby('rating')['sales'].mean().plot(kind='pie', autopct="%3.2f%%", explode=[0.2, 0])
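The percentages printed on the slices can be reproduced with a short calculation. This sketch recomputes the mean sales per rating and expresses each mean as a share of their sum, which is how slice sizes are determined.

```python
import pandas as pd

df = pd.DataFrame({'sales': [50, 70, 90, 80, 100, 120],
                   'rating': ['B', 'B', 'A', 'B', 'A', 'A']})

# Mean sales per rating, then each mean as a percentage of the total of the means
mean_sales = df.groupby('rating')['sales'].mean()
percent = 100 * mean_sales / mean_sales.sum()
print(percent.round(2))
```

Note that the slices here compare the average sales of the two ratings, not their share of total sales.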
Subplots
• Create multiple plots in one figure
• Use the subplot() function to place each plot in a grid.
• It takes 3 parameters:
• number of rows
• number of columns
• index of the current plot (starting at 1, counted row by row)
Subplots
• Create a subplot with 4 plots
plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
plt.boxplot(df['profit'])
plt.subplot(2,2,2)
df.groupby('rating')['sales'].mean().plot(kind='pie', autopct="%3.2f%%", explode=[0.2, 0])
plt.subplot(2,2,3)
plt.scatter(df['sales'],df['profit'],c='g')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.subplot(2,2,4)
df.groupby('rating')['profit'].mean().plot(kind='barh',color='red')
plt.suptitle("Combined Chart")
plt.show()
Saving Plots
plt.boxplot(df['profit'])
# Save before calling plt.show(); the file format is taken from the extension
plt.savefig('chart1.jpg')

The box plot is saved to the current working directory as
chart1.jpg
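A slightly fuller sketch of saving a figure is shown below. The output folder, the PNG format, and the dpi and bbox_inches settings are illustrative assumptions rather than part of the original example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

profit = [20, 24, 30, 15, 35, 50]

# Hypothetical output location: a temporary folder, so nothing is overwritten
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'chart1.png')

fig = plt.figure()
plt.boxplot(profit)

# dpi controls the resolution; bbox_inches='tight' trims surrounding whitespace
plt.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)

print(os.path.exists(out_path))
```

Calling savefig() before show() matters in scripts, because the figure may already be cleared once the plot window has been closed.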
Seaborn
• Seaborn is a library for making statistical plots using Python.
• It builds on top of matplotlib and integrates closely
with pandas
• Import the library before using it

import seaborn as sns

Distribution Plot
Visualizes the distribution of
the data using histograms and
kernel density estimation
(KDE)

# kde=True overlays a kernel density estimate on the histogram
sns.displot(data=df, x="profit", kde=True)
Pair Plot
Shows joint and marginal
distributions for all pairwise
relationships and for each
variable, respectively.

sns.pairplot(data=df, hue="rating")
Heat Map
• It is a graphical representation of
data that uses colors to visualize the
values of a matrix.
• The color scale next to the map shows
which color corresponds to which range of values.

The following heat map shows the values of the
'sales' and 'profit' columns

# iloc[:, 1:-1] selects all columns except the first ('year') and the last ('rating')
sns.heatmap(data=df.iloc[:, 1:-1], annot=True)
Visualizing Maps using Folium Library
• Folium is a popular Python library for visualizing
geospatial data on interactive maps.
• Install the library using the command (in a notebook cell)

!pip install folium

• Import the library as

import folium
Creating a map and adding markers
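# Latitude and longitude of three cities
muscat = [23.5880, 58.3829]
nizwa = [22.9171, 57.5363]
salalah = [17.0194, 54.1108]
# Note: "Stamen Terrain" tiles may be unavailable in newer folium releases;
# if this line raises an error, omit the tiles argument to use the default tiles.
m = folium.Map(muscat, zoom_start=5, tiles="Stamen Terrain")
folium.Marker(muscat, popup="Muscat City").add_to(m)    # text shown on click
folium.Marker(nizwa, tooltip="Nizwa").add_to(m)         # text shown on hover
folium.CircleMarker(salalah, radius=40, popup="Salalah").add_to(m)
m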
Choropleth Maps
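This code uses a DataFrame "oman" (assumed to be loaded already, with
the columns 'Region' and 'count') to create the choropleth map. It also
reuses the muscat coordinates defined earlier.
# GeoJSON file with the region boundaries
om = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/oman.geojson'
m1 = folium.Map(muscat, zoom_start=6)
folium.Choropleth(geo_data=om,
                  data=oman,
                  columns=['Region', 'count'],          # key column, value column
                  key_on='feature.properties.name',    # GeoJSON field matched against 'Region'
                  fill_color='YlOrRd', highlight=True).add_to(m1)
m1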
Visualizing Image Datasets
• Visualize an image dataset from the
scikit-learn (sklearn) library using matplotlib
• Display 10 random images
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt

# The dataset is downloaded on first use; shuffle=True randomizes the image order
dataset = fetch_olivetti_faces(shuffle=True, random_state=10)

for k in range(10):
    plt.subplot(2, 5, k + 1)
    # Each row of dataset.data is a flattened 64x64 image
    plt.imshow(dataset.data[k].reshape(64, 64))
    plt.title('person ' + str(dataset.target[k]))
    plt.axis('off')
plt.show()
Visualizing Image Datasets
• Display 10 digits as images
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()
# Show the images at positions 1 to 10 on a 3x4 grid
for number in range(1, 11):
    plt.subplot(3, 4, number)
    plt.imshow(digits.images[number], cmap='binary')
    plt.axis('off')
plt.show()
