Data Visualization Part 2

Uploaded by

mrunalikulkarni0331

Study Notes

Data Visualization:
• Bar Plots
• Count Plots
• Histograms
• Cat Plots (Box, Violin, Swarm, Boxen)
• Multiple Plots using FacetGrid
• Joint Plots
• KDE Plots
• Pairplots
• Heatmaps
• Scatter Plots
Study Notes- Data Visualization

1. Bar Plots
Bar plots are an effective way to visualize various data types, including counts,
frequencies, percentages, or averages. They are particularly valuable for comparing data
across different categories.

Use Cases:

1. Categorical Comparison: In a bar plot, each bar represents a specific category, and the
height of the bar reflects the aggregated value associated with that category (such as
count, sum, or mean).

For instance, you can use a bar plot to show the average age of Titanic passengers in each
group of the "who" column (man, woman, or child).

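# Setup assumed by the snippets in these notes (seaborn's bundled example data)
import matplotlib.pyplot as plt
import seaborn as sns

titanic = sns.load_dataset('titanic')

# Simple barplot
sns.barplot(data=titanic, x="who", y="age", estimator='mean',
            errorbar=None, palette='viridis')
plt.title('Simple Barplot')
plt.xlabel('Person')
plt.ylabel('Average Age')
plt.show()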

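The bar heights in such a plot are simply one aggregated value per category. A minimal sketch of that aggregation with pandas, using a small hypothetical table in place of the Titanic data (the `passengers` frame and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the "who" and "age" columns used above.
passengers = pd.DataFrame({
    "who": ["man", "woman", "child", "man", "woman", "child"],
    "age": [30.0, 28.0, 4.0, 40.0, 32.0, 8.0],
})

# One bar per category; the bar height is the aggregated value (here the mean).
bar_heights = passengers.groupby("who")["age"].mean()
print(bar_heights)
```

Swapping `.mean()` for `.sum()` or `.count()` gives the other aggregations mentioned above.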

2. Proportional Representation with Stacked Bar Charts:

Bar plots can also be used to visualize proportions or percentages. Drawing the bar for a
subgroup on top of the bar for the whole category shows what share of each category the
subgroup makes up, and lets that share be compared across categories.

For example, the chart produced by the code that follows shows how many of the passengers
from each embarkation town aboard the Titanic were male.

#Prepare data for next plot

data = titanic.groupby('embark_town').agg({'who':'count','sex': lambda x: (x=='male').sum()}).reset_index()
data.rename(columns={'who':'total', 'sex':'male'}, inplace=True)
data.sort_values('total', inplace=True)

# Barplot Showing Part of Total

# Lighter bar: all passengers per town
sns.set_color_codes("pastel")
sns.barplot(x="total", y="embark_town", data=data,
            label="Total", color="b")
# Darker bar drawn on top: male passengers only
sns.set_color_codes("muted")
sns.barplot(x="male", y="embark_town", data=data,
            label="Male", color="b")
plt.title('Barplot Showing Part of Total')
plt.xlabel('Number of Persons')
plt.legend(loc='upper right')
plt.show()

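The overlaid bars show counts; the proportion itself is the ratio of the two bar lengths. A small sketch of that calculation, assuming a table shaped like the prepared `data` frame (the town names and numbers here are hypothetical, not the real Titanic figures):

```python
import pandas as pd

# Hypothetical counts per embarkation town.
counts = pd.DataFrame({
    "embark_town": ["Town A", "Town B", "Town C"],
    "total": [50, 80, 200],
    "male": [20, 60, 120],
})

# The part of each bar not covered by the "male" bar is the remainder.
counts["female"] = counts["total"] - counts["male"]
# Share of males within each town (0 to 1).
counts["male_share"] = counts["male"] / counts["total"]
print(counts)
```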

3. Comparing Subcategories within Categories using Clustered Bar Plots:

Clustered bar plots group multiple bars within each category to represent different
subcategories, making it easier to compare and analyze data across them.

For instance, you could use a clustered bar plot to compare the average age of males and
females within each class.


# Clustered barplot
sns.barplot(data=titanic, x='class', y='age', hue='sex',
estimator='mean', errorbar=None, palette='viridis')
plt.title('Clustered Barplot')
plt.xlabel('Class')
plt.ylabel('Average Age')
plt.show();


2. Count Plots
A count plot visualizes the frequency of occurrences for each category within a
categorical variable. The x-axis shows the categories, while the y-axis indicates the count
or frequency of each category.

Use Cases:

• Frequency Distribution of Categorical Variables: Each bar in the plot represents a
category, and its height reflects the number of observations in that category, helping
identify the most and least common categories.

For example, a count plot can show how many Titanic passengers survived and how many did not.

# Simple Countplot
sns.countplot(data=titanic, x='alive', palette='viridis')
plt.title('Simple Countplot')
plt.show();

• Analyzing the relationship between two categorical variables
For example, comparing the survival status of men, women, and children on the Titanic.

# Clustered Countplot
sns.countplot(data=titanic, y="who",
hue="alive", palette='viridis')
plt.title('Clustered Countplot')
plt.show();

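The numbers behind a count plot can be reproduced with pandas: `value_counts()` for a single variable and `pd.crosstab()` for the clustered version. A minimal sketch on a small hypothetical table (the `survey` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the "who" and "alive" columns.
survey = pd.DataFrame({
    "who": ["man", "man", "woman", "woman", "child", "man", "man"],
    "alive": ["no", "yes", "yes", "yes", "no", "no", "no"],
})

# Simple count plot: one bar per category of a single variable.
simple_counts = survey["alive"].value_counts()

# Clustered count plot: one bar per combination of the two variables.
clustered_counts = pd.crosstab(survey["who"], survey["alive"])
print(simple_counts)
print(clustered_counts)
```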

3. Histograms
Histograms display the distribution of a dataset, helping to uncover key characteristics
such as normality, skewness, or multiple peaks. They show the count of data points within
specific intervals, or "bins". The x-axis represents the range of values in the dataset,
divided into equal-width bins, while the y-axis shows the number of observations in each
bin, which is the height of the corresponding bar.
Use Cases:
1. Visualize the distribution, central tendency, range, and spread of a continuous or
numeric variable, and identify any patterns or outliers.

# Histogram with KDE
iris = sns.load_dataset('iris')

sns.histplot(data=iris, x='sepal_width', kde=True)
plt.title('Histogram with KDE')
plt.show();

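To make the binning concrete, the following sketch counts values per bin with NumPy. The sample values and the choice of three bins are assumptions for illustration; seaborn picks its own bin count by default.

```python
import numpy as np

# Hypothetical measurements.
values = np.array([2.0, 2.2, 2.9, 3.0, 3.1, 3.4, 3.8, 4.4])

# Split the range [min, max] into 3 equal-width bins and count values per bin.
counts, edges = np.histogram(values, bins=3)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:.1f}, {right:.1f}): {n}")
```

Each count is the height of one bar, and the counts add up to the number of observations.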

2. Compare the distribution of multiple continuous variables

For example, comparing the distributions of sepal length and sepal width in iris flowers.

# Histogram with multiple features

sns.histplot(data=iris[['sepal_length','sepal_width']])
plt.title('Multi-Column Histogram')
plt.show()

3. Compare the distribution of a continuous variable across different categories

For example, comparing the distribution of sepal length among the iris species.

#Stacked Histogram
sns.histplot(iris, x='sepal_length', hue='species', multiple='stack',
linewidth=0.5)
plt.title('Stacked Histogram')
plt.show()


4. Cat Plots (Box, Violin, Swarm, Boxen)

catplot() is a high-level, flexible function that gives access to several categorical seaborn
plots, such as boxplots, violinplots, swarmplots, stripplots, boxenplots, pointplots,
barplots, and countplots, through its kind parameter. The examples in this section call the
individual plotting functions directly.
Use Cases:

• Analyze the relationship between categorical and continuous variables

• Obtain a statistical summary of a continuous variable
Examples:

# Boxplot
tips = sns.load_dataset('tips')
sns.boxplot(data=tips, x='time', y='total_bill', hue='sex', palette='viridis')
plt.title('Boxplot')
plt.show()

# Violinplot
sns.violinplot(data=tips, x='day', y='total_bill', palette='viridis')
plt.title('Violinplot')
plt.show()

#Swarmplot
sns.swarmplot(data=tips, x='time', y='tip', dodge=True, palette='viridis', hue='sex', s=6)
plt.title('SwarmPlot')
plt.show()

#StripPlot
sns.stripplot(data=tips, x='tip', hue='size', y='day', s=25, alpha=0.2,
jitter=False, marker='D',palette='viridis')
plt.title('StripPlot')
plt.show()

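The statistical summary that a box plot draws can be computed by hand. The sketch below uses hypothetical bill amounts and the common 1.5 × IQR rule for flagging outliers; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical bill amounts, with one unusually large value.
bills = np.array([10.0, 12.0, 14.0, 15.0, 18.0, 20.0, 22.0, 24.0, 60.0])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(q1, median, q3, outliers)
```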

5. Multiple Plots using FacetGrid

FacetGrid is a Seaborn class that draws the same kind of plot for multiple subsets of the data,
arranged in a grid. Each plot in the grid represents one category, and the subsets are defined
by the column names passed to the 'col' and 'row' parameters of FacetGrid(). The plots in the
grid can be of any type supported by Seaborn, such as scatter plots, line plots, bar plots, or
histograms.
Use Cases:

• Compare and analyze different groups or categories within a dataset

• Create subplots efficiently
Example: Boxplots for pulse rate during various activities

# Creating subplots using FacetGrid
exercise = sns.load_dataset('exercise')

# One facet per activity; no hue is used, so no palette or legend is needed
g = sns.FacetGrid(exercise, col='kind')

# Drawing a plot on every facet

g.map(sns.boxplot, 'pulse')

g.set_titles(col_template="Pulse rate for {col_name}")

using Seaborn

Example: Scatter plots for flipper length and body mass of penguins from different islands

# Creating subplots using FacetGrid
penguins = sns.load_dataset('penguins')

g = sns.FacetGrid(penguins, col='island', hue='sex', palette='Paired')

# Drawing a plot on every facet

g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
g.set_titles(template="Penguins of {col_name} Island")
g.add_legend();

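As a self-contained illustration of how the 'col' parameter splits the data, the sketch below builds a tiny hypothetical table (not the real exercise dataset) and draws one scatter plot per category with `map_dataframe`. The non-interactive backend line is only there so the sketch runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical pulse readings for three kinds of activity.
records = pd.DataFrame({
    "kind": ["rest", "rest", "walking", "walking", "running", "running"],
    "time": ["1 min", "15 min"] * 3,
    "pulse": [85, 88, 93, 101, 110, 135],
})

# One column of the grid per distinct value of "kind".
grid = sns.FacetGrid(records, col="kind")
# map_dataframe passes each subset to the plotting function as data=...
grid.map_dataframe(sns.scatterplot, x="time", y="pulse")
grid.set_titles(col_template="Pulse rate for {col_name}")

facet_titles = [ax.get_title() for ax in grid.axes.flat]
print(facet_titles)
```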

6. Joint Plots
A joint plot combines univariate and bivariate visualizations in one figure. The central plot
typically features a scatter plot or hexbin plot to represent the joint distribution of two
variables. Additional plots, such as histograms or Kernel Density Estimates (KDEs), are displayed
along the axes to show the individual distributions of each variable.
Use Cases:

• Analyzing the relationship between two variables

• Comparing the individual distributions of two variables

Example: Comparing displacement and miles per gallon (MPG) for cars

# Hex Plot with Histogram margins
mpg = sns.load_dataset('mpg')

sns.jointplot(x="mpg", y="displacement", data=mpg,
              height=5, kind='hex', ratio=2, marginal_ticks=True)

Example: Comparing acceleration and horsepower for cars from different regions of origin

# Scatter Plot with KDE Margins
sns.jointplot(x="horsepower", y="acceleration", data=mpg,
hue="origin", height=5, ratio=2, marginal_ticks=True);
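The link between the central (joint) plot and the marginal plots can be checked numerically: summing a two-dimensional histogram along one axis gives the one-dimensional histogram of the other variable. A minimal sketch with synthetic data (the random variables below are assumptions, not the mpg dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

# Joint distribution: counts of points in each (x bin, y bin) cell.
joint, x_edges, y_edges = np.histogram2d(x, y, bins=5)

# Marginal distributions: collapse the joint counts along one axis.
x_marginal = joint.sum(axis=1)
y_marginal = joint.sum(axis=0)

print(x_marginal)
print(y_marginal)
```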

7. KDE Plots
A KDE (Kernel Density Estimate) plot provides a smooth, continuous estimate of the probability
density function of a continuous random variable. The x-axis displays the variable's values,
while the y-axis shows the estimated density: higher values mark regions where observations
are more concentrated, and the area under the curve over an interval approximates the
probability of a value falling in that interval.
Use Cases:

• Visualizing the distribution of a single variable (univariate analysis)

• Gaining insights into the shape, peaks, and skewness of the distribution
Example: Comparing the horsepower of cars by number of cylinders

#Overlapping KDE Plots

sns.kdeplot(data=mpg, x='horsepower', hue='cylinders', fill=True,
palette='viridis', alpha=.5, linewidth=0)
plt.title('Overlapping KDE Plot')
plt.show()

Comparing the weight of cars across different countries:

#Stacked KDE Plots

sns.kdeplot(data=mpg, x="weight", hue="origin", multiple="stack")
plt.title('Stacked KDE Plot')
plt.show();
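A kernel density estimate can be written out in a few lines: place a Gaussian bump on every observation and average the bumps. The sketch below is a hand-rolled illustration, not seaborn's implementation; the function name, the sample values, and the fixed bandwidth of 15 are assumptions (seaborn chooses a bandwidth automatically).

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)

# Hypothetical horsepower values.
samples = np.array([70.0, 90.0, 95.0, 100.0, 150.0])
grid = np.linspace(0.0, 250.0, 1001)
density = gaussian_kde_sketch(samples, grid, bandwidth=15.0)

# The curve is a density: the area under it is (approximately) 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual observations more closely.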

8. Pairplots
A pair plot is a visualization technique that helps explore relationships between multiple
variables in a dataset. It creates a grid of scatter plots where each numeric variable is
plotted against every other numeric variable, with diagonal entries displaying histograms or
density plots to show the distribution of values for each variable.
Use Cases:

• Identifying correlations or patterns between variables, such as linear or non-linear
relationships, clusters, or outliers
Example: Visualizing the relationships between different features of penguins

#Simple Pairplot
sns.pairplot(data=penguins, corner=True);

# Pairplot with hues

sns.pairplot(data=penguins, hue='species');


By adding hue to the plot, we can clearly distinguish key differences between the various
species of penguins.

9. Heatmaps
Heatmaps are visualizations that use color-coded cells to represent the values within a matrix
or table of data. In a heatmap, the rows and columns correspond to two different variables, and
the color intensity of each cell indicates the value or magnitude of the data point at their
intersection.
Use Cases:

• Correlation analysis and visualizing pivot tables that aggregate data by rows and
columns.
Example: Visualizing the correlation between all the numerical columns in the mpg
dataset.

# Select the numeric columns from the dataset

num_cols = list(mpg.select_dtypes(include='number'))


fig = plt.figure(figsize=(12,7))

#Correlation Heatmap
sns.heatmap(data=mpg[num_cols].corr(),
annot=True, cmap=sns.cubehelix_palette(as_cmap=True))
plt.title('Heatmap of Correlation matrix');

plt.show();
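The matrix that the heatmap colors is an ordinary pandas correlation table. A minimal sketch on a small hypothetical table (the values are made up, not taken from the mpg dataset) shows what is passed to the heatmap:

```python
import pandas as pd

# Hypothetical car data with one non-numeric column.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 30.0, 36.0, 24.0],
    "weight": [3500, 3700, 2100, 1800, 2600],
    "model": ["a", "b", "c", "d", "e"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = cars.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

The diagonal is always 1 and the matrix is symmetric, so each off-diagonal cell in the heatmap appears twice.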

10. Scatter Plots

A scatter plot visualizes the relationship between two continuous variables by displaying
individual data points on a graph. The x-axis represents one variable, and the y-axis represents
the other, creating a pattern of scattered points that illustrates their interaction.

Use Cases:

1. Relationship Analysis: Scatter plots help identify the relationship between two variables, such
as positive correlation (both increase together), negative correlation (one increases as the other
decreases), or no correlation.
Example: A scatter plot can show that the horsepower and weight of cars are positively
correlated.

# Simple Scatterplot
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7)
plt.title('Simple Scatterplot')
plt.show();

2. Outlier Detection: Scatter plots effectively highlight outliers, which are data points that
deviate significantly from the general trend or pattern.

3. Clustering and Group Identification: By analyzing the distribution of points, scatter plots can
reveal natural groupings or patterns among the variables.
Example: Comparing the horsepower and weight of cars manufactured in different countries.

# Scatterplot with Hue

sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
hue='origin', palette='viridis')
plt.title('Scatterplot with Hue')
plt.show()

# Scatterplot with Hue and Markers

sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
style='origin',palette='viridis', hue='origin')
plt.title('Scatterplot with Hue and Markers')
plt.show()


# Scatterplot with Hue & Size

sns.scatterplot(data=mpg, x='weight', y='horsepower', sizes=(40, 400), alpha=.5,
palette='viridis', hue='origin', size='cylinders')
plt.title('Scatterplot with Hue & Size')
plt.show()

• Trend Analysis: Scatter plots can illustrate the progression or changes in variables over
time by plotting data points in chronological order, making it easier to identify trends or
shifts in behavior.
• Model Validation: Scatter plots are useful for assessing a model's accuracy by
comparing predicted values against actual values, highlighting any deviations or
patterns in the model’s predictions.
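The relationship-analysis and model-validation uses can be sketched numerically: the correlation coefficient summarizes the direction of the point cloud, and the residuals are the vertical distances a predicted-versus-actual scatter plot would reveal. The data and the straight-line model below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical weight and horsepower values.
weight = np.array([1800.0, 2100.0, 2600.0, 3500.0, 3700.0])
horsepower = np.array([60.0, 75.0, 95.0, 150.0, 165.0])

# Relationship analysis: Pearson correlation between the two variables.
r = np.corrcoef(weight, horsepower)[0, 1]

# Model validation: fit a straight line and compare predictions with actual values.
slope, intercept = np.polyfit(weight, horsepower, deg=1)
predicted = slope * weight + intercept
residuals = horsepower - predicted

print(round(r, 3))
print(residuals)
```

Plotting `predicted` against `horsepower` would place points near the diagonal when the model fits well.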
