Data Visualization

Uploaded by

Saumya Gurnani

Data Visualization
• Data visualization lets us interpret data quickly and adjust
different variables to see their effect.

• It lets us observe existing patterns.

• It helps identify extreme values that could be anomalies.
Why Data Visualization Matters
• Visual perception vs. numerical data: humans are better
at processing visual information.

• Communicating insights effectively to stakeholders.

• Making data-driven decisions.

• Python provides several libraries for visualizing data. Each
offers different features and supports various types of graphs.
Four such Python libraries:

• Matplotlib
• seaborn
• ggplot
• plotly

• Matplotlib is primarily a 2D plotting library.

• It is an easy-to-use, low-level data visualization library built
on NumPy arrays and compiled extension code, which gives good
performance even on large arrays.
• It supports a variety of plots, such as scatter plots, line plots
and histograms.

Matplotlib workflow
Types of Data Visualizations
• Charts and Graphs:
– Bar charts
– Line charts
– Pie charts
– Scatter plots
– Histograms

• Interactive Visualizations:
– Dashboards
– Interactive plots
– Drill-down charts

• Geospatial Visualizations:
– Maps
– Choropleth maps
– Heatmaps

• Advanced Visualizations:
– Tree maps
– Sankey diagrams
– Word clouds
– Parallel coordinates
Bar Chart
• A bar chart represents categories of data with rectangular bars
whose lengths (or heights) are proportional to the values they
represent.
• It is used to show the frequency distribution of categorical
variables.
• It can be created using the bar() method.
Horizontal bar chart
• Used when there are more than seven categories.
• Used to show rankings (e.g. election results) and performance.
Stacked bar chart
• Used to show part-to-whole relationships, i.e. how much each
element contributes to the total.
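A stacked bar chart can be sketched in Matplotlib by passing a running total as the `bottom` argument of `bar()`. The store names and unit counts below are made up for illustration and are not from the slides.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: units sold per store, split by product
stores = ["Store A", "Store B", "Store C"]
sales = {
    "almond": np.array([10, 15, 7]),
    "cashew": np.array([5, 8, 12]),
    "plain": np.array([3, 6, 4]),
}

fig, ax = plt.subplots()
bottom = np.zeros(len(stores))  # running height of each stack
for product, units in sales.items():
    # each segment is drawn on top of the segments before it
    ax.bar(stores, units, bottom=bottom, label=product)
    bottom += units

ax.set(title="Units sold per store", ylabel="Units")
ax.legend()
```

After the loop, `bottom` holds the total height of each stack, which is the "whole" that every coloured segment is a part of. Call `plt.show()` to display the figure.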
Histogram
• A histogram is a type of bar chart in which the X-axis shows the
bin ranges and the Y-axis (bar height) shows the frequency of
values falling in each bin.

• It shows the frequency distribution of continuous data,
e.g. time, age, weight.

• The hist() function computes and draws a histogram. If
categorical data is passed, it counts how often each value
occurs.
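As a minimal sketch of how `hist()` bins continuous data, the example below uses a small set of hypothetical ages and explicit bin edges. The ages and the choice of 10-year bins are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages (illustrative data only)
ages = np.array([21, 22, 22, 23, 25, 27, 28, 31, 33, 34, 35, 38, 41, 45, 52])
bin_edges = [20, 30, 40, 50, 60]  # four bins, each 10 years wide

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars
counts, edges, bars = ax.hist(ages, bins=bin_edges, edgecolor="black")
ax.set(title="Age distribution", xlabel="Age (years)", ylabel="Frequency")
```

Here `counts` holds the height of each bar, so the four bins contain 7, 5, 2 and 1 values respectively. Call `plt.show()` to display the figure.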
Line Chart
• A line chart shows the relationship between two variables, X and
Y, plotted on different axes.

• E.g. stock market price changes over time.

• It is plotted using the plot() function.

Pie chart
• The pie chart is usually the least-used chart in data
analysis.
Scatter plots
• A scatter plot is a set of points representing the values of two
different variables, plotted on the horizontal and vertical
axes.

• It is used to observe the relationship or correlation between two
numerical variables, with each dot representing one observation.

• It can reveal clusters, trends and outliers.

• The scatter() method in the matplotlib library draws a scatter
plot.
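To make the link between scatter plots and correlation concrete, the sketch below builds three synthetic datasets (positive, negative and no correlation) and computes the Pearson coefficient for each with `np.corrcoef`. The data, the seed and the dictionary names are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the result is reproducible
x = np.linspace(0, 10, 200)

# Three synthetic relationships (illustrative data only)
datasets = {
    "positive": 2 * x + rng.normal(0, 1, x.size),
    "negative": -2 * x + rng.normal(0, 1, x.size),
    "none": rng.normal(0, 1, x.size),
}

# Pearson correlation coefficient of x with each y
r = {name: float(np.corrcoef(x, y)[0, 1]) for name, y in datasets.items()}

# One scatter plot per relationship, with r in the title
fig, axes = plt.subplots(ncols=3, figsize=(12, 4))
for ax, (name, y) in zip(axes, datasets.items()):
    ax.scatter(x, y, s=10)
    ax.set(title=f"{name} correlation (r = {r[name]:.2f})",
           xlabel="x", ylabel="y")
```

A coefficient close to +1 or -1 corresponds to points lying near an upward or downward line, while a value near 0 corresponds to a shapeless cloud. Call `plt.show()` to display the figure.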
Scatter plots
Types of correlation
Comparisons of graphs
Line chart
import matplotlib.pyplot as plt

# initializing the data
x = [1, 2, 3, 4]
y = [20, 30, 40, 50]

# plotting the data with the pyplot (state-based) interface
plt.plot(x, y)

# adding the title
plt.title("Simple Line")

# adding the axis labels
plt.ylabel("y-axis")
plt.xlabel("x-axis")
plt.show()
Another way (object-oriented interface)

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [11, 22, 33, 44]
# set up the plot
fig, ax = plt.subplots(figsize=(10, 10))  # width, height in inches
# plot the data
ax.plot(x, y)
# customize the plot
ax.set(title="Simple plot",
       xlabel="X-axis",
       ylabel="Y-axis")
# save and show (the images/ directory must already exist)
fig.savefig("images/sample-plot.png")
plt.show()
Scatter plot code
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)  # create some data
fig, ax = plt.subplots()  # add figure and axes
ax.scatter(x, np.sin(x), c='red')
plt.show()
Make a bar plot from dictionary
import matplotlib.pyplot as plt
# prices keyed by product name
cookies_price = {"almond cookies": 10,
                 "cashew cookies": 15,
                 "plain cookies": 5}
fig, ax = plt.subplots()  # add figure and axes
# keys become the categories, values become the bar heights
ax.bar(list(cookies_price.keys()), list(cookies_price.values()))
ax.set(title="Cookies store",
       ylabel="Price ($)")
plt.show()
Horizontal bar plot
(useful when more than seven categories need to be compared)
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [11, 22, 33, 44]
# set up the plot
fig, ax = plt.subplots(figsize=(10, 10))  # width, height in inches
# plot the data: x gives the bar positions, y the bar lengths
ax.barh(x, y)
# customize the plot
ax.set(title="Simple plot",
       xlabel="X-axis",
       ylabel="Y-axis")
# save and show (the images/ directory must already exist)
fig.savefig("images/sample-plot.png")
plt.show()
Horizontal bar plot with dictionary

import matplotlib.pyplot as plt

# prices keyed by product name
cookies_price = {"almond cookies": 10,
                 "cashew cookies": 15,
                 "plain cookies": 5}
fig, ax = plt.subplots()  # add figure and axes
ax.barh(list(cookies_price.keys()), list(cookies_price.values()))
# in a horizontal bar plot the values run along the x-axis
ax.set(title="Cookies store",
       xlabel="Price ($)")
plt.show()
Subplot
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)  # create some data
cookies_price = {"almond cookies": 10,
                 "cashew cookies": 15,
                 "plain cookies": 5}

# add a figure with a 2x2 grid of axes
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 5))
ax1.plot(x, x/2)                                          # line plot
ax2.scatter(np.random.random(10), np.random.random(10))  # scatter plot
ax3.bar(list(cookies_price.keys()), list(cookies_price.values()))  # bar plot
ax4.hist(np.random.randn(1000))                           # histogram
ax1.set(title="Simple Plot", xlabel="X-axis", ylabel="Y-axis")
plt.show()
Output graph
Box or whisker plot
• A box plot is a graphical representation of the distribution
of a dataset.
• It displays key summary statistics such as the median,
quartiles and potential outliers in a concise visual form,
which makes it easy to compare different datasets.
• Box plots help identify the central value (median) of the
data, how dispersed the data is, and whether the data is
skewed.
Box plot: data distribution and skewness

A box plot consists of five values:

• Minimum
• First quartile (Q1), the 25% point
(the (n+1)/4-th term for an odd number of data points)
• Median (second quartile, Q2), the 50% point
• Third quartile (Q3), the 75% point
(the 3(n+1)/4-th term for an odd number of data points)
• Maximum

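The position formulas for an odd number of data points can be checked on a small example. The seven values below are hypothetical, and the sketch assumes n + 1 is divisible by 4 so that every position is a whole number; other quartile conventions interpolate between neighbouring terms instead.

```python
data = sorted([12, 3, 8, 14, 5, 13, 7])  # hypothetical values, n = 7
n = len(data)

# 1-based positions of the quartiles in the sorted data
q1_pos = (n + 1) // 4       # 2nd term
q2_pos = (n + 1) // 2       # 4th term (the median)
q3_pos = 3 * (n + 1) // 4   # 6th term

# Python lists are 0-based, so subtract 1 when indexing
q1, q2, q3 = data[q1_pos - 1], data[q2_pos - 1], data[q3_pos - 1]

# minimum, Q1, median, Q3, maximum
five_numbers = (data[0], q1, q2, q3, data[-1])
print(five_numbers)
```

For the sorted values 3, 5, 7, 8, 12, 13, 14 this gives Q1 = 5, median = 8 and Q3 = 13.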

Q. Create a box plot for 12 values: 10, 12, 11, 15, 11, 14, 13, 17, 12,
22, 14, 11.
Sol. Arrange the values in ascending order:
10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22
Median (Q2) = (12+13)/2 = 12.5, since the number of values is even
Q1 = (11+11)/2 = 11 (median of the first 6 values)
Q3 = (14+15)/2 = 14.5 (median of the last 6 values)
IQR (interquartile range) = Q3-Q1 = 14.5-11 = 3.5
Lower limit = Q1-1.5*IQR = 11-1.5*3.5 = 5.75
Upper limit = Q3+1.5*IQR = 14.5+1.5*3.5 = 19.75
The whiskers extend to the smallest and largest values within
[5.75, 19.75], i.e. 10 and 17; the value 22 lies outside this range
and is plotted as an outlier.
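The hand calculation above can be reproduced with a short script. This is a sketch of the median-of-halves method used in the solution, written with the standard library only; the variable names are chosen for illustration. Plotting libraries may use a different quartile convention (NumPy's default percentile method interpolates linearly), so the box edges in a drawn plot can differ slightly from these values.

```python
from statistics import median

values = [10, 12, 11, 15, 11, 14, 13, 17, 12, 22, 14, 11]
data = sorted(values)

# Split the sorted data into two halves (n is even here)
half = len(data) // 2
lower_half, upper_half = data[:half], data[-half:]

q1 = median(lower_half)   # first quartile
q2 = median(data)         # median
q3 = median(upper_half)   # third quartile
iqr = q3 - q1

# Limits beyond which a value counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers reach the most extreme values that are still inside the limits
inside = [v for v in data if lower_limit <= v <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [v for v in data if v < lower_limit or v > upper_limit]

print(q1, q2, q3, iqr, lower_limit, upper_limit)
print(whisker_low, whisker_high, outliers)
```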
Box plot
import matplotlib.pyplot as plt
# the 12 values from the worked example, in ascending order
y = [10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22]
plt.boxplot(y)
plt.show()
Box plot with random data set
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(2)  # fix the seed so the plot is reproducible
x = np.random.normal(70, 10, 200)  # 200 values with mean 70 and standard deviation 10
plt.boxplot(x)
plt.show()
Plotting functions with Pandas
Call plot() on a DataFrame and pass the plot type as the
kind parameter:
• df.plot(kind='bar')
• df.plot(kind='scatter', x='col1', y='col2'), where col1 and col2
are the names of the columns to plot (a scatter plot needs both)

• Reference: pandas.DataFrame.plot, pandas 2.2.1 documentation (pydata.org)
import pandas as pd
import matplotlib.pyplot as plt
data_dict = {'name': ['s1', 's2', 's3', 's4', 's5', 's6'],
             'age': [20, 20, 21, 20, 21, 20],
             'math_marks': [100, 90, 91, 98, 92, 95],
             'physics_marks': [90, 100, 91, 92, 98, 95],
             'chem_marks': [93, 89, 99, 92, 94, 92]
             }
df = pd.DataFrame(data_dict)
df.head()  # preview the first rows (displayed in a notebook)
# one bar per student, with height equal to the chemistry marks
df.plot(kind='bar',
        x='name',
        y='chem_marks',
        color='blue')
plt.title('Bar Plot')
plt.show()
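A scatter plot through the pandas interface needs the names of the x and y columns. The sketch below reuses the marks from the example above; the choice of columns and the title are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same marks as in the bar plot example
df = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "math_marks": [100, 90, 91, 98, 92, 95],
    "physics_marks": [90, 100, 91, 92, 98, 95],
})

# kind="scatter" requires both x and y; plot() returns the Matplotlib Axes
ax = df.plot(kind="scatter", x="math_marks", y="physics_marks", color="blue")
ax.set_title("Math vs. physics marks")
```

Because `df.plot()` returns a Matplotlib Axes object, the usual Axes methods can be used to customize the result. Call `plt.show()` to display the figure.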
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("C:/Users/HP/Desktop/monthsales.csv")
df.head()
# plot the data with the pandas plot function (one color per plotted column)
df.plot(kind='line', color=['red', 'blue', 'brown', 'yellow'])
# title and axis labels are set by calling the pyplot functions
plt.title("Year-wise sales")
plt.xlabel("Months")
plt.ylabel("Sales")
plt.show()
Time series Analysis
• A time series is a series of data points indexed in time order.
The time order can be daily, monthly or even yearly.
• Equivalently, a time series is a set of observations taken at
specified times, usually at equal intervals.
• Time series forecasting is the process of using a statistical
model to predict future values of a time series based on
previously observed values.
Components of time series
• Trend (T): the general direction of the time series data over a
long period of time. A trend can be increasing (upward),
decreasing (downward) or horizontal (stationary).
• Seasonality (S): patterns that repeat at a fixed interval with
similar timing, direction and magnitude.
• Cycle (C): fluctuations that repeat but have no fixed period and
can occur at any time, so they are harder to predict.
• Irregularity (I) (noise or residual): the fluctuations in the
time series data that become evident when the trend and cyclical
variations are removed. These variations are unpredictable and
erratic, and may or may not be random.
Example of a time series: the number of airline passengers per month
from 1949 to 1960 (seasonal data).
When not to apply time series

• When the values are constant

• When the values can be computed from a known function

Types of Time Series Data

Continuous Time Series Data

Discrete Time Series Data

Decomposition
• TSI decomposition: used to separate the different
components of a time series.

• TSI stands for Trend, Seasonality and Irregularity.

• The decomposition model can be additive or multiplicative.

Additive model: Y = T + S + I (when the seasonal variation is roughly constant over time)

Multiplicative model: Y = T * S * I (when the seasonal variation changes with the level of the series)
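The additive model can be illustrated with a hand-rolled decomposition in pandas. This is a minimal sketch on a synthetic, noise-free series with a period of 7, not a replacement for a library routine: the trend is estimated with a centered moving average over one full period, and the seasonal component as the average detrended value at each position in the cycle. The series, the period and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

period = 7
n = 10 * period
t = np.arange(n)

# Synthetic series: linear trend plus a repeating pattern that sums to zero
pattern = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0])
y = pd.Series(0.5 * t + np.tile(pattern, n // period))

# Trend (T): centered moving average over one full period
trend = y.rolling(window=period, center=True).mean()

# Seasonality (S): mean detrended value at each position in the cycle
detrended = y - trend
seasonal_means = detrended.groupby(t % period).mean()
seasonal = pd.Series(seasonal_means.to_numpy()[t % period], index=y.index)

# Irregularity (I): what is left after removing trend and seasonality
residual = y - trend - seasonal

print(seasonal_means.round(2).tolist())
print(residual.abs().max())
```

Because the synthetic series contains no noise, the estimated seasonal pattern matches the one that was built in and the residual is essentially zero. The moving average leaves the first and last three values of the trend undefined (NaN). For a multiplicative model, the same steps would use division instead of subtraction.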
Stationarity
• A stationary time series shows a particular behavior over time,
and there is a very high probability that it will follow the same
behavior in the future.
• It has a constant mean, a constant variance and an
autocovariance that does not depend on time.

• Tests to check stationarity:

Rolling statistics (plotting the moving average and moving
variance) and the Augmented Dickey-Fuller test (ADF, written ADCF in
some sources), whose null hypothesis is that the series is
non-stationary
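The rolling-statistics check can be sketched with pandas alone. The example compares white noise, which is stationary, with a random walk built from the same steps, which is not. The data, the seed, the window of 50 and the helper name `rolling_stats` are assumptions made for illustration. For a formal test, `adfuller` from `statsmodels.tsa.stattools` returns a test statistic and a p-value, with a small p-value being evidence against non-stationarity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the result is reproducible
steps = rng.normal(0, 1, 500)

white_noise = pd.Series(steps)           # stationary: constant mean and variance
random_walk = pd.Series(steps.cumsum())  # non-stationary: the mean wanders


def rolling_stats(series, window):
    """Return the moving average and moving standard deviation of a series."""
    rolled = series.rolling(window)
    return pd.DataFrame({"mean": rolled.mean(), "std": rolled.std()}).dropna()


window = 50
noise_stats = rolling_stats(white_noise, window)
walk_stats = rolling_stats(random_walk, window)

# How far the moving average drifts over the whole series
noise_drift = noise_stats["mean"].max() - noise_stats["mean"].min()
walk_drift = walk_stats["mean"].max() - walk_stats["mean"].min()
print(noise_drift, walk_drift)
```

For the white noise the moving average stays close to zero, while for the random walk it drifts far more. Plotting `noise_stats` and `walk_stats` alongside the original series shows the same contrast visually.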
Non stationary process
Time series plot: Example : 1
import matplotlib.pyplot as plt
import pandas as pd
dates = pd.date_range('2024-01-01', periods=10) # Sample time series data
values = [5, 7, 8, 9, 6, 3, 5, 8, 7, 6]
# Creating a pandas DataFrame
df = pd.DataFrame(values, index=dates, columns=['Value'])
# Plotting the time series
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['Value'], marker='o', linestyle='-')
plt.title('TS Plot')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.show()
Time series plot: Example : 2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller
df = pd.read_csv("C:/Users/HP/Desktop/stock_data.csv", parse_dates=True,
index_col="Date")
# displaying the first five rows of dataset
df.head()
df.drop(columns='Unnamed: 0', inplace =True)
df.head()
sns.set(style="whitegrid") # Setting the style to whitegrid for a clean background
plt.figure(figsize=(12, 6)) # Setting the figure size
sns.lineplot(data=df, x='Date', y='High', label='High Price', color='blue')
plt.xlabel('Date')
plt.ylabel('High')
plt.title('Share Highest Price Over Time')
plt.show()
Output graph
References
• Matplotlib documentation, Matplotlib 3.8.3 documentation
• pandas.DataFrame.plot, pandas 2.2.1 documentation (pydata.org)
• Matplotlib Tutorial, GeeksforGeeks
• NPTEL :: Computer Science and Engineering - NOC: Python for Data Science
