Data visualization
Part I
In this lecture
We will learn how to create basic plots using the matplotlib library
• Scatter plot
• Histogram
• Bar plot
Python for Data Science 2
Data Visualization
• Data visualization allows us to quickly interpret the data
and adjust different variables to see their effect
• Technology is increasingly making it easier for us to do so
Why visualize data?
o Observe the patterns
o Identify extreme values that could be anomalies
o Easy interpretation
Popular plotting libraries in Python
Python offers multiple graphing libraries that offer diverse
features
• matplotlib: to create 2D graphs and plots
• pandas visualization: easy-to-use interface, built on Matplotlib
• seaborn: provides a high-level interface for drawing attractive and
informative statistical graphics
• ggplot: based on R’s ggplot2, uses the Grammar of Graphics
• plotly: can create interactive plots
Matplotlib
• Matplotlib is a 2D plotting library which
produces good quality figures
• Although it has its origins in emulating the
MATLAB graphics commands, it is independent
of MATLAB
• It makes heavy use of NumPy and other
extension code to provide good performance
even for large arrays
Scatter plot
What is a scatter plot?
• A scatter plot is a set of points that represents
the values obtained for two different variables,
plotted on horizontal and vertical axes
When to use scatter plots?
• Scatter plots are used to convey the relationship
between two numerical variables
• Scatter plots are sometimes called correlation
plots because they show how two variables are
correlated
Importing data into Spyder
Importing necessary libraries
‘pandas’ library to work with dataframes
‘numpy’ library to do numerical operations
‘matplotlib’ library to do visualization
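The three imports listed above might look like the following. The aliases pd, np and plt are the conventional ones and are an assumption here, since the slide's code was not preserved.

```python
import pandas as pd              # to work with dataframes
import numpy as np               # to do numerical operations
import matplotlib.pyplot as plt  # pyplot interface of matplotlib, for visualization
```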
Importing data into Spyder
Importing data
Removing missing values from the dataframe
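A minimal sketch of these two steps, assuming the data is in CSV form. The inline sample, the column names (Price, Age, KM, FuelType) and the variable name cars_data are hypothetical stand-ins; with a real file, its path would be passed to pd.read_csv instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical stand-in for the car data file (two rows have a missing value)
csv_text = """Price,Age,KM,FuelType
13500,23,46986,Diesel
13750,,72937,Diesel
13950,24,41711,Petrol
14950,26,,Petrol
12950,32,61000,CNG
"""

# Importing data: read the CSV into a dataframe
cars_data = pd.read_csv(io.StringIO(csv_text))

# Removing missing values: drop every row that contains at least one NaN
cars_data = cars_data.dropna(axis=0)

print(cars_data.shape)
```

Dropping rows is only one way to handle missing values; it is used here because the slide says the missing values are removed.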
Scatter plot
[Code figure: scatter plot call; its two arguments are annotated x and y]
Scatter plot
The price of a car decreases as its age increases
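A scatter plot of this kind could be drawn as sketched below. The small dataframe, the column names Age and Price, and the colour are illustrative assumptions, not the lecture's actual data or code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in which price falls as age rises
cars_data = pd.DataFrame({
    "Age": [10, 20, 30, 40, 50, 60, 70, 80],
    "Price": [21000, 18500, 16000, 13500, 11500, 10000, 8500, 7500],
})

plt.figure()
# First argument goes on the x axis, second on the y axis
points = plt.scatter(cars_data["Age"], cars_data["Price"], c="red")
plt.title("Scatter plot of Price vs Age of the cars")
plt.xlabel("Age")
plt.ylabel("Price")
```

In an interactive session such as Spyder, calling plt.show() afterwards displays the figure.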
Histogram
What is a histogram?
• It is a graphical representation of data using
bars of different heights
• Histogram groups numbers into ranges and
the height of each bar depicts the frequency
of each range or bin
When to use histograms?
• To represent the frequency distribution of
numerical variables
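The grouping of numbers into ranges can be illustrated without drawing anything, using NumPy's histogram function. The values and bin edges below are made up for the illustration.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 5, 6, 6, 6, 8, 9])

# Three bins: [0, 4), [4, 8) and [8, 12]; the last bin includes its right edge
counts, edges = np.histogram(values, bins=[0, 4, 8, 12])

print(counts)  # number of values falling in each bin
print(edges)   # the bin boundaries
```

Each count is the height of one bar in the corresponding histogram.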
Histogram
[Code figure: histogram call; its data argument is annotated x]
Histogram with default arguments
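A sketch of a histogram drawn with default arguments. The kilometre values are randomly generated stand-ins, not the lecture's data, and the variable name km is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical kilometre readings, clipped so that none is negative
rng = np.random.default_rng(0)
km = rng.normal(loc=70000, scale=30000, size=500).clip(min=0)

plt.figure()
# With no further arguments, matplotlib uses 10 equal-width bins by default
counts, edges, patches = plt.hist(km)
```

The call returns the bin counts, the bin edges and the drawn bars, which is handy for checking what the plot shows.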
Histogram
The frequency distribution of kilometres shows that most of the
cars have travelled between 50,000 and 100,000 km, and only a
few cars have travelled farther
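A histogram becomes easier to read when the bars are separated and the axes are labelled. The sketch below assumes randomly generated kilometre values and arbitrary choices for colour, edge colour and number of bins.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical kilometre readings, clipped so that none is negative
rng = np.random.default_rng(1)
km = rng.normal(loc=75000, scale=25000, size=300).clip(min=0)

plt.figure()
# edgecolor outlines each bar; bins sets the number of ranges
counts, edges, _ = plt.hist(km, color="green", edgecolor="white", bins=5)
plt.title("Histogram of kilometres")
plt.xlabel("Kilometre")
plt.ylabel("Frequency")
```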
Bar plot
What is a bar plot?
• A bar plot is a plot that presents categorical
data with rectangular bars with lengths
proportional to the counts that they
represent
When to use bar plot?
• To represent the frequency distribution of
categorical variables
• A bar diagram makes it easy to compare sets
of data between different groups
Bar plot
[Code figure: bar plot call; first argument annotated x, second annotated height of the bars]
Bar plot
Frequency distribution of fuel type
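A basic bar plot needs the positions of the bars and their heights. In the sketch below the counts per fuel type are invented for illustration, and the colours are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts per fuel type (illustrative numbers only)
fuel_types = ("Petrol", "Diesel", "CNG")
counts = [100, 15, 2]
index = np.arange(len(fuel_types))  # bar positions 0, 1, 2

plt.figure()
# First argument: x positions; second argument: height of the bars
bars = plt.bar(index, counts, color=["red", "blue", "cyan"])
plt.title("Bar plot of fuel types")
plt.xlabel("Fuel types")
plt.ylabel("Frequency")
```

Without further settings the x axis shows the numeric positions rather than the category names.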
Bar plot
[Code figure: bar plot call with arguments annotated x and height of the bars,
followed by an xticks call that sets the location and the labels of the xticks]
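The tick locations and labels can be set with plt.xticks, and the bar heights can be computed from the data with value_counts. The series of fuel types below is a hypothetical sample; the rotation angle is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical fuel type column
fuel = pd.Series(["Petrol"] * 6 + ["Diesel"] * 3 + ["CNG"] * 1)

# Count each category; the result is sorted by frequency, highest first
counts = fuel.value_counts()
index = np.arange(len(counts))

plt.figure()
plt.bar(index, counts.values, color=["red", "blue", "cyan"])
# First argument: location of the xticks; second: labels of the xticks
plt.xticks(index, list(counts.index), rotation=90)
plt.title("Bar plot of fuel types")
plt.xlabel("Fuel types")
plt.ylabel("Frequency")
```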
Bar plot
The bar plot of fuel type shows that most of the cars run on
petrol
Summary
We have learnt how to create basic plots using the matplotlib library
• Scatter plot
• Histogram
• Bar plot
THANK YOU