Data Visualization with Python
Using the matplotlib and seaborn libraries
Slides are taken and adapted from a slide deck by Dr. Ziad Al-Sharif, Jordan University of Science and Technology
What is data visualization?
• Data visualization is the graphical representation of information and data.
– Can be achieved using visual elements like figures, charts, graphs, maps, and more.
• Data visualization tools provide a way to present these figures and graphs.
• It is often essential when we must analyze massive amounts of information and make data-driven decisions.
– It converts complex data into an easy-to-understand representation.
Matplotlib
• Matplotlib is one of the most powerful tools for data visualization in Python.
• Matplotlib is an incredibly powerful (and beautiful!) 2-D plotting library.
– It is easy to use, and it provides a huge number of examples for tackling specific problems.
Matplotlib
• Matplotlib makes easy things easy and hard things possible.
• You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
Matplotlib pyplot
• matplotlib.pyplot is a module within the Matplotlib library in Python, providing a state-based interface for creating various types of plots and visualizations.
• Each pyplot function makes some change to the figure:
– e.g.,
• creates a figure,
• creates a plotting area in the figure,
• plots some lines in the plotting area,
• decorates the plot with labels, etc.
• Whenever you plot with matplotlib, two main lines of code are involved:
– The type of graph
• this is where you choose a bar chart, line chart, etc. (e.g., plt.bar(), plt.plot())
– Showing the graph
• this displays the graph (plt.show())
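The sequence of pyplot calls described above can be sketched as follows. This is a minimal illustration, assuming only the standard pyplot functions; the function name draw_squares and its data are hypothetical. Each call acts on the "current" figure and axes that pyplot tracks behind the scenes, and plt.gcf() / plt.gca() return those objects.

```python
import matplotlib.pyplot as plt

def draw_squares():
    """Build a line chart step by step using only pyplot's implicit state (hypothetical helper)."""
    plt.figure()                        # create a new figure; it becomes the "current" figure
    xs = [0, 1, 2, 3]
    plt.plot(xs, [x * x for x in xs])   # creates axes in that figure and draws a line in them
    plt.title("Squares")                # decorates the current axes
    return plt.gcf(), plt.gca()         # the Figure and Axes pyplot has been modifying

fig, ax = draw_squares()
print(ax.get_title())        # Squares
print(len(ax.get_lines()))   # 1
```

In an interactive session, calling plt.show() afterwards displays the figure.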
pyplot
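• text() : adds text at an arbitrary location
• xlabel(): sets the label of the x-axis
• ylabel(): sets the label of the y-axis
• title() : adds a title to the plot
• cla() : clears the current axes, removing all plots from them
• savefig(): saves your figure to a file
• legend() : shows a legend on the plot
Most of these have object-oriented counterparts: on an Axes instance they are ax.text(), ax.set_xlabel(), ax.set_ylabel(), ax.set_title(), ax.clear() and ax.legend(), while savefig() is a method of the Figure (fig.savefig()).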
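The Axes counterparts of the pyplot functions listed above can be sketched like this. It is a minimal example, assuming the object-oriented style obtained from plt.subplots(); the data and the "peak" annotation are made up for illustration, and the figure is saved to an in-memory buffer instead of a file.

```python
import io
import matplotlib.pyplot as plt

# Object-oriented style: get the Figure and Axes explicitly, then call their methods.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1], label="data")
ax.set_xlabel("x - axis")       # pyplot equivalent: plt.xlabel(...)
ax.set_ylabel("y - axis")       # pyplot equivalent: plt.ylabel(...)
ax.set_title("Axes methods")    # pyplot equivalent: plt.title(...)
ax.text(2, 4, "peak")           # pyplot equivalent: plt.text(...); coordinates are in data units
ax.legend()                     # pyplot equivalent: plt.legend()

# savefig() belongs to the Figure; here it writes PNG bytes to memory rather than to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

# pyplot equivalent: plt.cla(); removes the lines, text, labels, title and legend.
ax.clear()
```

After the last line the axes are empty again, which is why clearing is usually done before redrawing, not after saving.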
Line Graphs
import matplotlib.pyplot as plt
#create data for plotting
x_values = [0, 1, 2, 3, 4, 5]
y_values = [0, 1, 4, 9, 16, 25]
#the default graph style for plot is a line
plt.plot(x_values, y_values)
#display the graph
plt.show()
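Since a solid line is only the default style, a sketch of overriding it may help. This assumes matplotlib's optional format-string argument to plot(); "o" requests circle markers and "--" a dashed line. The y values are computed from x here rather than typed in.

```python
import matplotlib.pyplot as plt

x_values = list(range(6))
y_values = [x ** 2 for x in x_values]   # same data as above: 0, 1, 4, 9, 16, 25

fig, ax = plt.subplots()
# The third argument is a format string: "o" = circle markers, "--" = dashed line.
(line,) = ax.plot(x_values, y_values, "o--")
```

Calling plt.show() afterwards displays the figure; passing just "o" would draw markers with no connecting line.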
Simple line
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
• Define the x-axis and corresponding y-axis values as lists.
• Plot them on the canvas using the .plot() function.
• Name the x-axis and y-axis using the .xlabel() and .ylabel() functions.
• Give your plot a title using the .title() function.
• Finally, to view your plot, use the .show() function.
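The word "corresponding" in the steps above matters: each y value is paired with the x value at the same position. A small sketch of two consequences, assuming standard matplotlib behaviour: if only y values are given, plot() uses their indices 0, 1, 2, ... as x, and x and y lists of different lengths are rejected with a ValueError.

```python
import matplotlib.pyplot as plt

y = [2, 4, 1]
fig, ax = plt.subplots()
(line,) = ax.plot(y)   # no x given: matplotlib uses the indices 0, 1, 2 as x values

mismatch_error = None
try:
    ax.plot([1, 2, 3], [2, 4])   # 3 x values but only 2 y values
except ValueError as err:
    mismatch_error = err         # message says x and y must have the same first dimension
```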
Simple 2 lines
import matplotlib.pyplot as plt
# line 1 points
x1 = [1, 2, 3]
y1 = [2, 4, 1]
# plotting the line 1 points
plt.plot(x1, y1, label="line 1")
# line 2 points
x2 = [1, 2, 3]
y2 = [4, 1, 3]
# plotting the line 2 points
plt.plot(x2, y2, label="line 2")
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('Two lines on same graph!')
# show a legend on the plot
plt.legend()
# function to show the plot
plt.show()
• Here, we plot two lines on the same graph. We differentiate between them by giving each one a name (label), which is passed as the label argument of the .plot() function.
• The small rectangular box showing the type of each line and its color is called a legend. We can add a legend to our plot using the .legend() function.
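When there are more than a couple of lines, the labelled plot() calls above are often written as a loop. This is a sketch under the assumption that the series are kept in a dictionary (a hypothetical layout, not from the slides) mapping each label to its y values; the loc argument of legend() pins the legend box to a corner.

```python
import matplotlib.pyplot as plt

# Hypothetical layout: one list of y values per named series, all sharing the same x values.
x = [1, 2, 3]
series = {"line 1": [2, 4, 1], "line 2": [4, 1, 3]}

fig, ax = plt.subplots()
for name, ys in series.items():
    ax.plot(x, ys, label=name)   # each call adds one line; its label feeds the legend
ax.set_title("Two lines on same graph!")
legend = ax.legend(loc="upper right")   # place the legend box in the upper-right corner
```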
Bar graphs
import matplotlib.pyplot as plt
#Create data for plotting
values = [5, 6, 3, 7, 2]
names = ["A", "B", "C", "D", "E"]
plt.bar(names, values, color="green")
plt.show()
• To draw a bar graph, the main change in the code is replacing plt.plot() with plt.bar(): the first argument gives the bar positions (here, the category names) and the second gives the bar heights.
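It can help to see what plt.bar() actually creates: one rectangle per value, whose height is that value. A minimal sketch, assuming Matplotlib 3.4 or later for Axes.bar_label(), which writes each bar's value above it.

```python
import matplotlib.pyplot as plt

values = [5, 6, 3, 7, 2]
names = ["A", "B", "C", "D", "E"]

fig, ax = plt.subplots()
bars = ax.bar(names, values, color="green")   # returns a BarContainer with one Rectangle per bar
texts = ax.bar_label(bars)                     # annotate each bar with its height (Matplotlib 3.4+)
heights = [rect.get_height() for rect in bars]
```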
Bar graphs
We can also draw the bars horizontally by using plt.barh():
import matplotlib.pyplot as plt
#Create data for plotting
values = [5,6,3,7,2]
names = ["A", "B", "C", "D", "E"]
# barh() ("bar" + "h" for horizontal) draws the bars horizontally
plt.barh(names, values, color="yellowgreen")
plt.show()
Histogram
import matplotlib.pyplot as plt
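# ages (the raw values to be grouped into intervals)
ages = [2, 5, 70, 40, 30, 45, 50, 45, 43, 40, 44, 60, 7, 13, 57, 18, 90, 77, 32, 21, 20, 40]
# setting the value range and the number of intervals (bins);
# the name age_range avoids shadowing Python's built-in range()
age_range = (0, 100)
bins = 10
# plotting a histogram
plt.hist(ages, bins=bins, range=age_range, color='green', histtype='bar', rwidth=0.8)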
# x-axis label
plt.xlabel('age')
# frequency label
plt.ylabel('No. of people')
# plot title
plt.title('My histogram')
# function to show the plot
plt.show()
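To see which numbers the histogram bars represent, the same binning can be computed directly. This sketch assumes NumPy's np.histogram, which splits the range (0, 100) into 10 equal-width bins just as plt.hist does; plt.hist also returns the counts, the bin edges and the drawn bar patches.

```python
import numpy as np
import matplotlib.pyplot as plt

ages = [2, 5, 70, 40, 30, 45, 50, 45, 43, 40, 44, 60, 7, 13, 57, 18, 90, 77, 32, 21, 20, 40]

# 10 equal-width bins: [0, 10), [10, 20), ..., [90, 100]
counts, edges = np.histogram(ages, bins=10, range=(0, 100))
print(counts)   # [3 2 2 2 7 2 1 2 0 1]

# plt.hist performs the same binning and returns (counts, bin edges, bar patches).
n, bin_edges, patches = plt.hist(ages, bins=10, range=(0, 100))
```

The tallest bar therefore corresponds to the 40 to 49 age group, which holds 7 of the 22 people.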
Pie-chart
import matplotlib.pyplot as plt
# defining labels
activities = ['eat', 'sleep', 'work', 'play']
# portion covered by each label
slices = [3, 7, 8, 6]
# color for each label
colors = ['r', 'y', 'g', 'b']
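# plotting the pie chart
# startangle=90 rotates the start of the first wedge 90 degrees counterclockwise from the x-axis,
# explode offsets the 'work' wedge by 0.1 of the radius, and autopct prints each wedge's percentage
plt.pie(slices, labels=activities, colors=colors,
        startangle=90, shadow=True, explode=(0, 0, 0.1, 0),
        radius=1.2, autopct='%1.1f%%')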
# plotting legend
plt.legend()
# showing the plot
plt.show()
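The percentages printed on the wedges come from each slice divided by the total (here 3 + 7 + 8 + 6 = 24), formatted with the autopct string. A minimal sketch comparing a hand computation with the texts that Axes.pie returns when autopct is set:

```python
import matplotlib.pyplot as plt

activities = ["eat", "sleep", "work", "play"]
slices = [3, 7, 8, 6]

# By hand: share of the total, formatted like autopct='%1.1f%%'
total = sum(slices)   # 24
manual = ["%1.1f%%" % (100 * s / total) for s in slices]

# With autopct set, pie() returns the wedges, the label texts and the percentage texts.
fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(slices, labels=activities, autopct="%1.1f%%")
shown = [t.get_text() for t in autotexts]
print(shown)   # ['12.5%', '29.2%', '33.3%', '25.0%']
```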
Seaborn
• Provides beautiful default styles and color palettes that make statistical plots more attractive.
• Built on top of the matplotlib library and closely integrated with the data structures from pandas.
Seaborn – countplot()
• Used to display the number of observations in each category of categorical data.
• It shows the distribution of a single categorical variable, or the relationship between two categorical variables (the second one passed through the hue parameter), as a bar plot.
import seaborn as sns
import matplotlib.pyplot as plt
# load the example 'tips' dataset (downloaded from the online seaborn-data repository on first use)
df = sns.load_dataset('tips')
# count plot on a single categorical variable
sns.countplot(x='sex', data=df)
plt.show()
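The bar heights of a count plot are simply category counts, which pandas can compute directly. This sketch uses a small hypothetical DataFrame in place of the downloaded 'tips' data, and adds a second categorical variable through hue; pd.crosstab gives the height of each hue-split bar.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small hypothetical table standing in for the 'tips' data (no download needed).
df = pd.DataFrame({
    "sex":    ["Male", "Female", "Male", "Male", "Female"],
    "smoker": ["No", "No", "Yes", "No", "Yes"],
})

totals = df["sex"].value_counts()            # bar heights for countplot(x="sex")
table = pd.crosstab(df["sex"], df["smoker"])  # bar heights once hue="smoker" splits each bar

ax = sns.countplot(x="sex", hue="smoker", data=df)
```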
Seaborn – scatterplot()
• Allows one to plot the relationship between an x variable and a y variable.
• Further variables can be shown by setting parameters such as hue, size and style, which map them to point color, point size and marker style.
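import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style='whitegrid')
# load the example fMRI dataset (downloaded on first use)
fmri = sns.load_dataset("fmri")
sns.scatterplot(x="timepoint", y="signal", data=fmri)
plt.show()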
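The "various parameters" mentioned above can be sketched with hue and style. The small DataFrame and its "region" column are hypothetical stand-ins for the fMRI data, so the example runs without a download; both parameters here encode the same column, so each region gets its own color and marker shape.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements: two regions sampled at four time points.
df = pd.DataFrame({
    "timepoint": [0, 1, 2, 3, 0, 1, 2, 3],
    "signal":    [0.1, 0.3, 0.2, 0.4, 0.0, 0.2, 0.5, 0.3],
    "region":    ["frontal"] * 4 + ["parietal"] * 4,
})

# hue colors each point by region; style also gives each region its own marker shape.
ax = sns.scatterplot(x="timepoint", y="signal", hue="region", style="region", data=df)
```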
Seaborn – heatmap()
• A heatmap is a graphical representation of data where individual
values are represented by color intensity.
• Used to identify patterns, correlations and trends within a dataset.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Generating a 10x10 matrix of random integers from 1 to 99 (the upper bound 100 is excluded)
data = np.random.randint(1, 100, (10, 10))
sns.heatmap(data)
plt.show()
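Since heatmaps are often used to spot correlations, here is a sketch that colors a correlation matrix. The dataset is synthetic and hypothetical: column "b" is built from "a" so that the two are strongly correlated, while "c" is independent noise. annot writes each value in its cell, and vmin/vmax fix the color scale to the full correlation range.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
a = rng.normal(size=200)
# Hypothetical data: "b" depends on "a" plus noise, "c" is unrelated noise.
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),
    "c": rng.normal(size=200),
})

corr = df.corr()   # pairwise Pearson correlations, each between -1 and 1
ax = sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
plt.show()
```

The a/b cell should stand out near +1, while the cells involving "c" stay close to 0.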