Unit 6 Data Visualization-1
Data visualization is the process of representing data with visual elements such as charts and graphs
in order to derive meaningful insights from the data. It aims to reveal the information behind the
data and helps the viewer see the structure in it.
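Besides Matplotlib, which this unit covers, popular Python libraries for data visualization include:
• Seaborn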
• pandas
• Bokeh
• Plotly
• ggplot
• pygal
Introduction to Matplotlib
Matplotlib is the basic plotting library of the Python programming language and the most prominent
of the Python visualization packages. It handles a wide range of plotting tasks and can produce
publication-quality figures in a variety of formats: visualizations can be exported to common formats
such as PDF, SVG, PNG and JPG. It can create the popular visualization types (line plot, scatter plot,
histogram, bar chart, error chart, pie chart, box plot and many more) and also supports 3D plotting.
Many Python libraries are built on top of Matplotlib. For example, Seaborn and the plotting functions
of pandas are built on Matplotlib and give access to Matplotlib's methods with less code.
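As a minimal sketch of exporting a figure, the snippet below saves one plot in three formats with savefig(), which picks the format from the file extension. The file names and the temporary output directory are assumptions made for illustration only.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Draw a simple figure to export
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("Export example")

# Hypothetical output location: a temporary directory
out_dir = tempfile.mkdtemp()
saved_files = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"figure.{ext}")
    fig.savefig(path)  # the format is inferred from the file extension
    saved_files.append(path)

plt.close(fig)
print(saved_files)
```

In your own work you would normally pass a fixed file name such as 'plot.png' instead of a temporary path.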
Import Matplotlib
Before we can start using Matplotlib, we need to import it. We can import Matplotlib as
follows:
import matplotlib
Most of the time, we work with the pyplot interface of Matplotlib, so we import the pyplot
interface as follows:
import matplotlib.pyplot
To make things even simpler, we will use the standard shorthand for Matplotlib imports as follows:
import matplotlib.pyplot as plt
import numpy as np
plt.plot([1, 2, 3, 4])
plt.show()
You may be wondering why the x-axis ranges from 0 to 3 and the y-axis from 1 to 4. If you provide a
single list or array to plot, Matplotlib assumes it is a sequence of y values and automatically
generates the x values for you. Since Python ranges start with 0, the default x vector has the same
length as y but starts with 0; therefore, the x data are [0, 1, 2, 3].
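The generated x values can be checked directly on the line object that plot returns. The following is a small sketch; the variable names are chosen here for illustration.

```python
import matplotlib.pyplot as plt

# Only y values are given, so Matplotlib generates the x values itself
(line,) = plt.plot([1, 2, 3, 4])

x_values = [float(v) for v in line.get_xdata()]
y_values = [float(v) for v in line.get_ydata()]
print(x_values)
print(y_values)
plt.close()
```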
plot is a versatile function, and will take an arbitrary number of arguments. For example, to plot x
versus y, you can write:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])  # x values first, then y values (illustrative data)
plt.show()
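x1 = np.linspace(0, 10, 100)  # 100 evenly spaced x values from 0 to 10
plt.plot(x1, np.sin(x1), 'b--')  # blue dashed line
plt.plot(x1, np.cos(x1), 'ro')  # red circles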
plt.ylabel('trigonometric values') # y label
plt.xlabel('x-Values') # x label
plt.axis((0, 15,-1.5,1.5))
plt.show()
x1 = np.linspace(0, 10, 100)
plt.plot(x1, np.sin(x1), 'b--')  # blue dashed line
plt.plot(x1, np.cos(x1), 'bs')  # blue squares
plt.plot(x1, np.tan(x1), 'g^')  # green triangles
plt.ylabel('trigonometric values') # y label
plt.xlabel('x-Values') # x label
plt.axis((0, 15,-1.5,1.5))
plt.show()
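The format strings used above combine a color letter with a line style or marker ('b--' is a blue dashed line, 'bs' blue squares, 'g^' green triangles). Because plot takes an arbitrary number of arguments, several x, y, format groups can also be passed in a single call. A minimal sketch, with the third curve chosen here only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)

# One call, three (x, y, format) groups:
# 'b--' = blue dashed line, 'rs' = red squares, 'g^' = green triangles
lines = plt.plot(x, np.sin(x), 'b--',
                 x, np.cos(x), 'rs',
                 x, np.sin(x) * np.cos(x), 'g^')

plt.xlabel('x-Values')
plt.ylabel('trigonometric values')
plt.show()
```

The call returns one line object per group, which can be used later to change colors or add labels.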
Subplots
Creating multiple subplots using plt.subplots
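No example survives under this heading, so the following is a minimal sketch of plt.subplots, assuming a grid of one row and two columns. The plotted functions and the figure size are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One row, two columns; axes is an array holding the two subplots
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].plot(x, np.sin(x), 'b-')
axes[0].set_title('sin(x)')
axes[0].set_xlabel('x-Values')
axes[0].set_ylabel('sin(x)')

axes[1].plot(x, np.cos(x), 'r--')
axes[1].set_title('cos(x)')
axes[1].set_xlabel('x-Values')
axes[1].set_ylabel('cos(x)')

fig.tight_layout()  # keep titles and labels from overlapping
plt.show()
```

Each subplot is an Axes object with its own title and labels, which is why the set_title and set_xlabel methods are used here instead of plt.title and plt.xlabel.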
Pie Chart
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:
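import matplotlib.pyplot as plt
import numpy as np
# Illustrative data (assumed; the original values are missing from these notes)
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
# By default the plotting of the first wedge starts from the x-axis and moves counterclockwise
plt.pie(y, labels=mylabels)
plt.show()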
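The starting angle and the position of individual wedges can be changed with the startangle and explode arguments of pie(). A minimal sketch follows; the expense values and category names are made up for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed, not from the original notes)
expenses = [35, 25, 25, 15]
categories = ["Rent", "Food", "Travel", "Savings"]

# Pull the largest wedge out of the pie by 10% of the radius
largest = expenses.index(max(expenses))
explode = [0.1 if i == largest else 0 for i in range(len(expenses))]

wedges, texts, autotexts = plt.pie(
    expenses,
    labels=categories,
    explode=explode,
    startangle=90,       # start the first wedge at the top instead of the x-axis
    autopct='%1.1f%%',   # print each share as a percentage
)
plt.show()
```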
Bar Graph
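import matplotlib.pyplot as plt
import numpy as np
# Illustrative data (assumed; the original values are missing from these notes)
months = ['Jan', 'Feb', 'Mar', 'Apr']
product_a = [12, 15, 11, 18]
product_b = [10, 13, 14, 16]
bar_width = 0.35
index = np.arange(len(months))
# Draw the two series side by side by shifting the second one by one bar width
plt.bar(index, product_a, bar_width, label='Product A')
plt.bar(index + bar_width, product_b, bar_width, label='Product B')
plt.xlabel('Months')
plt.ylabel('Sales (in lakh rupees)')
plt.title('Comparison of Product Sales in Different Months')
plt.xticks(index + bar_width / 2, months)
plt.legend()
plt.show()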
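import matplotlib.pyplot as plt
# Illustrative data (assumed; the original values are missing from these notes)
categories = ['Rent', 'Food', 'Travel']
january = [3, 2, 1]
february = [4, 1, 2]
plt.bar(categories, january, label='January')
plt.bar(categories, february, bottom=january, label='February')  # stacked on top of January
plt.xlabel('Expense Categories')
plt.ylabel('Expenses (in lakh rupees)')
plt.title('Stacked Bar Graph of Expenses')
plt.legend()
plt.show()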
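import matplotlib.pyplot as plt
# Illustrative data (assumed; the original values are missing from these notes)
items = ['Item A', 'Item B', 'Item C', 'Item D']
values = [20, 35, 15, 28]
plt.barh(items, values)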
plt.xlabel('Values (in units)')
plt.ylabel('Items')
plt.title('Horizontal Bar Graph')
plt.show()
Histogram
A histogram is a graph showing frequency distributions.
Create Histogram
In Matplotlib, we use the hist() function to create histograms.
The hist() function takes an array of numbers, passed to it as an argument, and creates a histogram
from it.
For simplicity we use NumPy to randomly generate an array with 250 values, where the values will
concentrate around 170, and the standard deviation is 10.
x = np.random.normal(170, 10, 250)  # mean 170, standard deviation 10, 250 values
print(x)
plt.hist(x)
plt.show()
You can read from the histogram approximately how many values fall into each interval (bin); the
exact counts vary from run to run because the data are random.
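The counts do not have to be estimated by eye: hist() returns them together with the bin edges. The sketch below prints one line per bin. The fixed seed is an assumption added here so that the run is repeatable.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed (assumed) so the run is repeatable
x = rng.normal(170, 10, 250)          # mean 170, standard deviation 10, 250 values

counts, bin_edges, patches = plt.hist(x)   # 10 bins by default

# Print how many values fall into each interval
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{int(count):3d} values between {left:.1f} and {right:.1f}")

plt.show()
```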
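By default, the bars are filled with one color and have no visible edge lines. The color argument
changes the fill color and the ec (edge color) argument changes the color of the bar edges; the
following call uses light blue and red. The lw argument additionally sets the line width of the edges: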
plt.hist(x,color='lightblue',ec='red',lw=5)
plt.show()
mens_age = 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
female_age = 22, 28, 30, 30, 12, 33, 41, 22, 43, 18
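The two tuples above appear without a plotting call. A minimal sketch, assuming the intent was to compare both age groups in a single histogram; the number of bins, the colors and the labels are assumptions.

```python
import matplotlib.pyplot as plt

mens_age = 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
female_age = 22, 28, 30, 30, 12, 33, 41, 22, 43, 18

# Passing a list of datasets draws the bars of each dataset side by side in every bin
counts, bin_edges, patches = plt.hist(
    [mens_age, female_age],
    bins=5,
    color=['steelblue', 'salmon'],
    label=['Men', 'Women'],
)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.legend()
plt.show()
```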
Scatter Plot
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values on the x-axis and one for the values on the y-axis:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
Compare Plots
In the example above, x holds the age of each car and y its speed, and there seems to be a
relationship between the two. But what if we plot the observations from another day as well? Will the
scatter plot tell us something else?
plt.show()
By comparing the two plots, it is safe to say that they both give us the same conclusion: the
newer the car, the faster it drives.
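The code for this comparison is not preserved in these notes. A minimal sketch follows: the first day reuses the arrays from the scatter example, while the second day's values are made up for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Day one: the arrays used above (x = car age, y = speed)
x1 = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y1 = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# Day two: made-up values for illustration only
x2 = np.array([2, 2, 8, 1, 15, 8, 12, 9, 7, 3, 11, 4, 7])
y2 = np.array([100, 105, 84, 105, 90, 99, 90, 95, 94, 100, 79, 112, 91])

# Two scatter() calls on the same axes get different default colors
plt.scatter(x1, y1, label='Day 1')
plt.scatter(x2, y2, label='Day 2')
plt.xlabel('Car age')
plt.ylabel('Speed')
plt.legend()
plt.show()

# Correlation coefficients: negative values mean older cars tend to be slower
r1 = np.corrcoef(x1, y1)[0, 1]
r2 = np.corrcoef(x2, y2)[0, 1]
print(round(r1, 2), round(r2, 2))
```

The correlation coefficients put a number on what the plot suggests visually.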
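You can also set a specific color for each dot by passing an array of colors as the c argument:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# The source line was cut off after "beige"; the last four colors are assumed so that there is one color per point
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])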
plt.scatter(x, y, c=colors)
plt.show()
Colormap
The Matplotlib module has a number of available colormaps.
A colormap is like a list of colors, where each color corresponds to a value. In the example below
the values range from 0 to 100.
The colormap used here is called 'viridis'; it runs from a purple color at the lowest value (0) up to
a yellow color at the highest value (100).
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')  # other colormaps include 'PuBu', 'Spectral', 'flag' and 'winter'
plt.colorbar() # adds the colorbar also
plt.show()
You can combine a colormap with different sizes of the dots. This is best visualized if the dots are
transparent:
Create random arrays with 100 values for x-points, y-points, colors and sizes:
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
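plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='viridis')  # alpha=0.5 makes the dots semi-transparent (colormap choice assumed)
plt.colorbar()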
plt.show()
Stem and Leaf plot
A stem-and-leaf plot is a data visualization tool used to represent the distribution of a dataset while
preserving individual data points. It provides a way to organize and display the values in a dataset,
making it easier to understand the data's distribution and characteristics. Stem-and-leaf plots are
particularly useful for small to moderate-sized datasets.
Let's consider a dataset representing the tensile strengths (in MPa) of steel samples tested in a
mechanical engineering laboratory. The dataset is: [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
 6 | 3 8
 7 | 1 2 5 6 8
 8 | 0 2 5 8
 9 | 2 4 7
10 | 2
In this example:
The stems (left column) represent all digits except the last one, which here is the tens place (10 for the value 102).
The leaves (right column) represent the ones place of each data point.
For instance, the first row indicates that there are two data points with a tensile strength in the 60s:
63 and 68. The second row shows five data points in the 70s, and so on.
This stem-and-leaf plot helps mechanical engineers quickly visualize the distribution of tensile
strengths in the dataset and identify patterns or outliers.
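The text display above can be generated with a few lines of plain Python. A minimal sketch, assuming two-or-more-digit integer data where the stem is value // 10 and the leaf is value % 10; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (stem = value // 10, leaf = value % 10)."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    width = len(str(max(groups)))  # right-align stems of different lengths
    return [f"{stem:>{width}} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(groups.items())]

strengths = [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
rows = stem_and_leaf(strengths)
print("\n".join(rows))
```

Stems that have no data points are simply left out in this sketch; a stricter display would print them with an empty leaf list.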
values = [63,68,71,72,75,76,78,80,82,85,88,92,94,97,102]
stems = [value//10 for value in values]
leaves= [value%10 for value in values]
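# plt.stem draws a stem plot (a vertical line with a marker for each point), here leaf against stem
plt.stem(stems, leaves)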
plt.title('Stem-and-Leaf Plot for Tensile Strengths')
plt.xlabel('Stems')
plt.ylabel('Leaves')
plt.show()
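# Data
data = [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
# Group the leaves by stem (the grouping and figure setup were missing in the source and are reconstructed here)
groups = {}
for value in data:
    groups.setdefault(value // 10, []).append(value % 10)
fig, ax = plt.subplots()
for stem, stem_leaves in groups.items():
    leaf_values = [stem * 10 + leaf for leaf in stem_leaves]
    ax.scatter([stem] * len(stem_leaves), leaf_values, label=f'Stem {stem}', s=50)
ax.legend()
plt.show()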
import matplotlib.pyplot as plt
data = [234, 345, 567, 789, 134, 456, 678, 912, 123, 456]
fig, ax = plt.subplots()
markerline, stemlines, baseline = ax.stem(data)
# Customize plot
# The source line is cut off after markeredgecolor='b; it is closed here using 'b' (blue)
plt.setp(markerline, marker='o', markersize=8, markerfacecolor='red', markeredgecolor='b')
plt.setp(stemlines, linestyle='dashed', color='blue')
plt.setp(baseline, linestyle='dotted', color='green')
plt.show()
Boxplot
What is a Box Plot?
A box plot is a way to visualize the distribution of data by using a box and some vertical lines
(whiskers). It is also known as a box-and-whisker plot. The distribution is summarized by five key
values, which are as follows:
Minimum: smallest data point not below Q1 - 1.5*IQR
1st quartile (Q1): 25th percentile
Median: 50th percentile
3rd quartile (Q3): 75th percentile
Maximum: largest data point not above Q3 + 1.5*IQR
Here IQR is the interquartile range, the distance between the first quartile (Q1) and the third
quartile (Q3): IQR = Q3 - Q1. Data points beyond the minimum and maximum are drawn individually as
outliers.
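These quantities can be computed directly with NumPy. The sketch below uses an illustrative random sample (assumed). Note that with its default settings Matplotlib draws the whiskers to the most extreme data points that still lie within the Q1 - 1.5*IQR and Q3 + 1.5*IQR limits, and marks everything outside as outliers.

```python
import numpy as np

np.random.seed(10)
data = np.random.normal(100, 10, 220)   # illustrative sample (assumed)

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                           # interquartile range

# Limits beyond which points are drawn as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# The whiskers end at the most extreme data points inside those limits
whisker_low = data[data >= lower_limit].min()
whisker_high = data[data <= upper_limit].max()
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}, outliers: {len(outliers)}")
```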
Syntax
matplotlib.pyplot.boxplot(data, notch=None, vert=None, patch_artist=None, widths=None)
np.random.seed(10)
dataSet1 = np.random.normal(100, 10, 220)
dataSet2 = np.random.normal(80, 20, 200)
dataSet3 = np.random.normal(60, 35, 220)
dataSet4 = np.random.normal(50, 40, 200)
dataSet = [dataSet1, dataSet2, dataSet3, dataSet4]
plt.boxplot(dataSet)  # one box per data set
plt.show()
Shear Force and Bending Moment diagram
import numpy as np
import matplotlib.pyplot as plt
P = float(input('load = '))
u1 = input('load unit = ')
L = float(input('Length of the beam = '))
u2 = input('length unit = ')
a = float(input('Distance of Point load from left end = '))
b = L - a
R1 = P*b/L
R2 = P - R1
R1 = round(R1, 3)
R2 = round(R2, 3)
l = np.linspace(0, L, 1000)
X = []
SF = []
M = []
maxBM= float()
for x in l:
    if x <= a:
        m = R1*x
        sf = R1
    elif x > a:
        m = R1*x - P*(x-a)
        sf = -R2
    M.append(m)
    X.append(x)
    SF.append(sf)
plt.plot(X, SF)
plt.title("SFD")
plt.xlabel("Length in m")
plt.ylabel("Shear Force")
plt.show()
plt.plot(X, M)
plt.title("BMD")
plt.xlabel("Length in m")
plt.ylabel("Bending Moment")
plt.show()
load = 10
load unit = Kn
Length of the beam = 100
length unit = m
Distance of Point load from left end = 50
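The same calculation can also be written as a function without input() calls, which makes it easier to reuse and to check. This is a minimal sketch using NumPy's where() in place of the loop; the function name simply_supported_point_load and its parameter n are hypothetical.

```python
import numpy as np

def simply_supported_point_load(P, L, a, n=1000):
    """Shear force and bending moment for a point load P at distance a
    from the left support of a simply supported beam of length L."""
    R1 = P * (L - a) / L          # left reaction, from moments about the right end
    R2 = P - R1                   # right reaction, from vertical equilibrium
    x = np.linspace(0, L, n)
    shear = np.where(x <= a, R1, -R2)
    moment = np.where(x <= a, R1 * x, R1 * x - P * (x - a))
    return R1, R2, x, shear, moment

# Same values as the sample run above: P = 10, L = 100, a = 50
R1, R2, x, shear, moment = simply_supported_point_load(10, 100, 50)
print(R1, R2)
print(round(float(moment.max()), 3), "at x =", round(float(x[moment.argmax()]), 3))
```

Because the beam is sampled at a finite number of points, the largest sampled bending moment lies slightly below the exact value P*a*b/L whenever the load position does not coincide with a sample point.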
Shear Force and Bending Moment diagram (detailed)
import numpy as np
import matplotlib.pyplot as plt
P = float(input('load = '))
u1 = input('load unit = ')
L = float(input('Length of the beam = '))
u2 = input('length unit = ')
a = float(input('Distance of Point load from left end = '))
b = L - a
R1 = P*b/L
R2 = P - R1
R1 = round(R1, 3)
R2 = round(R2, 3)
print(f'''
As per the static equilibrium, net moment sum at either end is zero,
hence Reaction R1 = P*b/L = {R1} {u1},
Also Net sum of vertical forces is zero,
hence R1+R2 = P, R2 = P - R1 = {R2} {u1}.
''')
l = np.linspace(0, L, 1000)
X = []
SF = []
M = []
maxBM= float()
for x in l:
    if x <= a:
        m = R1*x
        sf = R1
    elif x > a:
        m = R1*x - P*(x-a)
        sf = -R2
    M.append(m)
    X.append(x)
    SF.append(sf)
print(f'''
Shear Force at x (x<{a}), Vx = R1 = {R1} {u1}
at x (x>{a}), SF = R1 - P = {R1} - {P} = {round(R1-P, 3)} {u1}
''')
# Find the largest bending moment among the sampled points
for k in M:
    if maxBM < k:
        maxBM = k
print(f'maximum BM, Mmax = {round(maxBM, 3)} {u1}{u2}')
# Find the position(s) along the beam where that maximum occurs
Mx = float()
for x in l:
    if x < a:
        Mx = R1*x
        if maxBM == Mx:
            print(f'maximum BM at x = {round(x,3)} {u2}')
    elif x >= a:
        Mx = R1*x - P*(x - a)
        if maxBM == Mx:
            print(f'maximum BM at x = {round(x,3)} {u2}')
plt.plot(X, SF)
plt.plot([0, L], [0, 0])
plt.plot([0, 0], [0, R1], [L, L], [0, -R2])
plt.title("SFD")
plt.xlabel("Length in m")
plt.ylabel("Shear Force")
plt.show()
plt.plot(X, M)
plt.plot([0, L], [0, 0])
plt.title("BMD")
plt.xlabel("Length in m")
plt.ylabel("Bending Moment")
plt.show()
load = 10
load unit = 12
Length of the beam = 4
length unit = 1
Distance of Point load from left end = 1
As per the static equilibrium, net moment sum at either end is zero,
hence Reaction R1 = P*b/L = 7.5 12,
Also Net sum of vertical forces is zero,
hence R1+R2 = P, R2 = P - R1 = 2.5 12.
Assignment 6
(Students are advised to use their imagination and are
allowed to modify the programs based on their interests without
compromising the overall theme and purpose of the
assignment.)
1. Given a set of temperature values for each day, create a line plot using Matplotlib. Customize the
plot with a title, xlabel, ylabel, and a blue line.
2. Plot two different datasets (e.g. the test scores of 20 students and their deviations from the mean
value) in a 1x2 grid of subplots. Use a line plot for the first subplot and a scatter plot for the second
subplot. Customize each subplot with appropriate titles and labels.
3. Create a bar chart representing the quantities sold of different biscuit brands in a store.
Customize the bar colors to be different shades of green, and add a legend to the plot.
4. Represent the percentage distribution of expenses in a monthly budget using a pie chart.
Customize the colors and explode the slice corresponding to the highest expense category.
5. Generate random weights of 500 people in kg and create a histogram of these weights.
Customize the color of the bars and add a title and labels to the plot. Also draw conclusions from
these data.
6. Take any unique example/problem from any of your subjects. Write a Python program to get
the results of the problem and plot your results.