Unit 6 Data Visualization-1

Unit 6 Data Visualization
“A picture is worth a thousand words”
Data visualization is the process of representing data with visual elements such as charts and graphs
in order to derive meaningful insights from it. It aims to reveal the information behind the data and
helps the viewer see the structure in the data.
Need for visualizing data:

Understand the trends and patterns in the data
Analyze the frequency and other such characteristics of the data
Know the distribution of the variables in the data
Visualize the relationships that may exist between different variables
Overview of Python Visualization Tools

Python is a popular language among data scientists, and it offers multiple options for data
visualization. Several tools help us visualize data effectively. These Python data visualization
tools include:
• Matplotlib (we will focus primarily on this library)
• Seaborn
• pandas
• Bokeh
• Plotly
• ggplot
• pygal
Introduction to Matplotlib
Matplotlib is the foundational plotting library for the Python programming language and the most
widely used of the Python visualization packages. It handles a wide range of plotting tasks and can
produce publication-quality figures. It can export figures to common formats such as PDF, SVG, EPS,
PNG and JPG. It can create popular visualization types such as line plots, scatter plots, histograms,
bar charts, error bar charts, pie charts and box plots, among many others. Matplotlib also supports
3D plotting. Many Python libraries are built on top of Matplotlib; for example, the plotting functions
of pandas and Seaborn use Matplotlib underneath and give access to its functionality with less code.
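A minimal sketch of exporting a figure with savefig(), which infers the output format from the file extension. The file names are arbitrary, and the Agg backend is selected only so the sketch runs without a display; the exact list printed by get_supported_filetypes() depends on the installed Matplotlib version and backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("Squares")

# The output format is taken from each file extension
for name in ["squares.png", "squares.pdf", "squares.svg"]:
    fig.savefig(name, dpi=150)  # dpi only affects raster formats such as PNG

# Dictionary of formats this canvas can write, keyed by extension
supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))
plt.close(fig)
```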
Import Matplotlib
Before we can start using Matplotlib, we need to import it. We can import Matplotlib as
follows:
import matplotlib
Most of the time, we work with the pyplot interface of Matplotlib, so we import the pyplot
interface as follows:
import matplotlib.pyplot
To make things even simpler, we use the standard shorthand for the Matplotlib import:
import matplotlib.pyplot as plt
# Import dependencies
import numpy as np
# Import Matplotlib
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.show()
You may be wondering why the x-axis ranges from 0 to 3 and the y-axis from 1 to 4. If you provide a
single list or array to plot(), Matplotlib assumes it is a sequence of y values and automatically
generates the x values for you. Since Python ranges start at 0, the default x vector has the same
length as y but starts at 0; therefore, the x data are [0, 1, 2, 3].
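To confirm which x values Matplotlib generated, you can inspect the Line2D object that plot() returns. A small sketch (variable names are illustrative; Agg is selected only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plot() returns a list of Line2D objects; unpack the single line
(line,) = plt.plot([1, 2, 3, 4])
x_auto = line.get_xdata()  # x values generated by Matplotlib
y_data = line.get_ydata()  # the y values we passed in
print(x_auto)
print(y_data)
plt.close()
```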
plot() is a versatile function and accepts an arbitrary number of arguments. For example, to plot x
versus y, you can write:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()
Formatting the style of your plot
For every x, y pair of arguments, there is an optional third argument, the format string, which
indicates the color and line type of the plot. The letters and symbols of the format string come from
MATLAB, and you concatenate a color string with a line style string. The default format string is
'b-', which is a solid blue line. For example, to plot the above with red circles, you would issue:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.show()
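The format string is shorthand for keyword arguments. The sketch below, assuming the same data, draws 'ro' both ways and checks that the two lines share color, marker and line style; color, marker and linestyle are standard Line2D properties, and same_color comes from matplotlib.colors.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# 'r' = red, 'o' = circle markers; no line-style characters, so no connecting line
(short_line,) = plt.plot(x, y, 'ro')
# The same styling spelled out with keyword arguments
(long_line,) = plt.plot(x, y, color='red', marker='o', linestyle='none')

print(short_line.get_marker(), long_line.get_marker())        # o o
print(short_line.get_linestyle(), long_line.get_linestyle())  # None None
same = same_color(short_line.get_color(), long_line.get_color())
print(same)                                                    # True
plt.close()
```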
# Let's understand formatting the plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.axis((0, 6, 0, 20))  # axis limits as (xmin, xmax, ymin, ymax)
plt.xlabel("Number")
plt.ylabel("Square of the number")
plt.show()
# Using NumPy linspace() to create 100 evenly spaced values from 0 to 10
x1 = np.linspace(0, 10, 100)
plt.plot(x1, np.sin(x1), 'b--')  # blue dashed line
plt.plot(x1, np.cos(x1), 'ro')   # red circles
plt.show()
x1 = np.linspace(0, 10, 100)
plt.plot(x1, np.cos(x1), 'ro')
plt.ylabel('Trigonometric values')  # y label
plt.xlabel('x values')  # x label
plt.axis((0, 15, -1.5, 1.5))
plt.show()
x1 = np.linspace(0, 10, 100)
plt.plot(x1, np.cos(x1), 'bs')  # blue squares
plt.plot(x1, np.tan(x1), 'g^')  # green triangles
plt.ylabel('Trigonometric values')  # y label
plt.xlabel('x values')  # x label
plt.axis((0, 15, -1.5, 1.5))  # keep the huge tan(x) values near its asymptotes out of view
plt.show()
Subplots
Creating multiple subplots using plt.subplot2grid
import matplotlib.pyplot as plt
# data to display on plots
x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]
# adding the subplots: a 7x1 grid in which each plot spans 2 rows
axes1 = plt.subplot2grid((7, 1), (0, 0), rowspan=2, colspan=1)
axes2 = plt.subplot2grid((7, 1), (2, 0), rowspan=2, colspan=1)
axes3 = plt.subplot2grid((7, 1), (4, 0), rowspan=2, colspan=1)
# plotting the data
axes1.plot(x, y)
axes2.plot(x, z)
axes3.plot(z, y)
plt.show()
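Because plt.subplots() creates the figure and a whole grid of axes in one call, it is often simpler than placing each axes with subplot2grid(). A sketch of the same three plots with it; the titles, figure size and sharex=True are illustrative choices, and the figure is saved to a file only because the sketch uses the headless Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [3, 1, 3]
y = [3, 2, 1]
z = [1, 3, 1]

# One call creates the figure and a 3x1 grid of axes (a NumPy array of Axes)
fig, axes = plt.subplots(3, 1, figsize=(5, 6), sharex=True)
axes[0].plot(x, y)
axes[0].set_title("x vs y")
axes[1].plot(x, z)
axes[1].set_title("x vs z")
axes[2].plot(z, y)
axes[2].set_title("z vs y")
fig.tight_layout()  # reduce overlap between titles and tick labels
fig.savefig("subplots.png")  # use plt.show() instead in an interactive session
```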
Pie Chart
Creating Pie Charts
With Pyplot, you can use the pie() function to draw pie charts:
import matplotlib.pyplot as plt
import numpy as np
y = np.array([75, 55, 45, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels=mylabels)
plt.show()
# By default, the first wedge starts from the x-axis and the wedges are drawn counterclockwise
import matplotlib.pyplot as plt
import numpy as np
y = np.array([75, 55, 45, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]  # offset the first wedge by 20% of the radius
plt.pie(y, labels=mylabels, explode=myexplode)
plt.legend()
# plt.legend(title = "Four Fruits:")
plt.show()
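pie() divides each value by the total, so the wedge sizes are y / y.sum(). The autopct parameter prints that share on each wedge, and startangle rotates where the first wedge begins. A sketch using the fruit data above; startangle=90 is an arbitrary choice, and Agg is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

y = np.array([75, 55, 45, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

# autopct formats 100 * y / y.sum() for each wedge; startangle rotates the first wedge
wedges, texts, autotexts = plt.pie(
    y, labels=mylabels, autopct='%1.1f%%', startangle=90, explode=[0.2, 0, 0, 0]
)
percentages = [t.get_text() for t in autotexts]
print(percentages)  # ['39.5%', '28.9%', '23.7%', '7.9%']
plt.close()
```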
Bar Graph
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C']
sales = [50, 75, 30]
plt.bar(categories, sales)
plt.xlabel('Products')
plt.ylabel('Sales (in lakh rupees)')
plt.title('Monthly Sales of Products')
plt.show()
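To show the exact value on each bar, Axes.bar_label() (available since Matplotlib 3.4) can annotate the container returned by bar(). A sketch with the sales data above; the green color is an illustrative choice, and Agg is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
sales = [50, 75, 30]

fig, ax = plt.subplots()
bars = ax.bar(categories, sales, color='tab:green')
# bar_label writes each bar's height just above the bar
labels = ax.bar_label(bars)
ax.set_xlabel('Products')
ax.set_ylabel('Sales (in lakh rupees)')
ax.set_title('Monthly Sales of Products')
texts = [t.get_text() for t in labels]
print(texts)  # ['50', '75', '30']
plt.close(fig)
```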

import matplotlib.pyplot as plt
import numpy as np
months = ['Jan', 'Feb']
sales_A = [50, 60]
sales_B = [75, 80]
bar_width = 0.35
index = np.arange(len(months))
plt.bar(index, sales_A, width=bar_width, label='Product A')
# shift the second group of bars right by one bar width
plt.bar(index + bar_width, sales_B, width=bar_width, label='Product B')
plt.xlabel('Months')
plt.ylabel('Sales (in lakh rupees)')
plt.title('Comparison of Product Sales in Different Months')
# center each month label between its pair of bars
plt.xticks(index + bar_width / 2, months)
plt.legend()
plt.show()
import matplotlib.pyplot as plt
categories = ['Rent', 'Utilities']
expenses_A = [800, 200]
expenses_B = [600, 300]
plt.bar(categories, expenses_A, label='Product A')
# bottom= stacks the second series on top of the first
plt.bar(categories, expenses_B, bottom=expenses_A, label='Product B')
plt.xlabel('Expense Categories')
plt.ylabel('Expenses (in lakh rupees)')
plt.title('Stacked Bar Graph of Expenses')
plt.legend()
plt.show()
import matplotlib.pyplot as plt
items = ['Category 1', 'Category 2', 'Category 3']
values = [30, 50, 20]
plt.barh(items, values)  # barh() draws horizontal bars
plt.xlabel('Values (in units)')
plt.ylabel('Items')
plt.title('Horizontal Bar Graph')
plt.show()
Histogram
A histogram is a graph showing a frequency distribution: the number of observations that fall within
each given interval.
Create Histogram
In Matplotlib, we use the hist() function to create histograms.
The hist() function takes an array of numbers as its argument and uses it to create a histogram.
For simplicity, we use NumPy to randomly generate an array of 250 values centered around 170, with a
standard deviation of 10.
import numpy as np
x = np.random.normal(170, 10, 250)
print(x)
[183.09922342 158.47198312 175.52499894 171.34546011 149.88785534

159.56470407 175.44682349 181.2396453 158.56297093 173.18727414
174.69953871 167.18478837 153.80870054 181.27396457 172.13675712
166.01046458 174.32725344 162.10699526 162.95132844 178.89941727
175.64411489 170.8665408 166.00476147 177.95917728 171.92016993
166.93658669 180.65072656 163.79260598 185.75733706 163.57159514
172.65968283 168.60470941 198.8501579 174.36214452 179.76772442
192.72023056 186.84851891 181.42850929 162.47596697 179.07031734
182.71395541 181.8583888 163.9306459 180.04605611 162.19063055
162.01084112 179.05421559 161.38007381 157.79588552 156.49416731
166.89899191 195.76681103 188.08466309 165.08852754 154.25844766
158.94907383 155.13993202 162.04076262 170.20140676 185.29894705
167.02739528 148.67365023 163.38232363 165.70485569 167.45516941
159.43429386 165.9976339 174.43662629 172.36069381 185.27282789
167.68213965 157.31368469 168.47400604 159.23416703 175.56013013
167.38944371 182.72932947 174.95081787 154.90093681 167.12293982
166.55157356 186.3610084 169.46965055 176.50978197 154.13995104
175.22405817 179.69354423 166.91938084 172.1491547 176.32149502
204.65354636 162.62972983 181.76868235 156.31378402 195.35431062
172.94749475 158.32778165 177.26250969 164.0935331 177.8875382
164.60661007 165.63484377 155.89970644 176.47771135 161.86902219
173.61766058 157.95198704 157.14352749 178.26186679 164.83154356
166.96897224 155.65710978 164.60200619 176.34798453 181.06624493
183.42561355 165.66158156 171.86133988 165.41520111 175.36829092
161.93246144 168.69913012 156.02366327 171.75563959 175.99002216
170.57267865 169.43420891 157.26223747 149.07753452 174.93913773
166.14481354 163.95705109 159.35316727 184.97482074 170.85476505
176.21552507 171.64229271 159.62940411 161.38525383 170.67697066
161.89089923 186.2608644 171.91884107 173.48876937 157.81039373
184.78008936 185.80959619 169.39143956 168.29420363 176.57542034
176.2254719 159.41281223 181.7337193 165.66044815 170.72175371
165.68048102 177.81198526 176.93231681 159.9265245 158.74013665
167.87137101 151.79520925 153.21372632 175.49296746 179.68409119
158.00230788 181.81721321 175.3203737 173.86345701 173.93009929
181.94979427 177.26941513 187.5758652 165.95205938 161.049753
172.6513295 150.93748733 168.79651696 166.90160227 161.68716252
155.5251024 151.14381892 176.30305962 186.64724108 164.51865578
157.1455274 175.07455273 167.35577923 160.39922456 173.83451448
170.57168026 172.17677421 164.92012288 187.9599084 161.79552344
171.40614997 163.85785695 168.33643195 182.78802747 159.51505694
152.65910541 159.51103347 170.22888698 160.28019459 180.3423551
183.49002682 162.50517697 152.82548101 177.45535156 177.04170488
187.99181653 169.76546548 163.2826254 172.20881049 169.98952635
163.9895048 163.147431 171.65594767 173.47040592 146.20548404
191.88201366 175.04690238 168.67270485 187.11156603 185.02337922
172.62821138 182.68769593 161.2801714 160.62782379 164.93499965
167.09778167 166.16402336 176.00604555 168.75521918 174.03653925
186.03473924 169.76381587 177.58509847 162.11021287 177.56048256
179.29680846 154.95403898 157.67692655 191.40913216 192.94344297
180.0511746 181.80371526 155.76259732 144.02803786 187.92449938]

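import matplotlib.pyplot as plt
plt.hist(x)
plt.show()
You can read from the histogram that there are approximately (exact counts differ from run to run,
because the data are random):
2 people from 140 to 145 cm
5 people from 145 to 150 cm
15 people from 151 to 156 cm
31 people from 157 to 162 cm
46 people from 163 to 168 cm
53 people from 168 to 173 cm
45 people from 173 to 178 cm
28 people from 179 to 184 cm
21 people from 185 to 190 cm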
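To get the exact numbers behind the bars, np.histogram() applies the same default binning as plt.hist() (10 equal-width bins from the minimum to the maximum value). A sketch, assuming a seeded generator (np.random.default_rng is swapped in for np.random.normal so the counts are reproducible; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seeded so the counts are reproducible
x = rng.normal(170, 10, 250)

# Same default binning as plt.hist(x): 10 equal-width bins spanning min(x)..max(x)
counts, edges = np.histogram(x, bins=10)
for n, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{n:3d} people from {lo:.0f} to {hi:.0f} cm")
print("total:", counts.sum())
```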

Modifying a Matplotlib Histogram Color

By default, Matplotlib creates a histogram with a blue fill color and no edge color.
However, we can use the following syntax to change the fill color to light blue and the edge color to
red. The lw argument sets the line width of the bar edges:

import matplotlib.pyplot as plt
plt.hist(x, color='lightblue', ec='red', lw=5)
plt.show()
import matplotlib.pyplot as plt
mens_age = 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
female_age = 22, 28, 30, 30, 12, 33, 41, 22, 43, 18
# passing a list of datasets draws their bars side by side in each bin
plt.hist([mens_age, female_age], color=['Black', 'Red'], label=['Male', 'Female'])
plt.xlabel('Age')
plt.ylabel('Person Count')
plt.legend()
plt.show()
Scatter Plots
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values on the x-axis and one for the values on the y-axis:

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color='gold')
plt.xlabel('Age of the car (years)')
plt.ylabel('Speed of the car')
plt.show()
Compare Plots
In the example above, there seems to be a relationship between speed and age, but what if we plot
the observations from another day as well? Will the scatter plot tell us something else?

import matplotlib.pyplot as plt
import numpy as np
# day one: the age and speed of 13 cars
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color='lightblue', label='Day one')
# day two: the age and speed of 15 cars
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color='red', label='Day two')
plt.legend()
plt.show()
Comparing the two sets of points, it seems safe to say that they both lead to the same conclusion: the
newer the car, the faster it drives.
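The visual impression can be checked numerically with the Pearson correlation coefficient, which is negative when one variable tends to decrease as the other increases. A sketch using np.corrcoef on the two days of data above:

```python
import numpy as np

# Day one and day two observations (age in years, speed)
x1 = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y1 = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
x2 = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y2 = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y)
# r = -1: perfect decreasing line, r = 0: no linear trend
r1 = np.corrcoef(x1, y1)[0, 1]
r2 = np.corrcoef(x2, y2)[0, 1]
print(f"day one r = {r1:.2f}, day two r = {r2:.2f}")
```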
Color each dot

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# one color name per point (13 points, 13 colors)
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige",
                   "brown","gray","cyan","magenta"])
plt.scatter(x, y, c=colors)
plt.show()
Colormap
The Matplotlib module has a number of available colormaps.
A colormap is like a list of colors, where each color corresponds to a value between 0 and 1. By
default, scatter() linearly rescales the values passed as c so that the smallest maps to 0 and the
largest to 1; in this example the values run from 0 to 100.
This colormap is called 'viridis', and as you can see it ranges from purple at the lowest value (0)
up to yellow at the highest value (100).

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')  # instead of 'viridis' you can also use 'Accent',
# 'PuBu', 'Spectral', 'flag', 'winter', etc.
plt.colorbar()  # adds the colorbar also
plt.show()
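To see which color a given value receives, you can apply the normalization and the colormap by hand. A sketch assuming a fixed range of 0 to 100; Normalize and get_cmap are standard Matplotlib calls, and the RGB values come from the 'viridis' colormap.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap('viridis')
# scatter() rescales c linearly to 0..1; by default vmin/vmax are the data min and max,
# which are 0 and 100 for the colors array above
norm = Normalize(vmin=0, vmax=100)

rgba = {}
for value in [0, 50, 100]:
    rgba[value] = cmap(norm(value))  # (red, green, blue, alpha), each in 0..1
    r, g, b, _ = rgba[value]
    print(f"{value:3d} -> normalized {float(norm(value)):.2f} -> RGB ({r:.2f}, {g:.2f}, {b:.2f})")
```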
You can combine a colormap with different sizes of the dots. This is best visualized if the dots are
transparent:
Create random arrays with 100 values for x-points, y-points, colors and sizes:

import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')
plt.colorbar()
plt.show()
Stem-and-Leaf Plot
A stem-and-leaf plot is a data visualization tool used to represent the distribution of a dataset while
preserving the individual data points. It organizes and displays the values in a dataset, making it
easier to understand the data's distribution and characteristics. Stem-and-leaf plots are
particularly useful for small to moderate-sized datasets.
Let's consider a dataset representing the tensile strengths (in MPa) of steel samples tested in a
mechanical engineering laboratory. The dataset is: [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
Now, let's create a stem-and-leaf plot for this dataset:
 6 | 3 8
 7 | 1 2 5 6 8
 8 | 0 2 5 8
 9 | 2 4 7
10 | 2
In this example:
The stems (left column) represent all digits except the last one, i.e. the tens (and hundreds) of each data point.
The leaves (right column) represent the ones place of each data point.
For instance, the first row indicates that there are two data points with a tensile strength in the 60s:
63 and 68. The second row shows five data points in the 70s, and so on.
This stem-and-leaf plot helps mechanical engineers quickly visualize the distribution of tensile
strengths in the dataset and identify patterns or outliers.
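The text display above can be produced directly in Python. A minimal sketch, assuming non-negative integer data; the function name stem_and_leaf is hypothetical, and empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines: stem = all digits but the last, leaf = last digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    # walk every stem from smallest to largest, including stems with no leaves
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

data = [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
for line in stem_and_leaf(data):
    print(line)
```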

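import matplotlib.pyplot as plt
values = [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
stems = [value // 10 for value in values]   # all digits except the last
leaves = [value % 10 for value in values]   # last digit
# plt.stem() draws a stem plot (one vertical line per point), which only approximates
# a stem-and-leaf display: points that share a stem overlap on the same vertical line
plt.stem(stems, leaves)
plt.title('Stem-and-Leaf Plot for Tensile Strengths')
plt.xlabel('Stems')
plt.ylabel('Leaves')
plt.show()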
A similar plot with some refinements

import matplotlib.pyplot as plt
# Data
data = [63, 68, 71, 72, 75, 76, 78, 80, 82, 85, 88, 92, 94, 97, 102]
# Stems and their leaves for the data above
stems = [6, 7, 8, 9, 10]
leaves = [[3, 8], [1, 2, 5, 6, 8], [0, 2, 5, 8], [2, 4, 7], [2]]
# Create the plot: each data value is drawn as a dot above its stem
fig, ax = plt.subplots()
ax.set_title('Stem-and-Leaf Plot for Tensile Strengths')
ax.set_xlabel('Stems')
ax.set_ylabel('Tensile strength (MPa)')
for i in range(len(stems)):
    stem = stems[i]
    leaf_values = [stem * 10 + leaf for leaf in leaves[i]]  # rebuild the full values
    ax.scatter([stem] * len(leaves[i]), leaf_values, label=f'Stem {stem}', s=50)
ax.legend()
plt.show()
import matplotlib.pyplot as plt
data = [234, 345, 567, 789, 134, 456, 678, 912, 123, 456]
fig, ax = plt.subplots()
# ax.stem() returns the marker line, the stem lines and the baseline so each can be styled
markerline, stemlines, baseline = ax.stem(data)
# Customize plot
plt.setp(markerline, marker='o', markersize=8, markerfacecolor='red', markeredgecolor='b')
plt.setp(stemlines, linestyle='dashed', color='blue')
plt.setp(baseline, linestyle='dotted', color='green')
ax.set_title('Customized Stem Plot')
ax.set_xlabel('Index')
ax.set_ylabel('Value')
plt.show()
Boxplot
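What is a Box Plot?
A box plot is a way to visualize the distribution of data using a box and lines (whiskers) extending
from it. It is also known as a box-and-whisker plot. It summarizes the data using the following key
values:
Minimum (lower whisker limit): Q1 - 1.5*IQR; the whisker ends at the smallest data point at or above this limit
1st quartile (Q1): 25th percentile
Median: 50th percentile
3rd quartile (Q3): 75th percentile
Maximum (upper whisker limit): Q3 + 1.5*IQR; the whisker ends at the largest data point at or below this limit
Here IQR represents the interquartile range, IQR = Q3 - Q1, which spans from the first quartile (Q1)
to the third quartile (Q3). Points beyond the whisker limits are drawn individually as outliers.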
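These quantities can be computed with NumPy. The sketch below uses a small hypothetical dataset with one obvious outlier; Matplotlib's boxplot() applies the same 1.5*IQR rule by default (its whis parameter).

```python
import numpy as np

data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])  # hypothetical data, 30 is an outlier

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences;
# anything beyond the fences is drawn as an individual outlier ("flier")
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)        # 4.0 6.0 8.0 4.0
print(lower_fence, upper_fence)   # -2.0 14.0
print(whisker_low, whisker_high)  # 2 9
print(outliers)                   # [30]
```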
Box Plot visualization
Syntax
matplotlib.pyplot.boxplot(x, notch=None, vert=None, patch_artist=None, widths=None)

import matplotlib.pyplot as plt
import numpy as np
dataSet = np.random.normal(50, 25, 100)
# print(dataSet)
figure = plt.figure(figsize=(10, 7))
plt.boxplot(dataSet)
plt.show()
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
dataSet1 = np.random.normal(100, 10, 220)
# dataSet2-4 were not defined in the original cell; these parameters are assumed for illustration
dataSet2 = np.random.normal(90, 20, 200)
dataSet3 = np.random.normal(80, 30, 200)
dataSet4 = np.random.normal(70, 40, 200)
dataSet = [dataSet1, dataSet2, dataSet3, dataSet4]
figure = plt.figure(figsize=(10, 7))
ax = figure.add_subplot(111)
# filled boxes, a notch around each median, horizontal orientation
bp = ax.boxplot(dataSet, patch_artist=True, notch=True, vert=False)
colors = ['#00FF00', '#0F00FF', '#F00FF0', '#FFFF0F']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
for whisker in bp['whiskers']:
    whisker.set(color='#8E008B', linewidth=1.4, linestyle=":")
for cap in bp['caps']:
    cap.set(color='#8E008B', linewidth=2.1)
for median in bp['medians']:
    median.set(color='blue', linewidth=3)
for flier in bp['fliers']:
    flier.set(marker='D', color='#d7298c', alpha=0.6)
ax.set_yticklabels(['dataSet1', 'dataSet2', 'dataSet3', 'dataSet4'])
plt.title("Customized box plot using attributes")
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
plt.show()
Let's use Matplotlib to solve mechanical engineering problems
Shear Force and Bending Moment diagram (simple)

import numpy as np
import matplotlib.pyplot as plt
# Read the problem data interactively
P = float(input('load = '))
u1 = input('load unit = ')
L = float(input('Length of the beam = '))
u2 = input('length unit = ')
a = float(input('Distance of Point load from left end = '))
b = L - a
# Support reactions of a simply supported beam with a point load
R1 = P*b/L
R2 = P - R1
R1 = round(R1, 3)
R2 = round(R2, 3)
# Shear force and bending moment at 1000 points along the beam
l = np.linspace(0, L, 1000)
X = []
SF = []
M = []
for x in l:
    if x <= a:
        m = R1*x             # left of the load
        sf = R1
    else:
        m = R1*x - P*(x-a)   # right of the load
        sf = -R2
    M.append(m)
    X.append(x)
    SF.append(sf)
plt.plot(X, SF)
plt.title("SFD")
plt.xlabel(f"Length ({u2})")
plt.ylabel("Shear Force")
plt.show()
plt.plot(X, M)
plt.title("BMD")
plt.xlabel(f"Length ({u2})")
plt.ylabel("Bending Moment")
plt.show()
load = 10
load unit = Kn
Length of the beam = 100
length unit = m
Distance of Point load from left end = 50
Shear Force and Bending Moment diagram (detailed)
import numpy as np
import matplotlib.pyplot as plt
# Read the problem data interactively
P = float(input('load = '))
u1 = input('load unit = ')
L = float(input('Length of the beam = '))
u2 = input('length unit = ')
a = float(input('Distance of Point load from left end = '))
b = L - a
# Support reactions from static equilibrium
R1 = P*b/L
R2 = P - R1
R1 = round(R1, 3)
R2 = round(R2, 3)
print(f'''
As per static equilibrium, the net moment about either support is zero,
hence reaction R1 = P*b/L = {R1} {u1}.
Also, the net sum of vertical forces is zero,
hence R1 + R2 = P, so R2 = P - R1 = {R2} {u1}.
''')
# Shear force and bending moment at 1000 points along the beam
l = np.linspace(0, L, 1000)
X = []
SF = []
M = []
for x in l:
    if x <= a:
        m = R1*x
        sf = R1
    else:
        m = R1*x - P*(x-a)
        sf = -R2
    M.append(m)
    X.append(x)
    SF.append(sf)
print(f'''
Shear Force at x (x<{a}), Vx = R1 = {R1} {u1}
at x (x>{a}), Vx = R1 - P = {R1} - {P} = {round(R1 - P, 3)} {u1}
Bending Moment at x (x<{a}), Mx = R1*x = {R1}*x
at x (x>={a}), Mx = R1*x - P*(x-{a})
= {R1}x - {P}(x-{a}) = -{R2}x + {P*a}
''')
# Maximum shear force: largest magnitude, sign kept
max_SF = max(SF, key=abs)
print(f'Maximum Shear Force Vmax = {max_SF} {u1}')
# Maximum bending moment and where it occurs on the sampled grid
maxBM = max(M)
print(f'maximum BM, Mmax = {round(maxBM, 3)} {u1}{u2}')
x_maxBM = X[M.index(maxBM)]
print(f'maximum BM at x = {round(x_maxBM, 3)} {u2}')
plt.plot(X, SF)
plt.plot([0, L], [0, 0])                      # beam axis
plt.plot([0, 0], [0, R1], [L, L], [0, -R2])   # jumps at the supports
plt.title("SFD")
plt.xlabel(f"Length ({u2})")
plt.ylabel("Shear Force")
plt.show()
plt.plot(X, M)
plt.plot([0, L], [0, 0])
plt.title("BMD")
plt.xlabel(f"Length ({u2})")
plt.ylabel("Bending Moment")
plt.show()
load = 10
load unit = 12
Length of the beam = 4
length unit = 1
Distance of Point load from left end = 1
As per static equilibrium, the net moment about either support is zero,
hence reaction R1 = P*b/L = 7.5 12.
Also, the net sum of vertical forces is zero,
hence R1 + R2 = P, so R2 = P - R1 = 2.5 12.
Shear Force at x (x<1.0), Vx = R1 = 7.5 12
at x (x>1.0), Vx = R1 - P = 7.5 - 10.0 = -2.5 12
Bending Moment at x (x<1.0), Mx = R1*x = 7.5*x
at x (x>=1.0), Mx = R1*x - P*(x-1.0)
= 7.5x - 10.0(x-1.0) = -2.5x + 10.0
Maximum Shear Force Vmax = 7.5 12
maximum BM, Mmax = 7.497 121
maximum BM at x = 1.001 1
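The grid of 1000 points never lands exactly on x = a, which is why the program reports Mmax = 7.497 at x = 1.001 instead of the exact value P*a*b/L = 10*1*3/4 = 7.5 at x = 1. A non-interactive sketch (the function names are hypothetical) that computes the reactions in closed form and inserts x = a into the sampling grid:

```python
import numpy as np

def point_load_reactions(P, L, a):
    """Reactions R1 (left) and R2 (right) of a simply supported beam, point load P at x = a."""
    b = L - a
    R1 = P * b / L
    return R1, P - R1

def sf_bm(P, L, a, n=1000):
    """Shear force and bending moment sampled along the beam, with x = a forced into the grid."""
    R1, R2 = point_load_reactions(P, L, a)
    x = np.union1d(np.linspace(0, L, n), [a])  # sorted, and includes the load position
    V = np.where(x <= a, R1, -R2)
    M = np.where(x <= a, R1 * x, R1 * x - P * (x - a))
    return x, V, M

P, L, a = 10.0, 4.0, 1.0
R1, R2 = point_load_reactions(P, L, a)
x, V, M = sf_bm(P, L, a)
print(R1, R2)                  # 7.5 2.5
print(M.max(), x[M.argmax()])  # 7.5 1.0
print(P * a * (L - a) / L)     # closed form P*a*b/L = 7.5
```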
Assignment 6
(Students are advised to use their imagination and are
allowed to modify the programs based on their interests without
compromising the overall theme and purpose of the
assignment)
1. Given a set of temperature values for each day, create a line plot using Matplotlib. Customize the
plot with a title, xlabel, ylabel, and a blue line.
2. Plot two different datasets (e.g., the test scores of 20 students and their deviations from the mean)
in a 1x2 grid of subplots. Use a line plot for the first subplot and a scatter plot for the second
subplot. Customize each subplot with appropriate titles and labels.
3. Create a bar chart representing the quantities sold of different biscuit brands in a store.
Customize the bar colors to be different shades of green, and add a legend to the plot.
4. Represent the percentage distribution of expenses in a monthly budget using a pie chart.
Customize the colors and explode the slice corresponding to the highest expense category.
5. Generate random weights (in kg) of 500 people and create a histogram of these weights.
Customize the color of the bars and add a title and labels to the plot. Also draw conclusions from
the data.
6. Take a unique example/problem from any of your subjects. Write a Python program to obtain the
results of the problem and plot them.