21css303t-Data Science Unit-3 Visualization
UNIT-3 VISUALIZATION
by
J. Arthy,
AP\CSE,
SRMIST,
Ramapuram
AGENDA
1. Introduction to Matplotlib
2. Customizing Plots
3. Seaborn Library
4. 3D Plot of Surface
INTRODUCTION TO MATPLOTLIB
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
The plt interface (matplotlib.pyplot, conventionally imported as plt) is what we will use most often, as we’ll see throughout this presentation.
SETTING STYLES
We will use the plt.style directive to choose appropriate aesthetic styles for our figures. Here we will set the classic style,
which ensures that the plots we create use the classic Matplotlib style, and then list the style names that are available:
In[2]: plt.style.use('classic')
plt.style.available
Output:
['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight',
'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette',
'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel',
'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
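Style names differ between Matplotlib versions: the seaborn-* styles were renamed to seaborn-v0_8-* in Matplotlib 3.6. The sketch below assumes only that one of the two spellings is installed. It picks whichever name is available and applies it temporarily with plt.style.context(), so the global style is left untouched. The helper name pick_style is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def pick_style(candidates):
    """Return the first style name in candidates that this Matplotlib provides."""
    for name in candidates:
        if name in plt.style.available:
            return name
    return "default"  # fallback that plt.style.use / plt.style.context accept


# Newer spelling first, then the legacy one
style = pick_style(["seaborn-v0_8-whitegrid", "seaborn-whitegrid"])

x = np.linspace(0, 10, 100)
# The style applies only to figures created inside the with-block
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title("style: " + style)
```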
PLOTTING FROM A SCRIPT
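When Matplotlib is used from within a script, the figure windows are opened by calling plt.show(). A script saved as, for
example, myplot.py is then run from the command line:
$ python myplot.py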
The plt.show() command does a lot under the hood, as it must interact with your system’s interactive
graphical backend. The details of this operation can vary greatly from system to system and even installation
to installation, but Matplotlib does its best to hide all these details from you.
One thing to be aware of: the plt.show() command should be used only once per Python session, and is
most often seen at the very end of the script. Multiple show() commands can lead to unpredictable
backend-dependent behavior, and should mostly be avoided.
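A minimal sketch of what such a script could contain. The filename myplot.py comes from the command above, while the function names build_figure and main and the plotted curves are illustrative assumptions. Note that plt.show() is called once, as the last step.

```python
# myplot.py -- illustrative script; run with: python myplot.py
import matplotlib.pyplot as plt
import numpy as np


def build_figure():
    """Create the figure without displaying it."""
    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.plot(x, np.cos(x), label="cos(x)")
    ax.legend()
    return fig


def main():
    build_figure()
    # Single call at the very end: opens the figure window(s) and,
    # on interactive backends, blocks until they are closed
    plt.show()


if __name__ == "__main__":
    main()
```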
PLOTTING FROM AN IPYTHON SHELL
It can be very convenient to use Matplotlib interactively within an IPython shell. IPython is built to work well with
Matplotlib if you specify Matplotlib mode. To enable this mode, you can use the %matplotlib magic command
after starting ipython:
In [1]: %matplotlib
Using matplotlib backend: TkAgg
At this point, any plt plot command will cause a figure window to open, and further commands can be run to
update the plot. Some changes (such as modifying properties of lines that are already drawn) will not draw
automatically; to force an update, use plt.draw(). Using plt.show() in Matplotlib mode is not required.
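As a sketch of that case, the following modifies a line that has already been drawn and then requests a redraw. The variable names are illustrative; in an IPython session with %matplotlib enabled, the open window would update after plt.draw().

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# plt.plot returns a list of Line2D objects; unpack the single line
(line,) = plt.plot(x, np.sin(x))

# Changing properties of an existing line may not redraw automatically
line.set_color("red")
line.set_linestyle("--")

# Ask Matplotlib to redraw the current figure
plt.draw()
```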
PLOTTING FROM AN IPYTHON NOTEBOOK
The IPython notebook is a browser-based interactive data analysis tool that can combine narrative, code, graphics, HTML elements, and
much more into a single executable document. Plotting interactively within an IPython notebook can be done with the %matplotlib
command, and works in a similar way to the IPython shell. In the IPython notebook, you also have the option of embedding graphics
directly in the notebook, with two possible options:
● %matplotlib notebook will lead to interactive plots embedded within the notebook
● %matplotlib inline will lead to static images of your plot embedded in the notebook
In[3]: %matplotlib inline
After you run this command (it needs to be done only once per kernel/session), any cell within the notebook that creates a plot
will embed a PNG image of the resulting graphic:
In[4]: import matplotlib.pyplot as plt
import numpy as np
# Create an array of values for x from 0 to 10
x = np.linspace(0, 10, 100)
# Create a new figure object
fig = plt.figure()
# Plot sin(x) with a solid line
plt.plot(x, np.sin(x), '-', label='sin(x)')
# Plot cos(x) with a dashed line
plt.plot(x, np.cos(x), '--', label='cos(x)')
# Add labels and a title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of sin(x) and cos(x)')
# Display a legend
plt.legend()
# Show the plot
plt.show()
SAVING FIGURES TO FILE
One nice feature of Matplotlib is the ability to save figures in a wide variety of formats. You can save a figure using the savefig()
command. For example, to save the previous figure as a PNG file, you can run this:
In[5]: fig.savefig('my_figure.png')
To confirm that it contains what we think it contains, let’s use the IPython Image object to display the contents of this file:
In[7]: from IPython.display import Image
Image('my_figure.png')
In savefig(), the file format is inferred from the extension of the given filename. Depending on what backends you have
installed, many different file formats are available. You can find the list of supported file types for your system by using the
following method of the figure canvas object:
In[8]: fig.canvas.get_supported_filetypes()
Out[8]: {'eps': 'Encapsulated Postscript',
'jpeg': 'Joint Photographic Experts Group',
'jpg': 'Joint Photographic Experts Group',
'pdf': 'Portable Document Format',
'pgf': 'PGF code for LaTeX',
'png': 'Portable Network Graphics',
'ps': 'Postscript',
'raw': 'Raw RGBA bitmap',
'rgba': 'Raw RGBA bitmap',
'svg': 'Scalable Vector Graphics',
'svgz': 'Scalable Vector Graphics',
'tif': 'Tagged Image File Format',
'tiff': 'Tagged Image File Format'}
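A short sketch of format inference, assuming the PNG, PDF, and SVG formats are available in your installation (all three appear in the list above). The output directory is a temporary folder chosen for illustration, and dpi and bbox_inches are optional savefig() keywords.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

out_dir = tempfile.mkdtemp()  # illustrative location
saved = []
for ext in ["png", "pdf", "svg"]:
    path = os.path.join(out_dir, "my_figure." + ext)
    # The output format is chosen from the file extension
    fig.savefig(path, dpi=150, bbox_inches="tight")
    saved.append(path)

supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))  # format names known to this installation
```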
INTERFACES
A potentially confusing feature of Matplotlib is its dual interfaces: a convenient MATLAB-style state-based interface, and a more
powerful object-oriented interface.
1. Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact.
The MATLAB-style tools are contained in the pyplot (plt) interface. For example, the following code will probably look
quite familiar to MATLAB users:
In[9]: plt.figure() # create a plot figure
plt.subplot(2, 1, 1) # first of two panels: (rows, columns, panel number)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2) # second panel, which becomes the current axes
plt.plot(x, np.cos(x));
2. The object-oriented interface is available for more complicated situations, and for when you want more control over
your figure. Rather than depending on some notion of an “active” figure or axes, in the object-oriented interface the
plotting functions are methods of explicit Figure and Axes objects. To re-create the previous plot using this style of
plotting, you might do the following.
In[10]: # First create a grid of plots
fig, ax = plt.subplots(2)
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x));
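The notion of an “active” axes can be made concrete with plt.gca() and plt.sca(), which get and set the current axes of the state-based interface. This is a sketch for illustration; the variable names top and bottom are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure()
top = plt.subplot(2, 1, 1)
bottom = plt.subplot(2, 1, 2)

# The most recently created subplot is the current ("active") axes
current_after_creation = plt.gca()

# plt.* commands act on the current axes, so switch back explicitly
plt.sca(top)
plt.plot(x, np.sin(x))  # drawn on the top panel

# The object-oriented call names its target directly
bottom.plot(x, np.cos(x))
```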
SIMPLE LINE PLOTS
Perhaps the simplest of all plots is the visualization of a single function y=f(x). Here we will take a first look at creating a simple
plot of this type. As with all the following sections, we’ll start by setting up the notebook for plotting and importing the functions
we will use:
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as
follows:
In[2]: fig = plt.figure()
ax = plt.axes()
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single container that
contains all the objects representing axes, graphics, text, and labels. The axes (an instance of the class
plt.Axes) is a bounding box with ticks and labels, which will eventually contain the plot elements that
make up our visualization.
Once we have created an axes, we can use the ax.plot function to plot some data. Let’s start with a simple
sinusoid:
In[3]: fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
If we want to create a single figure with multiple lines, we can simply call the plot function multiple times:
In[5]: plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));
Adjusting the Plot: Line Colors and Styles
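The line style is set with the linestyle keyword:
In[6]: plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
The axes limits are adjusted with plt.xlim() and plt.ylim():
In[7]: plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);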
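Colors are set with the color keyword, which accepts several kinds of specification, and a color code can be combined with a line style in a single format string. A brief sketch with arbitrary offsets and colors:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

ax.plot(x, np.sin(x - 0), color="blue")           # color by name
ax.plot(x, np.sin(x - 1), color="g")              # short color code
ax.plot(x, np.sin(x - 2), color="0.75")           # grayscale between 0 and 1
ax.plot(x, np.sin(x - 3), color="#FFDD44")        # hex code (RRGGBB)
ax.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values 0 to 1

# Line style and color combined in one format string
ax.plot(x, x / 10 + 1, "--c")  # dashed cyan
ax.plot(x, x / 10 + 2, ":r")   # dotted red
```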
A useful related method is plt.axis() (note here the potential confusion between axes with an e, and
axis with an i). The plt.axis() method allows you to set the x and y limits with a single call, by passing
a list that specifies [xmin, xmax, ymin, ymax]:
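plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);
The method also accepts string options; for example, plt.axis('equal') gives the x and y axes the same scale:
plt.plot(x, np.sin(x), label='sin(x)')
plt.axis('equal')
plt.legend();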
SIMPLE SCATTER PLOTS
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
In the previous section, we looked at plt.plot/ax.plot to produce line plots. It turns out that this same
function can produce scatter plots as well:
import numpy as np
# Define x values
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
plt.show()
The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as
you can specify options such as '-' and '--' to control the line style, the marker style has its own set of short string
codes. The full list of available symbols can be seen in the documentation of plt.plot, or in Matplotlib’s online
documentation. Most of the possibilities are fairly intuitive, and we’ll show a number of the more common ones here:
In[3]: rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker,
             label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);
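A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used
very similarly to the plt.plot function:
plt.scatter(x, y, marker='o');
The primary difference between plt.scatter and plt.plot is that with plt.scatter the properties of each individual point
(size, face color, edge color, etc.) can be individually controlled or mapped to data.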
Let’s show this by creating a random scatter plot with points of many colors and sizes. In order to better see the overlapping results,
we’ll also use the alpha keyword to adjust the transparency level:
In[7]: rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
cmap='viridis')
plt.colorbar();
HISTOGRAMS, BINNING AND DENSITY
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
# Plot a histogram
plt.hist(data);
# Customize the appearance
plt.hist(data, alpha=0.5,
         histtype='stepfilled', color='steelblue',
         edgecolor='none');
The plt.hist docstring has more information on other customization options available. I find this combination of
histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of several
distributions:
x1 = np.random.normal(0, 0.8, 1000) # illustrative parameters
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40) # illustrative bin count
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
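If only the bin counts are needed, without a plot, NumPy’s np.histogram() computes them directly; hist2d() extends the same binning idea to two dimensions. A sketch with synthetic data (the seed, sample sizes, and bin counts are arbitrary choices):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

# Counts per bin and the bin edges, computed without plotting
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
print(bin_edges)

# Two-dimensional histogram of correlated Gaussian samples
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

fig, ax = plt.subplots()
h, xedges, yedges, image = ax.hist2d(x, y, bins=30, cmap="Blues")
fig.colorbar(image, ax=ax, label="counts in bin")
```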
Customizing Colorbars
In[3]: x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])
plt.imshow(I)
plt.colorbar();
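The colormap is chosen with the cmap argument, and the color limits can be narrowed so that extreme values do not dominate the scale. The sketch below uses the same kind of image as above; the 'RdBu' colormap and the limits of ±0.5 are illustrative choices. extend='both' marks values beyond the limits with arrows on the colorbar.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

fig, ax = plt.subplots()
# vmin/vmax fix the data range that is mapped onto the colormap
im = ax.imshow(I, cmap="RdBu", vmin=-0.5, vmax=0.5)

# Arrows at both ends indicate values outside [vmin, vmax]
cbar = fig.colorbar(im, ax=ax, extend="both")
cbar.set_label("image value")
```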
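In[12]: # load images of the digits 0 through 5 and visualize several of them
from sklearn.datasets import load_digits
digits = load_digits(n_class=6)
fig, ax = plt.subplots(8, 8, figsize=(6, 6))
for i, axi in enumerate(ax.flat):
    axi.imshow(digits.images[i], cmap='binary')
    axi.set(xticks=[], yticks=[])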
plt.axes: Subplots
The most basic method of creating an axes is to use the plt.axes function. As we’ve seen previously, by default this creates a standard axes object that fills the entire figure. plt.axes also takes an optional argument that is a list of four numbers in the figure coordinate system. These numbers represent [left, bottom, width, height] in the figure coordinate system, which ranges from 0 at the bottom left of the figure to 1 at the top right of the figure.
For example, we might create an inset axes at the top-right corner of another axes by setting the x and y position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).
In[2]: ax1 = plt.axes() # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2]) # inset axes
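The object-oriented equivalent is fig.add_axes(), which takes the same kind of list, ordered [left, bottom, width, height]. A sketch that stacks two panels vertically; the coordinates and variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# [left, bottom, width, height] in figure coordinates (0 to 1)
ax_top = fig.add_axes([0.1, 0.5, 0.8, 0.4], ylim=(-1.2, 1.2))
ax_bottom = fig.add_axes([0.1, 0.1, 0.8, 0.4], ylim=(-1.2, 1.2))

# Hide the x tick labels of the upper panel, which touches the lower one
ax_top.tick_params(labelbottom=False)

x = np.linspace(0, 10)
ax_top.plot(np.sin(x))
ax_bottom.plot(np.cos(x))
```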
Aligned columns or rows of subplots are a common enough need that Matplotlib has several convenience
routines that make them easy to create. The lowest level of these is plt.subplot(), which creates a
single subplot within a grid. This command takes three integer arguments: the number of
rows, the number of columns, and the index of the plot to be created in this scheme, which runs from the
upper left to the bottom right:
In[4]: for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)),
             fontsize=18, ha='center')
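For a whole grid at once, plt.subplots() returns the figure together with a NumPy array of axes, and the sharex and sharey options remove redundant tick labels. A sketch of the same 2×3 layout; note that the array is indexed [row, column] starting from zero, unlike plt.subplot().

```python
import matplotlib.pyplot as plt

# 2 rows x 3 columns; panels in a column share x, panels in a row share y
fig, ax = plt.subplots(2, 3, sharex="col", sharey="row")

for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha="center")
```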
Text and Annotation
Creating a good visualization involves guiding the reader so that the figure tells a story. In some cases, this
story can be told in an entirely visual manner, without the need for added text, but in others, small textual
cues and labels are necessary. Perhaps the most basic types of annotations you will use are axes labels and
titles, but the options go beyond this.
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
import pandas as pd
Arrows and Annotation
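Arrows are usually drawn with ax.annotate(), which places text at xytext and points an arrow at xy. A minimal sketch on a cosine curve; the coordinates and arrow styles are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis("equal")

# Text at xytext, arrow head at xy (both in data coordinates by default)
ax.annotate("local maximum", xy=(6.28, 1), xytext=(10, 4),
            arrowprops=dict(facecolor="black", shrink=0.05))

# A thinner arrow drawn with a curved connection
ax.annotate("local minimum", xy=(5 * np.pi, -1), xytext=(2, -6),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle3,angleA=0,angleB=-90"))
```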
Customizing Ticks
Within each axis, there is the concept of a major tick mark and a minor tick mark. As the names would imply,
major ticks are usually bigger or more pronounced, while minor ticks are usually smaller. By default,
Matplotlib rarely makes use of minor ticks, but one place you can see them is within logarithmic plots:
In[1]: %matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
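A sketch of the logarithmic case: on log-log axes, each decade gets a labeled major tick with unlabeled minor ticks in between. The locator objects that place them can be inspected on each axis; the axis limits here are arbitrary.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(1, 1000)
ax.set_ylim(1, 1000)
ax.grid(True)

# Locator objects decide where the ticks are placed
print(ax.xaxis.get_major_locator())
print(ax.xaxis.get_minor_locator())
```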
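Tick locations and labels can also be set explicitly with set_xticks(), set_xticklabels(), and set_yticks():
import math
x = np.arange(0, math.pi * 2, 0.05)
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
y = np.sin(x)
ax.plot(x, y)
ax.set_xlabel('angle')
ax.set_title('sine')
ax.set_xticks([0, 2, 4, 6])
ax.set_xticklabels(['zero', 'two', 'four', 'six'])
ax.set_yticks([-1, 0, 1])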
Locator class    Description
NullLocator      No ticks
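Locators are attached per axis with set_major_locator() or set_minor_locator(). The sketch below hides the ticks on one axis with NullLocator and, as an additional illustration not taken from the table, spaces the x ticks at multiples of π/2 with MultipleLocator from matplotlib.ticker.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator, NullLocator

x = np.linspace(0, 3 * np.pi, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlim(0, 3 * np.pi)

# No ticks at all on the y axis
ax.yaxis.set_major_locator(NullLocator())

# Major x ticks at every multiple of pi/2, minor ticks at every pi/4
ax.xaxis.set_major_locator(MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(MultipleLocator(np.pi / 4))
```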
Three-Dimensional Plotting in Matplotlib
Matplotlib was initially designed with only two-dimensional plotting in mind. Around the time of the 1.0 release, some
three-dimensional plotting utilities were built on top of Matplotlib’s two-dimensional display, and the result is a convenient (if
somewhat limited) set of tools for three-dimensional data visualization. We enable three-dimensional plots by importing the
mplot3d toolkit, included with the main Matplotlib installation:
In[1]: from mpl_toolkits import mplot3d
Once this submodule is imported, we can create a three-dimensional axes by passing the
keyword projection='3d' to any of the normal axes creation routines:
In[2]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In[3]: fig = plt.figure()
ax = plt.axes(projection='3d')
Three-Dimensional Points and Lines
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z) triples. In analogy
with the more common two-dimensional plots discussed earlier, we can create these using the ax.plot3D
and ax.scatter3D functions. The call signature for these is nearly identical to that of their two-dimensional
counterparts, so you can refer to “Simple Line Plots” and “Simple Scatter Plots” for more
information on controlling the output. Here we’ll plot a trigonometric spiral, along with some points drawn
randomly near the line:
In[4]: ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
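The agenda also lists surface plots. As a sketch, a surface is drawn from two-dimensional grids of x, y, and z values with ax.plot_surface(); the function f below and the grid size are arbitrary examples chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # registers the 3D projection on older Matplotlib


def f(x, y):
    """Example height function (an illustrative choice)."""
    return np.sin(np.sqrt(x ** 2 + y ** 2))


x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

# Build 2D coordinate grids from the 1D arrays
X, Y = np.meshgrid(x, y)
# Compute Z values on the grid
Z = f(X, Y)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap="viridis", edgecolor="none")
ax.set_title("surface")
```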
SEABORN
Seaborn has many of its own high-level plotting routines, but it can also override Matplotlib’s default
parameters and in turn get even simple Matplotlib scripts to produce vastly superior output. We can set the
style by calling Seaborn’s set() method. By convention, Seaborn is imported as sns:
In[4]: import seaborn as sns
sns.set()
plt.plot(x, y) # same Matplotlib call as before, now drawn with Seaborn's defaults
sns.histplot(data['y'], kde=True); # histplot replaces distplot, deprecated since seaborn 0.11
Thank you