fundamentals of data science unit 4 and 5

This document covers essential Python libraries for data wrangling, focusing on NumPy and Pandas. It details array manipulation, aggregation, computation, and data handling techniques, including hierarchical indexing and combining datasets. The document also introduces data visualization using Matplotlib and Seaborn, highlighting various plot types and customization options.

Uploaded by

kaleeswaranmmcas

UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING

Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, boolean logic – fancy
indexing – structured arrays – Data manipulation with Pandas – data indexing and selection – operating
on data – missing data – Hierarchical indexing – combining datasets – aggregation and grouping – pivot
tables

I. Basics of NumPy arrays:

NumPy array manipulation is used to access data and subarrays, and to split, reshape, and join arrays.

Basic array manipulations:
Attributes of arrays
Determining the size, shape, memory consumption, and data types of arrays
Indexing of arrays
Getting and setting the value of individual array elements
Slicing of arrays
Getting and setting smaller subarrays within a larger array
Reshaping of arrays
Changing the shape of a given array
Joining and splitting of arrays
Combining multiple arrays into one, and splitting one array into many

NumPy Array Attributes

1. Creating NumPy arrays:

We define three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.
We’ll use NumPy’s random number generator, which we will seed with a set value in order to ensure that the same random
arrays are generated each time this code is run.

2. Attributes:

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of
the array):
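
The two steps above can be sketched as follows. This is a minimal illustration that assumes NumPy's newer `default_rng` generator interface; the seed value 0 and the array names are arbitrary choices, not taken from the notes.

```python
import numpy as np

# Seed the generator so the same arrays are produced on every run.
rng = np.random.default_rng(0)

x1 = rng.integers(10, size=6)          # one-dimensional array
x2 = rng.integers(10, size=(3, 4))     # two-dimensional array
x3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional array

print("x3 ndim: ", x3.ndim)     # number of dimensions
print("x3 shape:", x3.shape)    # size of each dimension
print("x3 size: ", x3.size)     # total number of elements
print("x3 dtype:", x3.dtype)    # data type of the elements
print("x3 nbytes:", x3.nbytes)  # memory used: itemsize * size
```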
3. Array Indexing: Accessing Single Elements

In a one-dimensional array, you can access the ith value (counting from zero) by specifying the desired index in square
brackets, just as with Python lists:
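
A short sketch of element access, using small hand-written arrays chosen only for illustration:

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
first = x1[0]      # 5
last = x1[-1]      # 9, negative indices count from the end

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])
item = x2[2, 0]    # 1, a comma-separated (row, column) tuple of indices
x2[0, 0] = 12      # the same notation sets a value
```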
4. Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character.

The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this: x[start:stop:step]
If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1.
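
The defaults are easiest to see on a small array. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

x = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

first_five = x[:5]       # start defaults to 0
after_five = x[5:]       # stop defaults to the size of the dimension
middle = x[4:7]          # [4 5 6]
every_other = x[::2]     # step of 2
reversed_x = x[::-1]     # a negative step reverses the array
```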

Multidimensional subarrays

Multidimensional slices work in the same way, with multiple slices separated by commas.

5. Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
We can do this by combining indexing and slicing, using an empty slice marked by a single colon (:):
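
A minimal sketch of multidimensional slices and of row and column access, on an illustrative 3×4 array:

```python
import numpy as np

x2 = np.arange(12).reshape((3, 4))   # rows: [0..3], [4..7], [8..11]

top_left = x2[:2, :3]      # first two rows, first three columns
alt_cols = x2[:, ::2]      # all rows, every other column
flipped = x2[::-1, ::-1]   # both dimensions reversed

first_col = x2[:, 0]       # empty slice over rows selects a whole column
first_row = x2[0, :]       # empty slice over columns selects a whole row
same_row = x2[0]           # for rows, the trailing empty slice can be omitted
```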

6. Subarrays as no-copy views

NumPy array slicing differs from Python list slicing: in lists, slices are copies, whereas
array slices return views rather than copies of the array data. Modifying a slice of our two-dimensional array from before therefore modifies the original array as well.

7. Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a
subarray.
This can be most easily done with the copy() method:
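
The difference between a view and a copy can be checked directly. A minimal sketch on an illustrative array:

```python
import numpy as np

x2 = np.arange(12).reshape((3, 4))

x2_sub = x2[:2, :2]          # a view: shares memory with x2
x2_sub[0, 0] = 99            # this also changes x2[0, 0]

x2_copy = x2[:2, :2].copy()  # an independent copy
x2_copy[0, 0] = 42           # x2 is left untouched

print(x2[0, 0])                          # 99
print(np.shares_memory(x2, x2_sub))      # True
print(np.shares_memory(x2, x2_copy))     # False
```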

8. Reshaping of Arrays

The most flexible way of reshaping an array is with the reshape() method.
For example, to put the numbers 1 through 9 in a 3×3 grid, we can reshape the one-dimensional array of those numbers to shape (3, 3); the size of the initial array must match the size of the reshaped array.
Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column
matrix.
We can do this with the reshape method, or more easily by making use of the newaxis keyword within a slice operation:
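
Both patterns can be sketched in a few lines (array names are illustrative):

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))   # numbers 1..9 in a 3x3 grid

x = np.array([1, 2, 3])
row_a = x.reshape((1, 3))     # row vector via reshape
row_b = x[np.newaxis, :]      # row vector via newaxis
col_a = x.reshape((3, 1))     # column vector via reshape
col_b = x[:, np.newaxis]      # column vector via newaxis
```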

9. Array Concatenation and Splitting

It’s also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays.

Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines
np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument.
Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For
each of these, we can pass a list of indices giving the split points:
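
A minimal sketch of joining and splitting, using small illustrative arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
joined = np.concatenate([x, y])              # [1 2 3 3 2 1]

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
stacked_v = np.vstack([x, grid])             # x becomes a new first row
stacked_h = np.hstack([grid, [[99], [99]]])  # a new column on the right
side_by_side = np.concatenate([grid, grid], axis=1)

# N split points produce N + 1 subarrays.
data = np.array([1, 2, 3, 99, 99, 3, 2, 1])
a, b, c = np.split(data, [3, 5])

square = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(square, [2])        # split rows
left, right = np.hsplit(square, [2])         # split columns
```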
II. Aggregations: Min, Max, and Everything in Between

1. Summing the Values in an Array

Both Python’s built-in sum function and NumPy’s np.sum function can compute the sum of the values in an array.
Be careful, though: the sum function and the np.sum function are not identical, which can sometimes lead to confusion! In
particular, their optional arguments have different meanings, and np.sum is aware of multiple array dimensions,
as we will see in the following section.
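
The difference shows up as soon as the array has more than one dimension. A minimal sketch on an illustrative 2×3 array:

```python
import numpy as np

M = np.arange(6).reshape((2, 3))   # [[0 1 2], [3 4 5]]

total = np.sum(M)          # 15: np.sum adds every element
builtin = sum(M)           # [3 5 7]: the built-in iterates over rows and adds them

# The second positional argument means different things:
start_based = sum(M, 1)    # built-in: 1 is the start value  -> [4 6 8]
axis_based = np.sum(M, 1)  # NumPy: 1 is the axis to sum over -> [3 12]
```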

2. Minimum and Maximum

3. Multidimensional aggregates
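
A minimal sketch of minimum, maximum, and per-axis aggregates on a small illustrative array. The axis keyword names the dimension that is collapsed, not the one that is returned:

```python
import numpy as np

M = np.array([[1, 8, 3],
              [4, 2, 9]])

smallest = M.min()          # 1, over the whole array
largest = M.max()           # 9

col_min = M.min(axis=0)     # collapse rows    -> one value per column: [1 2 3]
row_max = M.max(axis=1)     # collapse columns -> one value per row:    [8 9]
row_sum = M.sum(axis=1)     # [12 15]
```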
Other aggregation functions
Example: What Is the Average Height of US Presidents?

Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a
simple example, let’s consider the heights of all US presidents.
This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and
values:
III. Computation on Arrays: Broadcasting

Another means of vectorizing operations is to use NumPy’s broadcasting functionality. Broadcasting is simply a set of
rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on
arrays of different sizes.

Introducing Broadcasting
Recall that for arrays of the same size, binary operations are performed on an element-by-element basis.
Broadcasting allows these types of binary operations to be performed on arrays of different sizes; for example, we can
just as easily add a scalar (think of it as a zero-dimensional array) to an array.
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded
with ones on its leading (left) side.
• Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is
stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
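
The three rules can be traced on small arrays. A minimal sketch (names are illustrative):

```python
import numpy as np

a = np.array([0, 1, 2])
same_size = a + np.array([5, 5, 5])   # element by element: [5 6 7]
with_scalar = a + 5                   # the scalar is broadcast: [5 6 7]

# Rules 1 and 2: shape (3,) is padded to (1, 3), then stretched to (2, 3).
M = np.ones((2, 3))
result = M + a

# Both operands are stretched: (3, 1) and (3,) -> (1, 3) give (3, 3).
col = np.arange(3).reshape((3, 1))
outer = col + a

# Rule 3: (3, 2) against (1, 3) disagrees in the last dimension (2 vs 3).
N = np.ones((3, 2))
try:
    N + a
    failed = False
except ValueError:
    failed = True
```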
IV. Comparisons, Masks, and Boolean Logic

This section covers the use of Boolean masks to examine and manipulate values within NumPy arrays. Masking comes
up when you want to extract, modify, count, or otherwise manipulate values in an array based on some
criterion: for example, you might wish to count all values greater than a certain value, or perhaps remove all
outliers that are above some threshold.
In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.
One approach to this would be to answer these questions by hand: loop through the data, incrementing a counter each
time we see values in some desired range.
For reasons discussed throughout this chapter, such an approach is very inefficient, both from the standpoint of time
writing code and time computing the result.

Comparison Operators as ufuncs
NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs.
The result of these comparison operators is always an array with a Boolean data type. All six
of the standard comparison operations are available: <, >, <=, >=, ==, and !=.
Boolean operators
Python’s bitwise logic operators &, |, ^, and ~ can be used to combine comparisons. NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.
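
A minimal sketch of comparisons, counting, and Boolean operators on an illustrative array. The parentheses around each comparison are needed because & and | bind more tightly than < and >:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

less_than_6 = x < 6                      # Boolean array with the same shape as x
n_less_6 = np.count_nonzero(x < 6)       # 8 entries are less than 6
any_above_8 = np.any(x > 8)              # True
all_below_10 = np.all(x < 10)            # True
per_row = np.sum(x < 6, axis=1)          # [4 2 2]

in_range = np.sum((x > 2) & (x < 7))     # 7 entries lie strictly between 2 and 7
outside = np.sum(~((x > 2) & (x < 7)))   # 5 entries do not
```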
Boolean Arrays as Masks:

We looked at aggregates computed directly on Boolean arrays.
A more powerful pattern is to use Boolean arrays as masks, to select particular subsets of the data themselves.
Returning to our x array from before, suppose we want an array of all values in the array that are less than, say, 5:
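
A minimal sketch of masking, assuming an illustrative x array (the notes do not show the original values):

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 5
selected = x[mask]     # one-dimensional array of the values where mask is True

# A mask can also be used on the left-hand side to modify values in place.
capped = x.copy()
capped[capped > 8] = 8
```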

IV. Fancy Indexing:

We’ll look at another style of array indexing, known as fancy indexing.
Fancy indexing is like the simple indexing we’ve already seen, but we pass arrays of indices in place of single scalars.
This allows us to very quickly access and modify complicated subsets of an array’s values.

Exploring Fancy Indexing
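
A minimal sketch of fancy indexing on illustrative arrays. Note that the shape of the result follows the shape of the index arrays, not the shape of the array being indexed:

```python
import numpy as np

x = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

picked = x[[3, 7, 4]]                  # [71 86 60]
shaped = x[np.array([[3, 7],
                     [4, 5]])]         # [[71 86], [60 20]]

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
pairs = X[row, col]                    # X[0, 2], X[1, 1], X[2, 3]

modified = x.copy()
modified[[0, 1]] = 0                   # fancy indexing can also assign
```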
Example: Binning Data
V. Structured Data: NumPy’s Structured Arrays

This section demonstrates the use of NumPy’s structured arrays and record arrays, which provide efficient storage for
compound, heterogeneous data.

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we’d like to store
these values for use in a Python program. It would be possible to store these in three separate arrays:

In[2]: name = ['Alice', 'Bob', 'Cathy', 'Doug']
       age = [25, 45, 37, 19]
       weight = [55.0, 85.5, 68.0, 61.5]
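
With three separate lists, nothing tells Python that the values are related. A structured array keeps them together. The sketch below is one way to build it; the field names and the type codes (a Unicode string of up to 10 characters, a 4-byte integer, an 8-byte float) are choices made for this illustration:

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# One record per person, with named and typed fields.
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = name
data['age'] = age
data['weight'] = weight

all_names = data['name']                    # access a field by name
last_name = data[-1]['name']                # field of the last record
young = data[data['age'] < 30]['name']      # Boolean masking on a field
```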
VI. Data Manipulation with Pandas

Data Indexing and Selection

Data Selection in Series
A Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python
dictionary.

Series as dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys to a collection of values.

Series as one-dimensional array
A Series builds on this dictionary-like interface and provides array-style item selection via the same basic
mechanisms as NumPy arrays, that is, slices, masking, and fancy indexing. Examples of these are as follows:
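
A minimal sketch of both views of a Series, using illustrative values and labels:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Dictionary-like behaviour
b_value = data['b']            # 0.5
has_a = 'a' in data            # True
keys = list(data.keys())       # ['a', 'b', 'c', 'd']
data['e'] = 1.25               # assigning to a new key extends the Series

# Array-like behaviour
by_label = data['a':'c']       # slicing by explicit index includes the final label
by_position = data[0:2]        # slicing by implicit integer position excludes the end
masked = data[(data > 0.3) & (data < 0.8)]
fancy = data[['a', 'e']]
```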
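Indexers: loc, iloc, and ix
These slicing and indexing conventions can be a source of confusion.
For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit
indices, while a slicing operation like data[1:3] will use the implicit Python-style index.
Because of this, Pandas provides special indexer attributes: loc always refers to the explicit index, and iloc always refers to the implicit Python-style index.
A third indexing attribute, ix, was a hybrid of the two, and for Series objects was equivalent to standard []-based indexing. It has been removed from Pandas (since version 1.0), so use loc and iloc instead.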
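
A minimal sketch contrasting the two indexers on a Series whose explicit integer index differs from its positions (values chosen for illustration):

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

by_label = data.loc[1]         # 'a': the row whose label is 1
label_slice = data.loc[1:3]    # labels 1 through 3, end label included

by_position = data.iloc[1]     # 'b': the second row
position_slice = data.iloc[1:3]  # positions 1 and 2, end excluded
```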

Data Selection in DataFrame:

DataFrame as a dictionary

Operating on Data in Pandas:

Pandas includes a couple of useful twists on NumPy ufuncs: for unary operations like negation and trigonometric functions, the ufuncs will preserve index and column
labels in the output, and for binary operations such as addition and multiplication, Pandas will automatically
align indices when passing the objects to the ufunc.
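
Both behaviours can be seen on small Series. A minimal sketch with illustrative values:

```python
import numpy as np
import pandas as pd

ser = pd.Series([6, 3, 7, 4], index=['a', 'b', 'c', 'd'])
exp_ser = np.exp(ser)          # unary ufunc: the index is preserved

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

total = A + B                      # aligned on the union of indices; gaps become NaN
filled = A.add(B, fill_value=0)    # treat missing entries as 0 instead
```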
Ufuncs: Operations Between DataFrame and Series

When you are performing operations between a DataFrame and a Series, the index and column alignment is similarly
maintained.
Operations between a DataFrame and a Series are similar to operations between a two-dimensional and one-dimensional
NumPy array.

In[15]: A = rng.randint(10, size=(3, 4))  # rng: a seeded np.random.RandomState created earlier
        A
Out[15]: array([[3, 8, 2, 4],
                [2, 6, 4, 8],
                [6, 1, 3, 8]])
In[16]: A - A[0]
Out[16]: array([[ 0,  0,  0,  0],
                [-1, -2,  2,  4],
                [ 3, -7,  1,  4]])

According to NumPy’s broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied row-wise. In
Pandas, the convention similarly operates row-wise by default:

In[17]: df = pd.DataFrame(A, columns=list('QRST'))
        df - df.iloc[0]
Out[17]:    Q  R  S  T
         0  0  0  0  0
         1 -1 -2  2  4
         2  3 -7  1  4
If you would instead like to operate column-wise, you can use the object methods mentioned
earlier, while specifying the axis keyword:
In[18]: df.subtract(df['R'], axis=0)
Out[18]:    Q  R  S  T
         0 -5  0 -6 -4
         1 -4  0 -2  2
         2  5  0  2  7

VII. Handling Missing Data

Real-world data is rarely clean and homogeneous. In particular, many interesting datasets will
have some amount of data missing.

Trade-Offs in Missing Data Conventions
A number of schemes have been developed to indicate the presence of missing data in a table or DataFrame.
They generally revolve around one of two strategies: using a mask that globally indicates missing values, or choosing a sentinel value that indicates a missing
entry.

In the masking approach, the mask might be an entirely separate Boolean array, or it may involve appropriation of one bit
in the data representation to locally indicate the null status of a value.
In the sentinel approach, the sentinel value could be some data-specific convention, or a more global convention such as NaN (Not a Number), a special value that is part of the IEEE floating-point specification.

Missing Data in Pandas
The way in which Pandas handles missing values is constrained by its reliance on the NumPy package, which does not
have a built-in notion of NA values for non-floating-point data types.
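
In practice, Pandas treats both None and NaN as missing. A minimal sketch of detecting, dropping, and filling missing values (the data is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])
mask = data.isnull()        # True where the value is NaN or None
dropped = data.dropna()     # keeps only the non-missing entries

# In a numeric Series, None is converted to NaN and the dtype becomes float.
nums = pd.Series([1, np.nan, 2, None])
filled = nums.fillna(0)     # replace missing entries with 0
```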
VIII. Hierarchical Indexing:

Hierarchical indexing (also known as multi-indexing) is used to incorporate multiple index levels within a
single index. In this way, higher-dimensional data can be compactly represented within the familiar
one-dimensional Series and two-dimensional DataFrame objects.

A Multiply Indexed Series
Rearranging Multi-Indices

Sorted and unsorted indices
Many of the MultiIndex slicing operations will fail if the index is not sorted.

Stacking and unstacking indices
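
The ideas in this section can be sketched on a toy Series. The data, level names, and labels below are hypothetical and chosen only to show the mechanics: building a multiply indexed Series, sorting its index before slicing, and converting between stacked and unstacked forms.

```python
import pandas as pd

# Hypothetical toy data keyed by (group, year); outer labels are deliberately unsorted.
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]],
                                   names=['group', 'year'])
data = pd.Series([10, 11, 30, 31, 20, 21], index=index)

# Sort the index first so that label slicing works reliably.
data = data.sort_index()

group_a = data.loc['a']          # select on the outer level
year_2 = data.loc[:, 2]          # select on the inner level
a_to_b = data.loc['a':'b']       # slice across the sorted outer level

wide = data.unstack()            # inner level becomes columns (a DataFrame)
long_again = wide.stack()        # back to a multiply indexed Series
```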
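VIII. Combining Datasets: Concat and Append
One important difference between np.concatenate and pd.concat() is that Pandas concatenation preserves indices, even if the result will have repeated indices. While this is valid within DataFrames, the outcome is often undesirable.
pd.concat() gives us a few ways to handle it.

Catching the repeats as an error.
Ignoring the index.
Adding MultiIndex keys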
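
The three options correspond to three keyword arguments of pd.concat(). A minimal sketch on two small illustrative DataFrames that share the same index:

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

dup = pd.concat([x, y])                       # index is 0, 1, 0, 1

# Catching the repeats as an error
try:
    pd.concat([x, y], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

# Ignoring the index: a fresh 0..n-1 index is created
fresh = pd.concat([x, y], ignore_index=True)

# Adding MultiIndex keys: each source gets its own outer label
keyed = pd.concat([x, y], keys=['x', 'y'])
```

Note that the DataFrame.append() method named in the heading was removed in Pandas 2.0; pd.concat() covers the same use.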

IX. Aggregation and Grouping

An essential piece of analysis of large data is efficient summarization: computing aggregations like sum(), mean(),
median(), min(), and max().

Simple Aggregation in Pandas
GroupBy: Split, Apply, Combine

A canonical example of this split-apply-combine operation, where the “apply” is a summation aggregation, is illustrated
in Figure 3-1.
Figure 3-1 makes clear what the GroupBy accomplishes:
• The split step involves breaking up and grouping a DataFrame depending on the value
of the specified key.
• The apply step involves computing some function, usually an aggregate, transformation, or
filtering, within the individual groups.
• The combine step merges the results of these operations into an output array.

Here it’s important to realize that the intermediate splits do not need to be explicitly instantiated.
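
A minimal sketch of split-apply-combine on a toy DataFrame (the column names and values are illustrative). The manual version spells out the three steps that groupby() performs in a single pass:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)})

# Split on 'key', apply a sum, combine into one Series.
sums = df.groupby('key')['data'].sum()

# The same result by hand, for comparison only.
manual = {}
for k in df['key'].unique():                 # split
    group = df[df['key'] == k]
    manual[k] = int(group['data'].sum())     # apply, then combine into a dict
```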


The GroupBy object
The GroupBy object is a very flexible abstraction.
The examples below assume that planets is a DataFrame with 'method', 'orbital_period', and 'year' columns.

Column indexing. The GroupBy object supports column indexing in the same way as the
DataFrame, and returns a modified GroupBy object. For example:
In[14]: planets.groupby('method')
Out[14]: <pandas.core.groupby.DataFrameGroupBy object at 0x1172727b8>
In[15]: planets.groupby('method')['orbital_period']
Out[15]: <pandas.core.groupby.SeriesGroupBy object at 0x117272da0>

Iteration over groups. The GroupBy object supports direct iteration over the groups, returning
each group as a Series or DataFrame:
In[17]: for (method, group) in planets.groupby('method'):
            print("{0:30s} shape={1}".format(method, group.shape))

Dispatch methods. Through some Python class magic, any method not explicitly
implemented by the GroupBy object will be passed through and called on the groups,
whether they are DataFrame or Series objects. For example, you can use the
describe() method of DataFrames to perform a set of aggregations that describe each
group in the data:
In[18]: planets.groupby('method')['year'].describe().unstack()
X. Pivot Tables

We have seen how the GroupBy abstraction lets us explore relationships within a dataset.
A pivot table is a similar operation that is commonly seen in spreadsheets and other programs that operate on tabular data.
The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides a
multidimensional summarization of the data.
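
A minimal sketch of a pivot table on a hypothetical toy DataFrame (the column names group, kind, and value are invented for this illustration). The second expression shows the equivalent GroupBy formulation:

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b', 'a', 'b'],
                   'kind':  ['x', 'y', 'x', 'y', 'x', 'y'],
                   'value': [1, 2, 3, 4, 5, 6]})

# Rows come from 'group', columns from 'kind', cells hold the mean of 'value'.
table = df.pivot_table(values='value', index='group',
                       columns='kind', aggfunc='mean')

# The same summary with GroupBy: group on both keys, then unstack the inner one.
same = df.groupby(['group', 'kind'])['value'].mean().unstack()
```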
UNIT V
DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three dimensional
plotting - Geographic Data with Basemap - Visualization with Seaborn.

Simple Line Plots
The simplest of all plots is the visualization of a single function y = f(x). Here we will take a first look at
creating a simple plot of this type.
The figure (an instance of the class plt.Figure) can be thought of as a single container that contains
all the objects representing axes, graphics, text, and labels.
The axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks and
labels, which will eventually contain the plot elements that make up our visualization.

Line Colors and Styles
• The first adjustment you might wish to make to a plot is to control the line colors and styles.
• To adjust the color, you can use the color keyword, which accepts a string argument
representing virtually any imaginable color. The color can be specified in a variety of ways.
• If no color is specified, Matplotlib will automatically cycle through a set of default colors for
multiple lines.

Different forms of color representation.
specify color by name - color='blue'
short color code (rgbcmyk) - color='g'
Grayscale between 0 and 1 - color='0.75'
Hex code (RRGGBB from 00 to FF) - color='#FFDD44'
RGB tuple, values between 0 and 1 - color=(1.0,0.2,0.3)
all HTML color names supported - color='chartreuse'

• We can adjust the line style using the linestyle keyword.
Different line styles
linestyle='solid'
linestyle='dashed'
linestyle='dashdot'
linestyle='dotted'

Short assignment
linestyle='-' # solid
linestyle='--' # dashed
linestyle='-.' # dashdot
linestyle=':' # dotted

• linestyle and color codes can be combined into a single nonkeyword argument to the plt.plot()
function
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red

Axes Limits
• The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods.
Example (passing the limits in reverse order, as here, displays the axis reversed)
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
• The plt.axis() method allows you to set the x and y limits with a single call, by passing a list that
specifies [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5]);

• An equal aspect ratio makes one unit in x equal to one unit in y: plt.axis('equal')

Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel()
plt.ylabel()
Legend - plt.legend()

Example programs
Line color
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported

Line style
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted

Axis limit with label and legend

import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
plt.legend();

Simple Scatter Plots
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being
joined by line segments, here the points are represented individually with a dot, circle, or other shape.
Syntax
plt.plot(x, y, 'type of symbol', color=...);

Example
plt.plot(x, y, 'o', color='black');
• The third argument in the function call is a character that represents the type of symbol used for the plotting.
Just as you can specify options such as '-' and '--' to control the line style, the marker style has its own set of short string
codes.
Example
• Various symbols used to specify markers: ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']

• Shorthand assignment of line, symbol and color is also allowed.

plt.plot(x, y, '-ok');

• Additional arguments in plt.plot()
We can specify some other parameters of the scatter plot to make it more attractive. They are color,
marker size, linewidth, marker face color, marker edge color, marker edge width, etc.

Example
plt.plot(x, y, '-p', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);

Scatter Plots with plt.scatter
• A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very
similarly to the plt.plot function
plt.scatter(x, y, marker='o');
• The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the
properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to
data.
• Notice that the color argument is automatically mapped to a color scale (shown here by the colorbar()
command), and the size argument is given in points squared (the marker area).
• cmap - the colormap used in the scatter plot; different colormaps give different color combinations.

Perceptually Uniform Sequential
['viridis', 'plasma', 'inferno', 'magma']
Sequential
['Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', 'YlOrBr', 'YlOrRd',
'OrRd', 'PuRd', 'RdPu', 'BuPu', 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']
Sequential (2)
['binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', 'spring', 'summer',
'autumn', 'winter', 'cool', 'Wistia', 'hot', 'afmhot', 'gist_heat', 'copper']
Diverging
['PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', 'RdYlBu', 'RdYlGn', 'Spectral',
'coolwarm', 'bwr', 'seismic']
Qualitative
['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10',
'tab20', 'tab20b', 'tab20c']
Miscellaneous
['flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot',
'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow', 'jet',
'nipy_spectral', 'gist_ncar']
Example programs.

Simple scatter plot.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');

Scatter plot with edge color, face color, size,
and width of marker. (Scatter plot with line)

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-o',
         color='gray',
         markersize=15,
         linewidth=4,
         markerfacecolor='yellow',
         markeredgecolor='red',
         markeredgewidth=4)
plt.ylim(-1.5, 1.5);

Scatter plot with random colors, size and transparency
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
            cmap='viridis')  # the keyword is cmap, not map
plt.colorbar()
Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important, if not more important,
than accurate reporting of the number itself. For example, imagine that I am using some astrophysical observations
to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe.
In visualization of data and results, showing these errors effectively can make a plot convey much more
complete information.

Types of errors
• Basic Errorbars
• Continuous Errors

Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');

• Here the fmt is a format code controlling the appearance of lines and points, and has the same syntax as
the shorthand used in plt.plot()
• In addition to these basic options, the errorbar function has many options to fine-tune the outputs.
Using these additional options you can easily customize the aesthetics of your errorbar plot.

plt.errorbar(x, y, yerr=dy, fmt='o', color='black', ecolor='lightgray', elinewidth=3, capsize=0);

Continuous Errors
• In some situations it is desirable to show errorbars on continuous quantities. Though Matplotlib does not
have a built-in convenience routine for this type of application, it’s relatively easy to combine primitives like plt.plot
and plt.fill_between for a useful result.
• Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn API. This is a method
of fitting a very flexible nonparametric function to data with a continuous measure of the uncertainty.
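
The plotting half of this idea can be sketched without fitting a model. The example below does not perform Gaussian process regression: it uses a sine curve as a stand-in for the fitted function and an invented uncertainty band that grows with x, purely to show how plt.plot and plt.fill_between combine. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
y_fit = np.sin(x)            # stand-in for the fitted curve
dy_fit = 0.1 + 0.05 * x      # illustrative uncertainty, not a real model output

fig, ax = plt.subplots()
ax.plot(x, y_fit, '-', color='gray')              # the central estimate
band = ax.fill_between(x, y_fit - dy_fit, y_fit + dy_fit,
                       color='gray', alpha=0.2)   # shaded error region
ax.set_xlim(0, 10)
```

With a real regression, y_fit and dy_fit would be replaced by the predicted mean and its uncertainty.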

Density and Contour Plots
Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions. There
are three Matplotlib functions that can be helpful for this task:
• plt.contour for contour plots,
• plt.contourf for filled contour plots, and
• plt.imshow for showing images.

Visualizing a Three-Dimensional Function
A contour plot can be created with the plt.contour function.
It takes three arguments:
• a grid of x values,
• a grid of y values, and
• a grid of z values.
The x and y values represent positions on the plot, and the z
values will be represented by the contour levels.
The way to prepare such data is to use the np.meshgrid
function, which builds two-dimensional grids from one-dimensional arrays:
Example
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');

• Notice that by default when a single color is used, negative values are represented by dashed lines,
and positive values by solid lines.
• Alternatively, you can color-code the lines by specifying a colormap with the cmap argument.
• We’ll also specify that we want more lines to be drawn: 20 equally spaced intervals within the data range.

plt.contour(X, Y, Z, 20, cmap='RdGy');
• One potential issue with this plot is that it is a bit “splotchy.” That is, the color steps are discrete rather
than continuous, which is not always what is desired.
• You could remedy this by setting the number of contours to a very high number, but this results in a
rather inefficient plot: Matplotlib must render a new polygon for each step in the level.
• A better way to handle this is to use the plt.imshow() function, which interprets a two-dimensional grid
of data as an image.

There are a few potential gotchas with imshow().
• plt.imshow() doesn’t accept an x and y grid, so you must manually specify the extent [xmin, xmax, ymin, ymax]
of the image on the plot.
• plt.imshow() by default follows the standard image array definition where the origin is in the upper left,
not in the lower left as in most contour plots. This must be changed when showing gridded data.
• plt.imshow() will automatically adjust the axis aspect ratio to match the input data; you can change this
by setting, for example, plt.axis('image') to make x and y units match.

Finally, it can sometimes be useful to combine contour plots
and image plots. We’ll use a partially transparent
background image (with transparency set via the alpha
parameter) and over-plot contours with
labels on the contours themselves (using the plt.clabel()
function):
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdGy', alpha=0.5)
plt.colorbar();

Example Program
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# extent matches the 0 to 5 range of the x and y grids
plt.imshow(Z, extent=[0, 5, 0, 5],
           origin='lower', cmap='RdGy')
plt.colorbar()
Histograms
• A histogram is a simple plot for summarizing a large dataset. A histogram is a graph showing
frequency distributions: the number of observations within each given interval.

Parameters
• plt.hist() is used to plot a histogram. The hist() function will use an array of numbers to create a
histogram; the array is sent into the function as an argument.
• bins - A histogram displays numerical data by grouping data into "bins" of equal width. Each bin is plotted
as a bar whose height corresponds to how many data points are in that bin. Bins are also sometimes called
"intervals", "classes", or "buckets".
• density (called normed in older Matplotlib versions) - If True, the histogram is normalized so that the total area
under it equals 1, giving a probability density instead of raw counts.
• x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array or a sequence of arrays
which are not required to be of the same length.
• histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional
The type of histogram to draw.

  • 'bar' is a traditional bar-type histogram. If multiple data are given the bars are arranged side by side.
  • 'barstacked' is a bar-type histogram where multiple data are stacked on top of each other.
  • 'step' generates a line plot that is by default unfilled.
  • 'stepfilled' generates a line plot that is by default filled.
Default is 'bar'
• align - {'left', 'mid', 'right'}, optional
Controls how the histogram is plotted.

  • 'left': bars are centered on the left bin edges.
  • 'mid': bars are centered between the bin edges.
  • 'right': bars are centered on the right bin edges.
Default is 'mid'
• orientation - {'horizontal', 'vertical'}, optional
If 'horizontal', barh will be used for bar-type histograms and the bottom kwarg will be the left edges.
• color - color or array_like of colors or None, optional
Color spec or sequence of color specs, one per dataset. Default (None) uses the standard line color sequence.
Default is None
• label - str or None, optional. Default is None

Other parameter
• **kwargs - Patch properties. The ** syntax allows us to pass a
variable number of keyword arguments to a
Python function.

Example
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)
plt.hist(data);

The hist() function has many options to tune both the calculation and the display; here’s an example of a
more customized histogram.
plt.hist(data, bins=30, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none');

The plt.hist docstring has more information on other customization options available. I find this combination
of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of
several distributions:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);

Two-Dimensional Histograms and Binnings
• We can create histograms in two dimensions by dividing points among two-dimensional bins.
• We’ll start by defining some data: an x and y array drawn from a
multivariate Gaussian distribution.
• A simple way to plot a two-dimensional histogram is to use Matplotlib’s plt.hist2d() function.

Example
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')

Legends
Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw
how to create a simple legend; here we’ll take a look at customizing the placement and aesthetics of the legend in
Matplotlib.
plt.plot(x, np.sin(x), '-b', label='Sine')
plt.plot(x, np.cos(x), '--r', label='Cosine')
plt.legend();

Customizing Plot Legends
Location and turn off the frame - We can specify the location and turn off the frame with the parameters loc and
frameon.
ax.legend(loc='upper left', frameon=False)
fig

Number of columns - We can use the ncol parameter to specify the number of columns in the legend.
ax.legend(frameon=False, loc='lower center', ncol=2)
fig

Rounded box, shadow and frame transparency
We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of the frame, or change the
padding around the text.
ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
fig

Choosing Elements for the Legend
• The legend includes all labeled elements by default. We can change which elements and labels appear in
the legend by using the objects returned by plot commands.
• The plt.plot() command is able to create multiple lines at once, and returns a list of created
line instances.
Passing any of these to plt.legend() will tell it which to identify, along with the labels we’d like to specify.
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))
lines = plt.plot(x, y)
plt.legend(lines[:2], ['first', 'second']);

# Applying label individually.
plt.plot(x, y[:, 0], label='first')
plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True);

Multiple legends
Through the standard legend interface, it is only possible to create a single legend for the entire plot. If you try to create a second legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist() method to manually add the second artist to the plot.

Example
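import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np
x = np.linspace(0, 10, 1000)
# create the figure and axes, and plot two labeled lines for the legend
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.legend(loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
fig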
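
The add_artist() workaround described under "Multiple legends" can be sketched as follows. The four phase-shifted sine curves and the labels are illustrative assumptions; the key step is building a second Legend object by hand and attaching it to the axes.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.legend import Legend

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()

# Four lines with different styles (placeholder data)
lines = []
for i, style in enumerate(['-', '--', '-.', ':']):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), style, color='black')

# First legend: created through the standard interface
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)

# Second legend: built as a separate artist and added manually,
# so it does not replace the first one
second = Legend(ax, lines[2:], ['line C', 'line D'],
                loc='lower right', frameon=False)
ax.add_artist(second)
```

Calling ax.legend() a second time instead of add_artist() would leave only one legend on the plot.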

Color Bars
In Matplotlib, a colorbar is a separate axes that can provide a key for the meaning of colors in a plot. For continuous labels based on the color of points, lines, or regions, a labeled colorbar can be a great tool.
The simplest colorbar can be created with the plt.colorbar() function.

Customizing Colorbars
Choosing the colormap.
We can specify the colormap using the cmap argument to the plotting function that is creating the visualization. Broadly, there are three different categories of colormaps:
• Sequential colormaps - These consist of one continuous sequence of colors (e.g., binary or viridis).
• Divergent colormaps - These usually contain two distinct colors, which show positive and negative deviations from a mean (e.g., RdBu or PuOr).
• Qualitative colormaps - These mix colors with no particular sequence (e.g., rainbow or jet).

Color limits and extensions
• Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an instance of plt.Axes, so all of the axes and tick formatting tricks we’ve learned are applicable.
• We can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the extend property.
plt.subplot(1, 2, 2)
plt.imshow(I,cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
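
The snippet above assumes an image array I that already exists. As a hedged, self-contained sketch, the following builds a synthetic image with a few noisy pixels (an assumption made only for illustration) and compares the default color range with limits narrowed to [-1, 1]. It passes vmin and vmax to imshow, which has the same effect as calling plt.clim(-1, 1) afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

# Smooth image plus about 1% noisy pixels (synthetic data)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])
speckles = rng.random(I.shape) < 0.01
I[speckles] = rng.normal(0, 3, np.count_nonzero(speckles))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3.5))

# Left: default limits, so the outliers stretch the color range
im1 = ax1.imshow(I, cmap='RdBu')
fig.colorbar(im1, ax=ax1)

# Right: colors limited to [-1, 1]; arrows on the colorbar flag out-of-range values
im2 = ax2.imshow(I, cmap='RdBu', vmin=-1, vmax=1)
cb = fig.colorbar(im2, ax=ax2, extend='both')
```

With the narrowed limits, the smooth pattern is no longer washed out by the noisy pixels.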

Discrete colorbars
Colormaps are by default continuous, but sometimes you’d like to represent discrete values. The easiest way to do this is to use the plt.cm.get_cmap() function, and pass the name of a suitable colormap along with the number of desired bins.
plt.imshow(I, cmap=plt.cm.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1);
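
Recent Matplotlib releases have deprecated plt.cm.get_cmap(), so the call above may fail depending on the installed version. A sketch of the same idea using plt.get_cmap(), which accepts the same name and bin-count arguments, is shown below; the image data is a synthetic placeholder.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# 'Blues' colormap resampled to 6 discrete levels
cmap = plt.get_cmap('Blues', 6)

fig, ax = plt.subplots()
im = ax.imshow(I, cmap=cmap, vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)
```

The colorbar then shows six flat color bands instead of a continuous gradient.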

Subplots
• Matplotlib has the concept of subplots: groups of smaller axes that can exist together within a single figure.
• These subplots might be insets, grids of plots, or other more complicated layouts.
• We’ll explore four routines for creating subplots in Matplotlib.
• plt.axes: Subplots by Hand
• plt.subplot: Simple Grids of Subplots
• plt.subplots: The Whole Grid in One Go
• plt.GridSpec: More Complicated Arrangements

plt.axes: Subplots by Hand
• The most basic method of creating an axes is to use the plt.axes function. As we’ve seen previously, by default this creates a standard axes object that fills the entire figure.
• plt.axes also takes an optional argument that is a list of four numbers in the figure coordinate system.
• These numbers represent [left, bottom, width, height] in the figure coordinate system, which ranges from 0 at the bottom left of the figure to 1 at the top right of the figure.

For example, we might create an inset axes at the top-right corner of another axes by setting the x and y position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).

import matplotlib.pyplot as plt
import numpy as np
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])

Vertical subplots
The equivalent of the plt.axes() command within the object-oriented interface is fig.add_axes(). Let’s use this to create two vertically stacked axes.
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))
x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));
• We now have two axes (the top with no tick labels) that are just touching: the bottom of the upper panel (at position 0.5) matches the top of the lower panel (at position 0.1 + 0.4).
• If the bottom position of the second axes is changed, the two plots are separated from each other, for example ax2 = fig.add_axes([0.1, 0.01, 0.8, 0.4])

plt.subplot: Simple Grids of Subplots
• Matplotlib has several convenience routines to align columns or rows of subplots.
• The lowest level of these is plt.subplot(), which creates a single subplot within a grid.
• This command takes three integer arguments: the number of rows, the number of columns, and the index of the plot to be created in this scheme, which runs from the upper left to the bottom right.
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)),
             fontsize=18, ha='center')

plt.subplots: The Whole Grid in One Go
• The approach just described can become quite tedious when you’re creating a large grid of subplots, especially if you’d like to hide the x- and y-axis labels on the inner plots.
• For this purpose, plt.subplots() is the easier tool to use (note the s at the end of subplots).
• Rather than creating a single subplot, this function creates a full grid of subplots in a single line, returning them in a NumPy array.
• The arguments are the number of rows and number of columns, along with optional keywords sharex and sharey, which allow you to specify the relationships between different axes.
• Here we’ll create a 2×3 grid of subplots, where all axes in the same row share their y-axis scale, and all axes in the same column share their x-axis scale.
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')
Note that by specifying sharex and sharey, we’ve automatically removed inner labels on the grid to make the plot cleaner.
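
Because plt.subplots() returns the axes in a NumPy array, individual panels can be reached with ordinary [row, column] indexing. A small sketch, labeling each panel with its own index (the labels are only for illustration):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

# ax is a 2x3 array of Axes objects, indexed as ax[row, col]
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
```

Note that, unlike plt.subplot(), whose plot index starts at 1, this array indexing is zero-based.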

plt.GridSpec: More Complicated Arrangements
To go beyond a regular grid to subplots that span multiple rows and columns, plt.GridSpec() is the best tool. The plt.GridSpec() object does not create a plot by itself; it is simply a convenient interface that is recognized by the plt.subplot() command.

For example, a gridspec for a grid of two rows and three columns with some specified width and height space looks like this:

grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
From this we can specify subplot locations and extents:
plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2]);

Text and Annotation
• The most basic types of annotations we will use are axes labels and titles; here we will look at some more visualization and annotation options.

• Text annotation can be done manually with the plt.text/ax.text command, which will place text at a particular x/y value.
• The ax.text method takes an x position, a y position, a string, and then optional keywords specifying the color, size, style, alignment, and other properties of the text. For example, ha='right' and ha='center' set the horizontal alignment, where ha is short for horizontal alignment.
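
As a brief sketch of ax.text in use, the following places two labels next to points on a small line plot. The data values and label strings are made up for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3], 'o-')
ax.set_xlim(0, 4)
ax.set_ylim(1.5, 5)

# Each call takes x, y, the string, and optional styling keywords;
# ha selects which edge of the text sits at the given x position
t_right = ax.text(1, 2.2, 'start', ha='right', color='gray')
t_center = ax.text(2, 4.2, 'peak', ha='center', size=12, style='italic')
```

With ha='right' the label ends at x = 1, while ha='center' centers the label over x = 2.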

Transforms and Text Position
• We anchored our text annotations to data locations. Sometimes it’s preferable to anchor the text to a position on the axes or figure, independent of the data. In Matplotlib, we do this by modifying the transform.
• Any graphics display framework needs some scheme for translating between coordinate systems.
• Mathematically, such coordinate transformations are relatively straightforward, and Matplotlib has a well-developed set of tools that it uses internally to perform them (the tools can be explored in the matplotlib.transforms submodule).
• There are three predefined transforms that can be useful in this situation.

o ax.transData - Transform associated with data coordinates
o ax.transAxes - Transform associated with the axes (in units of axes dimensions)
o fig.transFigure - Transform associated with the figure (in units of figure dimensions)

Example
import matplotlib.pyplot as plt
import matplotlib as mpl
# Matplotlib 3.6 and later name this style 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import numpy as np
import pandas as pd
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])
# transform=ax.transData is the default, but we'll specify it anyway
ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);

Note that by default, the text is anchored at its left edge and baseline, so it extends above and to the right of the specified coordinates; here the “.” at the beginning of each string will approximately mark the given coordinate location.

The transData coordinates give the usual data coordinates associated with the x- and y-axis labels. The transAxes coordinates give the location from the bottom-left corner of the axes (here the white box) as a fraction of the axes size.

The transFigure coordinates are similar, but specify the position from the bottom left of the figure (here the gray box) as a fraction of the figure size.
Notice now that if we change the axes limits, it is only the transData coordinates that will be affected, while the others remain stationary.
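
The claim that only transData positions move when the axes limits change can be checked numerically. The sketch below converts one point from each coordinate system to display (pixel) coordinates before and after changing the limits; the helper name pixel_positions and the new limits are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

def pixel_positions():
    # Map one point from each coordinate system to display (pixel) coordinates
    return (ax.transData.transform((1, 5)),
            ax.transAxes.transform((0.5, 0.1)),
            fig.transFigure.transform((0.2, 0.2)))

before = pixel_positions()
ax.set_xlim(0, 2)    # change the axes limits
ax.set_ylim(-6, 6)
after = pixel_positions()

data_moved = not np.allclose(before[0], after[0])
axes_fixed = np.allclose(before[1], after[1])
figure_fixed = np.allclose(before[2], after[2])
print(data_moved, axes_fixed, figure_fixed)
```

All three flags should come out True: the data-anchored point lands on different pixels, while the axes-anchored and figure-anchored points stay where they were.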

Arrows and Annotation
• Along with tick marks and text, another useful annotation mark is the simple arrow.
• Matplotlib provides a plt.arrow() function, but drawing good arrows with it is harder than it looks.
• The arrows it creates are SVG (scalable vector graphics) objects that will be subject to the varying aspect ratio of your plots, and the result is rarely what the user intended.
• The plt.annotate() function is usually a better choice: it creates some text together with an arrow, and the arrow style is controlled through the arrowprops dictionary, which has numerous options available.
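
A short sketch of arrows drawn with annotate() and styled through arrowprops follows. The cosine curve, the label texts, and the positions are illustrative choices, not taken from these notes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 20, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.axis('equal')

# The text is placed at xytext and the arrow points to xy
ann1 = ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
                   arrowprops=dict(facecolor='black', shrink=0.05))

# arrowstyle and connectionstyle give a thin, curved arrow
ann2 = ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
                   arrowprops=dict(arrowstyle='->',
                                   connectionstyle='angle3,angleA=0,angleB=-90'))
```

The first call uses the simple arrow options (facecolor, shrink); the second switches to a named arrowstyle, which accepts a different set of keys.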

Three-Dimensional Plotting in Matplotlib
We enable three-dimensional plots by importing the mplot3d toolkit, included with the main Matplotlib installation.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')

With this 3D axes enabled, we can now plot a variety of three-dimensional plot types.

Three-Dimensional Points and Lines
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z) triples.
In analogy with the more common two-dimensional plots discussed earlier, we can create these using the ax.plot3D and ax.scatter3D functions.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
plt.show()

Notice that by default, the scatter points have their transparency adjusted to give a sense of depth on the page.

Three-Dimensional Contour Plots
• mplot3d contains tools to create three-dimensional relief plots using the same inputs as two-dimensional contour plots.
• Like two-dimensional ax.contour plots, ax.contour3D requires all the input data to be in the form of two-dimensional regular grids, with the Z data evaluated at each point.
• Here we’ll show a three-dimensional contour diagram of a three-dimensional sinusoidal function.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Sometimes the default viewing angle is not optimal, in which case we can use the view_init method to set the elevation and azimuthal angles.
ax.view_init(60, 35)
fig

Wireframes and Surface Plots
• Two other types of three-dimensional plots that work on gridded data are wireframes and surface plots.
• These take a grid of values and project it onto the specified three-dimensional surface, and can make the resulting three-dimensional forms quite easy to visualize.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
plt.show()

• A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon.

• Adding a colormap to the filled polygons can aid perception of the topology of the surface being visualized.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface')
plt.show()

Surface Triangulations
• For some applications, the evenly sampled grids required by the preceding routines are overly restrictive and inconvenient.
• In these situations, the triangulation-based plots can be very useful.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5)
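
The scatter plot above only shows the sampled points. To turn them into a surface, one option is ax.plot_trisurf, which first triangulates the scattered (x, y) positions and then draws a surface over the triangles. A minimal sketch, assuming the same sinusoidal function f used in the contour example and a fixed random seed for repeatability:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# Randomly scattered sample points inside a disc of radius 6
rng = np.random.default_rng(0)
theta = 2 * np.pi * rng.random(1000)
r = 6 * rng.random(1000)
x = r * np.sin(theta)
y = r * np.cos(theta)
z = f(x, y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Triangulate the (x, y) points and draw a colored surface through the z values
surf = ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
```

The result is not as clean as a surface drawn from a regular grid, but it works for data that cannot be sampled evenly.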

Geographic Data with Basemap
• One common type of visualization in data science is that of geographic data.
• Matplotlib’s main tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib toolkits that live under the mpl_toolkits namespace.
• Basemap is a useful tool for Python users to have in their virtual toolbelts.
• Installation of Basemap: once you have the Basemap toolkit installed and imported, geographic plots also require the PIL package in Python 2, or the pillow package in Python 3.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

• The resulting map is a fully functioning Matplotlib axes that understands spherical coordinates and allows us to easily over-plot data on the map.

• We’ll use an etopo image (which shows topographical features both on land and under the ocean) as the map background.
Program to display a particular area of the map with latitude and longitude lines
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from itertools import chain
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # the dictionary values contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='r')

Map Projections
The Basemap package implements several dozen such projections, all referenced by a short format code. Here we’ll briefly demonstrate some of the more common ones.
• Cylindrical projections
• Pseudo-cylindrical projections
• Perspective projections
• Conic projections

Cylindrical projections
• The simplest of map projections are cylindrical projections, in which lines of constant latitude and longitude are mapped to horizontal and vertical lines, respectively.
• This type of mapping represents equatorial regions quite well, but results in extreme distortions near the poles.
• The spacing of latitude lines varies between different cylindrical projections, leading to different conservation properties, and different distortion near the poles.
• The example below uses the equidistant cylindrical projection (projection='cyl'). Other cylindrical projections are the Mercator (projection='merc') and the cylindrical equal-area (projection='cea') projections.
• The additional arguments to Basemap for this view specify the latitude (lat) and longitude (lon) of the lower-left corner (llcrnr) and upper-right corner (urcrnr) for the desired map, in units of degrees.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)
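
To see how the spacing of latitude lines differs between cylindrical projections, the textbook formulas for the vertical map coordinate can be compared directly. The sketch below does not use Basemap; it assumes a unit sphere and, for the equal-area case, a standard parallel at the equator, so the numbers illustrate the idea rather than reproduce Basemap’s exact output.

```python
import numpy as np

lat = np.radians([0, 30, 60, 80])

# Vertical map coordinate of each parallel on a unit sphere
y_equidistant = lat                               # equidistant: proportional to latitude
y_mercator = np.log(np.tan(np.pi / 4 + lat / 2))  # Mercator: stretches toward the poles
y_equal_area = np.sin(lat)                        # equal-area: compresses toward the poles

for name, y in [('equidistant', y_equidistant),
                ('mercator', y_mercator),
                ('equal-area', y_equal_area)]:
    print(name, np.round(y, 3))
```

Near the equator the three agree closely, but at high latitudes the Mercator coordinate grows much faster than the equidistant one, while the equal-area coordinate levels off. This is the different distortion near the poles mentioned above.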

Pseudo-cylindrical projections
• Pseudo-cylindrical projections relax the requirement that meridians (lines of constant longitude) remain vertical; this can give better properties near the poles of the projection.
• The Mollweide projection (projection='moll') is one common example of this, in which all meridians are elliptical arcs.
• It is constructed so as to preserve area across the map: though there are distortions near the poles, the area of small patches reflects the true area.
• Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and Robinson (projection='robin') projections.
• The extra arguments to Basemap here refer to the central latitude (lat_0) and longitude (lon_0) for the desired map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None,
            lat_0=0, lon_0=0)
draw_map(m)

Perspective projections
• Perspective projections are constructed using a particular choice of perspective point, similar to if you photographed the Earth from a particular point in space (a point which, for some projections, technically lies within the Earth!).
• One common example is the orthographic projection (projection='ortho'), which shows one side of the globe as seen from a viewer at a very long distance.
• Thus, it can show only half the globe at a time.
• Other perspective-based projections include the gnomonic projection (projection='gnom') and stereographic projection (projection='stere').
• These are often the most useful for showing small portions of the map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=0)
draw_map(m);

Conic projections
• A conic projection projects the map onto a single cone, which is then unrolled.
• This can lead to very good local properties, but regions far from the focus point of the cone may become very distorted.
• One example of this is the Lambert conformal conic projection (projection='lcc').
• It projects the map onto a cone arranged in such a way that two standard parallels (specified in Basemap by lat_1 and lat_2) have well-represented distances, with scale decreasing between them and increasing outside of them.
• Other useful conic projections are the equidistant conic (projection='eqdc') and the Albers equal-area (projection='aea') projection.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55, width=1.6E7, height=1.2E7)
draw_map(m)

Drawing a Map Background
The Basemap package contains a range of useful functions for drawing borders of physical features like continents, oceans, lakes, and rivers, as well as political boundaries such as countries and US states and counties. The following are some of the available drawing functions that you may wish to explore using IPython’s help features:

• Physical boundaries and bodies of water
drawcoastlines() - Draw continental coast lines
drawlsmask() - Draw a mask between the land and sea, for use with projecting images on one or the other
drawmapboundary() - Draw the map boundary, including the fill color for oceans
drawrivers() - Draw rivers on the map
fillcontinents() - Fill the continents with a given color; optionally fill lakes with another color

• Political boundaries
drawcountries() - Draw country boundaries
drawstates() - Draw US state boundaries
drawcounties() - Draw US county boundaries

• Map features
drawgreatcircle() - Draw a great circle between two points
drawparallels() - Draw lines of constant latitude
drawmeridians() - Draw lines of constant longitude
drawmapscale() - Draw a linear scale on the map

• Whole-globe images
bluemarble() - Project NASA’s blue marble image onto the map
shadedrelief() - Project a shaded relief image onto the map
etopo() - Draw an etopo relief image onto the map
warpimage() - Project a user-provided image onto the map

Plotting Data on Maps
• One of the most useful features of the Basemap toolkit is the ability to over-plot a variety of data onto a map background.
• There are many map-specific functions available as methods of the Basemap instance. Some of these map-specific methods are:
contour()/contourf() - Draw contour lines or filled contours
imshow() - Draw an image
pcolor()/pcolormesh() - Draw a pseudocolor plot for irregular/regular meshes
plot() - Draw lines and/or markers
scatter() - Draw points with markers
quiver() - Draw vectors
barbs() - Draw wind barbs
drawgreatcircle() - Draw a great circle

Visualization with Seaborn
The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
Histograms, KDE, and densities
• Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables. We have seen that this is relatively straightforward in Matplotlib.
• Rather than a histogram, we can get a smooth estimate of the distribution using a kernel density estimation, which Seaborn does with sns.kdeplot.
import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    # newer Seaborn releases use fill=True in place of shade=True
    sns.kdeplot(data[col], shade=True)
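
To make the idea behind kernel density estimation concrete, here is a hand-rolled sketch in plain NumPy: each sample contributes a small Gaussian bump, and the estimate is the average of those bumps. The helper name kde_estimate and the bandwidth of 0.4 are assumptions for illustration only; Seaborn picks a bandwidth automatically and this is not its implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0, 1, 500)

def kde_estimate(samples, grid, bandwidth):
    # Average of Gaussian bumps of width `bandwidth`, one centered on each sample
    u = (grid[:, np.newaxis] - samples[np.newaxis, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

grid = np.linspace(-5, 5, 201)
density = kde_estimate(samples, grid, bandwidth=0.4)

# A density estimate should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows the samples closely; a larger one gives a smoother but blurrier estimate.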

• Histograms and KDE can be combined using distplot (deprecated in recent Seaborn releases in favor of histplot and displot).
sns.distplot(data['x'])
sns.distplot(data['y']);

• If we pass the full two-dimensional dataset to kdeplot, we will get a two-dimensional visualization of the data.
• We can see the joint distribution and the marginal distributions together using sns.jointplot.

Pair plots
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each other.

We’ll demo this with the Iris dataset, which lists measurements of petals and sepals of three iris species:
import seaborn as sns
iris = sns.load_dataset("iris")
# newer Seaborn releases use height=2.5 in place of size=2.5
sns.pairplot(iris, hue='species', size=2.5);

Faceted histograms
• Sometimes the best way to view data is via histograms of subsets. Seaborn’s FacetGrid makes this extremely simple.
• We’ll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data.
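
What a faceted histogram does can be sketched by hand with plain Matplotlib: split the data into subsets by two categorical variables and draw one histogram per subset on a shared grid. The data below is a synthetic stand-in for a tips table (random tip percentages with made-up sex and meal-time labels), not the real dataset, and FacetGrid automates exactly this kind of loop.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a tips table: tip percentage by sex and meal time
rng = np.random.default_rng(0)
n = 400
sex = rng.choice(['Female', 'Male'], size=n)
time = rng.choice(['Lunch', 'Dinner'], size=n)
tip_pct = rng.normal(16, 5, size=n).clip(0, 40)

rows, cols = ['Female', 'Male'], ['Lunch', 'Dinner']
bins = np.linspace(0, 40, 15)
fig, axes = plt.subplots(len(rows), len(cols), sharex=True, sharey=True)

counts = {}
for i, s in enumerate(rows):
    for j, t in enumerate(cols):
        subset = tip_pct[(sex == s) & (time == t)]  # one facet = one subset
        counts[(s, t)], _, _ = axes[i, j].hist(subset, bins=bins)
        axes[i, j].set_title(f'{s} | {t}')
```

Sharing the x and y axes across panels keeps the four histograms directly comparable.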

Factor plots
Factor plots can be useful for this kind of visualization as well. They allow you to view the distribution of a parameter within bins defined by any other parameter. (In newer Seaborn releases, sns.factorplot has been renamed sns.catplot.)

Joint distributions
Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution between different datasets, along with the associated marginal distributions.

Bar plots
Time series can be plotted with sns.factorplot.
