fundamentals of data science unit 4 and 5
Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, Boolean logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and selection – operating on data – missing data – Hierarchical indexing – combining datasets – aggregation and grouping – pivot tables
I. Basics of NumPy arrays:
NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays. Basic array manipulations:
Attributes of arrays
Determining the size, shape, memory consumption, and data types of arrays
Indexing of arrays
Getting and setting the value of individual array elements
Slicing of arrays
Getting and setting smaller subarrays within a larger array
Reshaping of arrays
Changing the shape of a given array
Joining and splitting of arrays
Combining multiple arrays into one, and splitting one array into many
NumPy Array Attributes
1. Creating NumPy arrays:
We'll create three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array. We'll use NumPy's random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:
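A minimal sketch of that setup. The seed value 0 and the sizes of the three arrays are illustrative choices, not values taken from these notes.

```python
import numpy as np

np.random.seed(0)  # seed for reproducibility: the same arrays on every run

x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array
```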
2. Attributes:
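Each array exposes its size, shape, and memory use as attributes. The sketch below prints them for a three-dimensional array; the seed and shape are arbitrary example values.

```python
import numpy as np

np.random.seed(0)
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

print("x3 ndim: ", x3.ndim)               # number of dimensions
print("x3 shape:", x3.shape)              # size of each dimension
print("x3 size: ", x3.size)               # total number of elements
print("dtype:   ", x3.dtype)              # data type of the elements
print("itemsize:", x3.itemsize, "bytes")  # size of one element
print("nbytes:  ", x3.nbytes, "bytes")    # total size = size * itemsize
```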
Multidimensional subarrays
Multidimensional slices work in the same way, with multiple slices separated by commas. For example:
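A small sketch of element indexing and multidimensional slicing. It uses a predictable np.arange array (an illustrative choice) instead of random values so the results are easy to check.

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)  # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]

print(x2[2, 0])        # single element (row 2, column 0): 8
print(x2[:2, :3])      # first two rows, first three columns
print(x2[:3, ::2])     # all rows, every other column
print(x2[::-1, ::-1])  # rows and columns reversed together
```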
5. Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. We can do this by combining indexing and slicing, using an empty slice marked by a single colon (:):
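A short illustration on a hypothetical 3x4 array:

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)

first_column = x2[:, 0]  # every row, column 0 -> [0 4 8]
first_row = x2[0, :]     # row 0, every column -> [0 1 2 3]
same_row = x2[0]         # for rows, the trailing empty slice can be omitted
```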
6. Subarrays as no-copy views
NumPy array slicing differs from Python list slicing: in lists, slices are copies, whereas NumPy array slices return views rather than copies of the array data. Consider our two-dimensional array from before:
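The sketch below contrasts the two behaviours. It uses np.arange values in place of the random array so the effect is visible.

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)
x2_sub = x2[:2, :2]   # 2x2 slice: a view onto x2's data
x2_sub[0, 0] = 99     # modify the view...
print(x2[0, 0])       # ...and the original changes too: 99

lst = [0, 1, 2, 3]
lst_sub = lst[:2]     # a Python list slice is a copy
lst_sub[0] = 99
print(lst[0])         # the original list is unchanged: 0
```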
7. Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray.
This can be most easily done with the copy() method:
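A minimal example, again on a hypothetical np.arange array:

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)
x2_sub_copy = x2[:2, :2].copy()  # independent copy of the subarray
x2_sub_copy[0, 0] = 42           # modify the copy...
print(x2[0, 0])                  # ...the original is untouched: 0
```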
8. Reshaping of Arrays
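Reshaping is most often done with the reshape() method, or with np.newaxis inside a slice. A brief sketch; the requested shape must hold exactly as many elements as the original array.

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))  # 1..9 as a 3x3 grid; 9 == 3 * 3

x = np.array([1, 2, 3])
row = x.reshape((1, 3))   # row vector via reshape
col = x[:, np.newaxis]    # column vector via np.newaxis
print(grid)
print(row.shape, col.shape)  # (1, 3) (3, 1)
```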
9. Array Concatenation and Splitting
Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument.
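A short sketch of the three routines on small hypothetical arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(np.concatenate([x, y]))          # [1 2 3 3 2 1]

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
wide = np.concatenate([grid, grid], axis=1)  # join along the second axis -> shape (2, 6)
tall = np.vstack([x, grid])                  # stack rows vertically -> shape (3, 3)
extra = np.hstack([grid, [[99], [99]]])      # append a column -> shape (2, 4)
```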
Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
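A minimal sketch: N split points produce N + 1 subarrays.

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])   # split before index 3 and before index 5
print(x1, x2, x3)                  # [1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])  # split rows at index 2
left, right = np.hsplit(grid, [2])   # split columns at index 2
```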
II. Aggregations: Min, Max, and Everything in Between
3. Multidimensional aggregates
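Aggregates take an axis argument that names the dimension to collapse. A short sketch on a hypothetical 3x4 array:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)  # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]

print(M.sum())         # aggregate over the whole array: 66
print(M.min(axis=0))   # collapse axis 0 -> one minimum per column: [0 1 2 3]
print(M.max(axis=1))   # collapse axis 1 -> one maximum per row: [ 3  7 11]
```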
Other aggregation functions
Example: What Is the Average Height of US Presidents?
Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a simple example, let's consider the heights of all US presidents.
This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and values:
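A sketch of how the file could be loaded and summarized. The helper name summarize_heights is hypothetical, and the column name 'height(cm)' is an assumption about the file's layout. Adjust it to match the actual header.

```python
import numpy as np
import pandas as pd


def summarize_heights(path, column="height(cm)"):
    """Hypothetical helper: read the CSV at `path`, summarize `column`.

    The default column name is an assumption about the file layout.
    """
    data = pd.read_csv(path)
    heights = np.array(data[column])
    return {
        "mean": heights.mean(),
        "std": heights.std(),
        "min": heights.min(),
        "max": heights.max(),
        "25th percentile": np.percentile(heights, 25),
        "median": np.median(heights),
        "75th percentile": np.percentile(heights, 75),
    }

# Usage, assuming the file is in the working directory:
# print(summarize_heights("president_heights.csv"))
```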
III. Computation on Arrays: Broadcasting
Another means of vectorizing operations is to use NumPy's broadcasting functionality. Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on arrays of different sizes.
Introducing Broadcasting
Recall that for arrays of the same size, binary operations are performed on an element-by-element basis. Broadcasting allows these types of binary operations to be performed on arrays of different sizes; for example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array.
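A minimal sketch of same-size, scalar, and mixed-dimension operations:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
print(a + b)   # same shapes, element by element: [5 6 7]
print(a + 5)   # the scalar 5 is broadcast to [5, 5, 5]: [5 6 7]

M = np.ones((3, 3))
print(M + a)   # the 1-D array a is stretched across each row of M
```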
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
• Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
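The three rules can be traced on small arrays. The shapes below are illustrative; the comments follow each rule step by step.

```python
import numpy as np

a = np.arange(3)          # shape (3,)

# Rules 1 and 2: (2, 3) + (3,) -> a is padded to (1, 3), then stretched to (2, 3)
M = np.ones((2, 3))
print((M + a).shape)      # (2, 3)

# Both arrays stretch: (3, 1) + (3,) -> (3, 1) + (1, 3) -> (3, 3)
col = np.arange(3).reshape((3, 1))
print((col + a).shape)    # (3, 3)

# Rule 3: (3, 2) + (3,) -> (3, 2) + (1, 3); 2 and 3 disagree and neither is 1
M2 = np.ones((3, 2))
broadcast_failed = False
try:
    M2 + a
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```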
IV. Comparisons, Masks, and Boolean Logic
This section covers the use of Boolean masks to examine and manipulate values within NumPy arrays. Masking comes up when you want to extract, modify, count, or otherwise manipulate values in an array based on some criterion: for example, you might wish to count all values greater than a certain value, or perhaps remove all outliers that are above some threshold.
In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.
One approach to such questions would be to answer them by hand: loop through the data, incrementing a counter each time we see values in some desired range.
For reasons discussed throughout this chapter, such an approach is very inefficient, both from the standpoint of time writing code and time computing the result.
Comparison Operators as ufuncs
NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs. The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available:
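The six operators applied to a small hypothetical array:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

print(x < 3)    # less than:             [ True  True False False False]
print(x > 3)    # greater than
print(x <= 3)   # less than or equal
print(x >= 3)   # greater than or equal
print(x != 3)   # not equal
print(x == 3)   # equal
print((2 * x) == (x ** 2))  # compound expressions work too: [False  True False False False]
```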
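Boolean operators
Comparisons can be combined with Python's bitwise logic operators & (and), | (or), ^ (xor), and ~ (not). NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.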
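A sketch that combines comparisons and counts the matches without a Python loop. The data values are hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Parentheses are required: & and | bind more tightly than < and >
in_range = (x > 2) & (x < 6)    # AND -> True for 3, 4, 5
outside = (x <= 2) | (x >= 6)   # OR

print(np.count_nonzero(in_range))          # 3
print(np.sum(in_range))                    # True counts as 1, so also 3
print(np.array_equal(~in_range, outside))  # NOT: ~in_range is the complement -> True
```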
Boolean Arrays as Masks:
We looked at aggregates computed directly on Boolean arrays. A more powerful pattern is to use Boolean arrays as masks, to select particular subsets of the data themselves.
Returning to our x array from before, suppose we want an array of all values in the array that are less than, say, 5:
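A sketch of masking, using a hypothetical 3x4 array in place of the earlier x:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 5              # Boolean array with the same shape as x
selected = x[mask]        # values where mask is True, in row order
print(selected)           # [0 3 3 3 2 4]

x[x > 6] = 6              # masks can also be used for assignment (clip large values)
```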
Fancy Indexing
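Fancy indexing passes an array of indices to select several elements at once. The result takes the shape of the index array, not of the array being indexed. A brief sketch with hypothetical values:

```python
import numpy as np

x = np.array([51, 92, 14, 71, 60, 20, 82, 86])
ind = [3, 7, 4]
print(x[ind])                 # [71 86 60]

ind2 = np.array([[3, 7],
                 [4, 5]])
print(x[ind2])                # 2x2 result: [[71 86] [60 20]]

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
print(X[row, col])            # pairs (0,2), (1,1), (2,3): [ 2  5 11]
```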
Example: Binning Data
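One way to bin data by hand uses np.searchsorted to find each value's bin and np.add.at to accumulate counts. np.add.at handles repeated indices correctly, which plain counts[i] += 1 does not. The seed, sample size, and bin edges below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)       # arbitrary seed
x = rng.standard_normal(100)          # 100 hypothetical data points

bins = np.linspace(-5, 5, 20)         # 20 bin edges
counts = np.zeros_like(bins)

# For each value, find the index i with bins[i-1] < value <= bins[i]
i = np.searchsorted(bins, x)

# Add 1 at each index; repeated indices are accumulated, not overwritten
np.add.at(counts, i, 1)
print(counts)
```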
V. Structured Data: NumPy’s Structured Arrays
In[2]: name = ['Alice', 'Bob', 'Cathy', 'Doug']
       age = [25, 45, 37, 19]
       weight = [55.0, 85.5, 68.0, 61.5]
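A sketch that stores these three lists in one structured array with a compound dtype. The format codes (a 10-character Unicode string, a 4-byte integer, an 8-byte float) are illustrative choices.

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Compound dtype: Unicode string (max 10 chars), 4-byte int, 8-byte float
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = name
data['age'] = age
data['weight'] = weight

print(data['name'])                    # one field across all records
print(data[0])                         # first record
print(data[data['age'] < 30]['name'])  # Boolean mask on a field: ['Alice' 'Doug']
```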
VI. Data Manipulation with Pandas
Data Selection in Series
A Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary.
Series as dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys to a collection of values:
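A minimal sketch of dictionary-style access on a hypothetical Series:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

print(data['b'])           # look up a value by key: 0.5
print('a' in data)         # membership test on the keys: True
keys = list(data.keys())   # ['a', 'b', 'c', 'd']
data['e'] = 1.25           # assigning to a new key extends the Series
```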
Series as one-dimensional array
A Series builds on this dictionary-like interface and provides array-style item selection via the same basic mechanisms as NumPy arrays, that is, slices, masking, and fancy indexing. Examples of these are as follows:
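A sketch of the four selection styles on the same hypothetical Series. Note that a slice by explicit label includes its end point, while a slice by implicit integer position excludes it.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

print(data['a':'c'])                        # slice by explicit index: includes 'c'
print(data[0:2])                            # slice by implicit integer index: excludes position 2
print(data[(data > 0.3) & (data < 0.8)])    # masking
print(data[['a', 'd']])                     # fancy indexing
```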
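Indexers: loc, iloc, and ix
Explicit and implicit indices are easy to confuse. For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit indices, while a slicing operation like data[1:3] will use the implicit Python-style index. The loc attribute always indexes and slices with the explicit index, and the iloc attribute always uses the implicit Python-style index.
A third indexing attribute, ix, was a hybrid of the two, and for Series objects was equivalent to standard []-based indexing. It has since been removed from pandas (as of version 1.0), so use loc or iloc instead.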
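A short sketch of loc and iloc on a Series with a hypothetical explicit integer index, where the two indexers give different answers:

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

print(data.loc[1])      # explicit label 1 -> 'a'
print(data.loc[1:3])    # label slice, end label included -> 'a', 'b'
print(data.iloc[1])     # implicit position 1 -> 'b'
print(data.iloc[1:3])   # positional slice, end excluded -> 'b', 'c'
```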
Data Selection in DataFrame:
DataFrame as a dictionary
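Each column of a DataFrame can be accessed like a dictionary value, and a new column can be added by assigning to a new key. The column names and numbers below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'area': [10, 20, 30], 'pop': [100, 400, 900]},
                  index=['x', 'y', 'z'])   # hypothetical data

area = df['area']                       # dictionary-style access returns a Series
same = df.area                          # attribute-style access works for simple names
df['density'] = df['pop'] / df['area']  # add a new column
```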
Operating on Data in Pandas:
For unary operations like negation and trigonometric functions, NumPy ufuncs applied to Pandas objects preserve the index and column labels in the output, and for binary operations such as addition and multiplication, Pandas will automatically align indices when passing the objects to the ufunc.
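A sketch of both behaviours on hypothetical Series. The add() method with fill_value is one pandas option for replacing the NaN entries that alignment produces.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(np.exp(ser))              # unary ufunc: the index a, b, c is kept

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
print(A + B)                    # aligned on the union of indices; 0 and 3 become NaN
print(A.add(B, fill_value=0))   # treat missing entries as 0 instead
```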
Ufuncs: Operations Between DataFrame and Series
When you are performing operations between a DataFrame and a Series, the index and column alignment is similarly maintained. Operations between a DataFrame and a Series are similar to operations between a two-dimensional and one-dimensional NumPy array.
In[15]: A = rng.randint(10, size=(3, 4))
        A
Out[15]: array([[3, 8, 2, 4],
                [2, 6, 4, 8],
                [6, 1, 3, 8]])
In[16]: A - A[0]
Out[16]: array([[ 0,  0,  0,  0],
                [-1, -2,  2,  4],
                [ 3, -7,  1,  4]])
According to NumPy's broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied row-wise.
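Pandas follows the same row-wise convention by default. To operate column-wise instead, use an object method such as subtract() with axis=0. The column names Q to T below are hypothetical, and the values repeat those of A above.

```python
import numpy as np
import pandas as pd

A = np.array([[3, 8, 2, 4],
              [2, 6, 4, 8],
              [6, 1, 3, 8]])
df = pd.DataFrame(A, columns=list('QRST'))

row_wise = df - df.iloc[0]                  # subtract the first row from every row
col_wise = df.subtract(df['R'], axis=0)     # subtract column R from every column
```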
VII. Handling Missing Data
Trade-Offs in Missing Data Conventions
A number of schemes have been developed to indicate the presence of missing data in a table or DataFrame. They generally follow one of two strategies: using a mask that globally indicates missing values, or choosing a sentinel value that indicates a missing entry.
Missing Data in Pandas
The way in which Pandas handles missing values is constrained by its reliance on the NumPy package, which does not have a built-in notion of NA values for non-floating-point data types.
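A sketch of the main consequences and tools: None forces an object array in NumPy, integers are upcast to float so they can hold NaN, and isnull(), dropna(), and fillna() detect, remove, or replace missing entries.

```python
import numpy as np
import pandas as pd

vals = np.array([1, None, 3, 4])
print(vals.dtype)                      # object: slow, Python-level operations

data = pd.Series([1, np.nan, 'hello', None])
print(data.isnull())                   # True where a value is missing
print(data.dropna())                   # drop the missing entries

filled = pd.Series([1, np.nan, 2, None, 3]).fillna(0)   # replace missing values with 0
upcast = pd.Series([1, 2, None])       # integers become float64 so NaN can be stored
print(upcast.dtype)
```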
VIII. Hierarchical Indexing:
Hierarchical indexing (also known as multi-indexing) incorporates multiple index levels within a single index. In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects.
A Multiply Indexed Series
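A sketch of a two-level index built from (key, year) tuples. The keys and values are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('A', 2000), ('A', 2010),
                                   ('B', 2000), ('B', 2010)])
pop = pd.Series([100, 110, 200, 190], index=index)   # hypothetical values

print(pop.loc[:, 2010])   # every first-level key for the year 2010
print(pop['A'])           # every year for key 'A'
```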
Rearranging Multi-Indices
Sorted and unsorted indices
Many of the MultiIndex slicing operations will fail if the index is not sorted.
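A sketch of sorting before slicing. The index labels and values are hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.arange(6), index=index)   # first level is not in sorted order

# On this unsorted index, a label slice such as data['a':'b'] can raise an
# UnsortedIndexError (a KeyError subclass); sorting the index first avoids it.
data = data.sort_index()
print(data['a':'b'])   # rows for 'a' and 'b'
```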
Stacking and unstacking indices
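unstack() moves an index level into the columns, and stack() moves it back. A sketch with the same hypothetical values used for the multiply indexed Series:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('A', 2000), ('A', 2010),
                                   ('B', 2000), ('B', 2010)])
pop = pd.Series([100, 110, 200, 190], index=index)   # hypothetical values

wide = pop.unstack()            # inner level (years) becomes the columns
restacked = wide.stack()        # back to a multiply indexed Series
by_year = pop.unstack(level=0)  # unstack the outer level instead
```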
VIII. Combining Datasets: Concat and Append
Notice the repeated indices in the result. While this is valid within DataFrames, the outcome is often undesirable. pd.concat() gives us a few ways to handle it:
• Catching the repeats as an error.
• Ignoring the index.
• Adding MultiIndex keys.
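A sketch of the three options on two small hypothetical frames that share the index [0, 1]. DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so pd.concat() is used throughout.

```python
import pandas as pd

x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
print(pd.concat([x, y]).index.tolist())   # repeated index: [0, 1, 0, 1]

# 1. Catch the repeats as an error
repeat_error = False
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    repeat_error = True
    print("ValueError:", e)

# 2. Ignore the index and build a new integer index
ignored = pd.concat([x, y], ignore_index=True)

# 3. Add MultiIndex keys that label each source frame
keyed = pd.concat([x, y], keys=['x', 'y'])
```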
IX. Aggregation and Grouping
An essential piece of analysis of large data is efficient summarization: computing aggregations like sum(), mean(), median(), min(), and max().
Simple Aggregation in Pandas
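A sketch of aggregation on a Series and a DataFrame. DataFrame aggregates work column by column unless axis='columns' is given. The data is random and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
ser = pd.Series(rng.rand(5))
print(ser.sum(), ser.mean())

df = pd.DataFrame({'A': rng.rand(5), 'B': rng.rand(5)})
col_means = df.mean()                  # one mean per column
row_means = df.mean(axis='columns')    # one mean per row
summary = df.describe()                # count, mean, std, min, quartiles, max
```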
GroupBy: Split, Apply, Combine
A canonical example of this split-apply-combine operation, where the “apply” is a summation aggregation, is illustrated in Figure 3-1.
Figure 3-1 makes clear what the GroupBy accomplishes:
• The split step involves breaking up and grouping a DataFrame depending on the value of the specified key.
• The apply step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups.
• The combine step merges the results of these operations into an output array.
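A sketch of the three kinds of apply step on a small hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)})

# Aggregate: one value per group (A: 0+3, B: 1+4, C: 2+5)
sums = df.groupby('key')['data'].sum()

# Transformation: same shape as the input, here centered on each group mean
centered = df.groupby('key')['data'].transform(lambda s: s - s.mean())

# Filtering: keep only the groups whose total exceeds 4 (B and C)
filtered = df.groupby('key').filter(lambda g: g['data'].sum() > 4)
```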
Column indexing. The GroupBy object supports column indexing in the same way as the DataFrame, and returns a modified GroupBy object. For example:
In[14]: planets.groupby('method')
Out[14]: <pandas.core.groupby.DataFrameGroupBy object at 0x1172727b8>
In[15]: planets.groupby('method')['orbital_period']
Out[15]: <pandas.core.groupby.SeriesGroupBy object at 0x117272da0>
Iteration over groups. The GroupBy object supports direct iteration over the groups, returning each group as a Series or DataFrame:
In[17]: for (method, group) in planets.groupby('method'):
            print("{0:30s} shape={1}".format(method, group.shape))
Dispatch methods. Through some Python class magic, any method not explicitly implemented by the GroupBy object will be passed through and called on the groups, whether they are DataFrame or Series objects. For example, you can use the describe() method of DataFrames to perform a set of aggregations that describe each group in the data:
In[18]: planets.groupby('method')['year'].describe().unstack()
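The planets dataset itself is not included in these notes. The sketch below uses a tiny hypothetical stand-in with the same column names, so the three GroupBy features can be run end to end. Its rows are made up for illustration and are not real planet data.

```python
import pandas as pd

# Hypothetical stand-in rows, not the real planets dataset
planets = pd.DataFrame({
    'method': ['Radial Velocity', 'Transit', 'Radial Velocity', 'Transit', 'Imaging'],
    'orbital_period': [269.3, 3.5, 874.8, 2.1, None],
    'year': [2006, 2010, 2008, 2012, 2008],
})

# Column indexing: select one column, then aggregate per group
median_period = planets.groupby('method')['orbital_period'].median()

# Iteration over groups: each group is a DataFrame
for method, group in planets.groupby('method'):
    print("{0:30s} shape={1}".format(method, group.shape))

# Dispatch: describe() is run on each group's 'year' values
year_summary = planets.groupby('method')['year'].describe()
```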
X. Pivot Tables
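A pivot table is essentially a multidimensional version of a GroupBy aggregation. The sketch below uses a small hypothetical table of passengers. The column names and values are illustrative only, and the commented groupby line gives the same result.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'sex': ['female', 'male', 'female', 'male', 'male', 'female'],
    'class': ['First', 'First', 'Third', 'Third', 'First', 'Third'],
    'survived': [1, 0, 1, 0, 1, 0],
})

# Mean of 'survived' for every (sex, class) combination
table = df.pivot_table('survived', index='sex', columns='class', aggfunc='mean')

# Equivalent GroupBy form:
# df.groupby(['sex', 'class'])['survived'].mean().unstack()
print(table)
```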
Simple Line Plots
The simplest of all plots is the visualization of a single function y = f(x). Here we will take a first look at creating a simple plot of this type.
The figure (an instance of the class plt.Figure) can be thought of as a single container that contains all the objects representing axes, graphics, text, and labels.
The axes (an instance of the class plt.Axes) is a bounding box with ticks and labels, which will eventually contain the plot elements that make up our visualization.
Line Colors and Styles
The first adjustment you might wish to make to a plot is to control the line colors and styles.
To adjust the color, you can use the color keyword, which accepts a string argument representing virtually any imaginable color. The color can be specified in a variety of ways.
If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines.
Different forms of color representation:
• Specify color by name: color='blue'
• Short color code (rgbcmyk): color='g'
• Grayscale between 0 and 1: color='0.75'
• Hex code (RRGGBB from 00 to FF): color='#FFDD44'
• RGB tuple, values between 0 and 1: color=(1.0, 0.2, 0.3)
• All HTML color names supported: color='chartreuse'
We can adjust the line style using the linestyle keyword.
Different line styles:
linestyle='solid'
linestyle='dashed'
linestyle='dashdot'
linestyle='dotted'
Short assignment:
linestyle='-'   # solid
linestyle='--'  # dashed
linestyle='-.'  # dashdot
linestyle=':'   # dotted
Line style and color codes can be combined into a single non-keyword argument to the plt.plot() function:
plt.plot(x, x + 0, '-g')   # solid green
plt.plot(x, x + 1, '--c')  # dashed cyan
plt.plot(x, x + 2, '-.k')  # dashdot black
plt.plot(x, x + 3, ':r');  # dotted red
Axes Limits
The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods. Passing the limits in reverse order displays that axis reversed.
Example
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
The plt.axis() method allows you to set the x and y limits with a single call, by passing a list that specifies [xmin, xmax, ymin, ymax]:
plt.axis([-1, 11, -1.5, 1.5]);
An equal aspect ratio, set with plt.axis('equal'), makes one unit in x equal to one unit in y.
Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel(), plt.ylabel()
Legend - plt.legend()
Example programs
Line color
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
plt.plot(x, np.sin(x - 0), color='blue')           # specify color by name
plt.plot(x, np.sin(x - 1), color='g')              # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')           # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')        # hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse');    # all HTML color names supported
Line style
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-')   # solid
plt.plot(x, x + 5, linestyle='--')  # dashed
plt.plot(x, x + 6, linestyle='-.')  # dashdot
plt.plot(x, x + 7, linestyle=':');  # dotted
Axis limit with label and legend
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
plt.legend();
Simple Scatter Plots
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being
joined by line segments, here the points are represented individually with a dot, circle, or other shape.
Syntax
plt.plot(x, y, 'marker code', color=...);
Example
plt.plot(x, y, 'o', color='black');
The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as '-' and '--' to control the line style, the marker style has its own set of short string codes.
Example
Various symbols used to specify markers: ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']
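A sketch that draws each marker code once so they can be compared side by side. The point positions are random and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)   # random positions, for illustration only
fig, ax = plt.subplots()
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    ax.plot(rng.rand(5), rng.rand(5), marker,
            label="marker='{0}'".format(marker))
ax.legend(numpoints=1)   # show one marker per legend entry
ax.set_xlim(0, 1.8)      # leave room on the right for the legend
```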
Shorthand assignment of line, symbol, and color is also allowed:
plt.plot(x, y, '-ok');  # solid line ('-'), circle markers ('o'), black ('k')
Additional arguments in plt.plot()
We can specify some other parameters related to the scatter plot that make it more attractive. They are color, marker size, line width, marker face color, marker edge color, marker edge width, etc.
Example
plt.plot(x, y, '-p', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);
Scatter Plots with plt.scatter
A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function:
plt.scatter(x, y, marker='o');
The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.
In a call such as plt.scatter(x, y, c=colors, s=sizes), the color argument c is automatically mapped to a color scale (which the colorbar() command displays), and the size argument s gives each marker's area in points squared.
cmap - the colormap used in a scatter plot; different colormaps give different color combinations:
Perceptually Uniform Sequential
['viridis', 'plasma', 'inferno', 'magma']
Sequential
['Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', 'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu', 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']
Sequential (2)
['binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia', 'hot', 'afmhot', 'gist_heat', 'copper']
Diverging
['PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', 'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']
Qualitative
['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20', 'tab20b', 'tab20c']
Miscellaneous
['flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar']
Example programs
Simple scatter plot
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
Scatter plot with edge color, face color, size, and width of marker (scatter plot with line)
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-o', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='yellow',
         markeredgecolor='red',
         markeredgewidth=4)
plt.ylim(-1.5, 1.5);
Scatter plot with random colors, size, and transparency
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3,
            cmap='viridis')  # the colormap keyword is cmap
plt.colorbar();  # show the color scale
Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important, if not more important, than accurate reporting of the number itself. For example, imagine that I am using some astrophysical observations to estimate the Hubble Constant, the local measurement of the expansion rate of the Universe.
In visualization of data and results, showing these errors effectively can make a plot convey much more complete information.
Types of errors
• Basic Errorbars
• Continuous Errors
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Here the fmt is a format code controlling the appearance of lines and points, and has the same syntax as the shorthand used in plt.plot().
In addition to these basic options, the errorbar function has many options to fine-tune the outputs. Using these additional options you can easily customize the aesthetics of your errorbar plot.
plt.errorbar(x, y, yerr=dy, fmt='o', color='black', ecolor='lightgray', elinewidth=3, capsize=0);
Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities. Though Matplotlib does not have a built-in convenience routine for this type of application, it's relatively easy to combine primitives like plt.plot and plt.fill_between for a useful result.
Here we'll perform a simple Gaussian process regression (GPR), using the Scikit-Learn API. This is a method of fitting a very flexible nonparametric function to data with a continuous measure of the uncertainty.
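The plotting half of this recipe is sketched below. To keep it dependency-free, the sketch skips the GPR fit and uses sin(x) as a stand-in for the fitted mean, with a hypothetical constant uncertainty. With scikit-learn, the mean and standard deviation would instead come from a fitted GaussianProcessRegressor via predict(X, return_std=True).

```python
import numpy as np
import matplotlib.pyplot as plt

xfit = np.linspace(0, 10, 1000)
yfit = np.sin(xfit)                     # stand-in for the model's predicted mean
dyfit = 2 * 0.2 * np.ones_like(xfit)    # hypothetical 2-sigma uncertainty

fig, ax = plt.subplots()
ax.plot(xfit, yfit, '-', color='gray')  # the best-fit curve
ax.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                color='gray', alpha=0.2)  # shaded region shows the uncertainty
ax.set_xlim(0, 10)
```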
Density and Contour Plots
These plots display three-dimensional data in two dimensions using contours or color-coded regions. There are three Matplotlib functions that can be helpful for this task:
• plt.contour for contour plots,
• plt.contourf for filled contour plots, and
• plt.imshow for showing images.
Visualizing a Three-Dimensional Function
A contour plot can be created with the plt.contour function.
It takes three arguments:
• a grid of x values,
• a grid of y values, and
• a grid of z values.
The x and y values represent positions on the plot, and the z values will be represented by the contour levels.
The way to prepare such data is to use the np.meshgrid function, which builds two-dimensional grids from one-dimensional arrays:
Example
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines.
Alternatively, you can color-code the lines by specifying a colormap with the cmap argument. We'll also specify that we want more lines to be drawn: 20 equally spaced intervals within the data range.
plt.contour(X, Y, Z, 20, cmap='RdGy');
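plt.contourf takes the same arguments but fills the spaces between levels. A self-contained sketch using the same function and grid, with a colorbar added as a key:

```python
import numpy as np
import matplotlib.pyplot as plt


def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)   # both grids have shape (40, 50)
Z = f(X, Y)

fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, 20, cmap='RdGy')  # 20 filled levels
fig.colorbar(filled)                            # colorbar gets its own axes
```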
One potential issue with this plot is that it is a bit “splotchy.” That is, the color steps are discrete rather than continuous, which is not always what is desired.
You could remedy this by setting the number of contours to a very high number, but this results in a rather inefficient plot: Matplotlib must render a new polygon for each step in the level.
A better way to handle this is to use the plt.imshow() function, which interprets a two-dimensional grid of data as an image.
There are a few potential gotchas with imshow():
• plt.imshow() doesn't accept an x and y grid, so you must manually specify the extent [xmin, xmax, ymin, ymax] of the image on the plot.
• plt.imshow() by default follows the standard image array definition where the origin is in the upper left, not in the lower left as in most contour plots. This must be changed (origin='lower') when showing gridded data.
• plt.imshow() will automatically adjust the axis aspect ratio to match the input data; you can change this by setting, for example, plt.axis(aspect='image') to make x and y units match.
Example Program
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.imshow(Z, extent=[0, 5, 0, 5],  # extent matches the 0-5 range of x and y
           origin='lower', cmap='RdGy')
plt.colorbar()
Histograms
A histogram is a simple plot for representing a large dataset. It is a graph showing a frequency distribution: the number of observations within each given interval.
Parameters
plt.hist() is used to plot a histogram. The hist() function uses an array of numbers to create a histogram; the array is passed into the function as an argument.
bins - A histogram displays numerical data by grouping data into "bins" of equal width. Each bin is plotted as a bar whose height corresponds to how many data points are in that bin. Bins are also sometimes called "intervals", "classes", or "buckets".
density (called normed in older Matplotlib versions) - If True, the histogram is normalized so that the total area of the bars equals 1, turning the counts into a probability density.
x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array or a sequence of arrays, which are not required to be of the same length.
histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional. The type of histogram to draw.
'bar' is a traditional bar-type histogram. If multiple data are given, the bars are arranged side by side.
'barstacked' is a bar-type histogram where multiple data are stacked on top of each other.
'step' generates a line plot that is by default unfilled.
'stepfilled' generates a line plot that is by default filled.
Default is 'bar'.
align - {'left', 'mid', 'right'}, optional. Controls how the histogram is plotted.
'left': bars are centered on the left bin edges.
'mid': bars are centered between the bin edges.
'right': bars are centered on the right bin edges.
Default is 'mid'.
orientation - {'horizontal', 'vertical'}, optional. If 'horizontal', barh will be used for bar-type histograms and the bottom kwarg will be the left edges.
color - color or array_like of colors or None, optional. Color spec or sequence of color specs, one per dataset. Default (None) uses the standard line color sequence.
label - str or None, optional. Default is None.
Other parameter
**kwargs - Patch properties. The ** syntax lets a Python function accept a variable number of keyword arguments.
Example
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')  # named 'seaborn-white' before Matplotlib 3.6
data = np.random.randn(1000)
plt.hist(data);
The hist() function has many options to tune both the calculation and the display; here's an example of a more customized histogram.
plt.hist(data, bins=30, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none');
The plt.hist docstring has more information on other customization options available. I find this combination of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of several distributions:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
Two-Dimensional Histograms and Binnings
We can create histograms in two dimensions by dividing points among two-dimensional bins.
We'll start by defining some data: an x and y array drawn from a multivariate Gaussian distribution.
A simple way to plot a two-dimensional histogram is to use Matplotlib's plt.hist2d() function:
Example
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')
Legends
Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw how to create a simple legend; here we'll take a look at customizing the placement and aesthetics of the legend in Matplotlib.
plt.plot(x, np.sin(x), '-b', label='Sine')
plt.plot(x, np.cos(x), '--r', label='Cosine')
plt.legend();
Customizing Plot Legends
Location and turning off the frame - We can specify the location and turn off the frame with the parameters loc and frameon.
ax.legend(loc='upper left', frameon=False)
fig
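The snippet above assumes existing fig and ax objects. A self-contained sketch of the same customization, using the sine and cosine curves from the simple legend example:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')

# Place the legend in the upper-left corner and hide its frame
leg = ax.legend(loc='upper left', frameon=False)
```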
Rounded box, shadow, and frame transparency
We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of the frame, or change the padding around the text.
ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
fig
Choosing Elements for the Legend
The legend includes all labeled elements by default. We can change which elements and labels appear in the legend by using the objects returned by plot commands.
The plt.plot() command is able to create multiple lines at once, and returns a list of created line instances.
Passing any of these to plt.legend() will tell it which to identify, along with the labels we'd like to specify:
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))
lines = plt.plot(x, y)
plt.legend(lines[:2], ['first', 'second']);
Multiple legends
Through the standard legend interface, it is only possible to create a single legend for the entire plot. If you try to create a second legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist() method to manually add the second artist to the plot.
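Example
import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 10, 1000)
lines = [ax.plot(x, np.sin(x - i * np.pi / 2))[0] for i in range(4)]
first = ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)
ax.add_artist(first)  # keep the first legend; the next ax.legend() call replaces only ax's own legend
ax.legend(lines[2:], ['line C', 'line D'], loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
fig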
Color Bars
In Matplotlib, a colorbar is a separate axes that can provide a key for the meaning of colors in a plot. For continuous labels based on the color of points, lines, or regions, a labeled colorbar can be a great tool.
The simplest colorbar can be created with the plt.colorbar() function.
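A minimal sketch of an image with a colorbar. The image I is a hypothetical 2-D pattern built with broadcasting.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])   # 1000 x 1000 image via broadcasting

fig, ax = plt.subplots()
im = ax.imshow(I)
cbar = fig.colorbar(im)   # the colorbar is drawn in a new, separate axes
```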
Customizing Colorbars
Choosing the colormap
We can specify the colormap using the cmap argument to the plotting function that is creating the visualization. Broadly, there are three different categories of colormaps:
• Sequential colormaps - These consist of one continuous sequence of colors (e.g., binary or viridis).
• Divergent colormaps - These usually contain two distinct colors, which show positive and negative deviations from a mean (e.g., RdBu or PuOr).
• Qualitative colormaps - These mix colors with no particular sequence (e.g., rainbow or jet).
Color limits and extensions
Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an instance of plt.Axes, so all of the axes and tick formatting tricks we've learned are applicable.
We can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the extend property.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])  # 2-D image data
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
Discrete colorbars
Colormaps are by default continuous, but sometimes you'd like to represent discrete values. The easiest way to do this is to use the plt.cm.get_cmap() function (plt.get_cmap() in recent Matplotlib versions, which no longer provide plt.cm.get_cmap()), and pass the name of a suitable colormap along with the number of desired bins.
plt.imshow(I, cmap=plt.get_cmap('Blues', 6))  # I: a 2-D array, e.g. np.sin(x) * np.cos(x[:, np.newaxis])
plt.colorbar()
plt.clim(-1, 1);
Subplots
Matplotlib has the concept of subplots: groups of smaller axes that can exist together within a single figure. These subplots might be insets, grids of plots, or other more complicated layouts.
We'll explore four routines for creating subplots in Matplotlib:
• plt.axes: Subplots by Hand
• plt.subplot: Simple Grids of Subplots
• plt.subplots: The Whole Grid in One Go
• plt.GridSpec: More Complicated Arrangements
plt.axes: Subplots by Hand
The most basic method of creating an axes is to use the plt.axes function. As we've seen previously, by default this creates a standard axes object that fills the entire figure.
plt.axes also takes an optional argument that is a list of four numbers in the figure coordinate system. These numbers represent [left, bottom, width, height] in the figure coordinate system, which ranges from 0 at the bottom left of the figure to 1 at the top right of the figure.
For example, we might create an inset axes at the top-right corner of another axes by setting the x and y position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the x and y extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of the figure).
import matplotlib.pyplot as plt
import numpy as np
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
Vertical subplot
The equivalent of the plt.axes() command within the object-oriented interface is fig.add_axes(). Let's use this to create two vertically stacked axes.
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))
x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));
We now have two axes (the top with no tick labels) that are just touching: the bottom of the upper panel (at position 0.5) matches the top of the lower panel (at position 0.1 + 0.4).
If the bottom position of the second axes is lowered, the two plots are separated from each other, for example ax2 = fig.add_axes([0.1, 0.01, 0.8, 0.4]).
plt.subplot: Simple Grids of Subplots
Matplotlib has several convenience routines to align columns or rows of subplots. The lowest level of these is plt.subplot(), which creates a single subplot within a grid.
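plt.subplot() takes the number of rows, the number of columns, and a 1-based index that runs left to right and top to bottom. The sketch labels each cell with its arguments. It also shows plt.subplots(), from the list of routines, which creates the whole grid in one call.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)   # space between the subplots
for i in range(1, 7):
    plt.subplot(2, 3, i)                      # 2 rows, 3 columns, i-th cell
    plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center')

# plt.subplots creates the whole grid at once and returns a NumPy array of axes
fig2, axes = plt.subplots(2, 3, sharex='col', sharey='row')
```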
plt.GridSpec: More Complicated Arrangements
To go beyond a regular grid to subplots that span multiple rows and columns, plt.GridSpec() is the best tool. The plt.GridSpec() object does not create a plot by itself; it is simply a convenient interface that is recognized by the plt.subplot() command.
For example, a gridspec for a grid of two rows and three columns with some specified width and height space looks like this:
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
From this we can specify subplot locations and extents:
plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2]);
Text and Annotation
The most basic types of annotations we will use are axes labels and titles; here we will see some further visualization and annotation options.
Text annotation can be done manually with the plt.text/ax.text command, which will place text at a particular x/y value.
The ax.text method takes an x position, a y position, a string, and then optional keywords specifying the color, size, style, alignment, and other properties of the text. For example, ha='right' and ha='center' set the alignment, where ha is short for horizontal alignment.
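A small sketch of ax.text with alignment and style keywords. The plotted line, positions, and strings are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 10])                      # hypothetical data to annotate

# ha (horizontal alignment) decides which part of the text sits at x
ax.text(5, 5, "centered", ha='center')
ax.text(10, 2, "right-aligned", ha='right',
        color='gray', size=12, style='italic')
```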
TransformsandText Position
Weanchoredourtextannotationstodatalocations.Sometimesit’spreferabletoanchorthetexttoa
positionontheaxesorfigure,independentofthedata.InMatplotlib,wedothisbymodifyingthetransform.
Anygraphicsdisplayframeworkneedssomeschemefortranslatingbetweencoordinatesystems.
Mathematically, such coordinate transformations are relatively straightforward, and Matplotlib has a well-
developed set of tools that it uses internally to perform them (the tools can be explored in the
matplotlib.transforms submodule).
There are three predefined transforms that can be useful in this situation:
o ax.transData - Transform associated with data coordinates
o ax.transAxes - Transform associated with the axes (in units of axes dimensions)
o fig.transFigure - Transform associated with the figure (in units of figure dimensions)
Example
import matplotlib.pyplot as plt
import numpy as np
# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6
plt.style.use('seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])
# transform=ax.transData is the default, but we'll specify it anyway
ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);
Note that by default, the text is left-aligned and sits on its baseline at the specified coordinates, so it extends up and
to the right of that point; here the “.” at the beginning of each string will approximately mark the given coordinate location.
The transData coordinates give the usual data coordinates associated with the x- and y-axis labels. The
transAxes coordinates give the location from the bottom-left corner of the axes (here the white box) as a
fraction of the axes size.
The transFigure coordinates are similar, but specify the position from the bottom left of the figure (here the
gray box) as a fraction of the figure size.
Notice now that if we change the axes limits, it is only the transData coordinates that will be affected, while the others
remain stationary.
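One way to check this claim numerically, rather than by eye, is to ask each transform where a point lands in display (pixel) coordinates before and after changing the axes limits. The sketch reuses the anchor points of the example above; the new limits are arbitrary. The transform() method of each transform object converts a point to pixels.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# Pixel positions of the three anchor points with the original limits
data_before = ax.transData.transform((1, 5))
axes_before = ax.transAxes.transform((0.5, 0.1))
fig_before = fig.transFigure.transform((0.2, 0.2))

# Change only the data limits
ax.set_xlim(0, 2)
ax.set_ylim(-6, 10)

data_after = ax.transData.transform((1, 5))
axes_after = ax.transAxes.transform((0.5, 0.1))
fig_after = fig.transFigure.transform((0.2, 0.2))

print("data point moved:", not np.allclose(data_before, data_after))
print("axes point moved:", not np.allclose(axes_before, axes_after))
print("figure point moved:", not np.allclose(fig_before, fig_after))
```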
Arrows and Annotation
Along with tick marks and text, another useful annotation mark is the simple arrow.
Drawing arrows in Matplotlib is often harder than one might expect. A plt.arrow() function is available, but the
arrows it creates are patch objects drawn in data coordinates, so they are subject to the varying
aspect ratio of your plots, and the result is rarely what the user intended. The plt.annotate() function, which creates
some text and an arrow, is usually the better choice.
Its arrow style is controlled through the arrowprops dictionary, which has numerous options available.
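A sketch of annotation through ax.annotate (the object-oriented form of plt.annotate), on a cosine curve chosen only for illustration: xy is the point being annotated, xytext is where the label goes, and arrowprops controls the arrow. The first call uses simple facecolor/shrink keywords; the second uses an arrowstyle and a connectionstyle string.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

# xy: the point the arrow points to; xytext: where the label is placed
peak = ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
                   arrowprops=dict(facecolor='black', shrink=0.05))

# arrowstyle and connectionstyle give finer control over the arrow's shape
trough = ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
                     arrowprops=dict(arrowstyle="->",
                                     connectionstyle="angle3,angleA=0,angleB=-90"))
```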
Three-Dimensional Plotting in Matplotlib
We enable three-dimensional plots by importing the mplot3d toolkit, included with the main Matplotlib installation.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')
With this 3D axes enabled, we can now plot a variety of
three-dimensional plot types.
Three-Dimensional Points and Lines
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z) triples.
In analogy with the more common two-dimensional plots discussed earlier, we can create these using the
ax.plot3D and ax.scatter3D functions:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
plt.show()
Notice that by default, the scatter points have their transparency adjusted to give a sense of depth on the page.
Three-Dimensional Contour Plots
mplot3d contains tools to create three-dimensional relief plots using the same inputs.
Like two-dimensional ax.contour plots, ax.contour3D requires all the input data to be in the form of
two-dimensional regular grids, with the Z data evaluated at each point.
Here we'll show a three-dimensional contour diagram of a three-dimensional sinusoidal function:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Sometimes the default viewing angle is not optimal, in which case we can use the view_init method to
set the elevation and azimuthal angles.
ax.view_init(60, 35)  # elevation 60°, azimuth 35°
fig
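Because view_init only changes the camera, it can be tried on any 3D axes. This self-contained sketch rebuilds the contour example and then sets the view with keyword arguments. elev is the angle above the x-y plane and azim the rotation about the z axis, both in degrees.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # makes the '3d' projection available

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
default_view = (ax.elev, ax.azim)  # Matplotlib's default camera angles

# elevation: degrees above the x-y plane; azimuth: rotation about the z axis
ax.view_init(elev=60, azim=35)
print("default:", default_view, "new:", (ax.elev, ax.azim))
```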
Wireframes and Surface Plots
Two other types of three-dimensional plots that work on gridded data are wireframes and surface plots.
These take a grid of values and project it onto the specified three-dimensional surface, and can make the resulting
three-dimensional forms quite easy to visualize.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# same grid and function as in the contour example
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
plt.show()
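Adding a colormap to the filled polygons can aid perception of the topology of the surface being visualized:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# same grid and function as in the contour example
X, Y = np.meshgrid(np.linspace(-6, 6, 30), np.linspace(-6, 6, 30))
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface')
plt.show()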
Surface Triangulations
For some applications, the evenly sampled grids required by the preceding routines are overly restrictive and
inconvenient. In these situations, triangulation-based plots can be very useful.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
# random points in a disc of radius 6 instead of a regular grid
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5)
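The scatter plot above only shows the sampled points. To draw an actual surface through them, Matplotlib's 3D axes provide plot_trisurf, which first triangulates the scattered (x, y) locations (a Delaunay triangulation when no triangles are supplied) and then shades the resulting triangles. The sketch below uses a seeded generator, an assumption made only so the random sample is reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # makes the '3d' projection available


def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))


rng = np.random.default_rng(0)  # seeded so the random sample is reproducible
theta = 2 * np.pi * rng.random(1000)
r = 6 * rng.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)

ax = plt.axes(projection='3d')
# plot_trisurf triangulates the scattered (x, y) points, then shades each triangle
surf = ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
```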
Geographic Data with Basemap
One common type of visualization in data science is that of geographic data.
Matplotlib’s main tool for this type of visualization is the Basemap toolkit, which is one of several
Matplotlib toolkits that live under the mpl_toolkits namespace.
Basemap is a useful tool for Python users to have in their virtual toolbelts.
Installation of Basemap: once you have the Basemap toolkit installed and imported, geographic plots
also require the PIL package in Python 2, or the pillow package in Python 3.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);
We’ll use an etopo image (which shows topographical features both on land and under the ocean) as the map
background.
Program to display a particular area of the map with latitude and longitude lines:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from itertools import chain
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
# helper used by the projection examples below: shaded relief plus a lat/lon grid
def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # dictionary values are (list of plt.Line2D, list of Text) tuples
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='r')
Map Projections
The Basemap package implements several dozen such projections, all referenced by a short format code. Here
we’ll briefly demonstrate some of the more common ones:
Cylindrical projections
Pseudo-cylindrical projections
Perspective projections
Conic projections
Cylindrical projections
The simplest of map projections are cylindrical projections, in which lines of constant latitude and
longitude are mapped to horizontal and vertical lines, respectively.
This type of mapping represents equatorial regions quite well, but results in extreme distortions near the poles.
The spacing of latitude lines varies between different cylindrical projections, leading to different
conservation properties, and different distortion near the poles.
The example below uses the equidistant cylindrical projection (projection='cyl'); other cylindrical projections are the
Mercator (projection='merc') and the cylindrical equal-area (projection='cea') projections.
The additional arguments to Basemap for this view specify the latitude (lat) and longitude (lon) of the lower-left
corner (llcrnr) and upper-right corner (urcrnr) for the desired map, in units of degrees.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180,)
draw_map(m)  # draw_map() is the helper defined above
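To make "the spacing of latitude lines varies" concrete, here is a NumPy-only sketch (it does not use Basemap) of the standard unit-sphere formulas for the three cylindrical projections named above: y = φ for the equidistant cylindrical, y = ln tan(π/4 + φ/2) for Mercator, and y = sin φ for the cylindrical equal-area with its standard parallel at the equator. The gaps between successive 15° parallels show where each projection stretches or compresses.

```python
import numpy as np

lats_deg = np.array([0, 15, 30, 45, 60, 75])
phi = np.radians(lats_deg)

y_cyl = phi                                   # equidistant: even spacing
y_merc = np.log(np.tan(np.pi / 4 + phi / 2))  # Mercator: spacing grows poleward
y_cea = np.sin(phi)                           # equal-area: spacing shrinks poleward

for name, y in [("cyl", y_cyl), ("merc", y_merc), ("cea", y_cea)]:
    gaps = np.diff(y)  # distance between successive 15-degree parallels
    print(name, np.round(gaps, 3))
```

Mercator's growing gaps are why it distorts high latitudes so strongly, while the equal-area projection squeezes them to keep areas correct.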
Pseudo-cylindrical projections
Pseudo-cylindrical projections relax the requirement that meridians (lines of constant longitude)
remain vertical; this can give better properties near the poles of the projection.
The Mollweide projection (projection='moll') is one common example of this, in which all meridians
are elliptical arcs. It is constructed so as to preserve area across the map: though there are distortions
near the poles, the area of small patches reflects the true area.
Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and Robinson
(projection='robin') projections.
The extra arguments to Basemap here refer to the central latitude (lat_0) and longitude (lon_0) for the desired map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None,
            lat_0=0, lon_0=0)
draw_map(m)
Perspectiveprojections
Perspective projections are constructed using a particular choice of perspective point, similar to if you
photographed the Earth from a particular point in space (a point which, for some projections, technically
lies within the Earth!).
One common example is the orthographic projection (projection='ortho'), which shows one side of the globe
as seen from a viewer at a very long distance.
Thus, it can show only half the globe at a time.
Other perspective-based projections include the gnomonic projection (projection='gnom') and the
stereographic projection (projection='stere').
These are often the most useful for showing small portions of the map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=0)
draw_map(m);
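Why the orthographic view shows only one hemisphere can be seen from its projection formulas. This NumPy-only sketch uses the standard unit-sphere orthographic equations. The function name orthographic is our own, not part of Basemap. A point is drawn only when the cosine of its angular distance from the map centre is non-negative, so everything more than 90° away falls on the hidden far side.

```python
import numpy as np


def orthographic(lat, lon, lat_0=50, lon_0=0):
    """Hypothetical helper: project (lat, lon) in degrees onto a unit-radius
    orthographic map centred on (lat_0, lon_0); also report visibility."""
    phi, lam = np.radians(lat), np.radians(lon)
    phi0, lam0 = np.radians(lat_0), np.radians(lon_0)
    x = np.cos(phi) * np.sin(lam - lam0)
    y = np.cos(phi0) * np.sin(phi) - np.sin(phi0) * np.cos(phi) * np.cos(lam - lam0)
    # cos_c: cosine of the angular distance from the map centre
    cos_c = np.sin(phi0) * np.sin(phi) + np.cos(phi0) * np.cos(phi) * np.cos(lam - lam0)
    visible = cos_c >= 0  # cos_c < 0 means the point is on the far hemisphere
    return x, y, visible


# the map centre, a point 80 degrees south of it, and the centre's antipode
lat = np.array([50, -30, -50])
lon = np.array([0, 0, 180])
x, y, visible = orthographic(lat, lon)
print(visible)
```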
Conic projections
A conic projection projects the map onto a single cone, which is then unrolled.
This can lead to very good local properties, but regions far from the focus point of the cone may
become very distorted.
One example of this is the Lambert conformal conic projection (projection='lcc').
It projects the map onto a cone arranged in such a way that two standard parallels (specified in Basemap by lat_1 and
lat_2) have well-represented distances, with scale decreasing between them and increasing outside of
them.
Other useful conic projections are the equidistant conic (projection='eqdc') and the Albers equal-area
(projection='aea') projection.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)
Drawing a Map Background
The Basemap package contains a range of useful functions for drawing borders of physical features like
continents, oceans, lakes, and rivers, as well as political boundaries such as countries and US states and counties.
The following are some of the available drawing functions that you may wish to explore using IPython’s
help features:
• Physical boundaries and bodies of water
o drawcoastlines() - Draw continental coast lines
o drawlsmask() - Draw a mask between the land and sea, for use with projecting images on one or the other
o drawmapboundary() - Draw the map boundary, including the fill color for oceans
o drawrivers() - Draw rivers on the map
o fillcontinents() - Fill the continents with a given color; optionally fill lakes with another color
• Political boundaries
o drawcountries() - Draw country boundaries
o drawstates() - Draw US state boundaries
o drawcounties() - Draw US county boundaries
• Map features
o drawgreatcircle() - Draw a great circle between two points
o drawparallels() - Draw lines of constant latitude
o drawmeridians() - Draw lines of constant longitude
o drawmapscale() - Draw a linear scale on the map
• Whole-globe images
o bluemarble() - Project NASA’s blue marble image onto the map
o shadedrelief() - Project a shaded relief image onto the map
o etopo() - Draw an etopo relief image onto the map
o warpimage() - Project a user-provided image onto the map
Plotting Data on Maps
A particularly useful feature of the Basemap toolkit is the ability to over-plot a variety of data onto a map background.
There are many map-specific functions available as methods of the Basemap instance. Some of these
map-specific methods are:
o contour()/contourf() - Draw contour lines or filled contours
o imshow() - Draw an image
o pcolor()/pcolormesh() - Draw a pseudocolor plot for irregular/regular meshes
o plot() - Draw lines and/or markers
o scatter() - Draw points with markers
o quiver() - Draw vectors
o barbs() - Draw wind barbs
o drawgreatcircle() - Draw a great circle
Visualization with Seaborn
The main idea of Seaborn is that it provides high-level commands to create a variety of plot types
useful for statistical data exploration, and even some statistical model fitting.
Histograms, KDE, and densities
Often in statistical data visualization, all you want is to plot
histograms and joint distributions of variables. We have
seen that this is relatively straightforward in Matplotlib.
Rather than a histogram, we can get a smooth estimate of
the distribution using a kernel density estimation, which
Seaborn does with sns.kdeplot:
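import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    sns.kdeplot(data[col], fill=True)  # fill=True replaces the older shade=True
Histograms and KDE can be combined using distplot in older Seaborn versions; distplot is deprecated since
Seaborn 0.11, where histplot with kde=True does the same job:
sns.histplot(data['x'], kde=True, stat='density')
sns.histplot(data['y'], kde=True, stat='density');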
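To show what sns.kdeplot computes, here is a NumPy-only sketch of a one-dimensional Gaussian kernel density estimate: each sample contributes a small Gaussian bump, and the estimate is their average. The helper name gaussian_kde_1d and the fixed bandwidth of 0.5 are assumptions for illustration; Seaborn selects a bandwidth automatically. The random sample mimics the x column above (variance 5).

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each sample,
    evaluated at every point of grid."""
    # (len(grid), len(samples)) matrix of standardised distances
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=np.sqrt(5), size=2000)  # like data['x']
grid = np.linspace(-12, 12, 801)
density = gaussian_kde_1d(samples, grid, bandwidth=0.5)

# A density should integrate to about 1 (trapezoid rule on the grid)
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area under the KDE: {area:.3f}")
```

A larger bandwidth gives a smoother but flatter curve; a smaller one follows individual samples more closely.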
Pair plots
When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful
for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each other.
We’ll demo this with the Iris dataset, which lists measurements of petals and sepals of three iris species:
import seaborn as sns
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue='species', height=2.5);  # 'size' was renamed 'height' in Seaborn 0.9
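A pair plot is an n x n grid: scatter plots of every pair of variables off the diagonal, and the distribution of each single variable on the diagonal. The sketch below builds that grid by hand with Matplotlib to show what sns.pairplot automates. The three column names and the correlated random data are hypothetical stand-ins, not the Iris data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
names = ["a", "b", "c"]  # hypothetical column names
values = rng.multivariate_normal([0, 0, 0],
                                 [[1, 0.8, 0.2], [0.8, 1, 0.1], [0.2, 0.1, 1]],
                                 size=300)

n = len(names)
fig, axes = plt.subplots(n, n, figsize=(6, 6))
for i in range(n):          # row i: variable on the y axis
    for j in range(n):      # column j: variable on the x axis
        ax = axes[i, j]
        if i == j:
            ax.hist(values[:, j], bins=20)                # single-variable distribution
        else:
            ax.scatter(values[:, j], values[:, i], s=3)   # pairwise relationship
        if i == n - 1:
            ax.set_xlabel(names[j])
        if j == 0:
            ax.set_ylabel(names[i])
```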
Faceted histograms
Sometimes the best way to view data is via histograms of subsets. Seaborn’s FacetGrid makes this
extremely simple.
We’ll take a look at some data that shows the amount that restaurant staff receive in tips based
on various indicator data.
Factor plots
Factor plots can be useful for this kind of visualization as well. They allow you to
view the distribution of a parameter within bins defined by any other parameter.
(In Seaborn 0.9 and later, sns.factorplot was renamed sns.catplot.)
Joint distributions
Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint
distribution between different datasets, along with the associated marginal distributions.
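As a rough sketch of the layout sns.jointplot produces, the code below builds it by hand with Matplotlib: a central scatter of the joint distribution with a histogram of each marginal along the top and right edges. The random data reuse the covariance from the KDE example, and the 4 x 4 gridspec split is an arbitrary layout choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000).T

fig = plt.figure(figsize=(6, 6))
grid = fig.add_gridspec(4, 4, hspace=0.1, wspace=0.1)
main_ax = fig.add_subplot(grid[1:, :-1])                  # joint distribution
top_ax = fig.add_subplot(grid[0, :-1], sharex=main_ax)    # marginal of x
right_ax = fig.add_subplot(grid[1:, -1], sharey=main_ax)  # marginal of y

main_ax.scatter(x, y, s=3, alpha=0.4)
top_ax.hist(x, bins=40)
right_ax.hist(y, bins=40, orientation='horizontal')
top_ax.tick_params(labelbottom=False)   # avoid duplicate tick labels
right_ax.tick_params(labelleft=False)
```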
Bar plots
Time series can be plotted as bar plots with sns.factorplot (sns.catplot in newer Seaborn versions).