Felsenstein 1985

Confidence Limits on Phylogenies: An Approach Using the Bootstrap
Author(s): Joseph Felsenstein

Source: Evolution, Vol. 39, No. 4 (Jul., 1985), pp. 783-791
Published by: Society for the Study of Evolution
Stable URL: http://www.jstor.org/stable/2408678
Evolution, 39(4), 1985, pp. 783-791

CONFIDENCE LIMITS ON PHYLOGENIES: AN APPROACH USING THE BOOTSTRAP

JOSEPH FELSENSTEIN
Department of Genetics SK-50, University of Washington, Seattle, WA 98195
Abstract.-The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.

Received July 12, 1984. Accepted April 12, 1985
It is rare that any attempt is made to put a confidence interval on an estimate of a phylogeny. Most methods for inferring phylogenies yield one or a few trees, and their users rarely go beyond examining the variation among trees that are tied with the best tree under whatever criterion is being employed. There is no reason to believe that this practice constitutes an adequate exploration of the size of the confidence limits on the estimate.

A few authors have explored the question of confidence limits on phylogenies. The pioneer in doing so is Cavender (1978, 1981), who examined the confidence limits for a four-species case, in terms of how many steps worse a tree must be than the most parsimonious tree to be significantly worse. His results were a bit disconcerting: when inferences were based on twenty characters, a tree would have to be 9 steps worse to be significantly worse. This implies that the confidence intervals would be quite large.

Templeton (1983) has constructed a test of whether one tree is significantly better supported than another. In principle such a test could be used to delimit a confidence interval by finding all trees that cannot be rejected in comparison with the best supported tree. I have recently extended Cavender's analysis to the case of a molecular clock with three species, obtaining, in that case, confidence limits that were somewhat smaller than Cavender's (Felsenstein, 1985). I have also recently reviewed the application of statistics to inferring phylogenies (Felsenstein, 1983a); that paper may be consulted for earlier references on statistical estimation of phylogenies.

An important recent statistical method is the bootstrap (Efron, 1979), a relative of the jackknife. Like the jackknife, it is a method of resampling one's own data to infer the variability of the estimate. This paper will explore the use of the bootstrap in inferring phylogenies, where it leads to a practical method for placing confidence intervals on the estimates.

The Bootstrap

A straightforward statistical exposition of the bootstrap is given by Efron and Gong (1983), and a readable elementary treatment is that by Diaconis and Efron (1983). The basic idea of the bootstrap
involves inferring the variability in an unknown distribution from which your data were drawn by resampling from the data. Suppose that you had data points $x_1, x_2, \ldots, x_n$, which you are willing to assume were drawn independently from the same distribution. From these, applying some method T of statistical estimation, we obtain an estimate

$t = T(x_1, x_2, \ldots, x_n)$ (1)

of a parameter we are interested in. If we knew the exact distribution from which the $x_i$ were drawn, and if the function T were sufficiently tractable algebraically, we could obtain a formula for the standard error of the estimate t, and also construct confidence intervals for t.

The bootstrap procedure is most useful when we either do not know the distribution of the $x_i$, or when T is so complicated that its standard error is difficult to compute. It suggests that we resample our data to construct a series of fictional sets of data. Each of these is constructed by sampling n points from the $x_i$, sampling with replacement. Each such fictional data set consists of n points, $x_1^*, \ldots, x_n^*$, where each point $x_i^*$ is drawn at random from among the n original data points. It is quite likely that, in this resampling process, some of the original data points are represented more than once, and others are omitted.

For each fictional set of data, we compute the estimate

$t^* = T(x_1^*, x_2^*, \ldots, x_n^*)$. (2)

The resampling process is done many times (say r times), each time producing a fictional sample of n points by sampling with replacement from the original n data points. For each the estimate $t^*$ is computed. We are then in possession of a collection of r estimates of the parameter. The essential idea of the bootstrap is that this set of estimates has a distribution that approximates the distribution of the actual estimate t. A bias-corrected estimate of the parameter can be computed by averaging the r different $t^*$ values (Efron and Gong, 1983). The variance of t can be inferred by computing the variance of this collection of $t^*$ values, and the confidence limits on the parameter can be approximated by using the appropriate upper and/or lower percentiles of the observed distribution of the $t^*$ values.

The justification for this resampling is that, if the original sample size n is large, each possible value of x will be represented in the same proportion as in the underlying distribution, and resampling from the data points with replacement will be the same as sampling from the underlying distribution. For smaller sample sizes, the process is an approximation but frequently is a very good one. The monograph by Efron (1982) can be consulted for further details on the properties of the bootstrap.

Bootstrapping Phylogenies

How can the bootstrap be applied to phylogenies? Instead of sample points $x_1, x_2, \ldots, x_n$ we usually have a table of species × characters (or species × sites for molecular sequences). It is not immediately obvious how resampling can be done in the data table. I will argue that a justifiable procedure is to bootstrap across the characters, that is, to sample characters (or sites) from the data table with replacement. Thus, each bootstrap sample consists of a new data table with the same set of species, but with some of the original characters duplicated and others dropped by the process of sampling n characters from the original set with replacement.

The justification for this is that we can view each character as having evolved independently from the others according to a stochastic process that has among its parameters the topology and branch lengths of the underlying phylogeny. Each character is then a random sample from a distribution of all possible configurations of characters. For example, if we are considering nucleic acid sequence data with p species, there are $4^p$ possible outcomes at each site, not counting the possibilities of deletion and insertion. To a first approximation we can consider each
site to be independently drawn from a distribution with $4^p$ possibilities, whose probabilities depend on the phylogeny we are trying to estimate.

Given this independence of evolutionary processes in different characters, the configurations in the characters are seen to be drawn independently and identically distributed (i.i.d.), a necessary condition for the bootstrap method to be valid. In fact, in the case of discrete character states (such as nucleic acids), the underlying distribution is multinomial, since there are $4^p$ possibilities each of which has some probability of occurring. Despite the complexity of the structure being inferred (the phylogeny) the statistical model is a very straightforward one: independent samples from a multinomial distribution.

It might be argued that this presupposes that the same probabilistic evolutionary process is operating in all of the characters, which is extremely unrealistic. Such an assumption is not necessary. If instead we had a variety of different kinds of characters evolving according to different processes, we need only imagine that there is an additional stage in the process of random sampling, one occurring in the mind of the systematist. We imagine, as part of the stochastic process, a step in which the systematist randomly draws each character from a pool of different kinds of characters, each kind having a different evolutionary process that applies to it. Once drawn, each character then has its actual configuration determined by the appropriate stochastic evolutionary process. The resulting distribution of character configurations is a mixture of multinomial distributions, and, as such, is still a multinomial distribution and is still i.i.d.

In practice the systematist may not have sampled the characters at random. Systematists frequently include characters in the study in groups (such as groups of measurements on the skull). We are then not justified in regarding the process of choice of characters as a series of random samples from a pool of possible characters. I have recently discussed (Felsenstein, 1983a) some of the statistical issues involved in such a random-sampling model of inference of phylogenies.

A more serious difficulty is lack of independence of the evolutionary processes in different characters. If the characters are correlated (as measurement characters often are), then, in effect, we have fewer characters in the study than we believe. If correlations mean that what appear to be 50 independent characters are really more like 30, the variability that we infer for our estimate will be too small, producing overconfidence in the result; a bootstrap involving sampling 30 characters at random from among the 50 would have been more appropriate, though there is no way to know this in advance. For the purposes of this paper, we will ignore these correlations and assume that they cause no problems; in practice, they pose the most serious challenge to the use of bootstrap methods.

A similar problem can arise when multistate characters have been recoded into binary "factors" that are then treated as if they were independent two-state characters. These cannot be completely independent, as they would have to be if the bootstrap sampled them independently. Walter Fitch (pers. comm.) has suggested that this problem can be avoided by retaining a record of which binary factors are associated with which of the original characters, and having the bootstrap sample the original characters and keep all of the binary factors of a character together. Thus, if there were nine characters that had been expanded to 20 binary factors, we would construct the bootstrap sample by drawing nine times from the nine characters, and whenever a character was drawn we would take care to put all of its binary factors into the bootstrap sample.

Confidence Limits on Phylogenies

An interesting problem arises when we begin to consider how to construct confidence limits on the phylogenies. Each bootstrap sample is a data set that must
be analyzed to obtain an estimate of the phylogeny. We then have r phylogenies. Each of these is a complicated multivariate entity that has a tree topology and may also have branch lengths. Defining a confidence interval and summarizing it in a useable form is far from a simple matter.

In bootstrapping, confidence limits on a statistic are frequently constructed by the percentile method, which involves simply taking (for a 95% confidence interval) the empirical upper and lower 2.5% points of the distribution of bootstrap estimates of the statistic. Consider testing whether the probability of heads of a tossed coin exceeds 0.50. If we did not know about the binomial distribution and decided instead to use the bootstrap, a one-sided confidence interval on the probability of heads could be constructed by finding the empirical lower 5% point of the distribution of bootstrap estimates. The set of values less than 0.50 would therefore be rejected if values of the estimated probability of heads that small or smaller occurred less than 5% of the time among the bootstrap estimates.

The approach used here starts with the assumption that the systematist is primarily interested in whether some particular group is monophyletic. A rooted tree is a series of statements asserting monophyly of a series of nested or disjoint sets of species. Suppose that we are interested in a subset S of species and wish to know whether there is significant support in the data for the assertion that this group is monophyletic. We can reject the alternatives to the subset S if they occur in less than 5% of the bootstrap estimates.

We thus wish to search for all subsets S of species that occur on 95% or more of the bootstrap estimates. Each of these subsets may be considered to be supported (in the sense that its alternatives are rejected), although those confidence statements are not joint confidence statements: if two subsets are each supported at the 95% level, we might have as little as 90% confidence in the statement that they are both present in the true tree. But at least they cannot be contradictory: each being present on at least 95% of the bootstrap estimated trees, they must co-occur on at least one of the trees and must thus be either nested or disjoint.

The same argument has been used by Margush and McMorris (1981) to define "majority rule" consensus trees. These are trees composed of all those subsets that appear in a majority of a collection of trees. By the argument just given, these subsets must define a tree, since no two of them can conflict. If we take the set of phylogenies that result from analyzing a series of bootstrap samples and make a majority-rule consensus tree, recording on it how often each subset appears, we will obtain a tree that can be used to define at a glance confidence sets for any rejection probability below 50%. The majority-rule consensus tree itself can be considered to be an overall bootstrap estimate of the phylogeny.

In cases where we are using a statistically well-founded method, such as maximum likelihood estimation, we would hope that the bootstrap method and the curvature of the likelihood surface would give similar indications of which parts of the phylogeny were well estimated and which not. Where the method of inferring phylogenies is one with undesirable statistical properties such as inconsistency, the bootstrap does not correct for these. For example, clustering by overall similarity makes an inconsistent estimate of the phylogeny if rates of evolution in different lineages differ by more than a certain amount. Parsimony methods are subject to the same problem, but require greater inequalities of evolutionary rate to be inconsistent. For an elementary discussion of these phenomena, see my recent review article (Felsenstein, 1983b). Bootstrapping provides us with a confidence interval within which is contained not the true phylogeny, but the phylogeny that would be estimated on repeated sampling of many characters from the underlying pool of characters. As such it may be misleading if the method used to infer phylogenies is inconsistent.
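The bookkeeping behind a majority-rule consensus can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure of any particular program: each replicate tree is assumed to be already reduced to the set of monophyletic groups it asserts, the function names `clade_support`, `majority_rule`, and `compatible` are hypothetical, and the four replicate trees on species A to E are made-up data.

```python
from collections import Counter
from itertools import combinations

def clade_support(trees):
    """Count how many replicate trees contain each clade.

    Each tree is a collection of clades; each clade is a frozenset of
    species names asserted to be monophyletic (an assumed encoding).
    """
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count a clade at most once per tree
    return counts

def majority_rule(trees):
    """Return {clade: fraction} for clades found in more than half the trees."""
    r = len(trees)
    return {c: k / r for c, k in clade_support(trees).items() if 2 * k > r}

def compatible(a, b):
    """Two clades fit on one rooted tree iff they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

# Hypothetical replicate trees on species A-E (made-up data).
AB, ABC, DE, BC = (frozenset(s) for s in ("AB", "ABC", "DE", "BC"))
replicates = [{AB, ABC, DE}, {AB, ABC, DE}, {BC, ABC, DE}, {AB, DE}]

consensus = majority_rule(replicates)
for clade, frac in sorted(consensus.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), frac)
```

In this toy input the group BC appears in only one replicate and is dropped, while the surviving groups are pairwise nested or disjoint, as the argument in the text requires of any two groups that each occur in a majority of the trees.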
TABLE 1. Fossil horse data of Camin and Sokal (1965). The states of each character are in a linear series, -1, 0, 1, 2, ..., with the ancestral state being 0. The data are also shown in binary recoded form in which the nine multistate characters have been recoded into 20 binary factors. The first line of that table indicates the correspondence between the original and recoded characters. Bootstrap sampling of characters should be done before any recoding into binary factors.

Name               Characters                    Binary Factors
                                                 11112 22333 44566 77889
Mesohippus          0  0  0  0  0  0  0  0  0    00000 00000 00000 00000
Hypohippus         -1  3  3  0  0  0  0  0  1    00011 11111 00000 00001
Archaeohippus       1  0  0  0  0  0  0  1  1    10000 00000 00000 00101
Parahippus          1  1  1  1  0  0  0  0  1    10001 00100 10000 00001
Merychippus         2  2  2  2  1  1  1  2  1    11001 10110 11110 10111
Merych. secundus    2  2  2  2  1 -1 -1  2  1    11001 10110 11101 01111
Nannippus           2  2  1  2  1  1  1  2  1    11001 10100 11110 10111
Neohipparion        2  3  3  2  1  1  1  2  1    11001 11111 11110 10111
Calippus            2  2  1  2  1 -1 -1  2  1    11001 10100 11101 01111
Pliohippus          3  3  3  2  1 -1 -1  2  1    11101 11111 11101 01111
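The caption's advice to sample before recoding, together with the weighting scheme and Fitch's suggestion of keeping the binary factors of a character together, can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `bootstrap_weights` and the seed are hypothetical, the correspondence string is the first line of the binary-factor block of Table 1 with its spaces removed, and the output is merely a weight vector that would still have to be handed to a phylogeny program.

```python
import random

# Correspondence line from Table 1: the original character (1-9)
# from which each of the 20 binary factors was recoded.
FACTOR_TO_CHARACTER = [int(d) for d in "11112223334456677889"]

def bootstrap_weights(factor_to_character, rng):
    """Draw one bootstrap sample of the original characters and return
    an integer weight for every binary factor."""
    characters = sorted(set(factor_to_character))
    n = len(characters)
    # Start all character weights at zero, then add one per draw.
    draws = {c: 0 for c in characters}
    for _ in range(n):  # n draws with replacement from n characters
        draws[rng.choice(characters)] += 1
    # Every binary factor inherits the weight of its parent character.
    return [draws[c] for c in factor_to_character]

rng = random.Random(1985)  # arbitrary seed, used only for repeatability
for _ in range(5):
    print("".join(str(w) for w in bootstrap_weights(FACTOR_TO_CHARACTER, rng)))
```

Because the draws are made on the nine original characters, all factors belonging to one character always share a weight, and a character that is never drawn drops out of the analysis with weight zero on every one of its factors.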
One difficulty in the interpretation of the result is that we may not have decided which subset of species interests us until after the bootstrap result is examined. This raises the "multiple tests" problem: if we have 20 statistical tests, on average one should show significance at the 95% level purely at random. There are ways of making simple corrections if the number of independent tests is known, but in this case the different tests (the different subsets that show up on the majority-rule consensus tree) are probably correlated, so that it is not easy to see how to compute the number of independent tests so as to correct for it. I have simply taken the 95% level as correct, as if we had chosen the test of interest a priori.

One might wonder whether the jackknife would be a viable alternative to the bootstrap. If we make a set of estimates by dropping one character at a time and then estimating the phylogeny, the resulting phylogenies will vary far less than the bootstrap estimates do. In the simple test case of sample averages estimating the mean of a normal distribution, it turns out that the jackknife estimates of the mean will have a variance only $n^2/(n-1)^3$ times as large as that of the corresponding bootstrap estimates (Efron and Gong, 1983). To make the variance among the jackknife estimates as large as that among bootstrap estimates, one would have to engage in an extrapolation to make their variance larger. The difficulty in envisaging a procedure like this is that the space of possible phylogenies does not lend itself readily to extrapolation: once a branch length has shrunk to zero it is not immediately obvious what to do next. Unlike normal means, phylogenies do not live in a flat Euclidean space. One way to make the jackknife vary as much as the bootstrap would be to drop not one observation, but half the observations chosen at random. This possibility is worth exploring, but for the moment it is not obvious what advantage there would be to using the jackknife rather than the bootstrap.

Using Existing Computer Programs

The process of generating many bootstrap samples from a data set is a tedious one. One might think that it would require special programs to rewrite the data matrix, leaving out some characters and duplicating others. Fortunately, much of that work can be avoided by making use of differential character weights, which are allowed in most computer programs for inferring phylogenies, particularly programs using parsimony methods. These programs usually allow integer weights for the characters, weights that can be 0, 1, 2, .... A weight of zero means that the character is in effect
dropped from the analysis. A weight of w means that the character is counted as if present w times, so that each change of state in the character is counted as if it were w changes of state.

This automatically accomplishes the duplication and deletion of characters without the necessity of recopying the data matrix. Different bootstrap samples could be fed into the programs by doing computer runs with different weights. The weights are generated by starting with weights of zero for all characters. We then sample n characters at random with replacement (using a table of random numbers, for example). Each time a character is drawn, its weight is increased by one, so that in the end its weight counts the number of times it was sampled. Here are five of the weight vectors that were generated when bootstrap sampling was done on 20 characters:

21100212120121012010
01001510031020011211
01031211201012100121
12130421030000100101
20010100211211121211

To do bootstrap sampling, one would generate a vector of weights, run the phylogeny estimation program with those weights, generate another vector of weights, run the program with those, and so on. The process is fairly tedious, although with microcomputers it need not be expensive. In the fossil horse example below, I have used 50 bootstrap samples. This might strike a statistician as too few, but a systematist as too many. The more samples are taken, the more accurate an idea we will have of which groups are likely to be monophyletic. Even with a small number of bootstrap samples we will quickly get a feel for which parts of our estimate of the phylogeny are well supported and which not.

A computer program that carries out bootstrap sampling and computes the majority rule consensus tree is available for the case of discrete characters analyzed by the parsimony and compatibility methods. It is contained in the program package PHYLIP, available free from me (see the Appendix below).

An Example

Table 1 shows the fossil horse data given by Camin and Sokal (1965 pp. 321-322) as a computational example. The full list of species and references for the original data are given by Camin and Sokal (1965). The data set has ten species and nine multistate characters. Mesohippus has been taken as the outgroup, as it was in Camin and Sokal's paper.

Figure 1 shows the results of running a branch-and-bound program that finds all most parsimonious trees according to the Wagner parsimony criterion. There are ten most parsimonious trees. The left tree in Figure 1 shows nine of them: the two empty circles with three descendants represent not trifurcations, but points at which the tree can be resolved into any of three bifurcating topologies. All nine possible combinations of these are in the list of most parsimonious trees. The tenth tree is the one shown at the right of Figure 1. All of these trees require 29 changes of character state.

FIG. 1. All most parsimonious trees for the fossil horse data in Table 1 when phylogenies are evaluated by the Wagner parsimony criterion. There are ten most parsimonious trees in all. Nine of these can be generated by resolving each of the trifurcations in the left tree into all three possible bifurcations. The tenth is shown in the right tree. All abbreviations are first three letters of names in Table 1, except MSE = Merychippus secundus.

If we were to take the variation among the most parsimonious trees as providing an adequate indication of the uncertainty in our estimate of the phylogeny, we would conclude that four monophyletic groups were defined, as these groups show up in all ten of the most parsimonious trees. They are:

(Pliohippus, Merychippus secundus, Calippus)
(Nannippus, Neohipparion, Merychippus)
(Pliohippus, Merychippus secundus, Calippus, Nannippus, Neohipparion, Merychippus)
(all but Mesohippus and Archaeohippus)

When we carry out bootstrap sampling of columns from the left-hand part of Table 1 and analyze 50 bootstrap replicates, we get the results shown in Figure 2. Next to each branch of the tree is shown the number of times that the bootstrap estimate contained the corresponding monophyletic group. The tree shown is the majority-rule consensus tree. The consensus tree turns out in this case not to be one of the most parsimonious trees. All four of the monophyletic groups listed above occur on it, but only one of these (the six-species group consisting of Pliohippus, Merychippus secundus, Calippus, Nannippus, Neohipparion, and Merychippus) comes close to occurring 95% of the time in the bootstrap sampling (it occurs 47/50 or 94% of the time). The others only occur about two-thirds of the time. It is apparent that taking the set of most parsimonious trees as defining the confidence interval would result in far too narrow an interval.

The example given here has had the tree rooted by use of an outgroup. If we were using a method that produced an unrooted tree and had no outgroup information or other method of rooting the tree, we could still carry out bootstrap sampling and construct an unrooted majority-rule consensus tree. To do that, we would only need to note that each branch of one of the replicate bootstrap estimates divides the species into two groups, at least one of which would be monophyletic if we could root the tree. The unrooted majority-rule consensus tree is defined by finding those partitions that occur in a majority of the replicate trees. A simple way of doing this is to choose an arbitrary species as an outgroup, make a majority-rule consensus tree of the resulting rooted trees, and then present the result as an unrooted tree without indicating which species was the outgroup.

Perfectly Hennigian Data

Occasionally, though rather rarely, a data set will arise that has no internal
conflict, all characters being perfectly compatible. This sort of data, which is that envisioned by Hennig when he suggested using derived character states to define monophyletic groups, allows us to avoid entirely the bootstrap sampling process. The argument is quite simple. Suppose that we want to construct a 95% confidence interval by bootstrap sampling. Suppose that c characters out of n define the same monophyletic group. That group will show up in the bootstrap estimate if any one of those c characters is drawn in the sampling of n characters. The monophyletic group will be part of the 95% confidence interval if and only if the probability of omitting all c of the characters is less than 0.05. This is easily computed, given n and c.

FIG. 2. Bootstrap estimate of the phylogeny for the data of Table 1 when phylogenies are evaluated by the Wagner parsimony criterion. Fifty bootstrap samples were analyzed. The groups shown in this tree are those that occurred in a majority of the resulting trees, plus the most frequently occurring groups that were compatible with these. Next to each branch is shown the number of times that the monophyletic group it defines occurred. Further explanation is given in the text.

The probability of leaving out all c characters in drawing n characters with replacement is $(1 - c/n)^n$. The value of c that is necessary to make this less than 0.05 is the same for all relevant values of n: it is c = 3. We can thus conclude that, if the data are perfectly Hennigian, three characters are enough for the bootstrap to indicate significant support for a monophyletic group at the 95% level. Any group supported by fewer characters will not be in the bootstrap confidence interval. Of course, we are assuming that the evolutionary processes and the inclusion of the characters by the systematist are independent across characters.

Although three characters are enough to guarantee inclusion of a group, if the data are perfectly Hennigian, one will never encounter any character that contradicts the group. Sometimes we have great confidence that our characters are "clean" ones, that reversals and parallelisms would be so rarely seen that we can have confidence in a group even if it is supported by only one character. The present "rule of three" would then seem to be a conservative one.

It may be doubted that the rule is really always conservative. I have recently studied, by exact enumeration methods, the problem of placing confidence limits on phylogenies using parsimony methods when there are only three species and an evolutionary process for which an evolutionary clock may be assumed (Felsenstein, 1985). It turns out that in the worst case, when the characters are equally likely to resolve a trifurcation in any of the three possible ways, if we have three characters all of which support the same resolution, this is not statistically significant at the 95% level. Four characters would be. (I am indebted to Alan Templeton for pointing out the connection between the two calculations.)

In many cases, strong conclusions have been drawn from the existence of groups defined by as little as one character. The great advantage of the present approach
is that it provides a practical method, albeit a flawed one, for assessing the uncertainty inherent in such conclusions. I suspect that the levels of uncertainty found in practice will be so great as to give pause to all but the firmest exponents of nonstatistical hypothetico-deductive approaches to inferring phylogenies.

ACKNOWLEDGMENTS

I am grateful to Kent Fiala of the Department of Ecology and Evolution, State University of New York at Stony Brook, for providing me with the fossil horse data of Camin and Sokal in recoded form. I wish to thank Walter Fitch, Monty Slatkin, Alan Templeton, Bill Engels, Ruth Shaw, and an anonymous statistical reviewer for suggestions for improvement of the manuscript. This work was supported by task agreement number DE-AT06-76EV71005 of contract number DE-AM06-76RL02225 between the U.S. Department of Energy and the University of Washington.

LITERATURE CITED

CAMIN, J. H., AND R. R. SOKAL. 1965. A method for deducing branching sequences in phylogeny. Evolution 19:311-326.
CAVENDER, J. A. 1978. Taxonomy with confidence. Math. Biosci. 40:271-280 (Erratum: Vol. 44, p. 308, 1979).
CAVENDER, J. A. 1981. Tests of phylogenetic hypotheses under generalized models. Math. Biosci. 54:217-229.
DIACONIS, P., AND B. EFRON. 1983. Computer-intensive methods in statistics. Sci. Amer. 249:116-130.
EFRON, B. 1979. Bootstrap methods: Another look at the jackknife. Ann. Statist. 7:1-26.
EFRON, B. 1982. The jackknife, the bootstrap, and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics No. 38. Society for Industrial and Applied Mathematics. Philadelphia, PA.
EFRON, B., AND G. GONG. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37:36-48.
FELSENSTEIN, J. 1983a. Statistical inference of phylogenies. J. Roy. Statist. Soc. A 146:246-272.
FELSENSTEIN, J. 1983b. Parsimony in systematics: Biological and statistical issues. Ann. Rev. Ecol. Syst. 14:313-333.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34:152-161.
MARGUSH, T., AND F. R. MCMORRIS. 1981. Consensus n-trees. Bull. Mathemat. Biol. 43:239-244.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221-224.

Corresponding Editor: W. R. Engels

APPENDIX

Availability of the PHYLIP Program Package

PHYLIP, the Phylogeny Inference Package, is a free package of computer programs, written in Pascal, for inferring phylogenies. It includes parsimony methods, compatibility methods, distance matrix methods, and maximum likelihood methods. The Pascal source code is provided (compiled object code is not). PHYLIP will be written in a standard format on a magnetic tape provided by the recipient. It will also be provided on 5 1/4-inch diskettes if 6 double density diskettes are sent. A variety of soft-sectored MSDOS, CP/M-80, and CP/M-86 formats can be written; double-sided, hard-sectored, or 3.5-inch formats cannot, nor can any Apple formats. For information on formats supported and restrictions on countries to which distribution and support are available, please write the author.
