Thermophilic thermotoga


On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales


Olga Zhaxybayevaᵃ, Kristen S. Swithersᵇ, Pascal Lapierreᶜ, Gregory P. Fournierᵇ, Derek M. Bickhartᵇ, Robert T. DeBoyᵈ, Karen E. Nelsonᵈ, Camilla L. Nesbøᵉ,ᶠ, W. Ford Doolittleᵃ,¹, J. Peter Gogartenᵇ, and Kenneth M. Nollᵇ

ᵃ Department of Biochemistry and Molecular Biology, Dalhousie University, 5850 College Street, Halifax, Nova Scotia, Canada B3H 1X5; ᵇ Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125; ᶜ Biotechnology-Bioservices Center, University of Connecticut, Storrs, CT 06269-3149; ᵈ The J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850; ᵉ Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biology, University of Oslo, P.O. Box 1066 Blindern, N-0316 Oslo, Norway; and ᶠ Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2E9

Contributed by W. Ford Doolittle, February 11, 2009 (sent for review January 6, 2009)

Since publication of the first Thermotogales genome, Thermotoga maritima strain MSB8, single- and multi-gene analyses have disagreed on the phylogenetic position of this order of Bacteria. Here we present the genome sequences of 4 additional members of the Thermotogales (Tt. petrophila, Tt. lettingae, Thermosipho melanesiensis, and Fervidobacterium nodosum) and a comprehensive comparative analysis including the original T. maritima genome. While ribosomal protein genes strongly place Thermotogales as a sister group to Aquificales, the majority of genes with sufficient phylogenetic signal show affinities to Archaea and Firmicutes, especially Clostridia. Indeed, on the basis of the majority of genes in their genomes (including genes that are also found in Aquificales), Thermotogales should be considered members of the Firmicutes. This result highlights the conflict between the taxonomic goal of assigning every species to a unique position in an inclusive Linnaean hierarchy and the evolutionary goal of understanding phylogenesis in the presence of pervasive horizontal gene transfer (HGT) within prokaryotes. Amino acid compositions of reconstructed ancestral sequences from 423 gene families suggest an origin of this gene pool even more thermophilic than extant members of this order, followed by adaptation to lower growth temperatures within the Thermotogales.

classification | horizontal (lateral) gene transfer | thermoadaptation | taxonomy | phylogenomic

Author contributions: O.Z., K.E.N., W.F.D., J.P.G., and K.M.N. designed research; O.Z., K.S.S., P.L., G.P.F., D.M.B., R.T.D., C.L.N., and K.M.N. performed research; O.Z., K.S.S., P.L., G.P.F., D.M.B., K.E.N., C.L.N., J.P.G., and K.M.N. analyzed data; and O.Z., K.S.S., P.L., G.P.F., W.F.D., J.P.G., and K.M.N. wrote the paper.
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the GenBank (accession nos. CP000702, CP000812, CP000716, and CP000771).
¹ To whom correspondence should be addressed. E-mail: ford@dal.ca.
This article contains supporting information online at www.pnas.org/cgi/content/full/0901260106/DCSupplemental.

The 1999 publication of the genome sequence of Thermotoga maritima strain MSB8 brought horizontal (or lateral) gene transfer (HGT or LGT) to the attention of genome biologists (1) and at the same time marked the beginning of a long quest for this hyperthermophilic organism’s true phylogenetic home or taxonomic position. That report suggested that up to 24% of Tt. maritima genes, many clustered in its chromosome, were acquired by HGT from archaea: almost as many (21%) showed Firmicute affinities. Although rRNA phylogenies most often placed Tt. maritima (and Aquifex aeolicus, another hyperthermophilic bacterium) at the base of the bacterial tree, there was little consistent support for this from protein-coding genes. Indeed, Nelson et al. (1) concluded that “the phylogenetic position of Aquifex and Thermotoga, and the nature of the deepest branching eubacterial species, should be considered ambiguous.” This situation has not changed much in the ensuing 10 years: many single- or multigene analyses put Tt. maritima (sometimes with A. aeolicus as its sister) deepest in the bacterial tree, but several convincing reports (again including several multigene studies) place Tt. maritima with bacterial taxa not generally thought of as deep (most often Firmicutes, frequently among Clostridia). These results have been used to support claims for (i) a hyperthermophilic ancestry for Bacteria (or indeed for all life), (ii) the retention by Thermotogales and Aquificales, as basal lineages, of ancestral genes kept otherwise only in Archaea, or (iii) their adaptation to high-temperature life by import of genes from hyperthermophilic archaea. Two recent developments are relevant here: first that mesophilic Thermotogales have been discovered (2), raising the possibility that hyperthermophily is not ancestral to the group, and second that a thorough analysis of A. aeolicus shows that, although many of its informational genes support sisterhood with Tt. maritima, substantial exchange of some of these genes has occurred with ε-proteobacteria (3).

Although genome data derived from Tt. maritima have found wide use, this sequence provides only a single data set to represent all Thermotogales, of which at least 25 named species have been isolated from diverse geothermal features worldwide. Consequently, there is a potentially large genomic resource available to more thoroughly examine the extent to which HGT from archaea or other groups has contributed to the evolution of these organisms. We present here an analysis of genomes from this pivotal lineage, expanded by the addition of 4 sequences completed at the Joint Genome Institute. These are from Tt. petrophila RKU-1, Tt. lettingae TMO, Thermosipho melanesiensis BI429, and Fervidobacterium nodosum Rt17-B1, species spanning much of the known Thermotogales phylogenetic spectrum and isolated from sites around the world. All are extreme thermophiles or hyperthermophiles and grow primarily on sugars. We use these new sequences to revisit the role of HGT from Archaea and Firmicutes in the origin of Thermotogales, consider the meaning of such chimerism for positioning the group, and examine properties of the ancestral Thermotogales proteome by investigating the amino acid composition of ancestral protein sequences.

Results
Genome Characteristics. The Thermotogales considered here include 2 close relatives (Tt. maritima and Tt. petrophila) and 3 with genus-level divergence. All have genome sizes similar to Tt. maritima [1.86 Mbp; supporting information (SI) Table S1] (1) with Tt. lettingae having the largest genome in this group (2.14 Mbp). Of the 5 genomes, only those from Tt. maritima strain MSB8 and Tt. petrophila showed strong synteny over their entire lengths, with 3 inversions (Fig. S1). PSI-TBLASTN analysis revealed many putative or fragmentary insertion sequence (IS) elements in the 5 sequenced Thermotogales genomes and a tendency of certain genomes to accumulate specific families of IS elements over others (Table S1). Tt. lettingae has the fewest

www.pnas.org/cgi/doi/10.1073/pnas.0901260106 PNAS | April 7, 2009 | vol. 106 | no. 14 | 5865–5870


IS elements, with only 3 identifiable IS605 elements. F. nodosum has 49 IS elements, the largest number and the greatest diversity (4 represented IS families) of the 5 genomes. Twelve IS elements of F. nodosum were detectable only using a PSI-TBLASTN analysis, an approach that does not rely on ORF prediction and also recognizes pseudogenes. All 5 genomes have elements from the IS605 family, whereas IS6 family transposons are unique to F. nodosum.

No prophages were found in these genomes. The only evidence of a prophage remnant is 2 adjacent Ts. melanesiensis genes, Tmel_1472 and Tmel_1473, annotated as encoding an endonuclease-like protein and a RecT-like protein, respectively, and reminiscent of the prophage rac recET genes in Escherichia coli (4). The presence of clustered regularly interspaced palindromic repeat (CRISPR) elements (Table S2) nevertheless suggests that these organisms may be subject to phage infection in their environments. Several of these elements are unique to these genomes (Table S2), perhaps an indication of phage host range specificity.

Ribosomal RNA (rRNA) genes all occur in operons in the order 5S-23S-tRNA-tRNA-16S. Tt. maritima, Tt. petrophila, and Tt. lettingae all have 1 rRNA operon on their minus strands. Tt. lettingae has an intron in its 23S rRNA gene that is 99% identical to the one in Tt. subterranea (5). In F. nodosum 2 operons are on the minus strand and the 5S and 16S rRNA genes in these are identical. An IS element is present in the 23S rRNA gene of the distal operon that may disrupt its function, especially given that the sequences of the 23S rRNA genes additionally differ by 2 nucleotide transitions that map to stem regions. Ts. melanesiensis has 4 rRNA operons on its minus strand that are identical in sequence. Two adjacent operons have the typical 5S-23S-tRNA-tRNA-16S organization while the others lack the tRNA genes.

Intralineage HGT by Orthologous Replacement Between the 5 Genomes. Analysis of 1,115 gene families shared by at least 4 of 5 Thermotogales genomes identified a strong tree-like signal supporting the relationship among these genomes previously indicated by a 16S rRNA gene tree (Fig. S2). This was not surprising, given that 2 of the 5 genomes are very closely related compared to the other 3. The recovered plurality topology grouped the genomes according to their GC content, which might raise the suspicion that the observed relationships could be an artifact due to skewed amino acid composition (6). A compositional homogeneity test showed that this is unlikely, because all but 73 gene families passed the test (data not shown). Additionally, the same tree topology was recovered with 2 independent reconstruction methods (gene presence/absence and rearrangement distance, see Methods, data not shown). Fifty-eight gene families strongly disagree with the plurality topology (not shown), supporting the alternative relationships shown as red lines in Fig. S2. The majority (40 gene families) are from “metabolism” and “poorly characterized” functional supercategories. Better taxonomic sampling will be required to identify cases of intralineage transfer within the Thermotogales, suggested in refs. 7–9. Additionally, our methodology detects only cases of HGT that resulted in orthologous replacement, while ignoring cases of transfer that resulted in gene gain.

Genome Contents and Unusual Physiologies. The 5 genomes share 944 orthologous gene families (the core), comprising 46–54% of protein-coding genes per genome (Fig. 1). The remaining genes are shared among only some of these 5 genomes, are unique for a specific genome, or form additional (paralogous or xenologous) copies of genes. The unique genes comprise 10–32% of these genomes, a disproportionately large fraction in functional categories termed metabolism and poorly characterized (Fig. 1). In many cases such patchily distributed genes might be held responsible for physiological differences between the species [as has been shown, for instance, for different ecotypes of Prochlorococcus spp. (ref. 10)].

For example, although the fermentative catabolism of the Thermotogales species appears uniform, their genome sequences reveal interesting differences in carbon and electron transfer pathways. All 5 catabolize glucose via the Embden–Meyerhof–Parnas pathway with likely contributions by the Entner–Doudoroff pathway as has been demonstrated for Tt. maritima (11). Pyruvate is converted to an intermediate of a partial tricarboxylic acid pathway, using either the malic enzyme (Tt. maritima, Tt. petrophila, Tt. lettingae, F. nodosum) or pyruvate carboxylase (Tt. lettingae, Ts. melanesiensis, F. nodosum). Tt. maritima and Tt. petrophila appear to catabolize malate in the oxidative direction to produce succinyl-CoA while the remaining species catabolize malate or oxalacetate in the reductive direction to 2-oxoglutarate. Tt. maritima and Tt. petrophila can also produce succinate from malate in the reductive direction. These differences suggest different needs to expel excess reducing power, particularly in Tt. maritima and Tt. petrophila when using the oxidative arm of the tricarboxylic acid pathway.

[Figure 1: Venn diagram of the 5 genomes (Tt. petrophila, 1785 ORFs; Ts. melanesiensis, 1879 ORFs; Tt. maritima, 1858 ORFs; F. nodosum, 1750 ORFs; Tt. lettingae, 2040 ORFs; 944 gene families shared by all 5), with pie charts by functional supercategory: Poorly Characterized; Metabolism; Cellular Processes and Signaling; Information Storage and Processing.]

Fig. 1. Pan-genome of 5 members of Thermotogales. The Venn diagram shows the number of genes shared by different subsets of genomes, as determined by BLAST+BRANCHCLUST (see SI Methods for details). Pie charts show the distribution of genes unique for each genome across functional supercategories. The size of a pie chart is proportional to the number of genes.

An operon annotated as a putative membrane-associated proton-translocating ferredoxin:NAD(P)H oxidoreductase is shared between archaea and the Thermotogales (Fig. 2). Horizontal acquisition of this operon from the Thermococcales has been previously noted (1, 12). An analysis of phylogenies of several of the encoded proteins indicates that these genes likely evolved as an operon (data not shown). Because the operon is found in genomes from across the clade, it appears to have been lost by Tt. petrophila and Tt. lettingae or their direct ancestors. The orthologous mbx operon in Pyrococcus furiosus encodes an oxidoreductase thought to couple the oxidation of ferredoxin (reduced by electrons from sugar catabolism) with the reduction of NADP+. NADPH then transfers electrons to a sulfur reductase when elemental sulfur is available (13, 14). A similar role is not indicated for the Thermotogales because all of these species, even those lacking the operon, can reduce elemental sulfur. This operon may encode a membrane-bound NAD:methyl viologen oxidoreductase like that observed in Tt. neapolitana (15). That enzyme did not use ferredoxin as a substrate, so the cofactor



naturally paired with NAD under physiological conditions is unknown.

Fig. 2. The ferredoxin:NAD(P)H oxidoreductase operon shared by Thermotogales and Thermococcales species. The homologous genes in P. furiosus compose the mbx operon. The promoter-proximal and -distal locus names are indicated.

Tt. lettingae is the only species of the 5 that contains a gene annotated as encoding a homolog of the large subunit of ribulose bisphosphate carboxylase (RuBisCO), Tlet_1684. In Bacillus subtilis, a homologous gene has been shown to encode 2,3-diketo-5-methylthiopentyl-1-phosphate enolase, an enzyme of a methionine salvage pathway (16). These RuBisCO-like proteins (RLPs) are related to true RuBisCO proteins, and an evolutionary scheme has been proposed that suggests that the RuBisCO large subunit and RLP arose in the archaea and were subsequently transferred to an ancestral bacterial lineage via HGT (17). Phylogenetic analysis shows that Tlet_1684 belongs to the group that Tabita et al. called “IV-Deep Ykr,” containing proteins from an eclectic mixture of organisms including alpha proteobacteria, Archaeoglobus, some clostridia, and a green alga (Fig. S3A and ref. 17). The group also contains sequences derived from the Global Ocean Sampling expedition (18). We conducted a phylogenetic analysis using only group IV-Deep Ykr sequences and others with similarity to Tlet_1684 (Fig. S3A). Poor resolution did not allow us to reliably place the Tt. lettingae sequence relative to other IV-Deep Ykr sequences. However, additional evidence for the inclusion of Tlet_1684 as a member of the group IV-Deep Ykr clade is provided by gene synteny around Tlet_1684. Four genes encoding enzymes that are likely part of a methionine salvage pathway are syntenic in the genomes of members of the IV-Deep Ykr clade, Tt. lettingae, Oceanicola granulosus, and Ochrobactrum anthropi (Fig. S3B). These encode the RLP, 2 separately encoded transketolase domains, and a methylthioribose-1-phosphate isomerase. The first 2 genes are also adjacent in the genomic fragment sequence from Beggiatoa sp. PS. A fifth gene, 5-methylthioribose kinase, is also syntenic in Tt. lettingae and O. anthropi (Fig. S3B). No other Thermotogales genome examined to date contains this RLP gene, suggesting that Tt. lettingae acquired this gene via HGT.

Multiple Gene Histories in Thermotogales. Since the genomewide analysis of Tt. maritima in 1999 (1), GenBank has grown substantially, offering much better taxonomic sampling for such BLAST-based analyses. We performed similar BLAST-based analyses for the five Thermotogales genomes (which included the Tt. maritima genome analyzed in ref. 1), using the nonredundant (nr) database as a reference and recording highest-ranking matches (other than Thermotogales) with E-values <10⁻⁴. Only 7.7–11.0% of genes retain closest affiliation with Archaea (Table 1), in agreement with more sophisticated DarkHorse-based results (19). That this number is substantially lower than reported by Nelson et al. (1) likely reflects differential growth of the bacterial (vs. archaeal) database. Notably, the largest proportion of genes affiliate Thermotogales with Firmicutes (42.3–48.2%), especially Clostridiales. Furthermore, thermophilic members of Firmicutes were among the top-scoring BLAST hits and the proportion of thermophilic Firmicutes top-scoring hits was higher for the genomes of Thermotogales with higher optimal growth temperatures (Table 1). More striking was the much lower number of top-scoring BLAST hits to Aquificales, because this group appears as sister to Thermotogales in many analyses. This may reflect the fact that Firmicutes and Thermotogales are heterotrophs and Aquificales are autotrophs.

The number of top-scoring BLAST hits per taxonomic group can be affected by the number of genomes sequenced from it. At the time of our analyses (August 2008) Aquificales were represented only by 2 sequenced genomes (comprising 3,281 genes), while 231,386 genes in the nr database represented Clostridiales. To test if the low number of top-scoring BLAST hits between the Thermotogales and Aquificales is an artifact of underrepresentation of Aquificales genes in GenBank, we created a reduced nr database by removal of all Clostridiales sequences and reintroduction of 2 randomly chosen Clostridiales genomes (10 replicates were generated, see Table S3; note that Thermoanaerobacteriales is an order within the class Clostridiales). The assignment of taxonomic affiliations of top-scoring hits to ORFs in the Tt. maritima genome was repeated for each database replicate as described above. The number of top-scoring BLAST hits to the Aquificales did not increase above 60 (even in the case where the database did not contain any Clostridiales), but an increase was noted for the number of hits to other Firmicutes, Proteobacteria, and Archaea (Table S3). There were always more hits to clostridial sequences than to Aquificales, indeed 3–4 times as many when a Thermoanaerobacter was included as one of the clostridial contributors. Thus the low number of top-scoring BLAST hits to Aquificales is not a consequence of database sampling biases.

Because the top-scoring BLAST hit does not always correspond to the closest phylogenetic neighbor (20), we performed phylogenetic analyses as well. To avoid sampling bias because of using only 2 completely sequenced genomes from the Aquificales, we additionally selected only 2 genomes from both Archaea and Clostridiales and asked the following question: In individual gene trees, does the Thermotogales sequence group closer to that from the Archaea, Aquificales, or Clostridiales? We evaluated embedded quartets (up to 8 quartets if all 6 genomes had homologs to a query gene from the Thermotogales). While ≈50% of data sets did not produce sufficiently resolved signals, those that did agreed with top-scoring BLAST hit results: the majority of the genes prefer to group with those from the Clostridiales, while those from the Aquificales tend to group with the Thermotogales sequences in the least number of data sets (Fig. 3). Although these analyses do not preclude the possibility that Thermotogales and Aquificales might be considered independently deep branches in the Bacterial tree, they are not, for the clear majority of genes we could look at, sister taxa.

The Affiliation of the Thermotogales with Aquificales Based on Ribosomal Protein Data. Ribosomal proteins are often used to derive the phylogenetic position of a group of organisms, because they are thought to be infrequently transferred (however, see refs. 21 and 22) and are highly conserved in sequence. Phylogenetic analysis of 29 concatenated bacterial ribosomal proteins provides a high level of support for the monophyly of the



Table 1. Taxonomic affiliations of top-scoring BLAST hits of Thermotogales ORFs found in the GenBank nonredundant (nr) database

                                           Tt. maritima       Tt. petrophila     Tt. lettingae      Ts. melanesiensis  F. nodosum
Bacteria                                   1,379 (74%)        1,355 (76%)        1,633 (80%)        1,345 (71%)        1,369 (78%)
  Firmicutes                               821 (44%) [66]     816 (46%) [65]     985 (48%) [59]     794 (42%) [60]     844 (48%) [58]
    Class Clostridia                       680                670                785                644                710
      Order Thermoanaerobacterales         273                269                279                217                259
    Class Bacilli (Order Bacillales only)  117                119                174                126                111
  Proteobacteria                           211                207                276                247                215
  Aquificae                                46                 43                 36                 38                 45
  Chloroflexi                              61                 52                 52                 23                 31
  Deinococcus-Thermus                      37                 35                 29                 23                 32
  Bacteroidetes                            30                 32                 40                 38                 29
  Cyanobacteria                            43                 40                 44                 42                 36
  Actinobacteria                           26                 25                 46                 22                 18
  Planctomycetes                           22                 16                 28                 21                 15
  Acidobacteria                            10                 7                  16                 10                 9
  Spirochaetes                             10                 12                 14                 12                 15

Archaea                                    204 (11%)          197 (11%)          187 (9%)           168 (9%)           135 (8%)
  Euryarchaeota                            171                155                148                138                111
    Thermococcales                         95                 80                 66                 51                 58
    Archaeoglobales                        18                 18                 13                 13                 7
    Methanococcales                        18                 17                 13                 20                 12
    Methanosarcinales                      20                 21                 30                 29                 16
  Crenarchaeota                            27                 35                 31                 26                 22
    Thermoproteales                        12                 15                 13                 9                  12
    Desulfurococcales                      8                  12                 10                 11                 3
    Sulfolobales                           5                  6                  6                  4                  6
  Unclassified Archaea                     6                  7                  8                  4                  2
Eukaryotes                                 16                 17                 13                 12                 11
Viruses                                    1                  1                  0                  5                  2
Others                                     6                  5                  6                  6                  1
Thermotogales specific*                    252                210                201                343                232
ORFans†                                    52                 22                 81                 170                71

All percentage values are fractions of the total number of ORFs for that genome. Numbers refer to the number of ORFs in each taxonomic category (only
selected major contributors are shown). Within each major taxonomic group only the groups with largest number of genes are shown. Numbers in brackets
indicate the percentage of thermophilic Firmicutes among the reported number of top-scoring BLAST hits to Firmicutes. *, Homologs found in Thermotogales
genomes, but not in the nr database. †, Homologs found neither in the other Thermotogales genomes nor in the nr database.
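
The percentages in Table 1 can be rechecked from the raw counts. The following is only an illustrative sketch: it assumes that the per-genome ORF totals printed on Fig. 1 are the denominators used for the table, and the dictionary and function names are hypothetical rather than taken from the authors' pipeline.

```python
# ORF totals as printed on Fig. 1 (ASSUMED to be the denominators of Table 1).
ORF_TOTALS = {
    "Tt. maritima": 1858,
    "Tt. petrophila": 1785,
    "Tt. lettingae": 2040,
    "Ts. melanesiensis": 1879,
    "F. nodosum": 1750,
}

# Top-scoring BLAST hit counts copied from three rows of Table 1.
TOP_HITS = {
    "Firmicutes": {"Tt. maritima": 821, "Tt. petrophila": 816, "Tt. lettingae": 985,
                   "Ts. melanesiensis": 794, "F. nodosum": 844},
    "Archaea": {"Tt. maritima": 204, "Tt. petrophila": 197, "Tt. lettingae": 187,
                "Ts. melanesiensis": 168, "F. nodosum": 135},
    "Aquificae": {"Tt. maritima": 46, "Tt. petrophila": 43, "Tt. lettingae": 36,
                  "Ts. melanesiensis": 38, "F. nodosum": 45},
}


def share(group: str, genome: str) -> float:
    """Percentage of a genome's ORFs whose top-scoring hit falls in `group`."""
    return 100.0 * TOP_HITS[group][genome] / ORF_TOTALS[genome]


# Print one line per genome with the three shares side by side.
for genome in ORF_TOTALS:
    row = ", ".join(f"{group} {share(group, genome):.1f}%" for group in TOP_HITS)
    print(f"{genome}: {row}")
```

Laying the three shares side by side makes the contrast drawn in the text easy to see: hits to Aquificae stay at a few percent of each genome, far below the Firmicutes share.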

Thermotogales and 100% support for Aquificales as a sister group (Fig. S4A).

To determine if the phylogeny of individual ribosomal proteins supports this sister relationship, we compared the significantly supported bipartitions of individual ribosomal trees and the concatenated tree. Individual ribosomal protein trees often disagreed with the concatenated tree or did not resolve the relationships (Fig. S4B), most likely because of the relatively short length of most ribosomal proteins. Only 2 individual ribosomal proteins significantly support the branch grouping Thermotogales with Aquificales, but none show significant conflict (Fig. S4B). However, even if these 2 ribosomal proteins are deleted, the grouping of Thermotogales with Aquificales remains robustly supported, indicating a strong but distributed phylogenetic signal for this grouping in the ribosomal proteins.

It has been suggested that the deep position of Aquifex and Thermotoga results from long branch attraction (LBA) to Archaea due to saturation in rRNA caused by multiple substitutions (23), and the same might be the case for the protein data in their support of the sisterhood of these 2 taxa. We therefore used a nonhomogeneous model that is known to deal better with LBA (24) and still obtained strong support for the grouping of Aquificales and Thermotogales (Fig. S4A). We further investigated the possibility of the LBA artifact using the slow–fast method (25), separately analyzing subsets of the concatenated ribosomal protein alignment that contained faster evolving sites only. When only sites that vary by at least 50% in amino acid composition (i.e., representing more saturated sites, probably with multiple substitutions per site) were used, the recovered tree supports the grouping of Thermotogales with members of Firmicutes and Proteobacteria with 83% bootstrap support (Fig. S4C). This suggests that while the more reliable conserved sites unambiguously group Thermotogales with Aquificales, sequence saturation might artificially bring Thermotogales closer to Firmicutes. However, this does not explain the strong affinity for Clostridia for the majority of genes other than those encoding ribosomal proteins (Fig. 3). Separate analysis of the slow sites of the 100–150 genes supporting the sisterhood of Thermotogales and Clostridia retained that relationship and did not associate these genes with those of Aquificales. Thus if phylogenetic classification were based on the majority phylogenetic signal within the proteome, each of these members of the Thermotogales and the order as a whole should be considered members of class Clostridia within the Firmicutes, according to both BLAST and phylogenetic analyses (Table 1 and Fig. 3).

The Thermophilic Ancestral Proteome of the Thermotogales. Recently, 2 compositional features of protein sequences have been suggested to be indicators of optimal growth temperature of an organism: overrepresentation of charged amino acid residues over polar ones (CvP bias) (26) and overrepresentation of IVYWREL amino acids (27). Application of both methods of analysis revealed linear correlations between optimal growth temperatures and compositional features of the proteins (Fig.



S5A). Distribution of CvP values is unimodal for proteins within each genome (Fig. S5B), providing evidence against the hypothesis that thermophily was brought very recently to Thermotogales through HGT from Archaea (23). Because the above-described compositional features correlate so well with optimal growth temperatures, we used them to examine the nature of the ancestral proteome of these 5 Thermotogales species. We identified ancestral sequences of the most recent common node of the 5 Thermotogales in each gene family (see Methods) and inferred CvP values for all of the gene families for which we could reconstruct their ancestral sequence. The distribution, with peak CvP values of 15–20, suggests that the ancestral proteome of Thermotogales contained mostly thermophilic proteins (Fig. 4A). The thermophilic ancestral proteome inferred here was not necessarily the product of any single ancestral genome or perhaps even of a contemporary population of genomes, because HGT can affect coalescence times of individual gene histories (28). Furthermore, even if the gene families were from a single ancestral genome (as the plurality phylogenetic tree in Fig. S2 may suggest), the inferred ancestor does not necessarily represent the lineage at the root of all Thermotogales. Nevertheless, by establishing a correlation between optimal growth temperature and the median CvP values obtained with the extant Thermotogales species (Fig. 4B), we can extrapolate that the ancestral proteome belonged to organisms with an optimal growth temperature of ≈84.5 °C, higher than that of any known modern member of the Thermotogales.

The GC content of rRNA also correlates with optimal growth temperature (29). Using a more extended 16S rRNA gene data set of both mesophilic and thermophilic members of Thermotogales, we reconstructed ancestral sequences and GC content at the ancestral nodes of the 16S rRNA gene phylogenetic tree (Fig. S5C). This analysis suggests that the ancestral 16S rRNA also belonged to a thermophile, favoring a scenario of secondary loss of thermophily within Thermotogales, a process carried farthest in the recently recognized “mesotoga” lineage (2). However, the position of the root of Bacteria is not certain and the phylogenetic position of Thermotogales within Bacteria cannot be easily pinpointed (see also discussion below). Thus the inferred thermophilic character of the group of organisms that gave rise to the 5 Thermotogales does not provide evidence for the commonly accepted thermophily of early Bacteria.

Discussion
Thermotogales genomes have complex and incongruent evolutionary histories, with compositions appearing to be more the product of HGT than of vertical descent, as these are traditionally defined. Particularly prominent “highways of gene sharing” (30) link Thermotogales to thermophilic Firmicutes and, to a lesser but significant extent, Archaea. Indeed, the majority of the genes in each of the 5 genomes examined here appear to be derived from these sources (Table 1 and Fig. 3). A high level of between-phylum HGT between Thermotogales and Firmicutes is in fact to be expected, since members of the Firmicutes frequently cohabit with Thermotogales in natural environments (31–33). Indeed, Thermotogales and the Firmicute genera Thermoanaerobacter and Desulfotomaculum are the only bacteria thought to be indigenous to high-temperature oil reservoirs (32, 33). This situation might thus be contrasted to that of the more physiologically restricted cyanobacteria, which tend to exchange more genes within their phylum (34).

Fig. 3. Examination of evolutionary relationships between genes in the Thermotogales genomes and their homologs in the Archaea, Aquificales, and Clostridia. For each scenario, all embedded quartets in a tree made from Thermotogales, 2 Archaea, 2 Aquificales, and 2 Clostridia homologs were evaluated. A score per scenario was calculated as the ratio of the number of embedded quartets significantly supporting the scenario to the total number of evaluated quartets (for more details see SI Methods). Numbers in parentheses refer to subsets of data from the “Information Storage and Processing” functional supercategory. A quartet topology is unrooted and does not allow distinguishing which 2 taxa are responsible for the observed relationship (i.e., the inferred topology could be because of gene flow either to/from Thermotogales or to/from Aquificales); analyses in Table 1 provide some evidence for directionality. Abbreviations: T, Thermotogales genome; C, Clostridiales; Aq, Aquificales; Ar, Archaea.
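
To make the quartet bookkeeping behind Fig. 3 concrete, here is a minimal sketch of how embedded quartets could be enumerated and scored. It assumes that each quartet combines the Thermotogales sequence with one homolog from each of the three comparison groups, which reproduces the "up to 8 quartets" mentioned in the Results when 2 × 2 × 2 homologs are present. The taxon labels, the input format for resolved quartets, and the function names are hypothetical, and the significance test applied to each quartet is not reproduced.

```python
from itertools import product


def embedded_quartets(thermotogales, archaea, aquificales, clostridia):
    """Quartets made of the Thermotogales sequence plus one homolog per group."""
    return [(thermotogales, ar, aq, c)
            for ar, aq, c in product(archaea, aquificales, clostridia)]


def scenario_scores(supported):
    """Score each scenario as (supporting quartets) / (all evaluated quartets).

    `supported` holds one entry per evaluated quartet: "Ar", "Aq", or "C" for
    the group that significantly pairs with Thermotogales, or None when the
    quartet is unresolved.
    """
    total = len(supported)
    if total == 0:
        return {"Ar": 0.0, "Aq": 0.0, "C": 0.0}
    return {s: sum(1 for x in supported if x == s) / total
            for s in ("Ar", "Aq", "C")}


# A gene family with homologs in all 6 comparison genomes yields 8 quartets.
quartets = embedded_quartets("T", ["Ar1", "Ar2"], ["Aq1", "Aq2"], ["C1", "C2"])

# Made-up outcomes for those 8 quartets, for illustration only.
example = ["C", "C", "C", "C", "C", None, "Aq", "Ar"]
scores = scenario_scores(example)
print(len(quartets), scores)
```

Because unresolved quartets stay in the denominator, the three scores need not sum to 1. This matches the caption's definition of the score as a fraction of all evaluated quartets.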

[Figure 4: (A) histogram of the number of gene families (axis 0–150) by CvP value (bins from <5 to >35), with reconstructions by PAML and ANCESCON; (B) CvP value (axis 10–16) versus optimal growth temperature (axis 60–88), showing the ancestral proteome, the contemporary genomes, and a trendline y = 0.326x - 11.469, R² = 0.94456.]

Fig. 4. CvP values indicate the ancestor of the Thermotogales was an extreme thermophile. (A) Distribution of CvP values for predicted proteins of
Thermotogales’ ancestral proteome. The results of reconstruction with 2 different programs are shown (see Methods). (B) Extrapolation of optimal growth
temperature for the ancestral Thermotogales proteome. Red points represent median CvP values and optimal growth temperature of 5 contemporary
Thermotogales genomes. A strong linear correlation is observed between CvP values and optimal growth temperature. The median CvP value of the ancestral
proteome is based on 423 inferred ancestral protein sequences.
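The extrapolation in panel B can be illustrated with a minimal sketch. It assumes that CvP bias is the percentage of charged residues (D, E, K, R) minus the percentage of polar uncharged residues (N, Q, S, T), following the definition attributed to ref. 26; that definition is not restated in this section, so treat it as an assumption. The slope and intercept are the trendline values printed in Fig. 4B, the temperature scale is presumably °C, and the function names are hypothetical.

```python
from statistics import median

# Assumed residue classes for CvP bias (after ref. 26; not restated here).
CHARGED = set("DEKR")  # Asp, Glu, Lys, Arg
POLAR = set("NQST")    # Asn, Gln, Ser, Thr

# Trendline printed in Fig. 4B: CvP = SLOPE * OGT + INTERCEPT
SLOPE = 0.326
INTERCEPT = -11.469


def cvp_bias(protein: str) -> float:
    """Percent charged minus percent polar residues for one sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = sum(aa in CHARGED for aa in seq)
    polar = sum(aa in POLAR for aa in seq)
    return 100.0 * (charged - polar) / len(seq)


def ogt_from_cvp(cvp: float) -> float:
    """Invert the Fig. 4B trendline to estimate optimal growth temperature."""
    return (cvp - INTERCEPT) / SLOPE


def proteome_ogt(proteins) -> float:
    """Estimate OGT from the median CvP bias of a set of protein sequences."""
    return ogt_from_cvp(median(cvp_bias(p) for p in proteins))
```

Calling `proteome_ogt` on a list of inferred ancestral sequences would return the temperature at which the trendline reaches their median CvP value. The result is only as reliable as the five-point regression it inverts, and it is an extrapolation whenever the median falls outside the range spanned by the contemporary genomes.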

Zhaxybayeva et al. PNAS | April 7, 2009 | vol. 106 | no. 14 | 5869


A likely very much smaller portion (informational genes, here represented by ribosomal proteins) strongly supports a position of Thermotogales as sister group to Aquificales. A similar and reciprocal analysis of the genomes of Aquificales, while showing complex evolutionary histories for some informational genes (3), also supported this sisterhood.

What then is the true “phylogenetic position” of this group? That term is usually interpreted from a taxonomic perspective to refer to an unambiguous location within an inclusive Linnaean hierarchy. In the case of the order Thermotogales, which may so far be unique for a group of this rank, relatively few genes commonly taken to be “phylogenetic markers” favor one answer, while a clear majority of the proteome gives another. Any decision to see Thermotogales’ “true position” as sisters to Aquifex based only on the former genes thus cannot escape a certain arbitrary flavor.

In their discussion of the placement of Aquificales, Boussau et al. conclude cautiously that “if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales” (3). We take these authors’ “should” as normative. It is only because of a convention backed by a hypothesis, the complexity hypothesis (35), that informational genes (and indeed only a flexibly defined subset of these) are privileged in determining a species’ “true phylogenetic position”. Some convention like this may be necessary if our goal is an inclusively hierarchical (tree-like) classification of all living things, but as the Thermotogales make especially clear, such a classification should not be taken as a recapitulation of evolutionary history. In the real world of prokaryote evolution (36), phylogenetic positions are multidimensional relationships nuanced by the relative number of genes contributed by multiple lineages. As was first suggested by Nelson et al. in 1999 (1), Thermotogales (not just the species but, as we conclude here, the whole “order”) pose an especially strong challenge to the Linnaean paradigm, as they can be made to conform to an inclusive hierarchy only by ignoring the majority of the evolutionary information their proteomes contain.

Methods
Genomes were sequenced by the Joint Genome Institute and annotated at Oak Ridge National Laboratory. The genomes were examined for synteny and were searched for various mobile and repetitive elements using established procedures as indicated in Results. To infer the phylogenetic position of each gene, the genes were subjected to BLAST-based and embedded quartet analyses (34). Ribosomal protein phylogeny was reconstructed using maximum-likelihood and Bayesian methods. Thermophilic signatures were assessed using CvP bias (26) and IVYWREL bias (27) for proteins, and GC content for rRNA. Details of all analyses are described in SI Methods.

ACKNOWLEDGMENTS. We thank the University of Connecticut Biotech Center for computational support. The genome sequencing and annotation work was performed under the auspices of the U.S. Department of Energy’s Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory (contract DE-AC02-05CH11231), Lawrence Livermore National Laboratory (DE-AC52-07NA27344), and Los Alamos National Laboratory (DE-AC02-06NA25396). Genomic DNA was prepared by J. L. DiPippo and I. U. Nieves. O.Z. was supported by a Canadian Institutes of Health Research postdoctoral fellowship and Canadian Institutes of Health Research Grant MOP-4467 (to W.F.D.), and C.L.N. was supported by the Norwegian Research Council (Grant 180444/V40). This work was also funded by the National Aeronautics and Space Administration Exobiology program Grants NNX08AQ10G and NNG05GN41G (jointly to K.M.N. and J.P.G.).

1. Nelson KE, et al. (1999) Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329.
2. Nesbø CL, Dlutek M, Zhaxybayeva O, Doolittle WF (2006) Evidence for existence of “mesotogas,” members of the order Thermotogales adapted to low-temperature environments. Appl Environ Microbiol 72:5061–5068.
3. Boussau B, Gueguen L, Gouy M (2008) Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria. BMC Evol Biol 8:272.
4. Clark AJ, et al. (1993) Genetic and molecular analyses of the C-terminal region of the recE gene from the Rac prophage of Escherichia coli K-12 reveal the recT gene. J Bacteriol 175:7673–7682.
5. Nesbø CL, Doolittle WF (2003) Active self-splicing group I introns in 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100:10806–10811.
6. Singer GAC, Hickey DA (2000) Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17:1581–1588.
7. Nesbø CL, Nelson KE, Doolittle WF (2002) Suppressive subtractive hybridization detects extensive genomic diversity in Thermotoga maritima. J Bacteriol 184:4475–4488.
8. Nesbø CL, Dlutek M, Doolittle WF (2006) Recombination in Thermotoga: Implications for species concepts and biogeography. Genetics 172:759–769.
9. Mongodin EF, et al. (2005) Gene transfer and genome plasticity in Thermotoga maritima, a model hyperthermophilic species. J Bacteriol 187:4935–4944.
10. Coleman ML, et al. (2006) Genomic islands and the ecology and evolution of Prochlorococcus. Science 311:1768–1770.
11. Selig M, Xavier KB, Santos H, Schonheit P (1997) Comparative analysis of Embden-Meyerhof and Entner-Doudoroff glycolytic pathways in hyperthermophilic archaea and the bacterium Thermotoga. Arch Microbiol 167:217–232.
12. Calteau A, Gouy M, Perriere G (2005) Horizontal transfer of two operons coding for hydrogenases between bacteria and archaea. J Mol Evol 60:557–565.
13. Sapra R, Bagramyan K, Adams MW (2003) A simple energy-conserving system: Proton reduction coupled to proton translocation. Proc Natl Acad Sci USA 100:7545–7550.
14. Sapra R, Verhagen M, Adams MWW (2000) Purification and characterization of a membrane-bound hydrogenase from the hyperthermophilic archaeon Pyrococcus furiosus. J Bacteriol 182:3423–3428.
15. Käslin S, Childers SE, Noll KM (1998) Membrane-associated redox enzymes in Thermotoga neapolitana. Arch Microbiol 170:297–303.
16. Sekowska A, Danchin A (2002) The methionine salvage pathway in Bacillus subtilis. BMC Microbiol 2:8.
17. Tabita FR, et al. (2007) Function, structure, and evolution of the RubisCO-like proteins and their RubisCO homologs. Microbiol Mol Biol Rev 71:576–599.
18. Rusch DB, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biol 5:e77.
19. Podell S, Gaasterland T (2007) DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biol 8:R16.
20. Koski LB, Golding GB (2001) The closest BLAST hit is often not the nearest neighbor. J Mol Evol 52:540–542.
21. Brochier C, Philippe H, Moreira D (2000) The evolutionary history of ribosomal protein RpS14: Horizontal gene transfer at the heart of the ribosome. Trends Genet 16:529–533.
22. Makarova KS, Ponomarev VA, Koonin EV (2001) Two C or not two C: Recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins. Genome Biol 2:RESEARCH0033.
23. Brochier C, Philippe H (2002) Phylogeny: A non-hyperthermophilic ancestor for bacteria. Nature 417:244.
24. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7(Suppl 1):S4.
25. Brinkmann H, Philippe H (1999) Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol 16:817–825.
26. Suhre K, Claverie JM (2003) Genomic correlates of hyperthermostability, an update. J Biol Chem 278:17198–17202.
27. Zeldovich KB, Berezovsky IN, Shakhnovich EI (2007) Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 3:e5.
28. Zhaxybayeva O, Gogarten JP (2004) Cladogenesis, coalescence and the evolution of the three domains of life. Trends Genet 20:182–187.
29. Galtier N, Tourasse N, Gouy M (1999) A nonhyperthermophilic common ancestor to extant life forms. Science 283:220–221.
30. Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA 102:14332–14337.
31. Bonch-Osmolovskaya EA, et al. (2003) Radioisotopic, culture-based, and oligonucleotide microchip analyses of thermophilic microbial communities in a continental high-temperature petroleum reservoir. Appl Environ Microbiol 69:6143–6151.
32. Dahle H, Garshol F, Madsen M, Birkeland NK (2008) Microbial community structure analysis of produced water from a high-temperature North Sea oil-field. Antonie Leeuwenhoek 93:37–49.
33. Magot M (2005) Petroleum Microbiology, eds Ollivier B, Magot M (ASM Press, Washington, DC), pp. 21–33.
34. Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (2006) Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events. Genome Res 16:1099–1108.
35. Jain R, Rivera MC, Lake JA (1999) Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA 96:3801–3806.
36. Dagan T, Martin W (2006) The tree of one percent. Genome Biol 7:118.

5870 | www.pnas.org/cgi/doi/10.1073/pnas.0901260106 Zhaxybayeva et al.
