Thermophilic thermotoga
Contributed by W. Ford Doolittle, February 11, 2009 (sent for review January 6, 2009)
Since publication of the first Thermotogales genome, Thermotoga maritima strain MSB8, single- and multi-gene analyses have disagreed on the phylogenetic position of this order of Bacteria. Here we present the genome sequences of 4 additional members of the Thermotogales (Tt. petrophila, Tt. lettingae, Thermosipho melanesiensis, and Fervidobacterium nodosum) and a comprehensive comparative analysis including the original T. maritima genome. While ribosomal protein genes strongly place Thermotogales as a sister group to Aquificales, the majority of genes with sufficient phylogenetic signal show affinities to Archaea and Firmicutes, especially Clostridia. Indeed, on the basis of the majority of genes in their genomes (including genes that are also found in Aquificales), Thermotogales should be considered members of the Firmicutes. This result highlights the conflict between the taxonomic goal of assigning every species to a unique position in an inclusive Linnaean hierarchy and the evolutionary goal of understanding phylogenesis in the presence of pervasive horizontal gene transfer (HGT) within prokaryotes. Amino acid compositions of reconstructed ancestral sequences from 423 gene families suggest an origin of this gene pool even more thermophilic than extant members of this order, followed by adaptation to lower growth temperatures within the Thermotogales.

classification | horizontal (lateral) gene transfer | thermoadaptation | taxonomy | phylogenomic

The 1999 publication of the genome sequence of Thermotoga maritima strain MSB8 brought horizontal (or lateral) gene transfer (HGT or LGT) to the attention of genome biologists (1) and at the same time marked the beginning of a long quest for this hyperthermophilic organism's true phylogenetic home or taxonomic position. That report suggested that up to 24% of Tt. maritima genes, many clustered in its chromosome, were acquired by HGT from archaea: almost as many (21%) showed Firmicute affinities. Although rRNA phylogenies most often placed Tt. maritima (and Aquifex aeolicus, another hyperthermophilic bacterium) at the base of the bacterial tree, there was little consistent support for this from protein-coding genes. Indeed, Nelson et al. (1) concluded that "the phylogenetic position of Aquifex and Thermotoga, and the nature of the deepest branching eubacterial species, should be considered ambiguous." This situation has not changed much in the ensuing 10 years: many single- or multigene analyses put Tt. maritima (sometimes with A. aeolicus as its sister) deepest in the bacterial tree, but several convincing reports (again including several multigene studies) place Tt. maritima with bacterial taxa not generally thought of as deep (most often Firmicutes, frequently among Clostridia). These results have been used to support claims for (i) a hyperthermophilic ancestry for Bacteria (or indeed for all life), (ii) the retention by Thermotogales and Aquificales, as basal lineages, of ancestral genes kept otherwise only in Archaea, or (iii) their adaptation to high-temperature life by import of genes from hyperthermophilic archaea. Two recent developments are relevant here: first that mesophilic Thermotogales have been discovered (2), raising the possibility that hyperthermophily is not ancestral to the group, and second that a thorough analysis of A. aeolicus shows that, although many of its informational genes support sisterhood with Tt. maritima, substantial exchange of some of these genes has occurred with ε-proteobacteria (3).

Although genome data derived from Tt. maritima have found wide use, this sequence provides only a single data set to represent all Thermotogales, of which at least 25 named species have been isolated from diverse geothermal features worldwide. Consequently, there is a potentially large genomic resource available to more thoroughly examine the extent to which HGT from archaea or other groups has contributed to the evolution of these organisms. We present here an analysis of genomes from this pivotal lineage, expanded by the addition of 4 sequences completed at the Joint Genome Institute. These are from Tt. petrophila RKU-1, Tt. lettingae TMO, Thermosipho melanesiensis BI429, and Fervidobacterium nodosum Rt17-B1, species spanning much of the known Thermotogales phylogenetic spectrum and isolated from sites around the world. All are extreme thermophiles or hyperthermophiles and grow primarily on sugars. We use these new sequences to revisit the role of HGT from Archaea and Firmicutes in the origin of Thermotogales, consider the meaning of such chimerism for positioning the group, and examine properties of the ancestral Thermotogales proteome by investigating the amino acid composition of ancestral protein sequences.

Results

Genome Characteristics. The Thermotogales considered here include 2 close relatives (Tt. maritima and Tt. petrophila) and 3 with genus-level divergence. All have genome sizes similar to Tt. maritima [1.86 Mbp; supporting information (SI) Table S1] (1) with Tt. lettingae having the largest genome in this group (2.14 Mbp). Of the 5 genomes, only those from Tt. maritima strain MSB8 and Tt. petrophila showed strong synteny over their entire lengths, with 3 inversions (Fig. S1). PSI-TBLASTN analysis revealed many putative or fragmentary insertion sequence (IS) elements in the 5 sequenced Thermotogales genomes and a tendency of certain genomes to accumulate specific families of IS elements over others (Table S1). Tt. lettingae has the fewest

Author contributions: O.Z., K.E.N., W.F.D., J.P.G., and K.M.N. designed research; O.Z., K.S.S., P.L., G.P.F., D.M.B., R.T.D., C.L.N., and K.M.N. performed research; O.Z., K.S.S., P.L., G.P.F., D.M.B., K.E.N., C.L.N., J.P.G., and K.M.N. analyzed data; and O.Z., K.S.S., P.L., G.P.F., W.F.D., J.P.G., and K.M.N. wrote the paper.

The authors declare no conflict of interest.

Data deposition: The sequences reported in this paper have been deposited in the GenBank (accession nos. CP000702, CP000812, CP000716, and CP000771).

¹To whom correspondence should be addressed. E-mail: ford@dal.ca.

This article contains supporting information online at www.pnas.org/cgi/content/full/0901260106/DCSupplemental.
methionine salvage pathway (16). These RuBisCO-like proteins (RLPs) are related to true RuBisCO proteins, and an evolutionary scheme has been proposed that suggests that the RuBisCO large subunit and RLP arose in the archaea and were subsequently transferred to an ancestral bacterial lineage via HGT (17). Phylogenetic analysis shows that Tlet_1684 belongs to the group that Tabita et al. called "IV-Deep Ykr," containing proteins from an eclectic mixture of organisms including alpha proteobacteria, Archaeoglobus, some clostridia, and a green alga (Fig. S3A and ref. 17). The group also contains sequences derived from the Global Ocean Sampling expedition (18). We conducted a phylogenetic analysis using only group IV-Deep Ykr sequences and others with similarity to Tlet_1684 (Fig. S3A). Poor resolution did not allow us to reliably place the Tt. lettingae sequence relative to other IV-Deep Ykr sequences. However, additional evidence for the inclusion of Tlet_1684 as a member of the group IV-Deep Ykr clade is provided by gene synteny around Tlet_1684. Four genes encoding enzymes that are likely part of a methionine salvage pathway are syntenic in the genomes of members of the IV-Deep Ykr clade, Tt. lettingae, Oceanicola granulosus, and Ochrobactrum anthropi (Fig. S3B). These encode the RLP, 2 separately encoded transketolase domains, and a methylthioribose-1-phosphate isomerase. The first 2 genes are also adjacent in the genomic fragment sequence from Beggiatoa sp. PS. A fifth gene, 5-methylthioribose kinase, is also syntenic in Tt. lettingae and O. anthropi (Fig. S3B). No other Thermotogales genome examined to date contains this RLP gene, suggesting that Tt. lettingae acquired this gene via HGT.

Multiple Gene Histories in Thermotogales. Since the genomewide analysis of Tt. maritima in 1999 (1), GenBank has grown substantially, offering much better taxonomic sampling for such BLAST-based analyses. We performed similar BLAST-based analyses for the five Thermotogales genomes (which included the Tt. maritima genome analyzed in ref. 1), using the nonredundant (nr) database as a reference and recording highest-

replicate as described above. The number of top-scoring BLAST hits to the Aquificales did not increase above 60 (even in the case where the database did not contain any Clostridiales), but an increase was noted for the number of hits to other Firmicutes, Proteobacteria, and Archaea (Table S3). There were always more hits to clostridial sequences than to Aquificales, indeed 3–4 times as many when a Thermoanaerobacter was included as one of the clostridial contributors. Thus the low number of top-scoring BLAST hits to Aquificales is not a consequence of database sampling biases.

Because the top-scoring BLAST hit does not always correspond to the closest phylogenetic neighbor (20), we performed phylogenetic analyses as well. To avoid sampling bias because of using only 2 completely sequenced genomes from the Aquificales, we additionally selected only 2 genomes from both Archaea and Clostridiales and asked the following question: In individual gene trees, does the Thermotogales sequence group closer to that from the Archaea, Aquificales, or Clostridiales? We evaluated embedded quartets (up to 8 quartets if all 6 genomes had homologs to a query gene from the Thermotogales). While ≈50% of data sets did not produce sufficiently resolved signals, those that did agreed with top-scoring BLAST hit results: the majority of the genes prefer to group with those from the Clostridiales, while those from the Aquificales tend to group with the Thermotogales sequences in the least number of data sets (Fig. 3). Although these analyses do not preclude the possibility that Thermotogales and Aquificales might be considered independently deep branches in the Bacterial tree, they are not, for the clear majority of genes we could look at, sister taxa.

The Affiliation of the Thermotogales with Aquificales Based on Ribosomal Protein Data. Ribosomal proteins are often used to derive the phylogenetic position of a group of organisms, because they are thought to be infrequently transferred (however, see refs. 21 and 22) and are highly conserved in sequence. Phylogenetic analysis of 29 concatenated bacterial ribosomal proteins provides a high level of support for the monophyly of the
Table 1. Taxonomic distribution of top-scoring BLAST hits for ORFs in the 5 Thermotogales genomes

Bacteria                                   1,379 (74%)      1,355 (76%)      1,633 (80%)      1,345 (71%)      1,369 (78%)
  Firmicutes                               821 (44%) [66]   816 (46%) [65]   985 (48%) [59]   794 (42%) [60]   844 (48%) [58]
    Class Clostridia                       680              670              785              644              710
      Order Thermoanaerobacterales         273              269              279              217              259
    Class Bacilli (Order Bacillales only)  117              119              174              126              111
  Proteobacteria                           211              207              276              247              215
  Aquificae                                46               43               36               38               45
  Chloroflexi                              61               52               52               23               31
  Deinococcus-Thermus                      37               35               29               23               32
  Bacteroidetes                            30               32               40               38               29
  Cyanobacteria                            43               40               44               42               36
  Actinobacteria                           26               25               46               22               18
  Planctomycetes                           22               16               28               21               15
  Acidobacteria                            10               7                16               10               9
  Spirochaetes                             10               12               14               12               15
Archaea                                    204 (11%)        197 (11%)        187 (9%)         168 (9%)         135 (8%)
  Euryarchaeota                            171              155              148              138              111
    Thermococcales                         95               80               66               51               58
    Archaeoglobales                        18               18               13               13               7
    Methanococcales                        18               17               13               20               12
    Methanosarcinales                      20               21               30               29               16
  Crenarchaeota                            27               35               31               26               22
    Thermoproteales                        12               15               13               9                12
    Desulfurococcales                      8                12               10               11               3
    Sulfolobales                           5                6                6                4                6
  Unclassified Archaea                     6                7                8                4                2
Eukaryotes                                 16               17               13               12               11
Viruses                                    1                1                0                5                2
Others                                     6                5                6                6                1
Thermotogales specific*                    252              210              201              343              232
ORFans†                                    52               22               81               170              71

All percentage values are fractions of the total number of ORFs for that genome. Numbers refer to the number of ORFs in each taxonomic category (only selected major contributors are shown). Within each major taxonomic group, only the groups with the largest numbers of genes are shown. Numbers in brackets indicate the percentage of thermophilic Firmicutes among the reported number of top-scoring BLAST hits to Firmicutes. *Homologs found in Thermotogales genomes, but not in the nr database. †Homologs found neither in the other Thermotogales genomes nor in the nr database.
Thermotogales and 100% support for Aquificales as a sister group (Fig. S4A).

To determine if the phylogeny of individual ribosomal proteins supports this sister relationship, we compared the significantly supported bipartitions of individual ribosomal trees and the concatenated tree. Individual ribosomal protein trees often disagreed with the concatenated tree or did not resolve the relationships (Fig. S4B), most likely because of the relatively short length of most ribosomal proteins. Only 2 individual ribosomal proteins significantly support the branch grouping Thermotogales with Aquificales, but none show significant conflict (Fig. S4B). However, even if these 2 ribosomal proteins are deleted, the grouping of Thermotogales with Aquificales remains robustly supported, indicating a strong but distributed phylogenetic signal for this grouping in the ribosomal proteins.

It has been suggested that the deep position of Aquifex and Thermotoga results from long branch attraction (LBA) to Archaea due to saturation in rRNA caused by multiple substitutions (23), and the same might be the case for the protein data in their support of the sisterhood of these 2 taxa. We therefore used a nonhomogeneous model that is known to deal better with LBA (24) and still obtained strong support for the grouping of Aquificales and Thermotogales (Fig. S4A). We further investigated the possibility of the LBA artifact using the slow–fast method (25), separately analyzing subsets of the concatenated ribosomal protein alignment that contained faster evolving sites only. When only sites that vary by at least 50% in amino acid composition (i.e., representing more saturated sites, probably with multiple substitutions per site) were used, the recovered tree supports the grouping of Thermotogales with members of Firmicutes and Proteobacteria with 83% bootstrap support (Fig. S4C). This suggests that while the more reliable conserved sites unambiguously group Thermotogales with Aquificales, sequence saturation might artificially bring Thermotogales closer to Firmicutes. However, this does not explain the strong affinity for Clostridia for the majority of genes other than those encoding ribosomal proteins (Fig. 3). Separate analysis of the slow sites of the 100–150 genes supporting the sisterhood of Thermotogales and Clostridia retained that relationship and did not associate these genes with those of Aquificales. Thus, if phylogenetic classification were based on the majority phylogenetic signal within the proteome, then, on the basis of both BLAST and phylogenetic analyses (Table 1 and Fig. 3), each of these members of the Thermotogales, and the order as a whole, should be considered members of class Clostridia within the Firmicutes.

The Thermophilic Ancestral Proteome of the Thermotogales. Recently, 2 compositional features of protein sequences have been suggested to be indicators of optimal growth temperature of an organism: overrepresentation of charged amino acid residues over polar ones (CvP bias) (26) and overrepresentation of IVYWREL amino acids (27). Application of both methods of analysis revealed linear correlations between optimal growth temperatures and compositional features of the proteins (Fig. S5A). Distribution of CvP values is unimodal for proteins within each genome (Fig. S5B), providing evidence against the hypothesis that thermophily was brought very recently to Thermotogales through HGT from Archaea (23). Because the above-described compositional features correlate so well with optimal growth temperatures, we used them to examine the nature of the ancestral proteome of these 5 Thermotogales species. We identified ancestral sequences of the most recent common node of the 5 Thermotogales in each gene family (see Methods) and inferred CvP values for all of the gene families for which we could reconstruct their ancestral sequence. The distribution, with peak CvP values of 15–20, suggests that the ancestral proteome of Thermotogales contained mostly thermophilic proteins (Fig. 4A). The thermophilic ancestral proteome inferred here was not necessarily the product of any single ancestral genome or perhaps even of a contemporary population of genomes, because HGT can affect coalescence times of individual gene histories (28). Furthermore, even if the gene families were from a single

Discussion

Thermotogales genomes have complex and incongruent evolutionary histories, with compositions appearing to be more the product of HGT than of vertical descent, as these are traditionally defined. Particularly prominent "highways of gene sharing" (30) link Thermotogales to thermophilic Firmicutes and, to a lesser but significant extent, Archaea. Indeed, the majority of the genes in each of the 5 genomes examined here appear to be derived from these sources (Table 1 and Fig. 3). A high level of between-phylum HGT between Thermotogales and Firmicutes is in fact to be expected, since members of the Firmicutes frequently cohabit with Thermotogales in natural environments (31–33). Indeed, Thermotogales and the Firmicute genera Thermoanaerobacter and Desulfotomaculum are the only bacteria thought to be indigenous to high-temperature oil reservoirs (32, 33). This situation might thus be contrasted to that of the more physiologically restricted cyanobacteria, which tend to exchange more genes within their phylum (34).
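The statement that most genes in each genome appear to come from Firmicutes or Archaea can be checked roughly against Table 1. For each genome column, the sketch below adds the rounded Firmicutes and Archaea percentages. It uses Table 1 only and ignores the phylogenetic results of Fig. 3. The constant and function names are illustrative and are not taken from the study.

```python
# Rounded percentages of ORFs whose top-scoring BLAST hit falls in Firmicutes
# or in Archaea, copied from Table 1 in column order (one value per genome).
FIRMICUTES_PCT = [44, 46, 48, 42, 48]
ARCHAEA_PCT = [11, 11, 9, 9, 8]


def combined_share(firmicutes_pct, archaea_pct):
    """Per-genome sum of the Firmicutes and Archaea percentages."""
    return [f + a for f, a in zip(firmicutes_pct, archaea_pct)]


shares = combined_share(FIRMICUTES_PCT, ARCHAEA_PCT)
for column, share in enumerate(shares, start=1):
    # Each input is rounded to the nearest percent, so each sum is uncertain
    # by up to 1 percentage point in either direction.
    print(f"Table 1 column {column}: about {share}% of ORFs")

print("majority in every column:", all(share > 50 for share in shares))
```

All five sums exceed 50%. The lowest sum (51%) is close enough to the halfway mark that, for that genome, the BLAST-based majority is slim.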
[Fig. 4 panels. (A) x axis: CvP value, binned from <5 to >35. (B) x axis: optimal growth temperature (60–88); y axis: CvP value; series: ancestral proteome, contemporary genomes, trendline.]
Fig. 4. CvP values indicate the ancestor of the Thermotogales was an extreme thermophile. (A) Distribution of CvP values for predicted proteins of
Thermotogales’ ancestral proteome. The results of reconstruction with 2 different programs are shown (see Methods). (B) Extrapolation of optimal growth
temperature for the ancestral Thermotogales proteome. Red points represent median CvP values and optimal growth temperature of 5 contemporary
Thermotogales genomes. A strong linear correlation is observed between CvP values and optimal growth temperature. The median CvP value of the ancestral
proteome is based on 423 inferred ancestral protein sequences.
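The two compositional indices and the extrapolation in Fig. 4B can be sketched as follows. The sketch assumes the following definitions from the cited studies:

- CvP bias is the percentage of charged residues (D, E, K, R) minus the percentage of polar residues (N, Q, S, T) (26).
- The IVYWREL index is the combined fraction of I, V, Y, W, R, E, and L residues (27).

The ordinary least-squares line, the inversion step, and all function names are our own illustrative choices, not the authors' pipeline. The usage numbers are invented and are not values from Fig. 4.

```python
from statistics import median

# Residue classes used by the two thermophily indices (refs. 26 and 27).
CHARGED = frozenset("DEKR")
POLAR = frozenset("NQST")
IVYWREL = frozenset("IVYWREL")


def percent_of(seq, residues):
    """Percentage of positions in `seq` whose residue is in `residues`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    hits = sum(1 for aa in seq if aa in residues)
    return 100.0 * hits / len(seq)


def cvp_bias(seq):
    """CvP bias: % charged (D, E, K, R) minus % polar (N, Q, S, T) residues."""
    return percent_of(seq, CHARGED) - percent_of(seq, POLAR)


def ivywrel_fraction(seq):
    """Combined fraction (0 to 1) of I, V, Y, W, R, E and L residues."""
    return percent_of(seq, IVYWREL) / 100.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_ogt(genome_ogts, genome_median_cvps, ancestral_median_cvp):
    """Fit genome-median CvP against optimal growth temperature (OGT) for
    contemporary genomes, then invert the line to get the OGT implied by an
    ancestral median CvP (an extrapolation if it lies outside the fitted range)."""
    slope, intercept = fit_line(genome_ogts, genome_median_cvps)
    return (ancestral_median_cvp - intercept) / slope


# Usage with invented toy inputs (NOT data from this study).
toy_ancestral_proteins = ["DEAN", "DDKK", "AAAA"]
toy_ancestral_median = median(cvp_bias(p) for p in toy_ancestral_proteins)
print(toy_ancestral_median)  # median of 25.0, 100.0 and 0.0, i.e. 25.0

toy_ogts = [60, 70, 80]              # hypothetical optimal growth temperatures
toy_median_cvps = [8.0, 10.0, 12.0]  # hypothetical genome-median CvP values
print(extrapolate_ogt(toy_ogts, toy_median_cvps, 14.0))  # about 90 here
```

The index is summarized as a median over all reconstructed proteins, as in the Fig. 4 caption. Using the median keeps a few compositionally extreme proteins from dominating the estimate.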
1. Nelson KE, et al. (1999) Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329.
2. Nesbø CL, Dlutek M, Zhaxybayeva O, Doolittle WF (2006) Evidence for existence of "mesotogas," members of the order Thermotogales adapted to low-temperature environments. Appl Environ Microbiol 72:5061–5068.
3. Boussau B, Gueguen L, Gouy M (2008) Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria. BMC Evol Biol 8:272.
4. Clark AJ, et al. (1993) Genetic and molecular analyses of the C-terminal region of the recE gene from the Rac prophage of Escherichia coli K-12 reveal the recT gene. J Bacteriol 175:7673–7682.
5. Nesbø CL, Doolittle WF (2003) Active self-splicing group I introns in 23S rRNA genes of hyperthermophilic bacteria, derived from introns in eukaryotic organelles. Proc Natl Acad Sci USA 100:10806–10811.
6. Singer GAC, Hickey DA (2000) Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17:1581–1588.
7. Nesbø CL, Nelson KE, Doolittle WF (2002) Suppressive subtractive hybridization detects extensive genomic diversity in Thermotoga maritima. J Bacteriol 184:4475–4488.
8. Nesbø CL, Dlutek M, Doolittle WF (2006) Recombination in Thermotoga: Implications for species concepts and biogeography. Genetics 172:759–769.
9. Mongodin EF, et al. (2005) Gene transfer and genome plasticity in Thermotoga maritima, a model hyperthermophilic species. J Bacteriol 187:4935–4944.
10. Coleman ML, et al. (2006) Genomic islands and the ecology and evolution of Prochlorococcus. Science 311:1768–1770.
11. Selig M, Xavier KB, Santos H, Schönheit P (1997) Comparative analysis of Embden-Meyerhof and Entner-Doudoroff glycolytic pathways in hyperthermophilic archaea and the bacterium Thermotoga. Arch Microbiol 167:217–232.
12. Calteau A, Gouy M, Perrière G (2005) Horizontal transfer of two operons coding for hydrogenases between bacteria and archaea. J Mol Evol 60:557–565.
13. Sapra R, Bagramyan K, Adams MW (2003) A simple energy-conserving system: Proton reduction coupled to proton translocation. Proc Natl Acad Sci USA 100:7545–7550.
14. Sapra R, Verhagen M, Adams MWW (2000) Purification and characterization of a membrane-bound hydrogenase from the hyperthermophilic archaeon Pyrococcus furiosus. J Bacteriol 182:3423–3428.
15. Käslin S, Childers SE, Noll KM (1998) Membrane-associated redox enzymes in Thermotoga neapolitana. Arch Microbiol 170:297–303.
16. Sekowska A, Danchin A (2002) The methionine salvage pathway in Bacillus subtilis. BMC Microbiol 2:8.
17. Tabita FR, et al. (2007) Function, structure, and evolution of the RubisCO-like proteins and their RubisCO homologs. Microbiol Mol Biol Rev 71:576–599.
18. Rusch DB, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biol 5:e77.
19. Podell S, Gaasterland T (2007) DarkHorse: A method for genome-wide prediction of horizontal gene transfer. Genome Biol 8:R16.
20. Koski LB, Golding GB (2001) The closest BLAST hit is often not the nearest neighbor. J Mol Evol 52:540–542.
21. Brochier C, Philippe H, Moreira D (2000) The evolutionary history of ribosomal protein RpS14: Horizontal gene transfer at the heart of the ribosome. Trends Genet 16:529–533.
22. Makarova KS, Ponomarev VA, Koonin EV (2001) Two C or not two C: Recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins. Genome Biol 2:RESEARCH0033.
23. Brochier C, Philippe H (2002) Phylogeny: A non-hyperthermophilic ancestor for bacteria. Nature 417:244.
24. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7(Suppl 1):S4.
25. Brinkmann H, Philippe H (1999) Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol 16:817–825.
26. Suhre K, Claverie JM (2003) Genomic correlates of hyperthermostability, an update. J Biol Chem 278:17198–17202.
27. Zeldovich KB, Berezovsky IN, Shakhnovich EI (2007) Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 3:e5.
28. Zhaxybayeva O, Gogarten JP (2004) Cladogenesis, coalescence and the evolution of the three domains of life. Trends Genet 20:182–187.
29. Galtier N, Tourasse N, Gouy M (1999) A nonhyperthermophilic common ancestor to extant life forms. Science 283:220–221.
30. Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA 102:14332–14337.
31. Bonch-Osmolovskaya EA, et al. (2003) Radioisotopic, culture-based, and oligonucleotide microchip analyses of thermophilic microbial communities in a continental high-temperature petroleum reservoir. Appl Environ Microbiol 69:6143–6151.
32. Dahle H, Garshol F, Madsen M, Birkeland NK (2008) Microbial community structure analysis of produced water from a high-temperature North Sea oil-field. Antonie Leeuwenhoek 93:37–49.
33. Magot M (2005) Petroleum Microbiology, eds Ollivier B, Magot M (ASM Press, Washington, DC), pp 21–33.
34. Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (2006) Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events. Genome Res 16:1099–1108.
35. Jain R, Rivera MC, Lake JA (1999) Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA 96:3801–3806.
36. Dagan T, Martin W (2006) The tree of one percent. Genome Biol 7:118.