Abstract
Taxonomy is an organizing principle of biology and is ideally based on evolutionary relationships among organisms. Development of a robust bacterial taxonomy has been hindered by an inability to obtain most bacteria in pure culture and, to a lesser extent, by the historical use of phenotypes to guide classification. Culture-independent sequencing technologies have matured sufficiently that a comprehensive genome-based taxonomy is now possible. We used a concatenated protein phylogeny as the basis for a bacterial taxonomy that conservatively removes polyphyletic groups and normalizes taxonomic ranks on the basis of relative evolutionary divergence. Under this approach, 58% of the 94,759 genomes comprising the Genome Taxonomy Database had changes to their existing taxonomy. This result includes the description of 99 phyla, including six major monophyletic units from the subdivision of the Proteobacteria, and amalgamation of the Candidate Phyla Radiation into a single phylum. Our taxonomy should enable improved classification of uncultured bacteria and provide a sound basis for ecological and evolutionary studies.
References
Garrity, G.M. A new genomics-driven taxonomy of Bacteria and Archaea: are we there yet? J. Clin. Microbiol. 54, 1956–1963 (2016).
Hugenholtz, P., Skarshewski, A. & Parks, D.H. Genome-based microbial taxonomy coming of age. in Microbial Evolution (ed. Ochman, H.) 55–65 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA, 2016).
Yoon, S.H. et al. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int. J. Syst. Evol. Microbiol. 67, 1613–1617 (2017).
Godfray, H.C.J. Challenges for taxonomy. Nature 417, 17–19 (2002).
Federhen, S. The NCBI taxonomy database. Nucleic Acids Res. 40, D136–D143 (2012).
Yilmaz, P. et al. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 42, D643–D648 (2014).
Cole, J.R. et al. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 42, D633–D642 (2014).
McDonald, D. et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618 (2012).
Yutin, N. & Galperin, M.Y. A genomic update on clostridial phylogeny: Gram-negative spore formers and other misplaced clostridia. Environ. Microbiol. 15, 2631–2641 (2013).
Beiko, R.G. Microbial malaise: how can we classify the microbiome? Trends Microbiol. 23, 671–679 (2015).
Yarza, P. et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat. Rev. Microbiol. 12, 635–645 (2014).
Abbott, S.L. & Janda, J.M. in The Prokaryotes 3rd edn. (eds. Dworkin, M. et al.) 72–89 (Springer, New York, 2006).
Jumas-Bilak, E., Roudière, L. & Marchandin, H. Description of 'Synergistetes' phyl. nov. and emended description of the phylum 'Deferribacteres' and of the family Syntrophomonadaceae, phylum 'Firmicutes'. Int. J. Syst. Evol. Microbiol. 59, 1028–1035 (2009).
Janda, J.M. & Abbott, S.L. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clin. Microbiol. 45, 2761–2764 (2007).
Schulz, F. et al. Towards a balanced view of the bacterial tree of life. Microbiome 5, 140 (2017).
DeSantis, T.Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).
Brochier, C., Forterre, P. & Gribaldo, S. An emerging phylogenetic core of Archaea: phylogenies of transcription and translation machineries converge following addition of new genome sequences. BMC Evol. Biol. 5, 36 (2005).
Ciccarelli, F.D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287 (2006).
Thiergart, T., Landan, G. & Martin, W.F. Concatenated alignments and the case of the disappearing tree. BMC Evol. Biol. 14, 266 (2014).
Brown, C.T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
Parks, D.H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
Bapteste, E. et al. Do orthologous gene phylogenies really support tree-thinking? BMC Evol. Biol. 5, 33 (2005).
Tonini, J., Moore, A., Stern, D., Shcheglovitova, M. & Ortí, G. Concatenation and species tree methods exhibit statistically indistinguishable accuracy under a range of simulated conditions. PLoS Curr. https://doi.org/10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be (2015).
Hug, L.A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
Lang, J.M., Darling, A.E. & Eisen, J.A. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 8, e62510 (2013).
Dupont, C.L. et al. Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. ISME J. 6, 1186–1199 (2012).
Wu, D., Jospin, G. & Eisen, J.A. Systematic identification of gene families for use as “markers” for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One 8, e77033 (2013).
Giovannoni, S.J., Rappé, M.S., Vergin, K.L. & Adair, N.L. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proc. Natl. Acad. Sci. USA 93, 7979–7984 (1996).
Dojka, M.A., Hugenholtz, P., Haack, S.K. & Pace, N.R. Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl. Environ. Microbiol. 64, 3869–3877 (1998).
Zwart, G. et al. Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl. Environ. Microbiol. 69, 5875–5883 (2003).
Wolf, M., Müller, T., Dandekar, T. & Pollack, J.D. Phylogeny of Firmicutes with special reference to Mycoplasma (Mollicutes) as inferred from phosphoglycerate kinase amino acid sequence data. Int. J. Syst. Evol. Microbiol. 54, 871–875 (2004).
Lonergan, D.J. et al. Phylogenetic analysis of dissimilatory Fe(III)-reducing bacteria. J. Bacteriol. 178, 2402–2408 (1996).
Beiko, R.G. Telling the whole story in a 10,000-genome world. Biol. Direct 6, 34 (2011).
Zhang, Y. & Sievert, S.M. Pan-genome analyses identify lineage- and niche-specific markers of evolution and adaptation in Epsilonproteobacteria. Front. Microbiol. 5, 110 (2014).
Hugenholtz, P., Pitulle, C., Hershberger, K.L. & Pace, N.R. Novel division level bacterial diversity in a Yellowstone hot spring. J. Bacteriol. 180, 366–376 (1998).
Konstantinidis, K.T. & Tiedje, J.M. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264 (2005).
Wu, D., Doroud, L. & Eisen, J.A. TreeOTU: operational taxonomic unit classification based on phylogenetic trees. Preprint at https://arxiv.org/abs/1308.6333 (2013).
Maniloff, J. in Molecular Biology and Pathogenicity of Mycoplasma (eds. Razin, S. & Herrmann, R.) 31–43 (Springer, New York, 2002).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S.B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Marin, J., Battistuzzi, F.U., Brown, A.C. & Hedges, S.B. The timetree of prokaryotes: new insights into their evolution and speciation. Mol. Biol. Evol. 34, 437–446 (2017).
Gadagkar, S.R., Rosenberg, M.S. & Kumar, S. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zoolog. B Mol. Dev. Evol. 304, 64–74 (2005).
Balvočiūtė, M. & Huson, D.H. SILVA, RDP, Greengenes, NCBI and OTT: how do these taxonomies compare? BMC Genomics 18 (Suppl. 2), 114 (2017).
Whitman, W.B. Modest proposals to expand the type material for naming of prokaryotes. Int. J. Syst. Evol. Microbiol. 66, 2108–2112 (2016).
Konstantinidis, K.T., Rosselló-Móra, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. ISME J. 11, 2399–2406 (2017).
Comas, I., Homolka, S., Niemann, S. & Gagneux, S. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4, e7815 (2009).
Martiny, J.B.H. et al. Microbial biogeography: putting microorganisms on the map. Nat. Rev. Microbiol. 4, 102–112 (2006).
Trost, B., Haakensen, M., Pittet, V., Ziola, B. & Kusalik, A. Analysis and comparison of the pan-genomic properties of sixteen well-characterized bacterial genera. BMC Microbiol. 10, 258 (2010).
Beaz-Hidalgo, R., Hossain, M.J., Liles, M.R. & Figueras, M.J. Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database. PLoS One 10, e0115813 (2015).
Kook, J.K. et al. Genome-based reclassification of Fusobacterium nucleatum subspecies at the species level. Curr. Microbiol. 74, 1137–1147 (2017).
Bobay, L.M. & Ochman, H. Biological species are universal across life's domains. Genome Biol. Evol. 9, 491–501 (2017).
Galperin, M.Y., Brover, V., Tolstoy, I. & Yutin, N. Phylogenomic analysis of the family Peptostreptococcaceae (Clostridium cluster XI) and proposal for reclassification of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel et al. 1989) as Peptoclostridium litorale gen. nov. comb. nov. and Peptoclostridium acidaminophilum comb. nov. Int. J. Syst. Evol. Microbiol. 66, 5506–5513 (2016).
Yarza, P. et al. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst. Appl. Microbiol. 31, 241–250 (2008).
Sakamoto, M., Iino, T. & Ohkuma, M. Faecalimonas umbilicata gen. nov., sp. nov., isolated from human faeces, and reclassification of Eubacterium contortum, Eubacterium fissicatena and Clostridium oroticum as Faecalicatena contorta gen. nov., comb. nov., Faecalicatena fissicatena comb. nov. and Faecalicatena orotica comb. nov. Int. J. Syst. Evol. Microbiol. 67, 1219–1227 (2017).
Hahnke, R.L. et al. Genome-based taxonomic classification of Bacteroidetes. Front. Microbiol. 7, 2003 (2016).
Garrity, G.M., Bell, J.A. & Lilburn, T. in Bergey's Manual of Systematic Bacteriology (eds. Garrity, G. et al.) 575–922 (Springer, New York, 2005).
Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
Waite, D.W. et al. Comparative genomic analysis of the class Epsilonproteobacteria and proposed reclassification to Epsilonbacteraeota (phyl. nov.). Front. Microbiol. 8, 682 (2017).
Brown, D.R. in Bergey's Manual of Systematic Bacteriology (eds. Krieg, N.R. et al.) 567–724 (Springer, New York, 2010).
Skennerton, C.T. et al. Phylogenomic analysis of Candidatus 'Izimaplasma' species: free-living representatives from a Tenericutes clade found in methane seeps. ISME J. 10, 2679–2692 (2016).
Munoz, R., Rosselló-Móra, R. & Amann, R. Revised phylogeny of Bacteroidetes and proposal of sixteen new taxa and two new combinations including Rhodothermaeota phyl. nov. Syst. Appl. Microbiol. 39, 281–296 (2016).
Tanner, M.A., Everett, C.L., Coleman, W.J. & Yang, M.M. Complex microbial communities inhabiting sulfide-rich black mud from marine coastal environments. Biotechnol. Alia 8, 1–16 (2000).
Yamada, T. et al. Characterization of filamentous bacteria, belonging to candidate phylum KSB3, that are associated with bulking in methanogenic granular sludges. ISME J. 1, 246–255 (2007).
Sekiguchi, Y. et al. First genomic insights into members of a candidate bacterial phylum responsible for wastewater bulking. PeerJ 3, e740 (2015).
Chuvochina, M. et al. The importance of designating type material for uncultured taxa. Syst. Appl. Microbiol. https://doi.org/10.1016/j.syapm.2018.07.003 (2018).
Haft, D.H. et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46, D851–D860 (2018).
Leinonen, R., Sugawara, H. & Shumway, M. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).
Ondov, B.D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. & Tyson, G.W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Eddy, S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Finn, R.D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
Haft, D.H., Selengut, J.D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).
Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).
Williams, T.A. et al. New substitution models for rooting phylogenetic trees. Phil. Trans. R. Soc. Lond. B 370, 20140336 (2015).
Ludwig, W. et al. ARB: a software environment for sequence data. Nucleic Acids Res. 32, 1363–1371 (2004).
Euzéby, J.P. List of bacterial names with standing in nomenclature: a folder available on the internet. Int. J. Syst. Bacteriol. 47, 590–592 (1997).
Parker, C.T., Tindall, B.J. & Garrity, G.M. International Code of Nomenclature of Prokaryotes. Int. J. Syst. Evol. Microbiol. https://doi.org/10.1099/ijsem.0.000778 (2015).
Oren, A. et al. Proposal to include the rank of phylum in the International Code of Nomenclature of Prokaryotes. Int. J. Syst. Evol. Microbiol. 65, 4284–4287 (2015).
Wheeler, T.J. in Proceedings of the 9th Workshop on Algorithms in Bioinformatics (eds. Salzberg, S.L. & Warnow, T.) 375–389 (Springer, Berlin, 2009).
Kozlov, A.M., Aberer, A.J. & Stamatakis, A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics 31, 2577–2579 (2015).
Nguyen, L.T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Le, S.Q. & Gascuel, O. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320 (2008).
Nawrocki, E.P. Structural RNA Homology Search and Alignment Using Covariance Models. PhD thesis, Washington Univ. in Saint Louis (2009).
Tavaré, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math Life Sci. 17, 57–86 (1986).
Kupczok, A., Schmidt, H.A. & von Haeseler, A. Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol. Biol. 5, 37 (2010).
Acknowledgements
We thank P. Yilmaz for helpful discussions on the proposed genome-based taxonomy; QFAB Bioinformatics for providing computational resources; and members of ACE for beta-testing GTDB. The project was primarily supported by an Australian Research Council Laureate Fellowship (FL150100038) awarded to P.H.
Contributions
D.H.P., D.W.W. and P.H. wrote the paper, and all other authors provided constructive suggestions. D.H.P. and P.H. designed the study. M.C. and P.H. performed the taxonomic curation. D.H.P., D.W.W., C.R., A.S., and P.-A.C. performed the bioinformatic analyses. P.-A.C. designed the website.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Congruence of the GTDB taxonomy on a tree inferred with ExaML from the concatenation of 120 proteins (bac120) and species-dereplicated genome set.
(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the ExaML tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 10,462 genomes within the ExaML tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the ExaML tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
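The three-way classification used in panel (a) of these figures can be illustrated with a small sketch. It assumes a rooted tree stored as nested tuples of genome names, and it assumes that "operationally monophyletic" means the best-matching clade reaches an F-measure of at least 0.95; that threshold is not stated on this page and is a parameter here. The function names `clades` and `classify_taxon` are hypothetical and are not part of any GTDB tool.

```python
def clades(tree, out):
    """Append the leaf set of every node in `tree` to `out`; return this node's leaf set."""
    if isinstance(tree, str):            # a leaf is a genome name
        leaves = frozenset([tree])
    else:                                # an internal node is a tuple of children
        leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def classify_taxon(tree, members, threshold=0.95):
    """Classify a taxon by the best F-measure between its genomes and any clade.

    threshold is an assumed cut-off for 'operationally monophyletic'.
    """
    members = frozenset(members)
    all_clades = []
    clades(tree, all_clades)
    best_f = 0.0
    for clade in all_clades:
        shared = len(clade & members)
        if shared == 0:
            continue
        precision = shared / len(clade)    # fraction of the clade in the taxon
        recall = shared / len(members)     # fraction of the taxon in the clade
        f_measure = 2 * precision * recall / (precision + recall)
        best_f = max(best_f, f_measure)
    if best_f == 1.0:
        return "monophyletic", best_f
    if best_f >= threshold:
        return "operationally monophyletic", best_f
    return "polyphyletic", best_f


# Toy tree with made-up genome names.
toy_tree = ((("a", "b"), "c"), ("d", "e"))
print(classify_taxon(toy_tree, {"a", "b"}))   # members form one exact clade
print(classify_taxon(toy_tree, {"a", "d"}))   # members are split across the root
```

Searching every clade for the best F-measure tolerates a few misplaced genomes without calling the whole taxon polyphyletic, which is the apparent purpose of the intermediate category.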
Supplementary Figure 2 Congruence of the GTDB taxonomy on a tree inferred with FastTree from the concatenation of 16 ribosomal proteins (rp1).
(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the rp1 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 20,699 genomes within the rp1 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the rp1 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
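Panel (c) of these figures reports relative evolutionary divergence (RED). The sketch below shows one way such values could be computed on a rooted tree with branch lengths. It assumes the interpolation rule usually given for RED: the root is 0, every leaf is 1, and an internal node n with parent value p gets p + (d / u) × (1 − p), where d is the branch length from the parent to n and u is the mean distance from the parent to the leaves below n. This page does not spell out the rule, so treat it as an assumption. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0                   # branch length to the parent
    children: list = field(default_factory=list)
    red: float = 0.0


def leaf_distances(node):
    """Distances from `node` down to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [child.length + d
            for child in node.children
            for d in leaf_distances(child)]


def assign_red(node, parent_red=0.0, is_root=True):
    """Assign RED values in a pre-order walk (assumed interpolation rule)."""
    if is_root:
        node.red = 0.0
    elif not node.children:
        node.red = 1.0
    else:
        dists = leaf_distances(node)
        # Mean distance from the parent, through this node, to its leaves.
        u = node.length + sum(dists) / len(dists)
        if u == 0:
            node.red = parent_red         # degenerate zero-length subtree
        else:
            node.red = parent_red + (node.length / u) * (1.0 - parent_red)
    for child in node.children:
        assign_red(child, node.red, is_root=False)


# Toy tree: an internal node A with two leaves, plus a sister leaf c.
a = Node("A", 1.0, [Node("x", 1.0), Node("y", 1.0)])
root = Node("root", 0.0, [a, Node("c", 2.0)])
assign_red(root)
print(a.red)   # A sits halfway between the root and its leaves
```

Because each value is interpolated between the parent and the leaves, lineages with unequal evolutionary rates are placed on a common 0 to 1 scale, which is what allows ranks to be compared across the tree.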
Supplementary Figure 3 Congruence of the GTDB taxonomy on a tree inferred with FastTree by using the 16S rRNA gene.
(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the 16S rRNA gene tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 12,435 genomes within the 16S rRNA gene tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred gene tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the 16S rRNA gene tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
Supplementary Figure 4 Congruence of the NCBI taxonomy on a tree inferred with ExaML from the concatenation of 120 proteins (bac120) and species-dereplicated genome set.
(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the ExaML tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 8,905 RefSeq/GenBank genomes within the ExaML tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the ExaML tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
Supplementary Figure 5 Congruence of the NCBI taxonomy on a tree inferred with FastTree from the concatenation of 120 proteins (bac120).
(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the bac120 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 16,248 RefSeq/GenBank genomes within the bac120 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the bac120 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses (note that this is a reproduction of Fig. 2a).
Supplementary Figure 6 Congruence of the NCBI taxonomy on a tree inferred with FastTree from the concatenation of 16 ribosomal proteins (rp1).
(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the rp1 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 16,306 RefSeq/GenBank genomes within the rp1 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the rp1 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
Supplementary Figure 7 Congruence of the NCBI taxonomy on a tree inferred with FastTree by using the 16S rRNA gene.
(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the 16S rRNA gene tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 12,174 RefSeq/GenBank genomes within the 16S rRNA gene tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred gene tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the 16S rRNA gene tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
Supplementary Figure 8 Comparison of GTDB and SILVA.
Comparison of GTDB and SILVA taxonomic assignments across 10,779 bacterial genomes from RefSeq/GenBank release 80. These genomes are part of the 21,943 dereplicated genomes for which a 16S rRNA gene could be reliably matched by sequence similarity to a 16S rRNA gene in SILVA release 128. For each rank, a taxon was classified as being unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the SILVA taxonomy, or actively changed if the name was different between the two taxonomies after adjusting the SILVA taxonomy for colloquial designations indicative of missing taxonomic information (see Methods). Changes between the GTDB and SILVA taxonomies are given in Supp. Table 9.
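The unchanged, passively changed, and actively changed categories described in this caption can be expressed as a short decision rule. The sketch below is illustrative only: the set of placeholder strings standing in for "colloquial designations" is an assumption, as is treating a name that is missing from GTDB but present in SILVA as an active change. The function name `compare_rank` is hypothetical.

```python
# Assumed placeholder labels that signal missing information in SILVA.
PLACEHOLDERS = frozenset({"", "uncultured", "unidentified", "unknown"})


def compare_rank(gtdb_name, silva_name, placeholders=PLACEHOLDERS):
    """Classify the change at one rank for one genome."""
    gtdb = gtdb_name.strip()
    silva = silva_name.strip()
    if silva.lower() in placeholders:
        silva = ""                        # treat colloquial labels as missing
    if gtdb == silva:
        return "unchanged"
    if gtdb and not silva:
        return "passively changed"        # GTDB supplies a name SILVA lacks
    return "actively changed"             # the two names disagree


# Made-up names, for illustration only.
print(compare_rank("Genus_A", "Genus_A"))
print(compare_rank("Genus_A", "uncultured"))
print(compare_rank("Genus_A", "Genus_B"))
```

Counting these labels per rank over all compared genomes would give the kind of stacked percentages shown in the figure.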
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–8 (PDF 1124 kb)
Supplementary Table 1
Robustness of GTDB taxonomy under varying marker sets, maximum-likelihood tree inference software, subsets of taxa, and evolutionary models (XLSX 4427 kb)
Supplementary Table 2
16S rRNA-based taxa names adopted in the GTDB taxonomy and their associated rank and number of circumscribed genomes. (XLSX 12 kb)
Supplementary Table 3
Correspondence between standardly named NCBI and GTDB taxa ordered by degree of polyphyly. (XLSX 131 kb)
Supplementary Table 4
Taxa found to be polyphyletic in one or more of the trees inferred with FastTree, IQ-TREE, or ExaML on species- or genus-dereplicated genome sets, or in trees inferred with FastTree using the ribosomal proteins (rp1) marker set or 16S rRNA gene. (XLSX 57 kb)
Supplementary Table 5
Genomes with conflicting or unresolved taxonomic assignments when applying the GTDB taxonomy to the species-dereplicated FastTree, IQ-TREE, or ExaML trees, or trees inferred from the concatenation of 16 ribosomal proteins (rp1) or the 16S rRNA gene. (XLSX 129 kb)
Supplementary Table 6
Pairwise comparison of trees inferred with varying inference methods and marker sets. (XLSX 13 kb)
Supplementary Table 7
Percentage of GTDB taxa at each taxonomic rank that are monophyletic, operationally monophyletic, or polyphyletic in each gene within the bac120 marker set. (XLSX 30 kb)
Supplementary Table 8
NCBI taxa that have been 'retired' in the GTDB taxonomy and brief explanations for their retirement. (XLSX 18 kb)
Supplementary Table 9
Correspondence between NCBI and SILVA taxa ordered by degree of polyphyly. (XLSX 59 kb)
Supplementary Table 10
Comparison of NCBI and GTDB genus and species classifications to those proposed by Beaz-Hidalgo et al. (2015), Kook et al. (2017), and Bobay & Ochman (2017). (XLSX 14 kb)
Supplementary Table 11
Comparison of clostridia classifications proposed by Yutin & Galperin (2013) to the GTDB taxonomy. (XLSX 20 kb)
Supplementary Table 12
Draft genomes with 16S rRNA genes that did not meet the selection criteria for inclusion in the 16S rRNA tree (Online Methods). (XLSX 33 kb)
Cite this article
Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). https://doi.org/10.1038/nbt.4229