  • Resource

A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life

Abstract

Taxonomy is an organizing principle of biology and is ideally based on evolutionary relationships among organisms. Development of a robust bacterial taxonomy has been hindered by an inability to obtain most bacteria in pure culture and, to a lesser extent, by the historical use of phenotypes to guide classification. Culture-independent sequencing technologies have matured sufficiently that a comprehensive genome-based taxonomy is now possible. We used a concatenated protein phylogeny as the basis for a bacterial taxonomy that conservatively removes polyphyletic groups and normalizes taxonomic ranks on the basis of relative evolutionary divergence. Under this approach, 58% of the 94,759 genomes comprising the Genome Taxonomy Database had changes to their existing taxonomy. This result includes the description of 99 phyla, including six major monophyletic units from the subdivision of the Proteobacteria, and amalgamation of the Candidate Phyla Radiation into a single phylum. Our taxonomy should enable improved classification of uncultured bacteria and provide a sound basis for ecological and evolutionary studies.
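The rank normalization mentioned above relies on relative evolutionary divergence (RED). This preview does not give the formula, so the following is only a minimal sketch under a stated assumption: the root has RED 0, every leaf has RED 1, and an internal node n with parent RED p takes the value p + (d/u)(1 - p), where d is the branch length from n to its parent and u is the mean distance from that parent to the leaves below n. The `Node` class and function names are hypothetical and are not taken from the GTDB software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical rooted-tree node; names are assumed to be unique."""
    name: str
    branch_length: float = 0.0  # length of the branch leading to the parent
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Distances from `node` down to every leaf beneath it."""
    if not node.children:
        return [0.0]
    return [child.branch_length + dist
            for child in node.children
            for dist in leaf_distances(child)]


def compute_red(root: Node) -> Dict[str, float]:
    """Assign RED to each node under the assumed interpolation rule."""
    red: Dict[str, float] = {}

    def visit(node: Node, parent_red: Optional[float]) -> None:
        if parent_red is None:
            red[node.name] = 0.0          # root
        elif not node.children:
            red[node.name] = 1.0          # extant taxon (leaf)
        else:
            d = node.branch_length
            dists = leaf_distances(node)
            # Mean distance from the parent to the leaves below this node.
            u = d + sum(dists) / len(dists)
            if u == 0.0:
                red[node.name] = parent_red  # degenerate zero-length subtree
            else:
                red[node.name] = parent_red + (d / u) * (1.0 - parent_red)
        for child in node.children:
            visit(child, red[node.name])

    visit(root, None)
    return red
```

On a clock-like tree this reduces to relative depth: a node halfway between the root and its leaves receives a RED of 0.5. The recursion recomputes leaf distances at every node, which is fine for illustration but would be cached for trees with tens of thousands of genomes.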

Figure 1: Rank normalization through RED.
Figure 2: RED of NCBI and GTDB taxa in a genome tree inferred from 120 concatenated proteins.
Figure 3: RED and polyphyly of GTDB and NCBI taxa on trees inferred by using varying inference methods and marker sets.
Figure 4: Comparison of GTDB and NCBI taxonomies and naming status of GTDB taxa.
Figure 5: Comparisons of NCBI and GTDB classifications of genomes designated as Clostridia or Bacteroidetes in the GTDB taxonomy.

Accession codes

Primary accessions

BioProject

References

  1. Garrity, G.M. A new genomics-driven taxonomy of Bacteria and Archaea: are we there yet? J. Clin. Microbiol. 54, 1956–1963 (2016).

  2. Hugenholtz, P., Skarshewski, A. & Parks, D.H. Genome-based microbial taxonomy coming of age. in Microbial Evolution (ed. Ochman, H.) 55–65 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA, 2016).

  3. Yoon, S.H. et al. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int. J. Syst. Evol. Microbiol. 67, 1613–1617 (2017).

  4. Godfray, H.C.J. Challenges for taxonomy. Nature 417, 17–19 (2002).

  5. Federhen, S. The NCBI taxonomy database. Nucleic Acids Res. 40, D136–D143 (2012).

  6. Yilmaz, P. et al. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 42, D643–D648 (2014).

  7. Cole, J.R. et al. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 42, D633–D642 (2014).

  8. McDonald, D. et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618 (2012).

  9. Yutin, N. & Galperin, M.Y. A genomic update on clostridial phylogeny: Gram-negative spore formers and other misplaced clostridia. Environ. Microbiol. 15, 2631–2641 (2013).

  10. Beiko, R.G. Microbial malaise: how can we classify the microbiome? Trends Microbiol. 23, 671–679 (2015).

  11. Yarza, P. et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat. Rev. Microbiol. 12, 635–645 (2014).

  12. Abbott, S.L. & Janda, J.M. in The Prokaryotes 3rd edn. (eds. Dworkin, M. et al.) 72–89 (Springer, New York, 2006).

  13. Jumas-Bilak, E., Roudière, L. & Marchandin, H. Description of 'Synergistetes' phyl. nov. and emended description of the phylum 'Deferribacteres' and of the family Syntrophomonadaceae, phylum 'Firmicutes'. Int. J. Syst. Evol. Microbiol. 59, 1028–1035 (2009).

  14. Janda, J.M. & Abbott, S.L. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clin. Microbiol. 45, 2761–2764 (2007).

  15. Schulz, F. et al. Towards a balanced view of the bacterial tree of life. Microbiome 5, 140 (2017).

  16. DeSantis, T.Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).

  17. Brochier, C., Forterre, P. & Gribaldo, S. An emerging phylogenetic core of Archaea: phylogenies of transcription and translation machineries converge following addition of new genome sequences. BMC Evol. Biol. 5, 36 (2005).

  18. Ciccarelli, F.D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287 (2006).

  19. Thiergart, T., Landan, G. & Martin, W.F. Concatenated alignments and the case of the disappearing tree. BMC Evol. Biol. 14, 266 (2014).

  20. Brown, C.T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).

  21. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).

  22. Parks, D.H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).

  23. Bapteste, E. et al. Do orthologous gene phylogenies really support tree-thinking? BMC Evol. Biol. 5, 33 (2005).

  24. Tonini, J., Moore, A., Stern, D., Shcheglovitova, M. & Ortí, G. Concatenation and species tree methods exhibit statistically indistinguishable accuracy under a range of simulated conditions. PLoS Curr. https://doi.org/10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be (2015).

  25. Hug, L.A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).

  26. Lang, J.M., Darling, A.E. & Eisen, J.A. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 8, e62510 (2013).

  27. Dupont, C.L. et al. Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. ISME J. 6, 1186–1199 (2012).

  28. Wu, D., Jospin, G. & Eisen, J.A. Systematic identification of gene families for use as “markers” for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One 8, e77033 (2013).

  29. Giovannoni, S.J., Rappé, M.S., Vergin, K.L. & Adair, N.L. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proc. Natl. Acad. Sci. USA 93, 7979–7984 (1996).

  30. Dojka, M.A., Hugenholtz, P., Haack, S.K. & Pace, N.R. Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated aquifer undergoing intrinsic bioremediation. Appl. Environ. Microbiol. 64, 3869–3877 (1998).

  31. Zwart, G. et al. Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl. Environ. Microbiol. 69, 5875–5883 (2003).

  32. Wolf, M., Müller, T., Dandekar, T. & Pollack, J.D. Phylogeny of Firmicutes with special reference to Mycoplasma (Mollicutes) as inferred from phosphoglycerate kinase amino acid sequence data. Int. J. Syst. Evol. Microbiol. 54, 871–875 (2004).

  33. Lonergan, D.J. et al. Phylogenetic analysis of dissimilatory Fe(III)-reducing bacteria. J. Bacteriol. 178, 2402–2408 (1996).

  34. Beiko, R.G. Telling the whole story in a 10,000-genome world. Biol. Direct 6, 34 (2011).

  35. Zhang, Y. & Sievert, S.M. Pan-genome analyses identify lineage- and niche-specific markers of evolution and adaptation in Epsilonproteobacteria. Front. Microbiol. 5, 110 (2014).

  36. Hugenholtz, P., Pitulle, C., Hershberger, K.L. & Pace, N.R. Novel division level bacterial diversity in a Yellowstone hot spring. J. Bacteriol. 180, 366–376 (1998).

  37. Konstantinidis, K.T. & Tiedje, J.M. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264 (2005).

  38. Wu, D., Doroud, L. & Eisen, J.A. TreeOTU: operational taxonomic unit classification based on phylogenetic trees. Preprint at https://arxiv.org/abs/1308.6333 (2013).

  39. Maniloff, J. in Molecular Biology and Pathogenicity of Mycoplasma (eds. Razin, S. & Herrmann, R.) 31–43 (Springer, New York, 2002).

  40. Kumar, S., Stecher, G., Suleski, M. & Hedges, S.B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).

  41. Marin, J., Battistuzzi, F.U., Brown, A.C. & Hedges, S.B. The timetree of prokaryotes: new insights into their evolution and speciation. Mol. Biol. Evol. 34, 437–446 (2017).

  42. Gadagkar, S.R., Rosenberg, M.S. & Kumar, S. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zoolog. B Mol. Dev. Evol. 304, 64–74 (2005).

  43. Balvočiūtė, M. & Huson, D.H. SILVA, RDP, Greengenes, NCBI and OTT: how do these taxonomies compare? BMC Genomics 18 (Suppl. 2), 114 (2017).

  44. Whitman, W.B. Modest proposals to expand the type material for naming of prokaryotes. Int. J. Syst. Evol. Microbiol. 66, 2108–2112 (2016).

  45. Konstantinidis, K.T., Rosselló-Móra, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. ISME J. 11, 2399–2406 (2017).

  46. Comas, I., Homolka, S., Niemann, S. & Gagneux, S. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4, e7815 (2009).

  47. Martiny, J.B.H. et al. Microbial biogeography: putting microorganisms on the map. Nat. Rev. Microbiol. 4, 102–112 (2006).

  48. Trost, B., Haakensen, M., Pittet, V., Ziola, B. & Kusalik, A. Analysis and comparison of the pan-genomic properties of sixteen well-characterized bacterial genera. BMC Microbiol. 10, 258 (2010).

  49. Beaz-Hidalgo, R., Hossain, M.J., Liles, M.R. & Figueras, M.J. Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database. PLoS One 10, e0115813 (2015).

  50. Kook, J.K. et al. Genome-based reclassification of Fusobacterium nucleatum subspecies at the species level. Curr. Microbiol. 74, 1137–1147 (2017).

  51. Bobay, L.M. & Ochman, H. Biological species are universal across life's domains. Genome Biol. Evol. 9, 491–501 (2017).

  52. Galperin, M.Y., Brover, V., Tolstoy, I. & Yutin, N. Phylogenomic analysis of the family Peptostreptococcaceae (Clostridium cluster XI) and proposal for reclassification of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel et al. 1989) as Peptoclostridium litorale gen. nov. comb. nov. and Peptoclostridium acidaminophilum comb. nov. Int. J. Syst. Evol. Microbiol. 66, 5506–5513 (2016).

  53. Yarza, P. et al. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst. Appl. Microbiol. 31, 241–250 (2008).

  54. Sakamoto, M., Iino, T. & Ohkuma, M. Faecalimonas umbilicata gen. nov., sp. nov., isolated from human faeces, and reclassification of Eubacterium contortum, Eubacterium fissicatena and Clostridium oroticum as Faecalicatena contorta gen. nov., comb. nov., Faecalicatena fissicatena comb. nov. and Faecalicatena orotica comb. nov. Int. J. Syst. Evol. Microbiol. 67, 1219–1227 (2017).

  55. Hahnke, R.L. et al. Genome-based taxonomic classification of Bacteroidetes. Front. Microbiol. 7, 2003 (2016).

  56. Garrity, G.M., Bell, J.A. & Lilburn, T. in Bergey's Manual of Systematic Bacteriology (eds. Garrity, G. et al.) 575–922 (Springer, New York, 2005).

  57. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).

  58. Waite, D.W. et al. Comparative genomic analysis of the class Epsilonproteobacteria and proposed reclassification to Epsilonbacteraeota (phyl. nov.). Front. Microbiol. 8, 682 (2017).

  59. Brown, D.R. in Bergey's Manual of Systematic Bacteriology (eds. Krieg, N.R. et al.) 567–724 (Springer, New York, 2010).

  60. Skennerton, C.T. et al. Phylogenomic analysis of Candidatus 'Izimaplasma' species: free-living representatives from a Tenericutes clade found in methane seeps. ISME J. 10, 2679–2692 (2016).

  61. Munoz, R., Rosselló-Móra, R. & Amann, R. Revised phylogeny of Bacteroidetes and proposal of sixteen new taxa and two new combinations including Rhodothermaeota phyl. nov. Syst. Appl. Microbiol. 39, 281–296 (2016).

  62. Tanner, M.A., Everett, C.L., Coleman, W.J. & Yang, M.M. Complex microbial communities inhabiting sulfide-rich black mud from marine coastal environments. Biotechnol. Alia 8, 1–16 (2000).

  63. Yamada, T. et al. Characterization of filamentous bacteria, belonging to candidate phylum KSB3, that are associated with bulking in methanogenic granular sludges. ISME J. 1, 246–255 (2007).

  64. Sekiguchi, Y. et al. First genomic insights into members of a candidate bacterial phylum responsible for wastewater bulking. PeerJ 3, e740 (2015).

  65. Chuvochina, M. et al. The importance of designating type material for uncultured taxa. Syst. Appl. Microbiol. https://doi.org/10.1016/j.syapm.2018.07.003 (2018).

  66. Haft, D.H. et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46, D851–D860 (2018).

  67. Leinonen, R., Sugawara, H. & Shumway, M. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).

  68. Ondov, B.D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).

  69. Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. & Tyson, G.W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  70. Eddy, S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).

  71. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  72. Finn, R.D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).

  73. Haft, D.H., Selengut, J.D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).

  74. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

  75. Price, M.N., Dehal, P.S. & Arkin, A.P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).

  76. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699 (2001).

  77. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).

  78. Williams, T.A. et al. New substitution models for rooting phylogenetic trees. Phil. Trans. R. Soc. Lond. B 370, 20140336 (2015).

  79. Ludwig, W. et al. ARB: a software environment for sequence data. Nucleic Acids Res. 32, 1363–1371 (2004).

  80. Euzéby, J.P. List of bacterial names with standing in nomenclature: a folder available on the internet. Int. J. Syst. Bacteriol. 47, 590–592 (1997).

  81. Parker, C.T., Tindall, B.J. & Garrity, G.M. International Code of Nomenclature of Prokaryotes. Int. J. Syst. Evol. Microbiol. https://doi.org/10.1099/ijsem.0.000778 (2015).

  82. Oren, A. et al. Proposal to include the rank of phylum in the International Code of Nomenclature of Prokaryotes. Int. J. Syst. Evol. Microbiol. 65, 4284–4287 (2015).

  83. Wheeler, T.J. in Proceedings of the 9th Workshop on Algorithms in Bioinformatics (eds. Salzberg, S.L. & Warnow, T.) 375–389 (Springer, Berlin, 2009).

  84. Kozlov, A.M., Aberer, A.J. & Stamatakis, A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics 31, 2577–2579 (2015).

  85. Nguyen, L.T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

  86. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

  87. Le, S.Q. & Gascuel, O. An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320 (2008).

  88. Nawrocki, E.P. Structural RNA Homology Search and Alignment Using Covariance Models. PhD thesis, Washington Univ. in Saint Louis (2009).

  89. Tavaré, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math Life Sci. 17, 57–86 (1986).

  90. Kupczok, A., Schmidt, H.A. & von Haeseler, A. Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol. Biol. 5, 37 (2010).

Acknowledgements

We thank P. Yilmaz for helpful discussions on the proposed genome-based taxonomy; QFAB Bioinformatics for providing computational resources; and members of ACE for beta-testing GTDB. The project was primarily supported by an Australian Research Council Laureate Fellowship (FL150100038) awarded to P.H.

Author information

Contributions

D.H.P., D.W.W. and P.H. wrote the paper, and all other authors provided constructive suggestions. D.H.P. and P.H. designed the study. M.C. and P.H. performed the taxonomic curation. D.H.P., D.W.W., C.R., A.S., and P.-A.C. performed the bioinformatic analyses. P.-A.C. designed the website.

Corresponding author

Correspondence to Philip Hugenholtz.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Congruence of the GTDB taxonomy on a tree inferred with ExaML from the concatenation of 120 proteins (bac120) and species-dereplicated genome set.

(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the ExaML tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 10,462 genomes within the ExaML tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the ExaML tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.

Supplementary Figure 2 Congruence of the GTDB taxonomy on a tree inferred with FastTree from the concatenation of 16 ribosomal proteins (rp1).

(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the rp1 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 20,699 genomes within the rp1 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the rp1 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.

Supplementary Figure 3 Congruence of the GTDB taxonomy on a tree inferred with FastTree by using the 16S rRNA gene.

(a) Percentage of GTDB taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the 16S rRNA gene tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 12,435 genomes within the 16S rRNA gene tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the GTDB taxonomy when taxonomy is assigned based on their placement in the inferred gene tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the 16S rRNA gene tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.

Supplementary Figure 4 Congruence of the NCBI taxonomy on a tree inferred with ExaML from the concatenation of 120 proteins (bac120) and species-dereplicated genome set.

(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the ExaML tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 8,905 RefSeq/GenBank genomes within the ExaML tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the ExaML tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.

Supplementary Figure 5 Congruence of the NCBI taxonomy on a tree inferred with FastTree from the concatenation of 120 proteins (bac120).

(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the bac120 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 16,248 RefSeq/GenBank genomes within the bac120 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the bac120 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses (note that this is a reproduction of Fig. 2a).

Supplementary Figure 6 Congruence of the NCBI taxonomy on a tree inferred with FastTree from the concatenation of 16 ribosomal proteins (rp1).

(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the rp1 tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 16,306 RefSeq/GenBank genomes within the rp1 tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the rp1 tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.

Supplementary Figure 7 Congruence of the NCBI taxonomy on a tree inferred with FastTree by using the 16S rRNA gene.

(a) Percentage of NCBI taxa at each rank which are monophyletic, operationally monophyletic, or polyphyletic within the 16S rRNA gene tree. Results were calculated over all taxa comprised of >1 genomes and the number of taxa considered at each rank is shown in parentheses. (b) Percentage of the 12,174 RefSeq/GenBank genomes within the 16S rRNA gene tree with identical, unresolved, or conflicting taxonomic assignments at each rank relative to the NCBI taxonomy when taxonomy is assigned based on their placement in the inferred gene tree. (c) RED of taxa with ≥2 immediate subordinate taxa in the 16S rRNA gene tree, with the same coloring as used in panel a. The number of taxa plotted at each rank is given in parentheses.
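Panel a of the congruence figures above sorts each taxon into one of three classes. The captions do not say how the classes are computed, so the sketch below is illustrative only. It assumes that a taxon is monophyletic when some node's leaf set matches its member genomes exactly, and operationally monophyletic when the best-matching node reaches an F-measure at or above a threshold; the default of 0.95 is an assumption, as are the nested-list tree encoding and all function names.

```python
from typing import List, Set, Union

# Hypothetical encoding: a leaf is a genome id, an internal node is a list of subtrees.
Tree = Union[str, List["Tree"]]


def leaves(tree: Tree) -> Set[str]:
    """Collect the genome ids below a node."""
    if isinstance(tree, str):
        return {tree}
    found: Set[str] = set()
    for subtree in tree:
        found |= leaves(subtree)
    return found


def best_f_measure(tree: Tree, members: Set[str]) -> float:
    """Highest F-measure between `members` and the leaf set of any node."""
    below = leaves(tree)
    shared = len(below & members)
    best = 0.0
    if shared:
        precision = shared / len(below)
        recall = shared / len(members)
        best = 2 * precision * recall / (precision + recall)
    if not isinstance(tree, str):
        for subtree in tree:
            best = max(best, best_f_measure(subtree, members))
    return best


def classify_taxon(tree: Tree, members: Set[str], threshold: float = 0.95) -> str:
    """Label a taxon by how well a single clade captures its members."""
    score = best_f_measure(tree, set(members))
    if score == 1.0:
        return "monophyletic"
    if score >= threshold:
        return "operationally monophyletic"
    return "polyphyletic"
```

The middle class tolerates a few misplaced genomes, so a single rogue placement does not mark an otherwise coherent group as polyphyletic.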

Supplementary Figure 8 Comparison of GTDB and SILVA.

Comparison of GTDB and SILVA taxonomic assignments across 10,779 bacterial genomes from RefSeq/GenBank release 80. These genomes are part of the 21,943 dereplicated genomes for which a 16S rRNA gene could be reliably matched by sequence similarity to a 16S rRNA gene in SILVA release 128. For each rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the SILVA taxonomy, or actively changed if the name differed between the two taxonomies after adjusting the SILVA taxonomy for colloquial designations indicative of missing taxonomic information (see Methods). Changes between the GTDB and SILVA taxonomies are given in Supplementary Table 9.
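The three-way rule in this caption can be sketched as a small function. This is a hedged illustration, not the authors' code: it assumes a missing name is represented as `None` (or an empty string) and that SILVA's colloquial placeholder names have already been mapped to `None`, since the adjustment step is described in the Methods and not reproduced here. The function name is hypothetical, and the handling of names missing from GTDB is a guess the caption does not cover.

```python
from typing import Optional


def classify_change(gtdb_name: Optional[str], silva_name: Optional[str]) -> str:
    """Compare one genome's name at a single rank in the two taxonomies."""
    if not silva_name:
        # SILVA has no name at this rank: GTDB either fills the gap or is
        # also empty (treated as unchanged here; an assumption).
        return "passively changed" if gtdb_name else "unchanged"
    if gtdb_name == silva_name:
        return "unchanged"
    # Different names, including the case where only GTDB lacks a name
    # (also an assumption).
    return "actively changed"
```

Applied per genome and per rank, the counts of each label give the proportions plotted in the figure.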

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 (PDF 1124 kb)

Life Sciences Reporting Summary (PDF 89 kb)

Supplementary Table 1

Robustness of GTDB taxonomy under varying marker sets, maximum-likelihood tree inference software, subsets of taxa, and evolutionary models (XLSX 4427 kb)

Supplementary Table 2

16S rRNA-based taxa names adopted in the GTDB taxonomy and their associated rank and number of circumscribed genomes. (XLSX 12 kb)

Supplementary Table 3

Correspondence between standardly named NCBI and GTDB taxa ordered by degree of polyphyly. (XLSX 131 kb)

Supplementary Table 4

Taxa found to be polyphyletic in one or more of the trees inferred with FastTree, IQ-TREE, or ExaML on species- or genus-dereplicated genome sets, or in trees inferred with FastTree using the ribosomal proteins (rp1) marker set or 16S rRNA gene. (XLSX 57 kb)

Supplementary Table 5

Genomes with conflicting or unresolved taxonomic assignments when applying the GTDB taxonomy to the species-dereplicated FastTree, IQ-TREE, or ExaML trees, or trees inferred from the concatenation of 16 ribosomal proteins (rp1) or the 16S rRNA gene. (XLSX 129 kb)

Supplementary Table 6

Pairwise comparison of trees inferred with varying inference methods and marker sets. (XLSX 13 kb)

Supplementary Table 7

Percentage of GTDB taxa at each taxonomic rank that are monophyletic, operationally monophyletic, or polyphyletic in each gene within the bac120 marker set. (XLSX 30 kb)

Supplementary Table 8

NCBI taxa that have been 'retired' in the GTDB taxonomy and brief explanations for their retirement. (XLSX 18 kb)

Supplementary Table 9

Correspondence between NCBI and SILVA taxa ordered by degree of polyphyly. (XLSX 59 kb)

Supplementary Table 10

Comparison of NCBI and GTDB genus and species classifications to those proposed by Beaz-Hidalgo et al. (2015), Kook et al. (2017), and Bobay & Ochman (2017). (XLSX 14 kb)

Supplementary Table 11

Comparison of clostridia classifications proposed by Yutin & Galperin (2013) to the GTDB taxonomy. (XLSX 20 kb)

Supplementary Table 12

Draft genomes with 16S rRNA genes that did not meet the selection criteria for inclusion in the 16S rRNA tree (Online Methods). (XLSX 33 kb)

About this article

Cite this article

Parks, D., Chuvochina, M., Waite, D. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018). https://doi.org/10.1038/nbt.4229

  • DOI: https://doi.org/10.1038/nbt.4229
