Abstract
Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community's functional capabilities. Here we describe PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
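The final step described above, combining predicted gene families into a composite metagenome, can be illustrated with a minimal sketch. It assumes that each taxon's 16S read count is first divided by a predicted 16S copy number to approximate organism abundance, and that this abundance is then multiplied by the taxon's predicted gene content. The function name, the data layout and all input values below are hypothetical and chosen for illustration only; this is not the PICRUSt implementation, and the ancestral-state reconstruction that would produce the per-taxon predictions is not shown.

```python
def predict_metagenome(otu_counts, copy_numbers, gene_content):
    """Estimate gene-family abundances for a single sample.

    otu_counts:   {otu_id: 16S read count observed in the sample}
    copy_numbers: {otu_id: predicted 16S copies per genome}
    gene_content: {otu_id: {gene_family: predicted copies per genome}}

    All three inputs are assumed to come from upstream steps
    (marker gene survey and per-taxon predictions).
    """
    metagenome = {}
    for otu, count in otu_counts.items():
        # Convert 16S read counts into an approximate organism abundance.
        organisms = count / copy_numbers[otu]
        # Add this taxon's contribution to every gene family it carries.
        for family, copies in gene_content[otu].items():
            metagenome[family] = metagenome.get(family, 0.0) + organisms * copies
    return metagenome


# Toy, made-up inputs (placeholder taxon and gene-family labels).
sample = {"otu_a": 40, "otu_b": 10}
copy_numbers = {"otu_a": 4, "otu_b": 1}
gene_content = {
    "otu_a": {"family_1": 1, "family_2": 2},
    "otu_b": {"family_2": 1, "family_3": 3},
}

predicted = predict_metagenome(sample, copy_numbers, gene_content)
print(predicted)
```

In this toy case both taxa contribute ten estimated organisms each after copy-number correction, even though their raw read counts differ fourfold, which is why the correction matters before gene counts are summed.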
Acknowledgements
We would like to thank A. Robbins-Pianka and N. Segata, along with all members of the Knight, Beiko, Vega Thurber, Caporaso and Huttenhower laboratories, for their assistance during PICRUSt conception and development. This work was supported in part by the Canadian Institutes of Health Research (M.G.I.L., R.G.B.), the Canada Research Chairs program (R.G.B.), US National Science Foundation (NSF) OCE #1130786 (R.V.T., D.B.), the Howard Hughes Medical Institute (R.K.), US National Institutes of Health (NIH) P01DK078669, U01HG004866, R01HG004872 (R.K.), the Crohn's and Colitis Foundation of America (R.K.), the Sloan Foundation (R.K.), NIH 1R01HG005969 (C.H.), NSF CAREER DBI-1053486 (C.H.) and ARO W911NF-11-1-0473 (C.H.).
Author information
Contributions
The teams of M.G.I.L. and R.G.B.; J.A.R. and C.H.; and J.Z., D.K. and R.K. each conceived versions of the gene content prediction algorithm and implemented prototype software. J.Z., M.G.I.L., J.G.C., D.M., D.K., J.C.C., R.K., R.G.B. and C.H. designed the final PICRUSt algorithm and software. J.Z., M.G.I.L., J.G.C. and D.M. wrote the PICRUSt software package. M.G.I.L., J.G.C., D.M. and J.C.C. generated precalculated PICRUSt gene content predictions. D.M. and J.G.C. added functionality to the BIOM software package and the Greengenes resource in support of PICRUSt. M.G.I.L., J.Z., J.G.C., D.M., D.K., J.C.C., J.A.R., R.K., R.G.B. and C.H. applied PICRUSt to control datasets and analyzed the benchmarking data. M.G.I.L., J.Z., J.G.C., D.M., R.K., R.G.B. and C.H. wrote the manuscript. D.E.B. and R.L.V.T. collected and analyzed coral-algal data. All authors edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Results and Supplementary Figures 1–17 (PDF 2107 kb)
Cite this article
Langille, M., Zaneveld, J., Caporaso, J. et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31, 814–821 (2013). https://doi.org/10.1038/nbt.2676