  • Perspective

A probabilistic view of gene function

Abstract

Cells are controlled by the complex and dynamic actions of thousands of genes. With the sequencing of many genomes, the key problem has shifted from identifying genes to knowing what the genes do; we need a framework for expressing that knowledge. Even the most rigorous attempts to construct ontological frameworks describing gene function (e.g., the Gene Ontology project) ultimately rely on manual curation and are thus labor-intensive and subjective. But an alternative exists: the field of functional genomics is piecing together networks of gene interactions, and although these data are currently incomplete and error-prone, they provide a glimpse of a new, probabilistic view of gene function. We outline such a framework, which revolves around a statistical description of gene interactions derived from large, systematically compiled data sets. In this probabilistic view, pleiotropy is implicit, all data have errors and the definition of gene function is an iterative process that ultimately converges on the correct functions. The relationships between the genes are defined by the data, not by hand. Even this comprehensive view fails to capture key aspects of gene function, not least their dynamics in time and space, showing that there are limitations to the model that must ultimately be addressed.

Figure 1: Comparison of top-down and bottom-up methods of generating a framework for describing gene function.
Figure 2: Integration of diverse data sets into a probabilistic network.
Figure 3: Networks of interactions between S. cerevisiae genes.
Figure 4: C. elegans gene networks derived from S. cerevisiae networks are good predictors of gene function.

Cite this article

Fraser, A., Marcotte, E. A probabilistic view of gene function. Nat Genet 36, 559–564 (2004). https://doi.org/10.1038/ng1370
