Abstract
Cells are controlled by the complex and dynamic actions of thousands of genes. With the sequencing of many genomes, the key problem has shifted from identifying genes to knowing what the genes do; we need a framework for expressing that knowledge. Even the most rigorous attempts to construct ontological frameworks describing gene function (e.g., the Gene Ontology project) ultimately rely on manual curation and are thus labor-intensive and subjective. But an alternative exists: the field of functional genomics is piecing together networks of gene interactions, and although these data are currently incomplete and error-prone, they provide a glimpse of a new, probabilistic view of gene function. We outline such a framework, which revolves around a statistical description of gene interactions derived from large, systematically compiled data sets. In this probabilistic view, pleiotropy is implicit, all data have errors and the definition of gene function is an iterative process that ultimately converges on the correct functions. The relationships between the genes are defined by the data, not by hand. Even this comprehensive view fails to capture key aspects of gene function, not least their dynamics in time and space, showing that there are limitations to the model that must ultimately be addressed.
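One way to picture the statistical description of gene interactions outlined above is as a network in which every gene pair carries a probability of being functionally linked, computed from whatever error-prone evidence supports it. The following is a minimal sketch of that idea, not the authors' method: it assumes a naive Bayes combination of independent evidence sources, and the data set names, likelihood ratios, prior odds and gene names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical likelihood ratios (made-up values): how much more often each
# kind of evidence is observed for truly linked gene pairs than for unlinked ones.
LIKELIHOOD_RATIOS = {
    "two_hybrid": 4.0,
    "coexpression": 2.5,
    "phylogenetic_profile": 3.0,
}

# Assumed prior odds that an arbitrary gene pair is functionally linked.
PRIOR_ODDS = 0.01


def link_probability(evidence, prior_odds=PRIOR_ODDS, ratios=LIKELIHOOD_RATIOS):
    """Posterior probability of a functional link given the observed evidence.

    Evidence sources are treated as independent (naive Bayes), which is an
    illustrative simplification rather than a claim about real data sets.
    """
    log_odds = math.log(prior_odds)
    for source in evidence:
        log_odds += math.log(ratios[source])
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def build_network(observations):
    """Map each unordered gene pair to its link probability."""
    return {
        frozenset(pair): link_probability(sources)
        for pair, sources in observations.items()
    }


# Hypothetical observations: which data sets support each gene pair.
observations = {
    ("geneA", "geneB"): ["two_hybrid", "coexpression", "phylogenetic_profile"],
    ("geneA", "geneC"): ["coexpression"],
}
network = build_network(observations)

for pair, probability in network.items():
    print("-".join(sorted(pair)), round(probability, 3))
```

In this sketch geneA takes part in two links of different strength, so a gene with several roles needs no special treatment, and no single observation is taken as certain: each one only shifts the odds. Re-estimating the likelihood ratios as new data sets arrive would be one way to make the process iterative.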
Cite this article
Fraser, A., Marcotte, E. A probabilistic view of gene function. Nat Genet 36, 559–564 (2004). https://doi.org/10.1038/ng1370
DOI: https://doi.org/10.1038/ng1370
This article is cited by
- BIONIC: biological network integration using convolutions. Nature Methods (2022)
- BIONIC: discovering new biology through deep learning-based network integration. Nature Methods (2022)
- Key mendelian variants. Nature Genetics (2015)
- A gene ontology inferred from molecular networks. Nature Biotechnology (2013)
- Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinformatics (2011)