Papers by Derek Caetano-Anollés
Proceedings of the National Academy of Sciences, 2014
Certain complex phenotypes appear repeatedly across diverse species due to processes of evolutionary conservation and convergence. In some contexts like developmental body patterning, there is increased appreciation that common molecular mechanisms underlie common phenotypes; these molecular mechanisms include highly conserved genes and networks that may be modified by lineage-specific mutations. However, the existence of deeply conserved mechanisms for social behaviors has not yet been demonstrated. We used a comparative genomics approach to determine whether shared neuromolecular mechanisms could underlie behavioral response to territory intrusion across species spanning a broad phylogenetic range: house mouse (Mus musculus), stickleback fish (Gasterosteus aculeatus), and honey bee (Apis mellifera). Territory intrusion modulated similar brain functional processes in each species, including those associated with hormone-mediated signal transduction and neurodevelopment. Changes in chromosome organization and energy metabolism appear to be core, conserved processes involved in the response to territory intrusion. We also found that several homologous transcription factors that are typically associated with neural development were modulated across all three species, suggesting that shared neuronal effects may involve transcriptional cascades of evolutionarily conserved genes. Furthermore, immunohistochemical analyses of a subset of these transcription factors in mouse again implicated modulation of energy metabolism in the behavioral response. These results provide support for conserved genetic "toolkits" that are used in independent evolutions of the response to social challenge in diverse taxa.
Brenner's Encyclopedia of Genetics, 2013
Brenner’s Encyclopedia of Genetics 2nd Edition, 2013
The origin and evolution of modern biochemistry remain a mystery despite advances in evolutionary bioinformatics. Here, we use a structural census in nearly 1,000 genomes and a molecular clock of folds to define a timeline of appearance of protein families linked to single-domain enzymes. The timeline sorts out enzymatic recruitment, validates patterns in metabolic history, and reveals that the most ancient reaction of aerobic metabolism involved the synthesis of pyridoxal 5′-phosphate or pyridoxal and appeared 2.9 Gyr ago. The oxygen source for this primordial reaction was probably Mn catalase, which appeared at the same time and could have generated oxygen as a side product of hydrogen peroxide detoxification. Finally, evolutionary analysis of transferred groups and metabolite fragments revealed that oxidized sulfur did not participate in metabolism until the rise of oxygen. The evolutionary patterns we uncover in molecules and chemistries provide strong support for the coevolution of biochemistry and geochemistry.
Frontiers in bioscience : a journal and virtual library, 2008
The survey of components in living systems at different levels of organization enables an evolutionary exploration of patterns and processes in macromolecules, networks, and genomic repertoires. Here we discuss how phylogenetic strategies that generate intrinsically rooted phylogenies impact the evolutionary study of RNA and protein components of the macromolecular machinery that is responsible for biological function. We used these methods to generate timelines of discovery of components in systems, such as substructures in RNA molecules, architectures in proteomes, domains in multi-domain proteins, enzymes in metabolic networks, and protein architectures in proteomes. These timelines unfolded remarkable patterns of origin and evolution of molecules, repertoires and networks, showing episodes of both functional specialization (e.g., rise of domains with specialized functions) and molecular simplification (e.g., reductive tendencies in molecules and proteomes). These observations ha...
Genome …, 2007
The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies ...
PLoS ONE, 2013
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the 'operational' RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure.
Genome research, 2003
Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted ...
Frontiers in Genetics, 2015
Journal of molecular evolution, 2005
1 Department of Crop Sciences, 332 NSRC, 1101 West Peabody Drive, University of Illinois, Urbana, IL 61801, USA 2 Vital NRG, Knoxville, TN 37919, USA ... Received: 16 June 2004 / Accepted: 11 October 2004 [Reviewing Editor: Dr. David Pollock]
Time-calibrated phylogenomic trees of protein domain structure produce powerful chronologies describing the evolution of biochemistry and life. These timetrees are built from a genomic census of millions of encoded proteins using models of nested accumulation of molecules in evolving proteomes. Here we show that a primordial stem line of descent, a propagating series of pluripotent cellular entities, populates the deeper branches of the timetrees. The stem line produced for the first time cellular grades ∼2.9 billion years (Gy)-ago, which slowly turned into lineages of superkingdom Archaea. Prompted by the rise of planetary oxygen and aerobic metabolism, the stem line also produced bacterial and eukaryal lineages. Superkingdom-specific domain repertoires emerged ∼2.1 Gy-ago delimiting fully diversified Bacteria. Repertoires specific to Eukarya and Archaea appeared 300 million years later. Results reconcile reductive evolutionary processes leading to the early emergence of Archaea to superkingdom-specific innovations compatible with a tree of life rooted in Bacteria.
… International Journal of …, 2009
One fundamental goal of current research is to understand how complex biomolecular networks took the form that we observe today. Cellular metabolism is probably one of the most ancient biological networks and constitutes a good model system for the study of network ...
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
Archaea, 2014
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic acid and protein sequences, which partially fulfill only two of the three main steps of phylogenetic analysis, formulation of realistic evolutionary models, and optimization of tree reconstruction. In the absence of character polarization, that is, the ability to identify ancestral and derived character states, any statement about the rooting of the tree of life should be considered suspect. Here we show that macromolecular structure and a new phylogenetic framework of analysis that focuses on the parts of biological systems instead of the whole provide both deep and reliable phylogenetic signal and enable us to put forth hypotheses of origin. We review over a decade of phylogenomic studies, which mine information in a genomic census of millions of encoded proteins and RNAs. We show how the use of process models of molecular accumulation that comply with Weston's generality criterion supports a consistent phylogenomic scenario in which the origin of diversified life can be traced back to the early history of Archaea.
The intricate molecular and cellular structure of organisms converts energy to work, which builds and maintains structure. Evolving structure implements modules, in which parts are tightly linked. Each module performs characteristic functions. In this work we propose that a module can emerge through two phases of diversification of parts. Early in the first phase of this biphasic pattern, the parts have weak linkage: they interact weakly and associate variously. The parts diversify and compete. Under selection for performance, interactions among the parts increasingly constrain their structure and associations. As many variants are eliminated, parts self-organize into modules with tight linkage. Linkage may increase in response to exogenous stresses as well as endogenous processes. In the second phase of diversification, variants of the module and its functions evolve and become new parts for a new cycle of generation of higher-level modules. This linkage hypothesis can interpret biphasic patterns in the diversification of protein domain structure, RNA and protein shapes, and networks in metabolism, codes, and embryos, and can explain hierarchical levels of structural organization that are widespread in biology.
Journal of Molecular Evolution, 2011
The origin of life has puzzled molecular scientists for over half a century. Yet fundamental questions remain unanswered, including which came first, the metabolic machinery or the encoding nucleic acids. In this study we take a protein-centric view and explore the ancestral origins of proteins. Protein domain structures in proteomes are highly conserved and embody molecular functions and interactions that