Showing posts with label Selection. Show all posts

Tuesday, 25 January 2022

Announcing ProbGen22 in Oxford 28-30 March

The organizing committee is pleased to announce the 7th Probabilistic Modeling in Genomics Conference (ProbGen22), to be held at the Blavatnik School of Government and Somerville College, Oxford, from 28th-30th March 2022.

The meeting will be a hybrid in-person and online event. Talk sessions will feature live speakers, both in-person and online, and will take place during the afternoons (making live attendance feasible for US timezones). Talks will be recorded and made available to registrants for a period of one month. Poster sessions will be held online during the evenings.

The conference will cover probabilistic models, algorithms, and statistical methods across a broad range of applications in genetics and genomics. We invite abstract submissions on a range of topics including population genetics, natural selection, quantitative genetics, methods for GWAS, applications to cancer and other diseases, causal inference in genetic studies, functional genomics, assembly and variant identification, phylogenetics, single cell 'omics, deep learning in genomics and pathogen genomics.

The registration deadline is 28th February 2022.

For more details visit the conference website. 

Friday, 20 March 2020

New paper: GenomegaMap for dN/dS in over 10,000 genomes

Published this week in Molecular Biology and Evolution is a new paper, joint with the CRyPTIC Consortium: "GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes".

The dN/dS ratio is a popular statistic in evolutionary genetics that quantifies the relative rates of protein-altering and non-protein-altering mutations. The rate is adjusted so that under neutral evolution - i.e. when the survival and reproductive advantage of all variants is the same - it equals 1. Typically, dN/dS is observed to be less than 1 meaning that new mutations tend to be disfavoured, implying they are harmful to survival or reproduction. Occasionally, dN/dS is observed to be greater than 1 meaning that new mutations are favoured, implying they provide some survival or reproductive advantage. The aim of estimating dN/dS is usually to identify mutations that provide an advantage.
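
As a rough illustration of the adjustment described above, the following Python sketch computes a crude counting-based dN/dS. It is not the genomegaMap method: the function name and the example counts are hypothetical, and the calculation ignores corrections for multiple changes at the same site. The point is only that each count of changes is divided by the number of sites at which such a change could occur, so that equal per-site rates give a ratio of 1.

```python
def crude_dn_ds(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Crude dN/dS from counts (hypothetical helper, no multiple-hit correction).

    nonsyn_changes / syn_changes: observed protein-altering / silent changes.
    nonsyn_sites / syn_sites: number of sites where each kind of change can occur.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_changes == 0:
        raise ValueError("dN/dS is undefined when there are no synonymous changes")
    dn = nonsyn_changes / nonsyn_sites  # protein-altering changes per site
    ds = syn_changes / syn_sites        # silent changes per site
    return dn / ds


# Hypothetical gene with 750 nonsynonymous and 250 synonymous sites.
constrained = crude_dn_ds(30, 20, 750, 250)  # fewer protein-altering changes per site
neutral = crude_dn_ds(60, 20, 750, 250)      # equal per-site rates
```

In this made-up example the first gene has half as many protein-altering changes per site as silent ones, so its ratio falls below 1, while the second has equal per-site rates and a ratio of 1.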

Theoreticians are often critical of dN/dS because it is more of a descriptive statistic than a process-driven model of evolution. This overlooks the problem that currently available models make simplifying assumptions such as minimal interference between adjacent mutations within genes. These assumptions are not obviously appropriate in many species, including infectious micro-organisms, that exchange genetic material infrequently.

There are many methods for measuring dN/dS. This new paper overcomes two common problems:
  • It is fast no matter how many genomes are analysed together.
  • It is robust whether there is frequent genetic exchange (which causes phylogenetic methods to report spurious signals of advantageous mutation) or infrequent genetic exchange.
The paper includes detailed simulations that establish the validity of the approach, and it goes on to demonstrate how genomegaMap can detect advantageous mutations in 10,209 genomes of Mycobacterium tuberculosis, the bacterium that causes tuberculosis. The method reproduces known signals of advantageous mutations that make the bacteria resistant to antibiotics, and it discovers a new signal of advantageous mutations in a cold-shock protein called deaD or csdA.

Software that implements genomegaMap is available on Docker Hub and the source code and documentation are available on GitHub.

With the steady rise of more and more genome sequences, the analysis of data becomes an increasing challenge even with modern computers, so it is hoped that this new method provides a useful way to exploit the opportunities in such large datasets to gain new insights into evolution.

Thursday, 16 August 2012

gammaMap available for download

The software gammaMap - which implements the analyses developed in Wilson, Hernandez, Andolfatto and Przeworski (2011) PLoS Genetics 7: e1002395 - is available for download. It is provided as part of a flexible program called GCAT (general computational analysis tool), which is designed to make it quick to implement novel variations on the standard analyses. GCAT has its own Google Code page, http://code.google.com/p/gcat-project. GCAT resembles BEAST and BUGS in that a statistical model is specified (using XML) and parameters are then estimated using MCMC or maximum likelihood. Future extensions to GCAT are planned that implement new fast approximations to gammaMap and omegaMap, and parallel processing, allowing the analyses to be scaled more readily to whole genomes.

Friday, 2 December 2011

New method inferring natural selection published today

I am pleased to report that my new paper "A population genetics-phylogenetics approach to inferring natural selection" is published today in PLoS Genetics. This is the culmination of two years' work at the University of Chicago with Molly Przeworski, plus a good deal of follow-up since I moved to Oxford. In the paper we introduce a new way of combining population genetics and phylogenetics models of natural selection, and a statistical method (gammaMap) for estimating parameters under the model. From a collection of sequences within one or more species - in the paper, we use 100 X-linked coding sequences that Peter Andolfatto produced in Drosophila melanogaster and D. simulans - the method allows you to estimate the distribution of fitness effects within each lineage, and localize the signal of selection using a Bayesian sliding window approach. Using Ryan Hernandez's simulator SFSCODE we tested the method for robustness to demographic change and linkage disequilibrium, and we investigated the effect that common assumptions concerning spatial variation in selection coefficients (sitewise, genewise and sliding window approaches) have on inference of selection. During the winter break I will work on compiling the program for different platforms and writing the documentation, with a view to releasing the software early in the New Year. Subscribe to this blog for updates or - if you are too impatient to wait - send me an email.

Saturday, 10 July 2010

What are the conditions for multiple foci of adaptation?

Selection on standing variation, soft sweeps, parallel adaptation: these alternatives to the population genetics paradigm of the S-shaped selective sweep have in common the idea that the response of a species to a change in selection pressure may frequently involve multiple mutations, which may arise in multiple locales, and which may appear at different sites in the genome. Consequently, the footprint of selection in the genome is different to that expected under a single selective sweep and therefore likely to be missed by scans of the genome looking for selection.

Many examples of parallel adaptation have been put forward, for instance multiple drug resistance in the malaria parasite Plasmodium vivax. But how plausible is parallel adaptation as an evolutionary mechanism, and what are the conditions that make it likely? These questions were addressed by Graham Coop presenting joint work with his postdoc Peter Ralph in one of the stand-out talks of the SMBE conference in Lyon.

Their key finding is that the multifarious parameters that go into building a spatial model of adaptation (strength of selection, the mutation rate, population density, average dispersal distance of offspring) can be distilled down to a single key quantity: the characteristic length given by the equation

[equation image not available]

When the geographical extent of the species range exceeds this characteristic length, the conditions are right for parallel adaptation. Graham's talk made accessible the complex mathematics behind this result. He has kindly made the slides available (click here) and the paper is now available at the Genetics website (click here).

Thursday, 8 July 2010

Discovering the distribution of fitness effects

At this year's Society for Molecular Biology and Evolution meeting in Lyon I presented ongoing work estimating the distribution of fitness effects, which is a collaborative venture with Molly Przeworski and Peter Andolfatto. Earlier versions of this research appeared in talks I presented at Chicago in December (Ecology and Evolution Departmental seminar) and Liverpool in January (UK Population Genetics Group meeting), and it follows on from last year's SMBE presentation in which I discussed methods to tease out sub-genic variation in selection pressure.

There is intrinsic interest in the fitness effects of novel mutations in coding regions of the genome, especially the relative frequency of occurrence of neutral, beneficial and deleterious variants. Yet estimating the distribution of fitness effects (the DFE) is also of practical use when localizing the signal of adaptive evolution. The reason is that in Bayesian analyses, the assumed DFE can influence the strength of evidence for or against adaptation at a particular site. Consequently it is preferable to estimate the DFE at the same time as detecting adaptation at individual sites, to avoid prior assumptions unduly influencing the results.
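
To make the point about prior assumptions concrete, here is a toy Bayes calculation in Python. It is only a sketch, not the model used in this work: the function name is hypothetical, the DFE is collapsed into a single prior probability that a site is adaptive, and the numbers are invented for illustration.

```python
def posterior_adaptive(prior_adaptive, likelihood_ratio):
    """Posterior probability that a site is adaptive (toy two-hypothesis model).

    prior_adaptive: prior probability of adaptation implied by the assumed DFE.
    likelihood_ratio: P(data | adaptive) / P(data | not adaptive) at the site.
    """
    if not 0.0 < prior_adaptive < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    numerator = prior_adaptive * likelihood_ratio
    return numerator / (numerator + (1.0 - prior_adaptive))


# Same data (likelihood ratio of 5), two different assumed priors.
sceptical = posterior_adaptive(0.01, 5.0)  # DFE that makes adaptation rare a priori
generous = posterior_adaptive(0.20, 5.0)   # DFE that makes adaptation common a priori
```

With identical data, the posterior is noticeably higher under the second prior than under the first, which is why fixing the DFE in advance can drive the conclusion at an individual site.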

Once estimated, the DFE is useful for quantifying the relative contribution of adaptation versus drift to genome evolution. The figure, taken from my talk in Lyon (slides here), illustrates the idea when a normal distribution is used to estimate the DFE; the relative area of the green to the yellow shaded regions represents the respective contribution of adaptation versus drift in amino acid substitutions accrued along the Drosophila melanogaster lineage.

Thursday, 18 June 2009

SMBE Iowa City

I spent the beginning of the month at the SMBE (Society for Molecular Biology and Evolution) conference in Iowa City. It was a good chance to catch up with people and find out what research is going on in the field, as well as to speak with collaborators about on-going projects. One of those is Peter Andolfatto, who works on genome evolution in Drosophila species. Molly and I are collaborating with Peter on a project to detect natural selection within and between Drosophila species. The main idea is to improve inference by taking into account variation in selection pressure throughout the gene. Our method draws on the advantages of a number of current approaches, such as Rasmus Nielsen and Ziheng Yang's codeml package (part of PAML), Carlos Bustamante's MKPRF (McDonald-Kreitman Poisson Random Field) model, and omegaMap, the program Gil McVean and I developed, in that it exploits patterns of polymorphism within and between species, while allowing for conservation and adaptation within the same gene. You can view the slides of my SMBE talk here, which was titled "Adaptive events in hominid (and Drosophila) evolution".

Monday, 27 April 2009

omegaMap at BioHPC

All evolutionary biologists wishing to make use of omegaMap now have access to a high performance parallel computing cluster via the internet courtesy of Cornell's CBSU and Microsoft. The software, which allows the detection of selection and recombination in DNA or RNA sequences, can be run via the web interface at cbsuapps.tc.cornell.edu/omegamap.aspx, or downloaded as part of the BioHPC suite.

The web interface consists of a simple form where users can upload their configuration file and sequences in FASTA format. Users are notified by e-mail when their jobs complete. To learn more about the project visit the CBSU home page.

Meanwhile, I am working on several major updates to omegaMap, the most interesting of which will probably be the development of a new model that allows for the joint analysis of natural selection acting on sequences from different populations or species. The aim is to integrate population genetic and phylogenetic models of selection in order to exploit the signal of selection contained both in polymorphism within populations (or species) and divergence between them. I will be presenting progress on this work, in the context of hominid evolution, at the 2009 SMBE meeting in Iowa City this June.

Saturday, 3 January 2009

Human Evolution in New York City

Rounding off a hectic end to 2008 was a trip to visit Molly, currently on sabbatical in New York City. Joanna and I flew out to spend the final weekend before Christmas discussing projects and frequenting the local coffee shops, restaurants and bars. I took the opportunity to visit the American Museum of Natural History adjacent to Central Park after reading about its dinosaur collections in The Catcher in the Rye; pictured is an Allosaurus skeleton, which stands in the main entrance hall. Of particular interest was the Spitzer Hall of Human Origins, which features a wealth of fossil remains and artefacts including a cast of the Laetoli footprints and a diorama of an Australopithecus afarensis nuclear family. Fittingly, the very focus of the New York trip was to discuss the on-going project to characterize natural selection between hominid species.

Thursday, 30 October 2008

Inferring niche membership from genetic diversity

Each Wednesday the Ecology and Evolution department run a journal club called Noon Illumination, and this week I volunteered to lead discussion on a recent article titled Resource Partitioning and Sympatric Differentiation Among Closely Related Bacterioplankton (Science 320: 1081-5), by Dana Hunt and colleagues based at MIT and Ghent. I originally prepared the presentation for a Bacterial Metagenomics workshop in Berlin this July, organized by Daniel Falush.

Of central interest in the paper is a novel methodology that infers habitat/niche based on ecological variables and DNA sequencing in the family of marine bacteria Vibrionaceae. That places it in the wider context of methods that attempt to predict phenotype (in this case niche) from genotype. Their approach is an elegant extension of familiar phylogenetic methods to model habitat switching over evolutionary time. Based on arguments put forward by Christophe Fraser and colleagues, the paper reasons that the ancestral habitat switches they detect are likely to be adaptive because the rate of recombination eclipses the mutation rate sufficiently to preclude the possibility of neutral genetic clustering.

However, the high rate of recombination raises some difficulties of interpretation. The principal phylogenetic reconstruction was based on the hsp60 gene, but by sequencing other housekeeping genes, Hunt and colleagues found that in some cases, recombination between genes caused an artefactual habitat switch in the hsp60 ancestry that was not evident in the other genes. Using a permutation test, I found evidence for recombination within the vibrio hsp60 genes, which may confound the phylogenetic reconstruction of evolutionary relationships (Schierup and Hein 2000). On a more philosophical note, suppose you could directly observe ancestral habitat switches. Would that be strong evidence for adaptation? An association between habitat and genetic lineage is probably not sufficient to demonstrate the action of natural selection. On the other hand, frequent recombination could empower genome-wide scans for extreme association between genes and habitats, which would provide stronger support for adaptation.
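
For readers unfamiliar with permutation tests for recombination, the Python sketch below shows one common form of the idea. It is an assumption that it resembles the test mentioned above, which is not described in detail here. With recombination, linkage disequilibrium (measured here by r^2) tends to decay with distance between sites, so the correlation between r^2 and distance is compared with its distribution when site positions are shuffled. All function names and the toy data are hypothetical, and every site is assumed to be biallelic and polymorphic.

```python
import random
from itertools import combinations


def r_squared(a, b):
    """LD measure r^2 between two polymorphic biallelic sites coded as 0/1 lists."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def ld_distance_permutation_test(sites, positions, n_perm=999, seed=0):
    """Return (observed correlation, one-sided p-value) for LD decay with distance."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(sites)), 2))
    ld = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def correlation(pos):
        dist = [abs(pos[i] - pos[j]) for i, j in pairs]
        return pearson(dist, ld)

    observed = correlation(positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between position and LD
        if correlation(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: each inner list is one site across eight sequences.
toy_sites = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
toy_positions = [0, 1, 100, 101]
obs_corr, p_value = ld_distance_permutation_test(toy_sites, toy_positions)
```

A strongly negative observed correlation with a small p-value would suggest recombination. With only four sites, as in this toy example, there are too few distinct permutations for the p-value to be small, so real analyses need many segregating sites.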

You can view a PDF of the presentation of this stimulating article in our journal club here.

Tuesday, 15 July 2008

Exchange with the Institut Pasteur

My position at the University of Chicago is funded by a grant awarded to Molly by the National Institutes of Health (NIH) to detect the signature that natural selection has left on the human genome. One of our collaborations is with the human genetics lab of Lluis Quintana-Murci at the Institut Pasteur in Paris, and I'm making the first of a number of exchange visits to strengthen the ties between the two labs.

Lluis' group are interested in the selection pressure that pathogens have exerted on the human genome, and in particular a family of genes involved in the immune system known as toll-like receptors. The idea is that together we develop methods to detect and quantify selection in these genes by comparison to neutral regions in individuals from around the world. Among the people involved in the project is Luis Barreiro, a post-doc who has just arrived in the Human Genetics department at Chicago from the Institut Pasteur.

In Paris I've been participating in group meetings, offering my opinion on manuscripts coming out of Lluis' lab, and discussing with lab members how we might analyze the DNA sequence data they're producing. I've also been acquainting myself with the produce of Château des Ravatys (the Institut Pasteur vineyard) and celebrating the 14 Juillet (photo).