Showing posts with label Phylogenetics.

Friday, 21 August 2020

Teaching: Online lectures and practical on Phylogenetics in Practice

On March 16th, we were in the interesting position of running an infectious disease course at the Big Data Institute on the day the national lockdown was announced in response to the COVID-19 pandemic. As a result, we were among the first in the university to teach remotely, something Katrina Lythgoe and the rest of us had already prepared for a week earlier, when we had anticipated a lockdown that did not happen at the time.

These are the two online lectures, titled Phylogenetics in Practice, that I gave for the Health Data Sciences CDT.


The online practical, which applies phylogenetic approaches to understand the Zika virus epidemic, is implemented as a Docker container and is available here.

Tuesday, 31 March 2015

ClonalFrameML: accounting for recombination in bacterial phylogenies

Horizontal gene transfer in bacteria, mediated by transformation, transduction or conjugation, can result in the gain, loss and replacement of genes. When horizontally transferred DNA replaces a homologous gene or gene fragment in the recipient, a process known as homologous recombination, the effects on bacterial phylogenetics (the study of relatedness between bacteria) can be far-reaching. A new method published last month in PLoS Computational Biology by Xavier Didelot and me corrects for these distorting effects of homologous recombination on bacterial phylogenies.

Two forms of phylogenetic distortion are caused by recombination. The first affects the shape of the tree topology. Although this is a potentially serious difficulty, Jessica Hedge and I recently showed that phylogenies estimated from whole bacterial genomes are surprisingly robust to this problem. The second affects the lengths of the branches. When genetic material is replaced by a homologous but distantly related sequence, it gives the appearance of a cluster of substitutions in the genome, and this can exaggerate branch lengths. ClonalFrameML detects these clusters of substitutions, identifies them as recombination events, and corrects the branch lengths of the tree.
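To illustrate how a cluster of substitutions can be picked out, here is a minimal sketch of a two-state hidden Markov model run along a single branch and decoded with the Viterbi algorithm. It is a simplified illustration in the spirit of the ClonalFrame model, not ClonalFrameML's actual code. The input is assumed to be a 0/1 track marking which sites changed on the branch, and the function names and parameters (`p_clonal`, `p_import`, `p_start`, `p_end`) are hypothetical.

```python
import math
from typing import List


def viterbi_imports(subs: List[int], p_clonal: float, p_import: float,
                    p_start: float, p_end: float) -> List[int]:
    """Most likely hidden path (0 = clonal, 1 = imported) for a 0/1 substitution track.

    Hypothetical parameters: p_clonal and p_import are per-site substitution
    probabilities in each state; p_start and p_end are per-site probabilities
    of entering and leaving an imported segment.
    """
    emit = [[math.log(1 - p_clonal), math.log(p_clonal)],
            [math.log(1 - p_import), math.log(p_import)]]
    trans = [[math.log(1 - p_start), math.log(p_start)],
             [math.log(p_end), math.log(1 - p_end)]]
    # Initial state probabilities: usually clonal, occasionally already imported.
    score = [math.log(1 - p_start) + emit[0][subs[0]],
             math.log(p_start) + emit[1][subs[0]]]
    backpointers = []
    for x in subs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best previous state from which to move into state s.
            prev = max((0, 1), key=lambda r: score[r] + trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + trans[prev][s] + emit[s][x])
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def corrected_branch_length(subs: List[int], path: List[int]) -> float:
    """Substitutions per site, counting only sites decoded as clonal."""
    clonal = [x for x, state in zip(subs, path) if state == 0]
    return sum(clonal) / len(clonal) if clonal else float("nan")
```

A dense run of substitutions is decoded as an import and excluded, so the branch length is estimated from the remaining sites only, while an isolated substitution stays in the clonal state because opening and closing an import costs more than explaining one change by mutation.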

Correcting for recombination is important in a variety of settings. In transmission studies, recent transmission between patients can be detected by comparing the genomes of the infecting bacteria. As we show in the paper, ClonalFrameML improves detection of transmission events by accounting for the tendency of recombination to elevate the evolutionary distance between genomes. We also report the discovery of a remarkably large chromosomal replacement event spanning 310 kilobases that may have led to the evolution of the ST582 strain of Staphylococcus aureus, underlining the importance of recombination over short and long timescales.
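As a simplified illustration of why this matters for transmission studies, the sketch below counts SNP differences between two aligned genomes while skipping sites flagged as recombinant, then compares the count with a SNP threshold. Masking sites is a cruder device than ClonalFrameML's correction of branch lengths on the tree; the mask, the threshold and the function names are all hypothetical.

```python
from typing import Set


def masked_snp_distance(seq_a: str, seq_b: str, masked: Set[int]) -> int:
    """Count A/C/G/T differences between aligned sequences, ignoring masked sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in masked and a != b and a in "ACGT" and b in "ACGT"
    )


def consistent_with_transmission(seq_a: str, seq_b: str, masked: Set[int],
                                 threshold: int) -> bool:
    """Hypothetical rule: call a possible link if the masked distance is within the threshold."""
    return masked_snp_distance(seq_a, seq_b, masked) <= threshold
```

For example, the toy genomes "ACGTACGTAC" and "ACGAACCTTC" differ at three sites (positions 3, 6 and 8). If positions 6 and 8 lie in a recombinant segment, the masked distance is 1, so with a threshold of 2 the pair would be considered consistent with transmission even though the raw distance exceeds the threshold.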

ClonalFrameML is a much faster implementation of ClonalFrame, the popular method developed by Xavier and Daniel Falush. It is based on the same underlying assumptions and the same explicit evolutionary model, so it provides interpretable estimates of rates of recombination, the length of DNA imported by recombination, and the relative impact of recombination versus mutation. However, it can analyse thousands of whole bacterial genomes in a matter of hours, a substantial improvement over the earlier method.
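These parameters combine into the relative impact of recombination versus mutation. Assuming the usual ClonalFrame-style relationship r/m = (R/θ) × δ × ν, where R/θ is the ratio of recombination to mutation rates, δ the mean length of imported DNA and ν the mean divergence of imported DNA, a minimal sketch is below. The function name and example values are hypothetical.

```python
def r_over_m(rho_over_theta: float, mean_import_length: float,
             import_divergence: float) -> float:
    """Relative impact of recombination vs mutation: r/m = (R/theta) * delta * nu.

    rho_over_theta: ratio of recombination to mutation rates (R/theta)
    mean_import_length: mean length of imported DNA in bases (delta)
    import_divergence: mean divergence of imported DNA per site (nu)
    """
    if min(rho_over_theta, mean_import_length, import_divergence) < 0:
        raise ValueError("parameters must be non-negative")
    return rho_over_theta * mean_import_length * import_divergence
```

With hypothetical values R/θ = 0.5, δ = 100 bp and ν = 0.05, r/m is about 2.5: recombination would have introduced roughly 2.5 substitutions for every one introduced by point mutation, even though recombination events are rarer than mutations.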

Friday, 28 November 2014

New paper: bacterial phylogenetic inference is robust to recombination but demographic inference is not

Published this week in mBio, Jessica Hedge's new paper "Bacterial phylogenetic inference is robust to recombination but demographic inference is not" looks at a long-standing problem: why are phylogenetic trees so popular in bacterial genomics when everyone knows recombination (which is detectable in most species studied) leads to seriously misleading inference? A burst of research activity in the early 2000s showed that homologous recombination - which can result from various forms of horizontal gene transfer in bacteria - can distort phylogenetic trees and lead to false inference of positive selection and demographic growth in methods that rely on them.

In the intervening years there has been intense research in population genetics into approaches that account for recombination, although the practically useful methods rely on approximations because of the inherent difficulty of learning about the complex, reticulated evolutionary networks that recombination generates. This has led many of my population genetics colleagues to regard - at least privately - the use of phylogenetic trees in recombining species as "bust", and the conclusions drawn from such studies as questionable. In this paper we show that this view is too simple.

FIG 1 

Friday, 2 December 2011

New method inferring natural selection published today

I am pleased to report that my new paper "A population genetics-phylogenetics approach to inferring natural selection" is published today in PLoS Genetics. This is the culmination of two years' work at the University of Chicago with Molly Przeworski, plus a good deal of follow-up since I moved to Oxford. In the paper we introduce a new way of combining population genetics and phylogenetics models of natural selection, and a statistical method (gammaMap) for estimating parameters under the model. From a collection of sequences within one or more species - in the paper, we use 100 X-linked coding sequences that Peter Andolfatto produced in Drosophila melanogaster and D. simulans - the method allows you to estimate the distribution of fitness effects within each lineage, and to localize the signal of selection using a Bayesian sliding-window approach. Using Ryan Hernandez's simulator SFSCODE, we tested the method for robustness to demographic change and linkage disequilibrium, and we investigated how common assumptions about spatial variation in selection coefficients (sitewise, genewise and sliding-window approaches) affect inference of selection.

During the winter break I will work on compiling the program for different platforms and writing the documentation, with a view to releasing the software early in the New Year. Subscribe to this blog for updates or - if you are too impatient to wait - send me an email.
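What links the population genetics and phylogenetics sides of such a model is that the same selection coefficient shapes both polymorphism and the rate of fixation. As an illustration only, the sketch below uses the standard Kimura / Sawyer-Hartl approximation for the fixation rate of a new mutation relative to a neutral one, γ/(1 - exp(-γ)) with γ a population-scaled selection coefficient, and averages it over a discrete distribution of fitness effects. gammaMap's own parameterization may differ (for example by a factor of 2 in γ), and the function names are hypothetical.

```python
import math
from typing import Dict


def relative_fixation_rate(gamma: float) -> float:
    """Fixation rate relative to a neutral mutation: gamma / (1 - exp(-gamma)).

    Two algebraically equal forms are used so that large |gamma| never overflows.
    """
    if gamma == 0.0:
        return 1.0  # neutral limit
    if gamma > 0:
        return gamma / -math.expm1(-gamma)
    # Multiply top and bottom by exp(gamma): gamma * e^gamma / (e^gamma - 1).
    return gamma * math.exp(gamma) / math.expm1(gamma)


def expected_omega(dfe: Dict[float, float]) -> float:
    """Substitution-rate ratio implied by a discrete DFE {gamma: weight},
    treating synonymous changes as neutral with equal mutation rates."""
    total = sum(dfe.values())
    return sum(w * relative_fixation_rate(g) for g, w in dfe.items()) / total
```

Strongly deleterious mutations (large negative γ) contribute almost nothing to divergence, while beneficial ones (γ > 0) fix faster than neutral ones, so a lineage-specific distribution of γ translates directly into a lineage-specific substitution rate.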

Monday, 11 May 2009

Neolithic origin of Campylobacter jejuni

As part of a recent trip to the University of Edinburgh to visit Andrew Rambaut, I gave a talk about the evolution of the gut pathogen Campylobacter jejuni, based on work of mine published in the February issue of Molecular Biology and Evolution and subsequently recommended on the Faculty of 1000 website.

Part of the paper is concerned with the timescale of Campylobacter evolution: using longitudinal samples of C. jejuni DNA sequences, we attempted to calibrate the molecular clock in the way that is standard practice for viruses.
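The simplest version of calibrating a clock from longitudinal samples is a root-to-tip regression: genetic distance from the root is regressed on sampling date, the slope estimates the substitution rate, and the point where the fitted line reaches zero distance estimates the date of the root. This is not necessarily the analysis used in the paper. The sketch assumes root-to-tip distances have already been read off a rooted tree, and the function name and example data are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_rate(dates: Sequence[float],
                     distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope in substitutions per site per year,
    and the date at which the fitted line reaches zero distance.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two dated tips with distances")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("sampling dates must differ")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate
```

For instance, hypothetical tips sampled in 2000, 2004 and 2008 with root-to-tip distances 0.001, 0.003 and 0.005 give a rate of 0.0005 substitutions per site per year and a root date of 1998.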

We detected surprisingly rapid evolution - 1,000 times faster than traditional estimates - which would place the split of C. jejuni from its closest relative C. coli during the Neolithic revolution. Interestingly, the point estimate of 6,500 years ago for the split from C. coli - which preferentially infects swine - coincides with the spread of pig domestication in the Near East and Europe in the 4th millennium BC.

The date is controversial because the traditional dating method, which is based on bounding deep phylogenetic splits such as the common ancestor of mitochondria and bacteria, would place the divergence of C. jejuni and C. coli closer to 10 million years ago.
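The arithmetic behind the disagreement is simple. For a pairwise distance d (substitutions per site) accumulated independently along two lineages at rate r per lineage, the time since the split is t = d / (2r), so a rate 1,000 times faster shrinks the date 1,000-fold. The distance and rates below are made up purely for illustration; they are not the estimates from the paper.

```python
def divergence_time_years(distance: float, rate_per_year: float) -> float:
    """Years since a split, t = d / (2r), with d in substitutions/site and r per lineage."""
    if rate_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_year)


d = 0.1                                   # hypothetical pairwise distance
long_term_rate = 5e-9                     # hypothetical slow, deep-calibrated rate
short_term_rate = 1000 * long_term_rate   # hypothetical fast, longitudinal rate
print(divergence_time_years(d, long_term_rate))   # roughly 1e7 years
print(divergence_time_years(d, short_term_rate))  # roughly 1e4 years
```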

After the seminar I had an interesting discussion with Paul Sharp, who was in the audience. Prof Sharp is actively researching the causes of conflict between long-term and short-term estimates of the rate of evolution in viruses. As he points out, short-term rate estimates (usually based on longitudinally-sampled viral sequences) frequently suggest that evolution is occurring much more rapidly than long-term estimates (based on deeper calibration points, such as co-phylogeny of host and pathogen). This phenomenon, observed in HIV and hepatitis C among others, may be caused by overly simplistic models of sequence evolution.

So how plausible is it that a ubiquitous bacterial pathogen such as C. jejuni evolved as recently as the Neolithic, possibly in response to changes brought about by agriculture or animal husbandry? Longitudinal studies of Helicobacter pylori and Neisseria gonorrhoeae have obtained similarly rapid rates of bacterial evolution, and evidence is mounting that the Neolithic revolution played an important role in creating new niches for human, plant and animal pathogens. Perhaps the best prospect for resolving these questions will be studies of ancient DNA preserved from the period in question.

Monday, 27 April 2009

omegaMap at BioHPC

All evolutionary biologists wishing to make use of omegaMap now have access to a high performance parallel computing cluster via the internet courtesy of Cornell's CBSU and Microsoft. The software, which allows the detection of selection and recombination in DNA or RNA sequences, can be run via the web interface at cbsuapps.tc.cornell.edu/omegamap.aspx, or downloaded as part of the BioHPC suite.

The web interface consists of a simple form where users can upload their configuration file and sequences in FASTA format, and users are notified by e-mail when their jobs are complete. To learn more about the project visit the CBSU home page.

Meanwhile, I am working on several major updates to omegaMap, the most interesting of which will probably be the development of a new model that allows for the joint analysis of natural selection acting on sequences from different populations or species. The aim is to integrate population genetic and phylogenetic models of selection in order to exploit the signal of selection contained both in polymorphism within populations (or species) and divergence between them. I will be presenting progress on this work, in the context of hominid evolution, at the 2009 SMBE meeting in Iowa City this June.
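The classic way of combining the two signals is the McDonald-Kreitman comparison of nonsynonymous and synonymous changes within and between species. The sketch below computes the familiar estimate of the proportion of adaptive nonsynonymous substitutions, α = 1 - (Ds·Pn)/(Dn·Ps). It only illustrates why polymorphism and divergence together are informative; it is not the joint model described above, and the function name and counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within species
    """
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)
```

With hypothetical counts Dn = 10, Ds = 10, Pn = 5 and Ps = 10, α = 0.5: nonsynonymous changes are relatively more common between species than within them, and half the nonsynonymous divergence would be attributed to positive selection.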

Thursday, 30 October 2008

Inferring niche membership from genetic diversity

Each Wednesday the Ecology and Evolution department run a journal club called Noon Illumination, and this week I volunteered to lead discussion on a recent article titled Resource Partitioning and Sympatric Differentiation Among Closely Related Bacterioplankton (Science 320: 1081-5), by Dana Hunt and colleagues based at MIT and Ghent. I originally prepared the presentation for a Bacterial Metagenomics workshop in Berlin this July, organized by Daniel Falush.

Of central interest in the paper is a novel method that infers the habitat, or niche, of bacteria in the marine family Vibrionaceae from ecological variables and DNA sequences. That places it in the wider context of methods that attempt to predict phenotype (in this case niche) from genotype. Their approach is an elegant extension of familiar phylogenetic methods to model habitat switching over evolutionary time. Based on arguments put forward by Christophe Fraser and colleagues, the paper reasons that the ancestral habitat switches they detect are likely to be adaptive because the rate of recombination exceeds the mutation rate by enough to preclude neutral genetic clustering.
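To give a flavour of how a phylogenetic method can treat habitat as a trait that switches over evolutionary time, here is a minimal sketch that computes the likelihood of observed habitats on a fixed tree using Felsenstein's pruning algorithm, with a symmetric Markov model of habitat switching at rate q. This is a generic discrete-trait model, not the specific method of Hunt and colleagues, and the tree encoding, isolate names and habitat labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Union

# Hypothetical encoding: a tree is a leaf name (str) or a tuple of (subtree, branch_length) pairs.
Tree = Union[str, tuple]


def switch_prob(a: int, b: int, t: float, q: float, k: int) -> float:
    """P(habitat b at the end of a branch of length t | habitat a at its start).

    Symmetric model: each of the k - 1 other habitats is entered at rate q.
    """
    decay = math.exp(-k * q * t)
    if a == b:
        return 1.0 / k + (k - 1) / k * decay
    return (1.0 - decay) / k


def partial_likelihoods(node: Tree, habitats: Dict[str, str],
                        states: Sequence[str], q: float) -> List[float]:
    """Felsenstein pruning: P(habitats observed below node | node in each state)."""
    k = len(states)
    if isinstance(node, str):
        return [1.0 if s == habitats[node] else 0.0 for s in states]
    result = [1.0] * k
    for child, length in node:
        below = partial_likelihoods(child, habitats, states, q)
        for a in range(k):
            result[a] *= sum(switch_prob(a, b, length, q, k) * below[b]
                             for b in range(k))
    return result


def habitat_likelihood(tree: Tree, habitats: Dict[str, str],
                       states: Sequence[str], q: float) -> float:
    """Likelihood of the observed habitats, with a uniform prior on the root habitat."""
    root = partial_likelihoods(tree, habitats, states, q)
    return sum(root) / len(states)
```

Habitat assignments that cluster on the tree receive a higher likelihood than scattered ones, and maximizing over q (or over where switches occur) is what turns this into a tool for locating ancestral habitat switches.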

However, the high rate of recombination raises some difficulties of interpretation. The principal phylogenetic reconstruction was based on the hsp60 gene, but by sequencing other housekeeping genes, Hunt and colleagues found that in some cases recombination between genes caused an artefactual habitat switch in the hsp60 ancestry that was not evident in the other genes. Using a permutation test, I found evidence for recombination within the Vibrio hsp60 genes, which may confound the phylogenetic reconstruction of evolutionary relationships (Schierup and Hein 2000). On a more philosophical note, suppose you could directly observe ancestral habitat switches. Would that be strong evidence for adaptation? An association between habitat and genetic lineage is probably not sufficient to demonstrate the action of natural selection. On the other hand, frequent recombination could empower genome-wide scans for extreme associations between genes and habitats, which would provide stronger support for adaptation.
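A common form of permutation test for recombination asks whether linkage disequilibrium decays with the distance between sites, as expected when recombination breaks up associations, by comparing the observed correlation between r² and distance with the correlations obtained after shuffling site positions. The sketch below assumes this kind of test; it is not a description of the exact test used here, and the function names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def r_squared(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Squared allelic correlation between two biallelic alignment columns."""
    xa = [1.0 if c == col_a[0] else 0.0 for c in col_a]
    xb = [1.0 if c == col_b[0] else 0.0 for c in col_b]
    return pearson(xa, xb) ** 2


def ld_decay_permutation_test(alignment: List[str], n_perm: int = 999,
                              seed: int = 1) -> Tuple[float, float]:
    """Correlation of r^2 with distance, and a one-sided permutation p-value.

    A negative correlation (LD decaying with distance) suggests recombination.
    """
    sites = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        if len(set(column)) == 2:  # keep columns with exactly two distinct characters
            sites.append((pos, column))
    if len(sites) < 3:
        raise ValueError("need at least three biallelic sites")
    pairs = list(combinations(range(len(sites)), 2))
    ld = [r_squared(sites[i][1], sites[j][1]) for i, j in pairs]

    def distance_ld_corr(positions: List[int]) -> float:
        dist = [abs(positions[i] - positions[j]) for i, j in pairs]
        return pearson(dist, ld)

    positions = [pos for pos, _ in sites]
    observed = distance_ld_corr(positions)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = positions[:]
        rng.shuffle(shuffled)  # break any link between LD and physical position
        if distance_ld_corr(shuffled) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

A small p-value indicates that nearby sites are in stronger LD than distant ones more often than chance arrangements of the same sites would produce, which is the signature recombination leaves within a gene.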

You can view a PDF of the presentation of this stimulating article in our journal club here.