
Genomic Analysis of Bacterial Outbreaks

2016, Evolutionary Biology

AI-generated Abstract

The paper discusses the significance of genomic analysis in managing bacterial outbreaks, emphasizing the evolution from traditional typing methods to the use of Multi-Locus Sequence Typing (MLST) and complete genome sequencing. By providing detailed insights on various bacterial pathogens and the challenges associated with genomic data analysis, it highlights the advantages of modern sequencing technologies in outbreak investigation and control, along with the issues that arise from handling large datasets.

Leonor Sánchez-Busó (1,3), Iñaki Comas (1,2,4), Beatriz Beamud (1), Neris García-González (1), Marta Pla-Díaz (1) and Fernando González-Candelas (1,2)

(1) Unidad Mixta “Infección y Salud Pública” FISABIO/CSISP – Universidad de Valencia/Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Valencia, Spain.
(2) CIBER en Epidemiología y Salud Pública. Valencia, Spain.
(3) Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridgeshire, United Kingdom.
(4) Instituto de Biomedicina, CSCIC, Valencia, Spain.

Author for Correspondence: Prof. Fernando González Candelas, Evolución y Salud – Instituto Cavanilles Biodiversidad y Biología Evolutiva, Universidad de Valencia, c/Catedrático José Beltrán, 2. 46980 Paterna (Valencia). Spain. Email: fernando.gonzalez@uv.es Phone: +34 961925961, +34 963543653 FAX: +34 963543670

Introduction

Outbreaks of infectious diseases often cause social alarm. They can be very local or reach every corner of every village and city on Earth, but they all share the need for quick control and remediation to ensure the safety of the population. The identification and control of the source of an outbreak becomes a health priority, and many efforts are devoted to these activities in the first days and weeks after the detection and/or declaration of an outbreak (Mortimer, 2003). Outbreaks come in many shapes and flavors. For epidemiologists, an outbreak is simply an unusual increase in the prevalence of a disease in time and space. Hence, some outbreaks may be declared and last for years while others last only a few days or weeks. Similarly, an outbreak may be confined to a school or nursing home, but a few years ago we also spoke of an epidemic outbreak of “swine influenza” (Fraser et al, 2009; General Directorate of Epidemiology et al, 2009), and the WHO and other health organizations are currently worried about the spread of Zika virus.
In some cases, the spread of the infectious pathogen occurs in a series of successive infections from one host to another, thus producing transmission chains or networks, depending on the topology of the resulting connections among infected persons. One of the first tasks when an outbreak is suspected is to establish the basic parameters for controlling it. This can depend on the detection of a source and the application of actions that prevent it from spreading the pathogen, on the characterization of the vector so that it can be controlled with chemical or biological agents, or on the identification of the hereditary factors that allow the pathogen to elude previously successful treatments and cause nosocomial outbreaks of multi-resistant strains.

The advent of faster and cheaper gene sequencing techniques led to the first systematic and general proposal of a universal typing scheme that was reproducible, cheap, objective and easily exchangeable among laboratories, known as MultiLocus Sequence Typing or MLST (Maiden et al, 1998). In this method, the nucleotide sequence of 6-7 loci is determined and used to derive a profile of alleles at these loci. A new combination of alleles corresponds to a new sequence type, which is uploaded to a web server for easy access. Typing schemes, with detailed laboratory protocols, proficiency tests, and full information on identified sequence types, are available for dozens of bacterial species in general and specific web servers (see, for instance, http://www.PubMLST.org). For many pathogens, the availability of an MLST scheme represented a major change in the analysis of outbreaks. This method quickly became the new “gold standard” for typing pathogens and replaced previous methods. However, for a few but important pathogens, no MLST scheme could be designed that revealed enough genetic variation to distinguish effectively among non-epidemiologically linked isolates.
These pathogens include the causative agents of plague (Yersinia pestis), anthrax (Bacillus anthracis), tuberculosis (Mycobacterium tuberculosis) and leprosy (Mycobacterium leprae), among others, and are collectively known as “genetically monomorphic bacteria” (Achtman, 2012). Specific typing methods such as insertion sequence RFLP and MIRU-VNTR were applied to M. tuberculosis, the bacterial pathogen with the highest incidence and the greatest death toll in the history of humankind. In these and other cases, the solutions adopted relied on very fast evolving markers, which are usually prone to homoplastic changes and thus yield some false positives, in which identical marker patterns are taken as indicative of very recent ancestry. Although this is not a problem in most settings, it became evident that the logic applied in MLST could be extended to complete genome sequences to attain “perfect” accuracy by using all the genetic information in the isolates and not only a small sample of it.

This approach was first used in an outbreak setting in the investigation of the letters covered with anthrax spores in the aftermath of the 9/11 attacks in the USA. Complete genome sequences were obtained from a B. anthracis isolate derived from one of the victims and from one reference strain, providing 60 SNPs that could be used subsequently to probe the common origin of the strain used in the bioterrorist attacks (Read et al, 2002). This work clearly showed that using the complete genome sequence was a more effective method for comparing isolates, even in almost completely monomorphic species. However, Sanger sequencing is rather slow and painstaking because the genome must be cut or amplified in small pieces that are subsequently sequenced and assembled into a complete genome sequence. This situation changed dramatically with the introduction of new sequencing methods, then known as “next-generation sequencing” technologies.
These technologies offered several advantages over the traditional Sanger method (Medini et al, 2008). At the same time, other problems arose, such as the difficulties in handling and analyzing very large volumes of data, a myriad of programs and methods to analyze them, and new conceptual challenges in the interpretation of the results. In this chapter we provide a brief overview of the different next-generation sequencing platforms and methods currently available for deriving complete genome sequences from bacteria, the main epidemiological and evolutionary advances that have resulted from their application to bacterial outbreaks and transmission networks, and a more detailed analysis of two cases: Legionella pneumophila outbreaks and M. tuberculosis transmission networks.

High throughput sequencing technologies in outbreak investigations

Several high throughput sequencing platforms have been applied to the genomic study of both bacterial and viral pathogens. Driven by the growing demand for human genome sequencing, three technologies were released almost simultaneously by different companies: 454 (Roche, introduced in 2005 and discontinued in 2016), Solexa (Illumina, introduced in 2006), and SOLiD (Life Technologies, introduced in 2006). These platforms share a general workflow, based on the idea of performing billions of sequencing reactions simultaneously. The reactions are produced through molecular amplification of DNA fragments that have previously been attached to a solid surface. The platforms have been enhanced in subsequent updates to increase both sequencing quality and throughput (Figure 1). Although 454 was the first platform released, its use has mainly been relegated to metagenomic studies (Schlüter et al, 2008b; Schlüter et al, 2008a; Ghai et al, 2010) because of its long reads and relatively high error rates, the latter complicating the study of transmission chains or related cases during outbreak investigations.
However, 454 has been used as the main technology in several studies (Lewis et al, 2010; Kennemann et al, 2011; Loman and Constantinidou, 2013) and also in mixed strategies in which 454 reads serve as scaffolds and errors are subsequently corrected using Illumina reads (McAdam et al, 2012; Hasan et al, 2012). SOLiD has been the least used platform for outbreak investigations due to its shorter and lower quality reads. It has occasionally been applied, for example, in the investigation of L. pneumophila outbreaks in an endemic locality in Spain (Sánchez-Busó et al, 2014), of Mycobacterium abscessus subsp. bolletii outbreaks in Brazil and the UK (Davidson et al, 2013), and of Coccidioides immitis causing coccidioidomycosis in transplant patients in Los Angeles (Engelthaler et al, 2011). By far, Illumina has been the most widely used platform because of its high quality and reasonably sized reads, which allow more accurate mapping and SNP calling. A thorough summary of the application of different sequencing technologies to the analysis of different, mainly bacterial, outbreaks is shown in Table 1.

In 2010, the Ion Torrent (Life Technologies) platform, a new benchtop device with a different sequencing strategy, was commercialized. This technology is based on monitoring pH changes in multi-well plates. A single reaction occurs per well, so that when a hydrogen ion is released after the incorporation of each nucleotide during amplification, the pH of the medium changes in a nucleotide-specific manner and the system is able to translate chemical into digital information. Reads produced by the Ion Torrent were of relatively good quality, and the platform was occasionally applied to the study of outbreaks of Escherichia coli (Mellmann et al, 2011; Holmes et al, 2015) and Pseudomonas aeruginosa (Snyder et al, 2013; Witney et al, 2014).
In early 2011, the PacBio RS system was also released. It was the first platform to perform Single Molecule Real Time (SMRT) sequencing, which is being increasingly applied to complete microbial genomes because of its long read lengths (Mutreja et al, 2011). But the real current revolution in sequencing technologies with an impact on public health has been the release of the Oxford Nanopore MinION platform, currently in test mode, and scalable in the form of the GridION platform. These contain a membrane with millions of embedded nanopores coupled with a polymerase. Changes in the electrical conductivity of the membrane as the four different bases pass through the nanopore are measured, allowing sequencing in real time. Specifically, the MinION platform is a USB-like device which can be connected directly to a computer and provides the sequences from extracted DNA in real time after a very simple library preparation. The portable MinION platform has been shown to be useful in real-time outbreak investigations, such as the 2015 Ebola virus disease epidemic in West Africa (Quick et al, 2016).

The platforms differ in their sequencing strategy, which yields different throughputs and sequence qualities. Currently, the highest throughput can be achieved with the Illumina HiSeq X Ten platform, which can yield up to 3 billion paired-end 150 bp sequences. This level of throughput is mainly directed to population-scale human genome sequencing projects. In the case of microorganisms, because their genomes are much smaller, the throughput needed depends on the depth of coverage required for each specific study. However, large-scale microbial sequencing projects can benefit from these high throughput platforms by multiplexing different strains in the same run. Coverage depths of 50X-100X are usually sought for base call error correction, minimizing the rate of false positive SNPs.
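The relationship between throughput, genome size and depth of coverage described above can be made concrete with a short calculation. The sketch below is only illustrative: the 5 Mb genome size is an assumed value, the function names are hypothetical, and losses due to barcoding, low-quality reads or uneven coverage are ignored.

```python
import math


def coverage_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


def max_multiplexed_genomes(run_reads: int, read_length: int,
                            genome_size: int, target_depth: float) -> int:
    """How many genomes fit in one run if reads are split evenly among them."""
    return run_reads // reads_needed(target_depth, read_length, genome_size)


# Assumed values: a 5 Mb bacterial genome, 150 bp reads, 100X target depth,
# and a run yielding 3 billion reads (the figure quoted in the text).
GENOME_SIZE = 5_000_000
READ_LENGTH = 150
RUN_READS = 3_000_000_000

per_genome = reads_needed(100, READ_LENGTH, GENOME_SIZE)
print("reads per genome at 100X:", per_genome)
print("genomes per run:", max_multiplexed_genomes(RUN_READS, READ_LENGTH, GENOME_SIZE, 100))
```

Under these assumptions a single run of this size could, in principle, accommodate several hundred bacterial genomes at 100X, which is why multiplexing matters for microbial projects.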
Currently, the technologies with the lowest error rates are the Illumina platforms, and the highest raw-data error rates are those of the Oxford Nanopore and PacBio platforms. However, bioinformatics pipelines for error correction during the post-processing of reads improve these rates, especially in the second case, in which the current final error rate can get as low as 1E-05. Multiple reviews on the characteristics of the different sequencing technologies, their applications, advantages and drawbacks have been published (Metzker, 2010; Casey et al, 2013; Ekblom and Wolf, 2014). Choosing the most appropriate sequencing technology depends on the scope of the study. High throughput technologies can be applied at different steps of an outbreak investigation (Köser et al, 2012): from the detection and identification of the pathogen in direct uncultured samples (e.g. blood, sputum), through epidemiological typing and detection of mutations associated with drug susceptibility, to the study of transmission chains and potential super-spreaders.

Achievements and limitations of NGS in outbreak investigations

Initial results. Although NGS techniques and devices became available around 2005 (Loman and Pallen, 2015), it took a few more years until the new technologies were first applied to analyze an outbreak, in this case of methicillin-resistant Staphylococcus aureus (MRSA) (Harris et al, 2010). The authors analyzed a set of 63 isolates from two origins: a global collection of 43 samples collected between 1982 and 2003, and 20 isolates from a Thai hospital sampled over a very short time period (months) and suspected to correspond to a transmission chain. Their results provided evidence for the international spread of the resistant clone of S. aureus and for the single origin of the samples from the hospital. But they also showed that bacteria can and do evolve rapidly.
Harris et al. (2010) estimated that in the core genome, the set of positions shared among all the studied isolates, the rate of divergence was about 1 SNP every 6 weeks. This explained the lack of identity among most hospital isolates, which differed in a few SNPs from each other, but it also revealed differences from the patterns of evolution shown by other markers, such as spa and PFGE. Of note, the analysis of complete genomes showed that over a quarter of the homoplasies found among the isolates were directly related to the evolution of resistance to antibiotics. At about the same time, Lewis et al. (2010) used complete genome sequences to establish relationships among otherwise indistinguishable strains of Acinetobacter baumannii which had caused a small outbreak at a British hospital. The SNPs found by WGS allowed the investigators to discriminate among alternative epidemiological hypotheses. These pioneering studies have been followed by many others (Table 1), which have dealt with outbreaks and transmission networks of over 30 different bacterial species infecting humans. An even larger number of works have been published about viral infections (not included in this review) and a few have dealt with fungal infections. Two particular bacteria, M. tuberculosis and L. pneumophila, the main etiological agents of tuberculosis and legionellosis, respectively, are analyzed in more detail below, but some general patterns and conclusions have started to emerge from the analysis of more than 30 pathogenic bacteria, which we briefly review next.

From retrospective to real time analysis of outbreaks. We have previously commented that the molecular analysis of outbreaks and transmission networks is necessarily a complement to the epidemiological investigations leading to the identification and control of the source(s), vectors or routes so as to put a rapid stop to ongoing processes.
Hence, it is very important that the information obtained from the molecular analyses can be shared with the epidemiology team, so that the total evidence available can be better evaluated and more appropriate and accurate decisions can be adopted. The initial methodologies available for WGS were very labor intensive, and the shortest time from obtaining a sample to determining its complete sequence was in the order of weeks, too long given the pressing need for action. However, the advent of new technologies, such as Ion Torrent PGM and, more recently, MinION, has changed this situation. Both methods can deliver sequence information within a few hours of gaining access to the sample, thus allowing a very rapid communication of results to field workers. The first case in which these new technologies were applied during the investigation of the source of an outbreak was that of an enteroaggregative Escherichia coli O104:H4 strain that affected several European countries in the spring of 2011 (Mellmann et al, 2011). In just 62 hours, complete genome sequences were obtained from a representative isolate of the outbreak and from a reference strain which produced similar clinical features. The comparison revealed key differences in plasmid and gene contents between the strains, indicating that the outbreak was due to a new and not a previously circulating strain of the bacterium. It also allowed the design of a test that could be applied for rapid diagnosis in any lab.

Loss of identity as hallmark of relatedness. One consequence of using complete genome sequences for the analysis of outbreaks and transmission chains is the necessary dismissal of complete identity as the decisive evidence for considering two or more isolates as linked to the same transmission event or episode. Complete identity was usually the criterion for most previous markers, which explored only a minor fraction of the nucleotides in the genome of the pathogenic bacteria.
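As a rough illustration of why identity becomes unlikely at genome scale, the sketch below assumes that substitutions accumulate as a Poisson process along both lineages since their common ancestor. The substitution rate, the numbers of sites and the divergence time are illustrative assumptions, not estimates taken from this chapter, and the function names are hypothetical.

```python
import math


def expected_differences(rate_per_site_per_year: float, n_sites: int,
                         years_since_ancestor: float) -> float:
    """Expected number of differences between two isolates.

    Both lineages accumulate changes independently after diverging,
    hence the factor of 2.
    """
    return 2.0 * rate_per_site_per_year * n_sites * years_since_ancestor


def prob_identical(rate_per_site_per_year: float, n_sites: int,
                   years_since_ancestor: float) -> float:
    """Poisson probability of observing no differences at all."""
    return math.exp(-expected_differences(rate_per_site_per_year, n_sites,
                                          years_since_ancestor))


# Illustrative assumptions: 1e-6 substitutions per site per year and
# isolates that shared an ancestor one year before sampling.
RATE = 1e-6
YEARS = 1.0
for label, sites in (("7-locus MLST (about 3,000 sites)", 3_000),
                     ("whole genome (about 3,000,000 sites)", 3_000_000)):
    print(label,
          "expected differences:", expected_differences(RATE, sites, YEARS),
          "P(identical):", round(prob_identical(RATE, sites, YEARS), 4))
```

With these assumed values, two linked isolates would almost always look identical with a handful of loci, whereas a few genome-wide differences would be the norm.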
Except for a few rapidly evolving markers, usually associated with tandem repeats, the number of differences expected between two isolates depends on three factors: the mutation rate per site, the number of sites being compared and the time since they diverged from their last common ancestor. When the number of generations since divergence is relatively small, as in outbreaks and most transmission networks, and the number of sites being sampled is also small, the probability of finding a SNP (or a different allele in the case of MLST) is also very small. However, using complete genome sequences while keeping the other factors unchanged increases that probability by several orders of magnitude, because the number of sites interrogated is now in the order of millions instead of tens or hundreds.

Within-host evolution. In addition, the exploration of complete genome sequences of bacteria causing long or chronic infections has shown that evolution does occur within hosts at rates high enough to be reflected in some nucleotide changes (Didelot et al, 2016). Even for pathogens that produce acute infections, a low per-site mutation rate is compensated by the large number of nucleotides present in a genome and by the different random and directional processes that occur in an infected individual, thus leading to some new mutations arising in many newly replicated genomes (Kennemann et al, 2011; Mathers et al, 2015). If the infection lasts longer or becomes chronic, the chances that changes occur in the pathogen are very high, and additional evolutionary processes such as compartmentalization may contribute to within-patient differentiation of bacterial sub-populations. These processes have important consequences at different levels. On the one hand, a variable population can adapt more rapidly to new environmental conditions, which might include new treatments or an adaptive immune response by the host (Mwangi et al, 2007).
On the other hand, a variable population will result in different initial compositions in successive transmission events, which will be reflected in differences among the populations established in the new hosts. The analysis of transmission networks becomes more complicated because a single genome sequence per host cannot reveal the whole range of variation present within it (Worby et al, 2014). Under these circumstances, the use of evolutionary methods to reveal the common ancestry of isolates derived from patients presumably included in the same network becomes an absolute necessity.

Mutation patterns and processes. Apart from revealing larger amounts of variation than anticipated from previous studies with just a few gene sequences, whole genome sequences have also provided information on the types and distribution of mutational changes occurring at different time-scales. A few years ago, the contribution of homologous recombination and horizontal gene transfer to genetic variation in bacterial genomes was found to be considerably more important than previously thought (Doolittle, 1998). But this was thought to be the result of millions of generations in which a generally rare process might have been acting. Over shorter timescales, months or years, the impact of processes generating variation other than point mutation was thought to be negligible except for loci including repeat units, such as MIRU-VNTRs in M. tuberculosis, in which slippage-and-mispairing during replication often leads to new alleles. Recent analyses at the complete genome level have shown that this view is incorrect, at least for some bacteria such as Neisseria gonorrhoeae, Salmonella enterica or L. pneumophila (Didelot and Maiden, 2010; Sánchez-Busó et al, 2014). In fact, a comparison of the relative effects of recombination and point mutation in almost 50 bacterial species revealed variation of three orders of magnitude (Vos and Didelot, 2009).
Although there are no quantitative estimates yet, horizontal gene transfer, with or without final stabilization in the receiving genome, is also known to play a significant role in the short-term evolution of many bacteria, as unfortunately shown by the ease with which many antibiotic resistance genes spread across species. The additional variation introduced by these processes has to be considered when analyzing large transmission networks or long-lasting outbreaks, because the incorporation of these new variants may confound inferences of recent ancestry based on overall similarity or on a few loci.

Rates of evolution. The increased availability of complete genome sequences from bacteria with a more or less direct epidemiological link has also provided an opportunity for a more detailed study of evolutionary processes at the population genomic level. Apart from the different types of variants introduced in these populations, the access to asynchronously sampled isolates allows the application of Bayesian methods to estimate evolutionary rates (Drummond et al, 2006). These methods can accommodate strict and relaxed clock models, different demographic regimes, as well as variation in rates among lineages, thus allowing the estimation of relevant evolutionary parameters from organisms with different natural and evolutionary histories. Most often they are applied to rapidly evolving organisms, collectively known as measurably evolving populations (Drummond et al, 2003; Biek et al, 2015), which mainly include viruses along with some bacteria. But the methods are also valid for more slowly evolving organisms, provided that sampling dates are different enough to yield estimates of the evolutionary rate. Recently, this approach has been used with bacterial genomes obtained from ancient samples (Schuenemann et al, 2013; Bos et al, 2014; Mendum et al, 2014; Rasmussen et al, 2015; Bos et al, 2016; Maixner et al, 2016).
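The Bayesian approaches cited above jointly estimate trees, clock models and demographic parameters. As a much simpler stand-in, which is not the method of Drummond et al, the intuition of using sampling dates to calibrate a rate can be sketched with a least-squares regression of root-to-tip distance on sampling date. The input values and the function name below are hypothetical, and the approach assumes a strict clock and a correctly rooted tree from which the distances were taken.

```python
from typing import List, Tuple


def root_to_tip_regression(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    samples: (sampling_year, root_to_tip_distance) pairs, with distances
             in substitutions per site.
    Returns (rate, root_date): the slope, in substitutions per site per
    year, and the date at which the fitted distance equals zero.
    """
    if len(samples) < 2:
        raise ValueError("at least two dated samples are required")
    n = len(samples)
    mean_date = sum(date for date, _ in samples) / n
    mean_dist = sum(dist for _, dist in samples) / n
    sxx = sum((date - mean_date) ** 2 for date, _ in samples)
    if sxx == 0:
        raise ValueError("sampling dates must differ to calibrate a rate")
    sxy = sum((date - mean_date) * (dist - mean_dist) for date, dist in samples)
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_dist - rate * mean_date
    return rate, -intercept / rate


# Hypothetical isolates sampled five years apart.
isolates = [(2000.0, 1.0e-5), (2005.0, 1.5e-5), (2010.0, 2.0e-5)]
rate, root_date = root_to_tip_regression(isolates)
print("rate (subs/site/year):", rate)
print("estimated date of the root:", root_date)
```

A regression of this kind is commonly used only as a quick check for temporal signal before a full Bayesian analysis, since it ignores the shared ancestry among tips.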
One apparent feature of the estimates of bacterial evolutionary rates is the negative correlation between the time to the most recent common ancestor of the sample studied and the inferred evolutionary rate (Figure 1). Higher evolutionary rates at short times can be explained by the relative inefficiency of natural selection and/or genetic drift in removing the neutral or quasi-neutral polymorphisms which are continuously arising in bacterial populations. Hence, transient polymorphisms contribute significantly to the apparent acceleration of evolutionary rates over short time-scales. At the same time, they also provide a wealth of variation that might have an adaptive value if the circumstances are appropriate. In the long run, many of these transient variants will have disappeared and evolutionary rates are reduced correspondingly. This negative correlation has to be taken into account when comparing rates across studies, even for the same species, and in the inference of other evolutionary parameters (Biek et al, 2015).

The analysis of (almost) complete genome data. One of the main advantages of MLST or SBT over alternative methods for the analysis of pathogenic bacteria in the context of outbreaks and transmission chains is the objectivity and simplicity in the specification of the variants found in any isolate. The nucleotide sequences obtained for each locus are compared to a predetermined database in which previous homologous sequences have been deposited. If there is a perfect match, the newly determined variant receives the same identifier as the pre-existing one. If that is not the case, curators of the database will assign a new code to the variant. The combination of allele codes in the loci included in the typing scheme is summarized in a sequence type (ST), with a different number for each combination of variants.
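The naming procedure just described can be sketched as a pair of lookups. This is a minimal illustration under stated assumptions: the class, the locus names and the sequences are hypothetical, identifiers are assigned automatically in order of appearance (whereas in real schemes database curators assign them), and it does not reproduce the interface of any actual MLST server.

```python
from typing import Dict, Sequence, Tuple


class MLSTDatabase:
    """Toy allele and sequence type (ST) registry for a fixed set of loci."""

    def __init__(self, loci: Sequence[str]) -> None:
        self.loci = tuple(loci)
        # Per locus: nucleotide sequence -> allele number.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # Allele profile (one allele number per locus) -> ST number.
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def allele_id(self, locus: str, sequence: str) -> int:
        """Return the existing allele number on a perfect match, else a new one."""
        known = self.alleles[locus]
        key = sequence.upper()
        if key not in known:
            known[key] = len(known) + 1
        return known[key]

    def sequence_type(self, isolate: Dict[str, str]) -> Tuple[int, Tuple[int, ...]]:
        """Return (ST, allele profile) for an isolate given its locus sequences."""
        profile = tuple(self.allele_id(locus, isolate[locus]) for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile], profile


# Hypothetical three-locus scheme with very short made-up sequences.
db = MLSTDatabase(["locusA", "locusB", "locusC"])
isolate_1 = {"locusA": "ACGT", "locusB": "GGCC", "locusC": "TTAA"}
isolate_2 = {"locusA": "acgt", "locusB": "ggcc", "locusC": "ttaa"}  # same alleles
isolate_3 = {"locusA": "ACGT", "locusB": "GGCA", "locusC": "TTAA"}  # new allele at locusB

for name, isolate in (("isolate_1", isolate_1), ("isolate_2", isolate_2),
                      ("isolate_3", isolate_3)):
    print(name, db.sequence_type(isolate))
```

Because a single nucleotide change anywhere in a locus creates a new allele number, the profile records that two isolates differ at a locus but not by how much.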
This procedure is easily communicated because it requires the identification of nucleotide variants, usually through Sanger sequencing, in just 6 or 7 loci. However, the advent of NGS and the determination of complete genome sequences make this way of denoting variants impractical. Several alternatives have already been proposed for the identification of complete genome sequences for epidemiological analysis. One method consists of extending the MLST naming scheme to more loci, eventually all the loci in the genome of the corresponding species, thus leading to “whole genome MLST” (wgMLST) schemes (Cody et al, 2013). The first proposal of wgMLST was made for Campylobacter isolates: the initial MLST scheme based on 7 loci was extended to 1667 loci, although this number was reduced to 1026 when only those present in all the isolates analyzed were considered. The latter set represents the “core genome” of the species, which is complemented by the “auxiliary genome”, the set of loci which are present in some but not all the isolates of a species. In light of the very large genome plasticity of many bacterial species, fixed compositions of the core and auxiliary genomes are almost impossible, which creates an additional problem for the stability of the scheme. Nevertheless, this approach has gained some popularity and cgMLST (“core genome MLST”, a reduced version of wgMLST as described above) schemes are now available for several pathogens including S. aureus, Listeria monocytogenes, Enterococcus faecium (de Been et al, 2015), and S. enterica (Taylor et al, 2015), among others. To prevent the proliferation of STs which inevitably accompanies wgMLST or cgMLST, a first-level classification of STs into clusters or clonal groups is usually performed (Cody et al, 2013; Qin et al, 2016).
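The partition into core and auxiliary genomes described above, and the reason why it is unstable, can be sketched with set operations. The isolate and locus names are hypothetical, and the sketch assumes that the presence or absence of each locus has already been determined for every isolate.

```python
from typing import Dict, Set, Tuple


def partition_loci(isolates: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split loci into core (present in every isolate) and auxiliary (the rest)."""
    if not isolates:
        raise ValueError("at least one isolate is required")
    locus_sets = list(isolates.values())
    core = set.intersection(*locus_sets)
    all_loci = set.union(*locus_sets)
    return core, all_loci - core


# Hypothetical presence lists for three isolates.
isolates = {
    "isolate_1": {"geneA", "geneB", "geneC", "geneD"},
    "isolate_2": {"geneA", "geneB", "geneC"},
    "isolate_3": {"geneA", "geneB", "geneC", "geneE"},
}
core, auxiliary = partition_loci(isolates)
print("core:", sorted(core))
print("auxiliary:", sorted(auxiliary))

# Adding one isolate that lacks geneC shrinks the core, which is the
# stability problem for any scheme built on a fixed list of core loci.
isolates["isolate_4"] = {"geneA", "geneB", "geneF"}
core_after, auxiliary_after = partition_loci(isolates)
print("core after adding isolate_4:", sorted(core_after))
```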
These clonal groups can be based on an extension of the BURST method (Feil et al, 2001; Feil et al, 2004), which assigns to the same clonal group those variants that differ at a single locus of the original MLST scheme, or can use more sophisticated approaches based on the population genetic analysis of the actual SNPs detected in the loci included in the wgMLST or cgMLST scheme (Qin et al, 2016), with different molecular population methods such as BAPS (Corander and Tang, 2007) or STRUCTURE (Rosenberg et al, 2002). These methods share the advantage of portability, thus allowing comparisons among different laboratories and needs. However, they also discard important and potentially crucial information contained in the auxiliary genome. Hence, although standard typing schemes are useful, whole genome sequence information should not be reduced to an ST number or complex under a wgMLST scheme, and the complete data should still be available for future use by the scientific community.

Outbreak investigation in Mycobacterium tuberculosis: the genome as an epidemiological marker

Mycobacterium tuberculosis is the main causative agent of human tuberculosis in the world. Every year more than 1.5 million persons die of tuberculosis, more than from any other infectious disease (WHO, 2014). The epidemiology of the disease has to take into account the natural history of the bacterium. It is an obligate human pathogen with very effective airborne transmission that typically infects the lungs. It is estimated that one third of the human population is infected by the bacillus, and this explains why around 9 million new cases are declared every year. In most cases the initial infection leads to an asymptomatic state called latency, in which the bacteria have not been eliminated but are controlled by the immune system. In 5-10% of the latent cases the disease progresses to an active state in which the bacteria actively replicate and cause pulmonary disease.
Only an active tuberculosis case can transmit the disease and thus, in tuberculosis, disease and transmission are linked. The typical window of progress to active disease after infection is two years, but the bacteria may remain latent for years or even decades. Mycobacterium tuberculosis has been traditionally regarded as a monomorphic organism due to the low genetic diversity found among representative strain datasets (Achtman, 2008). Thus, epidemiological tools were developed based on fast evolving genetic elements (Barnes and Cave, 2013). Typing of the insertion sequence IS6110 by RFLP and of minisatellites, called MIRU-VNTR, are the two gold standards in tuberculosis molecular epidemiology and, together with spoligotyping, based on the CRISPR region of the bacteria, have allowed the definition of successful M. tuberculosis clones. Among these clones, the identification of a hypervirulent clade, called the Beijing family, has attracted much attention (Parwati et al., 2010). Strains from the Beijing family are more common in East Asia but can be identified across the globe. Experimental and epidemiological research has identified Beijing strains as hypervirulent in the mouse model of infection and as frequently associated with drug resistance in humans. In South Africa, Beijing strains have been on the rise for the last 40 years (Cowley et al., 2008). Beijing strains belong to one of the seven lineages of human tuberculosis strains (Comas et al., 2013). The most common is lineage 4, which is highly frequent in Africa, Europe and America. There is a strong association between lineages and their geographic origin, the most extreme cases being the two lineages of Mycobacterium africanum, which can only be found in West Africa (De Jong et al, 2010), and Lineage 7, recently described in Ethiopia (Comas et al., 2013). Regardless of the lineage, resistance to first- and second-line treatments has been identified (Farhat et al., 2013).
The mutations responsible for drug resistance are always chromosomal mutations because there is no ongoing horizontal gene transfer in M. tuberculosis. Although ecological theory predicts that drug resistance mutations have a fitness cost, experimental evolution and molecular epidemiology have shown that different drug resistance mutations have different fitness costs (Comas et al., 2012). As a consequence, multidrug-resistant cases (MDR-TB) among people never treated before, and therefore due to transmission, are on the rise and in some particular areas represent more than 50% of the tuberculosis burden of the region. Although not covered in this review, whole genome sequencing is making it possible to define not only the set of mutations associated with resistance to the different antibiotics but also the genotypes of highly successful MDR-TB strains. The first study that showed the potential of the genome as an epidemiological marker dates back to 2009 (Niemann et al, 2009). In this study, three strains which looked almost identical using traditional molecular epidemiology markers such as restriction fragment length polymorphisms (RFLP) and minisatellites (MIRU-VNTR) were shown to differ in more than 100 SNPs. Later on, Jennifer Gardy and collaborators (2011) used genome comparison techniques to solve an on-going outbreak in British Columbia suspected to have started in the early 1990s. By combining genomic, epidemiological and social contact data, the authors showed that a better resolution of the transmission events within transmission clusters can be gained. Such events are very difficult to identify with traditional molecular epidemiology markers. This work already defined index cases associated with multiple secondary cases, also denoted as super-spreaders. Super-spreaders are becoming a common topic when analyzing large transmission clusters (Walker et al, 2013b), replacing the traditional view of a stepwise "chain" of transmission.
Since 2010, NGS has been successfully applied to deeply resolve tuberculosis outbreaks. Considerable attention has been paid to understanding those outbreaks that have been on-going for years. For example, a large outbreak in Hamburg, Germany, was identified by classical genotyping data in 1996 (Roetzer et al, 2013). However, clustering data did not always correlate with epidemiological and geographical information, leading to the suspicion that the outbreak was more complex than previously anticipated. By whole genome sequencing of 86 strains from the outbreak (1996-2011), Roetzer et al. (2013) were able to identify an independent transmission network, thus confirming the non-clonality of the outbreak. Two clusters were determined, one starting in 1997 and the other starting in 2010, much more in agreement with epidemiological investigations. Therefore, one important application of whole genome sequencing to the investigation of tuberculosis outbreaks is the ability to assign cases to the outbreak with higher confidence and to exclude those that, albeit genetically close, correspond to a different chain of events. Similarly, in Bern, Switzerland, a genotype detected by RFLP profiling caused a large number of tuberculosis cases during the 1990s (Stucki et al, 2015). The cases were associated with the typical risk factors found in local populations of European cities, such as HIV infection or alcoholism. Stucki et al. (2015) sequenced the complete genome of strains belonging to the original outbreak along with local control strains. By comparing outbreak and control strains they designed a real-time SNP typing assay based on the detection of a genome position with a polymorphism specific to the outbreak strains. Next, they typed a retrospective collection of isolates from the Canton of Bern from 1993 to 2011. They were able to identify 68 additional cases of the outbreak based on the presence of the mentioned SNP, including cases from 2011.
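The idea of comparing outbreak and control strains to find an outbreak-specific polymorphism can be illustrated with a minimal Python sketch. It is not the procedure used by Stucki et al. (2015); it assumes that all genomes have already been aligned to a common reference, and the function name `outbreak_specific_sites`, the isolate labels and the toy sequences are hypothetical.

```python
from typing import Dict, List


def outbreak_specific_sites(outbreak: Dict[str, str],
                            controls: Dict[str, str]) -> List[int]:
    """Return 0-based alignment positions where every outbreak isolate
    carries the same unambiguous base and no control isolate carries it."""
    sequences = list(outbreak.values()) + list(controls.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    sites = []
    for pos in range(length):
        outbreak_bases = {seq[pos] for seq in outbreak.values()}
        control_bases = {seq[pos] for seq in controls.values()}
        # The outbreak isolates must agree on a single base at this position.
        if len(outbreak_bases) != 1:
            continue
        base = next(iter(outbreak_bases))
        # Ignore gaps and ambiguous calls such as "N" or "-".
        if base not in "ACGT":
            continue
        # The base must be absent from every control isolate.
        if base not in control_bases:
            sites.append(pos)
    return sites


# Toy alignment (hypothetical data, for illustration only).
outbreak = {"out1": "ACGTAC", "out2": "ACGTAC", "out3": "ACGTAC"}
controls = {"ctl1": "ACGAAC", "ctl2": "ACGAAT"}

marker_sites = outbreak_specific_sites(outbreak, controls)
print(marker_sites)
```

In this toy alignment only position 3 qualifies: position 5 is variable among the controls, but one control shares the outbreak base there. A candidate marker found this way would still need validation against a larger control collection before being used in a typing assay.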
Therefore, the combination of whole genome sequencing and SNP typing allowed them to identify cases associated with the outbreak and to find that the outbreak that started in the early nineties was still on-going at the time of the investigation. In addition, they obtained the whole genome sequence of all the isolates assigned to the outbreak. With this information, they were able to resolve the individual transmission patterns for 75% of the strains. Importantly, 66 out of the 68 strains had exactly the same RFLP pattern. Furthermore, the analysis of the transmission network together with the epidemiological information revealed two different sub-outbreaks initiated by two different "super-spreaders". Next generation sequencing of the Hamburg (Roetzer et al, 2013), Bern (Stucki et al, 2015) and other outbreaks (Török and Peacock, 2012; Smit et al, 2015; Lee et al, 2015) has thus revealed the complexity of tuberculosis outbreaks. Given that tuberculosis is not an acute disease and that a tuberculosis case can remain latent and asymptomatic for years, the true extent of tuberculosis outbreaks can only be revealed by sustained genotyping efforts over years. Furthermore, as in the case of the Bern outbreak, whole genome sequence data can be used to design new diagnostic and/or surveillance tools. A similar approach has been used to prospectively identify new outbreak-associated cases in sputum samples (Pérez-Lago et al, 2015). Apart from specific outbreaks, genomic epidemiology has been used on a population-based scale to evaluate its utility for surveillance and diagnostics. In a series of publications starting in 2012, Public Health England has applied next generation sequencing to incorporate whole genome sequencing as the default typing method for Mycobacterium tuberculosis in the United Kingdom (Walker and Beatson, 2012; Walker et al, 2014). They have shown that genome data allow outbreaks to be delineated better than MIRU-VNTR analyses.
Furthermore, in an attempt to derive a rule of thumb to identify a transmission event between two cases, they also sequenced several isolates from the same patient and from known household contacts. They were able to identify a threshold of five SNPs when the cases had a confirmed epidemiological link, and they proposed a threshold of up to 12 SNPs for casual transmission in the community (Walker and Beatson, 2012). Other studies have found a similar distribution of SNPs when analyzing transmission events in populations (Bryant et al, 2013a; Casali et al, 2014). However, we still do not know how these thresholds apply to clinical settings other than the low-burden countries of Europe. In high-burden countries, delineating transmission clusters should be more difficult if public health interventions cannot stop transmission events (Yates et al, 2016). Thus, the circulating strains may be participating in several clusters at the same time. The only population-based study published in a high-burden country shows that the threshold described in (Pérez-Lago et al, 2015) may be useful, although more work will be needed to generalize the results to, for example, large urban areas. There are several factors that may distort the proposed threshold values. One of these factors is mixed infections. The true extent of co-infections in high-burden countries is not clear, and there is hope that whole genome data can distinguish between relapses and re-infections (Bryant et al, 2013a; Guerra-Assunção et al, 2015). This issue is critical for delineating transmission in high-burden countries but also for clinical trials, because relapse is one of their end points. However, it is the diversity that can arise during infection from a single strain that is attracting more research and attention. From clinical drug susceptibility data, it has been clear for decades that several populations may co-exist in the same patient.
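The SNP thresholds quoted above (five SNPs for epidemiologically confirmed links and up to 12 SNPs for community transmission) can be turned into a simple pairwise rule. The Python sketch below is only an illustration under several assumptions: genomes are pre-aligned strings of equal length, both thresholds are treated as inclusive upper bounds, and the function names, category labels and toy sequences are hypothetical. As the text notes, such thresholds were derived in low-burden settings and may not transfer elsewhere.

```python
from itertools import combinations

# Thresholds quoted in the text; treating both as inclusive upper bounds
# is an assumption of this sketch.
LINKED_MAX_SNPS = 5
COMMUNITY_MAX_SNPS = 12


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned genomes differ, ignoring
    positions with gaps or ambiguous base calls in either genome."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def classify_pair(distance: int) -> str:
    """Map a pairwise SNP distance to a coarse transmission category."""
    if distance <= LINKED_MAX_SNPS:
        return "consistent with a direct epidemiological link"
    if distance <= COMMUNITY_MAX_SNPS:
        return "possible community transmission"
    return "recent transmission unlikely"


# Toy aligned genomes (hypothetical data, for illustration only).
isolates = {
    "patient_A": "A" * 20,
    "patient_B": "C" * 3 + "A" * 17,
    "patient_C": "C" * 10 + "A" * 10,
    "patient_D": "G" * 20,
}

for (name_1, seq_1), (name_2, seq_2) in combinations(isolates.items(), 2):
    d = snp_distance(seq_1, seq_2)
    print(f"{name_1} vs {name_2}: {d} SNPs, {classify_pair(d)}")
```

A rule of this kind only suggests candidate links; the factors discussed next, such as mixed infections and within-patient diversity, can push truly linked cases above the threshold.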
These sub-populations were flagged because of inconsistent results in drug susceptibility tests between isolates of the same patient (Rinder et al, 2001). Whole genome sequencing has shown that this is in fact the case and that what is recovered from a sputum sample is often a mix of different sub-populations (Sun et al, 2012). These sub-populations can be revealed by looking at positions in which a mutant and a wild-type allele can be identified at the same time. In the context of drug resistance, it has been shown that several drug-resistant sub-populations may co-exist and compete and that their frequencies may change over time (Liu et al, 2015). A similar phenomenon has been shown outside the context of drug resistance. The issue of within-patient diversity has more than clinical and diagnostic implications. If several sub-populations co-exist and accumulate different numbers of SNPs, then the epidemiological investigation of outbreaks may be distorted by the isolate chosen for the analysis (Walker et al, 2013a; Walker et al, 2013b). An analysis of cases in which higher diversity was expected confirmed that, although the thresholds proposed to delineate a transmission event are in general valid, there are epidemiologically linked cases in which a larger than expected number of SNPs can be found (Pérez-Lago et al, 2014). How frequent those "outliers" are is a matter of on-going investigation.

High throughput investigation of Legionella pneumophila outbreaks

High throughput sequencing can also be used to study organisms that, unlike Mycobacterium tuberculosis, show a higher level of polymorphism and are strictly environmental. This is the case of L. pneumophila, the causative agent of legionellosis, for which there is only one report of a possible person-to-person transmission to date (Correia et al, 2016).
This opportunistic pathogen can produce pneumonia after inhalation of aerosols with a sufficient bacterial load, with the highest burden in warm water-related environments. The first reported outbreak dates from 1976, when more than one hundred legionnaires were infected at a convention in Philadelphia (Fraser et al, 1977). A legionellosis outbreak is defined as a cluster of more than three cases occurring at the same place and time, and the epidemiological investigation is crucial to find the environmental sources. The investigation of legionellosis outbreaks has traditionally been conducted using biochemical or molecular methods that allow the comparison of clinical isolates with strains obtained from the environment (Fields et al, 2002). Broad techniques such as serogrouping were complemented by genetic methods that provided improved resolution, in the so-called Sequence-Based Typing (SBT) (Gaia et al, 2003; Gaia et al, 2005), which is based on the Multi-Locus Sequence Typing (MLST) approach (Urwin and Maiden, 2003) but incorporates virulence genes in the scheme to increase the discrimination power among strains. However, although SBT provided researchers with a tool that allowed the classification of strains into groups (Sequence Types, STs), the introduction of high-throughput sequencing techniques for microbial analysis and outbreak investigations in other species led to their application to legionellosis outbreaks because of their increased discrimination power. The first published work was a pilot study to test the potential of whole-genome sequencing (WGS) to discriminate between isolates from an outbreak that occurred in the UK in 2003 and non-outbreak-related strains (Reuter et al, 2013b).
Since then, a number of other outbreaks have been analyzed using WGS, for example an outbreak of ST62 associated with a cooling tower in Quebec City in 2012 (Lévesque et al, 2014) or a massive outbreak that occurred in Edinburgh (UK, 2012), related to multiple STs and including mixed infections (McAdam et al, 2014). WGS has also been used to investigate the persistent infection history of ST23 in a hotel in Spain in 2012 (Sánchez-Busó et al, 2016) and the eradication of L. pneumophila associated with a hospital in Australia, which had been responsible for nosocomial cases (Bartley et al, 2016). The environmental source of legionellosis cases has historically been difficult to trace and, because of the high social and economic impact of this kind of outbreak on the affected populations, public health interventions need to be rapid and accurate. WGS has revealed further variability within many STs (Underwood et al, 2013a; Sánchez-Busó et al, 2014), showing that at least some of them are not clonal. This observation complicates the study of legionellosis outbreaks and was the main focus of the study by Sánchez-Busó et al. (2014). In this work, 69 isolates, including strains associated with 13 different outbreaks and sporadic cases that occurred in a single locality (Alcoy, Spain) over more than 10 years (1999-2010), were analyzed by high throughput sequencing. Different STs were included, with special interest in ST578 cases, since ST578 had been recurrently reported as the causative ST of most of those outbreaks (Coscollá et al, 2010). The analysis showed two main lineages within the endemic ST578, more than 1,000 SNPs apart from each other. Not all the strains from the same outbreak clustered together, revealing the non-clonality of the isolates, as these were phylogenetically grouped independently of their source (clinical or environmental), sampling date or outbreak.
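To make concrete how isolates sharing one ST can fall into well-separated genomic groups, the following Python sketch groups isolates by single-linkage clustering on pairwise SNP distances. This is a deliberately simplified stand-in for the phylogenetic analysis used in the study, not a reproduction of it; the function name `single_linkage_clusters`, the isolate labels, the distances and the cut-off of 100 SNPs are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def single_linkage_clusters(names: List[str],
                            distances: Dict[FrozenSet[str], int],
                            max_snps: int) -> List[List[str]]:
    """Group isolates so that any two isolates within `max_snps` of each
    other end up in the same cluster (connected components, union-find)."""
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if distances[frozenset((a, b))] <= max_snps:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy pairwise SNP distances (hypothetical data, for illustration only):
# two tight groups separated by more than 1,000 SNPs.
names = ["env_1", "clin_1", "env_2", "clin_2"]
distances = {
    frozenset(("env_1", "clin_1")): 4,
    frozenset(("env_2", "clin_2")): 7,
    frozenset(("env_1", "env_2")): 1200,
    frozenset(("env_1", "clin_2")): 1150,
    frozenset(("clin_1", "env_2")): 1180,
    frozenset(("clin_1", "clin_2")): 1210,
}

clusters = single_linkage_clusters(names, distances, max_snps=100)
print(clusters)
```

With these toy values the four isolates split into two groups, each mixing a clinical and an environmental isolate, which mirrors the kind of pattern described above. A threshold-based grouping ignores recombination and evolutionary rates, so it should be read as an exploratory aid rather than a substitute for phylogenetic inference.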
Because ST578 is known to be endemic in the area of Alcoy, these results suggest that it is indeed very complicated to find an infectious source using only molecular data in endemic areas. Molecular data should be used together with the epidemiological investigation in order to draw the accurate conclusions that public health interventions require. Another interesting finding of this work is that genomic data can reflect public health actions over time. As an example, an estimate of the ST578 population dynamics obtained by Bayesian inference revealed a decrease in population size between 2006 and 2008, which coincided with the period in which public health measures were taken in the city by removing high-risk installations from the city center. In the case of organisms for which person-to-person transmission is very rare or even nonexistent, whole genome sequencing can provide the most discriminatory tool to link clinical cases with environmental sources, providing the accuracy that public health interventions require in these cases. Moreover, it can help us understand how outbreaks occur, which is the starting point for predicting and even preventing their occurrence.

Conclusion

Complete genome analysis of bacterial pathogens is still far from being the usual method for analyzing outbreaks and transmission networks, although it will not take long before it becomes so. The increasing speed, ease and reliability, as well as the reduced costs, associated with new high-throughput sequencing technologies point in that direction. But gaining information is only a part of the process. More data also mean an increased need for interpretative tools at all levels, from the mere analysis of reads to the inference of the evolutionary and genealogical relationships among the isolates.
Progress is still pending at all levels, from the technology needed to obtain, quickly and cheaply, complete genome sequence data of a specific pathogen from an infected individual or a potential vector or source, to analytical tools capable of extracting the relevant information from the deluge of data generated by high-throughput sequencers and of integrating this information with the clinical, epidemiological and evolutionary information needed to interpret it in the appropriate context.

Acknowledgements

We thank Dr. Pierre Pontarotti for his kind invitation to write this chapter. This work has been funded by project BFU2014-58656-R from MINECO (Spanish Government) to FGC. IC is supported by Ramón y Cajal Spanish research grant RYC-2012-10627, MINECO research grant SAF2013-43521-R, and the European Research Council (ERC) (638553-TB-ACCELERATE). BB has been the recipient of a Beca de Colaboración from the Spanish Ministerio de Educación y Cultura.

References

Literature Cited

Achtman M (2008). Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol 62:53–70.
Achtman M (2012). Insights from genomic comparisons of genetically monomorphic bacterial pathogens. Phil Trans R Soc B 367:860-867.
Ahmed SA, Awosika J, Baldwin C, Bishop-Lilly KA, Biswas B, Broomall S, Chain PSG, Chertkov O, Chokoshvili O, Coyne S, Davenport K, Detter JC, Dorman W, Erkkila TH, Folster JP, Frey KG, George M, Gleasner C, Henry M, Hill KK, Hubbard K, Insalaco J, Johnson S, Kitzmiller A, Krepps M, Lo CC, Luu T, McNew LA, Minogue T, Munk CA, Osborne B, Patel M, Reitenga KG, Rosenzweig CN, Shea A, Shen X, Strockbine N, Tarr C, Teshima H, van Gieson E, Verratti K, Wolcott M, Xie G, Sozhamannan S, Gibbons HS, Threat Characterization Consortium (2012). Genomic Comparison of Escherichia coli O104:H4 Isolates from 2009 and 2011 Reveals Plasmid, and Prophage Heterogeneity, Including Shiga Toxin Encoding Phage stx2.
PLoS ONE 7:e48228.
Allard MW, Luo Y, Strain E, Li C, Keys CE, Son I, Stones R, Musser SM, Brown EW (2012). High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach. BMC Genomics 13:1.
Allard MW, Luo Y, Strain E, Pettengill J, Timme R, Wang C, Li C, Keys CE, Zheng J, Stones R, Wilson MR, Musser SM, Brown EW (2013). On the evolutionary history, population genetics and diversity among isolates of Salmonella Enteritidis PFGE pattern JEGX01.0004. PLoS ONE 8:e55254.
Azarian T, Cook RL, Johnson JA, Guzman N, McCarter YS, Gomez N, Rathore MH, Morris JGJ, Salemi M (2015). Whole-Genome Sequencing for Outbreak Investigations of Methicillin-Resistant Staphylococcus aureus in the Neonatal Intensive Care Unit: Time for Routine Practice? Infection Control & Hospital Epidemiology 36:777-785.
Barnes PF, Cave MD (2003). Molecular epidemiology of tuberculosis. N Engl J Med 349:1149–1156.
Bartley PB, Ben Zakour NL, Stanton-Cook M, Muguli R, Prado L, Garnys V, Taylor K, Barnett TC, Pinna G, Robson J, Paterson DL, Walker MJ, Schembri MA, Beatson SA (2016). Hospital-wide eradication of a nosocomial Legionella pneumophila serogroup 1 outbreak. Clin Infect Dis 62:273-279.
Bekal S, Berry C, Reimer AR, Van Domselaar G, Beaudry G, Fournier E, Doualla-Bell F, Levac E, Gaulin C, Ramsay D, Huot C, Walker M, Sieffert C, Tremblay C (2016). Usefulness of High-Quality Core Genome Single-Nucleotide Variant Analysis for Subtyping the Highly Clonal and the Most Prevalent Salmonella enterica Serovar Heidelberg Clone in the Context of Outbreak Investigations. J Clin Microbiol 54:289-295.
Bennett JS, Jolley KA, Earle SG, Corton C, Bentley SD, Parkhill J, Maiden MCJ (2012). A genomic approach to bacterial taxonomy: an examination and proposed reclassification of species within the genus Neisseria. Microbiology 158:1570-1580.
Biek R, Pybus OG, Lloyd-Smith JO, Didelot X (2015). Measurably evolving pathogens in the genomic era.
Trends in Ecology & Evolution 30:306-313.
Blouin Y, Cazajous G, Dehan C, Soler C, Vong R, Hassan MO, Hauck Y, Boulais C, Andriamanantena D, Martinaud C, Martin É, Pourcel C, Vergnaud G (2014). Progenitor "Mycobacterium canettii" clone responsible for lymph node tuberculosis epidemic, Djibouti. Emerging Infectious Diseases 20:21-28.
Bos KI, Harkins KM, Herbig A, Coscolla M, Weber N, Comas I, Forrest SA, Bryant JM, Harris SR, Schuenemann VJ (2014). Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature 514:494-497.
Bos KI, Herbig A, Sahl J, Waglechner N, Fourment M, Forrest SA, Klunk J, Schuenemann VJ, Poinar D, Kuch M, Golding GB, Dutour O, Keim P, Wagner DM, Holmes EC, Krause J, Poinar HN (2016). Eighteenth century Yersinia pestis genomes reveal the long-term persistence of an historical plague focus. eLife Sciences 5:e12994.
Bryant J, Schurch A, van Deutekom H, Harris S, de Beer J, de Jager V, Kremer K, van Hijum S, Siezen R, Borgdorff M, Bentley S, Parkhill J, van Soolingen D (2013a). Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data. BMC Infectious Diseases 13:110.
Bryant JM, Grogono DM, Greaves D, Foweraker J, Roddick I, Inns T, Reacher M, Haworth CS, Curran MD, Harris SR, Peacock SJ, Parkhill J, Floto RA (2013b). Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study. The Lancet 381:1551-1560.
Brzuszkiewicz E, Thürmer A, Schuldes J, Leimbach A, Liesegang H, Meyer FD, Boelter J, Petersen H, Gottschalk G, Daniel R (2011). Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC). Archives of Microbiology 193:883-891.
Cao G, Meng J, Strain E, Stones R, Pettengill J, Zhao S, McDermott P, Brown E, Allard M (2013).
Phylogenetics and differentiation of Salmonella Newport lineages by Whole Genome Sequencing. PLoS ONE 8:e55687.
Casali N, Nikolayevskyy V, Balabanova Y, Harris SR, Ignatyeva O, Kontsevaya I, Corander J, Bryant J, Parkhill J, Nejentsev S, Horstmann RD, Brown T, Drobniewski F (2014). Evolution and transmission of drug resistant tuberculosis in a Russian population. Nat Genet 46:279-286.
Casali N, Nikolayevskyy V, Balabanova Y, Ignatyeva O, Kontsevaya I, Harris SR, Bentley SD, Parkhill J, Nejentsev S, Hoffner SE, Horstmann RD, Brown T, Drobniewski F (2012). Microevolution of extensively drug-resistant tuberculosis in Russia. Genome Research 22:735-745.
Casey G, Conti D, Haile R, Duggan D (2013). Next generation sequencing and a new era of medicine. Gut 62:920-932.
Chewapreecha C, Harris SR, Croucher NJ, Turner C, Marttinen P, Cheng L, Pessia A, Aanensen DM, Mather AE, Page AJ, Salter SJ, Harris D, Nosten F, Goldblatt D, Corander J, Parkhill J, Turner P, Bentley SD (2014). Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet 46:305-309.
Chin CS, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, Paxinos EE, Yamaichi Y, Calderwood SB, Mekalanos JJ, Schadt EE, Waldor MK (2011). The origin of the Haitian cholera outbreak strain. New England Journal of Medicine 364:33-42.
Cody AJ, McCarthy ND, Jansen van Rensburg M, Isinkaye T, Bentley SD, Parkhill J, Dingle KE, Bowler ICJW, Jolley KA, Maiden MCJ (2013). Real-time genomic epidemiological evaluation of human Campylobacter isolates by use of whole-genome multilocus sequence typing. J Clin Microbiol 51:2526-2534.
Comas I, Borrell S, Roetzer A, Rose G, Malla B, Kato-Maeda M, Galagan J, Niemann S, Gagneux S (2012). Whole-genome sequencing of rifampicin-resistant Mycobacterium tuberculosis strains identifies compensatory mutations in RNA polymerase genes. Nat Genet 44:106–10.
Comas I, Coscolla M, Luo T, Borrell S, Holt KE, Kato-Maeda M, Parkhill J, Malla B, Berg S, Thwaites G, Yeboah-Manu D, Bothamley G, Mei J, Wei L, Bentley S, Harris SR, Niemann S, Diel R, Aseffa A, Gao Q, Young D, Gagneux S (2013). Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat Genet 45:1176-1182.
Corander J, Tang J (2007). Bayesian analysis of population structure based on linked molecular information. Mathematical Biosciences 205:19-31.
Correia AM, Ferreira JS, Borges V, Nunes A, Gomes B, Capucho R, Gonçalves J, Antunes DM, Almeida S, Mendes A, Guerreiro M, Sampaio DA, Vieira L, Machado J, Simões MJ, Gonçalves P, Gomes JP (2016). Probable person-to-person transmission of Legionnaires' disease. New England Journal of Medicine 374:497-498.
Coscollá M, Barry PM, Oeltmann JE, Koshinsky H, Shaw T, Cilnis M, Posey J, Rose J, Weber T, Fofanov VY, Gagneux S, Kato-Maeda M, Metcalfe JZ (2015). Genomic Epidemiology of Multidrug-Resistant Mycobacterium tuberculosis During Transcontinental Spread. Journal of Infectious Diseases.
Coscollá M, Fenollar J, Escribano I, González-Candelas F (2010). Legionellosis outbreak associated with asphalt paving machine, Spain, 2009. Emerging Infectious Diseases 16:1381-1387.
Cowley D, Govender D, February B, Wolfe M, Steyn L, Evans J, Wilkinson RJ, Nicol MP (2008). Recent and Rapid Emergence of W-Beijing Strains of Mycobacterium tuberculosis in Cape Town, South Africa. Clinical Infectious Diseases 47:1252–9.
Croucher NJ, Finkelstein JA, Pelton SI, Mitchell PK, Lee GM, Parkhill J, Bentley SD, Hanage WP, Lipsitch M (2013). Population genomics of post-vaccine changes in pneumococcal epidemiology. Nature Genetics 45:656-663.
Croucher NJ, Harris SR, Fraser C, Quail MA, Burton J, van der Linden M, McGee L, von Gottberg A, Song JH, Ko KS, Pichon B, Baker S, Parry CM, Lambertsen LM, Shahinas D, Pillai DR, Mitchell TJ, Dougan G, Tomasz A, Klugman KP, Parkhill J, Hanage WP, Bentley SD (2011).
Rapid pneumococcal evolution in response to clinical interventions. Science 331:430-434.
Cui Y, Yu C, Yan Y, Li D, Li Y, Jombart T, Weinert LA, Wang Z, Guo Z, Xu L, Zhang Y, Zheng H, Qin N, Xiao X, Wu M, Wang X, Zhou D, Qi Z, Du Z, Wu H, Yang X, Cao H, Wang H, Wang J, Yao S, Rakin A, Li Y, Falush D, Balloux F, Achtman M, Song Y, Wang J, Yang R (2013). Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis. Proceedings of the National Academy of Sciences USA 110:577-582.
Davidson RM, Hasan NA, de Moura VCN, Duarte RS, Jackson M, Strong M (2013). Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains. Infection, Genetics and Evolution 20:292-297.
de Been M, Pinholt M, Top J, Bletz S, Mellmann A, van Schaik W, Brouwer E, Rogers M, Kraat Y, Bonten M, Corander J, Westh H, Harmsen D, Willems RJL (2015). Core genome multilocus sequence typing scheme for high-resolution typing of Enterococcus faecium. J Clin Microbiol 53:3788-3797.
De Jong BC, Antonio M, Gagneux S (2010). Mycobacterium africanum - Review of an important cause of human tuberculosis in West Africa. PLoS Negl Trop Dis 4:e744.
Devault AM, Golding GB, Waglechner N, Enk JM, Kuch M, Tien JH, Shi M, Fisman DN, Dhody AN, Forrest S, Bos KI, Earn DJD, Holmes EC, Poinar HN (2014). Second-pandemic strain of Vibrio cholerae from the Philadelphia cholera outbreak of 1849. New England Journal of Medicine 370:334-340.
Didelot X, Eyre D, Cule M, Ip C, Ansari A, Griffiths D, Vaughan A, O'Connor L, Golubchik T, Batty E, Piazza P, Wilson D, Bowden R, Donnelly P, Dingle K, Wilcox M, Walker S, Crook D, Peto T, Harding R (2012a). Microevolutionary analysis of Clostridium difficile genomes to investigate transmission. Genome Biology 13:R118.
Didelot X, Maiden MCJ (2010). Impact of recombination on bacterial evolution. Trends in Microbiology 18:315-322.
Didelot X, Meric G, Falush D, Darling A (2012b).
Impact of homologous and non-homologous recombination in the genomic evolution of Escherichia coli. BMC Genomics 13:256.
Didelot X, Walker AS, Peto TE, Crook DW, Wilson DJ (2016). Within-host evolution of bacterial pathogens. Nat Rev Micro 14:150-162.
Doolittle WF (1998). You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 14:307-311.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology 4:e88.
Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG (2003). Measurably evolving populations. Trends in Ecology & Evolution 18:481-488.
Ekblom R, Wolf JBW (2014). A field guide to whole-genome sequencing, assembly and annotation. Evol Appl 7:1026-1042.
Engelthaler DM, Chiller T, Schupp JA, Colvin J, Beckstrom-Sternberg SM, Driebe EM, Moses T, Tembe W, Sinari S, Beckstrom-Sternberg JS, Christoforides A, Pearson JV, Capten J, Keim P, Peterson A, Tersahita D, Arunmozhi B (2011). Next-generation sequencing of Coccidioides immitis isolated during cluster investigation. Emerging Infectious Diseases 17:227-232.
Espedido BA, Steen JA, Ziochos H, Grimmond SM, Cooper MA, Gosbell IB, van Hal SJ, Jensen SO (2013). Whole Genome Sequence Analysis of the First Australian OXA-48-Producing Outbreak-Associated Klebsiella pneumoniae Isolates: The Resistome and in Vivo evolution. PLoS ONE 8:e59920.
Eyre DW, Cule ML, Griffiths D, Crook DW, Peto TEA, Walker AS, Wilson DJ (2013a). Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput Biol 9:e1003059.
Eyre DW, Cule ML, Wilson DJ, Griffiths D, Vaughan A, O'Connor L, Ip CLC, Golubchik T, Batty EM, Finney JM, Wyllie DH, Didelot X, Piazza P, Bowden R, Dingle KE, Harding RM, Crook DW, Wilcox MH, Peto TEA, Walker AS (2013b). Diverse sources of C. difficile infection identified on whole-genome sequencing.
New England Journal of Medicine 369:1195-1205.
Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, Ip CL, Wilson DJ, Didelot X, O'Connor L (2012). A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open 2:e001124.
Farhat MR, Shapiro BJ, Kieser KJ, Sultana R, Jacobson KR, Victor TC, et al. (2013). Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nature Genetics 45:1183–1189.
Feil EJ, Holmes EC, Bessen DE, Chan MS, Day NPJ, Enright MC, Goldstein R, Hood DW, Kalia A, Moore CE, Zhou J, Spratt BG (2001). Recombination within natural populations of pathogenic bacteria: Short-term empirical estimates and long-term phylogenetic consequences. Proceedings of the National Academy of Sciences USA 98:182-187.
Feil EJ, Li BC, Aanensen DM, Hanage WP, Spratt BG (2004). eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from Multilocus Sequence Typing data. J Bacteriol 186:1518-1530.
Fields BS, Benson RF, Besser RE (2002). Legionella and Legionnaires' Disease: 25 Years of Investigation. Clinical Microbiology Reviews 15:506-526.
Fittipaldi N, Tyrrell GJ, Low DE, Martin I, Lin D, Hari KL, Musser JM (2013). Integrated whole-genome sequencing and temporospatial analysis of a continuing Group A Streptococcus epidemic. Emerg Microbes Infect 2:e13.
Fitzpatrick MA, Ozer EA, Hauser AR (2016). Utility of whole-genome sequencing in characterizing Acinetobacter epidemiology and analyzing hospital outbreaks. J Clin Microbiol 54:593-612.
Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, Griffin J, Baggaley RF, Jenkins HE, Lyons EJ (2009). Pandemic potential of a strain of influenza A (H1N1): early findings. Science 324:1557-1561.
Fraser DW, Tsai TR, Orenstein W, Parkin WE, Beecham HJ, Sharrar RG, Harris J, Mallison GF, Martin SM, McDade JE, Shepard CC, Brachman PS (1977). Legionnaires' disease: description of an epidemic of pneumonia. N Engl J Med 297:1189-1197.
Gaia V, Fry NK, Afshar B, Luck PC, Meugnier H, Etienne J, Peduzzi R, Harrison TJ (2005). Consensus sequence-based scheme for epidemiological typing of clinical and environmental isolates of Legionella pneumophila. J Clin Microbiol 43:2047-2052.
Gaia V, Fry NK, Harrison TJ, Peduzzi R (2003). Sequence-based typing of Legionella pneumophila serogroup 1 offers the potential for true portability in legionellosis outbreak investigation. J Clin Microbiol 41:2932-2939.
Gardy JL, Johnston JC, Sui SJH, Cook VJ, Shah L, Brodkin E, Rempel S, Moore R, Zhao Y, Holt R, Varhol R, Birol I, Lem M, Sharma MK, Elwood K, Jones SJM, Brinkman FSL, Brunham RC, Tang P (2011). Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. New England Journal of Medicine 364:730-739.
General Directorate of Epidemiology MoHM, Pan American Health Organization, World Health Organization, Public Health Agency of Canada, CDC (United States) (2009). Outbreak of swine-origin Influenza A (H1N1) virus infection - Mexico, March-April 2009. Morbidity and Mortality Weekly Report 58:467-470.
Ghai R, Martin-Cuadrado AB, Molto AG, Heredia IG, Cabrera R, Martin J, Verdu M, Deschamps P, Moreira D, Lopez-Garcia P, Mira A, Rodriguez-Valera F (2010). Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing. ISME J 4:1154-1166.
Gilmour M, Graham M, Van Domselaar G, Tyler S, Kent H, Trout-Yakel K, Larios O, Allen V, Lee B, Nadon C (2010). High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics 11:120.
Grad YH, Godfrey P, Cerquiera GC, Mariani-Kurkdjian P, Gouali M, Bingen E, Shea TP, Haas BJ, Griggs A, Young S, Zeng Q, Lipsitch M, Waldor MK, Weill FX, Wortman JR, Hanage WP (2013). Comparative genomics of recent shiga toxin-producing Escherichia coli O104:H4: short-term evolution of an emerging pathogen. mBio 4.
Grad YH, Kirkcaldy RD, Trees D, Dordel J, Harris SR, Goldstein E, Weinstock H, Parkhill J, Hanage WP, Bentley S, Lipsitch M (2014). Genomic epidemiology of Neisseria gonorrhoeae with reduced susceptibility to cefixime in the USA: a retrospective observational study. The Lancet Infectious Diseases 14:220-226.
Grad YH, Lipsitch M, Feldgarden M, Arachchi HM, Cerqueira GC, FitzGerald M, Godfrey P, Haas BJ, Murphy CI, Russ C (2012). Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proceedings of the National Academy of Sciences USA 109:3065-3070.
Guerra-Assunção JA, Crampin AC, Houben RMGJ, Mzembe T, Mallard K, Coll F, Khan P, Banda L, Chiwaya A, Pereira RPA, McNerney R, Fine PEM, Parkhill J, Clark TG, Glynn JR (2015). Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. eLife Sciences 4:e05166.
Harris SR, Cartwright EJ, Török ME, Holden MT, Brown NM, Ogilvy-Stuart AL, Ellington MJ, Quail MA, Bentley SD, Parkhill J, Peacock SJ (2013). Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. The Lancet Infectious Diseases 13:130-136.
Harris SR, Clarke IN, Seth-Smith HMB, Solomon AW, Cutcliffe LT, Marsh P, Skilton RJ, Holland MJ, Mabey D, Peeling RW, Lewis DA, Spratt BG, Unemo M, Persson K, Bjartling C, Brunham R, de Vries HJC, Morre SA, Speksnijder A, Bebear CM, Clerc M, de Barbeyrac B, Parkhill J, Thomson NR (2012). Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing. Nat Genet 44:413-419.
Harris SR, Feil EJ, Holden MTG, Quail MA, Nickerson EK, Chantratita N, Gardete S, Tavares A, Day N, Lindsay JA, Edgeworth JD, de Lencastre H, Parkhill J, Peacock SJ, Bentley SD (2010). Evolution of MRSA during hospital transmission and intercontinental spread. Science 327:469-474.
Hasan NA, Choi SY, Eppinger M, Clark PW, Chen A, Alam M, Haley BJ, Taviani E, Hine E, Su Q, Tallon LJ, Prosper JB, Furth K, Hoq MM, Li H, Fraser-Liggett CM, Cravioto A, Huq A, Ravel J, Cebula TA, Colwell RR (2012). Genomic diversity of 2010 Haitian cholera outbreak strains. PNAS 109:E2010-E2017.
He M, Miyajima F, Roberts P, Ellison L, Pickard DJ, Martin MJ, Connor TR, Harris SR, Fairley D, Bamford KB, D'Arc S, Brazier J, Brown D, Coia JE, Douce G, Gerding D, Kim HJ, Koh TH, Kato H, Senoh M, Louie T, Michell S, Butt E, Peacock SJ, Brown NM, Riley T, Songer G, Wilcox M, Pirmohamed M, Kuijper E, Hawkey P, Wren BW, Dougan G, Parkhill J, Lawley TD (2013). Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet 45:109-113.
Hendriksen RS, Price LB, Schupp JM, Gillece JD, Kaas RS, Engelthaler DM, Bortolaia V, Pearson T, Waters AE, Upadhyay BP (2011). Population genetics of Vibrio cholerae from Nepal in 2010: evidence on the origin of the Haitian outbreak. mBio 2:e00157-11.
Holden MTG, Hauser H, Sanders M, Ngo TH, Cherevach I, Cronin A, Goodhead I, Mungall K, Quail MA, Price C, Rabbinowitsch E, Sharp S, Croucher NJ, Chieu TB, Thi Hoang Mai N, Diep TS, Chinh NT, Kehoe M, Leigh JA, Ward PN, Dowson CG, Whatmore AM, Chanter N, Iversen P, Gottschalk M, Slater JD, Smith HE, Spratt BG, Xu J, Ye C, Bentley S, Barrell BG, Schultsz C, Maskell DJ, Parkhill J (2009). Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis. PLoS ONE 4:e6072.
Holden MTG, Hsu LY, Kurt K, Weinert LA, Mather AE, Harris SR, Strommenger B, Layer F, Witte W, de Lencastre H, Skov R, Westh H, Zemlickova H, Coombs G, Kearns AM, Hill RLR, Edgeworth J, Gould I, Gant V, Cooke J, Edwards GF, McAdam PR, Templeton KE, McCann A, Zhou Z, Castillo-Ramirez S, Feil EJ, Hudson LO, Enright MC, Balloux F, Aanensen DM, Spratt BG, Fitzgerald JR, Parkhill J, Achtman M, Bentley SD, Nübel U (2013). A genomic portrait of the emergence, evolution and global spread of a methicillin resistant Staphylococcus aureus pandemic. Genome Research 23:653-664.
Holmes A, Allison L, Ward M, Dallman TJ, Clark R, Fawkes A, Murphy L, Hanson M (2015). Utility of whole-genome sequencing of Escherichia coli O157 for outbreak detection and epidemiological surveillance. J Clin Microbiol 53:3565-3573.
Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, Sangal V, Brown DJ, Coia JE, Kim DW, Choi SY, Kim SH, da Silveira WD, Pickard DJ, Farrar JJ, Parkhill J, Dougan G, Thomson NR (2012). Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nat Genet 44:1056-1059.
Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G (2008). High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet 40:987-993.
Holt KE, Thieu Nga TV, Thanh DP, Vinh H, Kim DW, Vu Tra MP, Campbell JI, Hoang NVM, Vinh NT, Minh PV, Thuy CT, Nga TTT, Thompson C, Dung TTN, Nhu NTK, Vinh PV, Tuyet PTN, Phuc HL, Lien NTN, Phu BD, Ai NTT, Tien NM, Dong N, Parry CM, Hien TT, Farrar JJ, Parkhill J, Dougan G, Thomson NR, Baker S (2013). Tracking the establishment of local endemic populations of an emergent enteric pathogen. Proceedings of the National Academy of Sciences USA 110:17522-17527.
Hornsey M, Loman N, Wareham DW, Ellington MJ, Pallen MJ, Turton JF, Underwood A, Gaulton T, Thomas CP, Doumith M (2011).
Whole-genome comparison of two Acinetobacter baumannii isolates from a single patient, where resistance developed during tigecycline therapy. J Antimicrob Chemother 66:1499-1503.
Ioerger TR, Feng Y, Chen X, Dobos KM, Victor TC, Streicher EM, Warren RM, van Pittius NCG, Helden PD, Sacchettini JC (2010). The non-clonality of drug resistance in Beijing-genotype isolates of Mycobacterium tuberculosis from the Western Cape of South Africa. BMC Genomics 11:1.
Jolley KA, Hill DMC, Bratcher HB, Harrison OB, Feavers IM, Parkhill J, Maiden MCJ (2012). Resolution of a Meningococcal Disease Outbreak from Whole-Genome Sequence Data with Rapid Web-Based Analysis Methods. J Clin Microbiol 50:3046-3053.
Ju W, Cao G, Rump L, Strain E, Luo Y, Timme R, Allard M, Zhao S, Brown E, Meng J (2012). Phylogenetic analysis of non-O157 Shiga toxin-producing Escherichia coli by whole genome sequencing. J Clin Microbiol.
Kanamori H, Parobek CM, Weber DJ, van Duin D, Rutala WA, Cairns BA, Juliano JJ (2016). Next-generation sequencing and comparative analysis of sequential outbreaks caused by multidrug-resistant Acinetobacter baumannii at a large academic burn center. Antimicrob Agents Chemother 60:1249-1257.
Kato-Maeda M, Ho C, Passarelli B, Banaei N, Grinsdale J, Flores L, Anderson J, Murray M, Rose G, Kawamura LM, Pourmand N, Tariq MA, Gagneux S, Hopewell PC (2013). Use of whole genome sequencing to determine the microevolution of Mycobacterium tuberculosis during an outbreak. PLoS ONE 8:e58235.
Kennemann L, Didelot X, Aebischer T, Kuhn S, Drescher B, Droege M, Reinhardt R, Correa P, Meyer TF, Josenhans C, Falush D, Suerbaum S (2011). Helicobacter pylori genome evolution during human infection. Proceedings of the National Academy of Sciences USA 108:5033-5038.
Kinnevey PM, Shore AC, Mac Aogáin M, Creamer E, Brennan GI, Humphreys H, Rogers TR, O'Connell B, Coleman DC (2016).
Enhanced Tracking of Nosocomial Transmission of Endemic Sequence Type 22 Methicillin-Resistant Staphylococcus aureus Type IV Isolates among Patients and Environmental Sites by Use of Whole-Genome Sequencing. J Clin Microbiol 54:445-448.
Knetsch CW, Connor TR, Mutreja A, van Dorp SM, Sanders IM, Browne HP, Harris D, Lipman L, Keessen EC, Corver J (2014). Whole genome sequencing reveals potential spread of Clostridium difficile between humans and farm animals in the Netherlands, 2002 to 2011. EuroSurveillance 19:30-41.
Köser CU, Holden MTG, Ellington MJ, Cartwright EJP, Brown NM, Ogilvy-Stuart AL, Hsu LY, Chewapreecha C, Croucher NJ, Harris SR (2012). Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak. New England Journal of Medicine 366:2267-2275.
Köser CU, Bryant JM, Becq J, Török ME, Ellington MJ, Marti-Renom MA, Carmichael AJ, Parkhill J, Smith GP, Peacock SJ (2013). Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. New England Journal of Medicine 369:290-292.
Kwong JC, Mercoulia K, Tomita T, Easton M, Li HY, Bulach DM, Stinear TP, Seemann T, Howden BP (2016). Prospective whole-genome sequencing enhances national surveillance of Listeria monocytogenes. J Clin Microbiol 54:333-342.
Lee RS, Radomski N, Proulx JF, Manry J, McIntosh F, Desjardins F, Soualhine H, Domenech P, Reed MB, Menzies D, Behr MA (2015). Reemergence and amplification of tuberculosis in the Canadian Arctic. Journal of Infectious Diseases 211:1905-1914.
Lévesque S, Plante PL, Mendis N, Cantin P, Marchand G, Charest H, Raymond F, Huot C, Goupil-Sormany I, Desbiens F (2014). Genomic characterization of a large outbreak of Legionella pneumophila serogroup 1 strains in Quebec City, 2012. PLoS ONE 9:e103852.
Lewis T, Loman NJ, Bingle L, Jumaa P, Weinstock GM, Mortiboy D, Pallen MJ (2010). High-throughput whole-genome sequencing to dissect the epidemiology of Acinetobacter baumannii isolates from a hospital outbreak.
Journal of Hospital Infection 75:37-41.
Lienau EK, Strain E, Wang C, Zheng J, Ottesen AR, Keys CE, Hammack TS, Musser SM, Brown EW, Allard MW, Cao G, Meng J, Stones R (2011). Identification of a salmonellosis outbreak by means of molecular sequencing. New England Journal of Medicine 364:981-982.
Liu Q, Via LE, Luo T, Liang L, Liu X, Wu S, Shen Q, Wei W, Ruan X, Yuan X, Zhang G, Barry CE, Gao Q (2015). Within patient microevolution of Mycobacterium tuberculosis correlates with heterogeneous responses to treatment. Scientific Reports 5:17507.
Loman NJ, Constantinidou C (2013). A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of shiga-toxigenic Escherichia coli O104:H4. JAMA 309:1502-1510.
Loman NJ, Gladstone RA, Constantinidou C, Tocheva AS, Jefferies JM, Faust SN, O'Connor L, Chan J, Pallen MJ, Clarke SC (2013). Clonal expansion within pneumococcal serotype 6C after use of seven-valent vaccine. PLoS ONE 8:e64731.
Loman NJ, Pallen MJ (2015). Twenty years of bacterial genome sequencing. Nat Rev Micro advance online publication.
Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG (1998). Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences USA 95:3140-3145.
Maixner F, Krause-Kyora B, Turaev D, Herbig A, Hoopmann MR, Hallows JL, Kusebauch U, Vigl EE, Malfertheiner P, Megraud F, O'Sullivan N, Cipollini G, Coia V, Samadelli M, Engstrand L, Linz B, Moritz RL, Grimm R, Krause J, Nebel A, Moodley Y, Rattei T, Zink A (2016). The 5300-year-old Helicobacter pylori genome of the Iceman. Science 351:162-165.
Mathers AJ, Peirano G, Pitout JDD (2015). The Role of Epidemic Resistance Plasmids and International High-Risk Clones in the Spread of Multidrug-Resistant Enterobacteriaceae. Clinical Microbiology Reviews 28:565-591.
McAdam P, Vander Broek C, Lindsay D, Ward M, Hanson M, Gillies M, Watson M, Stevens J, Edwards G, Fitzgerald R (2014). Gene flow in environmental Legionella pneumophila leads to genetic and pathogenic heterogeneity within a Legionnaires' disease outbreak. Genome Biology 15:504.
McAdam PR, Templeton KE, Edwards GF, Holden MTG, Feil EJ, Aanensen DM, Bargawi HJA, Spratt BG, Bentley SD, Parkhill J, Enright MC, Holmes A, Girvan EK, Godfrey PA, Feldgarden M, Kearns AM, Rambaut A, Robinson DA, Fitzgerald JR (2012). Molecular tracing of the emergence, adaptation, and transmission of hospital-associated methicillin-resistant Staphylococcus aureus. Proceedings of the National Academy of Sciences USA 109:9107-9112.
McDonnell J, Dallman T, Atkin S, Turbitt DA, Connor TR, Grant KA, Thomson NR, Jenkins C (2013). Retrospective analysis of whole genome sequencing compared to prospective typing data in further informing the epidemiological investigation of an outbreak of Shigella sonnei in the UK. Epidemiology & Infection 141:2568-2575.
Medini D, Serruto D, Parkhill J, Relman DA, Donati C, Moxon R, Falkow S, Rappuoli R (2008). Microbiology in the post-genomic era. Nat Rev Micro 6:419-430.
Mellmann A, Harmsen D, Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski R, Ji Y, Zhang W (2011). Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE 6:e22751.
Mendum T, Schuenemann V, Roffey S, Taylor G, Wu H, Singh P, Tucker K, Hinds J, Cole S, Kierzek A, Nieselt K, Krause J, Stewart G (2014). Mycobacterium leprae genomes from a British medieval leprosy hospital: towards understanding an ancient epidemic. BMC Genomics 15:270.
Metzker ML (2010). Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.
Mortimer PP (2003). Five postulates for resolving outbreaks of infectious disease. Journal of Medical Microbiology 52:447-451.
Mutreja A, Kim DW, Thomson NR, Connor TR, Lee JH, Kariuki S, Croucher NJ, Choi SY, Harris SR, Lebens M, Niyogi SK, Kim EJ, Ramamurthy T, Chun J, Wood JLN, Clemens JD, Czerkinsky C, Nair GB, Holmgren J, Parkhill J, Dougan G (2011). Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477:462-465.
Mwangi MM, Wu SW, Zhou Y, Sieradzki K, de Lencastre H, Richardson P, Bruce D, Rubin E, Myers E, Siggia ED, Tomasz A (2007). Tracking the in vivo evolution of multidrug resistance in Staphylococcus aureus by whole-genome sequencing. Proceedings of the National Academy of Sciences USA 104:9451-9456.
Niemann S, Köser CU, Gagneux S, Plinke C, Homolka S, Bignell H, Carter RJ, Cheetham RK, Cox A, Gormley NA, Kokko-Gonzales P, Murray LJ, Rigatti R, Smith VP, Arends FPM, Cox HS, Smith G, Archer JAC (2009). Genomic diversity among drug sensitive and multidrug resistant isolates of Mycobacterium tuberculosis with identical DNA fingerprints. PLoS ONE 4:e7407.
Nübel U, Nachtnebel M, Falkenhorst G, Benzler J, Hecht J, Kube M, Bröcker F, Moelling K, Bührer C, Gastmeier P, Piening B, Behnke M, Dehnert M, Layer F, Witte W, Eckmanns T (2013). MRSA transmission on a neonatal intensive care unit: Epidemiological and genome-based phylogenetic analyses. PLoS ONE 8:e54898.
Okoro CK, Kingsley RA, Connor TR, Harris SR, Parry CM, Al-Mashhadani MN, Kariuki S, Msefula CL, Gordon MA, de Pinna E (2012). Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nature Genetics 44:1215-1223.
Onori R, Gaiarsa S, Comandatore F, Pongolini S, Brisse S, Colombo A, Cassani G, Marone P, Grossi P, Minoja G, Bandi C, Sassera D, Toniolo A (2015). Tracking nosocomial Klebsiella pneumoniae infections and outbreaks by whole-genome analysis: Small-scale Italian scenario within a single hospital. J Clin Microbiol 53:2861-2868.
Parwati I, van Crevel R, van Soolingen D (2010).
Possible underlying mechanisms for successful emergence of the Mycobacterium tuberculosis Beijing genotype strains. Lancet Infect Dis 10:103-111.
Paterson GK, Harrison EM, Murray GGR, Welch JJ, Warland JH, Holden MTG, Morgan FJE, Ba X, Koop G, Harris SR, Maskell DJ, Peacock SJ, Herrtage ME, Parkhill J, Holmes MA (2015). Capturing the cloud of diversity reveals complexity and heterogeneity of MRSA carriage, infection and transmission. Nat Commun 6:6560.
Pérez-Lago L, Martínez Lirola M, Herranz M, Comas I, Bouza E, García-de-Viedma D (2015). Fast and low-cost decentralized surveillance of transmission of tuberculosis based on strain-specific PCRs tailored from whole genome sequencing data: a pilot study. Clinical Microbiology and Infection 21:249.
Pérez-Lago L, Comas I, Navarro Y, González-Candelas F, Herranz M, Bouza E, García de Viedma D (2014). Whole genome sequencing analysis of intrapatient microevolution in Mycobacterium tuberculosis: Potential impact on the inference of tuberculosis transmission. Journal of Infectious Diseases 209:98-108.
Pinholt M, Larner-Svensson H, Littauer P, Moser CE, Pedersen M, Lemming LE, Ejlertsen T, Söndergaard TS, Holzknecht BJ, Justesen US, Dzajic E, Olsen SS, Nielsen JB, Worning P, Hammerum AM, Westh H, Jakobsen L (2015). Multiple hospital outbreaks of vanA Enterococcus faecium in Denmark, 2012-13, investigated by WGS, MLST and PFGE. J Antimicrob Chemother 70:2474-2482.
Price JR, Golubchik T, Cole K, Wilson DJ, Crook DW, Thwaites GE, Bowden R, Sarah Walker A, Peto TEA, Paul J, Llewelyn MJ (2014). Whole-genome sequencing shows that patient-to-patient transmission rarely accounts for acquisition of Staphylococcus aureus on an intensive care unit. Clin Infect Dis 58:609-618.
Qin T, Zhang W, Liu W, Zhou H, Ren H, Shao Z, Lan R, Xu J (2016). Population structure and minimum core genome typing of Legionella pneumophila. Scientific Reports 6:21356.
Quick J, Ashton P, Calus S, Chatt C, Gossain S, Hawker J, Nair S, Neal K, Nye K, Peters T, de Pinna E, Robinson E, Struthers K, Webber M, Catto A, Dallman T, Hawkey P, Loman N (2015). Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biology 16:114.
Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, Bore JA, Koundouno R, Dudas G, Mikhail A, Ouédraogo N, Afrough B, Bah A, Baum JHJ, Becker-Ziaja B, Boettcher JP, Cabeza-Cabrerizo M, Camino-Sánchez Á, Carter LL, Doerrbecker J, Enkirch T, García-Dorival I, Hetzelt N, Hinzmann J, Holm T, Kafetzopoulou LE, Koropogui M, Kosgey A, Kuisma E, Logue CH, Mazzarelli A, Meisel S, Mertens M, Michel J, Ngabo D, Nitzsche K, Pallasch E, Patrono LV, Portmann J, Repits JG, Rickett NY, Sachse A, Singethan K, Vitoriano I, Yemanaberhan RL, Zekeng EG, Racine T, Bello A, Sall AA, Faye O, Faye O, Magassouba N, Williams CV, Amburgey V, Winona L, Davis E, Gerlach J, Washington F, Monteil V, Jourdain M, Bererd M, Camara A, Somlare H, Camara A, Gerard M, Bado G, Baillet B, Delaune D, Nebie KY, Diarra A, Savane Y, Pallawo RB, Gutierrez GJ, Milhano N, Roger I, Williams CJ, Yattara F, Lewandowski K, Taylor J, Rachwal P, Turner J, Pollakis G, Hiscox JA, Matthews DA, O'Shea MK, Johnston AM, Wilson D, Hutley E, Smit E, Di Caro A, Wölfel R, Stoecker K, Fleischmann E, Gabriel M, Weller SA, Koivogui L, Diallo B, Keïta S, Rambaut A, Formenty P, Günther S, Carroll MW (2016). Real-time, portable genome sequencing for Ebola surveillance. Nature advance online publication.
Rasmussen S, Allentoft ME, Nielsen K, Orlando L, Sikora M, Sjögren KG, Pedersen AG, Schubert M, Van Dam A, Kapel CMO, Nielsen HB, Brunak S, Avetisyan P, Epimakhov A, Khalyapin MV, Gnuni A, Kriiska A, Lasak I, Metspalu M, Moiseyev V, Gromov A, Pokutta D, Saag L, Varul L, Yepiskoposyan L, Sicheritz-Pontén T, Foley RA, Lahr MM, Nielsen R, Kristiansen K, Willerslev E (2015).
Early divergent strains of Yersinia pestis in Eurasia 5,000 years ago. Cell 163:571-582.
Read TD, Salzberg SL, Pop M, Shumway M, Umayam L, Jiang L, Holtzapple E, Busch JD, Smith KL, Schupp JM, Solomon D, Keim P, Fraser CM (2002). Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296:2028-2033.
Reuter S, Ellington MJ, Cartwright EP (2013a). Rapid bacterial whole-genome sequencing to enhance diagnostic and public health microbiology. JAMA Internal Medicine 173:1397-1404.
Reuter S, Harrison TG, Köser CU, Ellington MJ, Smith GP, Parkhill J, Peacock SJ, Bentley SD, Török ME (2013b). A pilot study of rapid whole-genome sequencing for the investigation of a Legionella outbreak. BMJ Open 3.
Reuter S, Török ME, Holden MTG, Reynolds R, Raven KE, Blane B, Donker T, Bentley SD, Aanensen DM, Grundmann H, Feil EJ, Spratt BG, Parkhill J, Peacock SJ (2016). Building a genomic framework for prospective MRSA surveillance in the United Kingdom and the Republic of Ireland. Genome Research 26:263-270.
Rinder H, Mieskes KT, Löscher T (2001). Heteroresistance in Mycobacterium tuberculosis. The International Journal of Tuberculosis and Lung Disease 5:339-345.
Roetzer A, Diel R, Kohl TA, Rückert C, Nübel U, Blom J, Wirth T, Jaenicke S, Schuback S, Rüsch-Gerdes S, Supply P, Kalinowski J, Niemann S (2013). Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: A longitudinal molecular epidemiological study. PLoS Med 10:e1001387.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW (2002). Genetic structure of human populations. Science 298:2381-2385.
Sánchez-Busó L, Comas I, Jorques G, González-Candelas F (2014). Recombination drives genome evolution in outbreak-related Legionella pneumophila isolates. Nature Genetics 46:1205-1211.
Sánchez-Busó L, Guiral S, Crespi S, Moya V, Camaró ML, Olmos P, Adrián F, Morera V, González Morán F, Vanaclocha H, González-Candelas F (2016). Genomic investigation of a legionellosis outbreak in a persistently colonized hotel. Frontiers in Microbiology 6:1556.
Sandegren L, Groenheit R, Koivula T, Ghebremichael S, Advani A, Castro E, Pennhag A, Hoffner S, Mazurek J, Pawlowski A, Kan B, Bruchfeld J, Melefors Ö, Källenius G (2011). Genomic Stability over 9 Years of an Isoniazid Resistant Mycobacterium tuberculosis Outbreak Strain in Sweden. PLoS ONE 6:e16647.
Schlüter A, Bekel T, Diaz NN, Dondrup M, Eichenlaub R, Gartemann KH, Krahn I, Krause L, Krömeke H, Kruse O, Mussgnug JH, Neuweger H, Niehaus K, Pühler A, Runte KJ, Szczepanowski R, Tauch A, Tilker A, Viehöver P, Goesmann A (2008a). The metagenome of a biogas-producing microbial community of a production-scale biogas plant fermenter analysed by the 454-pyrosequencing technology. Journal of Biotechnology 136:77-90.
Schlüter A, Krause L, Szczepanowski R, Goesmann A, Pühler A (2008b). Genetic diversity and composition of a plasmid metagenome from a wastewater treatment plant. Journal of Biotechnology 136:65-76.
Schmid D, Allerberger F, Huhulescu S, Pietzka A, Amar C, Kleta S, Prager R, Preussel K, Aichinger E, Mellmann A (2014). Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011-2013. Clinical Microbiology and Infection 20:431-436.
Schuenemann VJ, Singh P, Mendum TA, Krause-Kyora B, Jäger G, Bos KI, Herbig A, Economou C, Benjak A, Busso P, Nebel A, Boldsen JL, Kjellström A, Wu H, Stewart GR, Taylor GM, Bauer P, Lee OYC, Wu HHT, Minnikin DE, Besra GS, Tucker K, Roffey S, Sow SO, Cole ST, Nieselt K, Krause J (2013). Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341:179-183.
Schürch AC, Kremer K, Kiers A, Daviena O, Boeree MJ, Siezen RJ, Smith NH, van Soolingen D (2010).
The tempo and mode of molecular evolution of Mycobacterium tuberculosis at patient-to-patient scale. Infection, Genetics and Evolution 10:108-114.
Senn L, Clerc O, Zanetti G, Basset P, Prod'hom G, Gordon NC, Sheppard AE, Crook DW, James R, Thorpe HA, Feil EJ, Blanc DS (2016). The Stealthy Superbug: the Role of Asymptomatic Enteric Carriage in Maintaining a Long-Term Hospital Outbreak of ST228 Methicillin-Resistant Staphylococcus aureus. mBio 7.
Seth-Smith HMB, Harris SR, Skilton RJ, Radebe FM, Golparian D, Shipitsyna E, Duy PT, Scott P, Cutcliffe LT, O'Neill C, Parmar S, Pitt R, Baker S, Ison CA, Marsh P, Jalal H, Lewis DA, Unemo M, Clarke IN, Parkhill J, Thomson NR (2013). Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture. Genome Research 23:855-866.
Shah MA, Mutreja A, Thomson N, Baker S, Parkhill J, Dougan G, Bokhari H, Wren BW (2014). Genomic epidemiology of Vibrio cholerae O1 associated with floods, Pakistan, 2010. Emerging Infectious Diseases 20:13-20.
Smit PW, Vasankari T, Aaltonen H, Haanperä M, Casali N, Marttila H, Marttila J, Ojanen P, Ruohola A, Ruutu P, Drobniewski F, Lyytikäinen O, Soini H (2015). Enhanced tuberculosis outbreak investigation using whole genome sequencing and IGRA. Eur Respir J 45:276-279.
Snitkin ES, Zelazny AM, Thomas PJ, Stock F, NISC Comparative Sequencing Program, Henderson DK, Palmore TN, Segre JA (2012). Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Science Translational Medicine 4:148ra116.
Snyder LA, Loman NJ, Faraj LA, Levi K, Weinstock G, Boswell TC, Pallen MJ, Ala'Aldeen A (2013). Epidemiological investigation of Pseudomonas aeruginosa isolates from a six-year-long hospital outbreak using high-throughput whole genome sequencing. EuroSurveillance 18:20611.
Stucki D, Ballif M, Bodmer T, Coscolla M, Maurer AM, Droz S, Butz C, Borrell S, Längle C, Feldmann J, Furrer H, Mordasini C, Helbling P, Rieder HL, Egger M, Gagneux S, Fenner L (2015). Tracking a tuberculosis outbreak over 21 years: strain-specific single-nucleotide polymorphism typing combined with targeted whole-genome sequencing. Journal of Infectious Diseases 211:1306-1316.
Sun G, Luo T, Yang C, Dong X, Li J, Zhu Y, Zheng H, Tian W, Wang S, Barry CE, Mei J, Gao Q (2012). Dynamic population changes in Mycobacterium tuberculosis during acquisition and fixation of drug resistance in patients. Journal of Infectious Diseases 206:1724-1733.
Taylor AJ, Lappi V, Wolfgang WJ, Lapierre P, Palumbo MJ, Medus C, Boxrud D (2015). Characterization of foodborne outbreaks of Salmonella enterica serovar Enteritidis with whole-genome sequencing single nucleotide polymorphism-based analysis for surveillance and outbreak detection. J Clin Microbiol 53:3334-3340.
Török ME, Peacock SJ (2012). Rapid whole-genome sequencing of bacterial pathogens in the clinical microbiology laboratory - pipe dream or reality? J Antimicrob Chemother 67:2307-2308.
Török ME, Reuter S, Bryant J, Köser CU, Stinchcombe SV, Nazareth B, Ellington MJ, Bentley SD, Smith GP, Parkhill J, Peacock SJ (2013). Rapid whole-genome sequencing for investigation of a suspected tuberculosis outbreak. J Clin Microbiol 51:611-614.
Underwood A, Jones G, Mentasti M, Fry N, Harrison T (2013a). Comparison of the Legionella pneumophila population structure as determined by sequence-based typing and whole genome sequencing. BMC Microbiology 13:302.
Underwood AP, Dallman T, Thomson NR, Williams M, Harker K, Perry N, Adak B, Willshaw G, Cheasty T, Green J (2013b). Public health value of next-generation DNA sequencing of enterohemorrhagic Escherichia coli isolates from an outbreak. J Clin Microbiol 51:232-237.
Urwin R, Maiden MCJ (2003). Multi-locus sequence typing: a tool for global epidemiology. Trends in Microbiology 11:479-487.
Vos M, Didelot X (2009). A comparison of homologous recombination rates in bacteria and archaea. ISME J 3:199-208.
Wagner DM, Klunk J, Harbeck M, Devault A, Waglechner N, Sahl JW, Enk J, Birdsell DN, Kuch M, Lumibao C, Poinar D, Pearson T, Fourment M, Golding B, Riehm JM, Earn DJD, DeWitte S, Rouillard JM, Grupe G, Wiechmann I, Bliska JB, Keim PS, Scholz HC, Holmes EC, Poinar H (2014). Yersinia pestis and the Plague of Justinian 541-543 AD: a genomic analysis. The Lancet Infectious Diseases 14:319-326.
Walker MJ, Beatson SA (2012). Outsmarting Outbreaks. Science 338:1161-1162.
Walker TM, Monk P, Grace Smith E, Peto TEA (2013a). Contact investigations for outbreaks of Mycobacterium tuberculosis: advances through whole genome sequencing. Clinical Microbiology and Infection 19:796-802.
Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ, Hawkey PM, Crook DW, Parkhill J, Harris D, Walker AS, Bowden R, Monk P, Smith EG, Peto TE (2013b). Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. The Lancet Infectious Diseases 13:137-146.
Walker TM, Lalor MK, Broda A, Ortega LS, Morgan M, Parker L, Churchill S, Bennett K, Golubchik T, Giess AP, Del Ojo Elias C, Jeffery KJ, Bowler ICJW, Laurenson IF, Barrett A, Drobniewski F, McCarthy ND, Anderson LF, Abubakar I, Thomas HL, Monk P, Smith EG, Walker AS, Crook DW, Peto TEA, Conlon CP (2014). Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. Lancet Respir Med 2:285-292.
Ward MJ, Gibbons CL, McAdam PR, van Bunnik BAD, Girvan EK, Edwards GF, Fitzgerald JR, Woolhouse MEJ (2014). Time-scaled evolutionary analysis of the transmission and antibiotic resistance dynamics of Staphylococcus aureus clonal complex 398. Appl Environ Microbiol 80:7275-7282.
WHO (2014). Global tuberculosis report, 2014.
Witney AA, Gould KA, Pope CF, Bolt F, Stoker NG, Cubbon MD, Bradley CR, Fraise A, Breathnach AS, Butcher PD, Planche TD, Hinds J (2014). Genome sequencing and characterization of an XDR ST111 serotype O12 hospital outbreak strain of Pseudomonas aeruginosa. Clinical Microbiology and Infection.
Worby CJ, Lipsitch M, Hanage WP (2014). Within-host bacterial diversity hinders accurate reconstruction of transmission networks from genomic distance data. PLoS Comput Biol 10:e1003549.
Yates TA, Khan PY, Knight GM, Taylor JG, McHugh TD, Lipman M, White RG, Cohen T, Cobelens FG, Wood R, Moore DAJ, Abubakar I (2016). The transmission of Mycobacterium tuberculosis in high burden settings. The Lancet Infectious Diseases 16:227-238.
Young BC, Golubchik T, Batty EM, Fung R, Larner-Svensson H, Votintseva AA, Miller RR, Godwin H, Knox K, Everitt RG (2012). Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proceedings of the National Academy of Sciences USA 109:4550-4555.
Zakour NLB, Venturini C, Beatson SA, Walker MJ (2012). Analysis of a Streptococcus pyogenes puerperal sepsis cluster by use of whole-genome sequencing. J Clin Microbiol 50:2224-2228.
Zhou Y, Gao H, Mihindukulasuriya K, Rosa P, Wylie K, Vishnivetskaya T, Podar M, Warner B, Tarr P, Nelson D, Fortenberry JD, Holland M, Burr S, Shannon W, Sodergren E, Weinstock G (2013). Biogeography of the ecosystems of the healthy human body. Genome Biology 14:R1.

Table 1. A summary of published works analyzing complete genome sequences of bacterial pathogens for the study of outbreaks and transmission chains.
Pathogen | Genome size (Mb) | References
Acinetobacter baumannii | 4.11 | (Lewis et al, 2010; Hornsey et al, 2011; Kanamori et al, 2016) (Fitzpatrick et al, 2016)
Bacillus anthracis | 5.2 | (Read et al, 2002)
Campylobacter jejuni | 1.64 | (Cody et al, 2013)
Chlamydia trachomatis | ≈1.0 | (Harris et al, 2012; Seth-Smith et al, 2013)
Clostridium difficile | 4.0 | (Didelot et al, 2012a; Eyre et al, 2012; Eyre et al, 2013a; Eyre et al, 2013b; He et al, 2013; Knetsch et al, 2014)
Enterobacter cloacae | 5.31 | (Reuter et al, 2013a)
Enterococcus faecium | 2.9 | (Reuter et al, 2013a; Pinholt et al, 2015)
Escherichia coli | ≈5.2 | (Mellmann et al, 2011; Brzuszkiewicz et al, 2011; Ahmed et al, 2012; Ju et al, 2012; Grad et al, 2012; Grad et al, 2013; Underwood et al, 2013b; Shah et al, 2014; Holmes et al, 2015)
Helicobacter pylori | 1.5-1.7 | (Kennemann et al, 2011)
Klebsiella pneumoniae | 5.6 | (Snitkin et al, 2012; Espedido et al, 2013; Onori et al, 2015)
Legionella pneumophila | 3.5 | (Reuter et al, 2013a; Reuter et al, 2013b; Sánchez-Busó et al, 2014; Bartley et al, 2016)
Listeria monocytogenes | 3 | (Gilmour et al, 2010; Schmid et al, 2014; Kwong et al, 2016)
Mycobacterium abscessus | 5-5.2 | (Bryant et al, 2013b)
M. abscessus subsp. bolletii | ≈5 | (Davidson et al, 2013)
Mycobacterium canettii | ≈4.5 | (Blouin et al, 2014)
Mycobacterium tuberculosis | 4.4 | (Ioerger et al, 2010; Schürch et al, 2010; Gardy et al, 2011; Sandegren et al, 2011; Casali et al, 2012; Kato-Maeda et al, 2013; Bryant et al, 2013a; Roetzer et al, 2013; Köser et al, 2013; Török et al, 2013; Walker et al, 2013a; Walker et al, 2013b; Pérez-Lago et al, 2014; Coscollá et al, 2015)
Neisseria gonorrhoeae | 2.1 | (Grad et al, 2014)
Neisseria meningitidis | 2.2 | (Jolley et al, 2012; Reuter et al, 2013a; Bennett et al, 2012)
Pseudomonas aeruginosa | 6.26 | (Witney et al, 2014; Snyder et al, 2013)
Salmonella enterica | 4.76 | (Holt et al, 2008; Lienau et al, 2011; Quick et al, 2015; Allard et al, 2013; Cao et al, 2013; Allard et al, 2012; Taylor et al, 2015; Bekal et al, 2016)
Salmonella Typhimurium | 4.7 | (Okoro et al, 2012)
Shigella sonnei | 5.06 | (Holt et al, 2012; Holt et al, 2013; McDonnell et al, 2013)
Staphylococcus aureus | 2.8-3 | (Harris et al, 2010; Eyre et al, 2012; McAdam et al, 2012; Young et al, 2012; Köser et al, 2012; Holden et al, 2013; Nübel et al, 2013; Harris et al, 2013; Price et al, 2014; Azarian et al, 2015; Paterson et al, 2015; Senn et al, 2016; Kinnevey et al, 2016; Reuter et al, 2016)
Streptococcus pneumoniae | 1.98-2.19 | (Croucher et al, 2011; Loman et al, 2013; Croucher et al, 2013; Chewapreecha et al, 2014)
Streptococcus pyogenes | 1.85 | (Zakour et al, 2012; Fittipaldi et al, 2013)
Streptococcus suis | 2.15 | (Holden et al, 2009)
Vibrio cholerae | ≈4 | (Mutreja et al, 2011; Hendriksen et al, 2011; Chin et al, 2011; Hasan et al, 2012; Shah et al, 2014; Schmid et al, 2014; Devault et al, 2014; Wagner et al, 2014; Knetsch et al, 2014)
Yersinia pestis | 5.46 | (Cui et al, 2013; Wagner et al, 2014)

Sequencing strategy (Acinetobacter baumannii to Klebsiella pneumoniae): Illumina HiSeq 2000 2x100 bp Illumina MiSeq 2x150 bp, 2x250 bp Roche 454 GS FLX PacBio Sanger Illumina HiSeq 2000 76 bp Illumina GA Illumina GAII PE 2x37 bp Illumina GAII/GAIIx 2x51/100-108bp Illumina HiSeq2000 2x100bp; 2x54/108/76bp Illumina MiSeq Illumina MiSeq Roche 454 GS Junior Roche 454 Titanium Illumina MiSeq Illumina Solexa Illumina HiSeq2000 2x101 Illumina GAIIx Ion Torrent PGM Roche 454 Illumina HiSeq 2000 Illumina MiSeq platform Roche 454 Titanium XLR
Sequencing strategy (Legionella pneumophila to Mycobacterium tuberculosis): Illumina HiSeq 2x100 bp Illumina MiSeq 2x250 bp, 2x150bp SOLiD 5500XL SE 75bp Roche 454 GS-FLX Illumina HiSeq 2x75 bp Life Technologies SOLiD HiSeq2000 MiSeq Illumina Illumina GAII PE 2x36bp; 2x50 Illumina GAIIx PE 2x76; 2x108 Illumina HiSeq PE 2x75 bp Illumina MiSeq 150 bp Roche 454 GS FLX 36bp
Sequencing strategy (Neisseria gonorrhoeae): Illumina HiSeq
Sequencing strategy (Neisseria meningitidis): Illumina GAIIx PE 2x76 Illumina MiSeq
Sequencing strategy (Pseudomonas aeruginosa to Shigella sonnei): Ion Torrent Illumina MiSeq Illumina HiSeq 2500 MinION Roche 454 Illumina GA II system Illumina GAII PE 2x54 bp Illumina MiSeq Illumina HiSeq2000
Sequencing strategy (Staphylococcus aureus to Yersinia pestis): Illumina MiSeq PE 2x150 bp Illumina GAIIx PE Illumina GAII SE 150 bp Illumina HiSeq2000 Roche 454 GS FLX Illumina HiSeq 2000 2x75 bp Illumina PE 2x54bp Roche 454 Illumina HiSeq 2000 Illumina GA1s Roche 454 Roche 454 / GS 20 Solexa Illumina HiSeq Illumina GAI Illumina GAIIx PacBio-RS Roche 454 GS FLX Illumina

Figure 1. General workflow followed during high throughput sequencing (Metzker, 2010).

Figure 2. Estimates of evolutionary rate for different bacterial species and their relationship to the time elapsed since the most recent common ancestor of the isolates used to determine the rate. Sources of data: H. pylori (Kennemann et al, 2011), C. difficile (Didelot et al, 2012b), Sh. sonnei (Holt et al, 2012), Y. pestis (Rasmussen et al, 2015), S. aureus (Harris et al, 2010; Ward et al, 2014), S. pneumoniae (Croucher et al, 2011), L. pneumophila (Sánchez-Busó et al, 2014), M. tuberculosis (Comas et al, 2013), M. leprae (Schuenemann et al, 2013), S. enterica (Zhou et al, 2013).