Deep Learning in Population Genetics
1 Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, Germany
2 Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife KY16 9TF, UK
3 Department of Biological and Behavioural Sciences, Queen Mary University of London, UK
Abstract
Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data, will provide further opportunities for technological advancements in the field.
Key words: population genetics, machine learning, artificial neural networks, simulations, balancing selection.
Significance
Deep learning, a powerful class of supervised machine learning, is emerging as a promising inferential framework in evolutionary genomics. In this review, we introduce all deep learning algorithms currently used in population genetic studies, highlighting their strengths, limitations, and empirical applications. We provide perspectives on their interpretability and usage in the face of data uncertainty, whilst suggesting new directions and guidelines for making the field accessible and inclusive.
…important obstacle to applications in the domain of population genetics, whose main objective is to uncover the genetic and evolutionary mechanisms responsible for the diversity of life on our planet. Another deterrent is the apparent difference in foci between the fields of statistics and machine learning. Statistics is focused on inference through the creation and fitting of a probabilistic model, while machine learning is focused on prediction using general-purpose algorithms that capture patterns present in complex and large data sets (Bzdok et al. 2018).

…with vast amounts of genomes and metadata at hand in the past few years. For instance, in human population genetics, scientists have access to high-quality whole-genome sequencing data from more than 150,000 individuals from the UK Biobank (Halldorsson et al. 2022) and more than 3,000 individuals distributed world-wide (Byrska-Bishop et al. 2022), or to hundreds of genomic data sets from ancient samples (https://reich.hms.harvard.edu/datasets). In this review, we will focus on a particular subset of supervised machine learning algorithms, namely deep neural networks.
Glossary
• Accuracy: proportion of correct predictions made by a model
• Activation function: operation that each neuron performs
• Attribute: name of a variable describing an observation
• Backpropagation: gradient descent-based learning algorithm for calculating derivatives through the network, starting from the last layer
• Bias term: a term attached to neurons allowing the model to represent patterns that do not pass through the origin
• Confusion matrix: table that summarizes the prediction performance by providing false and true positive/negative rates (see the sketch below)
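As a minimal illustration of the accuracy and confusion matrix entries above, the following sketch computes both for a toy binary classification (the labels and predictions are made-up placeholders, assuming scikit-learn is installed):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # ground truth (e.g. 0 = neutral, 1 = selection)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # predictions from a hypothetical trained model

print(accuracy_score(y_true, y_pred))    # proportion of correct predictions
print(confusion_matrix(y_true, y_pred))  # rows: true class; columns: predicted class
```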
To train a supervised machine learning algorithm, the available data sets are typically divided into training, validation, and testing sets, with the latter two sets used to evaluate the performance during and after training. In supervised learning, a labeled data set (which explicitly relates any given input to a specific output) is given to the algorithm. The loss (the distance between the predicted and true value) is calculated and, at the next iteration, the internal parameters are updated towards decreasing loss (and increasing accuracy), as in the sketch below. Training a supervised machine learning algorithm is a fine balance between prediction accuracy over the training set and generalization performance over the testing set.
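To make that loop concrete, here is a minimal PyTorch sketch of one training run with a held-out validation set (the data, model shape, and hyperparameters are hypothetical placeholders, not taken from any study discussed here):

```python
import torch
from torch import nn

# Hypothetical labeled data set: 1,000 observations, 32 features, binary output.
X, y = torch.randn(1000, 32), torch.randint(0, 2, (1000,))
X_train, y_train, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()  # loss: distance between predicted and true values
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()   # backpropagation: derivatives from the last layer backwards
    optimizer.step()  # update internal parameters towards decreasing loss
    with torch.no_grad():
        val_acc = (model(X_val).argmax(dim=1) == y_val).float().mean()
    print(f"epoch {epoch}: loss {loss.item():.3f}, validation accuracy {val_acc:.3f}")
```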
Machine learning has a rich history in biological sciences and genomics (reviewed in Yue and Wang 2018; Zou et al. 2019; Greener et al. 2022). Additionally, supervised machine learning methods have been designed and deployed to perform population genetic tasks such as variant calling (Poplin et al. 2018) and the prediction, characterization, and localization of signatures of natural selection (Pavlidis et al. 2010; Lin et al. 2011; Ronen et al. 2013; Pybus et al. 2015; Schrider and Kern 2016; Sugden et al. 2018; Mughal and DeGiorgio 2019; Koropoulis et al. 2020). An important difference between the variant calling application (which only uses observed data) and those aimed at detecting selection is that the latter implement an innovation first introduced by Pavlidis et al. (2010) whereby the ML algorithms are trained using synthetic data sets generated via simulations. These applications, therefore, can be considered as being part of likelihood-free simulation-based approaches (Cranmer et al. 2020), which are commonly employed in population genetics. Currently, most population genetics applications of ML use this strategy but, as we describe below, some recent applications only use observed data to train the algorithms. These applications, however, require the combination of genotypic data with phenotypic, environmental, or geographic coordinate data.

As already stated, in this review we will focus on deep learning, a class of machine learning algorithms based on artificial neural networks comprising nodes in multiple layers connecting features (input) and responses (output) (LeCun et al. 2015). Weights between nodes are optimized during the training to minimize the distances between predictions and ground truth. After training, an ANN can predict the response given any arbitrary new input data.
Unlike approaches that use a predefined set of summary statistics as input, deep learning algorithms can effectively learn which features are sufficient for the prediction (LeCun et al. 2015). This is an important aspect, as summary statistics are meaningful but human-constructed features. When dealing with different sources of raw data, the design of such features has been a major part of information engineering. A key finding of deep learning was that such features emerged within a well-trained deep network: they are effectively suggested and discovered by the network during training (Krizhevsky et al. 2012). This finding has been repeated in different domains…

…generative models. For each type of algorithm, we illustrate their main applications in the field and the novel findings generated by their deployments. Note that these general algorithms have a long history spanning many decades and numerous original contributions which we cannot properly credit in our review because of space. Thus, we refer readers interested in historical developments to previous publications (Schmidhuber 2014).
Each node computes an output of the form

$y = f\left(\sum_{i=1}^{I} w_i x_i + b\right)$,

where $b$ is the bias (not to be confounded with statistical bias), $w = \{w_i\}$ is a vector of weights, $x = \{x_i\}$ is a vector of input features (explanatory variables), and $f$ is a nonlinear activation function. In an FCNN with a single hidden layer, there will be a number $J$ of hidden nodes, each carrying out a similar operation using a different vector of weights, all of which can be represented by a matrix $W = \{w_{ij}\}$, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$. A very simple example of an FCNN with one hidden layer and only two nodes is presented in figure 1.
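As a concrete rendering of that single-hidden-layer architecture, the equivalent model can be written in a few lines of PyTorch (a sketch; the input size and activation choice are illustrative assumptions):

```python
import torch
from torch import nn

I = 4  # number of input features x = {x_i}; illustrative choice

# One hidden layer with J = 2 nodes: each computes f(sum_i w_ij * x_i + b_j).
fcnn = nn.Sequential(
    nn.Linear(I, 2),  # weight matrix W = {w_ij} plus the bias terms b_j
    nn.ReLU(),        # nonlinear activation function f
    nn.Linear(2, 1),  # output node combining the two hidden activations
)

x = torch.randn(1, I)  # one observation with I features
print(fcnn(x))         # the network's prediction
```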
…(SNPs), and identity-by-state tracts are among the most important features for the inference of population size changes and type of selection.

Another example of an FCNN application in population genetics that uses simulated data to train the algorithm is provided by the work of Burger and colleagues on the estimation of mutation rates (Burger et al. 2022). They show that a simple neural network is able to recapitulate estimators of mutation rate for intermediate recombination rates. As a novel methodological advance, their implementation…

Finally, we note that FCNNs have also been used in the context of ABC frameworks. Early studies used neural networks to construct the posterior distribution of parameters from the collection of accepted values (Blum and François 2010), as implemented in the abc package (Csilléry et al. 2012). More recently, Mondal and colleagues coupled an ABC framework, using the site frequency spectrum (SFS) as summary statistic, with a four-layer FCNN to infer the demographic history of human Eurasian populations (Mondal et al. 2019). Their implementation includes an…

…step. The number of kernels, their dimensions, and initialization are all hyperparameters of the model. CNNs can be regarded as a regularized version of FCNNs with a focus on localized spatial signatures. In fact, a fundamental property of CNNs is the space-invariance of the learned features in the data set, which means that they can identify a pattern regardless of its spatial location in the image. Note, however, that identification of feature realizations like rotations or scaling requires either appropriate samples or perturbations of the input (Goodfellow…
…each SNP, respectively. Under this representation, and in opposition to the structured nature of "classic" images, the ordering of individuals (i.e. random samples from a population) in an unstructured population is arbitrary and carries no information (Chan et al. 2018); i.e. genetic data are exchangeable. However, standard CNNs rely on spatial information and, therefore, the ordering of the data can affect their accuracy. To avoid this problem, individuals need to be sorted in a "biologically meaningful" way. For example, Flagel and collaborators sort chromosomes by genetic similarity (Flagel et al. 2018). Additionally, they represent the information on genomic positions of SNPs as a separate branch in the architecture. Interestingly, the inclusion of monomorphic sites in windows of fixed length seems to yield good accuracy for predicting natural selection, as shown in a separate study (Nguembang Fadja et al. 2021). Notably, several applications of the proposed method are illustrated, with CNNs achieving equal if not better performance than state-of-the-art methods to detect gene flow and selective sweeps, estimate recombination rates, and infer demographic parameters (Flagel et al. 2018). Therefore, these findings demonstrated the capability of CNNs to infer population genetic parameters, even in cases where a theoretical framework is not available.
To address the exchangeability issue, Chan et al. (2018) proposed an exchangeable neural network. This architecture consists of convolutional layers with 1-dimensional kernels followed by a permutation-invariant function, which makes the network insensitive to the order of individuals. Although they employed the mean operation as permutation-invariant function, other functions are possible, including a fully connected layer. Another important contribution of this study is the adoption of a "simulation-on-the-fly" approach: training data is continuously generated by simulations so that the network never sees the same data twice, thereby reducing overfitting. This is a valuable consideration since, when reliable simulators are available (as in the case of population genetics), we have access to theoretically infinite training data, constrained by computing time only. The implemented software defiNETti was applied to illustrate the accuracy of exchangeable neural networks to predict recombination hotspots in human data.
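A minimal sketch of such an exchangeable design (dimensions and layer sizes are illustrative assumptions, not the defiNETti implementation): 1-dimensional convolutions are applied to each haplotype independently, and a mean across individuals removes any dependence on their ordering.

```python
import torch
from torch import nn

class ExchangeableCNN(nn.Module):
    """Toy permutation-invariant network over (batch, individuals, SNPs)."""
    def __init__(self, n_snps: int = 100):
        super().__init__()
        # 1-D convolution scanning along the SNP axis of each haplotype.
        self.conv = nn.Sequential(nn.Conv1d(1, 8, kernel_size=5), nn.ReLU())
        self.head = nn.Linear(8 * (n_snps - 4), 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n_ind, n_snps = x.shape
        h = self.conv(x.reshape(b * n_ind, 1, n_snps))  # per-individual features
        h = h.reshape(b, n_ind, -1).mean(dim=1)         # mean over individuals
        return self.head(h)                             # order no longer matters

x = torch.randn(4, 40, 100)         # 4 loci, 40 haplotypes, 100 SNPs
perm = x[:, torch.randperm(40), :]  # shuffle the individuals
net = ExchangeableCNN()
print(torch.allclose(net(x), net(perm), atol=1e-5))  # True: permutation-invariant
```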
Further solutions to tackle the issue of exchangeable genetic data have been explored by Torada et al. (2019) in the software ImaGene. Specifically, the authors showed how ordering haplotypes and SNPs by frequency leads to accurate predictions of positive selection. Whilst sorting SNPs implies a loss of information on LD patterns, this approach makes training faster with minimal decay in accuracy, as the number of learnable parameters is drastically reduced when the final fully connected layer is not required. However, double-sorting makes the method less appropriate as a general-purpose methodology. Additionally, by training and testing ImaGene with simulations conditioned on different demographic models, the authors quantified the drop in accuracy when CNNs are affected by model misspecification during training. Finally, a multiclass classification approach was proposed as an alternative method to approximate the posterior distribution of the selection coefficient, a continuous parameter typically hard to estimate.
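In practice, this double sorting amounts to two argsort operations (a sketch on a made-up binary haplotype matrix, independent of the ImaGene implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.integers(0, 2, size=(40, 100))  # 40 haplotypes x 100 biallelic SNPs

# Order rows (haplotypes) and columns (SNPs) by decreasing frequency.
row_order = np.argsort(G.sum(axis=1))[::-1]  # haplotypes with most derived alleles first
col_order = np.argsort(G.sum(axis=0))[::-1]  # highest-frequency SNPs first
G_sorted = G[row_order][:, col_order]
```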
In another landmark study, Sanchez et al. (2021) provide a comprehensive framework for building deep neural networks taking into account several nuances of the input data, such as the variable number of SNPs, their correlation, and the exchangeability of individuals. These challenges were tackled by proposing an architecture, called SPIDNA (Sequence Position Informed Deep Neural Architecture), which consists of stacks of multiple blocks of convolutional, pooling, and fully connected layers. In addition to deploying their method to reconstruct changes in effective population size of cattle breed populations, the authors compared the accuracy of several deep neural networks against ABC, including hybrid approaches. Notably, results suggest that integrating deep learning with ABC marginally improves performance, and possibly explainability. Further investigations by the same authors demonstrated a more pronounced performance gain using deep neural networks (Sanchez 2022). These studies depart from previous attempts to adapt existing architectures, and instead suggest building novel architectures tailored to the specifics of population genetic data.

…selection, is presented by Isildak et al. (2021) in the software BaSe. Although both architectures exhibit high classification accuracy to distinguish between neutrality and selection, the CNN outperformed the FCNN in predicting the type of balancing selection, a task that proved too challenging when relying solely on summary statistics as input. The authors used forward-in-time simulations and conditioned the target variants on a predefined range of final allele frequencies. To counterbalance the increased computational time associated with this simulation scheme, a data augmentation strategy to artificially enlarge the training set…
…predictions based on previous outcomes (Minsky 1967; Rumelhart and McClelland 1987; Elman 1990). In fact, RNNs are comprised of connected nodes that form a cycle, with the output of some nodes feeding back to other (or the same) nodes. Therefore, simple RNNs can be considered as for-loops iterating along the sequential data, where at each position the current input and the previous output are combined to form the next output (or hidden state), as in the sketch below.
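That for-loop view can be written out directly (a toy sketch; the tanh cell and the dimensions are illustrative, equivalent in spirit to torch.nn.RNN):

```python
import torch

seq = torch.randn(50, 8)   # 50 positions along a sequence, 8 features each
W_in = torch.randn(8, 16)  # input-to-hidden weights (hypothetical sizes)
W_h = torch.randn(16, 16)  # hidden-to-hidden (recurrent) weights
h = torch.zeros(16)        # initial hidden state

for x_t in seq:            # iterate along the sequential data
    # combine the current input with the previous output to form the next state
    h = torch.tanh(x_t @ W_in + h @ W_h)
print(h)                   # final hidden state summarizes the whole sequence
```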
Multiple RNN layers can be stacked on top of each other to increase the capacity of the network and extract more…

…dramatically decreases and becomes increasingly negligible compared with storage costs.

RNNs, in all their forms, have become increasingly popular in population genetics thanks to their ability to incorporate sequential data. Whilst training recurrent layers tends to be more challenging, coupling them with convolutional layers appears to be a suitable solution to overcome this issue whilst incorporating novel information. In the next section, we will explore how CNNs can be embedded in a more general family of machine learning algorithms…
…point representation. Furthermore, the latent space directly offers the possibility to probe the network for any kind of structure in the input data, which the encoder has been forced to compress, by plotting the low-dimensional latent variables against each other. Thanks to the non-linearity of neural networks, VAEs outperform classic methods, that is PCA, for visual data representation (Battey et al. 2021). VAEs have been implemented by Battey et al. (2021) in the software popvae. By applying it to genomic data sets, they recovered geographic similarities among human populations…

…($D(x)$), and the second part, $\mathbb{E}_z[\log(1 - D(G(z)))]$, stands for the expected value of generated data ($G(z)$, $z$ being the latent initialization) being classified as fake by the discriminator ($1 - D(G(z))$). Thus, the discriminator aims to maximize the loss function, whereas the generator tries to minimize it. The parameters of both networks are updated alternately. Optimization can be particularly challenging, as neither network should under-perform nor outperform the other too quickly. For instance, when the two networks do not train synchronously, many…
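For completeness, the loss being described is the standard GAN minimax objective, of which only the second term survives in the fragment above; reconstructed in full it reads

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

where the first term is the expected value of real data $x$ being classified as real by the discriminator.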
…limitations, GANs appear to be a promising deep learning framework to infer complex population genetic parameters in the face of an uncertain or unknown demographic model (Booker et al. 2022).

Available Resources

Simulators

The application of deep learning methods has been empowered by the many simulation tools that have been applied to train deep neural networks for population genetic inferences.

Software

Most of the studies herein mentioned provide their implementations, often as user-friendly software, of deep learning algorithms for population genetic analyses. In table 1, we summarize these implementations by the programming language, the required (or preferred) simulator (if any) used, and the type of input data.
Table 1. List of Available Software and Implementations of Deep Learning Methods (not considering generative models) for Population Genetic Inferences

Reference                             Language/Library                Simulator                    Input
evoNet (Sheehan and Song 2016)        Java                            msms                         Summary statistics
DeepGenomeScan (Qin et al. 2022)      R/keras                         Not trained by simulations   Genotype, phenotype, and sampling locations
Locater (Battey et al. 2020)          python/keras                    Not trained by simulations   Phenotype and sampling locations
ML_in_pop_gen (Burger et al. 2022)    python/keras                    msprime                      SFS
ABC_DL (Mondal et al. 2019)           Java/Encog and R/abc            fastSimcoal2                 SFS
diploS/HIC (Kern and Schrider 2018)   python/keras and scikit-learn   discoal                      Summary statistics
partialS/HIC (Xue et al. 2020)
…proved to be a major determinant of important phenotypes, including in humans (Soni et al. 2022). However, recent and fleeting balancing selection leaves cryptic genomic traces which are hard to detect and greatly confounded by neutral evolutionary processes (Sellis et al. 2011). Therefore, currently employed methods are either unsuitable or underpowered to detect short-term balancing selection (Fijarczyk and Babik 2015).

Information from temporal genetic variation, either from evolve-and-resequence or ancient DNA (aDNA) experiments, is particularly suitable to identify when and to what extent natural selection acted (Dehasque et al. 2020). Previous attempts to use deep learning to infer balancing selection from contemporary genomes (Isildak et al. 2021) and positive selection from temporal data (Whitehouse and Schrider 2022) suggest that training an algorithm that uses the haplotype information from both contemporary and aDNA data has high potential to characterize signals of recent adaptation (and thus recent balancing selection).

To illustrate the ability of deep learning to detect signals of recent balancing selection, we simulated a scenario inspired by available data in human population genetics. We simulated 2,000 50-kbp loci under either neutrality or overdominance (i.e. heterozygote advantage, a form of balancing selection) at the center of the locus, conditioned on a demographic model of European populations (Jouganous et al. 2017). We performed forward-in-time simulations using SLiM (Haller and Messer 2019), similarly to a previous study (Isildak et al. 2021). We imposed selection on a de novo mutation starting 10k years ago, with selection coefficients of 0.25% and 0.5%. We sampled 40 present-day haplotypes, and 10 ancient haplotypes at each of four different time points (8k, 4k, 2k, and 1k years ago, mirroring a plausible human aDNA data collection).
We trained a deep neural network to distinguish between neutrality and selection. Using pytorch, we built a network comprising two branches. One branch receives present-day haplotypes and performs a series of convolutional and pooling layers with permutation-invariant functions. The other branch processes stacked ancient haplotypes at the different sampling points, with both branches performing residual convolutions. The two branches are merged with a dense fully connected layer that performs a ternary classification. We used 64 filters with a 3x3 kernel size and 1x1 padding size, after sorting haplotypes by frequency (Torada et al. 2019). We performed 10 separate training runs to obtain confidence intervals on accuracy values. We report results in the form of confusion matrices, a typical representation to summarize the predictive performance at testing. To further showcase the accessibility of deep learning, we made the full implementation and scripts available at https://github.com/kevinkorfmann/temporal-balancing-selection.
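To give a flavour of such a branched design, here is a deliberately simplified PyTorch sketch (layer sizes are illustrative and the residual blocks are omitted; the exact architecture is in the repository above):

```python
import torch
from torch import nn

class TwoBranchNet(nn.Module):
    """Simplified two-branch classifier: modern plus temporal haplotypes."""
    def __init__(self):
        super().__init__()
        # Branch 1: present-day haplotypes as a 1-channel image (individuals x SNPs).
        self.modern = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pooling collapses the spatial dimensions
        )
        # Branch 2: ancient haplotypes stacked by sampling time (4 channels).
        self.ancient = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Merge both branches into a ternary classification
        # (e.g. neutral / weak selection / moderate selection).
        self.head = nn.Linear(64 + 64, 3)

    def forward(self, modern, ancient):
        m = self.modern(modern).flatten(1)
        a = self.ancient(ancient).flatten(1)
        return self.head(torch.cat([m, a], dim=1))

net = TwoBranchNet()
modern = torch.randint(0, 2, (8, 1, 40, 128)).float()   # 40 modern haplotypes
ancient = torch.randint(0, 2, (8, 4, 10, 128)).float()  # 10 per time point
print(net(modern, ancient).shape)                       # torch.Size([8, 3])
```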
Results show that, despite the small training set used, the network has high accuracy to infer recent balancing selection under this tested scenario (fig. 3). Notably, we observe a significant decrease in accuracy for distinguishing between weak and moderate selection when silencing the time-series branch, suggesting an important contribution of ancient samples to the prediction. In this illustrative example, we do not attempt to take into account the uncertainty given by degraded and low-coverage aDNA data and population structure across time points, among other confounding factors. Nevertheless, these results demonstrate that building and training novel deep learning algorithms is accessible and generates powerful predictions to address current questions in population genetics.

Interpretable Machine Learning

As already mentioned in the Introduction, population genetics and evolution in general are aimed at uncovering the mechanisms responsible for the diversity of life on our planet. Thus, the black-box nature of deep learning methods represents an important obstacle for their application in these research fields. However, very recent advances in "interpretable machine learning" algorithms (Linardatos et al. 2021) are providing the tools needed to overcome this hurdle. But what exactly do we mean by interpretability? There is no general consensus on what the word "interpretability" means (Doshi-Velez and Kim 2017; Fan et al. 2020), and discussions of this concept in the artificial intelligence literature tend to be rather abstract and sometimes highly technical. In the context of machine learning, a common definition is "the ability to explain or present in understandable terms to a human" (Doshi-Velez and Kim 2017). This abstract definition has been translated into a myriad of different operational definitions based on a wide range of criteria. In fact, several taxonomies for interpretability of neural networks have been proposed, and the number of published articles on interpretability has been increasing exponentially since 2000 (Fan et al. 2020). Therefore, here we will restrict ourselves to distinguishing between global and local interpretability and explaining the relevance of these two concepts for population genomics studies. Also, we note that we will not consider very recent efforts aimed at designing inherently interpretable deep neural networks (e.g. Chen et al. 2020) and instead focus on post-hoc interpretation methods, that is, algorithms that can be used to interpret an already trained network.

Global interpretability aims at explaining the overall behaviour of a model (Ancona et al. 2019), which in turn can inform us about the system being studied. In principle, this goal can be achieved by analysing the hyperparameters (which control the learning process and the values taken by the parameters; for example learning rate, activation function, number of hidden layers, number of neurons per hidden layer) or parameters (weights and biases) of a deep neural network. However, the information provided by hyperparameters tends to be limited to model complexity, for example, in terms of the number of nodes and hidden layers retained after tuning and fitting, or the type of activation function. On the other hand, the values taken by parameters (weights and biases) after fitting can provide more meaningful biological information; in particular, they help identify the features that contributed the most to the predictive power of the algorithm. For example, Sheehan and Song (2016) (see FCNN section above) use random permutation of each summary statistic (feature) and identify as most informative for the detection of population size changes those statistics that, when randomly permuted, lead to the sharpest decrease in accuracy.
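In the same spirit as that permutation test, a generic, model-agnostic sketch (the fitted model, feature matrix, and labels are placeholders) measures how much accuracy drops when each feature is shuffled:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, rng=None):
    """Drop in accuracy when each feature (column of X) is randomly permuted."""
    rng = rng or np.random.default_rng(0)
    baseline = np.mean(predict(X) == y)
    drops = []
    for j in range(X.shape[1]):
        accs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the information carried by feature j only
            accs.append(np.mean(predict(Xp) == y))
        drops.append(baseline - np.mean(accs))
    return np.array(drops)  # largest drop = most informative feature
```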
Another approach is based on feature importance (Olden and Jackson 2002), which was used by another study (Qin et al. 2022) to identify as outlier loci those that contributed the most to the power of an FCNN to predict an individual's phenotype or geographic origin. Feature importance is based on the idea that the magnitude of the connection weights between neurons connecting input and output nodes measures the extent to which each feature contributes to the network's predictive power. The architecture used for these two examples was an FCNN. A different approach is necessary in the case of CNNs. For example, in the case of a CNN that classifies images into different categories, a common approach is to use saliency maps, which measure the support that different groups of pixels in an image provide for a given classification.
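A basic gradient saliency map can be obtained directly from automatic differentiation (a sketch; the model and input are placeholders for any trained PyTorch CNN):

```python
import torch

def saliency_map(model, image, target_class):
    """Gradient of the class score w.r.t. the input pixels (larger = more salient)."""
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()         # backpropagate all the way to the input itself
    return image.grad.abs()  # per-pixel contribution to the predicted class
```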
…(KernelShap and DeepShap) and R (shapr). However, they are limited to deep neural networks with a moderate number of features. Nevertheless, very recent developments have led to new approaches, DASP (Ancona et al. 2019) and G-DeepShap (Chen et al. 2022), that may scale up to population genomics datasets. For the moment, there are no applications of Shapley values to population genomics studies; there is only an application in population genetics, but in the context of random forests (Kittlein et al. 2022).

Much work remains to be done in order to incorporate the…
Literature Cited
Battey CJ, Coffing GC, Kern AD. 2021. Visualizing population structure with variational autoencoders. G3 11(1):jkaa036. doi:10.1093/g3journal/jkaa036
Battey CJ, Ralph PL, Kern AD. 2020. Predicting geographic location from genetic variation with deep neural networks. eLife 9:e54507. doi:10.7554/eLife.54507
Baumdicker F, et al. 2021. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 220(3):iyab229. doi:10.1093/genetics/iyab229
Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162(4):2025–2035. doi:10.1093/genetics/162.4.2025
Cho K, van Merrienboer B, Bahdanau D, Bengio Y. 2014. On the properties of neural machine translation: encoder-decoder approaches. CoRR. Available from: http://arxiv.org/abs/1409.1259.
Cranmer K, Brehmer J, Louppe G. 2020. The frontier of simulation-based inference. Proc Natl Acad Sci U S A. 117(48):30055–30062. doi:10.1073/pnas.1912789117
Csilléry K, Blum MG, Gaggiotti OE, François O. 2010. Approximate Bayesian computation (ABC) in practice. Trends Ecol Evol. 25(7):410–418. doi:10.1016/j.tree.2010.04.001
Csilléry K, François O, Blum MGB. 2012. abc: an R package for approximate Bayesian computation (ABC). Methods Ecol Evol. 3(3):475–479. doi:10.1111/j.2041-210X.2011.00179.x
Goodfellow I, Bengio Y, Courville A. 2016. Deep learning. Cambridge: MIT Press.
Gower G, Picazo PI, Fumagalli M, Racimo F. 2021. Detecting adaptive introgression in human evolution using convolutional neural networks. eLife 10:e64669. doi:10.7554/eLife.64669
Grealey J, et al. 2022. The carbon footprint of bioinformatics. Mol Biol Evol. 39(3):msac034. doi:10.1093/molbev/msac034
Greener JG, Kandathil SM, Moffat L, Jones DT. 2022. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 23(1):40–55. doi:10.1038/s41580-021-00407-0
Halldorsson BV, et al. 2022. The sequences of 150,119 genomes in the UK Biobank. Nature 607(7920):732–740. doi:10.1038/s41586-
Jouganous J, Long W, Ragsdale AP, Gravel S. 2017. Inferring the joint demographic history of multiple populations: beyond the diffusion approximation. Genetics 206(3):1549–1567. doi:10.1534/genetics.117.200493
Kelleher J, et al. 2019. Inferring whole-genome histories in large population datasets. Nat Genet. 51(9):1330–1338. doi:10.1038/s41588-019-0483-y
Kelleher J, Thornton KR, Ashander J, Ralph PL. 2018. Efficient pedigree recording for fast population genetics simulation. PLoS Comput Biol. 14(11):e1006581. doi:10.1371/journal.pcbi.1006581
Kern AD, Schrider DR. 2016. Discoal: flexible coalescent simulations…
LeCun Y, Bengio Y. 1995. Convolutional networks for images, speech, and time-series. Cambridge: MIT Press.
LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521(7553):436–444. doi:10.1038/nature14539
LeCun Y, Huang FJ, Bottou L. 2004. Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004). IEEE. Vol. 2. p. II–104. doi:10.1109/CVPR.2004.1315150
Levy SE, Myers RM. 2016. Advancements in next-generation sequencing. Annu Rev Genomics Hum Genet. 17(1):95–115. doi:10.1146/annurev-genom-083115-022413
Mondal M, Bertranpetit J, Lao O. 2019. Approximate Bayesian computation with deep learning supports a third archaic introgression in Asia and Oceania. Nat Commun. 10(1):246. doi:10.1038/s41467-018-08089-7
Mughal MR, DeGiorgio M. 2019. Localizing and classifying adaptive targets with trend filtered regression. Mol Biol Evol. 36(2):252–270. doi:10.1093/molbev/msy205
Nguembang Fadja A, Riguzzi F, Bertorelle G, Trucchi E. 2021. Identification of natural selection in genomic data with deep convolutional neural network. BioData Min. 14(1):51. doi:10.1186/s13040-021-00280-9
Nielsen R. 2005. Molecular signatures of natural selection. Annu Rev…
Rumelhart DE, McClelland JL. 1987. Learning internal representations by error propagation. Cambridge: MIT Press. p. 318–362.
Sanchez T. 2022. Reconstructing our past: deep learning for population genetics [thesis]. Université Paris-Saclay. Available from: https://theses.hal.science/tel-03701132.
Sanchez T, et al. 2022. dnadna: a deep learning framework for population genetics inference. Bioinformatics 39:btac765. doi:10.1093/bioinformatics/btac765
Sanchez T, Caramiaux B, Thiel P, Mackay WE. 2022. Deep learning uncertainty in machine teaching. In: IUI 2022 - 27th Annual Conference on Intelligent User Interfaces. Helsinki, Finland/Virtual. Available from: https://hal.archives-ouvertes.fr/hal-03579448.
Suvorov A, Hochuli J, Schrider DR. 2020. Accurate inference of tree topologies from multiple sequence alignments using deep learning. Syst Biol. 69(2):221–233. doi:10.1093/sysbio/syz060
Teh YW, Hinton GE. 2000. Rate-coded restricted Boltzmann machines for face recognition. In: Leen T, Dietterich T, Tresp V, editors. Advances in neural information processing systems. Vol. 13. MIT Press. Available from: https://proceedings.neurips.cc/paper/2000/file/c366c2c97d47b02b24c3ecade4c40a01-Paper.pdf.
Tejero-Cantero A, et al. 2020. SBI: a toolkit for simulation-based inference. J Open Source Softw. 5(52):2505. doi:10.21105/joss.