Fep Amber

Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

pubs.acs.

org/jcim Article

Alchemical Binding Free Energy Calculations in AMBER20: Advances


and Best Practices for Drug Discovery
Tai-Sung Lee, Bryce K. Allen, Timothy J. Giese, Zhenyu Guo, Pengfei Li, Charles Lin,
T. Dwight McGee, Jr., David A. Pearlman, Brian K. Radak, Yujun Tao, Hsu-Chun Tsai, Huafeng Xu,
Woody Sherman,* and Darrin M. York*
Cite This: J. Chem. Inf. Model. 2020, 60, 5595−5623 Read Online
Downloaded via UNIVERSIDAD DE LOS ANDES ANDES on November 1, 2022 at 03:03:50 (UTC).

ACCESS *
See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

Metrics & More Article Recommendations sı Supporting Information

ABSTRACT: Predicting protein−ligand binding affinities and the associated


thermodynamics of biomolecular recognition is a primary objective of structure-
based drug design. Alchemical free energy simulations offer a highly accurate and
computationally efficient route to achieving this goal. While the AMBER
molecular dynamics package has successfully been used for alchemical free energy
simulations in academic research groups for decades, widespread impact in
industrial drug discovery settings has been minimal because of the previous
limitations within the AMBER alchemical code, coupled with challenges in system
setup and postprocessing workflows. Through a close academia-industry
collaboration we have addressed many of the previous limitations with an aim
to improve accuracy, efficiency, and robustness of alchemical binding free energy
simulations in industrial drug discovery applications. Here, we highlight some of
the recent advances in AMBER20 with a focus on alchemical binding free energy
(BFE) calculations, which are less computationally intensive than alternative binding free energy methods where full binding/
unbinding paths are explored. In addition to scientific and technical advances in AMBER20, we also describe the essential practical
aspects associated with running relative alchemical BFE calculations, along with recommendations for best practices, highlighting the
importance not only of the alchemical simulation code but also the auxiliary functionalities and expertise required to obtain accurate
and reliable results. This work is intended to provide a contemporary overview of the scientific, technical, and practical issues
associated with running relative BFE simulations in AMBER20, with a focus on real-world drug discovery applications.

1. INTRODUCTION While alchemical free energy simulation capability has been


Accurate, robust prediction of the relative binding free energy in AMBER since the 1980s, a number of technical and
(BFE) of ligands to a target protein is of tremendous value in scientific challenges have impeded progress toward the broader
adoption and higher impact of AMBER in drug discovery
drug discovery, serving as an in silico assay and a way to gain
projects. This work provides a modern update of advances in
deeper insights into the origin of biomolecular recognition.1−4
BFE simulations in AMBER20 and a description of current
Rigorous free energy simulations of ligand-protein binding
guidelines and best practices in the context of real-world drug
yield both thermodynamic and kinetic information but can be
discovery applications. The manuscript is organized as follows:
extremely computationally intensive to converge to high
The remainder of this section describes the history and origin
precision due to the need to explore and sufficiently sample
of free energy simulations in AMBER leading up to the latest
the minimum free energy pathway that connect bound and
developments in AMBER20 that enable large-scale application
unbound states (including often starkly different entropic
in drug discovery projects. Section 2 provides an overview of
differences). Alchemical BFE simulations, on the other hand,
the background formalism for alchemical free energy
can be engineered to be much more tractable owing to the
simulations, with extended discussion of transformation
property that the free energy is a state function from which
pathways and protocols using so-called “softcore potentials”.
thermodynamic end states (bound and unbound) can be
connected by any pathway. In practice, thermodynamic cycles
can be constructed that utilize “alchemical” pathways between Special Issue: Novel Directions in Free Energy
end states that can be optimally computed. Whereas alchemical Methods and Applications
BFE simulations do not provide a complete mechanistic and Received: June 2, 2020
kinetic characterization of the binding process, they provide a Published: September 16, 2020
highly efficient and practical approach to predict the binding
affinities of lead compounds important in drug discovery.

© 2020 American Chemical Society https://dx.doi.org/10.1021/acs.jcim.0c00613


5595 J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Sections 3 and 4 discuss performance and feature advances, AMBER molecular dynamics (MD) code base that had been
respectively, in AMBER20 for drug discovery. Section 5 published a year earlier as AMBER 2.0.28 The newly christened
reviews practical considerations and provides guidance to “Gibbs” module of AMBER was subsequently applied to
achieving robust and reliable BFE results, including system several systems in a collaboration between Singh and Kollman
preparation, docking, atom mapping and λ scheduling, use of with Bash of the Langridge Laboratory at UCSF. Together,
restraints and estimation of confidence and errors. Section 6 they published the first papers describing the application of
presents our perspective about important forthcoming and AMBER for free energy calculations, and the first computa-
future work to advance the state-of-the-art. The final section tional free energy paper to appear in Science Magazine.29−31
provides brief concluding remarks that emphasize the purpose While this was not the first application of such free energy
and main points of the manuscript. calculations to be published (see section 1.1), in many ways it
1.1. Historical Overview of Alchemical Binding Free exploded interest in the field, owing to the broad journal
Energy Simulations. Alchemical binding free energy readership, the pharmaceutically relevant test systems (nucleic
simulations on computers have been performed since the acid bases, amino acid side chains, organic small molecules,
1980s, although the theoretical foundation began decades and protein−ligand interactions), and the asserted high quality
earlier with studies of nonpolar gases by Zwanzig, where he of the results.
derived the master equation for free energy perturbation Upon release, this first implementation of free energy
(FEP) to compute thermodynamic differences between two calculations within AMBER supported three free energy
states A and B.5 In complementary (and earlier) work, protocols: Free Energy Perturbation (FEP), Thermodynamic
Kirkwood described a coupling parameter, typically called Integration (TI), and “Slow Growth,” which represented the
lambda (λ), that has since been used to improve the accuracy limiting case of TI where (it was asserted) if you used a very
of FEP calculations for meaningful chemical transformations large number of λ windows, you could evaluate each window
by making neighboring states much closer (and therefore with exactly one sample point. It was later demonstrated that
having a smoother thermodynamic path) as one moves from the slow growth approximation was unreliable in practice due
state A to B.6 Later, Bennett introduced an alternative to a “Hamiltonian lag”,32 and therefore, this approach was not
approach based on minimizing the expected squared error pursued for long (although there has been a recent resurgence
(known as Bennett Acceptance Ratio, BAR),7 which was due to foundational work by Jarzynski,33 as demonstrated by
further improved based on a statistically optimal analysis of Gapsys et al.34 and others). Free energy-specific options in this
samples (multistate BAR or MBAR).8−11 Thermodynamic first release were limited to setting the number of λ windows,
integration (TI), an alternative approach to FEP/BAR based the durations of equilibration, the amount of data collection at
on Kirkwood’s work on the theory of liquids, requires the each window, and the ability to “decouple” the vdW and
calculation of the Boltzmann averaged potential energy electrostatic contributions. At this time, calculations were
derivative at each intermediate state λ.12 AMBER20 now has limited to the single topology approach (wherein only a single
functionality for these multiple approaches of FEP (BAR, geometry for the molecule exists at any time, and changes with
MBAR, and TI), which can be used in tandem with minimal lambda are reflected by modifications to the target values of
computational overhead to gain confidence in free energy the internal coordinates and modifications of atom types)35
estimations. and the integration required for TI was performed using the
The first published free energy methods applied to chemical trapezoidal rule.
systems came from Postma, Berendsen, and Haak in 1982, At this early stage in the development and application of free
where the authors reported the free energy cost associated with energy methods, Gibbs in AMBER was one of only a few
the formation of a cavity in water13 followed by Jorgensen’s software packages broadly available. The primary molecular
seminal work in 1985 computing the hydration free energy simulation packages at this time were research software
difference of ethane to methanol, demarking the first true packages from the academic groups of Kollman (AMBER),36
alchemical transformation.14 Alchemical free energy simula- Jorgensen (MCPRO),14 Karplus (CHARMM),37 and van
tions were made more efficient by Tembe and McCammon, Gunsteren (GROMOS).38 Shortly after their initial publica-
who noted the concept of the thermodynamic cycle and tions, both Singh and Bash left UCSF for other positions. A
designed a model system to compute the ΔΔG between second generation of development of the Gibbs free energy
atoms.15 This approach was applied to compute the free module was carried out primarily by Pearlman and Kollman,
energies for model systems, such as ligand binding in a host− where they focused on (1) addressing shortfalls in the first
guest system16,17 and hydration of noble gases.17 While these implementation (e.g., the contribution from bond con-
early works showed the promise of alchemical free energy straints),39,40 (2) dynamically changing the λ schedule to
simulations in drug discovery, it took years for the first reflect the evolution of the system as it progressed,41 (3)
prospective applications to appear in the literature18 and over a validating/improving the intermediate mixing rules for non-
decade for the first published industry application to appear,19 physical λ states,42 (4) developing, characterizing, and
yet these studies involved only single heavy atom changes. implementing best practices,35 and (5) integrating error
Eventually, larger and more pharmaceutically relevant chemical propagation.43 During this period, the Gibbs module was
transformations were shown to be tractable with alchemical also updated to reflect improvements to the base molecular
free energy simulations.20 Details about the history, theory, dynamics methods, including the development of the particle
methods, and applications of alchemical simulations can be mesh Ewald (PME) method44,45 for efficient treatment of
found in a number of excellent reviews.3,4,21−27 long-range electrostatic interactions in simulations of pro-
1.2. The Origin of Free Energy Simulations in AMBER. teins46 and nucleic acids,47,48 and their parallel implementation
The first implementation of free energy calculations within the to accommodate the supercomputing platforms of the era. In
AMBER suite came in 1986. Singh implemented and tested the the early 2000s, reflecting a desire to lower the maintenance
software, which he built upon the previously developed overhead in light of a rapidly increasing number of
5596 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

modifications to the base AMBER molecular dynamics involve the balance between more advanced controls for
platform, the fundamental Gibbs functionality was reimple- optimal performance and simplified interfaces to improve
mented into the Sander module of AMBER, which also serves usability. Much work has been done to expose BFE
as the (nonfree energy) molecular dynamics platform, and the calculations to a broader audience through graphical user
Gibbs module was retired. More recently, the free energy interfaces,61,62 workflow tools,63,64 and integration with
methods have been implemented in the AMBER PMEMD powerful molecular operating environments.65−68 While ease
program, which generally replicates the functionality of Sander of use has expanded the user base for BFE methods, it has also
but provides appreciably better efficiency for highly parallel reduced the degree of expertise that can be inserted by the user
CPU platforms. to optimize performance and reliability. Indeed, performing
Free energy-specific improvements since integration into BFE simulations is still an expert process where experience
Sander (and then PMEMD) have included methods that can plays an important role, especially for challenging targets where
improve the efficiency of sampling (e.g., Replica exchange), the time scales of important degrees of freedom and
more control over the λ scheduling, improved methods for conformational flexibility might be unknown. As such, in this
integrating the TI curve, and tools to support absolute binding work we also highlight the practical considerations for reliable
free energy calculations. Broadly, changes to free energy predictions in real-world applications. In some cases robust
methodologies in AMBER since the first implementations have automated programs are available, but as will be described,
been evolutionary. The fundamental advancement that has had there are many subtle details that require close attention by
the greatest effect on the ability to obtain reliable results from expert users to obtain optimal results.
free energy calculations has been an increase by more than 6 Through an academia-industry collaboration, we have
orders of magnitude in compute power since the first AMBER addressed some of the primary issues that have in the past
free energy simulations, coupled with better force fields and limited AMBER utilization in drug discovery efforts. Most
auxiliary tools for facilitating control over advanced simulation recently, we have improved the softcore potential to ensure
options. more reliable simulations across a broad set of diverse
A major performance enhancement introduced in alchemical transformations. We have also implemented
AMBER11 was the ability to use graphical processing units restraints for absolute binding free energy (ABFE) simulations
(GPUs) to massively accelerate PMEMD for both explicit and finer control of the bonded terms in relative binding free
solvent PME and implicit solvent/Generalized Born (GB) energy (RBFE) simulations. These advances, coupled with the
simulations.49,50 The performance envelope was pushed even high performance of AMBER on GPUs and the practical
further with AMBER14 and AMBER16. Those releases considerations outlined herein, should facilitate the broader
represented leaps in both performance and functionality adoption of alchemical free energy simulations in drug
through the full utilization of the single-precision floating- discovery. We hope that these advances, coupled with the
point format (SPFP), which significantly boosted performance great work by others in the field, will aid researchers in drug
on GPUs without sacrificing numerical accuracy.51 Although, discovery to more efficiently design medicines to treat diseases
the GPU accelerated version of PMEMD, namely PMEMD.- with unmet medical needs.
cuda, has been designed to support as many of the standard
PMEMD features as possible, there were some limitations, 2. BACKGROUND FORMALISM FOR ALCHEMICAL
such as the inability to perform alchemical free energy FREE ENERGY SIMULATIONS
simulations on GPUs. Giese and York52 recognized that The change in free energy between two thermodynamic states
certain types of alchemical transformations that involved only can be computed from equilibrium simulations using a free
the interpolation of force field parameters representing the two energy perturbation (FEP)5 (sometimes referred to as
end states (rather than mixing of their Hamiltonians) could be “thermodynamic perturbation”) or thermodynamic integration
achieved without modification of the PMEMD.cuda engine. By (TI)6,69 formulations, or through nonequilibrium ensemble
bringing the Gibbs functionality in Sander out of retirement simulations using the Jarzynski equality and its equation
and making minor extensions to work with PME, some variations.33,70−74 For the purposes of the current work, we will
alchemical transformations could be achieved with a focus on the calculation of relative binding free energies from
postprocessing tool. Around the same time, the GPU- equilibrium simulations using TI and FEP formulations with
accelerated alchemical free energy module was first imple- Bennett Acceptance Ratio7,75 (BAR) and its multistate
mented as a patch of AMBER1653 and later incorporated into generalization (MBAR).9,76 For additional discussion of factors
the official AMBER18 release.54 Since then, the free energy that influence accuracy and robustness of free energy
methods in AMBER have been carefully validated55 and simulations, we refer the reader to several excellent
applied,56,57 and many advances for alchemical free energy examples.2,3,71,77−85
calculations have been actively developed, such as a novel Consider the transformation of a system of N particles in an
softcore potential,58 various types of restraints,59 and robust initial state “0” characterized by potential energy function
analysis methods.52,60 U0(rN), where rN = r1, r2, ..., rN represents the Cartesian
1.3. AMBER20 for Drug Discovery Applications. positions of each particle, to a final state “1” characterized by
Despite the advances and applications described above, the potential energy function U1(rN) having the same degrees of
impact of alchemical binding free energy simulations has been freedom. The potential energy functions can, for example,
limited in drug discovery for a number of reasons, most represent different molecular species or environments. In
commonly noted as inaccurate force fields, insufficient general we will refer to this type of transformation as an
sampling, and ease of use. However, with the many force “alchemical transformation” from which differences in
field advances in the past decades (discussed below in sections thermodynamic end states can be determined, to distinguish
5.1, 5.9, and 6.5) and increased computational throughput via it from a physical or chemical transformation that involves a
GPUs (discussed in section 3, many of the remaining issues real mechanistic pathway that contains both thermodynamic
5597 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

and kinetic information. The change in free energy of the requires a separate simulation for each “λ window” that
alchemical transformation between states 0 and 1 can be corresponds to a specific value of λ and using forces derived
computed from the ratio of configurational integrals Z0 and Z1 from the potential energy U(rN; λ). These simulations can then
as be analyzed using the BAR7,75 or MBAR9,76 methods.
Alternatively, with the introduction of a defined pathway
ΔA 0 → 1 = A1 − A 0 = −β −1 ln(Z1/Z0) (1) between states, the change in free energy can be equated to the
−1
where β = kBT, kB is the Boltzmann constant, and T is the reversible work of conducting the transformation between
absolute temperature, and states, and this gives rise to the TI formulation,6,69 which is
characterized by the formula and numerical quadrature
N
estimate
Zs = ∫V e−βU (r )dr N
s
(2)
1 ∂U (r N ; λ)
where s is the state of the system (0 or 1) and V is the volume ΔA 0 → 1 = ∫0 dλ
∂λ
λ
of the configurational space. Here, we use the Helmholtz free
energy, ΔA, in the N, V, T ensemble to motivate discussion, M
∂U (r N ; λ)
whereas extension to the Gibbs free energy and the N, P, T ≈ ∑ wk λk
ensemble is straightforward. k=1
∂λ (5)
In the FEP formulation, substitution of eq 2 into eq 1 leads
to the so-called Zwanzig, or “exponential average” relationship: where the second sum indicates numerical integration over M
quadrature points (λk, for k = 1, ..., M) with associated weights
ΔA 0 → 1 = −β −1ln⟨e−β ΔU ⟩0 (3) wk.
2.1. Alchemical Transformation Pathways and Soft-
where ΔU = U1 − U0 and the average ⟨...⟩0 involves integration core Potentials. The simplest way in which the U(rN; λ) can
over the configurational space of the Boltzmann probability for be constructed is to use a linear interpolation between states,
N
state 0, P0(r N ) = e−βU0(r )/Z0 , or equivalently, Boltzmann which we will designate as UL(rN; λ):
sampling from this ensemble from a molecular simulation U L(r N ; λ) = U0(r N ) + λΔU (r N ) (6)
using the forces derived from the potential energy U0(rN). This
expression allows the free energy of the transformation to be where the end point difference ΔU(r ) ≡ U1(r ) − U0(r ) is
N N N

computed while requiring sampling only at one thermody- also, by coincidence, the thermodynamic derivative with
namic end state. The above relation has many useful variants respect to λ. Hence, the common energy components that
that consider the other or both end states: are identical between U1(rN) and U0(rN) need not be explicitly
considered as the corresponding difference is zero. As has been
ΔA 0 → 1 = −β −1 ln⟨e−β ΔU ⟩0 well established, the linear alchemical transformation pathway
leads to practical problems that can be partially overcome by

ÄÅ É
Å ⟨f (β(ΔU − C))⟩0 ÑÑÑ
= −β −1 ln⟨e β ΔU ⟩1−1 the use of so-called “softcore” potentials for nonbonded
−1 Å Å
Å ÑÑ + C
= −β lnÅÅ Ñ
ÅÅÇ ⟨f (β(C − ΔU ))⟩1 ÑÑÑÖ
Lennard-Jones (LJ) and Coulombic electrostatic (Coul)
interactions.89,90 A commonly used softcore potential trans-
(4) formation pathway90 originally implemented in AMBER is
where f(x) = 1/[1 + exp(x)] is the Fermi function and C is a U0SC(r N ; λ) + λΔU SC(r N ; λ) (7)
constant with units of energy. If C is set to zero, one can
where ΔU (r ;λ) ≡
SC N
USC N
1 (r ;1−λ) − USC N
0 (r ;λ)
as before.
recover the original formula of eq 3, or if one solves for C such
There have been many different proposed softcore potential
that the numerator and denominator of the logarithmic term
forms that modify, or “soften”, these interactions. In the
are equal (making this term vanish), then one obtains an
following sections, to be more clear, we only show the softcore
optimal statistical estimate using the BAR method.7,75 One can
potential corresponding to one end state and the system total
further generalize this expression to consider non-Boltzmann
potential should be written as the properly weighted sum of
sampling.86
the two end states. The LJ and Coul interactions for a set of
In principle, the above FEP equations only require sampling
interacting point particles i and j separated by a distance rij are

ÅÄÅ ÑÉ
at the thermodynamic end states. However, the statistical

ÅÅij σ yz12 ij σ yz6ÑÑÑ


given by
precision requires that there is sufficient phase space
ÅÅjj ij zz j ij z Ñ
ULJ(rij) = 4ϵijÅÅÅjj zz − jjj zzz ÑÑÑÑ
overlap,60,87,88 which typically necessitates stratifying the

ÅÅj rij z j rij z ÑÑ


ÅÅÇk { k { ÑÑÖ
transformation into smaller steps along a pathway. In theory,
the free energy is a state function, and thus the free energy
difference between states is independent of the path that (8)
connects them. In practice, the choice of this pathway is of

ji qiqj zyz 1
and

UCoul(rij) = jjj z
immense importance, as it can be extremely challenging to

j 4π ϵ0 zz rij
k {
converge sampling along the pathway itself. Let us then define
a thermodynamic parameter λ that smoothly connects states 0 (9)
and 1 through a λ-dependent potential U(rN; λ), such that
U(rN; 0) = U0(rN) and U(rN; 1) = U1(rN). Within the FEP where σij and ϵij are the pairwise LJ contact distance and well
formulation, the transformation can be broken down into a depth, respectively, and qi and qj are the partial charges of
series of M steps corresponding to a set of λ values λ1, λ2, ..., λM particles i and j. To “soften” these pairwise interactions with
ranging from 0 to 1, such that there is sufficient phase space particles contained within the selected softcore region, one can
overlap between neighboring intermediate λ states. This modify the effective interaction distance by introducing a
5598 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

parametric form for separation-shifted scaling with an adjust- where Rcut,i is the onset distance where the weight function
able parameter. A commonly used form of these modifications becomes effective and Rcut,f is the final distance of the weight
is89,90 function where the softcore potential completely diminishes
and is set to the same as the nonbonded cutoff distance.
rijLJ(λ ; α) = [rijn + λασijn]1/ n (10) 2.2. Common Problems with Softcore Potentials and
Their Solutions. We call specific attention to three problems
and
that commonly occur in simulations of alchemical trans-
rijCoul(λ ; β) = [rijm + λβ ]1/ m formations, and in particular for “concerted transformations”
(11)
that involve simultaneous changes in both nonbonded LJ and
where n and m are positive integers and α and β are adjustable Coul terms. These are referred to as the “end point
positive semidefinite parameters for the LJ and Coul softcore catastrophe”, the “particle collapse problem”, and the “large
interactions, respectively, with values of zero corresponding to gradient-jump problem”.
no softcore modification for any λ value. In several molecular The end point catastrophe is well-known, and arises from a
simulation software suites, including the default in AMBER, sharp divergence of the contribution to the free energy at the
the values of n = 6 and m = 2 are used, although other values thermodynamic end points (λ values near 0 and 1) due to poor
have also been suggested,90 and as will be discussed below, phase space overlap, and can be avoided by the use of softcore
combined with new smoothstep softcore potentials, consid- potentials. The particle collapse problem involves the
erable improvements can be made to stabilize sampling of the introduction of new spurious minima at intermediate λ states,
transformations. frequently manifesting in the artificial superposition of particles
The LJ and Coul softcore potentials are defined from these that can lead to large amplitude fluctuations or phase transition
scaled effective interaction distances as behavior along the λ dimension.94 This problem results from
SC an imbalance of Coulomb attraction and exchange repulsion,
ULJ (rij ; λ) = ULJ[rijLJ(λ ; α)] (12) and can be overcome by ensuring that these terms are scaled in
and such a way that preserves overall repulsive behavior at short
distances for all λ values (e.g., by ensuring that the repulsive
SC
UCoul(rij ; λ) = UCoul[rijCoul(λ ; β)] (13) terms are sufficiently large to overcompensate for any attractive
Coulomb interactions). Finally, the large gradient-jump
The thermodynamic derivatives with respect to λ can be problem involves sensitivity of the free energy to certain
obtained using the chain relation. softcore parameter values that adjust the exchange repulsion
Recently, we developed a new smoothstep softcore potential and can lead to spurious jumps in the free energy near the
for nonbonded LJ and Coul interactions, implemented in thermodynamic end points. This problem can be solved
AMBER20 and demonstrated below, that further improves the through use of a smoothstep softcore potential that has scaling
stability in practical calculations.58 We introduce a nonlinear λ weights with derivatives that vanish at the end points.
scaling function by replacing λ in eq 7 with a so-called second- 2.3. Stepwise versus Concerted and Absolute versus
order smoothstep function, S2(λ), defined as Relative Protocols for Alchemical Transformations.
S2(λ) = 6λ 5 − 15λ 4 + 10λ 3 Here, we discuss strategies for alchemical free energy
(14)
simulation protocols and parameters that will yield the best
Note that S2(λ) varies smoothly from 0 to 1 and has vanishing results for a given system of interest. One of the most pivotal
derivatives at λ = 0 and 1. Details of the implementation and technical issues is the choice of the alchemical path connecting
testing of the second-order smoothstep softcore potential in the two real states (i.e., connecting the two thermodynamic
AMBER20 can be found elsewhere.58 Similar in spirit, but end points). While, the free energy difference between two
slightly different in details, is closely related work first states is independent of the path that connects them in the
introduced by Hritz and Oostenbrink91 and described in regime of complete conformational sampling, in practical
further detail by Riniker and co-workers92 where the use of calculations of complex systems, the choice of the alchemical
third-order polynomials enable different λ-dependency (re- transformation path is critical to obtain stable, converged
ferred to a “individual Lambdas”) for calculation of relative free results with affordable sampling.
energies. This form of the softcore potential also has been In the discussion that follows, we separate the atoms
shown to have impact on the ability to predict λ derivatives at involved in the alchemical transformation into two regions: the
nonsimulated points in extended TI methods.93 softcore region, and the common core region. The common
Further, as will be demonstrated below, a promising new core atoms are those that transform from a “real atom” in the
form of the effective interaction distance with separation- initial state to another real atom in the final state, and in
shifted scaling is given as intermediate λ states, interact with other (nonsoftcore) atoms
via normal Lennard-Jones (LJ) and Coulombic electrostatic
rijX(λ ; α X ) = [rijn + α X W (rij)S2(λ)σijn]1/ n (15) (Coul) interactions. The softcore atoms, on the other hand,
where X generically represents either LJ or Coul, α is the
X are those selected to interact with other atoms (including the
corresponding unitless parameter, and the weight function of common core atoms) via a softcore potential90 in intermediate
the softcore potential W(rij) is designed to smoothly return to λ states. Often the atoms of the softcore region are transformed
the normal rij value by the end of the cutoff from “real atoms” in the initial state to “dummy atoms” in the

ji rij − R cut, i zyz


final state. A discussion of the requirements that the dummy
W (rij) ≡ 1 − S2jjjj zz
j R cut, f − R cut, i zz
atoms reproduce the ensemble and potential of mean force of

k {
the real state has been discussed extensively by Boresch and
(16) Karplus95,96 and Roux and co-workers.84,97
5599 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

The two most commonly applied procedures for alchemical perturbation (e.g., a small ligand change that induces a large
transformations are referred to as “stepwise” and “concerted” protein conformational change or alters the ligand conforma-
protocols.83,98 For the stepwise protocol, also referred to as tional preference). While there are no procedures to our
”split”, “multi-step” or “decoupled” procedures,56 the trans- knowledge that can a priori determine when a given
formation is carried out by scaling Coulombic and Lennard- perturbation is too large, there are many reviews of free
Jones interactions separately, where the charges of the dummy energy methods that provide guidance and best practices for
atoms are scaled linearly and LJ interactions are scaled via the dealing with such situations.20,22,25,115 In some cases, where
softcore potentials. In these procedures, all or parts of the large perturbations lead to an exceedingly long thermodynamic
Coulomb and LJ transformations are decoupled, and path, it is recommended to insert intermediate molecules that
performed as separate steps. An example of a 3-step bridge the two molecules of interest, as described in a recent
“decharge-LJ-recharge” protocol would be as follows: First, application to BACE.116 Additionally, it is possible to compute
the atoms in the softcore region (those atoms that will absolute binding free energies (ABFEs), whereby an entire
transform into dummy atoms) are fully decharged. Next, these ligand is transformed to a dummy-state that is noninteracting
decharged atoms undergo a LJ transformation using softcore with its environment. While this process typically involves a
potentials, while at the same time the charges of the much larger “perturbation” and, consequently, more sampling
nonsoftcore atoms are also transformed. Last, the atoms in to achieve a fixed level of precision, in some cases, it may be
the softcore region are recharged to the final state. This complementary or even preferable to the calculation of RBFEs
protocol is generally quite robust, since the softcore LJ alone. ABFE is particularly useful when exploring diverse
transformations occur after the partial charges of the softcore ligands, such as in virtual screening, as described recently by
atoms have been eliminated. One caveat of the conventional Cournia et al.4
stepwise procedure is that, depending on the selection of
atoms in the softcore region, a noninteger charge change can 3. PERFORMANCE
be introduced at intermediate λ states even for alchemical A critical aspect of BFE simulations is the amount of
transformations between molecules having the same net conformational sampling, which directly relates to the
charge. Care should be taken to include net charge corrections convergence and accuracy of the simulations. While longer
and appropriate sampling in these cases (see below). simulations can be achieved with more wallclock time, there
Alternatively, one can use a concerted protocol (also quickly comes a point where impact in drug discovery will be
referred to as “unified,” “single step” transformation). In this limited due to real-time throughput of guiding predictions in
procedure, the softcore LJ and Coulomb terms are in some time-critical projects. Historically, compute power has been
way performed in concert. This procedure might have some dominated by the speed of individual cores. As single core
advantages in terms of throughput performance and ease of use performance stagnated in the past decade, parallel computing
with advanced λ-schedule optimization and enhanced sampling emerged to allow performance scaling to remain. Molecular
schemes, such as λ dynamics,99−102 Hamiltonian replica dynamics (MD) is a problem that is inherently parallelizable,
exchange methods,103−107 adaptive biasing100,108,109 or self- although challenging, as each atom has to compute its energy
adjusted mixture sampling110,111 methods. Concerted alchem- and forces relative to the current state of the system (i.e., all
ical transformations, however, are more sensitive to the other atoms). This makes MD an ideal candidate for graphical
treatment of softcore atoms, and are more susceptible to the processing units (GPUs),117 and since 2012,49,50 MD has been
end point, particle collapse and large-gradient jump problems largely performed on GPUs. Current single GPU performance
discussed earlier. Consequently, it is of tremendous practical offers orders of magnitude increased performance relative to a
interest to work toward more robust and efficient methods to conventional central processing unit (CPU) hardware for most
enable stable concerted alchemical transformations. common protein systems. While more sampling is generally
Related to these issues is the choice of the atoms in the preferred in free energy simulations, it is not always the case
softcore region. There are a number of strategies, methods, and that more sampling affords better results. The disconnect
software tools that have been developed to assist in defining between sampling and accuracy can broadly be attributed to
the optimal sets of transformations for a library of compounds. (1) poor force field (sampling cannot help), (2) local minima,
This is sometimes referred to as a “perturbation map”.3 One where sampling in the local minimum of interest is sufficient to
method of common core/softcore atom selection is based on attain converged free energy results but additional sampling
maximizing the common substructure (i.e., minimizing the opens new regions of phase space, thereby resulting in poorer
number of softcore atoms that are to be transformed).112 apparent convergence, and (3) poor system setup, where a
Alternatively, selection can be based on grouping softcore longer simulation may result in propagation of errors that
atoms into chemical functional groups.84,113,114 Additional increasingly degrade the results over time, such as protein
details can be found in section 5.5. unfolding events.
Alchemical transformations are most reliable when the With the AMBER18 release, a GPU-accelerated thermody-
transformations involve a short thermodynamic path (i.e., namic integration (GTI) method was implemented.53,54,118
minimal perturbation to the free energy landscape). This often The key technical challenge overcome in AMBER18 GTI
translates into perturbing the smallest number of atoms, involved cleverly enabling TI-based calculations without
although the nature of the perturbation (size, polarity, compromising the optimized AMBER energy kernels. This
conformational preferences, etc.) can have a significant impact was accomplished by using a streaming kernel to filter and
on reliability. Consequently, most drug discovery applications separately process alchemical atoms and their interactions.
focus on computation of relative binding free energies Thus, the GTI code has a slight performance dip in
(RBFEs), where a common core is unperturbed. Still, even comparison to standard MD simulations in AMBER but still
with a small number of perturbed atoms, the thermodynamic a tremendous speedup relative to CPU implementations and
path between the states may be long due to the nature of the other GPU codes. For example, a TI calculation on cyclin-
5600 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Figure 1. Performance of AMBER20 for standard MD and thermodynamic integration (TI) on GeForce 1080Ti and 2080Ti graphics cards
compiled on CUDA 9.1 using a Monte Carlo barostat, Langevin thermostat, and 4 fs time step.

dependent kinase 2 (CDK2) with approximately 54 000 atoms the alchemical transformation have derivatives that vanish at
takes approximately 4.5 GPU-hours using a GTX 1080Ti GPU the transformation end points (λ = 0 and 1) and enable
with 24 λ windows split between a complex stage and a smooth adjustment of the λ-dependent terms in the potential.
solvated stage (contains 4500 atoms) using 2 ns simulation The second-order smoothstep softcore potential, SSC(2), with
time per λ, with a 4 fs time step facilitated with hydrogen mass α = 0.2 and β = 50 Å2 has been demonstrated to overcome all
repartitioning (HMR).119 These simulations can also be run at three problems for a broad set of alchemical transformations
a 2 fs time step without HMR with shake, and 1 fs time steps used in the calculation of hydration free energies and RBFEs.
without shake. This same simulation takes 2.5 GPU-hours on Results are examined for edge cases where the original AMBER
the more recent RTX 2080Ti GPUs. As each λ window is softcore potential is observed to failthe SSC(2) smoothstep
independent, these calculations can be run in parallel across softcore potential was demonstrated to remain accurate. The
multiple GPUs with no hit to performance and can be done in SSC(2) potential has been further tested against a broad set of
less than 20 min on a GTX 1080Ti across 24 GPUs, or 12 min
hydration free energy and RBFEs for a commonly used FEP
across 24 RTX 2080Ti GPUs. Figure 1 summarizes the results
validation data set containing 200 ligands and spanning 8
of AMBER20 on three targets of different size with GeForce
1080Ti and 2080Ti graphics cards using standard MD and protein targets.121 The SSC(2) potential has the advantage
GTI. that it can be used in concerted transformations and is better
suited for enhanced sampling methods with more advanced,
adaptive λ scheduling requirements, which is part of our
4. ADVANCES IN AMBER20
ongoing research collaboration and intended to be in
A number of important improvements were introduced in upcoming AMBER releases (see section 6.3).
AMBER20 to facilitate large-scale RBFE and ABFE simu- In AMBER20, the λ-dependence of individual interactions
lations. Specifically, the softcore potential was improved using (e.g., bonded, Coulombic and Lennard-Jones) now can be
a smoothstep function,58 which significantly reduces a number controlled by the user, including both linear and smoothstep
of known issues in previous versions of AMBER (namely the functional forms, and advanced λ-scheduling within the λ
end-point catastrophe, particle collapse, and large gradient interval [0, 1]. This “λ-scheduling” can be applied to individual
jumps in the dU/dλ curve). Additionally, Boresch restraints59 interactions independently and gives the users a very flexible
have been implemented, which can be used in an automated
way to control the mixing scheme of the softcore potentials.
fashion for ABFE simulations with many diverse ligands.
For example, one can utilize a smoothstep function with
Boresch bonded terms95,96,120 were also implemented, which
can be used to control which energy terms are included in the boundaries at [0.0, 0.5] for Coulombic (Coul) interaction and
softcore region. Below is a more detailed description of the a smoothstep function with boundaries at [0.5, 1.0] for
advances in AMBER20. Lennard-Jones (LJ) potential, which will execute a stepwise
4.1. Smoothstep Softcore Potentials. With the newly alchemical transformation with the Coulombic interactions
developed class of smoothstep softcore potentials described in being transformed in the λ interval [0.0, 0.5] and the Lennard-
section 2.1, we were able to demonstrate that, unlike the Jones being transformed in the λ interval [0.5, 1.0]. Similar λ-
conventional softcore potential in previous versions of scheduling features have been reported and implemented in
AMBER, there is a single set of α and β values that can be other simulation packages, such as GROMOS 92 and
utilized for reliable and accurate simulations across a wide NAMD.84,122 Application of these λ scheduling features in
range of diverse molecular systems.58 The key characteristic of combination with the new smoothstep softcore potentials are
the smoothstep softcore function is that the weights used in discussed in more detail in Future Work (section 6) below.
5601 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

4.2. Pose Restraints for ABFE. Accurate predictions of treatment of these terms will not cause significant deviations of
absolute binding free energies of small organic molecules from the calculated free energy differences, theoretically it should be
MD simulations offer significant value in drug discovery and treated more rigorously when applicable.
design. In particular, ABFE (as opposed to RBFE) is not 4.3.1. Bonded Terms between the Common and Softcore
restricted to perturbations on a common core and is thus Regions. A key advantage of RBFE simulations is that the
amenable for use in virtual screening,4 selectivity screening,123 ligand scaffold (common core) is always present and interacts
and core hopping.124 However, this added flexibility also with the receptor binding site, thus obviating (or at least
introduces new challenges and uncertainties, which may greatly reducing) the need for orientational restraints as with
explain why ABFE has seen minimal use in actual drug ABFE. However, a similar issue is encountered when chemical
discovery projects (in addition to the additional sampling groups extending off of the common core are created or
requirements). annihilated (transforming from or into “dummy atoms”)
There are multiple ways to realize a valid thermodynamic similar to the ligand drift problem in ABFE, the chemical group
cycle that is compatible with the aims of ABFE, but most must be tethered to the common core. While this may seem
schemes employ a set of restraints to restrict the ligand to readily accomplished by retaining bonded terms between the
remain near the binding site.59,125−128 The necessary (or disappearing group (softcore region) and the common core,
allowable) extent of this restriction is rather dependent on the care should be taken that these retained bonded terms obey
system. For example, some strategies employ loose “flat- certain constraints and conditions. These conditions require
bottom” restraints that only restrict the ligand center of mass the ensembles generated in the state with “dummy atoms” in
motion,128 while others accommodate more elaborate the softcore region that have “disappeared” to reproduce the
restraints on the ligand conformation, translation, and same potential of mean force on the real atoms as the real
rotation,125 possibly in multiple unique poses.126,127 system without the dummy atoms. Extensive discussion of
In our drug discovery efforts, we have worked under the these conditions, including “rules” of how to select retained
assumption that the unrestrained ligand is fairly strongly bound bonded terms between the softcore and common core regions,
and therefore can be assumed to occupy a single pose with are provided by Boresch95,96,120 and Roux and co-work-
high occupancy. This assumption is generally safe because we ers.84,132
are primarily interested in the identification of tight binding AMBER20 now includes a general facility for selectively
species and we can tolerate bias or inaccuracy in weakly bound retaining bonded interactions with noninteracting softcore
species−other use cases may have different needs, such as atoms. The nonretained terms are then decoupled using the
identifying weakly bound fragments129 or molecules that usual scheduling strategies for nonbonded interactions.
stabilize intrinsically disordered proteins.130 Working under Importantly, the simulation efficiency can be highly affected
the assumption of a single well-defined binding mode permits by which terms are selected for retention and which are not
the rather simple restraint framework described by Boresch et poorly chosen terms can lead to high variability or even
al.,59 which only places harmonic translational and rotational nonergodicity. Unfortunately, there does not appear to be a
restraints on the ligand in a local coordinate frame via one general solution to this issue. Theoretically rigorous results can
distance, two angles, and three dihedrals (see section 5.7 for only be obtained by retaining terms that involve not more than
further details). The implementation in AMBER20 also three atoms in the common core. However, for efficiency, the
permits these restraints to be included in the overall alchemical retained terms must also keep the softcore atoms in or near a
transformation such that the component of the free energy physically relevant geometry and not hinder rotameric
arising from the restraints in the bound state can be computed transitions.
in the same way as other force field terms. The absence of 4.3.2. Nonbonded Terms between the Common and
restraints when simulating the unbound state can be accounted Softcore Regions. The nonbonded terms between the
for using a simple analytic formula in the limit that the common region and the softcore regions should be always
harmonic restraints are relatively stiff.59,131 The implementa- scaled with the alchemical variable λ. Nevertheless, the 1−4
tion in AMBER20 has been validated on virtual screens with nonbonded terms across the softcore boundary were not
thousands of diverse compounds run through ABFE. The treated properly in some previous versions of AMBER.98 A fix
results demonstrate the usefulness of the approach and confirm has been implemented and verified in AMBER20, resulting in
that the calculated binding free energy is independent of the much improved relative hydration free energies of 9 bench-
details of the restraints, as determined by comparing results mark molecules using the concerted transformation protocol.55
from multiple runs with randomized restraint combinations.4 4.3.3. Interactions within the Softcore Region. In
4.3. Handling Interactions Involving Softcore Atoms AMBER20, both bonded and nonbonded interaction terms
for RBFE. In AMBER, the TI region is defined as the part of within the softcore regions can be either scaled with λ or not.
the system to undergo alchemical transformation from one end Either implementation is theoretically correct, provided that
state to another one; hence there are two regions representing the conformational sampling of the softcore regions at the end
two end states. There are two parts for each TI region: the point states are sufficient. Users can control how the
common region to both TI regions and the softcore region interactions within the softcore regions are treated. For
unique to each TI region. The softcore potential is utilized to recommended guidelines, refer to recent validation studies of
treat the interactions between the softcore regions and other free energy methods in AMBER.55
parts of the system. Atoms which are growing or disappearing 4.4. RBFE Accuracy on Drug Targets. The GPU-
during the alchemical transformation must be included in the accelerated free energy simulation methods in AMBER have
softcore region. An atom included in a softcore region is been validated in an Application Note appearing in the current
defined as a softcore atom. Previous versions of AMBER have special issue.55 Although the methods discussed here are quite
not allowed for detailed control over the interactions between new, they have already seen a number of applications,56−58
the common and softcore regions. While most of the time particularly against a well-studied data set that includes ∼200
5602 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Table 1. Mean Unsigned Error (MUE) in kcal/mol for 8 Protein Targetsa


method BACE CDK2 JNK1 MCL1 P38 PTP1B thrombin TYK2 mean
AMBER20 (this work) 0.88 0.93 0.73 1.30 0.79 0.79 0.39 0.93 0.84
FEP+ (Wang et al.121) 0.84 0.91 0.78 1.16 0.80 0.89 0.76 0.75 0.86
Cresset (Kuhn et al.116) 0.95 0.95 0.78 1.36 1.18 1.04 0.23 0.71 0.90
PMX (Gapsys et al.34) 0.84 0.68 0.80 1.23 0.77 0.90 0.77 1.01 0.88
a
Results are presented for the work here and other recent relative binding free energy (RBFE) publications on the same data set.121 The AMBER20
results are an average of 10 independent runs using the smoothstep softcore SSC(2)58 and the GAFF2 force field. Results for each individual
perturbation, statistical errors across the 10 runs, correlation coefficients, and null model results can be found in Supporting Information.

ligand mutations spanning 8 protein targets (Bace, CDK2, of chemical structures similar to those in the training set. Yet in
Jnk1, MCL1, p38, PTP1B, thrombin, and Tyk2).121 This data real-world applications, it is not uncommon to encounter
set serves as a tractable benchmark set for RBFE calculations molecules that these generalized force fields have not been
because there are no known significant conformational changes tuned for and thus do not yield accurate results.
or other challenging scenarios such as ambiguities in tautomer/ With advances in computational hardware and GPU-enabled
ionization states or buried waters. Results using the protocols quantum chemistry software, such as TeraChem,141 it has
described in this work with the GAFF2 force field133 and become feasible to parametrize hundreds of small molecules
TIP3P14 water model are on par with other recent RBFE as commonly required each week in drug discovery
publications on the same data set, as seen in Table 1, and stand programsindividually based on detailed quantum chemistry
to improve with the forthcoming release of new MM and QM calculations: a complete force field parametrization of a small
force fields for ligand-binding predictions. molecule may be performed in approximately 1 GPU-h. Such
bespoke molecular force fields help to avoid gross para-
5. PRACTICAL CONSIDERATIONS metrization errors and often lead to improved free energy
The aforementioned topics (force field, sampling, and results.
alchemical parameters) are critical to achieving robust and One common type of error in force field parametrization is
reliable BFE results, yet many publications under-emphasize in the torsional parameters that determine the potential
the importance of other considerations. Items such as system energies at different torsional angles of a rotatable bond in the
preparation, docking, and confidence estimates can be just as molecule. For example, a biphenyl system with substitutions at
important as force field, sampling, and alchemical parameters the ortho-, meta-, and para-positions can substantially perturb
for obtaining robust BFE predictions. In some cases, it is the torsional energy profile around the bond connecting the
possible to define best practices and even automate the process phenyl rings, and the perturbation depends strongly on the
to some extent. However, other instances may require expert position and the moiety of the substitution. As a demon-
decisions on a case by case basis. Below, we highlight practical stration of the benefit of refitting the torsional parameters to
considerations found to be most important in prospective the quantum chemistry calculations of the torsional energy
applications of free energy simulations in AMBER. When profile, Figure 2 shows the hydration free energies computed
possible, we provide guidance for best practices, yet in other for a set of alcohol molecules, comparing the generalized
cases we simply highlight the challenges and leave it to the GAFF2 and the bespoke force field in which the torsional
reader to further explore these areas. More details about parameters are refit; the latter significantly improves the
practical considerations in alchemical binding free energy agreement between the predicted hydration free energies and
simulations can be found in a Perspective by Cournia et al.3 the experimental measurements.
5.1. Force Field. A force field is used to model the Generalized force fields sometimes fail to capture the
interactions between atoms in the molecular system of interest. electrostatic potential around the molecule. A well-known
The force field allows determination of potential energy as a example is the σ-hole in aromatic halogens,143 in which a
function of configuration and is used along with the kinetic “hole” of positive potential along the carbon−halogen bond
energy to calculate the Hamiltonian for molecular dynamics cannot be reproduced by the common atom-centered charges.
simulations and binding free energy calculations. The accuracy Inclusion of off-atom-center charges, or virtual sites, is an
of the force field may limit that of the binding free energy effective approach to resolving such discrepancies. Such virtual
predictions, but not all inaccuracies in the predictions should sites for select functional groups are now finding their way into
be blamed on force field problems: poor quality in the initial generalized force fields, but transferable parameters take
structure, erroneous protonation or tautomeric states, and onerous efforts to derive. In contrast, they are straightforward
inadequate sampling should first be inspected. One lesson that to parametrize for bespoke molecular force fields. Figure 3
we learned from the past decade is that the force field accuracy shows how their inclusion in our bespoke force fields for
can be substantially improved by simply avoiding the obvious aromatic halogens and aromatic nitrogens improves the fit for
mistakes in the parametrization of ligand molecules. the electrostatic potential computed by DFT and for the
Generalized force field models, such as GAFF,133,134 predicted hydration free energy.
CGenFF,135,136 and OPLS,137−140 represent efforts to provide Parametrizing bespoke molecular force field is associated
force field parameters for any molecule at a small computa- with a smaller computational cost than the binding free energy
tional cost, using look-up tables for parameters predetermined calculations, yet they may significantly improve the predictive
for different bond, angle, and torsion types. Such models and accuracy. A number of automated tools for parametrizing small
the associated software tools are good starting points for molecules, including several in the public domain, have been
parametrizing molecules for binding free energy calculations. developed, such as CGenFF,135,136 GAAMP (https://gaamp.
They provide reasonable parameters for molecules consisting lcrc.anl.gov/index.html), FFTK,144 and the tools developed by
5603 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

resolution structures, the following guidelines also apply to


structures obtained by other means, such as NMR or Cryo-
EM. In general, structures must be prepared to add hydrogen
atoms, optimize hydrogen bond networks, remove atomic
clashes, in some cases insert regions missing from the
refinement, such as disordered loops, and perform other
operations that are not part of the experimental structure
refinement process. While the prerequisite for good system
preparation is generally accepted in the field, the specific steps
are not well-defined. Fortunately, many of the considerations
for BFE simulations are similar to those for other structure-
based approaches, like docking, and have been described in
detail in other works.146−148 Nonetheless, docking for BFE
simulations may entail additional considerations beyond
standard docking calculations for pose prediction or virtual
screening, as described in the Docking section below.
It should be noted that protein preparation can have a
significant impact on the quality of results and can introduce
artificial biases, especially in the case of retrospective validation
studies, as has been demonstrated for docking studies149 and
Figure 2. Hydration free energies of molecules containing alcohol likely has similar issues in BFE calculations. After protein
functional groups. The bespoke force field uses the same parameters preparation, including the following steps, it is recommended
for bond stretch, bond angle, and van der Waals interactions as to manually inspect the structure, run protein analysis
GAFF2. It derives the partial charges by fitting to the electrostatic programs (e.g., PROCHECK,150 WHATCHECK,151 MolPro-
potential computed using restricted Hartree−Fock with the 6-31G* bity,152 and SurVol153), and perform MD simulations148,154 to
basis set.142 The torsional parameters are then optimized to fit the ensure stability of the system before running computationally
potential energy surface computed by B3LYP/6-31G** for costly BFE simulations. For example, multiple short simu-
conformations generated at different torsional values of the rotatable
bonds. Refitting the torsional parameters improves the agreement
lations on different protein preparation states can reveal
between the predicted and experimental hydration free energies. problematic cases where there are large structural fluctuations,
degradation in secondary structural elements, or loss of key
binding site interactions.155 Additionally, such MD simulations
can be used to improve the overall protein structure.156−158
Details of the protein preparation capabilities and options for
AMBER20 can be found in the user manual (https://
ambermd.org/doc12/Amber20.pdf).
5.2.1. Hydrogen Bonds. Hydrogen atoms are not typically
present in experimentally determined structures (other than
those at resolution better than ∼1.0 Å) and, therefore, need to
be added computationally. The initial coordinates of hydrogen
atoms are inconsequential, as long as proper valences are
satisfied and subsequent sampling is performed. The
protonation state of titratable residues should be determined
for the pH of interest (typically this involves His, Asp, and Glu,
although this could be expanded to Lys and Cys). Additionally,
the two His tautomers should be sampled (proton on the Nδ,
Nϵ, or both). Programs, such as WHATIF,159 can be used for
this step, which can be augmented with pKa predictions
programs such as PROPKA.160
Once hydrogen atoms are added, the H-bond network
should be optimized by sampling 180° flips of the terminal chi
angle for Asn, Gln, and His, which significantly changes the
Figure 3. Hydration free energies of molecules containing aromatic
spatial H-bonding capabilities of the side chains, but does not
halogens and aromatic nitrogens. Including virtual sites on the
halogen and nitrogen atoms improves fitting to the electrostatic appreciably change the fit to the electron density. In addition,
potential calculated by quantum chemistry and the agreement hydrogens on hydroxyls and thiols should be sampled to
between the predicted and experimental hydration free energies. optimize the H-bond network. After the above steps are
completed, it is recommended to perform an analysis of the
structure to ensure a viable state has been generated.
the OpenFF Initiative.145 We believe that automated programs Automated programs, such as WHATIF, PROCHECK, and
for bespoke parametrization will become the default option in MolProbity,152 are useful for the analysis, although manually
future applications of binding free energy calculations. inspecting changes in the atomic fit to electron density with
5.2. Protein Preparation. Protein structures must be programs such as Coot161 is strongly recommended. If it is
prepared prior to running MD free energy simulations. While unclear which states are correct, it is recommended to perform
X-ray structures are the most common source of atomic- modest MD simulations (on the order of 100 ns) and
5604 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

structural analysis to determine the stability of the structures. generate the correct ligand state (ionization, tautomers,
The aforementioned protocol for H-bond optimization is stereochemistry, etc.). Incorrect states could result in false
necessary because sampling times required to overcome negatives (e.g., where a favorable H-bond cannot be made) or
sampling barriers needed to rectify an incorrect initial state false positives (e.g., where an incorrect H-bond is made). In
could be prohibitively long, and local denaturing of the protein addition to calculating reasonable ligand states, it is ideal to
can occur in the process, which would take even more predict an energetic penalty associated with each state to
sampling time to correct, if it could be corrected at all. account for the energetic cost it takes to generate each state in
Nonetheless, if sufficient sampling time is achievable or solution. This energetic penalty should then be added to a
enhanced sampling approaches are available to solve this computed free energy to get a final binding prediction.
problem, then simply adding hydrogen atoms as needed to Significantly, ionization and tautomerization energies are
satisfy valencies for the pH of interest may be sufficient. absent from traditional molecular mechanics force fields, since
5.2.2. Waters. The treatment of explicit water molecules can there are no terms for bond making/breaking. Some empirical
influence docking accuracy and enrichment results, as has been corrections are possible via methods like constant-pH MD, but
extensively demonstrated by other works.162−167 Indeed, water these may not be cost-effective for the large number of ligands
is the source of the hydrophobic effect168subtle changes in studied in a drug design setting. Although largely speculative at
waters can impact ligand binding energetics169 and even this point, we suppose that advances rooted in quantum
reverse the thermodynamic signature of ligands binding to a mechanics or machine learning will be necessary for progress in
protein.170 The treatment of water molecules in BFE this area.
calculations should be considered during the initial system 5.4. Docking. Docking is an essential part of binding free
setup and during the alchemical simulations themselves. The energy simulations, while in theory the binding free energy
determination of which waters to retain during the setup of results should be independent of input pose, that assumes
BFE calculations is often unclear, primarily because not all sufficient sampling to explore all accessible poses with MD,
waters are present in crystal structures and even when there are which would be prohibitively computationally expensive. As
many waters, the free energy of a water molecule is not directly such, it is critical to obtain a reasonable initial pose and in cases
related to the crystallographic occupancy. Furthermore, the where the bet pose is ambiguous, then multiple poses should
crystal structure being used might not correspond to the be pursued. The nature of the docking problem is different
specific ligand or ligand series being explored and therefore the between RBFE and ABFE (and different from docking as a
water pattern may be inaccurate. final calculation): For RBFE calculations a reference pose is
In most cases, it is recommended that all crystallographic typically known and can be used to constrain the docking,
waters are retained for the system setup, although the electron whereas with ABFE, there is typically no reference molecule
density should be inspected to ensure that there is confidence and therefore unconstrained docking is required.
in the water presence. Even in cases where there are many 5.4.1. Docking for ABFE. Docking for an ABFE calculation
water molecules in the experimentally determined structure, it can be quite challenging, especially if one does not have any
is generally necessary to add additional water molecules before prior knowledge that can be employed when evaluating docked
MD simulations. Programs such as 3D-RISM,171 GCMC,172 poses. In addition to sampling the ligand conformation/
JAWS,173 WaterMap,174 and other approaches175−178 can be orientation, the receptor might undergo induced-fit.184 As
useful for this step, since it is generally a fast calculation relative such, to generate a reasonable starting pose, it may be
to the BFE simulations. Importantly, the method to place necessary to induce the site.185 Numerous approaches have
water should be capable of solvating buried pockets that are been developed to address this issue ranging from employing a
challenging to sample during the simulation time of a BFE run softened nonbonded potential to alleviate the penalty of
due to large energetic barrier for entering/exiting the binding protein−ligand clashes, followed by a robust protocol that
pocket. incorporates sampling different side-chain rotamers of the
Once waters have been placed for the initial system setup, it receptor and redocking the compound to multiple receptors
still might be necessary to explicitly sample waters (beyond (ensemble docking).186−188 Cases where large-scale backbone
MD sampling) during the alchemical simulations. This is motions are required to generate the correct binding pose still
especially important when dealing with regions of the binding remains a challenge for the field even when incorporating
site that are occluded from exchange with bulk solvent, such as enhanced sampling methods.
fully buried binding sites or subpockets that are blocked from Fortunately, it has been shown that combining docking with
bulk solvent exchange due to parts of the ligand that are not molecular dynamics (MD) to further refine the pose can be
being perturbed (in the case of RBFE). A combined MC/MD beneficial.186,189 One attractive feature of coupling MD with
method has recently been described and is available in docking is that the receptor and ligand are sampled
AMBER20,179,180 which allows water to equilibrate between simultaneously in the presence of explicit water, allowing for
bulk and buried cavities. This method allows for partial water the receptor to become induced in a physically meaningful
densities during the BFE calculation by allowing the locations way. For example, one might generate N docked poses and run
and occupancy of buried sites to vary with λ in the course of a MD simulation in replicate varying the random seed to assess
alchemical calculations. Other approaches, such as Grand pose stability. As an additional example, to prioritize poses for
Canonical Monte Carlo (GCMC), have been proposed to more rigorous free energy simulations (e.g., ABFE), multiple
address buried water sampling in the context of alchemical free short MD simulations can be performed and the RMSD from
energy calculations.181−183 the docked pose can be utilized as a metric to assess pose
5.3. Ligand Preparation. All-atom three-dimensional stability. Generally, low RMSDs are attributed to the ligand
(3D) ligands are required for RBFE and ABFE simulations. making energetically favorable interactions within a targeted
As such, a critical issue to investigate before embarking on site.190−192 It is important not only to consider the averaged
computationally expensive free energy simulations is to RMSD value but also evaluate the RMSD versus time as a
5605 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

ligand could have adopted a stable conformation that is possibly lead to erroneous results. Therefore, it may be
substantially different from (e.g., larger than 2.5 Å RMSD) advantageous to make a series of smaller perturbations or
from its docked pose and remained there for the duration of instead run ABFE.
the simulation. Excluding atoms of the ligand that are very 5.5. Atom Mapping. For RBFE calculations, a critical step
solvent accessible from the RMSD calculation may also be is determining the relationship between atoms of the reference
required. While MD refinement offers advantages over and perturbed structure such that the common atoms
traditional docking (protein flexibility, explicit water, etc.), it (“mapped” atoms) are linearly interpolated with λ and the
does not contain a rigorous treatment of the binding unmapped atoms are treated with a softcore functional form to
thermodynamics (e.g., no unbound ligand calculation) and, allow for their insertion or deletion. Theoretically, the best
therefore, should not be used as a final scoring estimator. atom mapping scheme is one that minimizes the thermody-
5.4.2. Docking for RBFE. Docking for a RBFE calculation is namic path between the two molecules, however there are
theoretically simpler when compared to ABFE because of the many factors to consider in practice such as atom type, bond
conserved binding mode of most congeneric molecules. The order, ring membership, chirality, and binding conformation.
technical challenges of constraining ligand atoms, especially Generally, topological similarity is assessed computationally
when done in a high-throughput automated fashion for drug using a maximum common substructure (MCS) algorithm,
discovery, presents challenges. Core-constrained docking is which aims to maximize the number of mapped atoms between
generally the most effective way to generate poses, which two molecules from a congeneric series.112 Many MCS
requires a definition of the core atoms (either manually or algorithms require specification of atom type, bond order,
based on a maximum common substructure, MCS). As seen in and ring membership considerations to define the maximum
Figure 4, core constrained poses produce much cleaner atomic overlap between molecules. Assuming perfect geo-
metric complementarity between molecules, mapping of atoms
that differ in type should maximize phase space overlap
between states by decoupling as few atoms as possible. In
practice, many times this is not the case due to conformational
differences between molecules that can lead to convergence
issues and large errors between neighboring λ values along a
thermodynamic path.
In some cases the mappings are clear, such as the
substitution of an aromatic para-fluorine for a para-methoxy,
where the fluorine and methoxy are the only unmapped atoms.
However, in other cases the mapping can be less clear (or even
ambiguous), such as bulky ortho/meta substitutions to a
similarly substituted phenyl ring. In addition, mapping of
Figure 4. Example of ligand poses (purple carbons) docked (A) not atoms with different bond orders can be problematic as atomic
using core-restraints and (B) using core-restraints. Employing core- torsional preferences change between atomic environments
constraints ensures that the binding mode is conserved between all of and such mappings should be avoided whenever possible.
ligands in a congeneric series. The mapping of atoms within ring systems requires special
consideration and introduces a potential source of error
alignments, which facilitate the atom mapping and stability propagation if mapped inappropriately, such as allowing for
of the perturbations in RBFE simulations. Open source ring breaking/forming.3 As such, most atom mapping
docking programs, such as rDOCK,193 support core con- protocols avoid ring breaking when possible. Previous
straints, although many commercial solutions are also available. literature has demonstrated that bonded term contributions
Common atoms shared between the lead and the candidate from dummy atoms should cancel in RBFE simulations of the
ligand are constrained while the degrees of freedom of other bound and unbound states.194 Yet, if the conformational
atoms are sampled during the docking calculation. It should be ensemble of the molecule is significantly affected by the
noted that using an MCS is not always optimal, as the 2D remaining core atoms, as is the case for members of a ring, the
mapping does not ensure the correct 3D characteristics (see cancellation of error is no longer valid. It is for this reason that
section 5.5). large errors are often observed in RBFE calculations involving
Steric clashes present another challenge when docking ring breaking/forming, as the free energies are only collected
ligands that do not fit into the rigid receptor and therefore from a restricted and inaccurate conformational ensemble. To
should be handled carefully. As a consequence of the core address these conformational restrictions, recently a “soft
constraints, significant protein−ligand clashes might be bond” potential has been added to the softcore functional form
unavoidable while still satisfying the core positional con- and suggests that improvements to core hopping trans-
straints. Generally, these types of issues can be resolved formations can be made. Still, more work is necessary to
through an energy minimization or short restrained MD demonstrate its utility across broad ring breaking/forming
equilibration only allowing key atoms to move or redocking scenarios.124
the compound with a reduced number of constrained atoms. When performing manual RBFE calculations it is often
Another important item to note is that if the protein residues straightforward to determine the correct mapping between
involved in the steric clash have to move significantly in order ligands, especially if the binding poses are well determined.
to alleviate the clash during the minimization or MD However, manual mapping is tedious, time-consuming, and
equilibration prior to running the RBFE calculation, then the error prone, especially when processing hundreds of molecules
energy required to adopt this new protein conformation will on a weekly basis. As such, manual mapping is impractical in
not be accounted for during the RBFE simulation, which could drug discovery applications, where hundreds of molecules will
5606 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

be explored on a weekly basis. There are programs that outside the scope of what constitutes a best practice in
perform automatic mapping, such as LOMAP,195 which AMBER.
operates on 2D structures (typically using a 2D method such In general, the aim is to have enough, but not too many, λ
as maximum common substructure). It should be noted that in windows in order to obtain sufficient accuracy at the lowest
some cases the mapping is ambiguous based solely on the 2D possible cost. From the perspective of TI, this means sampling
information, such as ortho substitutions to a similarly the integrand more densely in regions where the curvature
substituted phenyl ring, as seen in Figure 5A. In this example, changes rapidly and possibly spacing the values so as to abide
by a numerical quadrature rule. In the context of FEP-like
protocols (e.g., MBAR), this means choosing neighboring
sampling distributions to achieve minimum variance behavior
with respect to a set of Monte Carlo moves (see, for example,
the overlap metric introduced by Bennett7). Optimizing
according to either of these schemes requires a priori
information. Lacking this advantage, the most straightforward
approach is to use equally spaced values and a generic
quadrature scheme such as the trapezoidal rule. In this case, TI
essentially reduces to a piece-wise linear approximation of the
integrand and is roughly equivalent to approximating the
Figure 5. Potential mappings using (A) 2D or (B) 3D information neighboring sampling distributions as Gaussians.198 Using this
related to differing ortho substitutions on a terminal phenyl ring.
approximation can still provide good results for calculations
where the lambda spacing is small enough to capture the
the preferred conformation of the ortho methoxy substituted essence of the variations in the integrand with lambda
molecule is forming an intermolecular hydrogen bond between throughout the full λ = [0,1] trajectory.
the amide nitrogen and the methoxy oxygen. However, the Interestingly, for simple transformations that only add,
bulky chloro substitution prefers an alternate conformation, remove, or change a charge distribution in a limited volume
and using only 2D topological information the oxygen is the TI integrand tends to be approximately linear or perhaps
oriented toward the same substitution vector. Mapping issues cubic199 (Figure 6, top row). This follows from a Born-like
such as this will often lead to incorrect results due to the
unreasonably long thermodynamic path between the states (a
high-energy conformational transition would be required to
interconvert between the two states). Fortunately, mapping
based on 3D poses would yield the correct result, as seen in
Figure 5B. As such, it is highly recommended to perform atom
mapping using accurate 3D poses when possible.
To our knowledge, there is currently no widely adopted tool
for atom mapping based on 3D poses. A sensible approach is
to align the two moleculessay, A and Bin their binding
poses and preferentially map each atom in A to an atom in B
that is spatially close. This may be formulated as a discrete
optimization problem: one can define a quantitative measure
of spatial overlap between each pair of atoms, and then find the
graph-isomorphic mapping that maximizes the total overlap of
the mapped atom pairs. We anticipate that such 3D atom
mapping tools will eventually replace the current 2D atom Figure 6. Representative TI integrands for different λ schedules used
mapping tools. in binding free energy calculations. The shape of the curve is highly
5.6. λ Schedule. Alchemical BFE simulations are dependent on the use of a single step versus multistep protocol. For
performed by defining a transformation (e.g., between two simple charge changing transformations the curves may have a near
different bound ligands). The extent of the transformation is linear character (fit dashed lines). The number of atoms being
defined in terms of a coupling parameter, usually denoted as a transformed (e.g., RBFE versus ABFE) also has a strong effect (left
value λ between zero and one. As such, the intermediate steps and right columns, respectively). The specific transformations are the
involved in the perturbation are often referred to as “λ values” Tyk2 ejm-47 (ABFE) and p382v → 3fhm (RBFE) perturbations from
or “λ windows”. We use the term “λ schedule” to refer to (1) the Wang et al. data set.121 The specific coupling protocols are as
the number and placement of the specific values included in outlined in section 2.3. Note that discharge and recharge use opposite
conventions for direction (λ = 1 is fully coupled for discharge and λ =
the simulations and (2) the functional form of the coupling in
0 is fully coupled for recharge).
terms of λ (e.g., use of a softcore potential). One would like to
choose the λ schedule in an optimal way. The answer to this
problem is strongly dependent on the methodology being used linear response model where a charge or point dipole is
and decisions to be made by the practitioner. In what follows introduced into a spherical cavity in a homogeneous dielectric
we will assume that a conventional alchemical approach is environment.198 In this case, the integrand may be extremely
being used and that multiple λ values will be chosen with linear and well-behaved and only a few λ values could be
simulations carried out at each value with λ held fixed. Other required (only two points are needed to accurately integrate a
simulation approaches may permit variation in λ either as a line). Any additional curvature tends to occur near the end
discrete196 or continuous197 quantity, but these are currently points, although the general prescription is to focus λ values
5607 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

where the slope is largest (usually at intermediate values).198 such as fused ring systems or the central core of the ligand.
Strong deviation from linearity could simply indicate that the One should be careful not to select multiple atoms within a
charge change is occurring over a large extent and the linear rotatable torsion, otherwise one might “lock-in” the config-
response character could be breaking down. An approach that uration and induce nonergodicity. Candidate coordinates can
still samples intermediate values is thus recommended in then be created from both groups and the six terms (one
general, rather than assuming linearity. We have found that as distance, two angles, and three dihedrals) can be tracked over
few as five λ values can give reliable results for small the trajectory.
perturbations (assuming other sources of error are considered, The ideal coordinates are unimodally (perhaps Gaussian)
as discussed below). Indeed, it has been shown that in some distributed and have low variance. In AMBER it is also useful
scenarios a one-step λ schedule can be sufficient to achieve to avoid overly long distances (<30 Å, say), noncolinear angles
accurate binding free energy predictions,200−202 although such (far from 0° or 180°), and dihedrals that are from the periodic
cases of very small perturbations provide insufficient coverage boundary (i.e., not near ±180°). Any combination of atom or
of chemical space to have high impact for most drug discovery point selections that fit these criteria should constitute a
applications. reasonable set of restraints.
More complicated transformations, especially those that 5.8. Periodicity and Charge Corrections. The use of
introduce short-range repulsive interactions, are by far the periodic boundary conditions (PBCs) has long been known to
most difficult to handle (Figure 6, bottom row). The Lennard- introduce subtle artifacts in MD simulations. However, the
Jones potential is widely understood to introduce large alternative of no boundary conditions is generally not preferred
variance and/or singularities which were originally overcome because it would induce artifacts of a different nature (and
by introducing many finely spaced λ values near (but not at) larger magnitude). While some methodologies may avoid
the end point. This approach was supplanted by the different issues, the general philosophy in AMBER is that the
introduction of softcore potentials which tend to redistribute, PME scheme for PBCs is the best compromise between
but not entirely eliminate, the higher variance across the more accuracy and efficiency.44,45 In current AMBER20 implemen-
intermediate points. The variance of the result is generally tation, the thermodynamic derivative of the PME reciprocal
proportional to the size of the chemical group being part is calculated in the linear way (eq: UL, i.e., the PME
introduced. For example, one should expect higher uncertainty reciprocal calculations are only done on the end states and the
from an ABFE calculation of a drug-like molecule compared to dU
is the difference of the PME reciprocal energies of the
removing/inserting a small chemical moiety onto a ligand dλ
scaffold. Unfortunately, introducing many additional λ values end states. This approach is simple but requires two PME
does not seem to mitigate this issue beyond a pointone only calculations per MD iteration.
needs enough values to capture the shape of the integrand. Nonetheless, specific care must be taken in alchemical
This is because the insertion of uncharged, repulsive simulations and the issues are unusually pronounced for
interactions generally leads to configurations of low physical alchemical transformations that do not conserve the net
relevance and so the variance is inherently a sampling issue. charge. Rocklin et al.203 recently catalogued these issues with
When time and resources are available, then more λ values can an eye toward alchemical ABFE calculations and proposed
be added to enhance overlap between adjacent windows and specific approximation schemes for correcting them. A follow
thereby improve reliability of results.83,202 up work by Chen et al.204 also examined the Rocklin
5.7. ABFE Pose Restraints. The purpose of pose restraints corrections in the case of RBFE, along with other possible
in ABFE is to hold the ligand in the binding pocket when the solutions. These issues are briefly described here.
interactions are scaled to extremely small values (or zero). At The most significant artifacts due to PBCs arise from the
the same time, the restraints can also be interpreted as defining mean electrostatic potential definition imposed by PME. An
the bound microscopic state.125,128 Therefore, a reasonable extensive review has been supplied by Lin et al.205 This term is
criterion is to require that the restraints impose an orientation normally innocuous, as it amounts to a simple shift in the zero
that is similar to the fully interacting ligand. Put another way, of energy and does not affect forces. However, alchemical
the restraints should approximate the potential of mean force simulations are extremely sensitive to arbitrary shifts in the
of the physical system. A similar perspective has been offered zero of energy as this effectively shifts the binding free energy
in several theoretical frameworks125−128 and more elaborate of a ligand based purely on the system charge. Following Lin et
choices than the one described here could also be employed al., the artifact can be considered as the work required to move
using AMBER. In general, the assumptions here hold for a charged species across the boundary between a solvent and
relatively strongly bound compounds and different restraint vacuumclearly no such boundary exists under PBCs and so
protocols may work better in other regimes. this contribution is missing. Rocklin et al. refer to this shift as
A procedure that we have found effective is to first perform a arising from a residual integrated potential (RIP), as it
relatively short (∼5 ns) nonalchemical simulation of the corresponds to the energy “left over” when a charge species
ligand-protein complex. We then search for relatively sta- is removed from a PBC box. The RIP energy is proportional to
tionary points on both the ligand and receptor that can be used the integral of the mean potential over the whole volume. Lin
to define both their relative orientation as well as the internal et al. describe how a correction can be approximated a
conformation of the ligand, as described by Kim et al.131 For posteriori from the spatial electrostatic potential averaged over
proteins, we look for low-mobility, buried residues by a trajectory and then integrated over the box. Fortuitously,
searching for minimal solvent-exposed surface area over the along with others,204 we find that the approximate Poisson−
course of the trajectorythe α-carbons of these residues are Boltzmann scheme proposed by Rocklin et al. provides
then considered good candidates, but other choices are reasonable corrections based on only a single structure. In
certainly possible (e.g., the backbone center of mass). For practice we see minimal statistical noise of ∼0.1−0.2 kcal/mol,
ligands, we look for heavy atoms within rigid scaffold motifs which is quite acceptable given the expected accuracy of ABFE.
5608 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Other PBC-based corrections are possible but generally Shirts and Chodera developed an analytic expression for the
much smaller in magnitude than the RIP correction.203 The (large sample size) free energy errors calculated from the
net charge interaction and solvation correction terms are MBAR method, which is more complicated due to MBARs use
generally quite small and only depend on the magnitude of the of coupled equations.9
charge change and box size/shape. Another term unrelated to An alternative approach for calculating the standard errors is
PME but dependent on charge is the discrete solvation bootstrapping. The bootstrap algorithm requires one to
correction, which is meant to compensate for the distortion in generate many estimates of the free energy and then calculate
solvent structure when, for example, water molecules interact the standard deviation of those estimates, which is the standard
with their overly ordered image counterpart. Interestingly, this error of the mean. To generate many estimates of the free
term can be quite large for a given ligand−receptor complex energy, new time series values are artificially created by
(∼10−20 kcal/mol), but it is generally much less sensitive to resampling the observed distribution with replacement. That is,
the differences between the ligand bound and unbound states if the observed distribution contains N data points, then a
and so tends to cancel extensively. Note, some care must be bootstrap distribution containing N points is created by
taken because the correction depends on the specific nature of randomly selecting samples from the observed distribution. In
the solvent model (namely, the charge distribution). the case that the observed distribution contains correlated data,
5.9. Confidence and Error Analysis. As has been amply a block bootstrap algorithm can be used. The block bootstrap
discussed above, there are numerous reasons that a BFE algorithm differs only by grouping the observed data into
prediction may be incorrect. False positive predictions consecutive segments, such that the length of each segment is
(compounds predicted to be good that are not) can be costly at least as long as the autocorrelation time of the data. The
in drug design projects, where synthesizing an inactive bootstrap distributions are then generated by randomly
compound can cost thousands of dollars and weeks of lab selecting blocks from the observed distribution. In general a
work. Therefore, one should leverage the relatively low cost of block bootstrap error estimate will be greater than that from
additional simulations to establish confidence (or lack thereof) the standard algorithm and the ratio of the two estimates can
in a prediction. Here, we suggest several best practices for be a used as a rough estimate of the autocorrelation time.10
appropriately assessing the quality of binding free-energy The second source of error in free energy estimates arises
predictions, as well for building more robust hypotheses from insufficient sampling, such that the observed distribution
around the chemical matter surrounding a target. Other works is not reflective of the true distribution.208 To estimate the
have discussed techniques for confidence assessment and error magnitude of this error, one can repeat the simulations with
analysis in binding free energy calculations,206 including ways different initial conditions and calculate the standard error
to improve free energy predictions by meaningful error across independent trials. This strategy has been called the
estimates.79 Sources of errors can also be attributed to “ensemble average approach”.209−212 It has been suggested that
validation data sets and data set biases, as examined in detail the length of each simulation should be at least 50 times the
elsewhere.207 Below we discuss ways to assess some of the length of the autocorrelation time of the data. Unfortunately,
most common sources of errors in BFE calculations, including autocorrelation times are largely system-dependent and so this
statistical analysis, structural analysis, binding pose uncertainty, is difficult to verify in practice. For example, it was found that
sampling, and force fields. ∂U/∂λ had an autocorrelation time of up to 3 ns in charge-
5.9.1. Statistical Errors. There are two main sources of changing pKa simulations of base pairs,52 whereas autocorre-
random error in free energy estimates in addition to the lation times between 1 and 2 ps in solvation free energy
systematic error related to the quality of the model potential bookending simulations.60 Finally, it is worth noting that
energy function. These contributions are Hamiltonian replica exchange between λ-simulations has been
(1) The uncertainty in the ensemble average quantities due shown to reduce autocorrelation times and improve the
to fluctuations in the observed sample distribution. reproducibility of free energy estimates between independent
trials.52
(2) The uncertainty caused by approximating the true
Perhaps the simplest (although not always sufficient) way to
distribution via finite-length simulations.
create an ensemble of simulations is to repeat a calculation
The uncertainty of the ensemble averages within the with the same initial structure but different random seed. This
observed distribution can be estimated by considering only behavior has long been the default in AMBER when repeating
the statistically significant samples to calculate the standard a simulation with a stochastic integration scheme. If one or
error of the mean. The statistically significant samples are those more simulations initiated from the same configuration but
data points separated in time by at least the autocorrelation using different seeds do not return similar answers, then it is
time of the time series. Upon pruning the correlated data of likely that the simulations are of insufficient length. As always,
observable x, the sample standard deviation σx is calculated, one should be cautious of the false negative rate of this
and the standard error of the mean is then σ⟨x⟩ = σx 2/Nx , approach and collect as many repeats of appropriate length as
can be afforded in order to obtain the necessary accuracy. For
where Nx is the number of statistically independent samples of
example, in the limit of many infinitesimally short simulations
x. For TI calculations, the observables are the time series
this approach would (very probably incorrectly) suggest that
values of ∂U/∂λ. The estimated error in the free energy due to
no convergence problems exist.
the uncertainty within the observed distribution requires the
5.9.2. Structural Analysis. BFE simulations are only
calculation of the standard errors σ⟨∂U/∂λ⟩λ and propagation of
accurate if the calculations sample important states and
these errors through the quadrature formula, that is transitions. If one could sample infinitely, then it would be
relatively simple to calculate a binding coefficient by directly
σΔG = ∑ wi2σ⟨∂U /∂λ⟩ λi
2
counting transitions from a bound state to an unbound state.
i (17) The difficulty is that the transitions themselves are quite rare
5609 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

and therefore require exceedingly long simulations to observe variation in initial coordinates, simulations can sample
sufficient transitions to accurately calculate the binding significantly different portions of phase space.
coefficient. Indeed, the real time scale of binding events is In cases where the poses are ambiguous, it is necessary to
generally of the order of microseconds or longer. This is run BFE simulations in each of the viable poses and combine
currently inaccessible on commodity computing architectures the results. Multiple possible ligand poses may be encountered
and can only be accomplished with modest throughput on throughout a drug design project, especially during hit finding
purpose-built or leadership computing platforms.213,214 For- or early lead optimization when detailed and trustworthy
tunately, the thermodynamics of binding can be recapitulated structural data may not be available via experimental or hybrid
by exclusively sampling the alchemical (as opposed to means (e.g., homology modeling). Since the docking scores are
conformational) transition between the states. The problem often only weakly correlated with the true binding affinities,226
then becomes a much more tractable issue of correctly a subsequent binding free energy assessment can provide
characterizing these two specific states.215 valuable information. After an initial test that the binding poses
It is important to ensure, as a basic test, that the protein and are in fact stable (see section 5.9), free energy simulations can
ligand are bound for the duration of the complex phase of be launched from each of the viable poses.
simulation. This can be done by computing the root-mean- A few scenarios are possible when running simulations from
square deviation (RMSD) or center of mass (COM) motion of multiple poses:
a ligand in a pocket relative to that of the protein. That is, these (1) the poses interconvert and yield the same free energy
quantities should be computed once the translational and (2) the poses interconvert and do not yield the same free
rotational motion of the protein have been minimized via rigid energy
transformations of all coordinates. If the RMSD or COM (3) the poses do not interconvert and do not yield the same
motion of the ligand deviates significantly then the ligand has free energy
likely fallen out of the pocket. (4) the poses do not interconvert but still yield the same free
Another test that can be done to ensure the consistency of a energy
binding calculation is to examine torsional profiles of the ligand If interconversion does occur, then the presence of multiple
and perhaps even proximal protein side chains. As a general poses could imply complex dynamics that may offer a useful
rule, one expects more flexibility from a ligand in solution. As guide for assessing sampling in other ligands. If one gets the
such, comparisons between bound and unbound simulations same free energies from different poses (scenario 1), then
can easily expose obvious under-sampling when the torsional sufficient sampling can generally be assumed. If different free
populations appear to be completely uncorrelated. Of course, energy predictions are produced in the separate interconvert-
even when sampling is sufficient the overlap need not be ing simulations (scenario 2), then likely a hysteresis issue has
exactly identical because the protein binding pocket will impact been uncovered, suggesting insufficient sampling.
the relevant configurations in the bound state relative to the If interconversion does not occur then it is possible that a
unbound state. If there is reasonable confidence in the particular pose is not physically relevant or that the energetic
sampling, then differences between the torsional profiles can barrier between poses is too large to overcome within the
also be used to identify specific interactions in the protein. simulation time scale. Poses like this can be identified based on
Additionally, it is important to check dihedral profiles having significantly different binding affinity predictions
between independent runs, since rare transitions involving between them (scenario 3). In this case, the more favorable
hidden high energy barriers may need to be captured to get a free energy state is most probably the correct (or most
truly accurate binding free energy calculation. This can simply relevant) state (assuming that the pose for the reference ligand
be done by extending your simulation time or, in difficult cases, is correct, in the case of RBFE simulations). Finally, if the
applying enhanced sampling techniques. A number of poses do not interconvert but the free energy estimate is the
enhanced sampling methods are available in AMBER20, such same or similar (scenario 4), then a correction should be made
as Gaussian Accelerated MD (GaMD),216 the replica exchange to account for the multiple states. For poses giving rise to
version of GaMD (rex-GaMD),217 and the recently introduced separated states one can use a simple discrete model (more
Ligand Gaussian Accelerated Molecular Dynamics (LiG- sophisticated alternatives have also been reported126,128). For
aMD).218 Other enhanced sampling methods, such as replica N different poses with binding free energies ΔGi (i = 1, ..., N),
exchange with solute tempering,219,220 will be available in the corrected binding free energy ΔGcorr across all states is
future versions of AMBER. Other methods for enhanced N N
sampling, such as umbrella sampling,221,222 metadynam- ΔGcorr = ∑ pi ΔGi + β −1∑ pi ln pi
ics,223,224 and adaptive biasing force,109,225 could be employed, i=1 i=1 (18)
although such methods rely on the definition a collective
where β is 1/kT and the probabilities pi are the normalized
variable (CV) prior to simulating the system. Boltzmann weights for the ith pose
5.9.3. Multiple Poses. Similar to using multiple random
seeds, it is also possible to use variations of the input poses. In e−β ΔGi
cases where there is an unambiguous pose for each ligand, pi ≡ N
∑i = 1 e−β ΔGi (19)
small variations in the poses can still provide insights regarding
the local convergence of the BFE simulations. Multiple poses The first term is just a weighted average of the binding free
can be generated many ways, such as saving multiple poses energies for each state (pose) while the second term is the
from the docking program, using different docking programs, entropic contribution. In the special case that all the ΔGi are
subjecting a pose to different minimization routines, or the same, one obtains pi = 1/N and the second term is simply
minimizing with different force fields. Such deviations act in β−1 ln N. In the case that one pose is considerably more
a similar way to random seeds, although due to the slight favorable than the others, it will dominate the first term but not
5610 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

the second. A more detailed discussion, including expanded molecules, GAFF257 is the latest version of the generalized
variations to the expression for ΔGcorr, is provided else- AMBER force field. Additional details can be found in the
where.126,128 AMBER20 manual.
In the case, where interconversion between poses takes place Here, we specifically suggest that lack of consensus in BFE
in a subset of the λ windows, special care should be taken to calculations among different force fields is a red flag and should
understand the nature of the poses and the relationship be investigated further. In some cases a particular force field
between them. While differential interconversion is expected in may be significantly better than another for the ligands of
different λ windows due to differences in the Hamiltonian, it interest, in which case differences in results would be expected
may also indicate insufficient sampling. In such cases, (and results from the better force field should be more
employing a Hamiltonian replica exchange approach may trusted). However, when results vary and it is not clear which
improve results (or indicate that the differential interconver- force field is better, it is not obvious to us that there is a general
sion is not an issue).104 Alternatively, methods can be course of action to take other than to increase scrutiny of the
employed to directly sample the transition between the states results. Recently, Gapsys et al.34 proposed a consensus method
using an alchemical227 or Monte Carlo approach.228 using multiple force fields that improved results in certain
5.9.4. Reversibility and Hysteresis. A related concept to situations. This may be a profitable avenue for future research.
using multiple poses arises in RBFE calculations when a
reference compound (e.g., one observed in a crystallographic 6. FUTURE WORK
structure) is used to dock a candidate compound but multiple There are a number of areas that will be the focus of new free
orientations are returned. In theory, a free energy simulation energy developments for drug discovery in AMBER driven by
should yield the same result no matter which compound is academic-industry partnership. These include the development
alchemically morphed into whichany discrepancy likely of new force fields (QM, MM, and machine learning),
implies a sampling issue. The two may also differ if the enhanced sampling methods (both in the λ dimension as
candidate is suitably different in character from the reference as well as conformational degrees of freedom), improved
to take on a completely different pose and/or induce a alchemical transformation pathways, and optimization of
conformational change in (part of) the protein. These two RBFE networks (including integration of experimental
scenarios can be difficult to tell apart when the ligand constraints). These are briefly summarized below.
perturbation is spatially quite large. 6.1. MM → QM Bookending Approaches. The robust
5.9.5. Force Field Variations. Unfortunately, sampling prediction afforded by alchemical free energy methods in drug
issues, whatever the cause, are not the only concerns when discovery presents considerable challenges for conventional
assessing confidence and error estimationthe details of the molecular mechanical (MM) force fields.232 This is due, in
force field can preordain a calculation to yield an inaccurate part, to the need to test chemically diverse molecules for which
outcome, no matter how much care is placed on the other tested parameters may not exist.3 Modeling certain types of
steps (garbage in, garbage out). Ideally, one would be able to electrostatic interactions, such as σ holes and cation−π
probe specific characteristics of a model. What is the extent of interactions are especially challenging for point charge and
charge “prepolarization”? Which torsional populations are multipole models. Further, the process of drug binding
dominant? How close are specific van der Waals contacts? involves a considerable change in the molecular electrostatic
Unfortunately these are rarely clear-cut knobs that the user can environment that requires explicit consideration of electronic
dial up and down, but it may be possible to correlate them response for high accuracy. Finally, the modeling of complex
roughly with a family of force fields. While it may not be interactions of metal ions and formation/cleavage of chemical
possible in all cases, it can be informative to repeat a simulation bonds for covalent inhibitors demands a more sophisticated
with a different force field model of the ligand and/or protein quantum electronic structure treatment. Quantum mechanical
(our experience is that solvent models display less sensitivity, (QM) methods, if made sufficiently fast to be computationally
at least on short time scales). A useful practice is to reserve this tractable, offer a potentially transformative solution to these
strategy for extreme predictions (e.g., ligand modifications >2 problems.
kcal/mol more or less favorable than the reference). If such In this regard, one could argue that practically every
outcomes are reproducible, the result may be due to a alchemical free energy prediction used in drug discovery could
fundamental bias in the force field. For example, excessive potentially benefit from accurate QM methods. In addition to
polarization may lead to overstabilized hydrogen or halide the general cases described above, particularly prominent
bonds or an unusual torsional profile might favor an examples that demand QM methods include systems, such as
implausible conformation. Rough consensus across multiple metalloproteins where drugs target inner-sphere coordination
force field models can be a source of higher confidence or else to the metal centers, and in general highly charged systems
an indication that a trend is qualitatively, but not quantitatively (including RNA targets) or systems that involve charge-
correct. changing transformations (including protonation/deprotona-
There are a number of force fields available in AMBER for tion events) that exacerbate the need for many-body
proteins, nucleic acids, carbohydrates, lipids, solvents, and ions polarization and charge transfer effects. As these QM methods
(for details about recommended available force fields, see the are still in early stages with respect to their application to drug
AMBER20 manual http://ambermd.org/doc12/Amber20. discovery in alchemical free energy simulations, it remains to
pdf). Additionally, variants of the general AMBER force field be seen the degree to which they may have impact and over
(GAFF133 and GAFF257) are available for nonstandard what range of targets and drugs.
residues including drug-like molecules and modified amino Free energy simulations with combined quantum mechan-
acids. For protein systems, the Stony Brook (SB) family of ical/molecular mechanical (QM/MM) potentials or fully
protein force fields (ff19SB,229 ff14SB,230 and ff99SB231) are quantum mechanical force fields (QMFFs), if made sufficiently
the most commonly used in AMBER. For organic drug-like fast to be computationally tractable, offer a potentially
5611 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

transformative solution to these problems.233−238 Quantum rijLJ(λ ; α LJ ) = [rijn + α LJW (rij)S2(λ)σijn]1/ n


models, if made affordable, are thus highly attractive for drug
design applications owing to their accuracy, robustness, and rijCoul(λ ; αCoul) = [rijm + αCoulW (rij)S2(λ)σijm]1/ m (20)
lack of adjustable free parameters relative to MM force
fields.239 for the Lennard-Jones and the Coulombic interactions,
A common strategy to efficiently correct is to perform the respectively. In the current default AMBER softcore form, m
alchemical transformation with a MM method and then apply is 2 and n is 6. As mentioned earlier, now both rLJ LJ
ij (λ;α ) and
MM → QM/MM free energy corrections to the end-states. rij (λ;α ) are in exactly the same form and α and αCoul are
Coul Coul LJ

This is referred to as a “bookending”, “indirect”. or “reference unitless. (Note that αLJ is the same as the original α in
potential” approach60,240−252 The primary goal of these AMBER.) Hence, it is easier to explore and directly compare
methods is to indirectly estimate a free energy difference different parameters. One obvious observation is that, in the
between states A and B, ΔGA→B, using a computationally current form, a larger m or n will imply a shorter range of the
demanding Hamiltonian by evaluating the free energy change softcore effect. The current setting (m = 2, n = 6) will have
at a low-level of theory and then correcting for the free energy stronger short-range softness in LJ than in Coul interactions,
difference associated with changing the Hamiltonian. which could be one of the reasons that it is difficult to balance
The bookending method development was motivated by the them by only modifying α and αCoul.
idea of using QM and QM/MM Hamiltonians to improve the We explored different combinations of (m, n), including (m
accuracy of solvation and RBFE predictions.239 Bookending = 1, n = 1) and (m = 2, n = 2), incorporated with the
methods circumvent a number of obstacles. First, the smoothstep function SSC(2) and the new softcore parameter
alchemical transformation step often involves simulation of αCoul and the preliminary results are shown in Figures 7 and 8.
several nonphysical intermediate states connecting the end-
states, and it may not be obvious how the nonphysical states
might be modeled with a QM Hamiltonian. For example, in
the case where atoms are deleted or inserted into the system,
one might have to contend with the idea of having a
noninteger number of electrons and partial nuclear charges.
Furthermore, alchemical thermodynamic pathways often
require a number of intermediate states, each of which
requiring a significant amount of sampling to converge the free
energy result, which would be prohibitively expensive with a
QM or QM/MM Hamiltonian. A bookending method instead
computes the alchemical transformation with a molecular
mechanical Hamiltonian method, where softcore potentials
have been developed and which are inexpensive enough to be
sampled. The estimation of the MM → QM/MM free energy
changes does not require simulation of alchemical systems; one
simulates the two end-states (A and B) using one-or-more
Hamiltonians that connect the low- and high-level Hamil-
tonians. It is important to choose the most compatible
reference (MM) potential for the particular high-level Figure 7. Comparison of ⟨dU/dλ⟩ curves from (leftmost column) the
Hamiltonian to avoid slow convergence of the Hamiltonian original SSC(2) scheme with (m = 2, n = 6), from (middle column)
free energy correction estimate.253−256 This has led to work the modified SSC(2) scheme with (m = 2, n = 2), and from
that sought to increase the distribution overlap between the (rightmost column) the modified SSC(2) scheme with (m = 1, n = 1)
reference and high-level Hamiltonians,244,245,254,257−261 includ- (defined in eq 20). The β value is 12 Å2 and αCoul is 1. The molecular
ing methods that perform ad hoc parametrization of the MM systems are (upper row) the absolute hydration free energy of
diphenyltoluene, (middle row) the relative hydration free energy
reference potential via “force matching” to the QM/MM
between the Factor Xa ligand L51h and L51c, and (lower row) the
potential.245,255,260−267 absolute hydration free energy of a single Na+ ion.
It is worthwhile to note that other methods have been
explored to reduce the number of energy and force evaluations
necessary to converge QM or QM/MM free energy estimates. Figure 7 compares the current default scheme (m = 2, n = 6)
These include trajectory reweighting,251,268−271 the use of with (m = 1, n = 1) and (m = 2, n = 2) schemes with the
frozen density functional approximations,272,273 integrated AMBER18 default β value (12 Å2) for (m = 2, n = 6) and αLJ =
Hamiltonian sampling,274 orthogonal space random walk 1 for (m = 1, n = 1) and (m = 2, n = 2), with various α (the
strategies,275 and paradynamics.258,276 Collectively, these same as αLJ in the new form) values. Figure 8 shows the same
methods offer considerable promise to greatly improve the comparison except β is the AMBER20 default β value for
accuracy and predictive capability of alchemical free energy SSC(2) scheme (50 Å2) and αCoul = 4. The preliminary results
simulations with practical computational resources. shown in Figures 7 and 8 suggest that by modifying the
6.2. Further Softcore Improvements. 6.2.1. Modify the softcore exponents m and n, the imbalance between the LJ and
Exponents in the Softcore Potential. We are exploring Coul interactions can be significantly reduced. We are
different forms of the softcore potentials to obtain more currently extending the verification to other more realistic
numerically stable and smooth results in TI, BAR, and MBAR molecular systems.
calculations. The general forms of “effective diatomic 6.3. λ-Scheduling with Smoothstep Functions. As
distances” (eq 15) are mentioned earlier, AMBER20 now provides λ-scheduling for
5612 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

With the advanced methods recently implemented in


AMBER20 reported here, we are in a much better position
to explore various advanced enhanced sampling methods in the
alchemical space. For example, the developed SSC(2) scheme
is well suited for advanced λ-scheduling optimization and
enhanced sampling schemes in the alchemical space, where a
single-pass concerted λ transformation is desirable, including λ
dynamics, 99−102 Hamiltonian replica exchange meth-
ods,91,103−107 adaptive biasing,100,108,109 and self-adjusted
mixture sampling110,111 methods. For the conformational
enhanced sampling at a given λ, the REST/REST2
methods277,278 have been shown to be very successful. We
are actively investigating possible incorporation of the SSC(2)
potential with these techniques.
Another approach is the use of Gaussian accelerated
molecular dynamics (GaMD) and its more recent derivatives,
particularly LiGaMD, as an enhanced sampling method-
ology.216,218 This methodology allows for a Gaussian boost
Figure 8. Comparison of ⟨dU/dλ⟩ curves from (leftmost column) the potential to be selectively applied to either the bonded terms,
original SSC(2) scheme with (m = 2, n = 6), from (middle column) nonbonded terms, and/or the potential energy. These can be
the modified SSC(2) scheme with (m = 2, n = 2), and from applied to the entire system or selectively to the ligand and its
(rightmost column) the modified SSC(2) scheme with (m = 1, n = 1)
(defined in Eq. 20). The β value is 50 Å2, and αCoul is 4. The
contacts. This amount of control allows for faster sampling of
molecular systems are (upper row) the absolute hydration free energy ligand binding states and to overcome high energy barriers. As
of diphenyltoluene, (middle row) the relative hydration free energy the methodology currently does not support λ scaling
between the Factor Xa ligand L51h and L51c, and (lower row) the simulations, we are actively investigating incorporating these
absolute hydration free energy of a single Na+ ion. methodologies for TI and MBAR.
6.5. Force Field. Substantial progress has been made in the
parametrization of empirical molecular force fields, as manifest
flexible turning on and off for individual interactions at in the consistent improvements in the agreement between the
different stages along the alchemical λ-axis, similar to other predictions of molecular dynamics simulations and the
simulation packages such as GROMOS92 and NAMD.84,122 experimental measurements,279 including in the binding free
The commonly used “stepwise” scheme is equivalent to energies.116,140 The remaining journey to experimental-level
“schedule” the LJ and Coul interactions at different stages (e.g., accuracy, however, is almost surely no shorter and no less
in NAMD). Furthermore, the λ-scheduling can be applied to arduous than the road that has led us to where we are now.
bonded terms, that is, bond length, bond angle, and torsion Although further improvements in predictive accuracy are
terms, so that the internal conformation of a disappearing expected from fine-tuning the parameters for the current
softcore region can be kept until very late in the λ functional forms in the force fields, more substantive
transformation, which could prevent or reduce the con- modifications may be required to reach experimental level
formation sampling problems when the internal bonded terms accuracy.
need to be scaled with λ. We are currently exploring different It is now widely recognized that off-atom-center partial
λ-scheduling for different interactions to identify the best charges are necessary to accurately capture the electrostatic
dU
scheduling that will deliver the most smooth and stable potential around a molecule.140,280−282 These off-atom-center

curves. charges, referred to as virtual sites, are placed at a
6.4. Enhanced Sampling. In the past few decades predetermined position within a molecular frame defined by
methods have been developed that address the sampling a parent atom and up to three neighboring atoms that are
problem, such as replica-exchange molecular dynamics, covalently bonded to the parent atom. The current AMBER
metadynamics, simulated annealing, and orthonomal space code only supports a limited number of ways for placing the
methods. Major focus has been on enhanced sampling in the virtual sites, and refer to them as extra points (EP).
conformational spaces. Nevertheless, with the rapid developing Generalized virtual sites, however, have been implemented in
advances in hardware and software, the MD-based free energy a developmental branch of AMBER. Some new types of virtual
methods become feasible and hence emerges the importance sites are shown in Figure 9 and will likely become available in
of enhanced sampling in the alchemical space. On one hand, the next official release. To make the virtual sites truly useful,
enhanced sampling methods in the alchemical space concern however, substantial work is required to optimize their
similarly as the counter methods in the conformational space. locations and the methods to parametrize their charge values.
They both need to have adequate space coverage to obtain The short-range repulsive interactions between atoms have
proper statistically meaningful ensembles and at the same time commonly been modeled by a 1/r12 potential, which was
need to reduce the time spent on the spaces that are not originally proposed for computational efficiency rather than for
critical for the desired properties. On the other hand, a physical accuracy. Buckingham suggested that the repulsive
fundamental difference is that the free energy is a state potential due to the Pauli exclusion principle should resemble
function; hence, theoretically, it is possible to reduce an exponential, and he proposed the following potential
alchemical sampling through efficient and theoretically robust function as a substitute for the prevalent Lennard-Jones 12−6
methods. potential:
5613 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

used in binding free energy calculations, due to both the


roughly 10- to 20-fold increase in the computational cost and
the difficulty in parametrizing the models.289 Nevertheless, it
will be worthwhile to keep an eye out for the development of
polarizable models290 and to implement them in AMBER
when the time comes.
6.6. Network RBFE. In a typical drug discovery project,
BFE is routinely used to compute the binding free energies of a
large numberusually 10−100 s in each batchof candidate
molecules against the same protein target of interest. There are
Figure 9. Seven types of virtual sites will be made available in a future many ways to collectively compute these binding free energies.
release of AMBER. (a) Aromatic halogens: fixed-distance VS from 2- Take a simple example of 3 molecules: A, B, and C. One can
atom frame. (b) Aromatic halogens: flexible-distance VS from 2-atom compute the individual binding free energy for A (ΔGA) and
frame. (c) Aromatic nitrogens: flexible-distance VS from 3-atom
frame. (d) Aromatic nitrogens: fixed-distance VS from 3-atom frame. the relative binding free energies between A and B (ΔΔGAB)
(e) Aromatic nitrogens: fixed-distance-with-angle VS from 3-atom and between A and C (ΔΔGAC), and then estimate the binding
frame. (f) Aromatic carbons: out-of-plane VS from 3-atom frame. (g) free energies for B and C by ΔGB = ΔGA + ΔΔGAB and ΔGC =
Amines: in(out-of)-pyramid VS from 4-atom-frame. The virtual sites ΔGA + ΔΔGAC. Alternatively, one can compute the individual
are shown as cyan beads: P and fn (n = 1, 2, 3) are parent and frame binding free energies ΔGA and ΔGB, as well as the relative
atoms to define virtual sites. The relative positions of the virtual sites binding free energies ΔΔGBC and ΔΔGAC, and estimate the
are specified by the illustrated geometric parameters. binding free energies for C by ΔGC = (ΔGA + ΔΔGAC + ΔGB
+ ΔΔGBC)/2. Moreover, some of the calculations can be
C
V (r ) = A e−Br − allocated more simulation time than others, resulting in
r6 (21) different statistical errors in different calculations. Given a fixed
The Buckingham potential, first published in an article computational cost, there are infinite numbers of ways to
authored by Buckingham and communicated by no other allocate them to the calculations of different individual and
than Lennard-Jones himself to the Proceedings of the Royal relative binding free energy calculations. This poses an
Society of London,283−285 has found widespread use in interesting problem of experimental design: how to best
material science simulations. But its adoption in biomolecular allocate the computational resources to the calculations so as
simulations is so far limited. It will be interesting to see to minimize the overall statistical error in the estimated
whether its more realistic description of the repulsive potential binding free energies?
can lead to more accurate binding free energy results. A number of approaches have been developed to address
In protein−ligand binding, the ligand molecule transfers this problem and to plan the binding free energy calculations
from the high dielectric environment of water to a different for a set of many (between 10 and 100) compounds against
dielectric environment of the protein binding pocket. This the same target. The earlier approaches such as LOMAP195
typically induces a redistribution of the electrons in the ligand aim to construct a network of relative binding free energy
due to molecular polarizability. Thus, at least in theory, a calculations so that any pair of molecules can be computed by
polarizable force field should improve the accuracy of binding combining the results of a small number of pairs of structurally
free energy calculations.286 Polarizability can be modeled by similar molecules and the binding free energy difference
either inducible dipoles287 or Drude oscillators.288 So far, between any pair can be computed by at least two different
however, polarizable force field models have not been widely combinations above. Subsequent works introduced rigorous

Figure 10. RBFE results for CDK2 (16 ligands). The left pane computes each edge of the RBFE network from independent MBAR optimizations.
The center pane simultaneously optimizes all edges in the network, coupling the results through 22 cycle closure constraints. The right pane further
includes a constraint that forces the RBFE of “ligand 28” to match experiment.

5614 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

mathematical frameworks to minimize the overall statistical 7. CONCLUSION


error with respect to either the selection of computed pairs291 In this work, we describe new features in AMBER20 for
or the allocation of computational resources to each pair.292 performing GPU-accelerated alchemical binding free energy
Using these optimal allocations, the same level of statistical simulations. We focus on features and functionality related to
precision may be achieved at half of the computational cost drug discovery effort that arose from an ongoing collaboration
compared to commonly used ad hoc allocations. Such between the York Group, the Laboratory for Biomolecular
optimized network of binding free energy calculations may Simulation Research at Rutgers University, and Silicon
help expand the number of molecules that can be characterized Therapeutics. We also describe the ancillary tools outside of
by binding free energy calculations in each round of drug AMBER needed for preparing and analyzing alchemical
design. binding free energy simulations. We have attempted to note
6.7. Network-Wide Free Energy Analysis. In the future, the nuances associated with free energy simulations in the
methods will be explored that can incorporate known context of drug discovery, especially regarding the balance
experimental binding energies into RBFE networks to improve between ease-of-use and expert control. While automated
the quality of the remaining free energy estimates. One idea for protocols can work, especially for highly validated protein
achieving this is to incorporate constraints into the RBFE targets, each target presents unique challenges with respect to
calculations. Recently, the solution of the MBAR equations has free energy landscapes and intrinsic time scales for sampling
been re-expressed in terms of a nonlinear minimization of an relevant protein and solvent motions. As such, currently the
objective function, rather than having to solve a self-referential best results can be obtained by experienced users with fine-
set of equations.293 This re-expression of the MBAR solution tuned control of the software packages that they use. Indeed,
allows one to use nonlinear parameter optimization software to while alchemical free energy simulations offer great value even
minimize the objective function, where the parameters of the with current automated packages, it is important to note that
optimization are the simulation free energies (to within a significant challenges still exist related to obtaining accurate
constant). When a network of RBFEs are considered, one can and robust binding free energy predictions for drug discovery
expand the optimization method by constructing a new applications. We hope that this work has illustrated some of
objective function that is a sum of objectives corresponding the critical issues that should be considered and will help
to each edge of the network. Optimization of the objective educate the broader population of researchers engaged in the
function sum is equivalent to the independent optimization of use of binding free energy simulations in drug discovery and
each objective unless constraints are introduced that couple the related applications to emerging areas such as precision
medicine.


free energies between edges. An obvious set of constraints are
cycle closures that force the sum of free energies along a closed
path to be zero. Furthermore, if a partial list of known RBFEs ASSOCIATED CONTENT
are experimentally known, then those free energies can be *
sı Supporting Information

included as affine constraints. The Supporting Information is available free of charge at


Giese and York have performed a preliminary, proof-of- https://pubs.acs.org/doi/10.1021/acs.jcim.0c00613.
concept implementation of the constrained network MBAR Alchemical binding free energy calculations in
approach described above, and an example of its use is shown AMBER20 (XLSX)


in Figure 10 which displays RBFEs of ligands to the CDK2
protein. There are 16 ligands, and the ligand with the highest
AUTHOR INFORMATION
experimental binding free energy was chosen to define the zero
of free energy. There are a total 22 cycle closure constraints Corresponding Authors
(11 constraints for the solution-phase network and 11 Woody Sherman − Silicon Therapeutics, Boston, Massachusetts
constraints for the CDK2-bound network). The image shows 02210, United States; orcid.org/0000-0001-9079-1376;
that enforcing cycle closure constraints within the MBAR Email: Woody@silicontx.com
optimization has little effect on the RBFEs unless an additional Darrin M. York − Rutgers, the State University of New Jersey,
experimental RBFE is included. In this example, the calculated Laboratory for Biomolecular Simulation Research, Institute for
RBFE of “ligand 28” was constrained to match experiment. It is Quantitative Biomedicine, and Department of Chemistry and
noteworthy that the inclusion of the experimental RBFE Chemical Biology, New Brunswick, New Jersey 08901-8554,
improves the prediction of most ligands because the calculated United States; orcid.org/0000-0002-9193-7055;
RBFEs are highly coupled by cycle closures. If the cycle Email: Darrin.York@rutgers.edu
closures were not included as constraints, then the Authors
experimental constraint on “ligand 28” would only change Tai-Sung Lee − Rutgers, the State University of New Jersey,
the result of “ligand 28”. Improvement to the correlation Laboratory for Biomolecular Simulation Research, and
coefficient (R) would still be made if a different ligand’s RBFE Department of Chemistry and Chemical Biology, New
was constrained to match experiment. The average correlation Brunswick, New Jersey 08901-8554, United States
coefficient of the 15 possible constraints (the “ligand 28” Bryce K. Allen − Silicon Therapeutics, Boston, Massachusetts
constraint illustrated in Figure 10 is only one of the 15 cases) is 02210, United States; orcid.org/0000-0002-0804-8127
0.84 ± 0.02, which is a significant improvement relative to the Timothy J. Giese − Rutgers, the State University of New Jersey,
0.69 correlation coefficient obtained when cycle-closure and Laboratory for Biomolecular Simulation Research, and
experimental constraints are not enforced. Furthermore, the Department of Chemistry and Chemical Biology, New
average mean unsigned error is reduced from 1.0 to 0.6 ± 0.1 Brunswick, New Jersey 08901-8554, United States
kcal/mol, and the average mean signed error is improved from Zhenyu Guo − Silicon Therapeutics, Suzhou, Jiansu 215000,
0.9 to 0.3 ± 0.2 kcal/mol. China
5615 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Pengfei Li − Silicon Therapeutics, Suzhou, Jiansu 215000, (7) Bennett, C. H. Efficient Estimation of Free Energy Differences
China from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245−268.
Charles Lin − Silicon Therapeutics, Boston, Massachusetts (8) Souaille, M.; Roux, B. Extension to the Weighted Histogram
02210, United States Analysis Method: Combining Umbrella Sampling with Free Energy
T. Dwight McGee, Jr. − Silicon Therapeutics, Boston, Calculations. Comput. Phys. Commun. 2001, 135, 40−57.
(9) Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of
Massachusetts 02210, United States
Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129,
David A. Pearlman − QSimulate Incorporated, Cambridge, 124105.
Massachusetts 02139, United States (10) Tan, Z.; Gallicchio, E.; Lapelosa, M.; Levy, R. M. Theory of
Brian K. Radak − Silicon Therapeutics, Boston, Massachusetts Binless Multi-state Free Energy Estimation with Applications to
02210, United States; orcid.org/0000-0001-8628-5972 Protein-Ligand Binding. J. Chem. Phys. 2012, 136, 144102.
Yujun Tao − Rutgers, the State University of New Jersey, (11) Shirts, M. R. Reweighting from the Mixture Distribution as a
Laboratory for Biomolecular Simulation Research, and Better Way to Describe the Multistate Bennett Acceptance
Department of Chemistry and Chemical Biology, New RatioarXiv, 2017, 1704.00891. https://arxiv.org/abs/1704.00891
Brunswick, New Jersey 08901-8554, United States; (12) Kirkwood, J. G. Theory of Liquids; Gordon and Breach, 1968;
orcid.org/0000-0002-4520-941X Vol. 2.
Hsu-Chun Tsai − Rutgers, the State University of New Jersey, (13) Postma, J. P.; Berendsen, H. J.; Haak, J. R. Thermodynamics of
Laboratory for Biomolecular Simulation Research, and Cavity Formation in Water. A Molecular Dynamics Study. Faraday
Department of Chemistry and Chemical Biology, New Symp. Chem. Soc. 1982, 17, 55−67.
(14) Jorgensen, W. L.; Ravimohan, C. Monte Carlo Simulation of
Brunswick, New Jersey 08901-8554, United States;
Differences in Free Energies of Hydration. J. Chem. Phys. 1985, 83,
orcid.org/0000-0001-7027-5649 3050−3054.
Huafeng Xu − Silicon Therapeutics, Boston, Massachusetts (15) Tembre, B. L.; McCammon, J.A. Ligand-Receptor Interactions.
02210, United States; orcid.org/0000-0001-5447-0452 Comput. Chem. 1984, 8, 281−283.
Complete contact information is available at: (16) Lybrand, T. P.; McCammon, J. A.; Wipff, G. Theoretical
https://pubs.acs.org/10.1021/acs.jcim.0c00613 Calculation of Relative Binding Affinity in Host-Guest Systems. Proc.
Natl. Acad. Sci. U. S. A. 1986, 83, 833−835.
Notes (17) Straatsma, T.; Berendsen, H.; Postma, J. Free Energy of
The authors declare no competing financial interest. Hydrophobic Hydration: A Molecular Dynamics Study of Noble


Gases in Water. J. Chem. Phys. 1986, 85, 6720−6727.
(18) Merz, K. M., Jr.; Kollman, P. A. Free Energy Perturbation
ACKNOWLEDGMENTS Simulations of the Inhibition of Thermolysin: Prediction of the Free
The authors are grateful for financial support provided by the Energy of Binding of A New Inhibitor. J. Am. Chem. Soc. 1989, 111,
National Institutes of Health (Grant GM107485 to D.M.Y.). 5649−5658.
Computational resources were provided by the Office of (19) Pearlman, D. A.; Connelly, P. R. Determination of the
Advanced Research Computing (OARC) at Rutgers, The State Differential Effects of Hydrogen Bonding and Water Release on the
University of New Jersey (specifically, the Amarel cluster and Binding of FK506 to Native and Tyr82 → Phe82 FKBP-12 Proteins
associated research computing resources), the Extreme Science Using Free Energy Simulations. J. Mol. Biol. 1995, 248, 696−717.
and Engineering Discovery Environment (XSEDE), which is (20) Helms, V.; Wade, R. C. Computational Alchemy to Calculate
supported by National Science Foundation Grant ACI- Absolute Protein-Ligand Binding Free Energy. J. Am. Chem. Soc.
1998, 120, 2710−2713.
1548562 294 (specifically, the resources COMET and
(21) Mobley, D. L.; Klimovich, P. V. Perspective: Alchemical Free
COMET GPU at SDSC through allocation TG- Energy Calculations for Drug Discovery. J. Chem. Phys. 2012, 137,
CHE190067), and the Texas Advanced Computing Center 230901.
(TACC) at the University of Texas at Austin (specifically, the (22) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.;
Frontera Supercomputer through allocation CHE20002). We Branson, K.; Pande, V. S. Alchemical Free Energy Methods for Drug
also gratefully acknowledge the support of the NVIDIA Discovery: Progress and Challenges. Curr. Opin. Struct. Biol. 2011, 21,
Corporation with the donation of several Pascal, Volta, and 150−160.
Turing GPUs for testing. (23) Shirts, M. R.; Mobley, D. L.; Chodera, J. D. Alchemical Free

■ REFERENCES
(1) Jorgensen, W. L. Efficient Drug Lead Discovery and
Energy Calculations: Ready for Prime Time? Annu. Rep. Comput.
Chem. 2007, 3, 41−59.
(24) Deng, Y.; Roux, B. Computations of Standard Binding Free
Energies with Molecular Dynamics Simulations. J. Phys. Chem. B
Optimization. Acc. Chem. Res. 2009, 42, 724−733.
(2) Abel, R.; Wang, L.; Harder, E. D.; Berne, B. J.; Friesner, R. A. 2009, 113, 2234−2246.
Advancing Drug Discovery through Enhanced Free Energy Calcu- (25) Gilson, M. K.; Zhou, H.-X. Calculation of Protein-Ligand
lations. Acc. Chem. Res. 2017, 50, 1625−1632. Binding Affinities. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21−42.
(3) Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free (26) Gilson, M. K.; Given, J. A.; Bush, B. L.; McCammon, J. A. The
Energy Calculations in Drug Discovery: Recent Advances and Statistical-Thermodynamic Basis for Computation of Binding
Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911−2937. Affinities: A Critical Review. Biophys. J. 1997, 72, 1047−1069.
(4) Cournia, Z.; Allen, B. K.; Beuming, T.; Pearlman, D. A.; Radak, (27) Woo, H.-J.; Roux, B. Calculation of Absolute Protein-Ligand
B. K.; Sherman, W. Role of Rigorous Free Energy Simulations in Binding Free Energy from Computer Simulations. Proc. Natl. Acad.
Virtual Screening. J. Chem. Inf. Model. 2020, DOI: 10.1021/ Sci. U. S. A. 2005, 102, 6825−6830.
acs.jcim.0c00116. (28) Singh, U.; Weiner, S. J.; Kollman, P. Molecular Dynamics
(5) Zwanzig, R. W. High-Temperature Equation of State by a Simulations of d(CGCGA) X d(TCGCG) With and Without
Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, ”Hydrated” Counterions. Proc. Natl. Acad. Sci. U. S. A. 1985, 82,
1420−1426. 755−759.
(6) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. (29) Bash, P.; Singh, U.; Langridge, R.; Kollman, P. Free Energy
Phys. 1935, 3, 300−313. Calculations by Computer Simulation. Science 1987, 236, 564−568.

5616 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

(30) Singh, U.; Brown, F. K.; Bash, P. A.; Kollman, P. A. An (49) Götz, A.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.;
Approach to The Application of Free Energy Perturbation Methods Walker, R. C. Routine Microsecond Molecular Dynamics Simulations
Using Molecular Dynamics: Applications to the Transformations of with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput.
Methanol → Ethane, Oxonium → Ammonium, Glycine → Alanine, 2012, 8, 1542−1555.
and Alanine → Phenylalanine in Aqueous Solution and to H3O+ (50) Salomon-Ferrer, R.; Götz, A. W.; Poole, D.; Le Grand, S.;
(H2O) 3 → NH4+ (H2O) 3 in the Gas Phase. J. Am. Chem. Soc. Walker, R. C. Routine Microsecond Molecular Dynamics Simulations
1987, 109, 1607−1614. with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. J.
(31) Bash, P. A.; Singh, U. C.; Brown, F. K.; Langridge, R.; Kollman, Chem. Theory Comput. 2013, 9, 3878−3888.
P. A. Calculation of the Relative Change in Binding Free Energy of A (51) Le Grand, S.; Götz, A. W.; Walker, R. C. SPFP: Speed without
Protein-Inhibitor Complex. Science 1987, 235, 574−576. Compromise−A Mixed Precision Model for GPU Accelerated
(32) Pearlman, D. A.; Kollman, P. A. The Lag Between the Molecular Dynamics Simulations. Comput. Phys. Commun. 2013,
Hamiltonian and The system Configuration in Free Energy 184, 374−380.
Perturbation Calculations. J. Chem. Phys. 1989, 91, 7831−7839. (52) Giese, T. J.; York, D. M. A GPU-Accelerated Parameter
(33) Jarzynski, C. Nonequilibrium Equality for Free Energy Interpolation Thermodynamic Integration Free Energy Method. J.
Differences. Phys. Rev. Lett. 1997, 78, 2690−2693. Chem. Theory Comput. 2018, 14, 1564−1582.
(34) Gapsys, V.; Pérez-Benito, L.; Aldeghi, M.; Seeliger, D.; van (53) Lee, T.-S.; Hu, Y.; Sherborne, B.; Guo, Z.; York, D. M. Toward
Vlijmen, H.; Tresadern, G.; de Groot, B. L. Large Scale Relative Fast and Accurate Binding Affinity Prediction with pmemdGTI: An
Protein Ligand Binding Affinities Using Non-Equilibrium Alchemy. Efficient Implementation of GPU-Accelerated Thermodynamic
Chem. Sci. 2020, 11, 1140−1152. Integration. J. Chem. Theory Comput. 2017, 13, 3077−3084.
(35) Pearlman, D. A. A. Comparison of Alternative Approaches to (54) Lee, T.-S.; Cerutti, D. S.; Mermelstein, D.; Lin, C.; LeGrand, S.;
Free Energy Calculations. J. Phys. Chem. 1994, 98, 1487−1493. Giese, T. J.; Roitberg, A.; Case, D. A.; Walker, R. C.; York, D. M.
(36) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; GPU-Accelerated Molecular Dynamics and Free Energy Methods in
Cheatham, T. E., III; DeBolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. Amber18: Performance Enhancements and New Features. J. Chem.
AMBER, A Package of Computer Programs for Applying Molecular Inf. Model. 2018, 58, 2043−2050.
Mechanics, Normal Mode Analysis, Molecular Dynamics and Free (55) Tsai, H.-C.; Tao, Y.; Lee, T.-S.; Merz, K. M.; York, D. M.
Energy Calculations to Simulate The Structural and Energetic Validation of Free Energy Methods in AMBER. J. Chem. Inf. Model.
Properties of Molecules. Comput. Phys. Commun. 1995, 91, 1−41. 2020, DOI: 10.1021/acs.jcim.0c00285.
(37) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; (56) Song, L. F.; Lee, T.-S.; Zhu, C.; York, D. M.; Merz, K. M., Jr.
Swaminathan, S. A.; Karplus, M. CHARMM: A Program for Using AMBER18 for Relative Free Energy Calculations. J. Chem. Inf.
Macromolecular Energy, Minimization, and Dynamics Calculations. Model. 2019, 59, 3128−3135.
J. Comput. Chem. 1983, 4, 187−217. (57) He, X.; Liu, S.; Lee, T.-S.; Ji, B.; Man, V. H.; York, D. M.;
(38) Fujinaga, M.; Gros, P.; Van Gunsteren, W. Testing the Method Wang, J. Fast, Accurate, and Reliable Protocols for Routine
of Crystallographic Refinement using Molecular Dynamics. J. Appl. Calculations of Protein-Ligand Binding Affinities in Drug Design
Crystallogr. 1989, 22, 1−8. Projects Using AMBER GPU-TI with ff14SB/GAFF. ACS Omega
(39) Pearlman, D. A.; Kollman, P. A. The Overlooked Bond- 2020, 5, 4611−4619.
Stretching Contribution in Free Energy Perturbation Calculations. J. (58) Lee, T.-S.; Lin, Z.; Allen, B. K.; Lin, C.; Radak, B. K.; Tao, Y.;
Chem. Phys. 1991, 94, 4532−4545. Tsai, H.-C.; Sherman, W.; York, D. M. Improved Alchemical Free
(40) Sun, Y.; Spellmeyer, D.; Pearlman, D. A.; Kollman, P. Energy Calculations with Optimized Smoothstep Softcore Potentials.
Simulation of the Solvation Free Energies for Methane, Ethane, and J. Chem. Theory Comput. 2020, DOI: 10.1021/acs.jctc.0c00237.
Propane and Corresponding Amino Acid Dipeptides: A Critical Test (59) Boresch, S.; Tettinger, F.; Leitgeb, M.; et al. Absolute Binding
of the Bond-PMF Correction, A New Set of Hydrocarbon Parameters, Free Energies: A Quantitative Approach for Their Calculation. J. Phys.
and the Gas Phase-Water Hydrophobicity Scale. J. Am. Chem. Soc. Chem. B 2003, 107, 9535−9551.
1992, 114, 6798−6801. (60) Giese, T. J.; York, D. M. Development of a Robust Indirect
(41) Pearlman, D. A.; Kollman, P. A. A New Method for Carrying Approach for MM → QM Free Energy Calculations That Combines
out Free Energy Perturbation Calculations: Dynamically Modified Force-Matched Reference Potential and Bennett’s Acceptance Ratio
Windows. J. Chem. Phys. 1989, 90, 2460−2470. Methods. J. Chem. Theory Comput. 2019, 15, 5543−5562.
(42) Ferguson, D. M.; Pearlman, D. A.; Swope, W. C.; Kollman, P. (61) Humphrey, W.; Dalke, A.; Schulten, K. VMD − Visual
A. Free Energy Perturbation Calculations Involving Potential Molecular Dynamics. J. Mol. Graphics 1996, 14, 33−38.
Function Changes. J. Comput. Chem. 1992, 13, 362−370. (62) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.;
(43) Pearlman, D. A.; Rao, B. G. Free Energy Calculations: Methods Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera−A
and Applications. Encyclopedia of Computational Chemistry; Wiley Visualization System for Exploratory Research and Analysis. J.
Online Library, 2002; Vol. 2. Comput. Chem. 2004, 25, 1605.
(44) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N (63) Purawat, S.; Ieong, P. U.; Malmstrom, R. D.; Chan, G. J.;
log (N) Method for Ewald Sums in Large Systems. J. Chem. Phys. Yeung, A. K.; Walker, R. C.; Altintas, I.; Amaro, R. E. A Kepler
1993, 98, 10089−10092. Workflow Tool for Reproducible AMBER GPU Molecular Dynamics.
(45) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Biophys. J. 2017, 112, 2469−2474.
Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. (64) Doerr, S.; Harvey, M. J.; Noé, F.; De Fabritiis, G. HTMD:
Phys. 1995, 103, 8577−8593. High-Throughput Molecular Dynamics for Molecular Discovery. J.
(46) York, D. M.; Wlodawer, A.; Pedersen, L. G.; Darden, T. A. Chem. Theory Comput. 2016, 12, 1845−1852.
Atomic-Level Accuracy in Simulations of Large Protein Crystals. Proc. (65) ULC Chemical Computing Group Molecular Operating
Natl. Acad. Sci. U. S. A. 1994, 91, 8715−8718. Environment(MOE), 2019.01; Montreal, QC, Canada, 2019.
(47) York, D. M.; Yang, W.; Lee, H.; Darden, T.; Pedersen, L. G. (66) Homer, R. W.; Swanson, J.; Jilek, R. J.; Hurst, T.; Clark, R. D.
Toward the Accurate Modeling of DNA: The Importance of Long- SYBYL Line Notation (SLN): A Single Notation To Represent
Range Electrostatics. J. Am. Chem. Soc. 1995, 117, 5001−5002. Chemical Structures, Queries, Reactions, and Virtual Libraries. J.
(48) Cheatham, T. E., III; Miller, J. L.; Fox, T.; Darden, T. A.; Chem. Inf. Model. 2008, 48, 2294−2307.
Kollman, P. A. Molecular Dynamics Simulations on Solvated (67) Maestro, release S; Schrödinger, LLC: New York, NY, 2017.
Biomolecular Systems: The Particle Mesh Ewald Method Leads to (68) Scheen, J.; Wu, W.; Mey, A.; Tosco, P.; Mackey, M.; Michel, J.
Stable Trajectories of DNA, RNA, and Proteins. J. Am. Chem. Soc. A Hybrid Alchemical Free Energy and Machine Learning Method-
1995, 117, 4193−4194. ology for the Calculation of Absolute Hydration Free Energies of

5617 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Small Molecules. ChemRxiv, 2020, ver. 3. DOI: 10.26434/ (90) Steinbrecher, T.; Joung, I.; Case, D. A. Soft-Core Potentials in
chemrxiv.12380612.v1 Thermodynamic Integration: Comparing One- and Two-Step Trans-
(69) Straatsma, T. P.; Berendsen, H. J. Free Energy of Ionic formations. J. Comput. Chem. 2011, 32, 3253−3263.
Hydration: Analysis of a Thermodynamic Integration Technique to (91) Hritz, J.; Oostenbrink, C. Hamiltonian Replica Exchange
Evaluate Free Energy Differences by Molecular Dynamics Simu- Molecular Dynamics Using Soft-Core Interactions. J. Chem. Phys.
lations. J. Chem. Phys. 1988, 89, 5876−5886. 2008, 128, 144121.
(70) Crooks, G. E. Path-Ensemble Averages in Systems Driven Far (92) Riniker, S.; Christ, C. D.; Hansen, H. S.; Hünenberger, P. H.;
from Equilibrium. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Oostenbrink, C.; Steiner, D.; van Gunsteren, W. F. Calculation of
Interdiscip. Top. 2000, 61, 2361−2366. Relative Free Energies for Ligand-Protein Binding, Solvation, and
(71) Boresch, S.; Woodcock, H. L. Convergence of Single-Step Free Conformational Transitions Using the GROMOS Software. J. Phys.
Energy Perturbation. Mol. Phys. 2017, 115, 1200−1213. Chem. B 2011, 115, 13570−13577.
(72) Gapsys, V.; Michielssens, S.; Peters, J. H.; de Groot, B. L.; (93) Ruiter, A. d.; Oostenbrink, C. Extended Thermodynamic
Leonov, H. Calculation of Binding Free Energies. Methods Mol. Biol. Integration: Efficient Prediction of Lambda Derivatives at Non-
2015, 1215, 173−209. simulated Points. J. Chem. Theory Comput. 2016, 12, 4476−4486.
(73) Jeong, D.; Andricioaei, I. Reconstructing Equilibrium Entropy (94) Pal, R. K.; Gallicchio, E. Perturbation Potentials to Overcome
and Enthalpy Profiles from Non-Equilibrium Pulling. J. Chem. Phys. Order/Disorder Transitions in Alchemical Binding Free Energy
2013, 138, 114110. Calculations. J. Chem. Phys. 2019, 151, 124116.
(74) Wei, D.; Song, Y.; Wang, F. A Simple Molecular Mechanics (95) Boresch, S.; Karplus, M. The Role of Bonded Terms in Free
Potential for μm Scale Graphene Simulations from the Adaptive Force Energy Simulations. 2. Calculation of Their Influence on Free Energy
Matching Method. J. Chem. Phys. 2011, 134, 184704. Differences of Solvation. J. Phys. Chem. A 1999, 103, 119−136.
(75) Torrie, G. M.; Valleau, J. P. Nonphysical Sampling (96) Boresch, S.; Karplus, M. The Role of Bonded Terms in Free
Distributions in Monte Carlo Free-Energy Estimation: Umbrella Energy Simulations: 1. Theoretical Analysis. J. Phys. Chem. A 1999,
Sampling. J. Comput. Phys. 1977, 23, 187−199. 103, 103−118.
(76) Shirts, M. R.; Pande, V. S. Comparison of Efficiency and Bias of (97) Shobana, S.; Roux, B.; Andersen, O. S. Free Energy
Free Energies Computed by Exponential Averaging, the Bennett Simulations: Thermodynamic Reversibility and Variability. J. Phys.
Acceptance Ratio, and Thermodynamic Integration. J. Chem. Phys. Chem. B 2000, 104, 5179−5190.
2005, 122, 144107. (98) Loeffler, H. H.; Bosisio, S.; Duarte Ramos Matos, G.; Suh, D.;
(77) Lu, N.; Kofke, D. A. Accuracy of Free-Energy Perturbation Roux, B.; Mobley, D. L.; Michel, J. Reproducibility of Free Energy
Calculations in Molecular Simulation. I. Modeling. J. Chem. Phys. Calculations across Different Molecular Simulation Software Pack-
2001, 114, 7303−7311. ages. J. Chem. Theory Comput. 2018, 14, 5567−5582.
(78) Lu, N.; Kofke, D. A. Accuracy of Free-Energy Perturbation (99) Ding, X.; Vilseck, J. Z.; Hayes, R. L.; Brooks, C. L. Gibbs
Calculations in Molecular Simulation. II. Heuristics. J. Chem. Phys. Sampler-Based λ-Dynamics and Rao-Blackwell Estimator for
2001, 115, 6866−6875. Alchemical Free Energy Calculation. J. Chem. Theory Comput. 2017,
(79) Pohorille, A.; Jarzynski, C.; Chipot, C. Good Practices in Free- 13, 2501−2510.
Energy Calculations. J. Phys. Chem. B 2010, 114, 10235−10253. (100) Hayes, R. L.; Armacost, K. A.; Vilseck, J. Z.; Brooks, C. L.
(80) Bruckner, S.; Boresch, S. Efficiency of Alchemical Free Energy Adaptive Landscape Flattening Accelerates Sampling of Alchemical
Simulations. I. A Practical Comparison of the Exponential Formula, Space in Multisite λ Dynamics. J. Phys. Chem. B 2017, 121, 3626−
Thermodynamic Integration, and Bennett’s Acceptance Ratio 3635.
Method. J. Comput. Chem. 2011, 32, 1303−1319. (101) Guo, Z.; Brooks, C. L. Rapid Screening of Binding Affinities:
(81) Bruckner, S.; Boresch, S. Efficiency of Alchemical Free Energy Application of the λ-Dynamics Method to a Trypsin-Inhibitor System.
Simulations. II. Improvements for Thermodynamic Integration. J. J. Am. Chem. Soc. 1998, 120, 1920−1921.
Comput. Chem. 2011, 32, 1320−1333. (102) Guo, Z.; Brooks, C. L.; Kong, X. Efficient and Flexible
(82) de Ruiter, A.; Boresch, S.; Oostenbrink, C. Comparison of Algorithm for Free Energy Calculations Using the λ-Dynamics
Thermodynamic Integration and Bennett Acceptance Ratio for Approach. J. Phys. Chem. B 1998, 102, 2032−2036.
Calculating Relative Protein-Ligand Binding Free Energies. J. Comput. (103) Jiang, W.; Roux, B. Free Energy Perturbation Hamiltonian
Chem. 2013, 34, 1024−1034. Replica-Exchange Molecular Dynamics (FEP/H-REMD) for Absolute
(83) Klimovich, P. V.; Shirts, M. R.; Mobley, D. L. Guidelines for the Ligand Binding Free Energy Calculations. J. Chem. Theory Comput.
Analysis of Free Energy Calculations. J. Comput.-Aided Mol. Des. 2015, 2010, 6, 2559−2565.
29, 397−411. (104) Arrar, M.; de Oliveira, C. A. F.; Fajer, M.; Sinko, W.;
(84) Jiang, W.; Chipot, C.; Roux, B. Computing Relative Binding McCammon, J. A. w-REXAMD: A Hamiltonian Replica Exchange
Affinity of Ligands to Receptor: An Effective Hybrid Single-Dual- Approach to Improve Free Energy Calculations for Systems with
Topology Free-Energy Perturbation Approach in NAMD. J. Chem. Inf. Kinetically Trapped Conformations. J. Chem. Theory Comput. 2013, 9,
Model. 2019, 59, 3794−3802. 18−23.
(85) Gumbart, J. C.; Roux, B.; Chipot, C. Standard Binding Free (105) Itoh, S. G.; Okumura, H. Hamiltonian Replica-Permutation
Energies from Computer Simulations: What Is the Best Strategy? J. Method and Its Applications to an Alanine Dipeptide and Amyloid-
Chem. Theory Comput. 2013, 9, 794−802. β(29−42) Peptides. J. Comput. Chem. 2013, 34, 2493−2497.
(86) König, G.; Boresch, S. Non-Boltzmann Sampling and Bennett’s (106) Armacost, K. A.; Goh, G. B.; Brooks, C. L. Biasing Potential
Acceptance Ratio Method: How to Profit from Bending the Rules. J. Replica Exchange Multisite λ-Dynamics for Efficient Free Energy
Comput. Chem. 2011, 32, 1082−1090. Calculations. J. Chem. Theory Comput. 2015, 11, 1267−1277.
(87) Wu, D.; Kofke, D. A. Phase-Space Overlap Measures. I. Fail- (107) Yang, M.; Huang, J.; MacKerell, A. D., Jr. Enhanced
Safe Bias Detection in Free Energies Calculated by Molecular Conformational Sampling Using Replica Exchange with Concurrent
Simulation. J. Chem. Phys. 2005, 123, 054103. Solute Scaling and Hamiltonian Biasing Realized in One Dimension.
(88) Wu, D.; Kofke, D. A. Phase-Space Overlap Measures. II. Design J. Chem. Theory Comput. 2015, 11, 2855−2867.
and Implementation of Staging Methods for Free-Energy Calcu- (108) Babin, V.; Roland, C.; Sagui, C. Adaptively Biased Molecular
lations. J. Chem. Phys. 2005, 123, 084109. Dynamics for Free Energy Calculations. J. Chem. Phys. 2008, 128,
(89) Beutler, T. C.; Mark, A. E.; van Schaik, R. C.; Gerber, P. R.; van 134101.
Gunsteren, W. F. Avoiding Singularities and Numerical Instabilities in (109) Darve, E.; Rodríguez-Gómez, D.; Pohorille, A. Adaptive
Free Energy Calculations Based on Molecular Simulations. Chem. Biasing Force Method for Scalar and Vector Free Energy Calculations.
Phys. Lett. 1994, 222, 529−539. J. Chem. Phys. 2008, 128, 144120.

5618 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

(110) Carlson, D. E.; Stinson, P.; Pakman, A.; Paninski, L. Partition (129) Steinbrecher, T. B.; Dahlgren, M.; Cappel, D.; Lin, T.; Wang,
Functions from Rao-Blackwellized Tempered Sampling. Proceedings of L.; Krilov, G.; Abel, R.; Friesner, R.; Sherman, W. Accurate Binding
the 33rd International Conference on Machine Learning; New York, NY, Free Energy Predictions in Fragment Optimization. J. Chem. Inf.
USA, 2016; pp 2896−2905. Model. 2015, 55, 2411−2420.
(111) Tan, Z. Optimally Adjusted Mixture Sampling and Locally (130) Chen, J.; Kriwacki, R. W. Intrinsically Disordered Proteins:
Weighted Histogram Analysis. J. Comput. Graph. Stat. 2017, 26, 54− Structure, Function and Therapeutics. J. Mol. Biol. 2018, 430, 2275.
65. (131) Kim, M. O.; Blachly, P. G.; Kaus, J. W.; McCammon, J. A. J.
(112) Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. Phys. Chem. B 2015, 119, 861−872.
M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization (132) Shobana, S.; Roux, B.; Andersen, O. S. Free Energy
Mapper: Automating Free Energy Calculations for Lead Optimiza- Simulations: Thermodynamic Reversibility and Variability. J. Phys.
tion. J. Comput.-Aided Mol. Des. 2013, 27, 755−770. Chem. B 2000, 104, 5179−5190.
(113) Loeffler, H. H.; Michel, J.; Woods, C. FESetup: Automating (133) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case,
Setup for Alchemical Free Energy Simulations. J. Chem. Inf. Model. D. A. Development and Testing of a General AMBER Force Field. J.
2015, 55, 2485−2490. Comput. Chem. 2004, 25, 1157−1174.
(114) Homeyer, N.; Gohlke, H. FEW: A Workflow Tool for Free (134) Träg, J.; Zahn, D. Improved GAFF2 Parameters for
Energy Calculations of Ligand Binding. J. Comput. Chem. 2013, 34, Fluorinated Alkanes and Mixed Hydro-and Fluorocarbons. J. Mol.
965−973. Model. 2019, 25, 39.
(115) Shirts, M. R.; Mobley, D. L. An Introduction to Best Practices in (135) Vanommeslaeghe, K.; MacKerell, A., Jr. Automation of the
Free Energy Calculations; Humana Press, 2013; Vol. 924; pp 271−311. CHARMM General Force Field (CGenFF) I: Bond Perception and
(116) Kuhn, M.; Firth-Clark, S.; Tosco, P.; Mey, A. S. J. S.; Mackey, Atom Typing. J. Chem. Inf. Model. 2012, 52, 3144−3154.
M. D.; Michel, J. Assessment of Binding Affinity via Alchemical Free (136) Vanommeslaeghe, K.; Raman, P. E.; MacKerell, A., Jr.
Energy Calculations. J. Chem. Inf. Model. 2020, 60, 3120−3130. Automation of the CHARMM General Force Field (CGenFF) II:
(117) Stone, J. E.; Hardy, D. J.; Ufimtsev, I. S.; Schulten, K. GPU- Assignment of Bonded Parameters and Partial Atomic Charges. J.
Accelerated Molecular Modeling Coming of Age. J. Mol. Graphics Chem. Inf. Model. 2012, 52, 3155−3168.
Modell. 2010, 29, 116−125. (137) Jorgensen, W. L.; Tirado-Rives, J. The OPLS [Optimized
(118) Mermelstein, D. J.; Lin, C.; Nelson, G.; Kretsch, R.; Potentials for Liquid Simulations] Potential Functions for Proteins,
McCammon, J. A.; Walker, R. C. Fast and Flexible GPU Accelerated Energy Minimizations for Crystals of Cyclic Peptides and Crambin. J.
Binding Free Energy Calculations within the AMBER Molecular Am. Chem. Soc. 1988, 110, 1657−1666.
Dynamics Package. J. Comput. Chem. 2018, 39, 1354−1358. (138) Shivakumar, D.; Williams, J.; Wu, Y.; Damm, W.; Shelley, J.;
(119) Hopkins, C. W.; Le Grand, S.; Walker, R. C.; Roitberg, A. E. Sherman, W. Prediction of Absolute Solvation Free Energies Using
Long-Time-Step Molecular Dynamics through Hydrogen Mass Molecular Dynamics Free Energy Perturbation and the OPLS Force
Repartitioning. J. Chem. Theory Comput. 2015, 11, 1864−1874.
Field. J. Chem. Theory Comput. 2010, 6, 1509−1519.
(120) Boresch, S. The Role of Bonded Energy Terms in Free Energy
(139) Shivakumar, D.; Harder, E.; Damm, W.; Friesner, R. A.;
Simulations - Insights from Analytical Results. Mol. Simul. 2002, 28,
Sherman, W. Improving the Prediction of Absolute Solvation Free
13−37.
Energies Using the Next Generation OPLS Force Field. J. Chem.
(121) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.;
Theory Comput. 2012, 8, 2553−2558.
Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero,
(140) Roos, K.; Wu, C.; Damm, W.; Reboul, M.; Stevenson, J. M.;
D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm,
Lu, C.; Dahlgren, M. K.; Mondal, S.; Chen, W.; Wang, L.; Abel, R.;
W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.;
Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. Friesner, R. A.; Harder, E. D. OPLS3e: Extending Force Field
J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Coverage for Drug-Like Small Molecules. J. Chem. Theory Comput.
Relative Ligand Binding Potency in Prospective Drug Discovery by 2019, 15, 1863−1874.
Way of a Modern Free-energy Calculation Protocol and Force Field. J. (141) Ufimtsev, I. S.; Martinez, T. J. Quantum Chemistry on
Am. Chem. Soc. 2015, 137, 2695−2703. Graphical Processing Units. 3. Analytical Energy Gradients, Geometry
(122) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, Optimization, and First Principles Molecular Dynamics. J. Chem.
E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable Theory Comput. 2009, 5, 2619−2628.
Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781− (142) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-
1802. Behaved Electrostatic Potential Based Method Using Charge
(123) Huggins, D. J.; Sherman, W.; Tidor, B. Rational Approaches Restraints for Deriving Atomic Charges: The RESP Model. J. Phys.
to Improving Selectivity in Drug Design. J. Med. Chem. 2012, 55, Chem. 1993, 97, 10269−10280.
1424−1444. (143) Shinada, N. K.; de Brevern, A. G.; Schmidtke, P. Halogens in
(124) Wang, L.; Deng, Y.; Wu, Y.; Kim, B.; LeBard, D. N.; Protein-Ligand Binding Mechanism: A Structural Perspective. J. Med.
Wandschneider, D.; Beachy, M.; Friesner, R. A.; Abel, R. Accurate Chem. 2019, 62, 9341.
Modeling of Scaffold Hopping Transformations in Drug Discovery. J. (144) Mayne, C. G.; Saam, J.; Schulten, K.; Tajkhorshid, E.;
Chem. Theory Comput. 2017, 13, 42−54. Gumbart, J. C. Rapid Parameterization of Small Molecules Using the
(125) Wang, J.; Deng, Y.; Roux, B. Absolute Binding Free Energy Force Field Toolkit. J. Comput. Chem. 2013, 34, 2757−2770.
Calculations Using Molecular Dynamics with Restraining Potentials. (145) Slochower, D. R.; Henriksen, N. M.; Wang, L.-P.; Chodera, J.
Biophys. J. 2006, 91, 2798−2814. D.; Mobley, D. L.; Gilson, M. K. Binding Thermodynamics of
(126) Jayachandran, G.; Shirts, M. R.; Park, S.; Pande, V. S. HostâGuest Systems with SMIRNOFF99Frosst 1.0.5 from the Open
Parallelized-Over-Parts Computation of Absolute Binding Free Force Field Initiative. J. Chem. Theory Comput. 2019, 15, 6225−6242.
Energy with Docking and Molecular Dynamics. J. Chem. Phys. (146) Madhavi Sastry, G.; Adzhigirey, M.; Day, T.; Annabhimoju,
2006, 125, 084901. R.; Sherman, W. Protein and Ligand Preparation: Parameters,
(127) Mobley, D. L.; Chodera, J. D.; Dill, K. A. Confine-And- Protocols, and Influence on Virtual Screening Enrichments. J.
Release Method: Obtaining Correct Binding Free Energies in the Comput.-Aided Mol. Des. 2013, 27, 221−234.
Presence of Protein Conformational Change. J. Chem. Theory Comput. (147) ten Brink, T.; Exner, T. E. Influence of Protonation,
2007, 3, 1231−1235. Tautomeric, and Stereoisomeric States on Protein-ligand Docking
(128) Gallicchio, E.; Levy, R. M. Recent Theoretical and Results. J. Chem. Inf. Model. 2009, 49, 1535−1546.
Computational Advances in Modeling Protein-Ligand Binding (148) Martínez-Rosell, G.; Giorgino, T.; De Fabritiis, G. Play-
Affinities. Adv. Protein Chem. Struct. Biol. 2011, 85, 27−80. Molecule ProteinPrepare: A Web Application for Protein Preparation

5619 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

for Molecular Dynamics Simulations. J. Chem. Inf. Model. 2017, 57, (169) Breiten, B.; Lockett, M. R.; Sherman, W.; Fujita, S.; Al-Sayah,
1511−1516. M.; Lange, H.; Bowers, C. M.; Heroux, A.; Krilov, G.; Whitesides, G.
(149) Jain, A. N. Bias, Reporting, and Sharing: Computational M. Water Networks Contribute to Enthalpy/Entropy Compensation
Evaluations of Docking Methods. J. Comput.-Aided Mol. Des. 2008, 22, in Protein−Ligand Binding. J. Am. Chem. Soc. 2013, 135, 15579−
201−212. 15584.
(150) Laskowski, R. A.; MacArthur, M. W.; Moss, D. S.; Thornton, J. (170) Fox, J. M.; Kang, K.; Sastry, M.; Sherman, W.; Sankaran, B.;
M. PROCHECK: A Program to Check the Stereochemical Quality of Zwart, P. H.; Whitesides, G. M. Water-Restructuring Mutations Can
Protein Structures. J. Appl. Crystallogr. 1993, 26, 283−291. Reverse the Thermodynamic Signature of Ligand Binding to Human
(151) Hooft, R. W.; Vriend, G.; Sander, C.; Abola, E. E. Errors in Carbonic Anhydrase. Angew. Chem., Int. Ed. 2017, 56, 3833−3837.
Protein Structures. Nature 1996, 381, 272. (171) Imai, T.; Kovalenko, A.; Hirata, F. Solvation Thermodynamics
(152) Chen, V. B.; Arendall, W. B.; Headd, J. J.; Keedy, D. A.; of Protein Studied by the 3D-RISM Theory. Chem. Phys. Lett. 2004,
Immormino, R. M.; Kapral, G. J.; Murray, L. W.; Richardson, J. S.; 395, 1−6.
Richardson, D. C. MolProbity: All-atom Structure Validation for (172) Ross, G. A.; Bodnarchuk, M. S.; Essex, J. W. Water Sites,
Macromolecular Crystallography. Acta Crystallogr., Sect. D: Biol. Networks, and Free Energies with Grand Canonical Monte Carlo. J.
Crystallogr. 2010, 66, 12−21. Am. Chem. Soc. 2015, 137, 14930−14943.
(153) Pontius, J.; Richelle, J.; Wodak, S. J. Deviations from Standard (173) Michel, J.; Tirado-Rives, J.; Jorgensen, W. L. Prediction of the
Atomic Volumes as a Quality Measure for Protein Crystal Structures. Water Content in Protein Binding Sites. J. Phys. Chem. B 2009, 113,
J. Mol. Biol. 1996, 264, 121−136. 13337−13346.
(154) Zhao, H.; Caflisch, A. Molecular Dynamics in Drug Design. (174) Beuming, T.; Che, Y.; Abel, R.; Kim, B.; Shanmugasundaram,
Eur. J. Med. Chem. 2015, 91, 4−14. V.; Sherman, W. Thermodynamic Analysis of Water Molecules at the
(155) Wang, Y.; Zhu, G.-F.; Ren, S.-Y.; Han, Y.-G.; Luo, Y.; Du, L.- Surface of Proteins and Applications to Binding Site Prediction and
F. Insight into the Structural Stability of Wild Type and Mutants of Characterization. Proteins: Struct., Funct., Genet. 2012, 80, 871−883.
the Tobacco Etch Virus Protease with Molecular Dynamics (175) Bodnarchuk, M. S. Water, Water, Everywhere··· It’s Time to
Simulations. J. Mol. Model. 2013, 19, 4865−4875. Stop and Think. Drug Discovery Today 2016, 21, 1139−1146.
(156) Mirjalili, V.; Noyes, K.; Feig, M. Physics-based Protein (176) Nittinger, E.; Flachsenberg, F.; Bietz, S.; Lange, G.; Klein, R.;
Structure Refinement through Multiple Molecular Dynamics Rarey, M. Placement of Water Molecules in Protein Structures: From
Trajectories and Structure Averaging. Proteins: Struct., Funct., Genet. Large-scale Evaluations to Single-case Examples. J. Chem. Inf. Model.
2014, 82, 196−207. 2018, 58, 1625−1637.
(157) Feig, M.; Mirjalili, V. Protein Structure Refinement via (177) Mayol, E.; García-Recio, A.; Tiemann, J. K.; Hildebrand, P.
Molecular-Dynamics Simulations: What Works and What Does Not? W.; Guixà-González, R.; Olivella, M.; Cordomí, A. HomolWat: A
Proteins: Struct., Funct., Genet. 2016, 84, 282−292. Web Server Tool to Incorporate ’Homologous’ Water Molecules into
(158) Heo, L.; Feig, M. Experimental Accuracy in Protein Structure GPCR Structures. Nucleic Acids Res. 2020, 48, W54.
Refinement via Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. (178) Li, Y.; Gao, Y. D.; Holloway, M. K.; Wang, R. Prediction of
U. S. A. 2018, 115, 13276−13281. the Favorable Hydration Sites in a Protein Binding Pocket and Its
(159) Vriend, G. WHAT IF: A Molecular Modeling and Drug Application to Scoring Function Formulation. J. Chem. Inf. Model.
Design Program. J. Mol. Graphics 1990, 8, 52−56. 2020, DOI: 10.1021/acs.jcim.9b00619.
(160) Olsson, M. H.; Søndergaard, C. R.; Rostkowski, M.; Jensen, J. (179) Ben-Shalom, I. Y.; Lin, C.; Kurtzman, T.; Walker, R. C.;
H. PROPKA3: Consistent Treatment of Internal and Surface Gilson, M. K. Simulating Water Exchange to Buried Binding Sites. J.
Residues in Empirical pKa Predictions. J. Chem. Theory Comput. Chem. Theory Comput. 2019, 15, 2684−2691.
2011, 7, 525−537. (180) Ben-Shalom, I. Y.; Lin, C.; Kurtzman, T.; Walker, R. C.;
(161) Emsley, P.; Cowtan, K. Coot: Model-Building Tools for Gilson, M. K. Equilibration of Buried Water Molecules to Enhance
Molecular Graphics. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2004, Protein-Ligand Binding Free Energy Calculations. Biophys. J. 2020,
60, 2126−2132. 118, 144a.
(162) Corbeil, C. R.; Moitessier, N. Docking Ligands into Flexible (181) Bodnarchuk, M. S.; Packer, M. J.; Haywood, A. Utilizing
and Solvated Macromolecules. 3. Impact of Input Ligand Con- Grand Canonical Monte Carlo Methods in Drug Discovery. ACS Med.
formation, Protein Flexibility, and Water Molecules on The Accuracy Chem. Lett. 2020, 11, 77−82.
of Docking Programs. J. Chem. Inf. Model. 2009, 49, 997−1009. (182) Deng, Y.; Roux, B. Computation of Binding Free Energy with
(163) Verdonk, M. L.; Chessari, G.; Cole, J. C.; Hartshorn, M. J.; Molecular Dynamics and Grand Canonical Monte Carlo Simulations.
Murray, C. W.; Nissink, J. W. M.; Taylor, R. D.; Taylor, R. Modeling J. Chem. Phys. 2008, 128, 115103.
Water Molecules in Protein-Ligand Docking Using GOLD. J. Med. (183) Ross, G.; Russell, E.; Deng, Y.; Lu, C.; Harder, E.; Abel, R.;
Chem. 2005, 48, 6504−6515. Wang, L. Enhancing Water Sampling in Free Energy Calculations with
(164) Lemmon, G.; Meiler, J. Towards Ligand Docking Including Grand Canonical Monte Carlo. ChemRxiv, 2020, ver. 1.
Explicit Interface Water Molecules. PLoS One 2013, 8, e67536. DOI: 10.26434/chemrxiv.12595073.v1.
(165) Rarey, M.; Kramer, B.; Lengauer, T. The Particle Concept: (184) McGovern, S. L.; Shoichet, B. K. Information Decay in
Placing Discrete Water Molecules During Protein-ligand Docking Molecular Docking Screens against Holo, Apo, and Modeled
Predictions. Proteins: Struct., Funct., Genet. 1999, 34, 17−28. Conformations of Enzymes. J. Med. Chem. 2003, 46, 2895−2907.
(166) Robinson, D. D.; Sherman, W.; Farid, R. Understanding (185) Sherman, W.; Day, T.; Jacobson, M. P.; Friesner, R. A.; Farid,
Kinase Selectivity Through Energetic Analysis of Binding Site Waters. R. Novel Procedure for Modeling Ligand/Receptor Induced Fit
ChemMedChem 2010, 5, 618−627. Effects. J. Med. Chem. 2006, 49, 534−553.
(167) Lenselink, E. B.; Beuming, T.; Sherman, W.; van Vlijmen, H. (186) Miller, E.; Murphy, R.; Sindhikara, D.; Borrelli, K.; Grisewood,
W.; IJzerman, A. P. Selecting an Optimal Number of Binding Site M.; Ranalli, F.; Dixon, S.; Jerome, S.; Boyles, N.; Day, T.; Ghanakota,
Waters to Improve Virtual Screening Enrichments Against the P.; Mondal, S.; Rafi, S. B.; Troast, D. M.; Abel, R.; Friesner, R. A
Adenosine A2A Receptor. J. Chem. Inf. Model. 2014, 54, 1737−1746. Reliable and Accurate Solution to the Induced Fit Docking Problem
(168) Snyder, P. W.; Mecinović, J.; Moustakas, D. T.; Thomas, S. for Protein-Ligand Binding. ChemRxiv, 2020, ver. 1. DOI: 10.26434/
W.; Harder, M.; Mack, E. T.; Lockett, M. R.; Héroux, A.; Sherman, chemrxiv.11983845.v1.
W.; Whitesides, G. M. Mechanism of the Hydrophobic Effect in the (187) Evangelista Falcon, W.; Ellingson, S. R.; Smith, J. C.; Baudry,
Biomolecular Recognition of Arylsulfonamides by Carbonic Anhy- J. Ensemble Docking in Drug Discovery: How Many Protein
drase. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 17889−17894. Configurations from Molecular Dynamics Simulations are Needed

5620 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

to Reproduce Known Ligand Binding? J. Phys. Chem. B 2019, 123, Alchemical Protein-Ligand Binding Free Energy Calculations. Curr.
5189−5195. Top. Med. Chem. 2017, 17, 2577−2585.
(188) Rao, S.; Sanschagrin, P. C.; Greenwood, J. R.; Repasky, M. P.; (208) Zuckerman, D. M.; Woolf, T. B. Overcoming Finite-Sampling
Sherman, W.; Farid, R. Improving Database Enrichment Through Errors in Fast-Switching Free-Energy Estimates: Extrapolative
Ensemble Docking. J. Comput.-Aided Mol. Des. 2008, 22, 621−627. Analysis of a Molecular System. Chem. Phys. Lett. 2002, 351, 445−
(189) Alonso, H.; Bliznyuk, A. A.; Gready, J. E. Combining Docking 453.
and Molecular Dynamic Simulations in Drug Design. Med. Res. Rev. (209) Caves, L. S.; Evanseck, J. D.; Karplus, M. Locally Accessible
2006, 26, 531−568. Conformations of Proteins: Multiple Molecular Dynamics Simu-
(190) Zhou, X.; Chou, J.; Wong, S. T. C. Protein structure similarity lations of Crambin. Protein Sci. 1998, 7, 649−666.
from Principle Component Correlation analysis. BMC Bioinf. 2006, 7, (210) Loccisano, A. E.; Acevedo, O.; DeChancie, J.; Schulze, B. G.;
40. Evanseck, J. D. Enhanced Sampling by Multiple Molecular Dynamics
(191) Liu, K.; Kokubo, H. Exploring the Stability of Ligand Binding Trajectories: Carbonmonoxy Myoglobin 10 micros A0 → A(1−3)
Modes to Proteins by Molecular Dynamics Simulations: A Cross- Transition from Ten 400 ps Simulations. J. Mol. Graphics Modell.
Docking Study. J. Chem. Inf. Model. 2017, 57, 2514−2522. 2004, 22, 369−376.
(192) Liu, K.; Watanabe, E.; Kokubo, H. Exploring the stability of (211) Likic, V. A.; Gooley, P. R.; Speed, T. P.; Strehler, E. E. A
ligand binding modes to proteins by molecular dynamics simulations. Statistical Approach to the Interpretation of Molecular Dynamics
J. Comput.-Aided Mol. Des. 2017, 31, 201−211. Simulations of Calmodulin Equilibrium Dynamics. Protein Sci. 2005,
(193) Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; 14, 2955−2963.
Garmendia-Doval, A. B.; Juhos, S.; Schmidtke, P.; Barril, X.; (212) Elofsson, A.; Nilsson, L. How Consistent are Molecular
Hubbard, R. E.; Morley, S. D. rDock: A Fast, Versatile and Open Dynamics Simulations? Comparing Structure and Dynamics in
Source Program for Docking Ligands to Proteins and Nucleic Acids. Reduced and Oxidized Escherichia Coli Thioredoxin. J. Mol. Biol.
PLoS Comput. Biol. 2014, 10, e1003571. 1993, 233, 766−780.
(194) Liu, S.; Wang, L.; Mobley, D. L. Is Ring Breaking Feasible in (213) Pan, A. C.; Xu, H.; Palpant, T.; Shaw, D. E. Quantitative
Relative Binding Free Energy Calculations? J. Chem. Inf. Model. 2015, Characterization of the Binding and Unbinding of Millimolar Drug
55, 727−735. Fragments with Molecular Dynamics Simulations. J. Chem. Theory
(195) Liu, S.; Wu, Y.; Lin, T.; Abel, R.; Redmann, J. P.; Summa, C. Comput. 2017, 13, 3372−3377.
M.; Jaber, V. R.; Lim, N. M.; Mobley, D. L. Lead Optimization (214) Shaw, D. E.; Grossman, J.P.; Bank, J. A.; Batson, B.; Butts, J.
Mapper: Automating Free Energy Calculations for Lead Optimiza- A.; Chao, J. C.; Deneroff, M. M.; Dror, R. O.; Even, A.; Fenton, C. H.;
tion. J. Comput.-Aided Mol. Des. 2013, 27, 755−70. Forte, A.; Gagliardo, J.; Gill, G.; Greskamp, B.; Ho, C. R.; Ierardi, D.
(196) Chodera, J. D.; Shirts, M. R. Replica Exchange and Expanded J.; Iserovich, L.; Kuskin, J. S.; Larson, R. H.; Layman, T.; Lee, L.-S.;
Ensemble Simulations as Gibbs Sampling: Simple Improvements for Lerer, A. K.; Li, C.; Killebrew, D.; Mackenzie, K. M.; Mok, S. Y.-H.;
Enhanced Mixing. J. Chem. Phys. 2011, 135, 194110. Moraes, M. A.; Mueller, R.; Nociolo, L. J.; Peticolas, J. L.; Quan, T.;
(197) Knight, J. L.; Brooks, C. L., III λ - Dynamics Free Energy
Ramot, D.; Salmon, J. K.; Scarpazza, D. P.; Schafer, U. B.; Siddique,
Simulation Methods. J. Comput. Chem. 2009, 30, 1692−1700.
N.; Snyder, C. W.; Spengler, J.; Tang, P. T. P.; Theobald, M.; Toma,
(198) Radak, B. K.; Suh, D.; Roux, B. A Generalized Linear
H.; Towles, B.; Vitale, B.; Wang, S. C.; Young, C. Anton 2: Raising
Response Framework for Expanded Ensemble and Replica Exchange
Simulations. J. Chem. Phys. 2018, 149, 072315. the Bar for Performance and Programmability in a Special-Purpose
(199) Jandova, Z.; Fast, D.; Setz, M.; Pechlaner, M.; Oostenbrink, C. Molecular Dynamics Supercomputer. SC14: International Conference
Saturation Mutagenesis by Efficient Free-Energy Calculation. J. Chem. for High Performance Computing, Networking, Storage and Analysis
Theory Comput. 2018, 14, 894−904. 2014, 41−53.
(200) Lin, Z.; Van Gunsteren, W. F.; Liu, H. Conformational State- (215) Gohlke, H.; Kiel, C.; Case, D. A. Insights into Protein-Protein
specific Free Energy Differences by One-step Perturbation: Protein Binding by Binding Free Energy Calculation and Free Energy
Secondary Structure Preferences of the GROMOS 43A1 and 53A6 Decomposition for the Ras-Raf and Ras-RalGDS Complexes. J. Mol.
Force Fields. J. Comput. Chem. 2011, 32, 2290−2297. Biol. 2003, 330, 891−913.
(201) Oostenbrink, C.; van Gunsteren, W. F. Free Energies of (216) Miao, Y.; Feher, V. A.; McCammon, J. A. Gaussian
Ligand Binding for Structurally Diverse Compounds. Proc. Natl. Acad. Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling
Sci. U. S. A. 2005, 102, 6750−6754. and Free Energy Calculation. J. Chem. Theory Comput. 2015, 11,
(202) Liu, H.; Mark, A. E.; van Gunsteren, W. F. Estimating the 3584−3595.
Relative Free Energy of Different Molecular States with Respect to a (217) Huang, Y.-m. M.; McCammon, J. A.; Miao, Y. Replica
Single Reference State. J. Phys. Chem. 1996, 100, 9485−9494. Exchange Gaussian Accelerated Molecular Dynamics: Improved
(203) Rocklin, G. J.; Mobley, D. L.; Dill, K. A.; Hünenberger, P. H. Enhanced Sampling and Free Energy Calculation. J. Chem. Theory
Calculating the Binding Free Energies of Charge Species Based on Comput. 2018, 14, 1853−1864.
Explicit-Solvent Simulations Employing Lattice-Sum Methods: An (218) Miao, Y.; Bhattarai, A.; Wang, J. Ligand Gaussian Accelerated
Accurate Correction Scheme for Electrostatic Finite-Size Effects. J. Molecular Dynamics (LiGaMD): Characterization of Ligand Binding
Chem. Phys. 2013, 139, 184103. Thermodynamics and Kinetics. J. Chem. Theory Comput. 2020,
(204) Chen, W.; Deng, Y.; Russell, E.; Wu, Y.; Abel, R.; Wang, L. DOI: 10.1021/acs.jctc.0c00395.
Accurate Calculation of Relative Binding Free Energies between (219) Jo, S.; Jiang, W. A Generic Implementation of Replica
Ligands with Different Net Charges. J. Chem. Theory Comput. 2018, Exchange with Solute Tempering (REST2) Algorithm in NAMD for
14, 6346−6358. Complex Biophysical Simulations. Comput. Phys. Commun. 2015, 197,
(205) Lin, Y.-L.; Aleksandrov, A.; Simonson, T.; Roux, B. An 304−311.
Overview of Electrostatic Free Energy Computations for Solutions (220) Wang, L.; Berne, B.; Friesner, R. A. On Achieving High
and Proteins. J. Chem. Theory Comput. 2014, 10, 2690−2709. Accuracy and Reliability in the Calculation of Relative Protein−
(206) Homeyer, N.; Stoll, F.; Hillisch, A.; Gohlke, H. Binding Free Ligand Binding Affinities. Proc. Natl. Acad. Sci. U. S. A. 2012, 109,
Energy Calculations for Lead Optimization: Assessment of Their 1937−1942.
Accuracy in an Industrial Drug Design Context. J. Chem. Theory (221) Mobley, D. L.; Chodera, J. D.; Dill, K. A. Confine-and-Release
Comput. 2014, 10, 3331−3344. Method: Obtaining Correct Binding Free Energies in the Presence of
(207) Abel, R.; Wang, L.; Mobley, D. L.; Friesner, R. A. A Critical Protein Conformational Change. J. Chem. Theory Comput. 2007, 3,
Review of Validation, Blind Testing, and Real- World Use of 1231−1235.

5621 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

(222) Leitgeb, M.; Schröder, C.; Boresch, S. Alchemical Free Energy (241) Gao, J.; Xia, X. A priori Evaluation of Aqueous Polarization
Calculations and Multiple Conformational Substates. J. Chem. Phys. Effects Through Monte Carlo QM-MM Simulations. Science 1992,
2005, 122, 084109. 258, 631−635.
(223) Laio, A.; Gervasio, F. L. Metadynamics: A Method to Simulate (242) Luzhkov, V.; Warshel, A. Microscopic Models for Quantum
Rare Events and Reconstruct the Free Energy in Biophysics, Mechanical Calculations of Chemical Processes in Solutions: LD/
Chemistry and Material Science. Rep. Prog. Phys. 2008, 71, 126601. AMPAC and SCAAS/AMPAC Calculations of Solvation Energies. J.
(224) Laio, A.; Parrinello, M. Escaping Free-energy Minima. Proc. Comput. Chem. 1992, 13, 199−213.
Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566. (243) Kearns, F. L.; Warrensford, L.; Boresch, S.; Woodcock, H. L.
(225) Fu, H.; Shao, X.; Chipot, C.; Cai, W. Extended Adaptive The Good, the Bad, and the Ugly: ”HiPen”, a New Dataset for
Biasing Force Algorithm. An On-The-Fly Implementation for Validating (S)QM/MM Free Energy Simulations. Molecules 2019, 24,
Accurate Free-Energy Calculations. J. Chem. Theory Comput. 2016, 681.
12, 3506−3513. (244) Hudson, P. S.; Han, K.; Woodcock, H. L.; Brooks, B. R. Force
(226) Enyedy, I. J.; Egan, W. J. Can We Use Docking and Scoring Matching as a Stepping Stone to QM/MM CB[8] Host/Guest
for Hit-To-Lead Optimization? J. Comput.-Aided Mol. Des. 2008, 22, Binding Free Energies: A SAMPL6 Cautionary Tale. J. Comput.-Aided
161−168. Mol. Des. 2018, 32, 983−999.
(227) Sakae, Y.; Zhang, B. W.; Levy, R. M.; Deng, N. Absolute (245) Hudson, P. S.; Boresch, S.; Rogers, D. M.; Woodcock, H. L.
Protein Binding Free Energy Simulations for Ligands with Multiple Accelerating QM/MM Free Energy Computations via Intramolecular
Poses, a Thermodynamic Path That Avoids Exhaustive Enumeration Force Matching. J. Chem. Theory Comput. 2018, 14, 6327−6335.
of the Poses. J. Comput. Chem. 2020, 41, 56−68. (246) Kearns, F. L.; Hudson, P. S.; Woodcock, H. L.; Boresch, S.
(228) Gill, S. C.; Lim, N. M.; Grinaway, P. B.; Rustenburg, A. S.; Computing Converged Free Energy Differences between Levels of
Fass, J.; Ross, G. A.; Chodera, J. D.; Mobley, D. L. Binding Modes of Theory via Nonequilibrium Work Methods: Challenges and
Ligands Using Enhanced Sampling (BLUES): Rapid Decorrelation of Opportunities. J. Comput. Chem. 2017, 38, 1376−1388.
Ligand Binding Modes via Nonequilibrium Candidate Monte Carlo. J. (247) Hudson, P. S.; Woodcock, H. L.; Boresch, S. Use of
Phys. Chem. B 2018, 122, 5579−5598. Nonequilibrium Work Methods to Compute Free Energy Differences
(229) Tian, C.; Kasavajhala, K.; Belfon, K. A.; Raguette, L.; Huang, Between Molecular Mechanical and Quantum Mechanical Represen-
H.; Migues, A. N.; Bickel, J.; Wang, Y.; Pincay, J.; Wu, Q.; et al. tations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850−
ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained 4856.
against Quantum Mechanics Energy Surfaces in Solution. J. Chem. (248) König, G.; Pickard, F. C.; Huang, J.; Thiel, W.; MacKerell, A.
Theory Comput. 2020, 16, 528−552. D., Jr.; Brooks, B. R.; York, D. M. A Comparison of QM/MM
(230) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Simulations with and without the Drude Oscillator Model Based on
Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Hydration Free Energies of Simple Solutes. Molecules 2018, 23,
2695−2720.
Protein Side Chain and Backbone Parameters from ff99SB. J. Chem.
(249) König, G.; Brooks, B. R.; Thiel, W.; York, D. M. On the
Theory Comput. 2015, 11, 3696−3713.
Convergence of Multi-Scale Free Energy Simulations. Mol. Simul.
(231) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.;
2018, 44, 1062−1081.
Simmerling, C. Comparison of Multiple Amber Force Fields and
(250) König, G.; Brooks, B. R. Correcting for the Free Energy Costs
Development of Improved Protein Backbone Parameters. Proteins:
of Bond or Angle Constraints in Molecular Dynamics Simulations.
Struct., Funct., Genet. 2006, 65, 712−725. Biochim. Biophys. Acta, Gen. Subj. 2015, 1850, 932−943.
(232) Cavasotto, C. N.; Adler, N. S.; Aucar, M. G. Quantum (251) König, G.; Hudson, P. S.; Boresch, S.; Woodcock, H. L.
Chemical Approaches in Structure-Based Virtual Screening and Lead Multiscale Free Energy Simulations: An Efficient Method for
Optimization. Front. Chem. 2018, 6, 188. Connecting Classical MD Simulations to QM or QM/MM Free
(233) Giese, T. J.; Chen, H.; Huang, M.; York, D. M. Para- Energies Using Non-Boltzmann Bennett Reweighting Schemes. J.
metrization of an Orbital-Based Linear-Scaling Quantum Force Field Chem. Theory Comput. 2014, 10, 1406−1419.
for Noncovalent Interactions. J. Chem. Theory Comput. 2014, 10, (252) Olsson, M. A.; Ryde, U. Comparison of QM/MM Methods
1086−1098. To Obtain Ligand-Binding Free Energies. J. Chem. Theory Comput.
(234) Giese, T. J.; Huang, M.; Chen, H.; York, D. M. Recent 2017, 13, 2245−2253.
Advances toward a General Purpose Linear-Scaling Quantum Force (253) Shaw, K. E.; Woods, C. J.; Mulholland, A. J. Compatibility of
Field. Acc. Chem. Res. 2014, 47, 2812−20. Quantum Chemical Methods and Empirical (MM) Water Models in
(235) Giese, T. J.; York, D. M. Quantum mechanical force fields for Quantum Mechanics/Molecular Mechanics Liquid Water Simula-
condensed phase molecular simulations. J. Phys.: Condens. Matter tions. J. Phys. Chem. Lett. 2010, 1, 219−223.
2017, 29, 383002. (254) Heimdal, J.; Ryde, U. Convergence of QM/MM Free-Energy
(236) Lu, X.; Fang, D.; Ito, S.; Okamoto, Y.; Ovchinnikov, V.; Cui, Perturbations Based on Molecular-Mechanics or Semiempirical
Q. QM/MM Free Energy Simulations: Recent Progress and Simulations. Phys. Chem. Chem. Phys. 2012, 14, 12592−12604.
Challenges. Mol. Simul. 2016, 42, 1056−1078. (255) Li, P.; Jia, X.; Pan, X.; Shao, Y.; Mei, Y. Accelerated
(237) Kamerlin, S. C. L.; Haranczyk, M.; Warshel, A. Progress in Ab Computation of Free Energy Profile at ab Initio Quantum
Initio QM/MM Free-Energy Simulations of Electrostatic Energies in Mechanical/Molecular Mechanics Accuracy via a Semi-Empirical
Proteins: Accelerated QM/MM Studies of pKa, Redox Reactions and Reference Potential. I. Weighted Thermodynamics Perturbation. J.
Solvation Free Energies. J. Phys. Chem. B 2009, 113, 1253−1272. Chem. Theory Comput. 2018, 14, 5583−5596.
(238) Duarte, F.; Amrein, B. A.; Blaha-Nelson, D.; Kamerlin, S. C. (256) Wang, M.; Mei, Y.; Ryde, U. Host-Guest Relative Binding
Recent Advances in QM/MM Free Energy Calculations Using Affinities at Density-Functional Theory Level from Semiempirical
Reference Potentials. Biochim. Biophys. Acta, Gen. Subj. 2015, 1850, Molecular Dynamics Simulations. J. Chem. Theory Comput. 2019, 15,
954−965. 2659−2671.
(239) Rathore, R.S.; Sumakanth, M.; Reddy, M.; Reddanna, P.; Rao, (257) Š trajbl, M.; Hong, G.; Warshel, A. Ab Initio QM/MM
A.; Erion, M.; Reddy, M.R. Advances in Binding Free Energies Simulation with Proper Sampling: “First Principle” Calculations of the
Calculations: QM/MM-Based Free Energy Perturbation Method for Free Energy of the Autodissociation of Water in Aqueous Solution. J.
Drug Design. Curr. Pharm. Des. 2013, 19, 4674−4686. Phys. Chem. B 2002, 106, 13333−13343.
(240) Gao, J. Absolute Free Energy of Solvation from Monte Carlo (258) Plotnikov, N. V.; Warshel, A. Exploring, Refining, and
Simulations Using Combined Quantum and Molecular Mechanical Validating the Paradynamics QM/MM Sampling. J. Phys. Chem. B
Potentials. J. Phys. Chem. 1992, 96, 537−540. 2012, 116, 10342−10356.

5622 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

(259) Bentzien, J.; Muller, R. P.; Florián, J.; Warshel, A. Hybrid ab (277) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Replica Exchange
Initio Quantum Mechanics/Molecular Mechanics Calculations of with Solute Tempering: A Method for Sampling Biological Systems in
Free Energy Surfaces for Enzymatic Reactions: The Nucleophilic Explicit Water. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13749−13754.
Attack in Subtilisin. J. Phys. Chem. B 1998, 102, 2293−2301. (278) Wang, L.; Friesner, R. A.; Berne, B. Replica Exchange with
(260) Kroonblawd, M. P.; Pietrucci, F.; Saitta, A. M.; Goldman, N. Solute Scaling: A More Efficient Version of Replica Exchange with
Generating Converged Accurate Free Energy Surfaces for Chemical Solute Tempering (REST2). J. Phys. Chem. B 2011, 115, 9431−9438.
Reactions with a Force-Matched Semiempirical Model. J. Chem. (279) Piana, S.; Robustelli, P.; Tan, D.; Chen, S.; Shaw, D. E.
Theory Comput. 2018, 14, 2207−2218. Development of a Force Field for the Simulation of Single-Chain
(261) Pinnick, E. R.; Calderon, C. E.; Rusnak, A. J.; Wang, F. Proteins and Protein-Protein Complexes. J. Chem. Theory Comput.
Achieving Fast Convergence of ab Initio Free Energy Perturbation 2020, 16, 2494−2507.
Calculations with the Adaptive Force-Matching Method. Theor. Chem. (280) Izadi, S.; Anandakrishnan, R.; Onufriev, A. V. Building Water
Acc. 2012, 131, 1146. Models: A Different Approach. J. Phys. Chem. Lett. 2014, 5, 3863−
(262) Ercolessi, F.; Adams, J. B. Interatomic Potentials from First- 3871.
Principles Calculations: The Force-Matching Method. EPL 1994, 26, (281) Kolár,̌ M. H.; Hobza, P. Computer Modeling of Halogen
583. Bonds and Other σ-Hole Interactions. Chem. Rev. 2016, 116, 5155−
(263) Maurer, P.; Laio, A.; Hugosson, H. W.; Colombo, M. C.; 5187.
Rothlisberger, U. Automated Parametrization of Biomolecular Force (282) Spinn, A.; Handle, P. H.; Kraml, J.; Hofer, T. S.; Liedl, K. R.
Fields from Quantum Mechanics/Molecular Mechanics (QM/MM) Charge Anisotropy of Nitrogen: Where Chemical Intuition Fails. J.
Simulations through Force Matching. J. Chem. Theory Comput. 2007, Chem. Theory Comput. 2020, 16, 4443−4453.
(283) Buckingham, R. A. The Classical Equation of State of Gaseous
3, 628−639.
Helium, Neon and Argon. P. R. Soc. A-Math. Phys. 1938, 168, 264−
(264) Izvekov, S.; Parrinello, M.; Burnham, C. J.; Voth, G. A.
283.
Effective Force Fields for Condensed Phase Systems from ab Initio
(284) Wade, A. D.; Wang, L.-P.; Huggins, D. J. Assimilating Radial
Molecular Dynamics Simulation: A New Method for Force-Matching.
Distribution Functions To Build Water Models with Improved
J. Chem. Phys. 2004, 120, 10896. Structural Properties. J. Chem. Inf. Model. 2018, 58, 1766−1778.
(265) Zhou, Y.; Pu, J. Reaction Path Force Matching: A New (285) Migliorati, V.; Serva, A.; Terenzio, F. M.; D’Angelo, P.
Strategy of Fitting Specific Reaction Parameters for Semiempirical Development of Lennard-Jones and Buckingham Potentials for
Methods in Combined QM/ MM Simulations. J. Chem. Theory Lanthanoid Ions in Water. Inorg. Chem. 2017, 56, 6214−6224.
Comput. 2014, 10, 3038. (286) Jiao, D.; Golubkov, P. A.; Darden, T. A.; Ren, P. Calculation
(266) Akin-Ojo, O.; Song, Y.; Wang, F. Developing ab Initio Quality of Protein-Ligand Binding Free Energy by Using a Polarizable
Force Fields from Condensed Phase Quantum-Mechanics/Molecular- Potential. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 6290−6295.
Mechanics Calculations through the Adaptive Force Matching (287) Ren, P.; Wu, C.; Ponder, J. W. Polarizable Atomic Multipole-
Method. J. Chem. Phys. 2008, 129, 064108. Based Molecular Mechanics for Organic Molecules. J. Chem. Theory
(267) Akin-Ojo, O.; Wang, F. The Quest for the Best Non- Comput. 2011, 7, 3143.
polarizable Water Model from the Adaptive Force Matching Method. (288) Lemkul, J. A.; Huang, J.; Roux, B.; MacKerell, A. D., Jr. An
J. Comput. Chem. 2011, 32, 453−462. Empirical Polarizable Force Field Based on the Classical Drude
(268) Sampson, C.; Fox, T.; Tautermann, C. S.; Woods, C.; Skylaris, Oscillator Model: Development History and Recent Applications.
C.-K. A ”Stepping Stone” Approach for Obtaining Quantum Free Chem. Rev. 2016, 116, 4983−5013.
Energies of Hydration. J. Phys. Chem. B 2015, 119, 7030−7040. (289) Wang, J.; Cieplak, P.; Li, J.; Cai, Q.; Hsieh, M.-J.; Luo, R.;
(269) Olsson, M. A.; Söderhjelm, P.; Ryde, U. Converging Ligand- Duan, Y. Development of Polarizable Models for Molecular
Binding Free Energies Obtained with Free-Energy Perturbations at Mechanical Calculations. 4. van der Waals Parametrization. J. Phys.
the Quantum Mechanical Level. J. Comput. Chem. 2016, 37, 1589− Chem. B 2012, 116, 7088−7101.
1600. (290) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J. W.; Ren,
(270) Dybeck, E. C.; König, G.; Brooks, B. R.; Shirts, M. R. P. Polarizable Atomic Multipole-Based AMOEBA Force Field for
Comparison of Methods To Reweight from Classical Molecular Proteins. J. Chem. Theory Comput. 2013, 9, 4046.
Simulations to QM/MM Potentials. J. Chem. Theory Comput. 2016, (291) Yang, Q.; Burchett, W.; Steeno, G. S.; Liu, S.; Yang, M.;
12, 1466−1480. Mobley, D. L.; Hou, X. Optimal Designs for Pairwise Calculation: An
(271) Genheden, S.; Ryde, U.; Söderhjelm, P. Binding Affinities by Application to Free Energy Perturbation in Minimizing Prediction
Alchemical Perturbation Using QM/MM with a Large QM System Variability. J. Comput. Chem. 2020, 41, 247.
and Polarizable MM Model. J. Comput. Chem. 2015, 36, 2114−2124. (292) Xu, H. Optimal Measurement Network of Pairwise Differ-
(272) Wesolowski, T.; Warshel, A. Ab Initio Free Energy ences. J. Chem. Inf. Model. 2019, 59, 4720.
Perturbation Calculations of Solvation Free Energy Using the Frozen (293) Ding, X.; Vilseck, J. Z.; Brooks, C. L., III. Fast Solver for Large
Density Functional Approach. J. Phys. Chem. 1994, 98, 5183−5187. Scale Multistate Bennett Acceptance Ratio Equations. J. Chem. Theory
(273) Olsson, M. H. M.; Hong, G.; Warshel, A. Frozen Density Comput. 2019, 15, 799−802.
Functional Free Energy Simulations of Redox Proteins: Computa- (294) Towns, J.; Cockerill, T.; Dahan, M.; Foster, I.; Gaither, K.;
tional Studies of the Reduction Potential of Plastocyanin and Grimshaw, A.; Hazlewood, V.; Lathrop, S.; Lifka, D.; Peterson, G. D.;
Rusticyanin. J. Am. Chem. Soc. 2003, 125, 5025−5039. Roskies, R.; Scott, J. R.; Wilkins-Diehr, N. XSEDE: Accelerating
(274) Mori, T.; Hamers, R. J.; Pedersen, J. A.; Cui, Q. Integrated Scientific Discovery. Comput. Sci. Eng. 2014, 16, 62−74.
Hamiltonian Sampling: A Simple and Versatile Method for Free
Energy Simulations and Conformational Sampling. J. Phys. Chem. B
2014, 118, 8210−8220.
(275) Min, D.; Zheng, L.; Harris, W.; Chen, M.; Lv, C.; Yang, W.
Practically Efficient QM/MM Alchemical Free Energy Simulations:
The Orthogonal Space Random Walk Strategy. J. Chem. Theory
Comput. 2010, 6, 2253−2266.
(276) Plotnikov, N. V.; Kamerlin, S. C. L.; Warshel, A. Para-
dynamics: An Effective and Reliable Model for Ab Initio QM/MM
Free-Energy Calculations and Related Tasks. J. Phys. Chem. B 2011,
115, 7950−7962.

5623 https://dx.doi.org/10.1021/acs.jcim.0c00613
J. Chem. Inf. Model. 2020, 60, 5595−5623

You might also like