Virtual Screening


Chapter 18

Docking and Virtual Screening in Drug Discovery


Maria Kontoyianni

Abstract
Stages in a typical drug discovery organization include target selection, hit identification, lead optimization,
preclinical and clinical studies. Hit identification and lead optimization are very much intertwined with
computational modeling. Structure-based virtual screening (VS) has been a staple for more than a decade
now in drug discovery with its underlying computational technique, docking, extensively studied. Depend-
ing on the objective, the parameters for VS may change, but the overall protocol is very straightforward.
The idea behind VS is that a library of small compounds is docked into the binding pocket of a protein (e.g., receptor, enzyme), a number of top-ranked solutions per molecule are returned, and a choice is made on the fraction of compounds to be moved forward for testing toward hit identification. The underlying principle of VS is that it differentiates between active and inactive compounds, thus
reducing the number of molecules moving forward and possibly offering a complementary tool to high-
throughput screening (HTS). Best practices in library selection, target preparation and refinement, criteria
in selecting the most appropriate docking/scoring scheme, and a step-wise approach in performing Glide
VS are discussed.

Key words Virtual screening, High-throughput screening, Docking, Scoring, Structure-based drug
design, GOLD, Glide, Drug discovery

1 Introduction

VS is a complementary tool to high-throughput screening (HTS) that attempts to find hits in the early stages of drug discovery.
Specifically, once a macromolecular target is selected, compounds
are needed to initiate efforts toward a clinical candidate. The goal of
VS is to identify these early “hits” among a library of compounds.
What differentiates HTS from VS is that HTS is an experimental
approach, while VS is a theoretical one. HTS tests large numbers of
compounds for their ability to affect the activity of target molecules
by addressing whether a compound reacts biochemically with the
target. For example, questions such as “does it bind to the target
protein?,” “does it trigger enzymatic reactions?,” “does it activate
signaling pathways?” are explored by HTS assays. Compounds
showing positive results are passed onto a more rigorous assay. It

Iulia M. Lazar et al. (eds.), Proteomics for Drug Discovery: Methods and Protocols, Methods in Molecular Biology,
vol. 1647, DOI 10.1007/978-1-4939-7201-2_18, © Springer Science+Business Media LLC 2017


cannot be emphasized enough that positive results must be reconfirmed, because if false positives are pursued, the wasted investment down the road will be high. On the other hand, negative results can mean that a potentially valuable compound is not considered. The latter could be an issue if no hits are found. However, the goal of HTS is not to find all possible hits in a library collection, but a sufficiently large set to use as starting scaffolds for initial discovery efforts. Therefore, while false positives could be costly if pursued, and thus reconfirmation of the results is critical, false negatives are not and should not be a cause for worry. In the
following step, chemists need to intelligently select two to three
classes of compounds that show the most promise for potential
clinical candidates. HTS is time consuming, requires an infrastruc-
ture, and has a low success rate (<5%); nonetheless, it has been the
method of choice for the last 20 years in the pharmaceutical sector.
VS, an in-silico HTS method, consists of virtually placing
(docking) collections of millions of compounds into a biological
target, followed by an evaluation of the tightness of the fit (scor-
ing). VS offers a quick assessment of huge libraries and reduces the
number of compounds that need testing in order to identify early
hits. Thus, the basic requirements for VS are:
• A compound collection, which depends highly on the objective of the project
• The structure of the biological target
• An appropriate docking/scoring scheme
The choices that are made for each of these requirements come
with a set of questions that need to be addressed in order to make
the process as efficient and accurate as possible. However, depend-
ing on the objective and the target investigated, the protocol varies.
The steps of the protocol are discussed in detail in the following
sections.

2 Materials

2.1 Compound Collections

Table 1 provides a representative set of diverse private and public compound collections.

Table 1
List of available databases for VS

Database        Website                                                 Availability  Size
PubChem         https://pubchem.ncbi.nlm.nih.gov/                       Public        92,345,074
ChEMBL          https://www.ebi.ac.uk/chembl/                           Public        2,036,512
BindingDB       https://www.bindingdb.org/bind/index.jsp                Public        600,622
ZINC            http://zinc.docking.org/                                Public        35 million
ChemSpider      http://www.chemspider.com/                              Public        57 million
DrugBank        http://www.drugbank.ca/                                 Public        8,261 drugs
GRAC            http://www.guidetopharmacology.org/about.jsp            Public        8,674
ChemBridge      http://www.chembridge.com/index.php                     Commercial    1 million
Maybridge       http://www.maybridge.com                                Commercial    53,000
ChemDiv         http://www.chemdiv.com/products/screening-libraries/    Commercial    1.5 million
Life Chemicals  http://www.lifechemicals.com/                           Commercial    1.2 million
Specs           http://www.specs.net/                                   Commercial    1.5 million
Enamine         http://www.enamine.net/                                 Commercial    2 million

2.1.1 Public Collections

Most public databases have grown out of the need of academics to have access to chemicals. PubChem [1, 2] was launched in 2004 and to date it includes three primary databases: Substance, Compound, and Bioassay. While the Substance database may have redundant records of the same molecule from different contributors, the Compound database extracts all records for a specific molecule into an aggregate record, thus making searching more efficient. The Bioassay database provides descriptions of biological experiments on the tested compounds, particularly from HTS. It should be pointed out that PubChem represents the largest body of molecular structures. The ChEMBL [3] database holds to date about 1.5 million distinct compounds, with
accompanying information on functional assays, binding data, and
ADMET (absorption, distribution, metabolism, excretion, and
toxicity) assays. The bioassay data are derived from the literature,
multiple screening resources, PubChem bioassays, GSK (Glaxo
SmithKline) deposited data, and the BindingDB database. The
BindingDB [4–6] collection focuses on small molecules interact-
ing with proteins. As of May 2017, it contained 1,346,745 binding
findings for 7,100 protein targets and 600,622 small molecules.
BindingDB reports binding data stemming from the literature,
PubChem confirmed assays, and those ChEMBL entries that are
associated with a clearly defined protein target. It includes findings
from enzyme inhibition and kinetics, isothermal titration calorime-
try, NMR, and binding and competition assays. Details about
experimental conditions accompany the data extracted from the
BindingDB. ZINC15 [7–9] is a database of over 120 million
drug-like compounds that can be purchased. Besides being
compound-centric, the latest version of ZINC [9] links compounds
to biological targets or processes, using data from other databases,
while maintaining its original capability to provide information on
purchasing of reagents. ChemSpider provides access to 57 million
chemical structures from 500 data sources. The entire database cannot be downloaded freely without permission, while a single download can include up to 5,000 structures along with their respective properties. For downloading a bigger dataset, it is necessary to contact the ChemSpider team. The DrugBank [10] database is a bioinformatics and cheminformatics resource with detailed
drug information and relevant drug target data (i.e., sequence,
structure, and pathway). As of today, the database contains 8,261
drug entries, including 2,012 FDA-approved small molecules, 233
FDA-approved protein/peptide drugs, 94 nutraceuticals, and more
than 6,000 experimental drugs. Furthermore, 4,338 non-redun-
dant protein sequences are linked to these drug entries. Selected
text components and sequence data can be downloaded. Similarly, a
smaller database, GRAC [11], includes about 8,611 ligands. These
are approved drugs, phase I and beyond candidates, monoclonal
antibodies, compounds from repurposing initiatives, representative
compounds directed against reported Alzheimer’s disease targets,
new human Protein Data Bank ligand structures, and ligands dis-
closed in papers with ligand-protein relationships.

2.1.2 Commercial Collections

ChemBridge and Maybridge are commercially available screening libraries complementary to one another, with little overlap between the two. ChemBridge includes the CORE library with about
640,000 compounds, covering unique chemical spaces produced
from 810 scaffolds, and another non-overlapping library, the
EXPRESS-Pick collection. The latter consists of 460,000 novel
and drug-like chemical structures, covering a broad chemical
space, and offering diversity for initial structure-activity relationship
efforts. EXPRESS-Pick compounds are selected using novelty,
diversity, drug-like properties, and chemical structure analyses.
The Maybridge library is much smaller, containing approximately 53,000 compounds. ChemDiv offers a collection of over 1.5 million compounds, which have been validated in biological assays and are mostly focused libraries. Life Chemicals consists of 431,000
diverse compounds based on 2,800 distinct scaffolds. The Specs
compounds can be downloaded at specs.net in “sd” format.
Enamine is another commercial library with the largest collection
of screening compounds (2,000,000). It includes the Advanced
Collection with 294,000 compounds for targeted library design,
an HTS Collection of 1,757,000 diverse screening compounds,
and the Premium Collection of 120,000 compounds having favor-
able physicochemical properties.

2.1.3 Considerations Regarding the Library Selection

The choice of the database should not be driven by the number of compounds that it contains, but by the existing knowledge base of the target regarding already known actives or information pertaining to the binding pocket. For example, are there known active compounds in the library? If yes, perform ligand-based similarity searching and eliminate the compounds that are similar to the already known actives (binders). If not, then the use of filters to eliminate potential dead-end leads or the generation of a diverse set may be the answer to selecting the most appropriate library. With regard to filters, much debate exists on whether to use Lipinski’s rule of five (RO-5) [12], which was developed based on the molecular properties of orally bioavailable compounds. This is due
to the fact that in attempting to find early hits and using the RO-5
approach, one might eliminate scaffolds that could otherwise be
optimized in later lead optimization cycles to improve oral bioavail-
ability. On the other hand, filtering based on the Pan Assay Inter-
ference Compounds (PAINS) [13] is advisable in order to eliminate
promiscuous binders. Thus, PAINS would be the filter of choice,
while RO-5 is a filter used in the hit-to-lead stage. The size of the
binding pocket can also be used as a guide in that if it is small, larger
compounds may be eliminated from the library. Finally, a diverse
subset is generated by identifying compounds of similar structure
and choosing one representative of each scaffold class. The resul-
tant library will then be employed in VS diversity sets.
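The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: it assumes that molecular descriptors and a scaffold label have already been computed for each compound by a cheminformatics toolkit, and the `Compound` class, its field names, and the example values are hypothetical. The RO-5 thresholds are the standard ones (molecular weight ≤ 500, logP ≤ 5, hydrogen-bond donors ≤ 5, hydrogen-bond acceptors ≤ 10). PAINS filtering requires substructure matching and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # Hypothetical record; descriptors are assumed to be precomputed elsewhere.
    name: str
    scaffold: str       # scaffold class label assigned upstream
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def ro5_violations(c):
    """Count how many of Lipinski's four criteria the compound violates."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def fits_pocket(c, max_weight):
    """Crude size filter: drop compounds too large for a small pocket."""
    return c.mol_weight <= max_weight


def one_per_scaffold(compounds):
    """Diverse subset: keep the first compound seen for each scaffold class."""
    representatives = {}
    for c in compounds:
        representatives.setdefault(c.scaffold, c)
    return list(representatives.values())


library = [
    Compound("cpd-1", "indole", 350.4, 2.1, 2, 5),
    Compound("cpd-2", "indole", 620.7, 6.3, 1, 11),
    Compound("cpd-3", "quinoline", 410.5, 3.8, 1, 6),
]

small_enough = [c for c in library if fits_pocket(c, max_weight=450.0)]
diverse = one_per_scaffold(library)
```

Whether to apply `ro5_violations` as a hard filter at the hit identification stage is exactly the point of debate noted above; the function only reports the count so that the decision stays with the researcher.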
Another critical aspect of library selection refers to the ligand
geometries. Even though most libraries are in an “sd” format, atten-
tion should be given to bond lengths, rings, chirality, protonation
and charges, and proper atom types for the program to be used.
Docking programs consider the ligands as flexible at the torsional angle level. There is no optimization of bond lengths and bond angles, thus the researcher needs to make sure that these parameters are accurate
before proceeding. There are several excellent programs, freely avail-
able to academics, that generate conformations [14].

2.2 The Structure of the Biological Target

2.2.1 Crystal Structures

Typically, crystal structures of the biological targets are used. All the experimentally resolved receptor structures (NMR or X-ray) can be found in the Protein Data Bank (www.rcsb.org) as PDB IDs (for example, 3nxu corresponds to the cytochrome P450 3A4 isozyme complexed with the inhibitor ritonavir) [15]. If the PDB ID is not known, one should enter the name of the target of interest in the keyword window, and a list of resolved crystal or NMR structures will appear with their respective PDB IDs. However, even if the crystal structure is available, its quality needs to be carefully evaluated, because crystal structures are static snapshots not accounting for the dynamic behavior of macromolecules. Another point to note is that crystal structures often contain water molecules. Common practice is to remove all waters. This is a point
of contention, however, because some water molecules are very
critical, i.e., form hydrogen-bonding networks with the ligand, or
are involved in the mechanism of action of the enzyme. If there are water molecules involved in ligand binding, there are programs that can include these molecules either as part of the grid that defines the active site (e.g., Glide, although the water molecules remain static) [16–18] or enable the user to select the water molecules that move during docking (e.g., GOLD) [19, 20]. In this chapter, we will not discuss the methods used to identify which water molecules are critical and/or are needed.
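As an illustration of removing waters while retaining a few that are thought to be critical, the following Python sketch filters the coordinate records of a PDB file by its fixed-column layout (residue name in columns 18–20, chain in column 22, residue number in columns 23–26). The function names, the `keep` argument, and the example coordinates are hypothetical; deciding which waters to keep remains the researcher's task.

```python
def pdb_record(record, serial, atom, resname, chain, resseq, x, y, z, bfac=20.0):
    """Format one fixed-column PDB coordinate record (demo helper only)."""
    return "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        record, serial, atom, resname, chain, resseq, x, y, z, 1.00, bfac)


def strip_waters(pdb_lines, keep=()):
    """Drop HOH records except those listed in keep as (chain, residue number)."""
    keep = set(keep)
    kept = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM"))
        if is_coord and line[17:20] == "HOH":
            water_id = (line[21], int(line[22:26]))
            if water_id not in keep:
                continue  # discard this water
        kept.append(line)
    return kept


structure = [
    pdb_record("ATOM", 1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    pdb_record("ATOM", 2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    pdb_record("HETATM", 3, "O", "HOH", "A", 301, 10.000, 11.000, 12.000),
    pdb_record("HETATM", 4, "O", "HOH", "A", 302, 15.000, 16.000, 17.000),
]

# Keep water 301 of chain A (assumed here to bridge the ligand), drop the rest.
cleaned = strip_waters(structure, keep=[("A", 301)])
```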

2.2.2 Target Considerations Prior to VS

The following parameters are evaluated prior to performing VS: (a) the resolution of the structure (the higher the resolution, i.e., the smaller its value in Å, the better); (b) the R-factor (or residual factor or agreement factor), which is indicative of the agreement between the crystallographic model and the experimental X-ray diffraction data (the lower, the better); and (c) the B factor, which reflects the static or dynamic mobility of an atom and therefore shows where the errors are likely to be in the structure (the lower the B factor, the better). Also, crystal structures typically lack hydrogens, so hydrogen atoms must be added prior to doing any VS experiments.
Finally, we have to consider if the crystal structure is a complex
(receptor-small molecule) or not. In the first case, the binding
pocket, which needs to be defined for docking, represents the
area around the small molecule. However, if the crystal structure
is an apo-structure (containing no small molecule within it), then
programs must be used that find the binding pockets prior to VS
[21]. If the crystal structure is not available, homology modeling
can be used to generate a receptor structure, provided there is
reasonable homology with template(s). The reader is advised to
refer to CASP (Critical Assessment of Protein Structure Prediction)
papers regarding advances in the field of model building or homol-
ogy modeling [22]. Caution should be given to metal ions and
cofactors. If the receptor is a metalloenzyme, docking with pro-
grams that have been parameterized in their scoring functions for
the specific ion is critical, or VS will not work.
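One way to inspect B factors before docking is to average them per residue and flag residues that exceed a chosen cutoff. The sketch below reads the temperature factor from columns 61–66 of PDB ATOM records; the helper names, the example values, and the cutoff of 40 Å² are illustrative assumptions rather than recommended settings.

```python
from collections import defaultdict


def pdb_record(record, serial, atom, resname, chain, resseq, x, y, z, bfac):
    """Format one fixed-column PDB coordinate record (demo helper only)."""
    return "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        record, serial, atom, resname, chain, resseq, x, y, z, 1.00, bfac)


def mean_b_factors(pdb_lines):
    """Return {(chain, residue number, residue name): mean B factor}."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            totals[key][0] += float(line[60:66])
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def mobile_residues(pdb_lines, cutoff):
    """List residues whose mean B factor exceeds the cutoff."""
    return [key for key, b in mean_b_factors(pdb_lines).items() if b > cutoff]


structure = [
    pdb_record("ATOM", 1, "N", "MET", "A", 1, 27.340, 24.430, 2.614, 10.0),
    pdb_record("ATOM", 2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842, 20.0),
    pdb_record("ATOM", 3, "N", "GLN", "A", 2, 26.913, 26.639, 3.531, 55.0),
]

per_residue = mean_b_factors(structure)
flagged = mobile_residues(structure, cutoff=40.0)  # hypothetical cutoff
```

Residues flagged in or near the binding pocket deserve a closer look, since poorly defined side chains there can distort the docking grid.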

2.2.3 Other Considerations

Biological targets are dynamic in nature; however, experimentally resolved macromolecular structures with or without a bound ligand are isolated snapshots and not reflective of flexibility. Some docking
programs attempt to address flexibility by including rotamers of
side chains of the binding site residues [23] or by using an induced
fit (IF) methodology or ensemble docking. In IF, a ligand is docked
giving rise to multiple receptor-ligand complexes; each ligand is
subsequently re-docked into each of these receptor conformations,
as is the case with Schrödinger’s Induced Fit Docking protocol.
Ensemble docking employs multiple protein structures stemming
from the PDB, if available, or from molecular dynamics simula-
tions, or from normal mode analyses. It is not within the scope of
the present chapter to delve into alternative approaches and meth-
odologies that incorporate receptor flexibility. It should be, how-
ever, emphasized that not all receptors undergo conformational
changes, which means that the success of a VS experiment is not
always determined by the use of a flexible receptor structure.
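For ensemble docking, the results from the individual receptor conformations have to be combined into a single ranking. A simple and common choice, sketched below in Python, is to keep the best score obtained for each ligand across the ensemble. The data layout and identifiers are hypothetical, and the sketch assumes that lower (more negative) scores are better, which must be checked for the scoring function actually used.

```python
def best_across_ensemble(scores_by_conformation):
    """Combine ensemble docking results.

    scores_by_conformation: {conformation_id: {ligand_id: score}}
    Returns {ligand_id: (best_score, conformation_id)}, lower score = better.
    """
    best = {}
    for conformation_id, scores in scores_by_conformation.items():
        for ligand_id, score in scores.items():
            if ligand_id not in best or score < best[ligand_id][0]:
                best[ligand_id] = (score, conformation_id)
    return best


# Hypothetical scores for two ligands docked into three receptor conformations.
ensemble_scores = {
    "xray": {"lig1": -7.2, "lig2": -5.1},
    "md_frame_10": {"lig1": -6.8, "lig2": -8.4},
    "md_frame_20": {"lig1": -7.9},
}

combined = best_across_ensemble(ensemble_scores)
```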

2.3 An Appropriate Docking/Scoring Scheme

Most investigators use a docking program that they are familiar with or one that is available in the laboratory. However, this is not good practice. If the receptor-small molecule complex structure is available (see Subheading 2.2), several docking algorithms should be tested to see which one places the small molecule in the same orientation as in the crystal structure of that target. Toward that goal, a
number of poses should be generated and visually inspected (typi-
cally 30–60 poses/ligand). It is advisable to visually inspect all
poses, as the top-ranked ones with the native scoring function are
often not accurate. Once the docking program that reproduces the
experimental pose (i.e., what is seen in the crystal structure) is
identified, the researcher should check how the pose is ranked by
the native scoring function (docking programs come with their
own scoring functions). If it is ranked at the top, then this dock-
ing/scoring scheme should be used in the VS experiment. If not,
poses should be rescored with different scoring functions. This will
lead to a docking/scoring combination and provide information on
how close to being top-ranked the observed modes are.
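Visual inspection can be complemented by a numerical comparison between each docked pose and the crystallographic pose. The Python sketch below computes a heavy-atom root-mean-square deviation (RMSD) without superposition, since both poses sit in the same receptor frame, and reports the rank of the first pose that falls within a cutoff. The 2.0 Å cutoff is a widely used convention and is not prescribed by this chapter; the sketch also assumes identical atom ordering in both poses and ignores ligand symmetry.

```python
import math


def pose_rmsd(reference, pose):
    """In-place RMSD (Å) between two poses given as lists of (x, y, z) tuples."""
    if not reference or len(reference) != len(pose):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


def first_native_rank(ranked_poses, reference, cutoff=2.0):
    """1-based rank of the first pose within cutoff of the crystal pose, or None.

    ranked_poses must already be ordered by the scoring function, best first.
    """
    for rank, pose in enumerate(ranked_poses, start=1):
        if pose_rmsd(reference, pose) <= cutoff:
            return rank
    return None


# Hypothetical two-atom ligand: crystal pose and two docked poses.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_a = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # shifted 3 Å along x
pose_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]   # shifted 1 Å along y

rank = first_native_rank([pose_a, pose_b], crystal)
```

A rank of 1 corresponds to the case in which the native scoring function places the experimental binding mode at the top; larger values indicate how far down the list it appears and whether rescoring is worth trying.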
Finally, it is advisable that a pilot VS experiment be performed with a smaller subset of the chosen library serving as decoys, seeded with the known active compounds against the target of interest, using the docking/scoring scheme identified in the preceding paragraph. This is done to ensure optimal parameter choices and robustness, and to get a sense of accuracy in ranking the known hits high(er). An excellent list of available docking programs can be found on the Swiss Institute of Bioinformatics website (https://www.click2drug.org), whereas Table 2 shows a representative list of the most widely used programs.
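The outcome of such a pilot run is often summarized with an enrichment factor, i.e., the proportion of actives found in the top-ranked fraction of the library divided by the proportion of actives in the whole library. The Python sketch below shows the calculation under the assumption that the compounds have already been ranked from best to worst; the function name and the example identifiers are hypothetical.

```python
def enrichment_factor(ranked_ids, active_ids, fraction):
    """Enrichment factor for the top `fraction` of a ranked compound list."""
    n_total = len(ranked_ids)
    n_top = max(1, round(n_total * fraction))
    actives = set(active_ids)
    total_actives = sum(1 for cid in ranked_ids if cid in actives)
    if total_actives == 0:
        raise ValueError("no seeded actives found in the ranked list")
    hits_in_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (hits_in_top / n_top) / (total_actives / n_total)


# Hypothetical pilot: 100 ranked compounds, 4 seeded actives.
ranked = ["cpd%03d" % i for i in range(100)]
seeded_actives = ["cpd000", "cpd001", "cpd050", "cpd060"]

ef_10 = enrichment_factor(ranked, seeded_actives, fraction=0.10)
```

In this example two of the four actives fall in the top 10 compounds, so the top 10% is five times richer in actives than the library as a whole; a value near 1 would indicate that the docking/scoring scheme performs no better than random selection.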

3 Methods

VS with the Glide program is described below. The main steps that
must be performed include: (1) Prepare the target after download-
ing it from the PDB (Glide has its own preparation routine, called
“protein preparation wizard”); (2) Generate a GRID, which will
define the area that the docking algorithm has to search (binding
pocket) to place/dock the ligands; (3) Run the docking algorithm
with the generated GRID.

3.1 Target Preparation

1. Under Project, select “get pdb auto,” enter the PDB entry of the receptor of interest, and download. This downloads the PDB entry that the investigator is interested in.
2. Under tasks, the “protein preparation wizard” should be selected.
3. Then “add hydrogens”. Hydrogens may not be displayed (even if they are added) if the button “none” for “display hydrogens” is selected. Other options for hydrogen display include polar only, mixed, or all. The default is “Mixed” (Fig. 1 shows the 1ubq receptor structure before and after this step).

Table 2
Representative docking programs used in VS

Program        Free for academics  Source
Autodock       Yes                 http://autodock.scripps.edu/
Autodock Vina  Yes                 http://vina.scripps.edu/
DOCK           Yes                 http://dock.compbio.ucsf.edu/
RosettaLigand  Yes                 http://rosettadock.graylab.jhu.edu/
iGEMDOCK       Yes                 http://gemdock.life.nctu.edu.tw/dock/igemdock.php
SLIDE          Yes                 http://www.kuhnlab.bmb.msu.edu/
rDOCK          Yes                 http://rdock.sourceforge.net/
iDOCK          Yes                 http://istar.cse.cuhk.edu.hk/idock/
FlipDock       Yes                 http://flipdock.scripps.edu/
paraDOCKs      Yes                 https://github.com/cbaldauf/paradocks
DAIM/SEED      Yes                 http://www.biochem-caflisch.uzh.ch/download/
GlamDock       Yes                 http://www.chil2.de/Glamdock.html
BetaDock       Yes                 http://voronoi.hanyang.ac.kr/software.htm
FRED           Yes                 http://www.eyesopen.com/oedocking
ICM            No                  http://www.molsoft.com/docking.html
FlexX(E)       No                  https://www.biosolveit.de/FlexX/
Glide          No                  https://www.schrodinger.com/glide
GOLD           No                  http://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold/
MOE            No                  http://www.chemcomp.com/

4. Next, we assign bond orders (double bonds, single bonds, and
aromaticity are corrected in this step).
5. Select “delete waters” within 5 Å. If some water molecules are
thought to be critical, we can keep them in the pdb, and delete
the rest with an editor. In the latter case, we do not delete any
waters in the protein preparation wizard.
6. Proceed with “preprocess.”
7. In the “review and modify” button, the option to delete a chain,
in the case that the crystal structure has more than one chain and
they are all the same, is available. Keeping only one chain saves
time.

Fig. 1 PDB ID 1ubq, without hydrogens (left panel ) and with hydrogens (right panel ). Backbone is orange, side
chains are purple, and water molecules are depicted in blue in the left panel, while the hydrogens are blue in
the right panel

8. Under “refine,” hydrogen bonds are assigned at pH 7.0, while the water orientations are sampled, if this option is checked.
This is done via the “optimize” button. The next step should
be minimization, which refines the positions of hydrogens and
heavy atoms.

3.2 Grid Generation

1. The grid will define the space within which the docking program will place the ligands (binding pocket). Under tasks, select “docking grid generation.”
2. To generate the grid, one needs to click on an atom of the
ligand (“pick to identify the ligand” is checked), thus defining
it as the center of the grid. In the case that the crystal structure
does not have a ligand bound to it, the button that identifies
the ligand should be left unchecked. The van der Waals (vdw)
scaling factor is set to 1.0 (default), unless docking does not
result in any poses; in the latter case, the scaling factor should
be lowered to 0.8 in order to allow for more tolerance for
closer contacts (see Note 1).
3. Under “site,” either the centroid of the workspace ligand or
the centroid of selected residues should be checked. In the first
case, the ligand is part of the structure (see step 2 above), while
in the latter, an amino acid needs to be selected, provided we
know from experiments (i.e., mutagenesis) which amino acids
are critical for binding and/or for the enzymatic mechanism.
Once this is done, a purple rectangular box surrounding the
binding pocket appears on the workspace.
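The geometric idea behind steps 2 and 3 (a box centered on the centroid of the bound ligand or of selected residues) can be illustrated with a short Python sketch. This is not how Glide builds its grid internally; the function names, the example coordinates, and the 5 Å padding are assumptions made only for illustration.

```python
def centroid(coords):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def padded_box(coords, padding=5.0):
    """Axis-aligned box enclosing the coordinates, enlarged by padding (Å)."""
    lower = tuple(min(atom[i] for atom in coords) - padding for i in range(3))
    upper = tuple(max(atom[i] for atom in coords) + padding for i in range(3))
    return lower, upper


# Hypothetical heavy-atom coordinates of a co-crystallized ligand.
ligand_atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]

center = centroid(ligand_atoms)
box_min, box_max = padded_box(ligand_atoms, padding=1.0)
```

For an apo structure, the same functions could be applied to the atoms of residues known from mutagenesis to be critical for binding, mirroring the “centroid of selected residues” option.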

3.3 Docking/VS

1. Under “tasks,” glide docking should be selected.
2. Indicate the name of the grid that was just generated.

3. Select the filename of the library collection that will be used for
screening.
4. The vdw scaling factor and partial charge cutoff should be left
at default values, unless softening of the potential is thought to
be necessary (see Note 2).
5. Under “settings,” HTVS (high-throughput virtual screening)
should be selected. Ligand sampling is flexible, which means
that unlike the receptor which remains rigid throughout the
experiment, the ligands are flexible so that different conforma-
tions are identified and docked.
6. In the output tab, “write pose viewer” should be enabled
because this is the output file. Depending on the library,
“write out at most 5 poses per ligand” means that five poses
per ligand will be generated. This may not be ideal, if the library
has millions of compounds. It is up to the researcher to decide
how many poses per ligand are wanted.
7. Click Run.
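The effect of the vdw scaling factor and partial charge cutoff mentioned in step 4 can be pictured with the following Python sketch: atoms whose absolute partial charge falls at or below the cutoff are treated as nonpolar and have their radii multiplied by the scaling factor, while polar atoms are left unchanged. The function is an illustration of the concept only, not Glide's implementation; the scaling factor of 0.8 follows the value mentioned in this chapter, whereas the cutoff of 0.15 and the example radii and charges are assumptions.

```python
def scaled_radius(radius, partial_charge, scale=0.8, charge_cutoff=0.15):
    """Scale the vdW radius of an atom if it qualifies as nonpolar.

    An atom is considered nonpolar here when |partial charge| <= charge_cutoff.
    """
    if abs(partial_charge) <= charge_cutoff:
        return radius * scale
    return radius


# Hypothetical atoms: (name, vdW radius in Å, partial charge).
atoms = [
    ("C_aromatic", 2.0, 0.05),   # nonpolar, radius is scaled
    ("O_carbonyl", 1.5, -0.40),  # polar, radius is kept
]

softened = {name: scaled_radius(r, q) for name, r, q in atoms}
```

Lowering `scale` softens the potential further, and a value of 1.0 leaves all radii untouched.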

3.4 Analysis

1. The results come in a project table that will be opened once the poseviewer file (output from the run) is imported (see Note 3).
2. Open the project table, and under “entry,” “view poses”
should be selected. The arrows on the top can be used to step
through each of the docked ligands.
3. To view interactions, in the Pose Viewer menu, one can select the interactions of interest (e.g., contacts, hydrogen bonds), which will be marked on the screen for each pose.

4 Notes

1. The docking programs consider the receptor as a rigid structure that is not moving during docking. With the scaling of van
der Waals radii of nonpolar atoms, the investigator has the
option to decrease the penalties for close contacts, and in turn
allow for a slight “give” in the receptor and/or the ligand. This
is by no means an approach to generate a flexible receptor, but a
way of allowing poses that would be otherwise rejected due to
close contacts to active site residues.
2. Successful docking sometimes requires that the ligand or the receptor “give” a bit in order to bind. To model this behavior,
Glide can scale the vdw radii of nonpolar atoms (where nonpolar
is defined by a partial charge threshold that the researcher can
set), thereby decreasing penalties for close contacts. By default,
scaling is performed for qualifying atoms in the ligand, but not
those in the receptor. Ligand atom radii scaling settings can be
changed using the options in this section.

3. Before proceeding to visual inspection of the results, one needs to address the following questions: (1) Which percentage of the
database should be considered in order to identify a sufficient
number of actives for initial scaffolding? It is suggested that
2.5% of the library collection should be examined [24]; (2)
Which criteria to use, besides scoring, to select compounds?
The reader should be reminded that the objective of VS is to
find actives among inactives. Ideally, all actives are scored on
the top, but realistically that is never the case. Consequently,
knowledge regarding interactions with key residues, if experi-
mental data is available, structural diversity, and possibly finger-
prints are the most common criteria used for aiding compound
selection.
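The first question in Note 3 translates into a simple post-processing step: reduce the docking output to one (best) score per ligand, rank the ligands, and keep the top fraction for inspection. The Python sketch below uses the 2.5% suggested above as the default; the data layout and identifiers are hypothetical, and lower scores are assumed to be better.

```python
import math


def select_top_fraction(poses, fraction=0.025):
    """Return the top-ranked ligands as a list of (ligand_id, best_score).

    poses: iterable of (ligand_id, score) pairs, possibly several per ligand.
    """
    best = {}
    for ligand_id, score in poses:
        if ligand_id not in best or score < best[ligand_id]:
            best[ligand_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1])
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical docking output with multiple poses per ligand.
docked = [
    ("lig1", -6.0), ("lig1", -7.5),
    ("lig2", -9.1),
    ("lig3", -4.2), ("lig3", -5.0),
    ("lig4", -8.3),
]

shortlist = select_top_fraction(docked, fraction=0.5)
```

The shortlist is only a starting point; as noted above, interactions with key residues, structural diversity, and fingerprints are then used alongside the score to choose compounds for testing.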

References
1. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem Substance and Compound databases. Nucleic Acids Res 44(D1):D1202–D1213. doi:10.1093/nar/gkv951
2. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, Bolton E, Gindulyte A, Bryant SH (2012) PubChem’s BioAssay Database. Nucleic Acids Res 40(Database issue):D400–D412. doi:10.1093/nar/gkr1132
3. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(Database issue):D1100–D1107. doi:10.1093/nar/gkr777
4. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007) BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35(Database):D198–D201. doi:10.1093/nar/gkl999
5. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J (2016) BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 44(D1):D1045–D1053. doi:10.1093/nar/gkv1072
6. Nicola G, Liu T, Gilson MK (2012) Public domain databases for medicinal chemistry. J Med Chem 55(16):6987–7002. doi:10.1021/jm300501t
7. Irwin JJ, Shoichet BK (2005) ZINC–a free database of commercially available compounds for virtual screening. J Chem Inf Model 45(1):177–182. doi:10.1021/ci049714+
8. Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768. doi:10.1021/ci3001277
9. Sterling T, Irwin JJ (2015) ZINC 15–Ligand Discovery for Everyone. J Chem Inf Model 55(11):2324–2337. doi:10.1021/acs.jcim.5b00559
10. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36(Database issue):D901–D906. doi:10.1093/nar/gkm958
11. Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SP, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, Yu W, Harmar AJ, Nc I (2014) The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res 42(Database issue):D1098–D1106. doi:10.1093/nar/gkt1143
12. Lipinski CA (2000) Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods 44(1):235–249
13. Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53(7):2719–2740. doi:10.1021/jm901137j
14. Ebejer JP, Morris GM, Deane CM (2012) Freely available conformer generation methods: how good are they? J Chem Inf Model 52(5):1146–1158. doi:10.1021/ci2004658
15. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
16. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47(7):1739–1749
17. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, Sanschagrin PC, Mainz DT (2006) Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem 49(21):6177–6196
18. Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem 47(7):1750–1759
19. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267(3):727–748
20. Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, Nissink JW, Taylor RD, Taylor R (2005) Modeling water molecules in protein-ligand docking using GOLD. J Med Chem 48(20):6504–6515
21. Ghersi D, Sanchez R (2011) Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures. J Struct Funct Genomics 12(2):109–117. doi:10.1007/s10969-011-9110-6
22. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2014) Critical assessment of methods of protein structure prediction (CASP)–round x. Proteins 82(Suppl 2):1–6. doi:10.1002/prot.24452
23. Gaudreault F, Chartier M, Najmanovich R (2012) Side-chain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding. Bioinformatics 28(18):i423–i430. doi:10.1093/bioinformatics/bts395
24. Kontoyianni M, Sokol GS, McClellan LM (2005) Evaluation of library ranking efficacy in virtual screening. J Comput Chem 26(1):11–22
