Virtual Screening
Abstract
Stages in a typical drug discovery organization include target selection, hit identification, lead
optimization, and preclinical and clinical studies. Hit identification and lead optimization are very
much intertwined with computational modeling. Structure-based virtual screening (VS) has been a staple
of drug discovery for more than a decade, and its underlying computational technique, docking, has been
extensively studied. Depending on the objective, the parameters for VS may change, but the overall
protocol is straightforward. The idea behind VS is that a library of small compounds is docked into the
binding pocket of a protein (e.g., receptor, enzyme), a number of top-ranked solutions per molecule are
returned, and a choice is made on the fraction of compounds to be moved forward for testing toward hit
identification. The underlying principle of VS is that it differentiates between active and inactive
compounds, thus reducing the number of molecules moving forward and possibly offering a complementary
tool to high-throughput screening (HTS). Best practices in library selection, target preparation and
refinement, criteria for selecting the most appropriate docking/scoring scheme, and a step-wise approach
to performing Glide VS are discussed.
Key words Virtual screening, High-throughput screening, Docking, Scoring, Structure-based drug
design, GOLD, Glide, Drug discovery
1 Introduction
Iulia M. Lazar et al. (eds.), Proteomics for Drug Discovery: Methods and Protocols, Methods in Molecular Biology,
vol. 1647, DOI 10.1007/978-1-4939-7201-2_18, © Springer Science+Business Media LLC 2017
256 Maria Kontoyianni
2 Materials
2.1 Compound Collections
2.1.1 Public Collections
Table 1 provides a representative set of diverse private and public compound collections. Most public
databases have grown out of the need of academics to have access to chemicals. PubChem [1, 2] was
launched in 2004 and to date it includes three primary databases: Substance, Compound, and Bioassay.
While the Substance database may have redundant records of the same molecule from different
contributors, the Compound database extracts all records for a specific molecule into an aggregate
record, thus making searching more efficient. The Bioassay database provides descriptions of biological
experiments on the tested compounds, particularly from HTS. It should be pointed out that PubChem represents
Computational Drug Discovery 257
Table 1
List of available databases for VS
2.1.3 Considerations Regarding the Library Selection
The choice of the database should not be driven by the number of compounds that it contains, but by the
existing knowledge-base of the target regarding already known actives or information pertaining to the
binding pocket. For example, are there known active compounds in the library? If yes, perform
ligand-based similarity searching and eliminate the compounds that are similar to the already known
actives (binders). If not, then the use of filters to
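The similarity-based elimination mentioned in this subsection can be sketched in a few lines. This is only an illustration under stated assumptions: each compound is taken to be already encoded as a fingerprint, represented here as a set of "on" bit positions, and the Tanimoto coefficient, the 0.7 cutoff, and the function names are hypothetical choices, not recommendations made in this chapter.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def remove_similar_to_actives(library, actives, cutoff=0.7):
    """Keep library entries whose similarity to every known active is below cutoff.

    library and actives map compound IDs to fingerprint sets (hypothetical layout).
    The cutoff of 0.7 is an assumed, adjustable value.
    """
    kept = {}
    for name, fp in library.items():
        best = max((tanimoto(fp, act) for act in actives.values()), default=0.0)
        if best < cutoff:
            kept[name] = fp
    return kept


# Toy data: the bit positions are made up for illustration only.
actives = {"active_1": {1, 2, 3, 4}}
library = {
    "cpd_A": {1, 2, 3, 4, 5},  # shares 4 of 5 bits with active_1, so it is removed
    "cpd_B": {7, 8, 9},        # shares no bits with active_1, so it is kept
}
filtered = remove_similar_to_actives(library, actives)
```

In practice the fingerprints would come from a cheminformatics toolkit, and the cutoff would be tuned to the fingerprint type in use.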
2.2 The Structure of the Biological Target
2.2.1 Crystal Structures
Typically, crystal structures of the biological targets are used. All the experimentally resolved
receptor structures (NMR or X-ray) can be found in the Protein Data Bank (www.rcsb.org) under their PDB
IDs (for example, 3nxu corresponds to the cytochrome P450 3A4 isozyme complexed with the inhibitor
ritonavir) [15]. If the PDB ID is not known, one should enter the name of the target of interest in the
keyword window, and a list of resolved crystal or NMR structures will appear with their respective PDB
IDs. However, even if the crystal structure is available, its quality needs to be carefully evaluated,
because crystal structures are static snapshots that do not account for the dynamic behavior of
macromolecules. Another point to note is that crystal structures often contain water molecules. Common
practice is to remove all waters. This is a point of contention, however, because some water molecules
are critical, e.g., they form hydrogen-bonding networks with the ligand or are involved in the mechanism
of action of the enzyme. If there are water molecules involved in ligand binding, there are programs
that can include these molecules either as part of the grid that defines the active site (e.g., Glide;
however, the water molecules remain static) [16–18] or enable the user to select the water molecules that
2.2.2 Target Considerations Prior to VS
The following parameters are evaluated prior to performing VS: (a) the resolution of the structure (the
higher the resolution, i.e., the smaller the value in Å, the better); (b) the R-factor (or residual
factor or agreement factor), which is indicative of the agreement between the crystallographic model and
the experimental X-ray diffraction data (the lower, the better); and (c) the B factor, which reflects
the static or dynamic mobility of an atom and can therefore indicate where errors in the structure are
likely (the lower the B factor, the better). Also, crystal structures typically do not include
hydrogens, so hydrogen atoms must be added prior to doing any VS experiments. Finally, we have to
consider whether the crystal structure is a complex (receptor-small molecule) or not. In the first case,
the binding pocket, which needs to be defined for docking, represents the area around the small
molecule. However, if the crystal structure is an apo-structure (containing no small molecule within
it), then programs that find the binding pockets must be used prior to VS [21]. If the crystal structure
is not available, homology modeling can be used to generate a receptor structure, provided there is
reasonable homology with template(s). The reader is advised to refer to CASP (Critical Assessment of
Protein Structure Prediction) papers regarding advances in the field of model building or homology
modeling [22]. Attention should also be paid to metal ions and cofactors. If the receptor is a
metalloenzyme, it is critical to dock with programs whose scoring functions have been parameterized for
the specific ion; otherwise VS is unlikely to give meaningful results.
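A quick, scriptable look at some of these indicators can be sketched as follows. The example assumes a legacy PDB-format text file with the standard fixed-column layout (residue name in columns 18–20, B factor in columns 61–66, element symbol in columns 77–78) and a "REMARK   2 RESOLUTION." line. The function name and the returned keys are hypothetical. The R-factor is deliberately left out because the layout of the REMARK 3 records varies between refinement programs.

```python
def summarize_pdb(lines):
    """Collect simple quality indicators from the lines of a PDB-format file."""
    resolution = None
    b_factors = []
    waters = 0
    hydrogens = 0
    for line in lines:
        if line.startswith("REMARK   2 RESOLUTION."):
            # Typical form: "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
            try:
                resolution = float(line.split()[3])
            except (IndexError, ValueError):
                resolution = None  # e.g., "NOT APPLICABLE" for NMR entries
        elif line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()
            res_name = line[17:20].strip()
            element = line[76:78].strip().upper()
            if res_name == "HOH":
                if atom_name == "O":
                    waters += 1  # count each water once, via its oxygen
                continue
            if element == "H":
                hydrogens += 1
                continue
            try:
                b_factors.append(float(line[60:66]))
            except ValueError:
                pass  # skip malformed B-factor fields
    mean_b = sum(b_factors) / len(b_factors) if b_factors else None
    return {
        "resolution": resolution,      # in Å; smaller means higher resolution
        "mean_b_factor": mean_b,       # over non-water, non-hydrogen atoms
        "waters": waters,
        "hydrogens": hydrogens,        # 0 suggests hydrogens still need adding
    }
```

A typical call would be `summarize_pdb(open("target.pdb"))`, where the file name is a placeholder. A hydrogen count of zero is a reminder that hydrogens still have to be added before docking.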
2.3 An Appropriate Docking/Scoring Scheme
Most investigators use a docking program that they are familiar with or one that is available in the
laboratory. However, this is not good practice. If the receptor-small molecule complex structure is
available (see Subheading 2.2), several docking algorithms should be tested to see which one places the
small molecule in the same orientation as in the PDB entry of that target. Toward that goal, a number of
poses should be generated (typically 30–60 poses/ligand). It is advisable to visually inspect all poses,
as the top-ranked ones with the native scoring function are often not accurate. Once the docking program
that reproduces the experimental pose (i.e., what is seen in the crystal structure) is identified, the
researcher should check how the pose is ranked by the native scoring function (docking programs come
with their own scoring functions). If it is ranked at the top, then this docking/scoring scheme should
be used in the VS experiment. If not, poses should be rescored with different scoring functions. This
will lead to a docking/scoring combination and provide information on how close to being top-ranked the
observed modes are.
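Visual inspection can be complemented by a numerical check of where the experimentally observed pose falls in the ranking. The sketch below is an illustration under several assumptions not stated in the text: poses and crystal ligand share the receptor coordinate frame (so no superposition is done), atoms are listed in the same order, lower scores are better, and a pose within 2.0 Å RMSD of the crystal ligand counts as reproducing it. Ligand symmetry is ignored, and all names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))


def rank_of_first_native_like(poses, crystal, threshold=2.0):
    """Return the 1-based rank of the best-scored pose close to the crystal pose.

    poses: list of (score, coords) tuples; a lower score is assumed to be better.
    Returns None if no pose lies within the RMSD threshold.
    """
    ranked = sorted(poses, key=lambda pose: pose[0])
    for rank, (_, coords) in enumerate(ranked, start=1):
        if rmsd(coords, crystal) <= threshold:
            return rank
    return None


# Toy two-atom "ligand"; coordinates and scores are made up.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = [
    (-9.0, [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),  # best score, but 5.0 Å away
    (-7.5, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),  # second score, 0.5 Å away
]
native_rank = rank_of_first_native_like(poses, crystal)  # rank 2 here
```

A rank of 1 would support using the native scoring function. A larger rank, as in this toy case, is the situation in which rescoring with other scoring functions is worth trying.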
Finally, it is advisable to perform a pilot VS experiment with a smaller subset of the chosen library,
seeded with compounds known to be active against the target of interest, using the docking/scoring
scheme identified in the preceding paragraph. This is done to ensure optimal parameter choices and
robustness, and to gauge how well the known hits are ranked near the top. An excellent list of available
docking programs can be found on the Swiss Institute of Bioinformatics website
(https://www.click2drug.org), whereas Table 2 shows a representative list of the most widely used
programs.
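One common way to quantify the outcome of such a pilot run is an enrichment factor, i.e., how much more frequent the seeded actives are in the top-ranked fraction than in the library as a whole. The following is a minimal sketch. The function name, the 10% cutoff, and the toy compound IDs are assumptions for illustration.

```python
def enrichment_factor(ranked_ids, active_ids, fraction):
    """Enrichment of known actives in the top fraction of a ranked compound list.

    ranked_ids: compound IDs ordered from best to worst score.
    active_ids: IDs of the seeded known actives.
    fraction:   portion of the list regarded as "top" (0 < fraction <= 1).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    actives = set(active_ids)
    n_total = len(ranked_ids)
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    hits_total = sum(1 for cid in ranked_ids if cid in actives)
    if hits_total == 0:
        raise ValueError("no seeded actives found in the ranked list")
    # (hits_top / n_top) / (hits_total / n_total), written to avoid rounding noise
    return (hits_top * n_total) / (n_top * hits_total)


# Toy pilot screen: 100 ranked compounds, 5 seeded actives, 3 of them in the top 10.
ranked = [f"c{i}" for i in range(100)]
seeded = {"c0", "c3", "c7", "c50", "c90"}
ef_10 = enrichment_factor(ranked, seeded, 0.10)  # (3/10) / (5/100) = 6.0
```

A value near 1 means the ranking is no better than random selection, whereas larger values indicate that the chosen docking/scoring scheme concentrates the known actives near the top.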
3 Methods
VS with the Glide program is described below. The main steps that
must be performed include: (1) Prepare the target after download-
ing it from the PDB (Glide has its own preparation routine, called
“protein preparation wizard”); (2) Generate a GRID, which will
define the area that the docking algorithm has to search (binding
pocket) to place/dock the ligands; (3) Run the docking algorithm
with the generated GRID.
3.1 Target Preparation
1. Under Project, select “get pdb auto,” enter the PDB ID of the receptor of interest, and download
the structure.
2. Under tasks, the “protein preparation wizard” should be selected.
3. Then select “add hydrogens”. Hydrogens may not be displayed (even if they are added) if the button
“none” for “display hydrogens”
Table 2
Representative docking programs used in VS
Fig. 1 PDB ID 1ubq, without hydrogens (left panel) and with hydrogens (right panel). Backbone is orange,
side chains are purple, and water molecules are depicted in blue in the left panel, while the hydrogens
are blue in the right panel
3.2 Grid Generation
1. The grid will define the space within which the docking program will place the ligands (binding
pocket). Under tasks, select “docking grid generation.”
2. To generate the grid, one needs to click on an atom of the ligand (“pick to identify the ligand” is
checked), thus defining it as the center of the grid. In the case that the crystal structure does not
have a ligand bound to it, the button that identifies the ligand should be left unchecked. The van der
Waals (vdw) scaling factor is set to 1.0 (default), unless docking does not result in any poses; in the
latter case, the scaling factor should be lowered to 0.8 in order to allow for more tolerance for closer
contacts (see Note 1).
3. Under “site,” either the centroid of the workspace ligand or the centroid of selected residues
should be checked. In the first case, the ligand is part of the structure (see step 2 above), while in
the latter, an amino acid needs to be selected, provided we know from experiments (e.g., mutagenesis)
which amino acids are critical for binding and/or for the enzymatic mechanism. Once this is done, a
purple rectangular box surrounding the binding pocket appears on the workspace.
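The geometric idea behind centering the grid on the ligand can be illustrated outside the graphical interface. The sketch below computes the centroid of a set of ligand atom coordinates and an axis-aligned box around it. It is not Glide's own box definition: the 5.0 Å padding and the function names are assumptions made only for illustration.

```python
def ligand_centroid(coords):
    """Arithmetic mean of the atom positions, returned as an (x, y, z) tuple."""
    if not coords:
        raise ValueError("at least one atom coordinate is required")
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))


def enclosing_box(coords, padding=5.0):
    """Box centered on the ligand centroid that contains every atom plus padding.

    Returns (center, edge_lengths); padding is in Å and is an assumed value.
    """
    center = ligand_centroid(coords)
    edges = tuple(
        2.0 * (max(abs(value - c) for value in axis) + padding)
        for axis, c in zip(zip(*coords), center)
    )
    return center, edges


# Toy three-atom ligand; coordinates are made up.
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
center, edges = enclosing_box(ligand)
# center is (1.0, 1.0, 0.0) and the edge lengths are (12.0, 14.0, 10.0)
```

The same function applies when the site is defined by selected residues instead of a ligand: one would pass the coordinates of those residues' atoms.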
3. Select the filename of the library collection that will be used for screening.
4. The vdw scaling factor and partial charge cutoff should be left at default values, unless softening
of the potential is thought to be necessary (see Note 2).
5. Under “settings,” HTVS (high-throughput virtual screening) should be selected. Ligand sampling is
flexible, which means that, unlike the receptor, which remains rigid throughout the experiment, the
ligands are flexible so that different conformations are identified and docked.
6. In the output tab, “write pose viewer” should be enabled because this is the output file. Depending
on the library, “write out at most 5 poses per ligand” means that five poses per ligand will be
generated. This may not be ideal if the library has millions of compounds. It is up to the researcher
to decide how many poses per ligand are wanted.
7. Click Run.
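Why a smaller vdw scaling factor "softens" the potential can be seen with a generic 12-6 Lennard-Jones term. This is a worked illustration only: the functional form, the well depth, and the distances below are assumptions chosen for clarity and are not Glide's actual scoring terms or parameters.

```python
def lennard_jones(r, r_min, epsilon):
    """12-6 Lennard-Jones energy with its minimum (-epsilon) at distance r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def scaled_lennard_jones(r, r_min, epsilon, scale=1.0):
    """Same potential after multiplying the contact distance by a scaling factor."""
    return lennard_jones(r, scale * r_min, epsilon)


# Hypothetical atom pair: optimal contact at 4.0 Å, observed distance 3.4 Å.
e_full = scaled_lennard_jones(3.4, 4.0, 0.1, scale=1.0)  # positive: steric clash
e_soft = scaled_lennard_jones(3.4, 4.0, 0.1, scale=0.8)  # negative: contact tolerated
```

With unscaled radii the 3.4 Å contact is penalized, whereas with a factor of 0.8 the same contact falls in the attractive region. This is the sense in which a lower scaling factor allows closer contacts and partly compensates for the rigid receptor.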
3.4 Analysis
1. The results come in a project table that will be opened once the poseviewer file (output from the
run) is imported (see Note 3).
2. Open the project table, and under “entry,” “view poses” should be selected. The arrows on the top
can be used to step through each of the docked ligands.
3. To view interactions, in the Pose Viewer menu, one can select the interactions of interest (e.g.,
contacts, hydrogen bonds), which will be marked on the screen for each pose.
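Once the poses have been exported as a table of scores, choosing the fraction of compounds to move forward can be scripted. The sketch below assumes a list of (ligand ID, score) pairs with several poses per ligand, and that lower (more negative) scores are better. The function names and the 50% cutoff are hypothetical.

```python
def best_score_per_ligand(poses):
    """Reduce multiple poses per ligand to the single best (lowest) score."""
    best = {}
    for ligand_id, score in poses:
        if ligand_id not in best or score < best[ligand_id]:
            best[ligand_id] = score
    return best


def select_top_fraction(poses, fraction):
    """Return ligand IDs in the top fraction, ranked by their best pose score."""
    best = best_score_per_ligand(poses)
    ranked = sorted(best, key=best.get)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Toy docking output; ligand names and scores are made up.
docked = [
    ("L1", -6.2), ("L1", -7.9),
    ("L2", -5.0),
    ("L3", -9.1),
    ("L4", -4.4),
]
selected = select_top_fraction(docked, 0.5)  # two of the four ligands: L3, then L1
```

The selected compounds would still be inspected visually, as described in the steps above, before any are forwarded for testing.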
4 Notes
References
1. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem Substance and Compound databases. Nucleic Acids Res 44(D1):D1202–D1213. doi:10.1093/nar/gkv951
2. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA, Bolton E, Gindulyte A, Bryant SH (2012) PubChem’s BioAssay Database. Nucleic Acids Res 40(Database issue):D400–D412. doi:10.1093/nar/gkr1132
3. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(Database issue):D1100–D1107. doi:10.1093/nar/gkr777
4. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007) BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35(Database):D198–D201. doi:10.1093/nar/gkl999
5. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J (2016) BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 44(D1):D1045–D1053. doi:10.1093/nar/gkv1072
6. Nicola G, Liu T, Gilson MK (2012) Public domain databases for medicinal chemistry. J Med Chem 55(16):6987–7002. doi:10.1021/jm300501t
7. Irwin JJ, Shoichet BK (2005) ZINC–a free database of commercially available compounds for virtual screening. J Chem Inf Model 45(1):177–182. doi:10.1021/ci049714+
8. Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768. doi:10.1021/ci3001277
9. Sterling T, Irwin JJ (2015) ZINC 15–Ligand Discovery for Everyone. J Chem Inf Model 55(11):2324–2337. doi:10.1021/acs.jcim.5b00559
10. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36(Database issue):D901–D906. doi:10.1093/nar/gkm958
11. Pawson AJ, Sharman JL, Benson HE, Faccenda E, Alexander SP, Buneman OP, Davenport AP, McGrath JC, Peters JA, Southan C, Spedding M, Yu W, Harmar AJ, Nc I (2014) The IUPHAR/BPS Guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res 42(Database issue):D1098–D1106. doi:10.1093/nar/gkt1143
12. Lipinski CA (2000) Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods 44(1):235–249
13. Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53(7):2719–2740. doi:10.1021/jm901137j
14. Ebejer JP, Morris GM, Deane CM (2012) Freely available conformer generation methods: how good are they? J Chem Inf Model 52(5):1146–1158. doi:10.1021/ci2004658
15. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
16. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47(7):1739–1749
17. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, Sanschagrin PC, Mainz DT (2006) Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem 49(21):6177–6196
18. Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem 47(7):1750–1759
19. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267(3):727–748
20. Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, Nissink JW, Taylor RD, Taylor R (2005) Modeling water molecules in protein-ligand docking using GOLD. J Med Chem 48(20):6504–6515
21. Ghersi D, Sanchez R (2011) Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures. J Struct Funct Genomics 12(2):109–117. doi:10.1007/s10969-011-9110-6
22. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2014) Critical assessment of methods of protein structure prediction (CASP)–round x. Proteins 82(Suppl 2):1–6. doi:10.1002/prot.24452
23. Gaudreault F, Chartier M, Najmanovich R (2012) Side-chain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding. Bioinformatics 28(18):i423–i430. doi:10.1093/bioinformatics/bts395
24. Kontoyianni M, Sokol GS, McClellan LM (2005) Evaluation of library ranking efficacy in virtual screening. J Comput Chem 26(1):11–22