Frontiers in Computational Chemistry: Volume 3
By Zaheer Ul-Haq Qasmi and Jeff Madura
Frontiers in Computational Chemistry - Zaheer Ul-Haq Qasmi
In Silico Approaches for Drug Discovery and Development
Thomas Leonard Joseph¹, Vigneshwaran Namasivayam², Vasanthanathan Poongavanam³, Srinivasaraghavan Kannan¹, *
¹ Bioinformatics Institute, A*STAR, Singapore 138671, Singapore
² Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universitaet, Dahlmannstr 53113 Bonn, Germany
³ Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark
Abstract
Discovery of new therapeutics is a challenging, expensive and time-consuming process. With the number of approved drugs declining steadily and costs rising, a rational approach is needed to facilitate, expedite and streamline the drug discovery process. In this regard, computational methods are playing increasingly important roles, largely assisted by developments in algorithms and greatly increased computer power. With in silico methods playing key roles in the discovery of a growing number of marketed drugs, the use of computational tools has become an integral part of most drug discovery programs. Computational tools can be applied at different stages: from target selection through identification of hits to optimization. In this chapter we aim to provide an overview of the major tools that have been developed and are routinely used in the search for novel drug candidates. In addition, we present recent advances, especially in the application of physics-based simulation methodologies, in the drug discovery process for the development of improved therapeutics.
* Corresponding author Srinivasaraghavan Kannan: Bioinformatics Institute, A*STAR, Singapore 138671; E-mail: raghavk@bii.a-star.edu.sg
1. INTRODUCTION
Drug discovery is the process of creating or finding a molecule which has a specific activity on a biological organism. The aim of the discovery process is to identify compounds of pharmacological interest that can be used in the treatment of diseases. As several factors determine the activity of a drug molecule, the development of a new drug is undoubtedly a complex and difficult process. It is estimated that a drug discovery process can cost several hundred million dollars and that a typical discovery cycle can take as many as 15 years from the first compound identified in the laboratory until the drug is brought to market [1-6]. Traditionally, drug discovery starts with an experimental screening of compound libraries for molecules that bind to biomolecular targets and modulate their activity. This is followed by subsequent rounds of iterative chemical modifications to enhance potency, with further optimization for increased selectivity and improved pharmacological properties [5, 6]. The emergence of combinatorial chemistry, combined with rapid developments in high throughput screening (HTS) technologies, has sped up the discovery process by enabling huge libraries of compounds to be screened in short periods of time [7-10]. However, the hit rates of high throughput screens are often extremely low, and most identified hits do not progress to actual leads [7-10].
The sequencing of the human genome has revealed previously unknown proteins that might serve as new drug targets. However, the therapeutic importance of most of these proteins is either unknown or poorly characterized. The routine set of experiments (blind expression, purification and in vitro assays) that are typically used cannot be applied to thousands of proteins against libraries of several hundreds of thousands of compounds. Therefore, new approaches are needed to speed up and streamline the drug discovery and development process and to save time, money and resources. In this regard, computational approaches have a major role to play.
A variety of computational approaches can be applied at different stages of the drug-design process: from target identification and validation, through identification of initial hits and hit-to-lead selection, to optimization of leads to avoid safety issues.
In this chapter we aim to provide an overview of the major in silico tools and approaches that have been developed and are routinely used in the search for novel drug candidates. In addition, we present recent advances, especially the application of physics-based simulation methodologies that lead to a dynamic view of receptor-drug interactions, replacing the traditional dogma of single-structure-based drug design with the concept of ensemble-based drug design, in which the conformational flexibility of the receptor molecule plays a key role.
In the first section, we introduce the two major Computer Aided Drug Design (CADD) strategies, namely ligand-based and structure-based methods, that are widely used in the drug discovery process. Next, we briefly introduce several computational techniques that are routinely used. In the third section we introduce Molecular Dynamics (MD) simulations and their applications at various steps of the drug discovery process. We then discuss computational methods for the prediction and optimization of drug metabolism and pharmacokinetics. Finally, we discuss targeting protein-protein interactions and briefly introduce peptide-based inhibitor design for the inhibition of protein-protein interactions. The goal here is to offer an overview of highly promising themes and tools in this interdisciplinary field.
2. COMPUTER AIDED DRUG DESIGN STRATEGIES
Drug discovery is an extended and time-consuming process, which can take several years to translate a compound into a drug molecule. Therefore, the development of a drug discovery process able to rapidly identify potential binders to a target of therapeutic interest is of great importance to biotech and pharmaceutical companies. In this regard, computational methods enable rapid screening of huge libraries of pharmacologically interesting compounds to identify potential binders through modelling and simulation. Strategies for CADD vary depending on the availability of structural and other information regarding the target (enzyme/receptor) and the drug (ligand). Two major modelling strategies, indirect and direct, are currently used in the drug discovery process (Fig. 1). In the indirect approach, also known as ligand-based design, the design is based on a comparative analysis of the structural features of compounds with known activity. The direct approach, also known as structure-based design, utilizes the three-dimensional structural features of the target molecule of interest. We now examine these two in some detail.
Fig. (1)
Flow chart of structure based and ligand based drug discovery approaches.
2.1. Ligand Based Drug Discovery
The ligand-based computer-aided drug discovery approach is considered an indirect approach to the design of small molecules and does not require knowledge of the structure of the target molecule. This approach uses a set of ligands that are known to interact with a target of interest and analyzes their 2D and 3D structures. The aim is to represent the compounds in a way that retains the key physicochemical properties important for their desired interactions. The ligand-based approach rests on the similarity property principle [11], which states that molecules that are structurally similar are likely to have similar properties. The two fundamental ligand-based approaches are (1) chemical similarity, where compound selection is based on chemical similarity to known actives using some similarity measure, and (2) Quantitative Structure Activity Relationship (QSAR) modelling, where compound selection is based on prediction of biological activity from the compound's chemical structure via some statistical model. Since ligand-based techniques rely entirely on chemical structures, physicochemical properties and/or associated biological activity, they use several methods (computational algorithms) to describe features of small molecules. Molecular descriptors [12-19] can describe both structural and physicochemical properties. Once the molecular descriptors of a bioactive small molecule are derived, they can be used to screen databases of small molecules for compounds that are structurally and/or physicochemically similar. Fingerprint methods can be used to search databases for compounds that are similar in structure to a lead query [20, 21]. QSAR methods describe the relationship between structures/descriptors and their experimental/biological activity mathematically [22]. The aim is to produce a suitably robust model capable of reliable predictions for novel chemical species. From a set of compounds with known biological activity, a QSAR model is generated and then applied to a library of test compounds, encoded with the same descriptors, to predict their activity. A pharmacophore model generated from compounds with known biological activity can also be used to screen databases of small molecules. A pharmacophore is a spatial arrangement of the functional groups that are important for a compound or drug to evoke a desired biological response [23]. In addition to the functional groups, an effective pharmacophore will contain information about their interactions with the target. A pharmacophore is usually generated from multiple active compounds that are overlaid in their bioactive conformations in such a way that a maximum number of chemical features overlap geometrically.
These ligand-based methods are applied in virtual screening for novel compounds possessing the biological activity of interest, in hit-to-lead and lead-to-drug optimization, and also in the optimization of DMPK/ADMET properties [24-46].
2.2. Structure Based Drug Discovery
Structure-based computer-aided drug design approaches rely on the 3D structures of target molecules. This direct approach is based on the assumption that a molecule's ability to invoke a desired biological effect depends on its ability to interact with a specific target at a particular binding site; therefore, molecules that share these favorable interactions should have similar biological effects. To screen compound libraries in an effort to find novel binders, the structure-based approach employs a docking algorithm to rank large libraries of compounds. Docking-based virtual screening is an important aspect of structure-based approaches. For rapid identification of hits, the docking method employs various algorithms to first predict the protein-ligand complex structure and then assess the energetics of the predicted complexes, in order to discriminate potential binders from non-binders.
Structural information about the target is a prerequisite for any structure-based approach. Typically, experimentally determined target structures provide an ideal starting point for docking. In the absence of experimental structures, computational methods are used to predict the 3D structures of target proteins. Comparative modelling is used to generate a 3D structure of a target molecule, using as a template the known 3D structure of a protein that is similar in sequence to the target protein. Several successful virtual screening campaigns based on comparative models of target proteins have been reported [47-60]. HTS experiments assess the general ability of a ligand to bind, orthosterically or allosterically, and to either inhibit or alter a protein's function. In contrast, structure-based drug discovery approaches employ virtual screening methods in which molecules are screened for binding to a particular site in the target structure. Therefore, knowledge of the structures of binding sites and of protein-ligand interactions is a prerequisite for structure-based approaches. A 3D structure of a receptor-ligand complex can provide information on where the ligand binds to its macromolecular target and on the specific interactions that are important for binding. Often the small molecule binding sites are known from co-crystal structures of the target or of a homologous protein. In the absence of experimental structures, mutational studies can aid in identifying ligand binding sites. Alternatively, various in silico approaches [50, 51] can be used to identify putative binding sites. Once the binding site is identified, protein-ligand docking algorithms that simulate the binding of molecules to these sites are applied to screen for potential binders. The aim of a docking experiment is to find the best position and orientation of a molecule in the binding site of the target. Over the years several protein-ligand docking programs have been developed [52-55]. Depending on the degree of flexibility considered for the ligand and receptor molecules, docking methods can be classified as rigid-body docking or flexible docking. Although earlier docking methods treated both the ligand and receptor as rigid entities, with advances in algorithms and computational facilities most docking programs now treat ligand molecules as flexible, while the receptor is still treated with only partial flexibility. Therefore, in structure-based virtual screening, rigid docking is generally preferred during the initial screening of a large number of compounds, followed by refinement and optimization of the protein-ligand poses with flexible docking methods. A docking run may generate hundreds of thousands of protein-ligand complex conformations, so docking applications need to assess these complexes rapidly and accurately. Docking methods use physics-based scoring functions to rank and differentiate valid binding mode predictions from invalid ones [53, 55]. These scoring functions range from simple empirical schemes to extremely computer-intensive theoretical calculations. For efficient screening of large libraries of compounds, simple scoring functions are favored over computationally intensive methods, in order to obtain a qualitatively useful score in a reasonable amount of time. A common practice is to use very simple scoring functions at an early stage for rapid screening and computer-intensive sophisticated functions on a subset for accurate prediction.
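To make this workflow concrete, the following is a minimal sketch of a single docking run using the Python bindings of AutoDock Vina (an assumed choice of docking engine; the `vina` package, the PDBQT file names and the box parameters are placeholders standing in for a prepared target, ligand and binding site).

```python
# A minimal docking sketch assuming the AutoDock Vina Python bindings (the `vina`
# package); receptor.pdbqt / ligand.pdbqt and the box parameters are placeholders.
from vina import Vina

v = Vina(sf_name="vina")                      # empirical Vina scoring function
v.set_receptor("receptor.pdbqt")              # rigid receptor prepared beforehand
v.set_ligand_from_file("ligand.pdbqt")        # flexible ligand (rotatable bonds active)

# Affinity maps are precomputed for a box around the known/predicted binding site.
v.compute_vina_maps(center=[10.0, 12.5, -3.0], box_size=[20.0, 20.0, 20.0])

v.dock(exhaustiveness=8, n_poses=10)          # global search plus local optimization
print(v.energies(n_poses=5))                  # scores used to rank candidate poses
v.write_poses("ligand_docked.pdbqt", n_poses=5, overwrite=True)
```

In a virtual screening setting, a loop of this kind over a ligand library would be followed by re-scoring or re-docking of the top-ranked subset with a more expensive method, as described above.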
The success of virtual screening is dependent upon the amount and quality of structural information known about both the target and the small molecules being docked. Structure-based approaches have been used successfully in identifying novel and potent hits in several drug discovery campaigns [56-68].
3. TOPICS IN CADD
In the following section we discuss briefly a number of current topics in CADD.
3.1. Databases
Huge amounts of information on organic molecules, biological sequences and related data have accumulated in the scientific literature, including case reports. Several computational algorithms are actively being developed to organize and store this huge volume of information in the form of databases [69]. Access to such databases is critical to the success of drug discovery and development campaigns. Some of the important data sources are reviewed in this section.
3.1.1. Small Molecule Databases
The increasing availability of small molecule databases plays a major role in modern drug discovery. Several compilations of small molecules and their physicochemical properties are readily available [70-77]. Virtual screening, one of the important in silico methods, relies on a database or library of molecules. Virtual libraries can be assembled in a variety of sizes: general libraries can be screened against essentially any target, focused libraries are designed for a related family of targets, and targeted libraries are designed specifically for a particular target of interest. In general, virtual screening approaches focus on drug-like molecules that have already been synthesized or can easily be synthesized from available starting materials. Thus, small molecule databases [69-75] provide a variety of information, including known/available chemical compounds, drugs, carbohydrates, enzymes, reactants, and natural products. To some extent, the success achieved in discovering new ligands also rests on the quality of the database used for screening; indeed, careful database preparation can lead to better results in virtual screens [76].
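As an illustration of drawing on a public small-molecule database programmatically, the sketch below queries PubChem through the third-party `pubchempy` client (an assumption; the seed compound names are arbitrary examples) to assemble a small starting collection with identifiers, SMILES and molecular weights.

```python
# A small sketch of assembling a ligand set from a public small-molecule database,
# assuming the third-party pubchempy client for PubChem; the names are examples only.
import pubchempy as pcp

seeds = ["aspirin", "ibuprofen", "naproxen"]
library = []
for name in seeds:
    for compound in pcp.get_compounds(name, "name"):   # name-based PubChem lookup
        library.append({"cid": compound.cid,
                        "smiles": compound.canonical_smiles,
                        "mw": compound.molecular_weight})
print(library[:2])
```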
3.1.2. Preparation of Ligand Libraries
Ligands need to be represented as chemical data structures. Some ligands may require multiple structures depending on their chirality and/or tautomerization and/or protonation state(s) [77, 78]. For 3D virtual screening applications, preparation of a small molecule database involves conversion of a 2D molecular representation to a 3D structure file. Depending on the intended use of the database, each structure may further require enumeration of one or more possible 3D conformers. Database preparation invariably involves deleting from, and adding to, the database. Generally, libraries of molecules are generated with the application of computational and combinatorial tools. As comprehensive computational enumeration of all chemical space is and will remain infeasible, it is necessary to filter the compounds to obtain those with a high likelihood of bio-medical relevance. A wide range of filters may be applied to discard compounds with unfavorable pharmacodynamic or pharmacokinetic properties [79]. Typically, chemical functionalities that may cause unfavorable DMPK/ADMET properties and molecules containing reactive or otherwise generally undesirable functional groups are excluded [79]. Drug-likeness is commonly evaluated using Lipinski's rule of five [80], which states that a compound with more than 5 hydrogen bond donors (HBD), more than 10 hydrogen bond acceptors (HBA), MW > 500, or ClogP > 5 is more likely to exhibit poor absorption or permeation [80]. Compound collections as well as initial chemical leads can benefit from these rules. Lipinski et al. [80, 81] noted that finding good starting points for medicinal chemistry based drug discovery is key to the quality of the final optimized compounds and to overall project success. Lipinski's rules were later elaborated upon with the introduction of the number of rotatable bonds (NRB) and the polar surface area (PSA), which can be useful descriptors for oral bioavailability [82] and passive absorption [83, 84]. Compound libraries are often enriched for a particular target or family of targets. Physicochemical filters derived from observed ligand-target complexes are used to enrich a library with compounds that satisfy specific geometric or physicochemical constraints [84, 85]. Such libraries are prepared by searching for ligands that are similar to known active ligands. In addition, a small molecule library requires preparation steps such as conformational sampling and assignment of proper stereoisomeric and protonation states [86, 87]. Molecules are flexible in solvent environments, and hence representation of conformational flexibility is an important aspect of molecular recognition. Many screening tools have integral conformational search engines and thus require only one conformer as input. Other approaches (e.g., rigid docking) require multiple conformers of ligands that are pre-computed using simulations or knowledge-based methods. A virtual screening tool that generates conformers on the fly avoids the calculation and storage of a multi-conformer database, but requires additional computation time for each execution. Alternatively, generating conformers as a separate process may allow more control and fine-tuning of this important step [88-92].
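A minimal ligand-preparation sketch in Python using RDKit (an assumed toolkit, not named in the text) illustrates these steps: rule-of-five, rotatable-bond and polar-surface-area filtering, hydrogen addition, and multi-conformer 3D embedding. The thresholds and example SMILES are illustrative only.

```python
# A minimal ligand-preparation sketch using RDKit (an assumed toolkit);
# thresholds and input SMILES are illustrative, not prescriptive.
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors, Lipinski

def passes_filters(mol):
    """Rule-of-five plus rotatable-bond and polar-surface-area filters."""
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Lipinski.NumHDonors(mol) <= 5
            and Lipinski.NumHAcceptors(mol) <= 10
            and Descriptors.NumRotatableBonds(mol) <= 10   # NRB cut-off
            and Descriptors.TPSA(mol) <= 140)              # PSA cut-off

def prepare(smiles, n_confs=10):
    """Convert a 2D SMILES into a filtered, hydrogen-added 3D multi-conformer molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or not passes_filters(mol):
        return None                        # rejected by parsing or property filters
    mol = Chem.AddHs(mol)                  # protonation states handled only crudely here
    AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=42)
    for conf_id in range(mol.GetNumConformers()):
        AllChem.MMFFOptimizeMolecule(mol, confId=conf_id)  # quick force-field relaxation
    return mol

library = [m for m in (prepare(s) for s in ["CCO", "c1ccccc1C(=O)O"]) if m is not None]
```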
3.1.3. Virtual Combinatorial Libraries
Combinatorial chemistry is an important component of modern drug discovery; however, often far too many compounds are synthesized or screened, and these libraries may contain compounds with very similar physicochemical properties. An improved design of such libraries, optimizing the library's diversity or its similarity to a target, can maximize the number of true leads and reduce redundancy. The compounds in the libraries can be optimized for molecular diversity or similarity using descriptors such as chemical composition, topology, 3D structure and functionality [93]. Additionally, drug-likeness can be assessed using heuristic rules to detect ADME/Tox deficiencies [94].
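The sketch below illustrates one common way of optimizing a library for molecular diversity, using RDKit's MaxMin picker over 2D fingerprints (an assumed toolkit and an arbitrary example pool); each pick is the compound most dissimilar from those already selected.

```python
# A hedged sketch of diversity-based compound selection with RDKit's MaxMin picker;
# the SMILES pool is an arbitrary example.
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

pool = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "CCN", "CCCN", "c1ccncc1"]
mols = [Chem.MolFromSmiles(s) for s in pool]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]

# Pick a maximally diverse subset: each new pick is the compound most dissimilar
# (by Tanimoto distance on the fingerprints) from those already chosen.
picker = MaxMinPicker()
picked = picker.LazyBitVectorPick(fps, len(fps), 3)
print([pool[i] for i in picked])
```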
3.1.4. Representation of Small Molecules
Efficient use of ligand databases requires generalized methods for the virtual representation of small molecules. SMILES (Simplified Molecular Input Line Entry System) was introduced as a simplified format for representing small molecules in two dimensions [95-97]. Formal charges and bond types can all be described explicitly in the SMILES representation. Aromaticity need not be specified explicitly, as it is perceived using an extended version of Hückel's rule [98]. SMILES does not explicitly encode hydrogen atoms and conventionally assumes that hydrogens make up the remainder of an atom's lowest normal valence. Because molecular structures are represented as linear strings of symbols that can be efficiently read and stored by computer systems across multiple platforms, the method became widely preferred. In general, there are many different but equally valid SMILES descriptions for the same structure. SMARTS (SMILES ARbitrary Target Specification) is an extension of the SMILES representation that allows for variability within the represented molecular structures [99]. It adds substructure search functionality to SMILES, including logical operators such as AND (&), OR (,), and NOT (!), and special atomic and bond symbols that provide a level of flexibility to chemical names. InChI (International Chemical Identifier) is an open source structure representation algorithm intended to unify searches across multiple chemical databases using modern internet search engines [100]. The main purpose of InChI and its hashed version, the InChIKey, is to provide a nonproprietary machine-readable code, unique for each chemical structure, that can be indexed without any alteration by major search engines. InChI is made up of several layers, each representing a different class of structural information.
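A brief illustration of these representations with RDKit (an assumed toolkit; the structures and the carboxylic-acid SMARTS pattern are arbitrary examples): parsing a SMILES, writing the canonical SMILES and the InChIKey, and running a SMARTS substructure query.

```python
# A small illustration of SMILES/SMARTS/InChIKey handling with RDKit (assumed toolkit);
# the structures and the query pattern are arbitrary examples.
from rdkit import Chem

aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # parse a SMILES string
print(Chem.MolToSmiles(aspirin))        # canonical SMILES: one unique string per structure
print(Chem.MolToInchiKey(aspirin))      # hashed InChIKey, convenient for database lookups

# SMARTS adds substructure-query semantics on top of SMILES:
carboxylic_acid = Chem.MolFromSmarts("C(=O)[OX2H1]")    # pattern for -C(=O)OH
print(aspirin.HasSubstructMatch(carboxylic_acid))       # True
```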
3.1.5. Molecular Descriptors/Features
Molecular descriptors are numerical representations of chemical features or information encoded in the chemical structure of a molecule. Molecular descriptors can be electronic, structural, physicochemical, or topological, and can be described at multiple levels of increasing complexity, with both global and local features. The descriptors are generated using knowledge-based, graph-theoretical, molecular-mechanical or quantum-mechanical tools [101, 102]. Currently, there are over 3,700 types of descriptors, classified into three broad categories: 1-, 2- and 3-D descriptors encoding chemical composition, topology, and 3D shape and functionality, respectively [103]. Descriptors of the same dimensionality can show a range of complexity. For example, descriptors such as molecular weight and number of hydrogen bond donors are relatively simple and can be computed rapidly and accurately. More complex descriptors encoding multiple physicochemical and structural properties of a compound are considerably harder to compute; however, the higher the information content provided by a descriptor, the more useful it is for model development. The compromise in computing such descriptors is between the speed needed to encode thousands of molecules and sufficient accuracy. Various computer programs [104] have been developed to derive the molecular descriptors of a compound.
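As a small example of descriptor calculation, the sketch below evaluates a handful of simple 1D/2D descriptors with RDKit (an assumed toolkit); the molecule and the descriptor selection are arbitrary.

```python
# A small illustration of descriptor calculation with RDKit (assumed toolkit);
# the molecule is an arbitrary example and the descriptor selection is minimal.
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

descriptor_vector = {
    "MolWt": Descriptors.MolWt(mol),              # simple, fast 1D descriptor
    "LogP":  Descriptors.MolLogP(mol),            # calculated octanol/water partition
    "HBD":   Descriptors.NumHDonors(mol),
    "HBA":   Descriptors.NumHAcceptors(mol),
    "TPSA":  Descriptors.TPSA(mol),               # topological polar surface area
    "Rings": rdMolDescriptors.CalcNumRings(mol),  # simple topological count
}
print(descriptor_vector)
```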
3.2. Target Databases for Computer-Aided Drug Design
For structure-based computer-aided drug discovery, knowledge of the 3D structure of a target protein is required. In 1971, the Brookhaven National Laboratory established the Protein Data Bank (PDB) [105] as a single worldwide archive of structural data of biological macromolecules. The PDB currently houses more than 100,000 experimentally determined protein structures, mostly solved by X-ray crystallography and NMR spectroscopy. When an experimental structure is not available for a protein, computational modelling of its structure is possible on the basis of experimentally determined structures of homologous proteins; this process is referred to as homology modelling. The Swiss-Model server [106] and Modeller [107] are the most widely used tools for homology modelling.
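A hedged sketch of a comparative-modelling run with Modeller's Python interface is shown below; the alignment file, template code and model count are placeholders, and the script assumes a local Modeller installation and a pre-built target-template alignment.

```python
# A minimal comparative-modelling sketch assuming the MODELLER Python API;
# 'target.ali', the template code '1abcA' and the sequence name are placeholders.
from modeller import environ
from modeller.automodel import automodel

env = environ()

a = automodel(env,
              alnfile='target.ali',        # target-template alignment in PIR format
              knowns='1abcA',              # template structure named in the alignment
              sequence='target')           # target sequence named in the alignment
a.starting_model = 1
a.ending_model = 5                         # build five candidate models
a.make()                                   # writes model PDB files with their scores
```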
The genome sequencing of humans and other model organisms produces increasingly large amounts of data relevant to the study of human disease. This provides an opportunity to identify many previously unknown proteins that could serve as new drug targets. However, in the absence of a well-established experimental setup and detailed 3D structures, validation of these proteins as potential drug targets is a challenging task. Thus, there is a need for rapid and accurate functional assignment of novel proteins. Indeed, identification and validation of possible targets is the first step in the drug discovery process. Many new methods and integrated approaches are continuously being explored in order to improve the discovery rate and the exploration of novel therapeutic targets. Putative protein functions have primarily been assigned using global sequence and structure comparisons, and sequence homology is routinely used for the rapid assignment of biological function to hypothetical proteins or proteins of unknown function [108]. In addition to global sequence similarity, methods that compare ligand binding sites to infer biological function are used to aid drug discovery. Recently [109] there has been substantial progress in exploring the usefulness of in silico machine learning methods, such as support vector machines (SVM), for predicting druggable proteins. The SVM approach attempts to predict target proteins independently of amino acid sequence similarity, which facilitates the prediction of druggable proteins that exhibit no or low homology to known targets. Determining the potential of a protein as a therapeutic target, and its structural details, is essential for the structure-based drug design approach.
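The following toy example illustrates the flavor of such an SVM-based druggability predictor using scikit-learn (an assumed library); the sequence-derived feature vectors and labels are entirely made up, standing in for descriptors such as amino-acid composition or predicted structural properties.

```python
# A hedged illustration of an SVM classifier for target druggability with scikit-learn;
# feature vectors and labels are fabricated placeholders, not real data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: hypothetical sequence-derived features for one protein.
X = np.array([[0.12, 0.08, 0.30, 410.0],
              [0.05, 0.15, 0.22, 290.0],
              [0.09, 0.11, 0.27, 380.0],
              [0.03, 0.19, 0.18, 250.0],
              [0.11, 0.10, 0.31, 420.0],
              [0.04, 0.17, 0.20, 270.0]])
y = np.array([1, 0, 1, 0, 1, 0])            # 1 = known drug target, 0 = non-target

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
new_protein = [[0.10, 0.09, 0.29, 400.0]]
print(clf.predict(new_protein))             # predicted class for a new protein
print(clf.decision_function(new_protein))   # signed distance from the decision boundary
```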
3.3. Similarity Searches
The basic concept underlying ligand-based screening methodologies is the similarity property principle [11], which asserts that molecules with similar structures share similar properties. Ligand-based similarity methods largely depend on this principle: structural likeness enhances the chance of sharing a common bioactivity profile. Thus, selecting compounds similar to available drugs increases the possibility of identifying an alternative compound that could become another potential lead. In general, similarity searches are applied to identify compounds based on their similarity to known actives. In ligand-based virtual screening efforts, the molecular structure and property descriptors of interacting molecules are therefore extrapolated to search for other molecules with similar characteristics. Several methods have been proposed and used for this purpose [20, 21, 103, 110-113]. Molecular fingerprints are the most widely used method for similarity searching in ligand-based virtual screening approaches. Molecular fingerprints are string representations of chemical structures and properties [110]. Because of this simple representation, fingerprint-based techniques allow rapid structural comparison in an effort to identify structurally similar molecules or to cluster collections based on structural similarity. Molecular fingerprints encode 2D and/or 3D features of the molecular structure in a series of binary bits that represent the presence or absence of particular substructures in the molecule [110]. Although a fingerprint splits the entire molecule into a large number of fragments, it has the potential to retain the overall complexity of the drug molecule. The main strength of this approach is the ability to compare multiple fingerprints and compute their similarity using, for example, the Tanimoto coefficient [113], which greatly facilitates similarity-based searches. In addition, fingerprints are also used to increase the molecular diversity of test compounds.
Fingerprints may be classified according to their dimensionality, ranging from one-dimensional (1D) to three-dimensional (3D) [110]. Among the commonly used ones, the most popular and efficient are 2D fingerprints. The major drawback of fingerprint-based methods is that the identified features of a query molecule are considered equally important for ranking candidate molecules, regardless of how much these features actually contribute to the biological activity against a given target. Despite this drawback, 2D fingerprints continue to be the method of choice for similarity-based virtual screening [113]. These similarity-based search methods are less hypothesis-driven and less computationally expensive than pharmacophore or QSAR models. They depend on the chemical structures of compounds and do not rely on biological activity, making the approach more qualitative in nature than other ligand-based approaches. From a structural similarity search within a dataset of small molecules, it is possible to retrieve compounds containing identical substructures that share affinity for the same receptor.
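A minimal fingerprint similarity search with RDKit (an assumed toolkit; the query and library SMILES are arbitrary examples) looks like the sketch below: 2D circular fingerprints are compared to a query with the Tanimoto coefficient and the library is ranked by similarity.

```python
# A hedged sketch of fingerprint-based similarity searching with RDKit;
# the query and library SMILES are arbitrary examples.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles, radius=2, n_bits=2048):
    """2D circular (Morgan/ECFP-like) fingerprint as a fixed-length bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)

query = morgan_fp("CC(=O)Oc1ccccc1C(=O)O")          # known active used as the query
library = {"salicylic acid": "OC(=O)c1ccccc1O",
           "benzene": "c1ccccc1"}

# Rank library members by Tanimoto similarity to the query fingerprint.
ranked = sorted(((name, DataStructs.TanimotoSimilarity(query, morgan_fp(smi)))
                 for name, smi in library.items()),
                key=lambda pair: pair[1], reverse=True)
for name, sim in ranked:
    print(f"{name}: Tanimoto = {sim:.2f}")
```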
3.4. Quantitative Structure-Activity Relationship (QSAR)
QSAR is derived from the quantitative relationship between a chemical structure and its associated biological activity [114-118]. Computational techniques based on structure-activity relationships have accelerated the drug design process [22]. By applying statistical methods to a set of chemically related compounds, QSAR attempts to correlate structural/molecular properties (descriptors) with biological activities. These descriptors of chemical structures are characterized by physicochemical, structural and topological properties, and can be obtained either from experimentally measured quantities or by calculation with molecular modeling software. Biological activity is usually the concentration at which an effector exerts a certain pharmacological or biological effect. The objective of structure-activity modelling is to analyze and identify the factors determining the measured activity for a particular system, in order to obtain insight into the mechanism and behavior of that system. To this end, the employed strategy is to generate a mathematical model connecting experimental measurements with a set of chemical descriptors determined from the molecular structures of a set of compounds. Model building is an iterative process of finding the right combination of descriptors that relate to the property and have predictive potential. Depending on the descriptors/properties calculated for a ligand, the QSAR approach is classified into different types, for example 1D, 2D, 3D and 4D QSAR [119]. 1D/2D QSAR is called classical QSAR, in which molecular properties such as logP, molar refractivity, molecular weight and connectivity indices are correlated with activity. In the 3D QSAR approach, the three-dimensional structures of the ligands are used to calculate surrounding molecular interaction field (MIF) effects, such as steric, electrostatic and hydrophobic effects, using force field parameters [120].
3.4.1. Classical QSAR (1D/2D)
In the 1960s, Hansch and Fujita [121, 122] proposed expressing biological activity as a function of molecular or fragmental descriptors: biological activity = f(molecular or fragmental descriptors). The Hansch-Fujita approach involves the correlation of various electronic, hydrophobic, and steric features with biological activity through linear or non-linear regression. In 1964, Free and Wilson [123] developed a mathematical model relating various chemical substituents to biological activity, in which each type of chemical group is assigned an activity contribution. This Free-Wilson approach is also called the true structure-activity relationship model. These two methods were later combined into the Hansch/Free-Wilson method [124], which retains the advantages of both the Hansch and Free-Wilson analyses and widens the applicability of both. The data used to establish a QSAR equation are assembled into a matrix of numbers, with compounds as rows and physicochemical property descriptors as columns. In 2D QSAR, descriptors are substituent constants that are assumed to be transferable from one series to another. A large number of substituent constants have been assembled and used to find a quantitative relationship between the chemical space (i.e., descriptors) and the biological data points through a statistical method such as multiple linear regression (MLR) [125]. The general purpose of such statistical methods is to relate several independent variables (i.e., descriptors) to a dependent or criterion variable. MLR is the most extensively used mathematical method in classical QSAR because of its ease of interpretation, although a number of pitfalls exist. Keeping an optimum ratio of compounds to descriptors and limiting inter-correlation between descriptors (0.3) within the model helps rule out non-significant relationships. Partial least squares (PLS) is another statistical method that can handle strongly correlated, noisy or numerous variables [126, 127] and gives a reduced solution that is statistically more robust than MLR for larger sets. The linear PLS model finds new variables as linear combinations of the original variables. To obtain the optimum number of components, PLS is normally used in combination with cross-validation; this ensures that the QSAR equations are selected on the basis of their ability to predict the data rather than merely to fit them, although PLS models may not be easily interpretable. Other statistical learning methods, such as neural networks and SVM, have also been explored for predicting compounds of higher structural diversity [128-130].
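A classical-QSAR workflow of this kind can be sketched with RDKit descriptors and scikit-learn regressors (both assumed toolkits; the SMILES strings and pIC50 values below are fabricated for illustration): an MLR fit alongside a PLS fit, with cross-validation used to check predictive rather than merely fitted performance.

```python
# A minimal classical-QSAR sketch: RDKit descriptors plus MLR and PLS from
# scikit-learn (both assumed toolkits); SMILES and activities are made-up data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def descriptor_vector(smiles):
    """A tiny 1D/2D descriptor set: logP, molecular weight, TPSA, rotatable bonds."""
    mol = Chem.MolFromSmiles(smiles)
    return [Descriptors.MolLogP(mol), Descriptors.MolWt(mol),
            Descriptors.TPSA(mol), Descriptors.NumRotatableBonds(mol)]

# Hypothetical training set: SMILES strings with measured pIC50 values.
smiles = ["CCO", "CCCCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC", "CCCCCCO"]
pic50 = np.array([4.1, 4.6, 5.2, 5.8, 4.3, 4.9])
X = np.array([descriptor_vector(s) for s in smiles])

mlr = LinearRegression().fit(X, pic50)             # classical Hansch-style MLR model
print(mlr.coef_)                                   # per-descriptor regression weights

# PLS copes better with correlated descriptors; cross-validation checks that the
# model predicts rather than merely fits the training data.
print(cross_val_score(PLSRegression(n_components=2), X, pic50, cv=3))
```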
3.4.2. 3D-QSAR
3D QSAR models are quantitative models developed by relating the biological activity of small molecules to their properties calculated in 3D space. In 1988, an approach was introduced to describe molecular properties as fields (usually steric, electronic, hydrogen-bonding, and hydrophobic fields) calculated on a regular grid [131]. This method, called comparative molecular field analysis (CoMFA), is one of the most widely and commonly used 3D QSAR methods. In CoMFA, the small molecules are aligned and features are extracted from this alignment to relate compound properties to biological activity. The method largely focuses on the alignment of molecular interaction fields rather than on the features of individual atoms. Over the years, CoMFA has become established as a standard technique for constructing 3D models in the absence of structural data for a target [132]. However, the most difficult aspect of a 3D QSAR analysis is selecting the appropriate alignment rules for the training set, i.e., the bioactive conformation. For certain datasets this can present difficulties; for compounds with a large number of rotatable bonds, finding a proper alignment can be difficult or even impossible. These problems limit the applicability of CoMFA. To overcome this, new approaches that do not depend on a common alignment of the molecules have been developed [133-138]. Comparative molecular similarity indices analysis (CoMSIA) [133] is an important extension of CoMFA that was developed to address the limitations of the steric and electrostatic fields in CoMFA. In CoMSIA, the molecular fields include hydrophobic and hydrogen-bonding terms in addition to the steric and Coulombic contributions. Similarity indices, rather than interaction energies, are calculated by comparing each ligand with a common probe, and Gaussian-type functions are used to avoid extreme values. However, one important limitation of these methods is that their applicability is restricted to static structures or low energy conformations of similar scaffolds, hence neglecting the dynamical nature of the ligands.
3.4.3. Multidimensional QSAR
Multidimensional QSAR (mQSAR) was developed in the quest to quantify all energy contributions of ligand binding, including removal of solvent molecules, loss of conformational entropy, and binding pocket adaptation. 4D-QSAR [139] is an extension of 3D-QSAR that considers each molecule as an ensemble of different conformations, tautomers, stereoisomers, and protonation states; this ensemble sampling of the spatial features of each molecule is referred to as the fourth dimension. In 5D-QSAR [140], local changes in the binding site additionally contribute to an induced-fit model of ligand binding.
Similar to pharmacophore and similarity search methods, QSAR has been used to screen virtual libraries of compounds for novel therapeutics [31]. In addition to predicting the function of novel compounds within a virtual library, QSAR has been used to enhance compound libraries used in traditional HTS [39]. It can direct combinatorial library synthesis, constructing libraries to be screened against targets of a particular class or classes [35]. This allows the researcher to cover a wide range of chemical space enriched with compounds more likely to be hits for the target of interest. When structural information about the target is unknown, QSAR has also been applied in de novo drug design techniques [104, 141]: descriptor and model generation is performed and used to score the molecules generated by a de novo technique, in place of structure-based scoring techniques such as docking [104].
The success of QSAR depends not only on the quality of the initial set of active/inactive compounds but also on the selection of descriptors and the ability to generate an appropriate mathematical model. One of the most important considerations is that every model generated will be dependent on the chemical space of the initial set of compounds with known activity. In other words, divergent scaffolds or functional groups that are not represented within this training set of compounds will not be represented in the final model. This means that any potential hits within the library to be screened that contain these groups will likely be missed. It is therefore favorable to include a wide chemical space within the training set.
3.5. Pharmacophores
A pharmacophore is defined as the spatial arrangement of functional groups (steric and electronic features) that a chemical entity must contain to interact optimally with a specific biological target structure and evoke a desired biological response [142]. An effective pharmacophore will contain information about the functional groups and about the type of interactions they make with the target (Fig. 2). The geometric and topological constraints can be derived either in a structure-based approach, by mapping the sites of contact between a ligand and its binding site, or in a ligand-based approach, by comparing the structures of several known active compounds. With an established pharmacophore model, a 3D search against large databases can