Raghava13.4 Open Dources of Bioinformati Cs
Current Topics in Medicinal Chemistry, 2013, Vol. 13, No. 10, 1172-1191
Open Source Software and Web Services for Designing Therapeutic Molecules
Deepak Singla1,2, Sandeep Kumar Dhanda1, Jagat Singh Chauhan1, Anshu Bhardwaj3,
Samir K. Brahmachari3,4, Open Source Drug Discovery Consortium3 and Gajendra P.S. Raghava1,*
1Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India; 2Centre for Microbial Biotechnology, Panjab University, Chandigarh, India; 3CSIR-Open Source Drug Discovery Unit, New Delhi, India; 4CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
Abstract: Despite tremendous progress in the field of drug design, discovering a new drug molecule is still a challenging task. Drug discovery and development is a costly, time-consuming and complex process that requires millions of dollars and 10-15 years to bring a new drug molecule to the market. This huge investment and long timeline are attributed to the high failure rate, the complexity of the problem and strict regulatory rules, in addition to other factors. Given the availability of ‘big’ data and ever-improving computing power, it is now possible to model systems, which is expected to make the drug discovery process more time- and cost-effective. Computer Aided Drug Designing (CADD) has emerged as a fast alternative method to bring down the cost involved in discovering a new drug. In the past, numerous computer programs have been developed across the globe to assist researchers working in the field of drug discovery. Broadly, these programs can be classified into three categories: freeware, shareware and commercial software. In this review, we describe freeware and open-source software commonly used for designing therapeutic molecules. The major emphasis is on software and web services in the field of chemo- or pharmaco-informatics, including in silico tools used for computing molecular descriptors, designing inhibitors against drug targets, building QSAR models, and predicting ADMET properties.
Keywords: Open source drug discovery, QSAR models, software, machine learning techniques, chemoinformatics, pharmacoinformatics.
Fig. (1). Flowchart of drug discovery – The amount of funding required depends on the success rate at the clinical trial stage.
ics, major emphasis is on designing of drug molecules or ligands and their interaction with drug targets. In contrast to bioinformatics, cheminformatics is dominated by proprietary or commercial software/webservers, where software is costly and comes with stringent license conditions. Due to the heavy cost of chemoinformatics resources, computer-aided drug discovery is still a costly affair.

Despite the existence of a large number of computational methods to evaluate and prioritize inhibitors, very few are ready for public use. Some of those that are published are not easy to use due to the lack of a webserver or standalone package. These issues can only be resolved efficiently by enhancing collaborations in an open and free knowledge sharing environment. Recent research also suggests that open collaborative drug discovery will be the future paradigm of biomedical research [18-20]. In order to overcome the limitations of existing approaches, open source/freely available software has been developed by different organizations like OpenTox [21], OSDD (Open Source Drug Discovery), CDD [22], Blue Obelisk [23] etc. In the past, a number of reviews have been published in this area. In this review, our major focus is on software that is freely accessible and can be used at different stages of the drug discovery process (Fig. 2). Our focus will be on chemoinformatics or pharmacoinformatics related software used for designing drug molecules/ligands/inhibitors. In principle, designing a drug with high selectivity and specificity for a given functional target is the key challenge in obtaining a potential drug with the fewest possible side effects. Also, the present tools have limited predictive capacity for estimating the pharmacokinetic parameters that an ideal drug must satisfy. We have covered the following topics in this review.

• Source of Molecules: Resources in the field of drug design like databases on chemicals assays/properties, drug molecules. These resources are used for developing various models for predicting inhibitors.
• Molecular Editors: This topic will cover software and web services for drawing new molecules and for editing of existing molecules. This topic will also include tools used for visualization of molecules.
• Analog Generators: In this section, we will describe software used to generate analogs of molecules. It will also include software used to generate the virtual library of chemicals.
• Structure Optimization: Software used for generating 2D/3D structure, and for optimization of energy/geometry of molecules will be covered under this topic.
• Molecular Descriptors: Calculation of molecular descriptors is a fundamental requirement for developing QSAR models. We will describe the software available for the same.
• Similarity Search: This topic will describe software or web services which are frequently used to perform chemical similarity search.
• QSAR/QSPR Models: Software used for developing models like QSAR, QSPR, QSTR will be described.
• Chemical Clustering: Classification and clustering of small molecules is important to understand the properties of a scaffold; major free software will be described in this section.
• Molecular Docking: This topic will describe commonly used software packages for docking small molecules in macromolecules.
• Pharmacophore Tools: In this section, we will cover resources important for pharmacophore search.
• ADMET Techniques: Software/webservers used for computing druggability of molecules, and ADMET properties will be described.
• Drug Target Prediction: This section describes the software and web services important in prediction of drug targets.
• Designing of Inhibitors: This topic describes different tools that allow users to predict inhibitors against a target. These tools generally use diverse techniques (like QSAR models, docking, screening) for designing inhibitors.
• Other resources: Miscellaneous major resources over the internet that are serving the community will be covered under this topic.
• Major Initiatives: Numerous organizations and groups working towards affordable drugs will be covered in this section.
• Future Prospects: This section will describe forthcoming prospects of open source in drug discovery including limitations of existing resources.

SOURCE OF MOLECULES

In ancient times, natural products obtained from various sources like plants were tested for biological activity. These natural products were major sources for discovering therapeutics. Subsequently, synthetic chemists synthesised a large number of chemical compounds and generated libraries of synthetic molecules in the last century. Presently, there are a number of databases and repositories that manage comprehensive information about millions of chemicals. This section describes major sources of molecules that are freely available for public use. Chemical databases/resources are the backbone of computer-aided drug discovery, whether it is chemoinformatics, pharmacoinformatics or bioinformatics. These databases provide information that is used to build knowledge-based models for discovering and designing drug molecules. Here, we have covered major databases that are freely available for public use; Table 1 provides a brief description of each database. A number of commercial/in-house databases such as WOMBAT [24], World Drug Index (http://www.daylight.com/products/wdi.html) etc. have been in existence for a long time. However, more recently the availability of molecule databases such as PubChem [25, 26], Zinc [27], ChEMBL [28] has dramatically changed the landscape of publicly available cheminformatics resources. Some of the databases, like PubChem BioAssay and ChEMBL, also include information about the target protein, the organism on which the chemical is effective and, sometimes, the corresponding activity score. Below, we briefly describe the major databases; these databases are the backbone for developing models for drug discovery (Table 1).

The PubChem project is an open public repository that maintains three types of information, namely, substance, compound and BioAssays [25, 26]. PubChem Substance contains original chemical structures submitted by different vendors, publishers or other government agencies. PubChem Compound maintains the index of unique chemical structures present in PubChem Substance. PubChem BioAssay currently contains information about 500,000 assays, covering
5000 protein targets, 30,000 gene targets and providing over 130 million bioactivity outcomes [25]. ChEMBL is a manually curated database that provides comprehensive information about 1 million bioactive compounds (small drug-like molecules) with 8200 drug targets [28]. This database also contains different datasets for neglected diseases like malaria from both commercial and academic sources. The ZINC database maintains information about commercially available compounds. This database contains 21 million compounds available for virtual screening. The most important feature of this database is that users can filter data using various features like molecular weight, logP etc., which leads to a smaller dataset with the most relevant properties.

ChemDB is a database of commercially available small molecules. It contains around five million chemicals [29]. This database provides different types of information about chemicals, including predicted or experimentally determined physicochemical properties, such as 3D structure, melting temperature and solubility. ChemSpider contains more than 28 million unique chemical entities aggregated from more than 400 diverse data sources [30]. Each structure entry in ChemSpider is associated with a list of predicted molecular properties as well as any available experimental data, spectra etc. It has also been integrated with the SureChem (http://surechem.com/) patent database collection of structures to facilitate structure-based linking to patents between the two data collections. The NCI database has more than 275,000 small molecule structures and is a very useful resource for researchers working in the field of cancer/AIDS [31]. In addition to big databases, there are some databases that maintain specialized information. These databases maintain information about the role of chemical compounds in biological systems; for example, KEGG contains associations of chemicals with pathways and diseases [32]. Similarly, a number of databases maintain target-ligand interactions, which are essential for target based drug discovery [33, 34].

Table 1. Databases and Resources Managing and Hosting a Database of Chemical Compounds

Database      Brief Description with URL
PubChem       A comprehensive database of bioassays, compounds and substances (http://pubchem.ncbi.nlm.nih.gov/)
ChEMBL        Database of drug like molecules (https://www.ebi.ac.uk/chembldb)
Zinc          Maintain commercially-available compounds for virtual screening (http://zinc.docking.org/)
ChemDB        Collection of small-molecules (http://cdb.ics.uci.edu/)
ChemSpider    A chemical database (http://www.chemspider.com/)
MMsINC        Commercial compounds (http://mms.dsfarm.unipd.it/MMsINC/)
KEGG          Maintain comprehensive information (http://www.genome.jp/kegg/)
SMPDB         Small molecule Pathway database (http://www.smpdb.ca)
HMDB          Human Metabolites (http://www.hmdb.ca/)
PDBeChem      Dictionary of chemical components referred to in PDB entries (http://www.ebi.ac.uk/pdbe-srv/pdbechem/)
PDB-Bind      Binding affinity information for PDB Ligands (http://sw16.im.med.umich.edu/databases/pdbbind/index.jsp)
BindingDB     Binding affinity of PDB Ligands (http://www.bindingdb.org/)
NCI           Small molecules related to cancer (http://cactus.nci.nih.gov/ncidb2.1/)
CDD           Collaborative drug discovery (https://www.collaborativedrug.com/)
DrugBank      All kind of drugs (http://www.drugbank.ca/)
HMRbase       Hormones and their Receptors (http://crdd.osdd.net/raghava/hmrbase/)
BIAdb         Benzylisoquinoline alkaloids (http://crdd.osdd.net/raghava/biadb/)
NPACT         Plant derived natural compounds (http://crdd.osdd.net/raghava/npact/)
SuperNatural  A searchable database of available natural compounds
HIT           Herb ingredients targets (http://lifecenter.sgst.cn/hit/)
Drugs@FDA     FDA approved drug products (http://www.fda.gov/Drugs/)

MOLECULAR EDITORS

Molecular editors are commonly used tools in the field of cheminformatics to draw and manipulate chemical structures. These tools provide a number of facilities like geometry optimization, structure visualization and energy minimization. There are several software packages available which allow users to sketch a molecular diagram on a computer. A comprehensive list of molecular editors is given in Table 2. BKchem (http://bkchem.zirael.org/) is free software written in Python. It works on major operating systems like Linux, WinXP and MacOS X. It allows users to draw, edit and visualize molecules. ChemSketch (http://www.acdlabs.com/resources/freeware/chemsketch/) is a free comprehensive chemical drawing package that allows users to draw chemical structures including organics, organometallics, polymers, and Markush structures. The free version of ChemSketch has limited facilities, and it is only available for the Windows platform. JChemPaint (http://jchempaint.github.com/) is a Java-based open-source software developed for drawing, editing and viewing 2D chemical structures. This software was developed using the Chemistry Development Kit (CDK). XDrawChem (http://xdrawchem.sourceforge.net/) is an open-source two-dimensional molecule drawing program. This software can read molecules in various formats and can create images in popular formats like PNG and EPS. JME Molecular Editor (http://www.molinspiration.com/jme/) is software developed to draw, edit, and view molecules and reactions. It is a Java-based software, free for non-commercial users, available in a stand-alone mode or as an applet for integration into a web page.
Table 2. List of Major Molecular Editors, Frequently Used for Drawing and Editing Molecules

Editors                 Brief Description
BKchem                  Python based free 2D molecule editor (http://bkchem.zirael.org/)
ChemSketch              ACD/ChemSketch Freeware is a free software for drawing chemicals (http://www.acdlabs.com/resources/freeware/chemsketch/)
JChemPaint              Editor for 2D chemical structures (http://jchempaint.github.com/)
Accelrys Draw           Draw and edit complex molecules, no fee for academic community (http://accelrys.com/products/informatics/cheminformatics/draw/index.html)
XDrawChem               Molecule drawing program (http://xdrawchem.sourceforge.net/)
MedChem Designer        Drawing molecules and integration with ADMET property (http://simplus-downloads.com/)
JME                     JME Molecular Editor (http://www.molinspiration.com/jme/)
PubChem Sketcher [117]  A web-based tool for sketching, integrated in PubChem (http://pubchem.ncbi.nlm.nih.gov/edit2/index.html)

rial enumeration of virtual chemical compound libraries. Some of these packages are available as open source while others are commercially available software packages. All these software packages are based on a similar methodology to generate virtual chemical libraries. Basically, all these packages require three basic substructures: core scaffolds, linkers and building blocks (R-groups).

The most commonly used combinatorial chemistry and analog design tool is SmiLib. It is a freely available Linux based chemoinformatics tool which can be used through the command line or a graphical user interface to generate a combinatorial library. It requires three substructures: scaffolds, building blocks and linkers. GLARE (Global Library Assessment of REagents) is an open-source package to generate combinatorial libraries. CLEVER (chemical library editing, visualizing and enumerating resource) is a free chemoinformatics tool that enumerates chemical libraries using customized fragments; it also computes the physicochemical properties of the generated compounds. Another tool is Library synthesizer, an open-source Java-based tool for chemical library enumeration and profiling. NEWLEAD is also used for the automatic generation of combinatorial libraries from bioactive conformations of reference molecules. The input for this software is a set of fragments in the 3D orientation corresponding to a given pharmacophore model. ORganic VIrtual Library (ORVIL) is a Perl program that generates a combinatorial library of organic substituents without using scaffold hopping. It is designed to explore the organic chemical space around the given query structure without affecting the backbone of the molecule, keeping molecular complexity to a minimum.
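The scaffold/linker/building-block scheme described above can be illustrated with a minimal sketch. It is not the input format or algorithm of SmiLib or any other tool named here: the `{R}` attachment marker, the function name `enumerate_library` and the example fragments are all assumptions made for illustration, and real enumerators handle multiple attachment points and validate the resulting structures.

```python
from itertools import product


def enumerate_library(scaffolds, linkers, building_blocks):
    """Return one SMILES string per scaffold/linker/building-block combination.

    "{R}" is a hypothetical attachment-point marker used only in this sketch;
    it is replaced by the linker followed by the building block.
    """
    library = []
    for scaffold, linker, block in product(scaffolds, linkers, building_blocks):
        library.append(scaffold.replace("{R}", linker + block))
    return library


# Illustrative fragments (assumed, not taken from the review).
scaffolds = ["c1ccccc1{R}", "c1ccncc1{R}"]   # benzene and pyridine cores
linkers = ["C", "CC"]                         # one- and two-carbon linkers
building_blocks = ["O", "N", "Cl"]            # hydroxyl, amine, chloride

library = enumerate_library(scaffolds, linkers, building_blocks)
print(len(library))   # 2 scaffolds x 2 linkers x 3 building blocks = 12
print(library[0])     # c1ccccc1CO
```

The library size is the product of the three fragment counts, which is why enumerated virtual libraries grow so quickly as fragments are added.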
dinates then it optimizes and refines the final coordinates by mengine program. Cyndi is a fast and powerful conformation generation package based on a multi-objective evolution algorithm. It is capable of generating geometrically diverse conformers on a large scale. It has an option to remove redundant conformers with an RMSD filter and finally optimize the remaining conformers by energy minimization.

Table 4. List of Software and Web Servers Used for Structure Optimization of Molecules

Software      Brief Description
Balloon       Conformer Ensembles (http://web.abo.fi/~mivainio/balloon/index.php)
Cyndi [123]   Generate geometrically extended or compact conformations (http://www.biomedcentral.com/1471-2105/10/101/additional/)
TINKER [124]  Software Tools for Molecular Design (http://dasher.wustl.edu/tinker/)

cal information is represented in the form of some values or numbers for the property under consideration [43]. A key step in classical quantitative structure-activity/property relationship (QSAR/QSPR) modeling is the encoding of a chemical compound into a vector of numerical descriptors. These molecular descriptors may be the result of an experiment, for example logP, and are highly correlated with that property of the chemicals. Based on these descriptors, QSAR/QSPR models are developed, which are helpful in designing new chemical entities (NCE) with properties similar to those of the dataset used [44]. Today, a huge number of software packages are available in the public domain to calculate molecular descriptors, some of which are listed in Table 5.

Table 5. Important Software and Webserver for Computing Molecular Descriptors

MOLD2 [45]     Calculating descriptors from a two-dimensional chemical structure (http://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/default.htm)
PowerMV [125]  Window based calculation of descriptors (http://nisla05.niss.org/PowerMV/index.html)
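The idea of encoding a compound as a vector of numerical descriptors can be shown with a deliberately small sketch. It computes only constitutional descriptors (atom counts and molecular weight) from a molecular formula; the function name `formula_descriptors`, the four-element mass table and the choice of descriptors are assumptions for illustration and do not reproduce the descriptor sets of Mold2, PowerMV or any other package listed above, which work from the full chemical structure.

```python
import re

# Average atomic masses for the few elements this sketch supports (assumption:
# a real descriptor package would cover the whole periodic table).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_descriptors(formula):
    """Encode a molecular formula as [nC, nH, nN, nO, molecular weight]."""
    counts = {element: 0 for element in ATOMIC_MASS}
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in counts:
            raise ValueError(f"unsupported element: {element}")
        counts[element] += int(number) if number else 1
    weight = sum(ATOMIC_MASS[e] * n for e, n in counts.items())
    return [counts["C"], counts["H"], counts["N"], counts["O"], round(weight, 3)]


aspirin = formula_descriptors("C9H8O4")
print(aspirin)
```

A table of such vectors, one row per compound, is the kind of input that QSAR/QSPR modeling software expects; structure-derived descriptors such as topological indices simply add more columns.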
volume, molecular linear free energy relation descriptors etc. were calculated as identified by Laggner. PowerMV is a Windows-based tool for descriptor generation and similarity search. This software is capable of calculating binary descriptors as well as the descriptors used to derive drug-likeness based upon Lipinski’s rule of five (Ro5). JOELib/JOELib2 is another Java-based cheminformatics library, which is widely used for descriptor calculation, SMARTS substructure search and conversion of file formats.

Mold2 calculates 779 1D and 2D molecular descriptors from diverse information like physicochemical properties, topology, atom counts and eigenvalues. It has been shown that Mold2 descriptors perform better than those of a number of commercial software packages [45]. AFGen is a program that can calculate graph-based properties of chemicals. These graph properties include paths (PF), acyclic subgraphs (AF), and arbitrary topology subgraphs (GF). The ISIDA Fragmentor2011 calculates substructural molecular fragments and ISIDA property-labeled fragments from a Structure-Data File (SDF). ODDescriptors/BlueDesc is a freely available, user-friendly Java-based tool that calculates cheminformatics descriptors for developing QSAR/QSPR models. This software is based upon CDK and JOELib2 for descriptor calculations and can generate libsvm or arff files as output. CDK is a Java-based descriptor calculation tool developed in 2003. The tool is capable of calculating topological, geometrical, charge-based and constitutional descriptors. A number of software packages/libraries have been developed for computing molecular descriptors using CDK. Our group made a web interface for CDK; see http://crdd.osdd.net:8081/webcdk/. MODEL (Molecular Descriptor Lab) computes a comprehensive set of 3,778 molecular descriptors from the following six categories: constitutional descriptors, electronic descriptors, physical chemistry properties, topological indices, geometrical molecular descriptors, and quantum chemistry.

CHEMICAL SIMILARITY SEARCH

Searching for similar molecules is an important tool in chemoinformatics for classifying chemicals, searching databases and studying the relationship between molecules and their activity. In the past, a number of tools have been developed to calculate the similarity matrix (Table 6). These algorithms range from simple molecular-property based methods to graph based, shape based and volume based methods. Each has its own pros and cons for database searching. The similarity is measured in terms of the Tanimoto coefficient (which varies from 0.0 to 1.0) or the Euclidean distance. There are some algorithms which consider both shape matching and feature matching. The simplest way to calculate the similarity is provided by Open Babel through MACCS166 key based similarity search [38]. The PubChem database also provides PubChem881 key based 2D similarity search. In addition, PubChem also provides a facility to search based on shape and chemical feature mapping. The JC search tool from ChemAxon can search for similar molecules at a given cut-off value. Sometimes users are interested in finding substructures/superstructures similar to active molecules. This can also be done using the JC search tool.

Table 6. Similarity Search Algorithms and Their Web Links

Software       Description
JCsearch       JC search used for searching similar structure, substructure as well as super structure from a given database. (http://www.chemaxon.com/jchem/doc/user/Jcsearch.html)
PubChem        PubChem provide the facility to search similar chemical in PubChem database using PubChem based binary fingerprints. (http://pubchem.ncbi.nlm.nih.gov/search/)
SIMCOMP [129]  Chemical structure similarity search against KEGG COMPOUND, KEGG DRUG, and other databases. SIMCOMP is based on 2D graph representation. (http://www.genome.jp/tools/simcomp/)
SUBCOMP [129]  SUBCOMP is based on bit-string representation of chemical structures. (http://www.genome.jp/tools/subcomp/)
SMSD [130]     SMSD is a Java based software library for calculating Maximum Common Subgraph (MCS) between small molecules. This will help us to find similarity/distance between two molecules. (http://www.ebi.ac.uk/thornton-srv/software/SMSD/)

QSAR/QSPR MODELS

Quantitative structure-activity relationship (QSAR) is a mathematical relationship linking chemical structure and biological/pharmacological activity in a quantitative manner for a series of chemical compounds. The related term quantitative structure–property relationship (QSPR) is used to represent the relationship between structure and physico-chemical properties [46, 47]. There are two types of QSAR models. 2D-QSAR models are constructed using 2D descriptors; they are well established in predicting physicochemical properties as well as in providing quantitative estimates of various biological effects [48]. 3D-QSAR models are generated from descriptors of the 3D structure of molecules [49]. The purpose of any QSAR model is to predict the biological activities of new compounds against a particular target or whole cell based on the structural properties of the chemicals [50, 51].

A number of techniques are frequently used to build QSAR models; this section describes the major techniques used for predicting the activity of molecules (Table 7). Support Vector Machine (SVM) is a machine learning algorithm based on statistical and optimization theory that is capable of handling structural feature data [52]. The SVMlight software is an implementation of SVM that is widely used in chemoinformatics. The WEKA package contains a wide range of tools and algorithms for data analysis and predictive modeling [53]. The system is written in Java, a platform-independent object-oriented programming language. WEKA is a complete data mining software that can be used for pre-processing of data, clustering, model building, visualization, and feature selection. The most common file formats recognized by WEKA are ARFF (attribute-relation file format) and csv. Artificial Neural Network (ANN) is a powerful machine learning technique, commonly used for solving classification problems. SNNS (Stuttgart Neural Network Simulator) is a software simulator for neural networks on Unix (http://www.ra.cs.uni-tuebingen.de/SNNS/). The multi-layer feed-forward network and the back-propagation multi-layer perceptron (MLP) are the most popular ANN architectures used in generating QSAR models. Memory-Based Learning is a direct descendant of the classical k-Nearest Neighbor (k-NN) approach, which is a powerful pattern classification algorithm for numeric data. K-Nearest Neighbor may be implemented using the free software TiMBL (Tilburg Memory-Based Learner) (http://ilk.uvt.nl/timbl).

Feature selection is a key step to eliminate correlation and multi-collinearity and to remove useless attributes from the full set of descriptors. In Table 7, we describe commonly used feature selection software like Weka [53], Rapidminer, Orange and RRF [54]. Weka (Waikato Environment for Knowledge Analysis) is a popular Java-based tool used in feature selection. Various feature selection approaches, like genetic algorithms (GA), greedy stepwise forward selection, wrapper selection and F-stepping remove-one, are implemented in Weka. Rapidminer is an open-source software widely used for machine learning, data mining and feature selection. Brute force, forward selection, backward elimination etc. are important feature selection algorithms in Rapidminer (http://rapid-i.com/content/view/181/190/). Orange is a data mining and machine learning tool used in feature selection and data analysis. Its Orange.feature.selection module provides feature selection facilities. Regularized Random Forest (RRF) is an R package for feature selection. In RRF, a set of non-redundant features can be selected without loss of predictive information [54].

Table 7. Machine Learning and Feature Selection Techniques in Cheminformatics

Software    Brief Description

Software used for developing QSAR model
SVM         SVM is a supervised learning technique, used for classification and regression analysis. The QSAR models can be optimized using different SVM parameters and kernels. (http://www.cs.cornell.edu/People/tj/svm_light/)
ANN         ANN is based on supervised learning, unsupervised learning and reinforcement learning. SNNS (Stuttgart Neural Network Simulator) is a free software simulator for neural networks. (http://www.ra.cs.uni-tuebingen.de/SNNS/)
kNN         The k-nearest neighbor algorithm (k-NN) is a method for classifying objects based on closest training examples. TiMBL is an open source software package implementing k-nearest neighbor classification. (http://ilk.uvt.nl/timbl/)
Weka [53]   Weka is a collection of visualization tools and algorithms for data analysis and predictive modeling. It contains libSVM, SMO, NaiveBayes, LMT, Random Forest etc. learning algorithms. (http://www.cs.waikato.ac.nz/~ml/weka/)

Feature Selection techniques
Weka        Weka is a popular java based tool used in feature selection. Various feature selection methods and evaluators are available in the Weka package. (http://www.cs.waikato.ac.nz/~ml/weka/)
Rapidminer  RapidMiner is a freely available software. It contains Brute force, Forward selection and Backward elimination algorithms for feature selection. (http://rapid-i.com/content/view/181/190/)
Orange      orngFSS (Orange.feature.selection) module provides feature selection facilities. It contains attMeasure, bestNAtts, selectBestNAtts, filterRelieff etc. functions for feature selection. (http://orange.biolab.si/)
RRF [54]    Regularized Random Forest (RRF) is an R package for feature selection. In RRF variables are selected based on a subsample of data at each node. (http://cran.r-project.org/web/packages/RRF/index.html)

CHEMICAL CLUSTERING

Clustering of chemicals plays a crucial role in computational chemistry [55]. Chemical clustering is used to identify the outliers in a given dataset, to understand the behaviour of a particular functional group, and to identify a common scaffold etc. A number of approaches have been used for clustering chemical compounds, based on binary fingerprints, graph properties or the maximum common substructure [56]. Based on these, many software packages (commercial as well as open-source) have been developed in the past (Table 8).

Table 8. List of Chemical Clustering Tools and Their Web Addresses

Software            Brief Description
ChemMine            Chemical clustering and analysis (http://chemmine.ucr.edu/)
ChemMineR           It is R based open source chemical clustering tool. (http://manuals.bioinformatics.ucr.edu/home/chemminer)
Jcluster            ChemAxon Cluster (http://www.chemaxon.com/products/jklustor/)
ChemBioserver [57]  Chemical clustering webserver (http://bioserver-3.bioacademy.gr/Bioserver/ChemBioServer/)

ChemBioServer [57] is a free web-based application that performs clustering by two methods: hierarchical clustering and the more recent Affinity Propagation (AP) clustering algorithm. While clustering, the web server also displays each cluster graphically along with its representative scaffold. The compound
screening and analysis can be performed using the server based upon vdW energy, geometry, physicochemical properties, and undesired/toxic moieties.

ChemMine Tools is aimed at searching, comparing and clustering chemicals [58]. The tool can cluster chemicals by three clustering algorithms (hierarchical clustering, binning and multidimensional scaling), and clustering of numerical data is also provided. Additionally, a property calculation module is built into the web server. The web server is limited when comparing more than two chemicals at a time and when fishing out the representative chemical of a cluster.

ChemMineR, built in the R environment, is an open-source tool that also provides various functions for clustering entire compound libraries and visualizing clustering results and compound structures [59]. The tool supports SDF files for importing molecules. ChemAxon’s JKlustor Suite can be used for similarity searching, diversity calculation, structural comparison and chemical clustering based on molecular descriptors. The suite can show the representative structure of a cluster as well as the number of chemicals in that cluster.

MOLECULAR DOCKING

Molecular docking is most often used to predict the preferred orientation of a molecule within the active site of a target molecule, where it binds to form a stable complex. It is therefore widely used in hit identification and lead optimization [60, 61]. Most docking algorithms generate a large number of possible structures and finally select the most favorable geometry using a scoring function. Depending on the interacting partner of the protein, docking can be divided into two classes: protein-protein docking, where two different protein molecules interact with each other and which is mainly rigid-body docking, and protein-ligand docking, where a protein binds a small molecule.

AutoDock, developed and maintained by the Scripps Research Institute, is open-source molecular modeling software mainly for protein-ligand docking (Table 9). The software includes two important programs: AutoGrid pre-calculates grid maps of interaction energies for different atom types, and AutoDock docks the ligand within the predefined grid using a genetic algorithm. Dock is another docking program, based on the anchor-and-grow approach. It is applied to both rigid-body and flexible-ligand docking. The latest Dock version can predict binding poses using new features such as force field scoring enhanced by solvation and receptor flexibility (Table 9). It is developed at UCSF [62, 63]. AutoDock Vina is a new open-source program for protein-ligand docking and virtual screening. It is an improved version of AutoDock 4 that is faster and improves the accuracy of binding mode predictions [64]. Hex is an academically free program for protein and DNA docking. It can also be used for protein-ligand docking [65]. High Ambiguity Driven biomolecular DOCKing (HADDOCK) is docking software that uses biophysical interaction data, mutagenesis data or bioinformatic predictions. It was developed for protein-protein docking but can also be applied to protein-ligand docking. FTDOCK is a software package based on a Fourier correlation algorithm for rigid-body docking. It performs translational and rotational searches in the possible directions between two molecules [66].

Table 9. List of Molecular Docking Tools with Brief Descriptions
Name    Brief Description
Dock [62, 63]    Anchor-and-grow based docking program for flexible ligand and flexible protein. (http://dock.compbio.ucsf.edu/)
Autodock [64]    For flexible ligand and flexible protein side chains. Compatible with Linux, Windows and Mac OS. (http://autodock.scripps.edu/)
Hex [65]    Mainly for protein-protein and protein-DNA docking. (http://hex.loria.fr/)
FTDock [66]    For rigid-body docking, based on a Fourier correlation algorithm. (http://www.sbg.bio.ic.ac.uk/docking/ftdock.html)
AutoDock Vina [131]    Improved version of AutoDock 4: faster, with better binding energies. (http://vina.scripps.edu/)
HADDock [132]    Used for protein-protein and protein-ligand docking. (http://www.nmr.chem.uu.nl/haddock/)

PHARMACOPHORE TOOLS

Pharmacophore search is a key component of drug discovery programs and can be used as an alternative to molecular docking for fast and efficient screening of compound libraries. A pharmacophore represents the spatial arrangement of chemical features that is essential for a molecule to interact with a specific target receptor. Pharmacophore search is an established and effective mechanism of virtual screening [67, 68]. A brief list of freely available pharmacophore generation software is given in Table 10.

PharmMapper is a freely available web server for identifying potential target candidates of a small molecule [69]. The server maintains a repository of ~7000 target-based pharmacophore models. Using a triangle hashing method, it finds the best matching poses of the input ligand against all known pharmacophore models. This is highly useful for fast searching, as it takes around 1 hr to screen the whole database. PharmaGist is a freely available web server for detecting a pharmacophore from a set of ligand molecules [70]. The server requires only a set of ligands known to interact with a particular target, without any prior knowledge of the target structure. The software first aligns the input molecules and detects subsets of molecules having similar features, allowing for the possibility that a particular subset may bind to different binding sites or with different binding modes. It also addresses cases where the input ligands have different affinities by weighting pharmacophores based on the number of ligands that share them, and automatically selects the most appropriate pharmacophore for virtual screening. Therefore, it is an important tool for virtual screening of large databases. Pharmer is an open-source, fast and efficient pharmacophore tool for
virtual screening [71]. The search time depends upon the complexity of the query molecule rather than the size of the database. The software takes only one PDB file at a time and uses a KDB-tree and Bloom fingerprints for pharmacophore searching. It also supports different pharmacophore formats such as pml (LigandScout) and ph4 (MOE). ZincPharma is an extension of this software that can be used for screening the ZINC database.

Table 10. Different Types of Pharmacophore Searching Software
Software    Brief Description
Boomer    Pharmacokinetic drug monitoring (http://www.boomer.org/)
Cyber Patient    A software for pharmacokinetic simulations (http://www.labsoft.com/www/software.html)
PKfit    A tool for pharmacokinetic modeling (http://cran.csie.ntu.edu.tw/web/packages/PKfit/index.html)
JPKD    Therapeutic drug monitoring (http://pkpd.kmu.edu.tw/jpkd/)
Tdm    Therapeutic drug monitoring (http://pkpd.kmu.edu.tw/tdm/)
mobilePK    (http://pkpd.kmu.edu.tw/mobilepk/)
PharmMapper [69]    Ligand-based pharmacophore search (http://59.78.96.61/pharmmapper/)

skin irritation, eye irritation etc. Prediction of Activity Spectra for Substances (PASS) can simultaneously predict 3678 kinds of activity with a mean prediction accuracy of about 95% (leave-one-out cross-validation) based on a compound's structural formula. The online web server can predict ~1244 types of biological activity, including pharmacological effects, mechanisms of action, toxic and adverse effects, interaction with metabolic enzymes and transporters, influence on gene expression, etc. It is therefore easy to screen a large compound database within a short period of time, and the software is very useful for predicting the biological activity spectrum of existing compounds as well as of virtual compounds.

Table 11. Various ADMET Property Prediction Tools
Software    Brief Description
OncoLogic™    Predicting cancer-causing potential of a chemical (http://www.epa.gov/oppt/cahp/pubs/can.htm)
OSIRIS    ADMET (http://www.organic-chemistry.org/prog/peo/)
Metabolizer    Drug metabolism (http://www.chemaxon.com/products/online-tryouts/metabolizer/)
DrugMint    A web server for predicting drug-likeness of chemical compounds. (http://crdd.osdd.net/oscadd/drugmint)
QED    A web server for quantitatively estimating the drug-likeness of a molecule. (http://crdd.osdd.net/oscadd/qed)
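
Drug-likeness and ADMET filters of the kind listed in Table 11 often start from simple physicochemical cut-offs applied to each compound. The following is a minimal sketch of such a rule-based filter, assuming that the descriptors have already been computed elsewhere. The thresholds follow Lipinski's rule of five, which is used here only as an illustrative assumption and is not the rule set of any tool named in this review; the class and function names are hypothetical.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    """Precomputed descriptors for one molecule (values supplied by the user)."""
    name: str
    mol_weight: float   # molecular weight in g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


# Illustrative rule set: Lipinski's rule of five (an assumption for this
# sketch, not the actual rules implemented by any specific tool).
RULES = [
    ("molecular weight <= 500", lambda c: c.mol_weight <= 500),
    ("logP <= 5", lambda c: c.logp <= 5),
    ("H-bond donors <= 5", lambda c: c.h_donors <= 5),
    ("H-bond acceptors <= 10", lambda c: c.h_acceptors <= 10),
]


def violations(compound):
    """Return the labels of all rules that the compound breaks."""
    return [label for label, rule in RULES if not rule(compound)]


def passes_filter(compound, max_violations=1):
    """Accept a compound that breaks at most `max_violations` rules."""
    return len(violations(compound)) <= max_violations


if __name__ == "__main__":
    library = [
        Compound("small_polar", 180.2, 1.2, 1, 4),        # hypothetical entry
        Compound("large_lipophilic", 900.0, 7.5, 6, 12),  # hypothetical entry
    ]
    for compound in library:
        print(compound.name, passes_filter(compound), violations(compound))
```

Keeping the rules in a list of labelled predicates makes the filter easy to customize, which mirrors the idea of user-editable rule sets without reproducing any particular implementation.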
webserver is helpful for in silico screening of ADMET profiles of drug candidates and environmental chemicals. FAF-Drugs2 is free command-line software developed in Python. It can identify key functional groups as well as some toxic and unstable molecules/functional groups. As it stands, FAF-Drugs2 implements different filtering rules, such as 23 physicochemical rules and 204 substructure searching rules, that can be easily customized. The software also uses Gnuplot to plot distribution diagrams of the major physicochemical properties of the screened compound libraries. OncoLogic™ is a desktop application that analyzes a chemical structure to determine the likelihood that it may cause cancer. It is based on rules derived by applying structure-activity relationship (SAR) analysis, incorporating knowledge of how chemicals cause cancer in animals and humans, and mimicking the decision logic of human experts.

OSIRIS Property Explorer is a freely available web server that predicts the physicochemical and toxicological molecular properties that need to be optimized when designing pharmaceutically active compounds. The tool was originally developed by T. Sander and released in the public domain in 1999 on Actelion’s web site to demonstrate the applicability of Java applets for real-time cheminformatics applications. It calculates drug-likeness, drug score, etc. along with some important physicochemical properties. DrugLogit is a freely available web server that predicts the probability of a compound being a drug or a non-drug. The tool is based on simple, readily available molecular properties of a compound. A selection of the equations also allows classification of the disease category of a compound. Approximately 23 equations are used in this software for prediction. They are rationalized based on the different mechanisms of action, administration modes and target organs of the different disease categories.

TOOLS FOR DRUG METABOLISM

Drug metabolism is an important aspect of the drug discovery process. A number of cytochrome proteins, classified into different protein families, are known to be responsible for drug metabolism. High or low metabolism of a drug is related to its dose requirement. It is therefore very important to predict whether a compound will be metabolized, its site of action, etc. Towards this, a number of tools have been developed, such as MetaSite and MetaPred (Table 11).

MetaSite is freely available software that predicts metabolic transformations related to cytochrome-mediated reactions. It also provides the structures of the metabolites formed and highlights the molecular moieties that help direct the molecule into the cytochrome cavity. The primary site of metabolism is claimed to be accurately predicted in more than 85% of cases. SMARTCyp 2.0 is another freely available, downloadable Java-based program for metabolic site prediction for all five major drug metabolizers. Metabolizer computes all the possible metabolites of a given molecule, predicts the major metabolites and estimates metabolic stability. MetaPred is a freely available web server for predicting whether a compound will be metabolized or not. It also predicts the family of cytochrome responsible for its metabolism.

DRUG TARGET IDENTIFICATION

During the last two decades or so, tremendous progress has been made in medicinal chemistry in discovering new potential drug targets. Identifying drug targets by experiments alone would be very time-consuming and costly. In addition, it is important to use systems-based approaches to identify drug targets for polypharmacology, that is, targeting more than one protein at a time for better efficacy. These approaches need quality annotation of proteins at both the structural and functional level, which is then used to construct interaction graphs and identify central proteins. These central hub proteins may then be selected in combination, depending on their structural features, expression profiles and localization, to identify the best pairs for polypharmacology. Therefore, many computational algorithms, such as those for identifying drug-target interaction networks [72, 73], inhibitor design [74-76], multiple drug-target prediction [77-79], classifying body fluids [80], identifying recombination spots with pseudo dinucleotide composition [81], classifying hepalar cirrhosis and carcinoma [82], and predicting the anatomical therapeutic chemical (ATC) classification of drugs [83], have been developed for target discovery. These software tools provide very useful insights for both medicinal chemistry research and drug development [84].

Understanding the location of a protein molecule is of prime importance in prioritizing drug targets. It is very helpful in understanding biological phenomena such as protein-protein interactions, protein-ligand interactions and pathway analysis. Over the years, a number of software tools have been developed for predicting protein localization in the different compartments of various cell types, such as eukaryotic, human, plant and bacterial subcellular localization [16, 85-90], as summarized in Table 12. Similarly, algorithms for predicting antimicrobial peptides [91], identifying HIV cleavage sites in proteins [92, 93], predicting proteases and their types [94] and identifying virulence factors [95] have been developed for predicting potential drug targets.

In addition, there are some well-known families of proteins, such as GPCRs, nuclear receptors and kinases, that have contributed greatly to drug discovery [12, 96-102]. GPCR proteins are targeted by nearly 50% of marketed drugs, and nuclear receptors by nearly 13% of FDA-approved drugs [103]. Therefore, characterizing these classes of protein families will provide in-depth knowledge for understanding ligand-protein interactions and drug side effects. In Table 13, we describe a series of web servers/tools important in drug target discovery.

DESIGNING OF INHIBITORS

Inhibitors are required to block a target or stop a signaling cascade. Inhibitors can be designed by three approaches: structure-based inhibitor design (SBID), ligand-based inhibitor design (LBID) and de novo inhibitor design (DNID) [104, 105]. The structure of the target is the prerequisite for SBID and can be determined by X-ray crystallography or NMR [106-108]. Alternatively, a huge list of
Table 12. Web Servers for Proteome Annotation with Their Brief Description
Webserver    Description
ATPint [14]    A web-based tool for prediction of ATP binding residues in a protein sequence (http://www.imtech.res.in/raghava/atpint)
ESLpred2 [16]    Predicts four major protein localizations (cytoplasmic, mitochondrial, nuclear and extracellular) for eukaryotes (http://www.imtech.res.in/raghava/eslpred2)
iRSpot-PseDNC [81]    Identifies recombination spots with pseudo dinucleotide composition (http://lin.uestc.edu.cn/server/iRSpot-PseDNC)
iLoc-Virus [85]    A classifier for identifying the subcellular localization of virus proteins with both single and multiple sites (http://icpr.jci.edu.cn/bioinfo/iLoc-Virus)
iLoc-Hum [86]    A software for predicting subcellular locations of human proteins with both single and multiple sites (http://icpr.jci.edu.cn/bioinfo/iLoc-Hum or http://www.jci-bioinfo.cn/iLoc-Hum)
iLoc-Plant [87]    A web server for predicting the subcellular localization of single and multiple site plant proteins (http://icpr.jci.edu.cn/bioinfo/iLoc-Plant or http://www.jci-bioinfo.cn/iLoc-Plant)
iLoc-Gpos [88]    Predicts the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins (http://icpr.jci.edu.cn/bioinfo/iLoc-Gpos or http://www.jci-bioinfo.cn/iLoc-Gpos)
iLoc-Euk [89]    A web server for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins (http://icpr.jci.edu.cn/bioinfo/iLoc-Euk)
MitPred [143]    A web server specifically trained to predict proteins destined to localize in mitochondria in yeast and animals, particularly using domain profiles (http://www.imtech.res.in/raghava/mitpred)
iLoc-Gneg [144]    Predicts the subcellular localization of Gram-negative bacterial proteins with both single and multiple sites (http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg)
iDNA-Prot [146]    Identification of DNA binding proteins using random forest with grey model (http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot)
GTPbinder [151]    A web-based tool for prediction of GTP binding residues in a protein sequence (http://www.imtech.res.in/raghava/gtpbinder)
GlycoPP [154]    A web server for predicting potential N- and O-glycosites in prokaryotic protein sequences (http://www.imtech.res.in/raghava/glycopp)
NetOGlyc [155]    The server incorporates a neural network to identify mucin-type GalNAc O-glycosylation sites in mammalian proteins (http://www.cbs.dtu.dk/services/NetOGlyc)
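
The drug target identification section describes constructing interaction graphs from annotated proteins in order to find central hub proteins. The following is a minimal sketch of that idea using degree centrality. The protein names and edges are hypothetical, and degree is only one of several centrality measures that could be used; the review does not specify which measure is intended.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each protein to its degree divided by (number of proteins - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {protein: 0.0 for protein in neighbours}
    return {protein: len(nb) / (n - 1) for protein, nb in neighbours.items()}


def top_hubs(edges, k=2):
    """Return the k most connected proteins (ties broken alphabetically)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical undirected protein-protein interaction edges.
EDGES = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

if __name__ == "__main__":
    print(degree_centrality(EDGES))
    print(top_hubs(EDGES, k=2))
```

In practice the candidate hubs would then be filtered by structural features, expression profiles and localization, as the text describes, before choosing pairs for polypharmacology.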
Table 13. Software for Drug Target Prediction with Their Brief Description
Software    Description
GPCRpred [12]    A server for predicting G-protein-coupled receptors and for classifying them into families and sub-families (http://www.imtech.res.in/raghava/gpcrpred)
GPCR-CA [97]    A cellular automaton image approach for predicting G-protein-coupled receptor functional classes (http://icpr.jci.edu.cn/)
GPCR-GIA [98]    A web server for identifying G-protein-coupled receptors and their families with grey incidence analysis (http://218.65.61.89:8080/bioinfo/GPCR-GIA)
GPCRSclass [99]    A dipeptide composition based method for predicting amine type of G-protein-coupled receptors (http://www.imtech.res.in/raghava/gpcrsclass)
KinasePhos2.0 [159]    Identifies phosphorylation sites using protein sequence profile and protein coupling pattern features (http://KinasePhos2.mbc.nctu.edu.tw)
NetPhosK [160]    The server uses a neural network to predict kinase-specific eukaryotic protein phosphorylation sites (http://www.cbs.dtu.dk/services/NetPhosK)
VGIchan [161]    Predicts voltage-gated ion channels and classifies them into sodium, potassium, calcium and chloride ion channels (http://www.imtech.res.in/raghava/vgichan)
VICMpred [162]    Classifies bacterial proteins into virulence factors, information molecule, cellular process and metabolism molecule (http://www.imtech.res.in/raghava/vicmpred)
NRpred [163]    An SVM-based tool for the classification of nuclear receptors on the basis of amino acid composition or dipeptide composition (http://www.imtech.res.in/raghava/nrpred)
NR-2L [164]    A web server providing a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features (http://icpr.jci.edu.cn/bioinfo/NR2L or http://www.jci-bioinfo.cn/NR2L)
iNR-PhysChem [165]    A physical-chemical property matrix based method for identifying nuclear receptors and their subfamilies (http://www.jci-bioinfo.cn/iNR-PhysChem or http://icpr.jci.edu.cn/bioinfo/iNR-PhysChem)
iNuc-PhysChem [166]    A sequence-based predictor for identifying nucleosomes via physicochemical properties (http://lin.uestc.edu.cn/server/iNuc-PhysChem)
SRTpred [167]    A tool to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides using machine-learning techniques (http://www.imtech.res.in/raghava/srtpred)
ProteDNA [169]    A computer program for identifying the binding residues in a transcription factor (http://serv.csbb.ntu.edu.tw/ProteDNA)
TESS [171]    A tool for predicting transcription factor binding sites in a DNA sequence (http://www.cbil.upenn.edu/cgi-bin/tess/tess)
PREDetector [172]    In silico prediction algorithm for regulatory elements of DNA-binding proteins in bacterial genomes (http://www.montefiore.ulg.ac.be/~hiard/PreDetector)
PROMO [173]    Web-based approach for identification of transcription factor binding sites in DNA sequences (http://alggen.lsi.upc.es/cgibin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3)
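
Several predictors in Table 13 (for example GPCRSclass and NRpred) are described as using amino acid or dipeptide composition as sequence features. The following is a minimal sketch of how such composition features can be computed as percentages. The function names and the example sequence are illustrative assumptions and are not taken from any of the listed servers.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Percentage of each standard residue (20 features)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return {
        aa: (100.0 * residues.count(aa) / total if total else 0.0)
        for aa in AMINO_ACIDS
    }


def dipeptide_composition(sequence):
    """Percentage of each ordered residue pair (400 features)."""
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Skip pairs that contain a non-standard symbol such as X or a gap.
        if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
            counts[pair] += 1
            total += 1
    return {dp: (100.0 * c / total if total else 0.0) for dp, c in counts.items()}


if __name__ == "__main__":
    example = "ACACA"  # toy sequence, not a real protein
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(aac["A"], aac["C"])
    print(dpc["AC"], dpc["CA"], len(dpc))
```

The resulting fixed-length vectors (20 or 400 values) are independent of sequence length, which is what makes them convenient inputs for classifiers such as support vector machines.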
software is also available for target structure prediction, whose description is beyond the scope of this review. Technically, homology modeling, threading and docking strategies are the drivers for SBID. Several methods have been developed using the above-mentioned approach; here we describe some of them. GDoQ predicts inhibitors against the GlmU enzyme from Mycobacterium tuberculosis [109]. KiDoQ is another web server, for designing inhibitors against dihydrodipicolinate synthase (DHDPS), a potential drug target enzyme of the unique bacterial DAP/lysine pathway [110] (Table 14). In the absence of a target structure, LBID can be applied, where new or improved inhibitors are designed computationally from a dataset of already known inhibitors [111, 112]. ToxiPred is a user-friendly web server for the prediction of aqueous toxicity of small chemical molecules in T. pyriformis. The DNID approach is applied where no inhibitors have been reported previously [113, 114]. In DNID, the ligand is built based upon the complementarity of the active site of a target with the ligand [115, 116]. A web server named “Drugster” has implemented LigBuilder for building ligands. e-LEA3D is a web server dedicated to drug design with a focus on generating de novo libraries and virtual screening (Table 14).

OTHER RESOURCES

Although this review covers many open source applications in drug discovery, a large number of freely available resources for computational drug discovery are not covered. We try to summarize them in this section.

CRDD (Computational Resources for Drug Discovery) is an open source in silico repository for tools being developed under the aegis of OSDD. The repository provides free access to web servers, databases and software related to drug discovery (http://crdd.osdd.net/). DrugPedia is another project from OSDD, where information on drugs is maintained in the form of pages (http://crdd.osdd.net/drugpedia). MayaChemTools is a collection of Perl scripts for handling and manipulating structure files for general purposes in drug discovery (www.mayachemtools.org/). CADDSuite offers modular tools developed by Oliver Kohlbacher for data storage, docking, QSAR and result analysis in the field of computer-aided drug design (http://www.ballview.org/caddsuite). The integration of this tool into the Galaxy platform helps in creating workflows for complex processes. ChemBench (http://chembench.mml.unc.edu/) is developed by the Carolina Exploratory Center for Cheminformatics Research (CECCR) to provide a platform for virtual libraries of available chemicals with predicted biological and drug-like properties, model building, property and activity predictors, and special tools for chemical library design.

MAJOR INITIATIVES

Along with the availability of numerous open source tools and software, there are also organizations working collaboratively with public or industry partners to develop affordable medicines, particularly for neglected diseases such as TB and malaria (Table 15). The Drugs for Neglected Diseases Initiative (DNDi) is an open source, non-profit, collaborative project for developing new treatments against neglected diseases, with major emphasis on malaria and sleeping sickness. To date, it has contributed two compounds for malaria and one compound for sleeping sickness. The Infectious Disease Research Institute (IDRI) is another non-profit organization with a major focus on infectious diseases such as tuberculosis and malaria. It is involved in the prevention, diagnosis and treatment of infectious diseases. In collaboration with GSK, its TB vaccine is in a phase 1 clinical trial. OpenTox was initiated as a collaborative project to develop in silico toxicology models that could be used to create predictive toxicology applications. It involves collaboration between different university, enterprise and government research groups to design and build the initial OpenTox framework. Blue Obelisk is the name used by a diverse internet group promoting reusable chemistry via open-source software development. The three major areas of this movement are 1) open source, 2) open standards and 3) open data. Open Source Drug Discovery (OSDD) is a translational platform for drug discovery, which connects informaticians, experimental biologists, clinicians,
Table 14. Tools for Designing Inhibitors with Their Brief Description
Software    Description
ToxiPred    A web server for the prediction of aqueous toxicity against T. pyriformis (http://crdd.osdd.net/raghava/toxipred)
KetoDrug    A web server for binding affinity prediction of ketoxazole derivatives against Fatty Acid Amide Hydrolase (FAAH) (http://crdd.osdd.net/oscadd/ketodrug/)
e-LEA3D [178]    Scaffold design, virtual screening and generation of combinatorial libraries (http://bioinfo.ipmc.cnrs.fr/lea.html)
MDRIpred [179]    A web server for prediction of inhibitors against drug-tolerant M. tuberculosis (http://crdd.osdd.net/oscadd/mdri/)
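
In ligand-based inhibitor design, candidate molecules are assessed against a dataset of already known inhibitors. One simple way to illustrate this is similarity ranking with the Tanimoto coefficient on binary fingerprints. Both the choice of this technique and the toy fingerprints below are assumptions made for illustration; this is not the method used by the servers in Table 14.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(candidates, known_inhibitors):
    """Rank candidates by their best similarity to any known inhibitor."""
    scored = []
    for name, fingerprint in candidates.items():
        best = max(
            (tanimoto(fingerprint, ref) for ref in known_inhibitors.values()),
            default=0.0,
        )
        scored.append((name, best))
    # Highest similarity first; ties broken by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical fingerprints: each set lists the indices of bits that are on.
KNOWN = {"inhibitor_A": {1, 2, 3, 4}, "inhibitor_B": {2, 3, 5}}
CANDIDATES = {"c1": {1, 2, 3}, "c2": {7, 8}, "c3": {2, 3, 5, 6}}

if __name__ == "__main__":
    for name, score in rank_by_similarity(CANDIDATES, KNOWN):
        print(f"{name}\t{score:.2f}")
```

Candidates that share many features with a known inhibitor rise to the top of the list and could then be passed to a more detailed model, such as a QSAR predictor, for binding affinity estimation.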
Table 15. Some Open Source Initiatives for Drug Discovery with Their Research Area
Drugs for Neglected Diseases Initiative    Sleeping sickness, visceral leishmaniasis, Chagas disease (http://www.dndi.org/)
Infectious Disease Research Institute    Tuberculosis, leishmaniasis, Chagas disease, malaria, leprosy and Buruli ulcer (http://www.idri.org/)
and research organizations to provide affordable medicine against tuberculosis and malaria. This project was launched in 2008 with the motto “affordable health care for all”.

FUTURE PROSPECTS

The current review has focused on the existing algorithms and tools in cheminformatics, with specific emphasis on those that exist in the public domain. It is evident that considerable efforts are being made to enhance the predictive potential of these tools. However, there are limitations and areas where such efforts are lacking. One limitation is that computational methods do not recreate the real-life situation during docking experiments, where a ligand has to find its target in the presence of many proteins with high
precision. Also, there is a lack of applications that allow inhibition of multiple proteins in a critical pathway or that consider host genome polymorphism in drug metabolism and transport. Another challenge, particularly in drug discovery projects, is to model combination therapy and predict the right dosage based on pharmacokinetic parameters. These are some of the challenges that require experimental and theoretical researchers to work in collaboration to develop better platforms for drug discovery programs.

As most drug discovery research has been carried out in the pharmaceutical industry, the field of computer-aided drug design is dominated by commercial tools. Academia and institutes are making efforts to design better predictive models and provide them as open source tools that can be worked on by others, thus ensuring continuous improvement. Given that most efforts in academia and institutes are not directly linked to drug discovery and development, the prediction accuracy and fine tuning of these models are limited and need to be benchmarked more comprehensively. Unlike in a pharmaceutical setup, where this is done as a routine exercise, validating predictions in a research environment is mostly restricted to smaller datasets and workable experiments. On the other hand, it is also worth mentioning that most of the advancements in understanding chemical space and drug-like properties are studied and published by academic researchers, which ultimately feeds into predictive tools, both open source and proprietary. Thus, it is evident that novel methods and intelligent designs published by researchers in publicly funded organizations are utilized by industry and translated into pipelines for drug research. These platforms provide an integrated environment for researchers to study, evaluate and prioritize compounds for their drug-like properties in a user- and time-friendly manner. Creating such platforms in an open source environment is the need of the hour, which demands concerted effort in ensuring consistent data formats and ontologies for curating data and sharing the results of data analyses. Open source communities working towards developing these platforms need to ensure that these standards are followed, and the community at large should be made aware of using such standards for better data organization and analyses. It is also imperative to systematically benchmark these models and their predictive capability, which feeds back and provides more clues for improvement towards faster drug development with reduced failure rates. These collaborative platforms allow researchers to work together and solve challenging problems by sharing ideas and discovering alternative, more efficient mechanisms for resolving them that are beyond the expertise of any individual group or laboratory.

Researchers, mostly from publicly funded organizations, share their scientific discoveries and resources via peer-reviewed publications. With the advent of open access, which provides an open platform for researchers to share their research outcomes, there has been constant pressure on researchers to arrange for publishing charges. It is mandatory for researchers to provide access to the details of the tools or algorithms they have developed through peer-reviewed publications. Hence, in order to overcome this barrier, we think that such articles should be published without any publication charges, or that the country's scientific department should provide grants for the publication fee. Open access ensures a larger readership, which is a good primer for better citation of the publications and works as a strong incentive and motivation for young researchers.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

The authors are thankful to the Council of Scientific and Industrial Research, India for funding (Grant No. HCP0001).

ABBREVIATIONS

CDD = Collaborative Drug Discovery
SMILES = Simplified Molecular-Input Line Entry System
OSDD = Open Source Drug Discovery
WOMBAT = WOrld of Molecular BioAcTivity
QSAR = Quantitative Structure Activity Relationship
QSPR = Quantitative Structure Property Relationship
QSTR = Quantitative Structure Toxicity Relationship
CADD = Computer Aided Drug Design
ADMET = Absorption, Distribution, Metabolism, Excretion, Toxicity

REFERENCES

[1] Drews, J. Drug discovery: a historical perspective. Science, 2000, 287(5460), 1960–1964.
[2] Ban, T. A. The role of serendipity in drug discovery. Dialogues Clin. Neurosci., 2006, 8(3), 335–344.
[3] Jones, A. W. Early drug discovery and the rise of pharmaceutical chemistry. Drug Test. Anal., 2011, 3(6), 337–344.
[4] Jagusztyn-Krynicka, E. K.; Wyszyńska, A. The decline of antibiotic era--new approaches for antibacterial drug discovery. Pol. J. Microbiol., 2008, 57(2), 91–98.
[5] O’Connor, K. A.; Roth, B. L. Finding new tricks for old drugs: an efficient route for public-sector drug discovery. Nat. Rev. Drug Discov., 2005, 4(12), 1005–1014.
[6] FDA Challenge and Opportunity on the Critical Path to New Medical Products. Review Literature And Arts Of The Americas, 2004, 1–31.
[7] Wlodawer, A. Rational approach to AIDS drug design through structural biology. Annu. Rev. Med., 2002, 53(6), 595–614.
[8] DesJarlais, R. L.; Seibel, G. L.; Kuntz, I. D.; Furth, P. S.; Alvarez, J. C.; Ortiz De Montellano, P. R.; DeCamp, D. L.; Babé, L. M.; Craik, C. S. Structure-based design of nonpeptide inhibitors specific for the human immunodeficiency virus 1 protease. Proc. Natl. Acad. Sci. U. S. A., 1990, 87(17), 6644–6648.
[9] Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 2012, 41(Database issue), D43–D47.
[10] Bernstein, F. C.; Koetzle, T. F.; Williams, G. J.; Meyer, E. F.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank: a computer-based archival file for macromolecular structures. Arch. Biochem. Biophys., 1978, 185(2), 584–591.
[11] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol., 1990, 215(3), 403–410.
[12] Bhasin, M.; Raghava, G. P. S. GPCRpred: an SVM-based method [30] Little, J. L.; Williams, A. J.; Pshenichnov, A.; Tkachenko, V. Iden-
for prediction of families and subfamilies of G-protein coupled re- tification of “known unknowns” utilizing accurate mass data and
ceptors. Nucleic Acids Res., 2004, 32(Web Server issue), W383– ChemSpider. J. Am. Soc. Mass Spectrom., 2012, 23(1), 179–185.
389. [31] Milne, G. W.; Nicklaus, M. C.; Driscoll, J. S.; Wang, S.; Zahare-
[13] Pearson, W. R.; Lipman, D. J. Improved tools for biological se- vitz, D. National Cancer Institute Drug Information System 3D da-
quence comparison. Proc. Natl. Acad. Sci. U. S. A., 1988, 85, (8), tabase. J. Chem. Inf. Comput. Sci., 1994, 34(5), 1219–1224.
2444–2448. [32] Goto, S.; Okuno, Y.; Hattori, M.; Nishioka, T.; Kanehisa, M.
[14] Chauhan, J. S.; Mishra, N. K.; Raghava, G. P. S. Identification of LIGAND: database of chemical compounds and reactions in bio-
ATP binding residues of a protein from its primary sequence. BMC logical pathways. Nucleic Acids Res., 2002, 30(1), 402–404.
Bioinformatics, 2009, 10, 434. [33] Schreyer, A.; Blundell, T. CREDO: a protein-ligand interaction
[15] Larkin, M. A.; Blackshields, G.; Brown, N. P.; Chenna, R.; McGet- database for drug discovery. Chem. Biol. Drug Des., 2009, 73(2),
tigan, P. A.; McWilliam, H.; Valentin, F.; Wallace, I. M.; Wilm, 157–167.
A.; Lopez, R.; Thompson, J. D.; Gibson, T. J.; Higgins, D. G. [34] Hendlich, M.; Bergner, A.; Günther, J.; Klebe, G. Relibase: design
Clustal W and Clustal X version 2.0. Bioinformatics, 2007, 23(21), and development of a database for comprehensive analysis of pro-
2947–2948. tein-ligand interactions. J. Mol. Biol., 2003, 326(2), 607–620.
[16] Garg, A.; Raghava, G. P. S. ESLpred2: improved method for pre- [35] Chen, I. J.; Foloppe, N. Conformational sampling of druglike
dicting subcellular localization of eukaryotic proteins. BMC Bioin- molecules with MOE and catalyst: implications for pharmacophore
formatics, 2008, 9, 503. modeling and virtual screening. J. Chem. Inf. Model., 2008, 48(9),
[17] McGuffin, L. J.; Bryson, K.; Jones, D. T. The PSIPRED protein 1773–1791.
structure prediction server. Bioinformatics, 2000, 16(4), 404–405. [36] Foloppe, N.; Chen, I. J. Conformational sampling and energetics of
[18] Bhardwaj, A.; Scaria, V.; Raghava, G. P. S.; Lynn, A. M.; Chandra, drug-like molecules. Curr. Med. Chem., 2009, 16(26), 3381–3413.
N.; Banerjee, S.; Raghunandanan, M. V; Pandey, V.; Taneja, B.; [37] Bonnet, P.; Agrafiotis, D. K.; Zhu, F.; Martin, E. Conformational
Yadav, J.; Dash, D.; Bhattacharya, J.; Misra, A.; Kumar, A.; analysis of macrocycles: finding what common search methods
Ramachandran, S.; Thomas, Z.; Brahmachari, S. K. Open source miss. J. Chem. Inf. Model., 2009, 49(10), 2242–2259.
drug discovery--a new paradigm of collaborative research in tuber- [38] O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vander-
culosis drug development. Tuberculosis (Edinb), 2011, 91(5), 479– meersch, T.; Hutchison, G. R. Open Babel: An open chemical tool-
486. box. J. Cheminform., 2011, 3, 33.
[19] Ardal, C.; Røttingen, J. A. Open source drug discovery in practice: [39] Leite, T. B.; Gomes, D.; Miteva, M. A.; Chomilier, J.; Villoutreix,
a case study. PLoS Negl. Trop. Dis., 2012, 6(9), E1827. B. O.; Tufféry, P. Frog: a FRee Online druG 3D conformation gen-
[20] Ardal, C.; Alstadsæter, A.; Røttingen, J. A. Common characteris- erator. Nucleic Acids Res., 2007, 35(Web Server issue), W568–572.
tics of open source software development and applicability for drug [40] Puranen, J. S.; Vainio, M. J.; Johnson, M. S. Accurate conforma-
discovery: a systematic review. Health Res. Policy Syst., 2011, 9, tion-dependent molecular electrostatic potentials for high-
36. throughput in silico drug discovery. J. Comput. Chem., 2010, 31(8),
[21] Tcheremenskaia, O.; Benigni, R.; Nikolova, I.; Jeliazkova, N.; 1722–1732.
Escher, S. E.; Batke, M.; Baier, T.; Poroikov, V.; Lagunin, A.; Rau- [41] Lagorce, D.; Pencheva, T.; Villoutreix, B. O.; Miteva, M. A. DG-
tenberg, M.; Hardy, B. OpenTox predictive toxicology framework: AMMOS: a new tool to generate 3d conformation of small mole-
toxicological ontology and semantic media wiki-based Open- cules using distance geometry and automated molecular mechanics
Toxipedia. J. Biomed. Semantics, 2012, 3 SUPPL 1, S7. optimization for in silico screening. BMC Chem. Biol., 2009, 9, 6.
[22] Ekins, S.; Bradford, J.; Dole, K.; Spektor, A.; Gregory, K.; Blon- [42] Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors,
deau, D.; Hohman, M.; Bunin, B. A. A collaborative database and Wiley & Sons: New York, 2000.
computational models for tuberculosis drug discovery. Mol. Bio- [43] Katritzky, A. R.; Gordeeva, E. V Traditional topological indices vs
syst., 2010, 6(5), 840–851. electronic, geometrical, and combined molecular descriptors in
[23] O’Boyle, N. M.; Guha, R.; Willighagen, E. L.; Adams, S. E.; Al- QSAR/QSPR research. J. Chem. Inf. Comput. Sci., 33(6), 835–857.
varsson, J.; Bradley, J.C.; Filippov, I. V; Hanson, R. M.; Hanwell, [44] Xue, Y.; Li, Z. R.; Yap, C. W.; Sun, L. Z.; Chen, X.; Chen, Y. Z.
M. D.; Hutchison, G. R.; James, C. A.; Jeliazkova, N.; Lang, A. S.; Effect of molecular descriptor feature selection in support vector
Langner, K. M.; Lonie, D. C.; Lowe, D. M.; Pansanel, J.; Pavlov, machine classification of pharmacokinetic and toxicological prop-
D.; Spjuth, O.; Steinbeck, C.; Tenderholt, A. L.; Theisen, K. J.; erties of chemical agents. J. Chem. Inf. Comput. Sci., 44(5), 1630–
Murray-Rust, P. Open Data, Open Source and Open Standards in 1638.
chemistry: The Blue Obelisk five years on. J. Cheminform., 2011, [45] Hong, H.; Xie, Q.; Ge, W.; Qian, F.; Fang, H.; Shi, L.; Su, Z.;
3(1), 37. Perkins, R.; Tong, W. Mold(2), molecular descriptors from 2D
[24] Olah, M.; Rad, R.; Ostopovici, L.; Bora, A.; Hadaruga, N.; Hada- structures for chemoinformatics and toxicoinformatics. J. Chem.
ruga, D.; Moldovan, R.; Fulias, A.; Mractc, M.; Oprea, T. I. Inf. Model., 2008, 48(7), 1337–1344.
WOMBAT and WOMBAT-PK: Bioactivity Databases for Lead [46] Hansch, C.; Hoekman, D.; Leo, A.; Weininger, D.; Selassie, C. D.
and Drug Discovery. In Chemical Biology; Wiley-VCH Verlag Chem-bioinformatics: comparative QSAR at the interface between
GmbH, 2008; Vol. 1-3, pp. 760–786. chemistry and biology. Chem. Rev., 2002, 102(3), 783–812.
[25] Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; [47] Ekins, S.; Mestres, J.; Testa, B. In silico pharmacology for drug
Han, L.; Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, discovery: methods for virtual ligand screening and profiling. Br. J.
E.; Gindulyte, A.; Bryant, S. H. PubChem’s BioAssay Database. Pharmacol., 2007, 152(1), 9–20.
Nucleic Acids Res., 2012, 40(Database issue), D400–412. [48] Dudek, A. Z.; Arodz, T.; Gálvez, J. Computational methods in
[26] Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. developing quantitative structure-activity relationships (QSAR): a
PubChem: a public information system for analyzing bioactivities review. Comb. Chem. High Throughput Screen., 2006, 9(3), 213–
of small molecules. Nucleic Acids Res., 2009, 37(Web Server is- 228.
sue), W623–633. [49] Tamura, H.; Ishimoto, Y.; Fujikawa, T.; Aoyama, H.; Yoshikawa,
[27] Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, H.; Akamatsu, M. Structural basis for androgen receptor agonists
R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. and antagonists: interaction of SPEED 98-listed chemicals and re-
Chem. Inf. Model., 2012, 52(7), 1757-1768. lated compounds with the androgen receptor based on an in vitro
[28] Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; reporter gene assay and 3D-QSAR. Bioorg. Med. Chem., 2006,
Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al- 14(21), 7160–7174.
Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity [50] Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: methodol-
database for drug discovery. Nucleic Acids Res., 2012, 40(Database ogy. Drug Discov. Today, 1997, 2(11), 457–467.
issue), D1100–1107. [51] Ebalunode, J. O.; Zheng, W.; Tropsha, A. Application of QSAR
[29] Chen, J. H.; Linstead, E.; Swamidass, S. J.; Wang, D.; Baldi, P. and shape pharmacophore modeling approaches for targeted
ChemDB update--full-text search and virtual chemical space. Bio- chemical library design. Methods Mol. Biol., 2011, 685, 111–133.
informatics, 2007, 23(17), 2348–2351. [52] Joachims, T. Making large-Scale SVM Learning Practical. Ad-
vances in Kernel Methods Support Vector Learning, 1999, 169–
184.
[53] Frank, E.; Hall, M.; Trigg, L.; Holmes, G.; Witten, I. H. Data min- [75] Li, X. B.; Wang, S. Q.; Xu, W. R.; Wang, R. L.; Chou, K. C. Novel
ing in bioinformatics using Weka. Bioinformatics, 2004, 20(15), inhibitor design for hemagglutinin against H1N1 influenza virus by
2479–2481. core hopping method. PLoS One, 2011, 6(11), E28111.
[54] Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature [76] Lian, P.; Wei, D. Q.; Wang, J. F.; Chou, K. C. An allosteric
Selection. J. Mach. Learn. Res., 2003, 3, 1157–1182. mechanism inferred from molecular dynamics simulations on
[55] Schuffenhauer, A.; Brown, N.; Ertl, P.; Jenkins, J. L.; Selzer, P.; phospholamban pentamer in lipid membranes. PLoS One, 2011,
Hamon, J. Clustering and rule-based classifications of chemical 6(4), E18587.
structures evaluated in the biological activity space. J. Chem. Inf. [77] Yu, H.; Chen, J.; Xu, X.; Li, Y.; Zhao, H.; Fang, Y.; Li, X.; Zhou,
Model., 2007, 47(2), 325–336. W.; Wang, W.; Wang, Y. A systematic prediction of multiple drug-
[56] Chu, C. W.; Holliday, J. D.; Willett, P. Effect of data standardiza- target interactions from chemical, genomic, and pharmacological
tion on chemical clustering and similarity searching. J. Chem. Inf. data. PLoS One, 2012, 7(5), E37608.
Model., 2009, 49(2), 155–161. [78] Ma, Y.; Wang, S. Q.; Xu, W. R.; Wang, R. L.; Chou, K. C. Design
[57] Athanasiadis, E.; Cournia, Z.; Spyrou, G. ChemBioServer: A web- novel dual agonists for treating type-2 diabetes by targeting perox-
based pipeline for filtering, clustering and visualization of chemical isome proliferator-activated receptors with core hopping approach.
compounds used in drug discovery. Bioinformatics, 2012, 28(22), PLoS One, 2012, 7(6), E38546.
3002–3003. [79] Anand, P.; Sankaran, S.; Mukherjee, S.; Yeturu, K.; Laskowski, R.;
[58] Backman, T. W. H.; Cao, Y.; Girke, T. ChemMine tools: an online Bhardwaj, A.; Bhagavat, R.; Brahmachari, S. K.; Chandra, N.
service for analyzing and clustering small molecules. Nucleic Acids Structural Annotation of Mycobacterium tuberculosis Proteome.
Res., 2011, 39(Web Server issue), W486–491. PLoS One, 2011, 6(10), E27044.
[59] Cao, Y.; Charisi, A.; Cheng, L. C.; Jiang, T.; Girke, T. Chem- [80] Hu, L. L.; Huang, T.; Cai, Y. D.; Chou, K. C. Prediction of body
mineR: a compound mining framework for R. Bioinformatics, fluids where proteins are secreted into based on protein interaction
2008, 24(15), 1733–1734. network. PLoS One, 2011, 6(7), E22989.
[60] Gschwend, D. A.; Good, A. C.; Kuntz, I. D. Molecular docking [81] Li, B. Q.; Huang, T.; Liu, L.; Cai, Y. D.; Chou, K. C. Identification
towards drug discovery. J. Mol. Recognit., 1996, 9(2), 175–186. of colorectal cancer related genes with mRMR and shortest path in
[61] Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and protein-protein interaction network. PLoS One, 2012, 7(4), E33393.
scoring in virtual screening for drug discovery: methods and appli- [82] Huang, T.; Wang, J.; Cai, Y. D.; Yu, H.; Chou, K. C. Hepatitis C
cations. Nat. Rev. Drug Discov., 2004, 3(11), 935–949. virus network based classification of hepatocellular cirrhosis and
[62] Lang, P. T.; Brozell, S. R.; Mukherjee, S.; Pettersen, E. F.; Meng, carcinoma. PLoS One, 2012, 7(4), E34460.
E. C.; Thomas, V.; Rizzo, R. C.; Case, D. A.; James, T. L.; Kuntz, [83] Chen, L.; Zeng, W. M.; Cai, Y. D.; Feng, K. Y.; Chou, K. C. Pre-
I. D. DOCK 6: combining techniques to model RNA-small mole- dicting Anatomical Therapeutic Chemical (ATC) classification of
cule complexes. RNA, 2009, 15(6), 1219–1230. drugs by integrating chemical-chemical interactions and similari-
[63] Moustakas, D. T.; Lang, P. T.; Pegg, S.; Pettersen, E.; Kuntz, I. D.; ties. PLoS One, 2012, 7(4), E35254.
Brooijmans, N.; Rizzo, R. C. Development and validation of a [84] Chou, K. C. Structural bioinformatics and its impact to biomedical
modular, extensible docking program: DOCK 5. J. Comput. Aided science. Curr. Med. Chem., 2004, 11(16), 2105–2134.
Mol. Des., 2006, 20, (10–11), 601–619. [85] Xiao, X.; Wu, Z. C.; Chou, K. C. iLoc-Virus: a multi-label learning
[64] Morris, G. M.; Huey, R.; Olson, A. J. Using AutoDock for ligand- classifier for identifying the subcellular localization of virus pro-
receptor docking. Curr. Protoc. Bioinformatics, 2008, CHAPTER teins with both single and multiple sites. J. Theor. Biol., 2011,
8, UNIT 8.14. 284(1), 42–51.
[65] Macindoe, G.; Mavridis, L.; Venkatraman, V.; Devignes, M. D.; [86] Chou, K. C.; Wu, Z. C.; Xiao, X. iLoc-Hum: using the accumula-
Ritchie, D. W. HexServer: an FFT-based protein docking server tion-label scale to predict subcellular locations of human proteins
powered by graphics processors. Nucleic Acids Res., 2010, 38(Web with both single and multiple sites. Mol. Biosyst., 2012, 8(2), 629–
Server issue), W445–449. 641.
[66] Jackson, R. M.; Gabb, H. A.; Sternberg, M. J. Rapid refinement of [87] Wu, Z. C.; Xiao, X.; Chou, K. C. iLoc-Plant: a multi-label classi-
protein interfaces incorporating solvation: application to the dock- fier for predicting the subcellular localization of plant proteins with
ing problem. J. Mol. Biol., 1998, 276(1), 265–285. both single and multiple sites. Mol. Biosyst., 2011, 7(12), 3287–
[67] Horvath, D. Pharmacophore-based virtual screening. Methods Mol. 3297.
Biol., 2011, 672, 261–298. [88] Wu, Z. C.; Xiao, X.; Chou, K. C. iLoc-Gpos: a multi-layer classi-
[68] Spitzer, G. M.; Heiss, M.; Mangold, M.; Markt, P.; Kirchmair, J.; fier for predicting the subcellular localization of singleplex and
Wolber, G.; Liedl, K. R. One concept, three implementations of 3D multiplex Gram-positive bacterial proteins. Protein Pept. Lett.,
pharmacophore-based virtual screening: distinct coverage of 2012, 19(1), 4–14.
chemical search space. J. Chem. Inf. Model., 2010, 50(7), 1241– [89] Chou, K. C.; Wu, Z. C.; Xiao, X. iLoc-Euk: a multi-label classifier
1247. for predicting the subcellular localization of singleplex and multi-
[69] Liu, X.; Ouyang, S.; Yu, B.; Liu, Y.; Huang, K.; Gong, J.; Zheng, plex eukaryotic proteins. PLoS One, 2011, 6(3), E18258.
S.; Li, Z.; Li, H.; Jiang, H. PharmMapper server: a web server for [90] Chou, K. C.; Shen, H. B. Cell-PLoc: a package of Web servers for
potential drug target identification using pharmacophore mapping predicting subcellular localization of proteins in various organisms.
approach. Nucleic Acids Res., 2010, 38(Web Server issue), W609– Nature Protoc., 2008, 3(2), 153–162.
614. [91] Wang, P.; Hu, L.; Liu, G.; Jiang, N.; Chen, X.; Xu, J.; Zheng, W.;
[70] Schneidman-Duhovny, D.; Dror, O.; Inbar, Y.; Nussinov, R.; Wolf- Li, L.; Tan, M.; Chen, Z.; Song, H.; Cai, Y. D.; Chou, K. C. Predic-
son, H. J. PharmaGist: a webserver for ligand-based pharma- tion of antimicrobial peptides based on sequence alignment and
cophore detection. Nucleic Acids Res., 2008, 36(Web Server issue), feature selection methods. PLoS One, 2011, 6(4), E18476.
W223–228. [92] Shen, H. B.; Chou, K. C. HIVcleave: a web-server for predicting
[71] Koes, D. R.; Camacho, C. J. Pharmer: efficient and exact pharma- human immunodeficiency virus protease cleavage sites in proteins.
cophore search. J. Chem. Inf. Model., 2011, 51(6), 1307–1314. Anal. Biochem., 2008, 375(2), 388–390.
[72] He, Z.; Zhang, J.; Shi, X. H.; Hu, L. L.; Kong, X.; Cai, Y. D.; [93] Chou, K. C. A vectorized sequence-coupling model for predicting
Chou, K. C. Predicting drug-target interaction networks based on HIV protease cleavage sites in proteins. J. Biol. Chem., 1993,
functional groups and biological features. PLoS One, 2010, 5(3), 268(23), 16938–16948.
E9603. [94] Chou, K. C.; Shen, H. B. ProtIdent: a web server for identifying
[73] Huang, T.; Zhang, J.; Xu, Z. P.; Hu, L. L.; Chen, L.; Shao, J. L.; proteases and their types by fusing functional domain and sequen-
Zhang, L.; Kong, X. Y.; Cai, Y. D.; Chou, K. C. Deciphering the tial evolution information. Biochem. Biophys. Res. Commun., 2008,
effects of gene deletion on yeast longevity using network and ma- 376(2), 321–325.
chine learning approaches. Biochimie, 2012, 94(4), 1017–1025. [95] Zheng, L. L.; Li, Y. X.; Ding, J.; Guo, X. K.; Feng, K. Y.; Wang,
[74] Wang, J. F.; Chou, K. C. Insights into the mutation-induced HHH Y. J.; Hu, L. L.; Cai, Y. D.; Hao, P.; Chou, K. C. A comparison of
syndrome from modeling human mitochondrial ornithine trans- computational methods for identifying virulence factors. PLoS
porter-1. PLoS One, 2012, 7(1), E31048. One, 2012, 7(8), E42517.
[96] Xiao, X.; Wang, P.; Chou, K. C. GPCR-2L: predicting G protein-
coupled receptors and their types by hybridizing two different
modes of pseudo amino acid compositions. Mol. Biosyst., 2011, [125] Liu, K.; Feng, J.; Young, S. S. PowerMV: a software environment
7(3), 911–919. for molecular viewing, descriptor generation, data analysis and hit
[97] Xiao, X.; Wang, P.; Chou, K. C. GPCR-CA: A cellular automaton evaluation. J. Chem. Inf. Model., 45(2), 515–522.
image approach for predicting G-protein-coupled receptor func- [126] He, Y.; Liew, C. Y.; Sharma, N.; Woo, S. K.; Chau, Y. T.; Yap, C.
tional classes. J. Comput. Chem., 2009, 30(9), 1414–1423. W. PaDEL-DDPredictor: Open-source software for PD-PK-T pre-
[98] Lin, W. Z.; Xiao, X.; Chou, K. C. GPCR-GIA: a web-server for diction. J. Comput. Chem., 2012.
identifying G-protein coupled receptors and their families with grey [127] Truszkowski, A.; Jayaseelan, K. V.; Neumann, S.; Willighagen, E.
incidence analysis. Protein Eng. Des. Sel., 2009, 22(11), 699–705. L.; Zielesny, A.; Steinbeck, C. New developments on the chemin-
[99] Bhasin, M.; Raghava, G. P. S. GPCRsclass: a web tool for the formatics open workflow environment CDK-Taverna. J. Chemin-
classification of amine type of G-protein-coupled receptors. Nu- form., 2011, 3, 54.
cleic Acids Res., 2005, 33(Web Server issue), W143–147. [128] Li, Z. R.; Han, L. Y.; Xue, Y.; Yap, C. W.; Li, H.; Jiang, L.; Chen,
[100] Chou, K. C. Prediction of G-protein-coupled receptor classes. J. Y. Z. MODEL-molecular descriptor lab: a web-based server for
Proteome Res., 2005, 4(4), 1413–1418. computing structural and physicochemical features of compounds.
[101] Elrod, D. W.; Chou, K. C. A study on the correlation of G-protein- Biotechnol. Bioeng., 2007, 97(2), 389–396.
coupled receptor types with amino acid composition. Protein Eng., [129] Hattori, M.; Tanaka, N.; Kanehisa, M.; Goto, S. SIM-
2002, 15(9), 713–715. COMP/SUBCOMP: chemical structure search servers for network
[102] Chou, K. C.; Elrod, D. W. Bioinformatical analysis of G-protein- analyses. Nucleic Acids Res., 2010, 38(Web Server issue), W652–
coupled receptors. J. Proteome Res., 1(5), 429–433. 656.
[103] Overington, J. P.; Al-Lazikani, B.; Hopkins, A. L. How many drug [130] Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thorn-
targets are there? Nat. Rev. Drug Discov., 2006, 5(12), 993–996. ton, J. M. Small Molecule Subgraph Detector (SMSD) toolkit. J.
[104] Veselovsky, A. V; Ivanov, A. S. Strategy of computer-aided drug Cheminform., 2009, 1(1), 12.
design. Curr. Drug Targets Infect. Disord., 2003, 3(1), 33–40. [131] Trott, O.; Olson, A. J. AutoDock Vina: improving the speed and
[105] Liguory, C.; Coffin, J. C. Oral choledocoscopy after endoscopic accuracy of docking with a new scoring function, efficient optimi-
sphincterotomy. Nouv. Presse Med., 1979, 8(2), 136. zation, and multithreading. J. Comput. Chem., 2010, 31(2), 455–
[106] Kalyaanamoorthy, S.; Chen, Y. P. P. Structure-based drug design 461.
to augment hit discovery. Drug Discov. Today, 2011, 16, (17–18), [132] Dominguez, C.; Boelens, R.; Bonvin, A. M. J. J. HADDOCK: a
831–839. protein-protein docking approach based on biochemical or bio-
[107] Klebe, G. Recent developments in structure-based drug design. J. physical information. J. Am. Chem. Soc., 2003, 125(7), 1731–1737.
Mol. Med. (Berl), 2000, 78(5), 269–281. [133] Koes, D. R.; Camacho, C. J. ZINCPharmer: pharmacophore search
[108] Ferenczy, G. Structure-based drug design. Acta Pharm. Hung., of the ZINC database. Nucleic Acids Res., 2012, 40(Web Server is-
1998, 68(1), 21–31. sue), W409–414.
[109] Singla, D.; Anurag, M.; Dash, D.; Raghava, G. P. S. A web server [134] Patlewicz, G.; Jeliazkova, N.; Safford, R. J.; Worth, A. P.;
for predicting inhibitors against bacterial target GlmU protein. Aleksiev, B. An evaluation of the implementation of the Cramer
BMC Pharmacol., 2011, 11, 5. classification scheme in the Toxtree software. SAR QSAR Environ.
[110] Garg, A.; Tewari, R.; Raghava, G. P. S. KiDoQ: using docking Res., 2008, 19, (5–6), 495–524.
based energy scores to develop ligand based model for predicting [135] Lagunin, A.; Stepanchikova, A.; Filimonov, D.; Poroikov, V.
antibacterials. BMC Bioinformatics, 2010, 11, 125. PASS: prediction of activity spectra for biologically active sub-
[111] Acharya, C.; Coop, A.; Polli, J. E.; Mackerell, A. D. Recent ad- stances. Bioinformatics, 2000, 16(8), 747–748.
vances in ligand-based drug design: relevance and utility of the [136] Cheng, F.; Li, W.; Zhou, Y.; Shen, J.; Wu, Z.; Liu, G.; Lee, P. W.;
conformationally sampled pharmacophore approach. Curr. Com- Tang, Y. admetSAR: A Comprehensive Source and Free Tool for
put. Aided Drug Des., 2011, 7(1), 10–22. Assessment of Chemical ADMET Properties. J. Chem. Inf. Model.,
[112] Bacilieri, M.; Moro, S. Ligand-based drug design methodologies in 2012.
drug discovery process: an overview. Curr. Drug Discov. Technol., [137] Lagorce, D.; Sperandio, O.; Galons, H.; Miteva, M. A.; Villoutreix,
2006, 3(3), 155–165. B. O. FAF-Drugs2: free ADME/tox filtering tool to assist drug dis-
[113] Dean, P. M.; Lloyd, D. G.; Todorov, N. P. De novo drug design: covery and chemical biology projects. BMC Bioinformatics, 2008,
integration of structure-based and ligand-based methods. Curr. 9, 396.
Opin. Drug Discov. Devel., 2004, 7(3), 347–353. [138] García-Sosa, A. T.; Oja, M.; Hetényi, C.; Maran, U. DrugLogit:
[114] Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. logistic discrimination between drugs and nondrugs including dis-
Biol., 2011, 672, 299–323. ease-specificity by assigning probabilities based on molecular
[115] Dean, P. M. Chemical genomics: a challenge for de novo drug properties. J. Chem. Inf. Model., 2012, 52(8), 2165–2180.
design. Mol. Biotechnol., 2007, 37(3), 237–245. [139] Cruciani, G.; Carosati, E.; De Boeck, B.; Ethirajulu, K.; Mackie,
[116] Schneider, G.; Fechner, U. Computer-based de novo design of C.; Howe, T.; Vianello, R. MetaSite: understanding metabolism in
drug-like molecules. Nat. Rev. Drug Discov., 2005, 4(8), 649–663. human cytochromes from the perspective of the chemist. J. Med.
[117] Ihlenfeldt, W. D.; Bolton, E. E.; Bryant, S. H. The PubChem Chem., 2005, 48(22), 6970–6979.
chemical structure sketcher. J. Cheminform., 2009, 1(1), 20. [140] Liu, R.; Liu, J.; Tawa, G.; Wallqvist, A. 2D SMARTCyp reactivity-
[118] Schüller, A.; Schneider, G.; Byvatov, E. SMILIB: Rapid Assembly based site of metabolism prediction for major drug-metabolizing
of Combinatorial Libraries in SMILES Notation. QSAR Comb. Sci., cytochrome P450 enzymes. J. Chem. Inf. Model., 2012, 52(6),
2003, 22(7), 719–721. 1698–1712.
[119] Truchon, J. F. GLARE: A tool for product-oriented design of com- [141] Mishra, N. K.; Agarwal, S.; Raghava, G. P. Prediction of cyto-
binatorial libraries. Methods Mol. Biol., 2011, 685, 337–346. chrome P450 isoform responsible for metabolizing a drug mole-
[120] Lam, T. H.; Bernardo, P. H.; Chai, C. L. L.; Tong, J. C. CLEVER: cule. BMC Pharmacol., 2010, 10, 8.
A general design tool for combinatorial libraries. Methods Mol. [142] Garg, A.; Bhasin, M.; Raghava, G. P. S. Support vector machine-
Biol., 2011, 685, 347–356. based method for subcellular localization of human proteins using
[121] Tschinke, V.; Cohen, N. C. The NEWLEAD program: a new amino acid compositions, their order, and similarity search. J. Biol.
method for the design of candidate structures from pharmacophoric Chem., 2005, 280(15), 14427–14432.
hypotheses. J. Med. Chem., 1993, 36(24), 3863–3870. [143] Kumar, M.; Verma, R.; Raghava, G. P. S. Prediction of mitochon-
[122] Junker, J. Statistical filtering for NMR based structure generation. drial proteins using support vector machine and hidden Markov
J. Cheminform., 2011, 3(1), 31. model. J. Biol. Chem., 2006, 281(9), 5357–5363.
[123] Liu, X.; Bai, F.; Ouyang, S.; Wang, X.; Li, H.; Jiang, H. Cyndi: a [144] Xiao, X.; Wu, Z. C.; Chou, K. C. A multi-label classifier for pre-
multi-objective evolution algorithm based method for bioactive dicting the subcellular localization of gram-negative bacterial pro-
molecular conformational generation. BMC Bioinformatics, 2009, teins with both single and multiple sites. PloS one, 2011, 6(6),
10, 101. E20592.
[124] Tosco, P.; Balle, T.; Shiri, F. SDF2XYZ2SDF: how to exploit [145] Bhasin, M.; Garg, A.; Raghava, G. P. S. PSLpred: prediction of
TINKER power in cheminformatics projects. J. Mol. Model., 2011, subcellular localization of bacterial proteins. Bioinformatics, 2005,
17(11), 3021–3023. 21(10), 2522–2524.
[146] Lin, W. Z.; Fang, J. A.; Xiao, X.; Chou, K. C. iDNA-Prot: identifi- [163] Bhasin, M.; Raghava, G. P. S. Classification of nuclear receptors
cation of DNA binding proteins using random forest with grey based on amino acid composition and dipeptide composition. J.
model. PLoS One, 2011, 6(9), E24756. Biol. Chem., 2004, 279(22), 23262–23266.
[147] Lin, W. Z.; Fang, J. A.; Xiao, X.; Chou, K. C. Predicting Secretory [164] Wang, P.; Xiao, X.; Chou, K. C. NR-2L: a two-level predictor for
Proteins of Malaria Parasite by Incorporating Sequence Evolution identifying nuclear receptor subfamilies based on sequence-derived
Information into Pseudo Amino Acid Composition via Grey Sys- features. PLoS One, 2011, 6(8), E23505.
tem Model. PLoS One, 2012, 7(11), E49040. [165] Xiao, X.; Wang, P.; Chou, K. C. iNR-PhysChem: a sequence-based
[148] Chou, K. C.; Shen, H. B. Signal-CF: a subsite-coupled and win- predictor for identifying nuclear receptors and their subfamilies via
dow-fusing approach for predicting signal peptides. Biochem. Bio- physical-chemical property matrix. PLoS One, 2012, 7(2), E30869.
phys. Res. Commun., 2007, 357(3), 633–640. [166] Chen, W.; Lin, H.; Feng, P. M.; Ding, C.; Zuo, Y. C.; Chou, K. C.
[149] Mishra, N. K.; Kumar, M.; Raghava, G. P. S. Support vector ma- iNuc-PhysChem: a sequence-based predictor for identifying nu-
chine based prediction of glutathione S-transferase proteins. Pro- cleosomes via physicochemical properties. PLoS One, 2012, 7(10),
tein Pept. Lett., 2007, 14(6), 575–580. E47843.
[150] Chen, K.; Mizianty, M. J.; Kurgan, L. ATPsite: sequence-based [167] Garg, A.; Raghava, G. P. S. A machine learning based method for
prediction of ATP-binding residues. Proteome Sci., 2011, 9 SUPPL the prediction of secretory proteins using amino acid composition,
1, S4. their order and similarity-search. In silico Biol., 2008, 8(2), 129–
[151] Chauhan, J. S.; Mishra, N. K.; Raghava, G. P. S. Prediction of GTP 140.
interacting residues, dipeptides and tripeptides in a protein from its [168] Liu, Z.; Ren, J.; Cao, J.; He, J.; Yao, X.; Jin, C.; Xue, Y. System-
evolutionary information. BMC Bioinformatics, 2010, 11, 301. atic analysis of the Plk-mediated phosphoregulation in eukaryotes.
[152] Mishra, N. K.; Raghava, G. P. S. Prediction of FAD interacting Brief. Bioinform, 2012, BBS041–.
residues in a protein from its primary sequence using evolutionary [169] Chu, W. Y.; Huang, Y. F.; Huang, C. C.; Cheng, Y. S.; Huang, C.
information. BMC Bioinformatics, 2010, 11 SUPPL 1, S48. K.; Oyang, Y. J. ProteDNA: a sequence-based predictor of se-
[153] Ansari, H. R.; Raghava, G. P. S. Identification of NAD interacting quence-specific DNA-binding residues in transcription factors. Nu-
residues in proteins. BMC Bioinformatics, 2010, 11, 160. cleic Acids Res., 2009, 37(Web Server issue), W396–401.
[154] Chauhan, J. S.; Bhat, A. H.; Raghava, G. P. S.; Rao, A. GlycoPP: a [170] Gabdoulline, R.; Eckweiler, D.; Kel, A.; Stegmaier, P. 3DTF: a
webserver for prediction of N- and O-glycosites in prokaryotic pro- web server for predicting transcription factor PWMs using 3D
tein sequences. PLoS One, 2012, 7(7), E40155. structure-based energy calculations. Nucleic Acids Res., 2012,
[155] Julenius, K.; Mølgaard, A.; Gupta, R.; Brunak, S. Prediction, con- 40(Web Server issue), W180–185.
servation analysis, and structural characterization of mammalian [171] Schug, J. Using TESS to predict transcription factor binding sites in
mucin-type O-glycosylation sites. Glycobiology, 2005, 15(2), 153– DNA sequence. Curr. Protoc. Bioinformatics, 2008, CHAPTER 2,
164. UNIT 2.6.
[156] Monigatti, F.; Gasteiger, E.; Bairoch, A.; Jung, E. The Sulfinator: [172] Hiard, S.; Marée, R.; Colson, S.; Hoskisson, P. A.; Titgemeyer, F.;
predicting tyrosine sulfation sites in protein sequences. Bioinfor- Van Wezel, G. P.; Joris, B.; Wehenkel, L.; Rigali, S. PREDetector:
matics, 2002, 18(5), 769–770. a new tool to identify regulatory elements in bacterial genomes.
[157] Chang, W. C.; Lee, T. Y.; Shien, D. M.; Hsu, J. B. K.; Horng, J. T.; Biochem. Biophys. Res. Commun., 2007, 357(4), 861–864.
Hsu, P. C.; Wang, T. Y.; Huang, H. D.; Pan, R. L. Incorporating [173] Messeguer, X.; Escudero, R.; Farré, D.; Núñez, O.; Martínez, J.;
support vector machine for identifying protein tyrosine sulfation Albà, M. M. PROMO: detection of known transcription regulatory
sites. J. Comput. Chem., 2009, 30(15), 2526–2537. elements using species-tailored searches. Bioinformatics, 2002,
[158] Xue, Y.; Zhou, F.; Fu, C.; Xu, Y.; Yao, X. SUMOsp: a web server 18(2), 333–334.
for sumoylation site prediction. Nucleic Acids Res., 2006, 34(Web [174] Kumar, R.; Panwar, B.; Chauhan, J. S.; Raghava, G. P. Analysis
Server issue), W254–257. and prediction of cancerlectins using evolutionary and domain in-
[159] Wong, Y. H.; Lee, T. Y.; Liang, H. K.; Huang, C. M.; Wang, T. Y.; formation. BMC Res. Notes, 2011, 4, 237.
Yang, Y. H.; Chu, C. H.; Huang, H. D.; Ko, M. T.; Hwang, J. K. [175] Jenwitheesuk, E.; Wang, K.; Mittler, J. E.; Samudrala, R. PIR-
KinasePhos 2.0: a web server for identifying protein kinase- Spred: a web server for reliable HIV-1 protein-inhibitor resis-
specific phosphorylation sites based on sequences and coupling tance/susceptibility prediction. Trends Microbiol., 2005, 13(4),
patterns. Nucleic Acids Res., 2007, 35(Web Server issue), W588– 150–151.
594. [176] Yuan, Y.; Pei, J.; Lai, L. LigBuilder 2: A Practical de novo Drug
[160] Blom, N.; Sicheritz-Pontén, T.; Gupta, R.; Gammeltoft, S.; Brunak, Design Approach. J. Chem. Inf. Model., 2011.
S. Prediction of post-translational glycosylation and phosphoryla- [177] Vlachakis, D.; Tsagrasoulis, D.; Megalooikonomou, V.; Kossida,
tion of proteins from the amino acid sequence. Proteomics, 2004, S. Introducing Drugster: a comprehensive and fully integrated drug
4(6), 1633–1649. design, lead and structure optimization toolkit. Bioinformatics,
[161] Saha, S.; Zack, J.; Singh, B.; Raghava, G. P. S. VGIchan: predic- 2012.
tion and classification of voltage-gated ion channels. Genomics [178] Douguet, D. e-LEA3D: a computational-aided drug design web
Proteomics Bioinformatics, 2006, 4(4), 253–258. server. Nucleic Acids Res., 2010, 38(Web Server issue), W615–
[162] Saha, S.; Raghava, G. P. S. VICMpred: an SVM-based method for 621.
the prediction of functional proteins of Gram-negative bacteria us- [179] Singla, D.; Tewari, R.; Kumar, A.; Raghava, G. P. Designing of
ing amino acid patterns and composition. Genomics Proteomics inhibitors against drug tolerant Mycobacterium tuberculosis
Bioinformatics, 2006, 4(1), 42–47. (H37Rv). Chem. Cent. J., 2013, 7(1), 49.
Received: December 12, 2012 Revised: January 20, 2013 Accepted: February 08, 2013