In Silico
The Journal of Engineering and Exact Sciences – JCEC, Vol. 05 N. 01 (2019)
journal homepage: https://periodicos.ufv.br/ojs/jcec
doi: 10.18540/jcecvl5iss1pp0049-0062
OPEN ACCESS – ISSN: 2527-1075
ABSTRACT
A quantitative structure-activity relationship (QSAR) study was carried out on a data set of 43 heterocyclic and phenyl inhibitor compounds to establish a correlation between the inhibitory concentrations of these compounds and their structures. The density functional theory (DFT) optimization method was used to minimize the energy of the 3D structures with Becke's three-parameter hybrid functional (B3) combined with the Lee, Yang and Parr correlation functional (LYP), commonly called the B3LYP hybrid functional, and the 6-31G* basis set (B3LYP/6-31G*), in order to obtain their quantum molecular descriptors. Five QSAR models were generated with the genetic function approximation (GFA) technique. Among the five models generated, model 1 was selected as the best model because of its statistical significance (Friedman's LOF = 0.3008, R2 = 0.9784, R2adj = 0.9739, Q2cv = 0.9675 and R2pred = 0.7348). The selected model was rigorously evaluated by leave-one-out cross-validation (LOO-CV), external validation on the test set compounds, the Y-randomization test and the applicability domain (Williams plot). The proposed QSAR model was highly predictive and robust, with good validation parameters. The molecular descriptors
JCEC - ISSN 2527-1075.
The molecular descriptor is a mathematical value that describes a property of a molecule and is obtained from a well-defined algorithm or an experimental procedure (Olasupo et al., 2017). Quantum chemical descriptors were calculated with the quantum chemistry software Spartan '14 version 1.1.2 (Abdulfatai et al., 2017). Descriptors of various dimensions (1D, 2D and 3D) were calculated with the PaDEL-Descriptor software version 2.18. In total, 1875 molecular descriptors (1444 1D and 2D descriptors and 431 3D descriptors) were generated with the PaDEL-Descriptor and Spartan '14 software (Arthur et al., 2016).

The descriptors obtained from the PaDEL software were normalized by a procedure that uses their range (maximum and minimum values) together with their dispersion (standard deviation or variance). The descriptors were thus scaled before the data were transformed into a normalized distribution. This procedure makes the correlations between descriptors less redundant (Panchal et al., 2013).

The mean effect (ME) of each descriptor was calculated as:

$$ME_j = \frac{\beta_j \sum_{i=1}^{n} D_{ij}}{\sum_{j=1}^{m} \left(\beta_j \sum_{i=1}^{n} D_{ij}\right)}$$

where $ME_j$ is the mean effect of descriptor j in the developed QSAR model, $\beta_j$ is the coefficient of descriptor j, $D_{ij}$ is the value of descriptor j for molecule i of the training set, m is the number of descriptors in the model and n is the number of training set compounds (Minovski et al., 2013).

The impact of each selected descriptor on the model is measured by its standardized regression coefficient ($b_j^{s}$). The standardized regression coefficients can be calculated using Equation 2:

$$b_j^{s} = \frac{\beta_j \, s_j}{s_Y} \qquad (2)$$

where $\beta_j$ is the regression coefficient of descriptor j, $s_j$ is the standard deviation of descriptor j and $s_Y$ is the standard deviation of the activity.
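Both descriptor-importance measures above can be computed directly from the training data. The sketch below is a minimal illustration, assuming the descriptors are held in an n × m NumPy array and that the model coefficients (intercept excluded) are already known; the function names and the toy numbers are hypothetical and are not the published model.

```python
import numpy as np


def mean_effect(beta, D):
    """Mean effect ME_j of each descriptor.

    beta : m regression coefficients of the model (intercept excluded)
    D    : n x m matrix of descriptor values for the n training compounds
    """
    beta = np.asarray(beta, dtype=float)
    D = np.asarray(D, dtype=float)
    weighted = beta * D.sum(axis=0)   # beta_j * sum_i D_ij for each descriptor j
    return weighted / weighted.sum()  # divide by the sum over all m descriptors


def standardized_coefficients(beta, D, y):
    """Standardized regression coefficients b_j^s = beta_j * s_j / s_Y (Equation 2)."""
    s_j = np.asarray(D, dtype=float).std(axis=0, ddof=1)  # SD of each descriptor
    s_y = np.asarray(y, dtype=float).std(ddof=1)          # SD of the activity
    return np.asarray(beta, dtype=float) * s_j / s_y


# Hypothetical toy example: 3 compounds, 2 descriptors
beta = [0.5, -0.2]
D = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
y = [1.0, 2.0, 3.0]
print(mean_effect(beta, D))  # the ME values add up to 1 by construction
print(standardized_coefficients(beta, D, y))
```

Because each ME value is divided by the sum over all descriptors, a descriptor with a negative coefficient receives a negative ME, and the formula is undefined if the weighted terms happen to cancel to zero.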
… corresponding coefficient value of each descriptor in the regression equation and its numerical sign (Roy et al., 2015). Each regression coefficient should also be significant at the 95% probability level (P < 0.05), that is, within the 95% confidence limit, which can be confirmed with a Student's t-test. The ratio of the number of compounds to the number of descriptors must be at least 5:1. An MLR model that fits the normalized data well gives a scatter plot of experimental versus predicted inhibition concentration in which the data points deviate little from the line of fit (Figure 1).

2.10 Evaluation of the QSAR Models

The developed QSAR models were evaluated with the following statistical parameters: N (number of compounds in the regression), P (number of descriptors), R2 (squared correlation coefficient), F-test (Fisher's value), Q2CV (cross-validation correlation coefficient) and R2pred (squared correlation coefficient for the external test set). The regression coefficient R2 and the cross-validation correlation coefficient Q2CV are the two most important factors to consider when accepting a validated QSAR model (Arthur et al., 2016). A model is considered significant only if it fulfils the following conditions: R2 > 0.6, Q2CV > 0.6 and R2pred > 0.5. Taking these statistical parameters into account, the model with the highest Q2CV (cross-validation correlation coefficient) and R2 (correlation coefficient for the training set) was chosen as the best model.

2.11 Internal Validation of the QSAR Model

Internal validation is the first step in validating a QSAR model. Good internal validation results indicate that the model has a high level of stability and reliability (Abdulfatai et al., 2016). The squared correlation coefficient (R2) is the fraction of the total variance explained by the model; the closer R2 is to 1.0, the better the model. R2 is one of the most commonly used internal validation parameters and can be calculated using Equation 4:

$$R^2 = 1 - \frac{\sum \left(Y_{obs} - Y_{pred}\right)^2}{\sum \left(Y_{obs} - \bar{Y}_{training}\right)^2} \qquad (4)$$

where $Y_{obs}$, $Y_{pred}$ and $\bar{Y}_{training}$ are the observed, predicted and mean inhibition concentrations of the compounds in the training set (Alho et al., 2010). Because R2 increases as descriptors are added to the model, R2 alone cannot be relied on when developing the model; it needs to be adjusted for the number of descriptors used. The adjusted R2 can be calculated using Equation 5:

$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1} = \frac{(n - 1)R^2 - p}{n - p - 1} \qquad (5)$$

where n is the number of training set compounds and p is the number of descriptors in the model.

The predictive power of the QSAR model was also assessed with Friedman's lack of fit (LOF), one of several criteria for internal validation. Friedman's LOF was calculated using Equation 6:

$$LOF = \frac{SEE}{\left(1 - \frac{c + dp}{N}\right)^2} \qquad (6)$$

where SEE is the standard error of estimation, also known as the standard deviation (SD), p is the number of independent variables in the model, d is a user-defined smoothing parameter, c is the number of terms in the model and N is the number of compounds in the training set. A model is considered robust if it has a small SEE value. SEE can be calculated using Equation 7:

$$SEE = \sqrt{\frac{\sum \left(Y_{exp} - Y_{pred}\right)^2}{N - P - 1}} \qquad (7)$$

where $Y_{exp}$ and $Y_{pred}$ are the experimental and predicted inhibition concentrations of the compounds in the training set, N is the number of training compounds and P is the number of descriptors in the model (Jalali et al., 2004). Another highly important factor when assessing the internal validation of a QSAR model is the leave-one-out (LOO) cross-validation coefficient. The cross-validation regression coefficient (Q2CV) can be calculated using Equation 8:

$$Q^2_{cv} = 1 - \frac{\sum \left(Y_{pred} - Y_{exp}\right)^2}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^2} \qquad (8)$$

where $Y_{exp}$, $Y_{pred}$ and $\bar{Y}_{training}$ are the experimental, predicted and mean inhibition concentration values of the training set compounds (Jalali et al., 2004); in LOO cross-validation, $Y_{pred}$ for each compound is obtained from a model fitted without that compound.

2.12 External Validation of the QSAR Model

The developed QSAR model was externally validated to confirm its robustness. The external validation was based on the R2 value for the compounds in the test set: the external predictive strength and extrapolation ability of the model were calculated with the regression coefficient given by Equation 9:

$$R^2_{pred} = 1 - \frac{\sum \left(Y_{pred(test)} - Y_{exp(test)}\right)^2}{\sum \left(Y_{exp(test)} - \bar{Y}_{training}\right)^2} \qquad (9)$$

where $Y_{pred(test)}$ and $Y_{exp(test)}$ are the predicted and experimental inhibition concentrations of the compounds in the test set and $\bar{Y}_{training}$ is the mean experimental inhibition concentration of the training set compounds (Tropsha et al., 2003). In addition, the predictive capacity of the QSAR model was also calculated.
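The statistics of Equations 4 to 9 can be sketched in a few lines of NumPy. The block below is a minimal illustration assuming an ordinary least-squares MLR with an intercept; the helper names, the toy data and the choices c = p + 1 and d = 1.0 for the LOF are assumptions for illustration, not the GFA implementation used by the authors.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def r2(y_exp, y_pred, y_mean):
    """1 - SS_res / SS_tot about y_mean (form shared by Equations 4, 8 and 9)."""
    y_exp = np.asarray(y_exp, dtype=float)
    return 1.0 - np.sum((y_exp - y_pred) ** 2) / np.sum((y_exp - y_mean) ** 2)


def r2_adj(r2_value, n, p):
    """Adjusted R^2 (Equation 5)."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)


def see(y_exp, y_pred, p):
    """Standard error of estimation (Equation 7)."""
    y_exp = np.asarray(y_exp, dtype=float)
    return np.sqrt(np.sum((y_exp - y_pred) ** 2) / (len(y_exp) - p - 1))


def friedman_lof(see_value, n, p, c, d):
    """Friedman's lack of fit (Equation 6); c and d are chosen by the user."""
    return see_value / (1.0 - (c + d * p) / n) ** 2


def q2_loo(X, y):
    """Leave-one-out Q^2 (Equation 8): each compound is predicted by a model fitted without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_loo = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        y_loo[i] = predict(fit_mlr(X[keep], y[keep]), X[i])
    return r2(y, y_loo, y.mean())


# Hypothetical toy data (not the herbicide data set): 8 training and 3 test compounds
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 5, size=(8, 2))
y_train = 1.0 + 0.8 * X_train[:, 0] - 0.3 * X_train[:, 1] + rng.normal(0, 0.1, 8)
X_test = rng.uniform(0, 5, size=(3, 2))
y_test = 1.0 + 0.8 * X_test[:, 0] - 0.3 * X_test[:, 1]

coef = fit_mlr(X_train, y_train)
y_fit = predict(coef, X_train)
n, p = X_train.shape
r2_train = r2(y_train, y_fit, y_train.mean())
see_train = see(y_train, y_fit, p)
print("R2     ", r2_train)
print("R2adj  ", r2_adj(r2_train, n, p))
print("SEE    ", see_train)
print("LOF    ", friedman_lof(see_train, n, p, c=p + 1, d=1.0))  # c and d: assumed values
print("Q2 LOO ", q2_loo(X_train, y_train))
print("R2pred ", r2(y_test, predict(coef, X_test), y_train.mean()))
```

As a consistency check, r2_adj(0.9784, 30, 5) gives 0.9739, the R2adj reported in the abstract, if the training set holds 30 compounds (43 compounds minus the 13 test compounds of Table 5) and the model uses 5 descriptors. Note that R2pred divides by the spread of the test activities around the training mean, as in Equation 9, not around the test mean.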
2.13 Y-randomization test

The Y-randomization test is an important criterion for the external validation of the developed QSAR model. To ensure that the QSAR model is genuinely predictive and was not obtained by chance, the Y-randomization test was applied to the training set compounds, as suggested by Tropsha (Tropsha et al., 2003): the activity values were randomly shuffled and new QSAR models were developed using the same arrangement of variables as in the unrandomized model. For a model to pass the Y-randomization test, cR2p should be greater than 0.5 (cR2p > 0.5). cR2p can be calculated using Equation 13:

$$cR^2_p = R \times \left[R^2 - \left(R_r\right)^2\right]^{1/2} \qquad (13)$$

where cR2p is the coefficient of determination for the Y-randomization test, R and R2 are the correlation coefficient and squared correlation coefficient of the original (non-randomized) model, and $R_r$ is the average correlation coefficient of the random models (Tropsha et al., 2003).

2.14 Evaluation of the Applicability Domain of the Model

The developed QSAR model was further evaluated by determining its applicability domain with the leverage approach (Williams plot). The leverage of each compound is given by:

$$h_i = X_i \left(X^{T} X\right)^{-1} X_i^{T} \quad (i = 1, \ldots, n) \qquad (14)$$

where $X_i$ is the descriptor vector of compound i and X is the descriptor matrix of the training set. The warning leverage is given by:

$$h^{*} = \frac{3(k + 1)}{n} \qquad (15)$$

where n is the number of compounds in the training set and k is the number of descriptors used in the model.

2.15 Quality assurance of the model

Internal and external validation are the two most important techniques for assessing the stability, robustness, reliability and predictive capacity of a QSAR model. The validation parameters were compared with the recommended standards (Veerasamy et al., 2011). Table 1 outlines the general recommended values of the internal and external validation parameters used to decide whether a model is accepted or rejected.
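The leverage approach of the applicability domain, h_i = x_i(XᵀX)⁻¹x_iᵀ with warning leverage h* = 3(k + 1)/n, can be sketched as follows. This is a minimal illustration assuming the descriptor matrix is augmented with a column of ones for the intercept, so that the average training leverage is (k + 1)/n, which matches the form of h*; the function names and the toy matrix are hypothetical.

```python
import numpy as np


def design(X):
    """Descriptor matrix with a leading column of ones for the intercept (an assumption)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def leverages(X_train, X_query):
    """h_i = x_i (X^T X)^-1 x_i^T for every row x_i of X_query."""
    A = design(X_train)
    Q = design(X_query)
    xtx_inv = np.linalg.inv(A.T @ A)
    # einsum evaluates only the diagonal of Q @ xtx_inv @ Q.T
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(n, k):
    """Warning leverage h* = 3(k + 1)/n."""
    return 3.0 * (k + 1) / n


# With n = 30 training compounds and k = 5 descriptors, h* = 0.6 as annotated in Figure 4
print(warning_leverage(30, 5))  # 0.6

# Hypothetical toy descriptors: flag query compounds that fall outside the domain
X_train = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 7], [7, 6], [8, 9], [9, 8], [10, 10]]
X_query = [[5.0, 5.0], [12.0, 0.0]]
h = leverages(X_train, X_query)
print(h, h > warning_leverage(len(X_train), 2))
```

The value n = 30 is inferred here from the 43 compounds minus the 13 test compounds listed in Table 5. Compounds with a leverage above h* lie outside the structural space spanned by the training set, so their predictions are extrapolations.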
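Table 1- General recommended values of the validation parameters for a QSAR model.
Symbol Parameter Value
R2 Coefficient of determination ≥ 0.6
P(95%) Confidence interval at 95% confidence level < 0.05
Q2cv Cross-validation coefficient ≥ 0.5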
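A minimal sketch of how the statistics reported for model 1 can be screened against the acceptance criteria. The thresholds are the ones stated in the text (R2 > 0.6, Q2CV > 0.6 and R2pred > 0.5 in Section 2.10, cR2p > 0.5 in Section 2.13); the dictionary layout and the function name are hypothetical.

```python
# Acceptance thresholds stated in Sections 2.10 and 2.13 (all strict inequalities)
CRITERIA = {"R2": 0.6, "Q2cv": 0.6, "R2pred": 0.5, "cR2p": 0.5}


def failed_criteria(stats, criteria=CRITERIA):
    """Return the names of the parameters that do not exceed their threshold."""
    return [name for name, limit in criteria.items() if not stats[name] > limit]


# Statistics reported for model 1 (abstract and Y-randomization results)
model_1 = {"R2": 0.9784, "Q2cv": 0.9675, "R2pred": 0.7348, "cR2p": 0.884432}
print(failed_criteria(model_1))  # an empty list means every criterion is met
```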
Table 2- Herbicide IUPAC names, experimental pIC50, predicted pIC50 and residual values of the generated MLR model.
S/N IUPAC Name pLC50Exp pLC50Pred Residual (Pred-Exp)
1 2.96 2.81 -0.15
2
17
37
43 5 5.15 0.15
Table 3- Validation parameters for each model using Genetic Function Approximation (GFA)
S/N Model 1 Model 2 Model 3 Model 4 Model 5 Threshold value
Table 4- Calculated descriptor values for the training set with their experimental and predicted pLC50 values.
Molecules AMR SpMax8_Bhp FPSA-2 MOMI-YZ RDF50m pLC50Exp pLC50Pred
4 77.1585 2.192375 0.971161 1.368516 4.938247 3.04 3.01
5 81.3337 2.246656 0.597721 6.505179 10.84811 5.63 5.50
6 73.6698 1.855351 0.62989 3.441717 9.041087 4.41 4.14
8 77.7131 2.368499 0.990879 1.027995 4.619998 3.26 3.07
9 47.3379 1.147726 0.663697 2.028204 1.060182 0.95 1.29
10 85.9427 2.45754 1.622312 1.648164 11.35794 3.03 2.77
11 61.6751 1.918317 1.000945 2.015574 9.995276 3.04 2.76
12 63.455 2.303188 1.210624 2.76899 5.756426 2.18 2.30
13 75.9645 2.413115 1.603034 3.894527 15.41641 3.32 3.17
14 49.6439 1.400916 0.854616 1.769071 5.600613 1.8 1.64
15 46.9027 0.907671 0.557558 1.631555 10.00544 2.02 2.28
16 18.5507 0.817568 0.413431 2.539778 0.001403 0.52 0.46
18 56.9644 1.303324 0.87168 2.500567 5.616158 1.66 1.88
20 84.3649 2.276436 0.832989 2.848138 9.616859 4.14 4.43
21 98.324 2.699828 1.846134 2.136735 8.547054 2.61 2.66
23 61.3448 1.467804 0.612111 1.200149 2.45737 2.61 2.30
25 71.256 2.257071 0.735835 1.7323 23.1274 5.53 5.56
26 68.3294 2.31343 0.87362 4.708859 4.723874 3.21 3.45
30 133.5519 2.811904 1.9543 3.782769 27.18359 6.4 6.19
32 122.1024 2.768123 1.194735 3.399947 15.81433 5.95 6.22
33 97.112 2.650921 1.659118 1.221423 16.1747 3.6 3.78
34 85.6045 2.349084 0.610816 5.913709 9.17447 5.85 5.45
36 108.0014 2.75524 2.319869 3.399239 19.56198 3.11 3.35
39 91.0298 2.247382 0.490221 4.857742 8.623428 5.34 5.65
42 111.577 2.671396 1.600715 8.795199 20.34854 6.04 5.98
43 92.7797 2.502606 1.116026 2.007745 8.517236 4.33 4.03
Table 5- Calculated descriptor values for the test set with their experimental and predicted pLC50 values.
Molecules AMR SpMax8_Bhp FPSA-2 MOMI-YZ RDF50m pLC50Exp pLC50Pred
2 53.1163 1.246209 0.547225 2.763915 1.450453 2.51 2.03
3 60.7273 1.479485 0.592166 8.461216 3.651049 3.67 3.45
19 54.3212 1.66715 0.850731 2.593675 7.153605 3.55 2.37
22 91.2674 2.665925 1.074292 2.856846 7.112731 4.98 4.19
24 70.4077 1.840996 0.723742 1.335393 7.309965 3.18 3.30
27 98.6457 2.684169 1.884564 5.179453 10.28315 4.16 3.17
28 115.3564 2.665925 1.619995 2.890615 19.21301 4.2 5.15
29 74.3812 1.977655 0.678168 9.250508 3.035912 4.16 4.26
31 125.5065 2.745364 1.025938 5.845965 24.63727 5.71 8.08
35 63.5926 2.315266 1.754143 4.465842 7.913456 2.04 1.46
37 55.904 1.718534 0.723825 3.15644 3.283551 1.91 2.42
38 81.9939 2.45754 0.880807 2.441991 13.95963 4.98 4.84
41 88.4505 2.476088 1.183455 2.169109 10.96011 4.19 3.98
It was observed from the Y-randomization results (Table 7) that the cR2p value was much higher than 0.5, which signifies that the constructed model was reliable, stable and robust. In addition, the low values of R2 and Q2cv obtained over the random trials show that the built model was powerful and was not obtained by chance.

The descriptions and other related statistical parameters of the selected descriptors are reported in Table 8. The presence of both 2D and 3D descriptors in the model suggests that these types of descriptors have a greater affinity (that is, they increase the inhibition concentration of the compounds). The variance inflation factor (VIF) values of all five descriptors in the model were lower than 7, which indicates that the descriptors were largely independent of each other (low multicollinearity) and supports the validity of the model. The null hypothesis states that there is no significant relationship between the inhibition concentration and the descriptors used to construct the model, tested at P < 0.05. The P-values of the descriptors in the …

Model R R² Q²
Random 2 0.500953 0.250954 -0.13068
Random 3 0.46971 0.220627 -0.16702
Random 4 0.579175 0.335444 -0.09836
Random 5 0.354832 0.125906 -0.29534
Random 6 0.314135 0.098681 -0.47183
Random 7 0.559705 0.31327 -0.13038
Random 8 0.306126 0.093713 -0.41786
Random 9 0.262277 0.068789 -0.70639
Random 10 0.360066 0.129647 -0.54853

Random Models Parameters
Average r: 0.422977
Average r²: 0.191035
Average Q²: -0.31291
cR²p: 0.884432
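The Y-randomization procedure and Equation 13 can be sketched as below. This is a minimal illustration assuming an ordinary least-squares MLR refitted on each shuffled response; the function names, the number of trials and the random seed are hypothetical. Feeding in R² = 0.9784 and the average random r of 0.422977 reproduces the reported cR²p of about 0.8844, which indicates that R_r in Equation 13 is the average correlation coefficient r (not r²) of the random models.

```python
import numpy as np


def y_randomization(X, y, n_trials=10, seed=0):
    """Shuffle the activities, refit an MLR on the same descriptors, return r of each random model."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    rs = []
    for _ in range(n_trials):
        y_perm = rng.permutation(np.asarray(y, dtype=float))
        coef, *_ = np.linalg.lstsq(A, y_perm, rcond=None)
        rs.append(np.corrcoef(y_perm, A @ coef)[0, 1])  # r between shuffled and fitted values
    return np.array(rs)


def c_r2p(r2_model, mean_r_random):
    """cR^2_p = R * sqrt(R^2 - R_r^2) (Equation 13), with R = sqrt(R^2) of the original model."""
    return np.sqrt(r2_model) * np.sqrt(r2_model - mean_r_random ** 2)


# Reported values: R^2 = 0.9784 and average r of the random models = 0.422977
print(round(c_r2p(0.9784, 0.422977), 4))  # 0.8844

# Hypothetical toy data showing the full procedure
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(30, 5))
y = X @ np.array([0.5, -0.3, 0.2, 0.1, 0.4]) + rng.normal(0, 0.2, 30)
A = np.column_stack([np.ones(30), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2_model = np.corrcoef(y, A @ coef)[0, 1] ** 2
r_random = y_randomization(X, y)
print(r_random.mean(), c_r2p(r2_model, r_random.mean()))
```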
Table 8 - List of the descriptors, their description, classes, and their statistical significance.
S/N Descriptor symbol Description Class VIF ME P-value
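The VIF column of Table 8 follows the usual definition VIF_j = 1/(1 − R_j²), where R_j² is obtained by regressing descriptor j on the remaining descriptors. The sketch below assumes that definition with an intercept in each auxiliary regression; the function name and the toy matrix are hypothetical, and the cut-off of 7 used in the text would be applied to the returned values.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of the n x m descriptor matrix X."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    out = np.empty(m)
    for j in range(m):
        target = X[:, j]
        # Regress descriptor j on an intercept plus all the other descriptors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out


# Hypothetical toy descriptors; the third column is nearly a copy of the first
X = [[1.0, 4.0, 1.1], [2.0, 1.0, 2.0], [3.0, 3.0, 2.9], [4.0, 5.0, 4.2], [5.0, 2.0, 5.0]]
print(vif(X))  # the first and third columns should show inflated VIFs
```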
Figure 1- A plot of Predicted (pLC50) versus Experimental (pLC50) of the training set (R² = 0.9784).
Figure 2- A plot of Predicted (pLC50) versus Experimental (pLC50) of the test set (R² = 0.7348).
Figure 3- A plot of Residual versus Experimental (pLC50) of the training and test set.
Figure 4- Williams Plot, a plot of standardized residual versus leverage (warning leverage h* = 0.6; compounds ID-31 and ID-35 are labeled).
… mechanisms of action. Pesticide Biochemistry and Physiology, v.102, n. 3, p. 189-197, 2012.
FUNAR-TIMOFEI, S.; BOROTA, A.; CRISAN, L. Combined molecular docking and QSAR study of fused heterocyclic herbicide inhibitors of D1 protein in photosystem II of plants. Molecular Diversity, v.21, n. 2, p. 437-454, 2017.
GANDY, M. N.; CORRAL, M. G.; MYLNE, J. S.; STUBBS, K. A. An interactive database to explore herbicide physicochemical properties. Organic & Biomolecular Chemistry, v.13, n. 20, p. 5586-5590, 2015.
HANSCH, C.; MUIR, R. M.; FUJITA, T.; MALONEY, P. P.; GEIGER, F.; STREICH, M. The correlation of biological activity of plant growth regulators and chloromycetin derivatives with Hammett constants and partition coefficients. Journal of the American Chemical Society, v.85, n. 18, p. 2817-2824, 1963.
IBRAHIM, M. T.; UZAIRU, A.; SHALLANGWA, G. A.; IBRAHIM, A. In-silico studies of some oxadiazoles derivatives as anti-diabetic compounds. Journal of King Saud University-Science, 2018.
JALALI-HERAVI, M.; KYANI, A. Use of computer-assisted methods for the modeling of the retention time of a variety of volatile organic compounds: a PCA-MLR-ANN approach. Journal of Chemical Information and Computer Sciences, v.44, n. 4, p. 1328-1335, 2004.
KENNARD, R. W.; STONE, L. A. Computer-aided design of experiments. Technometrics, v.11, n. 1, p. 137-148, 1969.
LEE, C.; YANG, W.; PARR, R. G. Becke's three-parameter hybrid method using the LYP. Phys. Rev. B, v.37, p. 785, 1988.
LIU, Y.; ZHAO, H.; WANG, Z.; LI, Y.; SONG, H.; RICHES, H.; BEATTIE, D.; GU, Y.; WANG, Q. The discovery of 3-(1-aminoethylidene)quinoline-2,4(1H,3H)-dione derivatives as novel PSII electron transport inhibitors. Molecular Diversity, v.17, n. 4, p. 701-710, 2013.
MINOVSKI, N.; ŽUPERL, Š.; DRGAN, V.; NOVIČ, M. Assessment of applicability domain for multivariate counter-propagation artificial neural network predictive models by minimum Euclidean distance space analysis: A case study. Analytica Chimica Acta, v.759, p. 28-42, 2013.
OLASUPO, S. B.; UZAIRU, A.; SAGAGI, B. S. Density Functional Theory (B3LYP/6-31G*) Study of Toxicity of Polychlorinated Dibenzofurans. 2017.
PANCHAL, J. H.; KALIDINDI, S. R.; MCDOWELL, D. L. Key computational modeling issues in integrated computational materials engineering. Computer-Aided Design, v.45, n. 1, p. 4-25, 2013.
PFISTER, K.; ARNTZEN, C. J. The mode of action of photosystem II-specific inhibitors in herbicide-resistant weed biotypes. Zeitschrift für Naturforschung C, v.34, n. 11, p. 996-1009, 1979.
PRASAD, R. K.; SHARMA, R. 2D QSAR analysis of pyrazine carboxamide derivatives as an herbicidal agent. Journal of Computational Method & Molecular Design, v.1, p. 7-13, 2011.
RASULEV, B. F.; ABDULLAEV, N. D.; SYROV, V. N.; LESZCZYNSKI, J. A Quantitative Structure-Activity Relationship (QSAR) Study of the Antioxidant Activity of Flavonoids. QSAR & Combinatorial Science, v.24, n. 9, p. 1056-1065, 2005.
ROY, K.; KAR, S.; DAS, R. N. Understanding the basics of QSAR for applications in pharmaceutical sciences and risk assessment. Academic Press, 2015.
SAIDI, A.; MIRZAEI, M. Prediction of AHAS inhibition by sulfonylurea herbicides using a genetic algorithm and artificial neural network. 2016.
SHEN, M.; LETIRAN, A.; XIAO, Y.; GOLBRAIKH, A.; KOHN, H.; TROPSHA, A. Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods. Journal of Medicinal Chemistry, v.45, n. 13, p. 2811-2823, 2002.
TAKAČ, M. J.; MEDIĆ-ŠARIČ, M. QSPR and QSAR in pharmacy. I. Classic QSAR models. Hansch and Free-Wilson model. Farmaceutski glasnik: glasilo Hrvatskog farmaceutskog društva, v.47, n. 6, p. 161-178, 1991.
TROPSHA, A.; GRAMATICA, P.; GOMBAR, V. K. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR & Combinatorial Science, v.22, n. 1, p. 69-77, 2003.
TROYER, J. R. In the beginning: the multiple discoveries of the first hormone herbicides. Weed Science, v.49, n. 2, p. 290-297, 2001.
VEERASAMY, R.; RAJAK, H.; JAIN, A.; SIVADASAN, S.; VARGHESE, C. P.; AGRAWAL, R. K. Validation of QSAR models: strategies and importance. International Journal of Drug Design & Discovery, v.3, p. 511-519, 2011.
VERMA, J.; KHEDKAR, V. M.; COUTINHO, E. C. 3D-QSAR in drug design: a review. Current Topics in Medicinal Chemistry, v.10, n. 1, p. 95-115, 2010.
ZHANG, C.; CHANG, S.; TIAN, X.; TIAN, Y. 3D-QSAR and docking modeling study of 1,3,5-triazine derivatives as PSII electron transport inhibitor. Asian Journal of Chemistry, v.26, n. 1, p. 264, 2014.
ZIMMERMAN, P. W.; HITCHCOCK, A. E. Plant hormones. Annual Review of Biochemistry, v.17, n. 1, p. 601-626, 1948.