1471 2156 11 26 PDF
2010
Structured summary
Snapshot
Key findings
The study identifies a combination of 14 SNPs in 12 genes that predicts T2D with a
prediction rate of 65.3% using an SVM algorithm. Separate analyses of the men and women
subpopulation datasets, which yielded different SNP combinations, gave slightly higher
prediction rates of 70.9% and 70.6%, respectively.
The study found novel associations between combinations of SNPs and T2D in a Korean
population. The protein-protein interaction (PPI) network analysis identified IL4, INSR, and
IRS1 as genes common to the target population datasets.
The SVM classifiers achieved these prediction rates after their kernel functions and
parameters were optimized during the 10-fold cross-validation tests.
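The summary reports that kernel functions and their parameters were optimized but not how. As a hedged illustration, assuming an RBF-kernel SVM with the conventional soft-margin penalty C and kernel width gamma (the parameter names used by LIBSVM-style implementations), the sketch below defines the RBF kernel and an exhaustive grid search over (C, gamma). The `score` callable, which would return the cross-validated prediction rate for one parameter pair, and any grid values are hypothetical; the study's actual search strategy is not stated here.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def grid_search(
    score: Callable[[float, float], float],
    c_values: Iterable[float],
    gamma_values: Iterable[float],
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Evaluate every (C, gamma) pair and keep the one with the highest score.

    `score` is a hypothetical callable, e.g. the 10-fold cross-validated
    prediction rate of an RBF-kernel SVM trained with those parameters.
    """
    best_params: Optional[Tuple[float, float]] = None
    best_score = float("-inf")
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score(c, gamma)
        if s > best_score:  # strict ">" keeps the first pair on ties
            best_params, best_score = (c, gamma), s
    return best_params, best_score
```

Log-spaced grids (for example C in {0.1, 1, 10, 100}) are a common choice because both parameters act multiplicatively, but any grid here is an assumption rather than the study's setting.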
Case-control association study: for each SNP, the p-value was calculated with a chi-squared
test.
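A minimal sketch of a per-SNP case-control test, assuming it is run on a 2 x 3 table of genotype counts (cases and controls by the three genotypes of a biallelic SNP), which is one reading of the "genotype-based p-value" mentioned under the limitations. With (2 - 1)(3 - 1) = 2 degrees of freedom, the chi-squared survival function is exactly exp(-x/2), so no statistics library is needed. The function name and the counts are illustrative only.

```python
import math
from typing import Sequence, Tuple


def genotype_chi2(case_counts: Sequence[int], control_counts: Sequence[int]) -> Tuple[float, float]:
    """Pearson chi-squared test on a 2 x 3 (case/control x genotype) table.

    Each argument holds counts for the three genotypes of one SNP,
    e.g. (AA, AG, GG). Returns (chi-squared statistic, p-value).
    """
    if len(case_counts) != 3 or len(control_counts) != 3:
        raise ValueError("expected counts for exactly three genotypes")
    rows = [list(case_counts), list(control_counts)]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(case_counts, control_counts)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and genotype column needs at least one observation")
    chi2 = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2; for 2 degrees of freedom the
    # chi-squared survival function is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical counts, not data from the study.
stat, p = genotype_chi2([10, 20, 30], [30, 20, 10])
print(f"chi2 = {stat:.1f}, p = {p:.2e}")  # chi2 = 20.0, p = 4.54e-05
```

If an allele-based (2 x 2) test were used instead, the degrees of freedom and the p-value formula would change; for general tables a library routine such as SciPy's contingency-table test would be the safer choice.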
We proposed a candidate-gene association study that considers gene-gene interactions, using
an SVM-based feature selection method.
Objectives
The objective of this study was to develop an SVM-based method that detects combinations of
multiple disease-associated SNPs, possibly located on different chromosomes, and to identify
potential disease markers associated with high susceptibility to T2D.
Methods
The study used a Support Vector Machine (SVM) approach with feature selection to identify the
best combination of SNPs associated with T2D; the SVM classifiers were trained and tested
with a 10-fold cross-validation test. The dataset consisted of 408 SNPs distributed over 87
putative T2D-related genes. The PPI network analysis was conducted using the GenomeNetwork
platform.
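The evaluation protocol can be sketched as follows, under stated assumptions: genotypes are coded additively as the number of copies of a chosen allele (0, 1, or 2), folds are stratified so each keeps roughly the overall case/control ratio, and the prediction rate is the fraction of all samples classified correctly while held out. The study's exact encoding and fold assignment are not described here, and `train_and_predict` is a hypothetical hook where an SVM would be plugged in.

```python
import random
from typing import Callable, List, Sequence

Features = List[int]
# Hypothetical hook: fit on (X_train, y_train), return predicted labels for X_test.
TrainAndPredict = Callable[[List[Features], List[int], List[Features]], List[int]]


def encode_genotype(genotype: str, allele: str) -> int:
    """Additive coding: copies of `allele` in a two-letter genotype, e.g. 'AG' -> 1."""
    return sum(a == allele for a in genotype)


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with a similar case/control ratio in each."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:  # deal indices round-robin so fold sizes stay balanced
            folds[pos % k].append(i)
            pos += 1
    return folds


def cv_prediction_rate(
    X: List[Features],
    y: List[int],
    train_and_predict: TrainAndPredict,
    k: int = 10,
    seed: int = 0,
) -> float:
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = train_and_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)
```

If scikit-learn is installed, an RBF-kernel SVM could be supplied as `lambda Xtr, ytr, Xte: SVC(kernel="rbf").fit(Xtr, ytr).predict(Xte).tolist()` after `from sklearn.svm import SVC`; this wiring is an illustration, not the study's code.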
The study used forward selection over SNP genotype features to find a smaller feature set,
and a 10-fold cross-validation test to evaluate the performance of the SVM classifiers. The
kernel functions and their parameters were optimized during the 10-fold cross-validation
tests.
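Forward selection is a greedy wrapper method: starting from an empty set, it repeatedly adds the single SNP that most improves the classifier's score and stops when no addition helps. The sketch below is a generic version under that definition; the `score` callable is a hypothetical stand-in for "10-fold cross-validated prediction rate of an SVM trained on these SNP columns", and the optional `max_features` cap is an illustrative addition.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the feature that most improves `score`.

    Stops when no remaining feature improves the score or when
    `max_features` features have been selected.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        scored = [(score(selected + [f]), f) for f in remaining]
        cand_score, candidate = max(scored, key=lambda t: t[0])
        if cand_score <= best_score:
            break  # no extension helps, so keep the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score
```

Because each step commits to the locally best SNP, a SNP that only helps in combination with another may never be added; this is the weakness of forward selection that the limitations section points to.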
Results
The study achieved a prediction rate of 65.3% with a combination of 14 SNPs in 12 genes
using an SVM algorithm. The men and women subpopulation datasets yielded slightly higher
prediction rates of 70.9% and 70.6%, respectively.
The study found novel associations between combinations of SNPs and T2D in a Korean
population. The PPI network analysis identified IL4, INSR, and IRS1 genes as common genes
among the target population datasets.
Conclusions
The study demonstrates the feasibility of incorporating an SVM algorithm into a case-
control study to identify combinations of SNPs associated with T2D. The study suggests that
including other important genes and clinical factors may improve the prediction rate in the
future.
The study demonstrated the feasibility of incorporating SVM into a case-control study to
identify gene-gene interactions associated with T2D. The results suggest that the method
can be useful for identifying potential disease markers associated with a high susceptibility
to T2D.
The authors conclude that the SVM-based feature selection method found novel associations
between combinations of SNPs and T2D in a Korean population.
Analysis
Research comparison
Limitations
This candidate-gene based analysis may have limited power to detect associations because of
the small population size and the limited number of candidate genes. The result of this
classical case-control association study may need a replication study in a large, independent
population of cases and controls to establish the credibility of the genotype-phenotype
association. We used this classical association study result in the sub-dataset filtering
step, which is based on the genotype-based p-value range.
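The sub-dataset filtering step can be pictured as keeping only the SNPs whose case-control p-value falls below a cutoff. This is a minimal sketch, not the study's pipeline: the cutoffs 0.05 and 0.6 are the two this summary mentions (yielding 24 and 240 SNPs in the study), while the SNP identifiers and p-values in the test are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_by_pvalue(pvalues: Dict[str, float], threshold: float) -> List[str]:
    """SNP IDs with p < threshold, ordered from most to least significant."""
    kept = [snp for snp, p in pvalues.items() if p < threshold]
    return sorted(kept, key=lambda snp: pvalues[snp])


def build_subdatasets(pvalues: Dict[str, float], thresholds: Iterable[float]) -> Dict[float, List[str]]:
    """One candidate SNP list per p-value cutoff; looser cutoffs give larger sets."""
    return {t: filter_by_pvalue(pvalues, t) for t in thresholds}
```

The cutoff trades noise against coverage: a strict cutoff can drop SNPs that matter only in combination, while a loose one passes more noise SNPs to the feature selection step.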
Another limitation lies in the ability of the forward selection method to find the best
combination of SNPs. The entire set of 408 SNPs may contain noise SNPs that mislead forward
selection, while a very restrictive p-value-based filter (e.g., the 24 SNPs with p < 0.05) may
remove useful SNPs that belong to the ideal combination. The best prediction rate of the SVM
classifier with an RBF kernel function, 65.3%, was obtained with a 14-SNP combination selected
from the 240 SNPs with p < 0.6 (Table 2 and Table 3). Among the SNPs in Table 3, rs343 has
previously been reported to be associated with T2D [35], and two SNPs (rs2070011 and
rs2243250) have been reported to be associated not with T2D but with myocardial infarction
[36,37].
Study subjects
462 cases and 456 controls
Data and data preprocessing: Our dataset consists of 408 SNPs distributed over 87 putative
T2D-related genes in 462 cases (patients) and 456 normal controls. The T2D cases,
confirmed and diagnosed in the Ansan and Ansung cohort study area, were identified from
the Korean Health and Genome Study (KHGS).
Study compliance
Ethics
This work was supported by an intramural grant from the Korea National Institute of
Health, Korea Center for Disease Control and Prevention, Republic of Korea (4845-301-210).
This work was also supported by Korea Research Environment Open NETwork.
Author details: Division of Bio-Medical Informatics, Center for Genome Science, National Institute of
Health, Korea Center for Disease Control and Prevention, 194, Tongil-Lo, Eunpyung-Gu,
Seoul 122-701, Republic of Korea.
Received: 23 June 2009. Accepted: 23 April 2010. Published: 23 April 2010.
BMC Genetics 2010, 11:26. This article is available from: http://www.biomedcentral.com/1471-2156/11/26
© 2010 Ban et al; licensee BioMed Central Ltd. This is an Open Access article distributed under
the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.
Funding
Abstract
Background: Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by
insulin resistance and relative insulin deficiency, is a complex disease of major public health
importance. Its incidence is rapidly increasing in developed countries. Complex diseases
are caused by interactions between multiple genes and environmental factors. Most
association studies aim to identify individual susceptibility markers using a simple
disease model. Recent studies have tried to estimate the effects of multiple genes and
multiple loci in genome-wide association studies; however, estimating these effects is very
difficult. We aim to assess the rules for classifying diseased and normal subjects by
evaluating potential gene-gene interactions in the same or distinct biological pathways.
Results: We analyzed the importance of gene-gene interactions in T2D susceptibility by
investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major
T2D-related pathways in 462 T2D patients and 456 healthy controls from Korean
cohort studies. We evaluated the support vector machine (SVM) method for differentiating
cases from controls using SNP information in a 10-fold cross-validation test. We
achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes using a
radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation datasets
of men and women and identified different SNP combinations with prediction rates of
70.9% and 70.6%, respectively. As high-throughput technology for genome-wide SNPs
improves, it is likely that a much higher prediction rate with biologically more interesting
combinations of SNPs can be acquired using this method.
Conclusions: The SVM-based feature selection method in this research found novel
associations between combinations of SNPs and T2D in a Korean population.
Bibliography
1. Ban, H.-J., Heo, J. Y., Oh, K., & Park, K.-J. (2010). Identification of Type 2 Diabetes-associated
combination of SNPs using Support Vector Machine. BMC Genomic Data, 11.
https://doi.org/10.1186/1471-2156-11-26