Diabetes refers to a group of chronic conditions characterized by
an increased level of blood glucose, commonly referred to as
blood sugar. It can cause life-threatening health problems and
damage the kidneys, heart, eyes and nerves. Diabetes is one of
the biggest public health concerns in the world, with a major
impact on both public health and the economy. The two main types
are Type 1, in which the pancreas produces little or no insulin,
and Type 2, in which the cells do not respond to insulin.
According to the World Health Organization, around 422 million
people worldwide are diabetic and 1.5 million deaths are
attributed to diabetes every year.
Under the Sustainable Development agenda, member states have to
reduce mortality from noncommunicable diseases (NCDs), including
diabetes, by one-third, achieve universal health coverage, and
provide access to affordable essential medicines.
According to the 10th edition of the International Diabetes
Federation Atlas, 537 million adults aged 20-79 are living with
diabetes; this number is estimated to rise to 643 million by 2030
and to 783 million by 2045. In 2021, diabetes was responsible for
6.7 million deaths, equivalent to one death every 5 seconds.
Diabetes cases around the world are shown in Figure 1 [1]. The
figure shows the cases in 2021 and the predicted cases for 2030
and 2045.
Computer-Aided Diagnosis (CAD) is a rapidly expanding and dynamic
field of research in the medical industry. Machine learning
researchers have used a number of machine learning techniques,
which develop intelligence through experience, for disease
detection and diagnosis.
Figure 1: Diabetes cases around the world and future predictions
II. REVIEW OF LITERATURE
Artificial Neural Networks (ANNs) and Bayesian Networks have been
used to classify diabetes as well as cardiovascular diseases, and
their respective levels of accuracy were evaluated. This paper
reviews some papers from 2010 to 2021. Mainly, the multilayer
feed-forward neural network and the Naive Bayesian network have
reported good accuracy.
[2] proposed a system for diabetes prediction using the AdaBoost
algorithm. The proposed system uses a series of base classifiers
comprising Support Vector Machine (SVM), Naive Bayes and decision
tree. The global data, the Pima Indian data set, were retrieved
from the repository at the University of California, Irvine. The
retrieved data set is used for training and testing. The data set
consists of 768 records and 9 attributes. For validation purposes the
local data set is used. Different performance metrics are used
to evaluate the performance of the proposed model, including
accuracy, sensitivity, specificity and error rate.
In [3], two data sets are used: one for breast cancer and the
other for diabetes. The attributes are classified using
classification algorithms in the Weka tool. Several classification
algorithms, namely J48, SMO, Naive Bayes and MLP, were applied to
the breast cancer and diabetes data sets. For the diabetes data
set, SMO gives the best accuracy of 76.80%, and for the breast
cancer data set J48 gives the best accuracy of 74.28%. Performance
is evaluated using several metrics such as precision and recall.
In [4], a machine learning framework for diabetes prediction is
proposed in which several classification algorithms are applied to
the Pima Indian Diabetes data set. The algorithms include decision
tree, artificial neural network (ANN), logistic regression, random
forest and Naive Bayes. 10-fold cross-validation was used to
assess the performance of the classification algorithms; the
results show that Naive Bayes achieved the highest performance.
Figure: Workflow of proposed approach
[5] presents the idea of detecting diabetes using machine learning
techniques on the Pima Indian data set. SVM and decision tree (DT)
classification algorithms are used. The framework is implemented
in R. SVM reported a classification accuracy of 82%. However, the
paper does not mention the validation approach used, which is a
critical parameter in machine learning tasks.
In [6], a system is proposed for diabetes analysis and prediction.
The system uses two data sets: the Pima Indian Diabetes data set
and a data set from 130 US hospitals. The techniques used for
analysis are K-nearest neighbor, Naive Bayes, random forest and
decision tree. The paper also uses an ensemble method, which shows
good results.
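
Several of the studies reviewed here assess their classifiers with 10-fold cross-validation. As a minimal sketch of how such folds can be formed, assuming a data set of 768 records: the function names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical and are not taken from any of the cited papers.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split indices 0..n_samples-1 into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average a score over k train/test partitions.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains
    a model on train_idx and returns its score on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

folds = k_fold_indices(768, k=10)  # 768 records, as in the Pima data set
```

Each record is used for testing exactly once and for training in the other nine rounds, which is why the averaged score is less dependent on a single lucky or unlucky split.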
In [7], a technique for the diagnosis of type 2 diabetes is
proposed. The Pima Indian diabetes data set was used in the
research. The data set contains 768 records, of which 500 are
healthy women and 268 are women with type 2 diabetes. The study
uses eight features for the diagnosis of diabetes. The accuracy of
the model was reported to be 84%.
In [8], the authors report ensemble voting classifiers for the
prediction of diabetes, with accuracies of 80% and 81% on the Pima
Indian diabetes data set. The method was evaluated using 10-fold
cross-validation and by splitting the data into a training set
(70%) and a testing set (30%).
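
The voting and hold-out ideas mentioned for [8] can be sketched as follows. This is only an illustration under stated assumptions: hard (majority) voting is assumed, the three prediction lists are made-up values, and the helper names `majority_vote` and `train_test_split_indices` are hypothetical rather than the cited authors' implementation.

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Hard voting: combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all the same length.
    Using an odd number of classifiers avoids ties for binary labels.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle the indices and cut them into a training and a testing part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

# Made-up outputs of three base classifiers on five patients
# (1 = diabetic, 0 = healthy); not results from any cited paper.
svm_preds = [1, 0, 1, 0, 0]
nb_preds = [1, 1, 0, 0, 0]
dt_preds = [0, 1, 1, 0, 1]
ensemble = majority_vote([svm_preds, nb_preds, dt_preds])

train_idx, test_idx = train_test_split_indices(768)
```

With these inputs each patient receives the label chosen by at least two of the three classifiers.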
The paper [9] showed that the data set used for diagnosis needed
to be pre-processed and that missing values needed to be filled
in. The modified training set improved accuracy while requiring
less training time.
A new approach, proposed in [10], makes use of predictive
analysis to identify the factors that contribute to the early
diagnosis of diabetes mellitus. For diabetes data analysis, the
Random Forest method and the Decision Tree algorithm have the
highest sensitivity and specificity, 98.20% and 98.00%
respectively. The Naive Bayes classifier reports the best
accuracy, 82.30%. To increase classification accuracy, the
research additionally generalizes the selection of appropriate
features from the data set.
In [11], machine learning algorithms such as decision trees,
neural networks and random forests are used for diabetes
prediction. The data used in the study were obtained from a
hospital in China and contain 14 features. Principal component
analysis and minimum redundancy maximum relevance (mRMR) are used
for dimensionality reduction. The random forest algorithm achieved
the highest accuracy of 80.84%.
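
To illustrate the dimensionality-reduction step mentioned for [11], the following is a minimal sketch of principal component analysis restricted to the first component, computed by power iteration. It is an assumption-laden toy: the function names and the two-feature sample data are hypothetical, and the cited study's actual procedure and software are not described in this text.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return the unit-length direction of greatest variance in `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the all-ones start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Reduce each row to a single score along `component` after centering."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]

# Toy data with two perfectly correlated features (second = 2 * first).
toy = [[x, 2 * x] for x in range(5)]
pc = first_principal_component(toy)
scores = project(toy, pc)
```

Here the two correlated features collapse into one score per record without losing any variance, which is the effect PCA aims for on real, partially correlated clinical features.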
Performance Metrics
Classifier  Accuracy  Precision  Recall  F1-Score
RF          0.79      0.80       0.89    0.84
DT          0.74      0.78       0.82    0.80
XGBoost     0.74      0.77       0.84    0.62
SVM         0.69      0.63       0.63    0.62
KNN         0.70      0.63       0.63    0.63
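
The metrics reported in the table (accuracy, precision, recall and F1-score) can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments in this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or true positive rate
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick sanity check on any reported row.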
Accuracy (A): It is the proportion of instances that are correctly
classified by the algorithm. Mathematically it is written as:
$$A = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
where TP, TN, FP and FN denote the numbers of true positives, true
negatives, false positives and false negatives.
Sensitivity: The sensitivity of a machine learning model refers to
its ability to detect positive instances correctly. It is also
known as the true positive rate:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$
This paper presents the current scenario of diabetes and its
future predictions. Various machine learning algorithms are
applied in order to develop a model for diabetes detection. The
Random Forest classifier achieved the highest accuracy. More
importantly, several papers that used different classifiers did
not mention the problems with the data set. Although the data
appear to contain no missing values, several records have a value
of 0, which we replaced with the mean value. The data also have a
class imbalance problem, which we addressed using the Synthetic
Minority Oversampling Technique (SMOTE).
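
The two preprocessing steps described above, replacing zero values with a mean and oversampling the minority class, can be sketched as follows. This is a simplified illustration, not the exact procedure used in this paper: it assumes the mean is taken over the non-zero entries of a column, and `smote_like_oversample` is a hypothetical, reduced variant of SMOTE that interpolates between a minority sample and one of its nearest minority neighbours.

```python
import random

def replace_zeros_with_mean(values):
    """Replace 0 entries with the mean of the non-zero entries (an assumption)."""
    nonzero = [v for v in values if v != 0]
    mean = sum(nonzero) / len(nonzero)
    return [mean if v == 0 else v for v in values]

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining samples by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical glucose column in which 0 stands for an unrecorded value.
glucose = replace_zeros_with_mean([0, 100, 120, 0, 140])

# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [2.0, 0.0]]
new_samples = smote_like_oversample(minority, n_new=5)
```

Interpolating between existing minority samples adds new points inside the region the minority class already occupies, rather than duplicating records, which is the motivation usually given for SMOTE over plain random oversampling.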
