Analysis of Classification Techniques For Medical Data: April 2018
2 authors, including:
Dr. P. Thangaraju
Bishop Heber College
All content following this page was uploaded by Dr. P. Thangaraju on 13 July 2018.
Abstract: Data mining is widely used in many fields to analyse large volumes of data from different perspectives and to extract and summarize useful information. Modern medicine generates a large amount of information, which is stored in medical databases. Analysing these data manually is a complex and tedious process, so it is necessary to develop models that help extract useful knowledge and support scientific decision-making in diagnosis. Early diagnosis of disease is an important need in the healthcare industry so that treatment can be given in time. Data mining plays an important role in analysing medical data: the tools and algorithms it provides help us develop models that support accurate and timely decisions. Classification is the task of generalizing a known structure so that it can be applied to new data. This paper analyses various classification algorithms that are used for analysing medical data. The survey focuses mainly on classification accuracy.

Keywords: Data mining, Classification, ANN, Decision tree.

I. INTRODUCTION

With the growth in population, there has been a significant increase in health issues. Many new types of diseases and their symptoms have been identified. Numerous diseases are strongly associated with a common symptom, which makes it difficult for doctors to diagnose the exact disease in one go. This is where data mining helps: it supports diagnosis by analysing the patient's data. Even though the prediction is not extremely accurate, it gives the doctor a concise idea of what the disease might be. Thus, data mining is not a substitute for doctors; it is a tool that supports them in identifying diseases at an early stage [1].

Data mining plays an important role in analysing medical data. It helps us create models that analyse raw data containing patients' symptoms and predict the disease. It can also improve the quality of hospital management. Medical information may be redundant, multi-attributed, inconsistent, incomplete and closely related with time. The key techniques of medical data mining involve pre-processing medical data, analysing different patterns and resources, applying mining algorithms and predicting the reliability of the mining results [2].

II. LITERATURE SURVEY

R. Subhashini [3] et al. proposed a novel classification strategy to predict chronic kidney disease using an Optimal Fuzzy K-nearest neighbour (OF-KNN) technique. The fuzzy component is optimised by tuning its membership functions with the Bat optimization algorithm. The optimised fuzzy component (OF) is then used to measure similarity in KNN for classifying the disease. The authors compared OF-KNN with ANN, SVM and KNN and observed that OF-KNN outperformed the others.

S. Vijayarani [4] et al. made a comparative analysis of two classification algorithms, Naïve Bayes and Support Vector Machine. The authors used a synthetic kidney function test dataset with six attributes and 584 instances for analysing kidney disease. The implementation was done in MATLAB. Based on classification accuracy, error rate and execution time, SVM was observed to perform better than Naïve Bayes.

Manish Kumar [5] et al. made a comparative analysis of several classification algorithms (Random Forest, Sequential Minimal Optimization, Naïve Bayes, Radial Basis Function, Multilayer Perceptron and Simple Logistic) for predicting chronic kidney disease. The authors used the UCI chronic kidney disease dataset with 25 attributes and 400 instances. The results showed that the Random Forest classifier outperformed all other classifiers in terms of area under the ROC curve (AUC), accuracy and MCC, with values of 1.0, 1.0 and 1.0 respectively.

Prerana [6] et al. proposed a systematic approach for earlier diagnosis of thyroid disease using the back propagation algorithm in a neural network. The authors used a UCI dataset with 29 attributes and implemented the predictive neural model in the MATLAB Neural Network Toolbox. The model works in two phases, back propagation followed by weight updating, to classify thyroid disease. FTI values were taken as input and classified into three classes labelled 1, 2 and 3. Training performance was plotted for the gradient descent and Levenberg-Marquardt training algorithms, showing the variation of MSE versus the number of epochs and the variation of the error gradient during training. The Levenberg-Marquardt method showed better training performance, reaching the set target in 59 epochs, whereas gradient descent performed poorly and was unable to reach the target value of 0.0001 in 1000 epochs.

K. Saravana Kumar [7] et al. made a comparative analysis of K-Nearest Neighbor and Support Vector Machine in terms of prediction accuracy for hypothyroid. The authors applied SVM and KNN to the collected data and observed a prediction accuracy of 94.4336 for SVM and 96.3430 for KNN, a difference of 1.9094. They therefore concluded that KNN performs better than SVM in predicting thyroid disease.

Hui-Ling Chen [11] et al. proposed a three-stage expert system (FS-PSO-SVM) based on a hybrid support vector machine approach for diagnosing thyroid disease. The first stage (FS) constructs diverse feature subsets with different discriminative capability. In the second stage, the feature subsets obtained are used to train an SVM classifier whose parameters are optimized using particle swarm optimization (PSO), yielding an optimal predictor model. Finally, the optimal SVM model is used to diagnose thyroid disease with the most discriminative feature subset and the optimal parameters. Under 10-fold cross-validation, the proposed system achieved the highest classification accuracy reported at the time, with a mean accuracy of 97.49% and a maximum accuracy of 98.59%.
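The mean and maximum accuracy figures quoted for 10-fold cross-validation can be illustrated with a small sketch. It is not the FS-PSO-SVM system: the toy data, the 1-nearest-neighbour classifier and all function names are assumptions made only to show how per-fold accuracies are collected and summarised.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour_predict(train, query):
    """Return the label of the training point closest to the query (1-NN)."""
    nearest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return nearest[1]


def cross_validate(data, k=10, seed=0):
    """Return the accuracy measured on each of the k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(
            nearest_neighbour_predict(train, data[i][0]) == data[i][1]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return accuracies


# Hypothetical toy data: two well-separated 2-D clusters, labels 0 and 1.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(10, 1), rng.gauss(10, 1)), 1) for _ in range(50)]

fold_accuracies = cross_validate(data, k=10)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
max_accuracy = max(fold_accuracies)
print(f"mean accuracy: {mean_accuracy:.4f}, max accuracy: {max_accuracy:.4f}")
```

Reporting both the mean and the maximum, as the surveyed work does, shows how much the result varies between folds; the mean is usually the more representative figure.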
Anurag Upadhayay [8] et al. made an empirical comparison of two decision tree algorithms, C4.5 and C5.0, for predicting thyroid disease. The authors used a UCI dataset with 29 attributes and worked with 400 patient records. They observed that the running time of C5.0 was small compared with C4.5, the C4.5 tree was much larger than the C5.0 tree, the pruned C5.0 tree generated a more accurate rule set, the training error of C5.0 was smaller than that of C4.5, the rule set generated by C5.0 contained 6 rules, and the confidence level of the rules was above 95%. They therefore concluded that C5.0 is better than C4.5.

Ali Keles [12] et al. proposed an expert system for thyroid disease diagnosis (ESTDD). The authors derived fuzzy rules with a neuro-fuzzy method for use in the ESTDD system. ESTDD diagnoses thyroid diseases with an accuracy of 95.33%.

P. Thangaraju [13] et al. proposed a model to analyse liver disease data using the particle swarm optimization (PSO) algorithm with KStar classification to classify the existence of disease. The authors used the UCI liver disorder dataset with 3245 instances and seven attributes. The model estimates the chance of occurrence of liver disease from the input variables by building an intelligent system based on feature selection.
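Particle swarm optimization appears in several of the surveyed systems. The sketch below shows only the basic continuous PSO update on a toy objective; it is not the PSO-KStar model, and the function names, the parameter values (inertia 0.7, acceleration 1.5) and the sphere objective are all assumptions chosen for illustration. Feature selection would need a discrete encoding of feature subsets, which this sketch does not attempt.

```python
import random


def pso_minimise(objective, dim, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5, bound=5.0):
    """Minimise `objective` over `dim` real variables with a basic PSO."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_val.__getitem__)
    global_best = personal_best[best_idx][:]
    global_val = personal_val[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimise(sphere, dim=2)
print("best position:", best_position, "best value:", best_value)
```

In a classification setting the objective would typically be a validation error, so that the swarm searches for parameters or feature subsets that reduce it.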
V. Prasad [9] et al. proposed a Health Diagnosis Expert Advisory System (EAS) on trained data sets for predicting the level of hyperthyroid in the human body. The EAS is built on a data matching system applied to the training data set to identify the relevant disease from the symptom data specified in the knowledge base. Once the user enters the symptom details and submits them, the system predicts the disease. One limitation of this work is that if the user enters wrong details, the system may mislead them by predicting the wrong disease.

Amit Kumar Dewangan [14] et al. proposed the CART-Info Gain and CART-Gain Ratio feature selection techniques for classification of thyroid disease. The authors chose the CART algorithm as the best model because it provided the highest accuracy, 99.47%. They then applied Info Gain and Gain Ratio feature selection to CART to improve its performance; both techniques are used to remove irrelevant features from the original data set. CART-Info Gain and CART-Gain Ratio gave 99.47% and 99.20% accuracy with 25 and 3 features respectively.
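Info Gain and Gain Ratio score each feature by how much it reduces the uncertainty about the class label. The following is a minimal sketch for categorical features using the standard entropy-based definitions; the toy records and feature names are hypothetical and the code is not taken from the cited work.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on the feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy records: one informative and one uninformative feature.
labels = ["sick", "sick", "healthy", "healthy"]
features = {
    "hormone_level": ["high", "high", "low", "low"],
    "ward": ["x", "y", "x", "y"],
}

# Rank features by score; a selector would keep only the top-ranked ones.
ranking = sorted(features, key=lambda name: gain_ratio(features[name], labels),
                 reverse=True)
for name in ranking:
    print(name,
          round(information_gain(features[name], labels), 3),
          round(gain_ratio(features[name], labels), 3))
```

Gain Ratio divides by the split entropy to avoid favouring features with many distinct values, which may explain why the two criteria can select different numbers of features.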
Wei-Wen Chang [10] et al. combined the main concepts of estimation of distribution algorithms and immune algorithms to form a hybrid algorithm called the immune-estimation of distribution algorithm (IEDA) and applied it to classify the UCI thyroid gland data set. The authors compared the results of IEDA with those of traditional genetic algorithms. Based on the results, they concluded that their approach is better than the traditional genetic algorithm in terms of accuracy, type I error and type II error.

T. Karthikeyan [15] et al. proposed the PCA-NB algorithm to improve the prediction accuracy of classification. The authors applied Principal Component Analysis (PCA) as the feature evaluator with a ranker as the search method, and used Naive Bayes as the classification algorithm. They used the UCI hepatitis patients dataset with 155 instances and 19 attributes. PCA-NB improved the classification accuracy to 89%.
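Several of the surveyed studies compare classifiers by accuracy, type I error, type II error and MCC. The sketch below computes these from a binary confusion matrix. It assumes the common convention that type I error is the false positive rate and type II error is the false negative rate; the cited papers may define them differently, and the label vectors are made up for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarise(y_true, y_pred, positive=1):
    """Return accuracy, type I / type II error rates and MCC."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Type I error: healthy cases flagged as diseased (assumed convention).
        "type_i_error": fp / (fp + tn) if (fp + tn) else 0.0,
        # Type II error: diseased cases that were missed (assumed convention).
        "type_ii_error": fn / (fn + tp) if (fn + tp) else 0.0,
        # MCC is set to 0 when any row or column of the matrix is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = summarise(y_true, y_pred)
print(metrics)

# A classifier that reproduces the labels exactly scores 1.0 on accuracy and MCC.
perfect = summarise(y_true, y_true)
print(perfect)
```

In a diagnostic setting a type II error (a missed disease) is often costlier than a type I error, which is why accuracy alone can be a misleading basis for comparison.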
Hyperthyroid, International Journal of Computer Applications, 2014, ISSN: 0975-8887.

[10] Wei-Wen Chang, Wei-Chang Yeh, Pei-Chiao Huang, A hybrid immune-estimation distribution of algorithm for mining thyroid gland data, Expert Systems with Applications 37 (2010) 2066–2071.

[11] Hui-Ling Chen, Bo Yang, Gang Wang, Jie Liu, Yi-Dong Chen, Da-You Liu, A Three-Stage Expert System Based on Support Vector Machines for Thyroid Disease Diagnosis, Springer Science+Business Media, J Med Syst, 2012, ISSN 1953–1963.

[12] Ali Keles, Aytürk Keles, ESTDD: Expert system for thyroid diseases diagnosis, Expert Systems with Applications 34 (2008) 242–246.

[14] Amit Kumar Dewangan, Akhilesh Kumar Shrivas, Prem Kumar, Classification of Thyroid Disease with Feature Selection Technique, International Journal of Engineering and Techniques, June 2016, ISSN: 2395-1303.

[15] T. Karthikeyan, P. Thangaraju, PCA-NB Algorithm to Enhance the Predictive Accuracy, International Journal of Engineering and Technology (IJET), March 2014, ISSN: 0975-4024.

[16] Parvez Ahmad, Saqib Qamar, Syed Qasim Afser Rizv, Techniques of Data Mining In Healthcare: A Review, International Journal of Computer Applications, June 2015, ISSN: 0975-8887.

[17] https://en.wikipedia.org/wiki/Decision_tree