3 authors, including:
Ramya. G. Franklin
Sathyabama Institute of Science and Technology
17 PUBLICATIONS 270 CITATIONS
All content following this page was uploaded by Ramya. G. Franklin on 03 August 2021.
Abstract—Data mining is a rapidly developing technique for exploring and extracting significant information from massive collections of data, which can then be examined for patterns that support business-related decisions. In the medical domain, data mining can uncover valuable patterns and information that prove beneficial in clinical diagnosis. This research focuses on heart disease diagnosis based on previous data and information. To this end, SHDP (Smart Heart Disease Prediction) is built using Naive Bayesian classification to predict risk factors concerning heart disease. The rapid advancement of technology has led to a remarkable rise in mobile health technology, with web applications being one example. The required data is assembled in a standardized form. To predict the chances of heart disease in a patient, attributes such as age, BP, cholesterol, sex and blood sugar are fetched from the medical profiles. The collected attributes act as input to the Naive Bayesian classifier for predicting heart disease. The dataset is split into two sections: 80% is used for training and the remaining 20% for testing. The proposed approach includes the following stages: dataset collection, user registration and login (application based), classification via Naive Bayesian, prediction, and secure data transfer by employing AES (Advanced Encryption Standard). Thereafter the result is produced. The research presents multiple knowledge abstraction techniques based on the data mining methods adopted for heart disease prediction. The output reveals that the established diagnostic system effectively assists in predicting risk factors concerning heart diseases.

Keywords—Data Mining; Smart Heart Disease Prediction (SHDP); Web and Mobile Application; Naive Bayesian; Advanced Encryption Standard (AES); Data Collection; Classification; Prediction.

I. INTRODUCTION

The data mining process involves extracting significant, hidden and valuable information from large databases [1]. The healthcare sector usually holds abundant data related to patients, diagnoses of diseases, etc. [2]. Hospitals are now adopting hospital IMS (information management systems) to handle their own and their patients' data systematically and effectively [3]. Such systems produce large quantities of data, represented as charts, numbers, text and images, yet this data is hardly ever used for making clinical decisions [4]. The current research emphasizes heart disease diagnosis. Various data mining techniques have been incorporated for diagnosing the disease, thereby obtaining several probabilities [5].

For heart disease prediction, numerous systems have been recommended, deployed by means of various techniques and algorithms. Obtaining quality service at an affordable price remains the prime and most challenging concern for healthcare establishments. Offering quality services requires accurate diagnosis of patients along with effective dosage of medicines. Low-quality clinical diagnosis and treatment can yield undesired and inadequate results. One way for healthcare establishments to cut costs is to use computer-generated data or DSS (decision support systems). The healthcare sector usually holds abundant data related to patients, diagnoses of diseases, resource management, etc. This data must be further broken down by human services. Using a computerized system, patients' treatment records can be stored, and with mining methods one can obtain significant information and answer queries concerning the hospital.

Supervised and unsupervised learning are the two data mining methods. Supervised learning uses a training set to learn model parameters, whereas unsupervised learning requires no training set. Classification and prediction are the basic approaches of data mining. Classification models classify discrete, unordered data values, whereas prediction models estimate continuous values. The analysis result is then used to offer a web/mobile application to the users.

The stages of the proposed approach are: user registration and login (application based), dataset collection, classification via Naive Bayesian, prediction, secure data transfer by means of AES (Advanced Encryption Standard), and lastly output in PDF format. AES helps transmit user data to the database in a secure manner. For security, patients' personal data is replaced with mock values. The study compares performance on medical datasets for predicting heart disease against other machine learning techniques. The proposed technique promises to be highly effective in handling classification, resembling ML (Machine Learning) with respect to the Naive Bayesian model.

The rest of the paper is organized as follows: Section 2 reviews the work of previous authors. Section 3 puts forth the proposed system for heart disease classification and prediction and gives an overview of its various levels. Section 4 presents the experimental outcome. Lastly, Section 5 presents the conclusion and proposes future research work.
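
The 80/20 division of the dataset into training and testing portions, as described in the abstract, can be sketched as follows. This is only a minimal illustration: the function name `train_test_split`, the fixed shuffle seed and the placeholder records are assumptions made for the example and are not taken from the SHDP implementation.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible (seed value is an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholder records, not real patient data:
# (age, bp, cholesterol, sex, blood_sugar, label)
records = [(40 + i, 110 + i, 180 + 2 * i, i % 2, 90 + i, i % 2) for i in range(50)]

train, test = train_test_split(records)
print(len(train), len(test))  # prints: 40 10
```

Shuffling before the cut is a common precaution so that any ordering in the source file (for example, by admission date) does not leak into the split.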

II. RELATED WORK

Kaan Uyar et al. propose computational techniques for analyzing heart disease by employing RFNN (recurrent fuzzy neural networks) and a genetic algorithm, which must be assisted by medical experts to account for several parameters that may impact the decision-making process. A total of 297 instances of patient data are taken into account, of which 45 are assigned for testing and 252 for training. Testing yields an accuracy of 97.78%. The investigation is carried out successfully with the heart disease testing dataset. The following measures are calculated: accuracy, RMSE (root mean square error), probability of misclassification error, specificity, sensitivity, precision and F-score [6].

Anuradha Lamgunde et al. recommend a genetic algorithm combined with the back propagation technique to predict heart disease effectively. The research uses numerous input features for examining the system. Overall, the system uses 13 medical attributes, such as gender, BP and cholesterol, to predict the chances of a patient contracting heart disease [7].

Theresa Princy R. et al. conduct a survey of classification techniques that can predict the risk level of each individual from factors such as gender, age, BP, cholesterol and pulse rate. Patients' risk level can be classified by data mining classification techniques such as NB (Naïve Bayes), Decision Tree, KNN and NN (Neural Network). Since many attributes are taken into account, high accuracy is achieved for the risk level [8].

S. Indhumathi et al. recommend the Naïve Bayes algorithm for predicting high risk of heart disease in patients. Pre-processed data is used for the training set. Classification and prediction form the prime data mining phases. The classification phase involves pre-processing, in which the following tasks are performed: data cleaning, normalization, data reduction, etc. The prediction phase involves classification and prediction of disease types. Hence the training set includes the disease type, and the testing set is built from the questions. The generated output is forwarded to the doctor/specialist [9].

S. Dangare et al. present a network with three main layers: input, hidden and output. The input is fed to the input layer and the output layer presents the result. The actual and expected outputs are then compared. Using back propagation, the error is determined and the weights between the output layer and the preceding hidden layers are adjusted. After back propagation completes, the forward pass begins again, and the process is repeated until the error is reduced [10].

K. Pramanik et al. recommend a hybrid algorithm that blends the ID3 and KNN algorithms for predicting heart disease. The data is pre-processed using the KNN algorithm, which is therefore also referred to as the pre-processing algorithm. The pre-processed data forms the training set, which is then classified in the form of a tree structure. The ID3 algorithm is implemented as the classifier for predicting heart disease. The KNN algorithm is used to classify incorrect values [11].

Rishabh Saxena et al. discuss that heart diseases have become quite rampant and common in today's society. The Cleveland HDD (heart disease dataset) is used for exploring issues in terms of complexity and for analyzing patients effectively. CVD (cardiovascular disease) is the scientific term for heart diseases. The dataset, drawn from the medical test results of around 303 angiography patients (from Cleveland Clinic, Ohio), was applied to around 425 patients (from the Hungarian Institute of Cardiology, Budapest, Hungary) with a frequency of 38% [12].

Ashok Kumar Dwivedi presents a model that identifies the occurrence of heart disease in thousands of samples immediately. The capability of 6 machine learning techniques is assessed for heart disease prediction. The performance of the methods is evaluated using 8 different classification performance indices and also the receiver operating characteristic curve. Logistic regression achieves a high classification accuracy of 85%, with 89% sensitivity and 81% specificity [13].

Animesh Hazra et al. discuss that the health care sector produces an abundant amount of information every day, most of which remains unexplored and unused. However, there are no adequately effective tools for extracting meaningful information from such data repositories for detecting clinical diseases or for any other task. The work aims to summarize prevailing research on heart disease prediction via data mining techniques, examine hybrids of different mining algorithms and draw a conclusion on the most effective technique(s) [14].

Ashish Chhabbi et al. have studied several data mining techniques for extracting and exploring unknown patterns from databases, which can help answer complicated queries concerning heart disease prediction. The dataset is collected from the UCI repository. Naive Bayes and an improved k-means algorithm are deployed. According to the results, the advanced k-means algorithm yields higher accuracy than simple k-means (in which the number of clusters is predefined) [15].

Kamal Kant et al. recommend a heart disease prediction model that employs the Naïve Bayes data mining technique. The technique is a statistical classifier that assumes the attributes are independent of each other. The class is decided by choosing the one with the highest posterior probability. Here, the Naïve Bayes classifier yields great performance and is quite efficient for predicting disease in statistical probability and real-time expert systems, followed by the Neural Network and Decision trees [16].

Sharan Monica L et al. carry out a survey of prevailing techniques of KDD (knowledge discovery in databases), adopting the mining techniques J48, NB Tree and simple CART for accurately predicting heart disease using the least number of attributes in the WEKA tool. J48 is an open source Java implementation of C4.5 that acquires information for making decisions. The Naive Bayes (NB) classifier builds models mostly for continuous datasets using predictive proficiencies. Significant data relationships can be promptly projected using CART (Classification and Regression Trees). The above-mentioned 3 DT algorithms are deployed using WEKA. CART exhibited an accuracy of 92.2%, and J48 was the fastest one, built in just 0.08 sec [17].

Sumitra Sangwan et al. have built a hybrid algorithm that employs the k-means and A-priori algorithms, which are capable of extracting abundant data along with significant information.
Initially, clustering is performed via the k-means clustering algorithm. Thereafter, frequent item-sets are determined using the A-priori algorithm, along with extraction of frequent term-sets for Boolean association rules. A "bottom up" approach is adopted, in which frequent subsets are extended one item at a time and entire groups of candidates are tested against the data. The output shows that clustering followed by A-priori results in high performance for heart disease prediction [18].

Rishi Dubey et al. survey various data mining techniques for heart disease prediction. Various studies show that, in terms of accuracy, hybrid techniques surpass a single classification technique. Neural networks were found to be quite effective for prediction. The system yields promising outcomes when trained well using genetic algorithms. Moreover, appropriate treatment could be selected for patients in the near future rather than merely anticipating the likelihood of individuals contracting heart disease [19].

Monika Gandhi et al. study how data mining techniques, along with other techniques, help draw out unknown patterns from large databases, supporting healthcare establishments in decision making. Data mining classification techniques such as decision trees, neural networks and Naive Bayes are employed for data discovery, extraction and classification [20].

Arun R et al. discuss CVD, or cardiovascular illness. The presence of numerous sorts of symptoms and strong interventions becomes quite complicated and astonishing. Secure execution and analysis demonstrate the feasibility of cloud computing via Naive Bayes and AES. Through a social networking site, restricted patient information can be shared with the patient's family members. The AES encryption algorithm safeguards sensitive but accessible patient information [21].

III. PROPOSED WORK

A. Overview

With today's advanced, high-tech lifestyle, a majority of people are contracting heart disease, which strikes so suddenly that at times one lacks the time to get treated immediately. Hence timely and early diagnosis is essential, which is quite a challenging concern for the medical community. Poor or incorrect analysis carried out by a hospital can bring down its reputation and functioning. This research focuses on building a cost-cutting and effective approach by means of data mining techniques so that DSS (decision support systems) can be enhanced. Predicting heart disease from numerous attributes/symptoms is quite complicated. The present research uses the Naive Bayesian data mining classification technique to enable effective heart disease diagnosis and thereby offer appropriate treatment. Supervising different medical factors and the post-operative period is crucial. AES encrypts the patients' records/data and saves them in the database. The results reveal that the diagnostic system successfully predicts the risk level associated with heart diseases.
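
To make the classification step concrete, the following is a minimal sketch of a Naive Bayes classifier that picks the class with the highest posterior probability under the attribute-independence assumption. It is not the SHDP implementation: the Gaussian likelihood for numeric attributes, the class name `GaussianNaiveBayes`, the small variance floor and the toy records are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes for numeric attributes (Gaussian likelihood is an assumption)."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, members in grouped.items():
            # Prior P(class) is the share of training rows with that label.
            self.priors[label] = len(members) / len(rows)
            self.stats[label] = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                # Small floor avoids division by zero (value is an assumption).
                self.stats[label].append((mean, var + 1e-9))
        return self

    def log_posteriors(self, row):
        """Unnormalized log posterior: log prior plus per-attribute log likelihoods."""
        scores = {}
        for label, prior in self.priors.items():
            score = math.log(prior)
            for x, (mean, var) in zip(row, self.stats[label]):
                score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical toy records (age, BP, cholesterol); 1 = heart disease, 0 = none.
rows = [(45, 120, 190), (50, 125, 200), (48, 118, 185),
        (62, 150, 280), (66, 160, 300), (59, 145, 270)]
labels = [0, 0, 0, 1, 1, 1]

model = GaussianNaiveBayes().fit(rows, labels)
print(model.predict((47, 122, 195)), model.predict((64, 155, 290)))  # prints: 0 1
```

Working in log space avoids numerical underflow when many attribute likelihoods are multiplied. Categorical attributes such as sex would normally be handled with frequency counts rather than a Gaussian.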

[Figure: flow of the proposed system: Mobile Application, Login into Web Application, Collection of Data, Prediction]

S. No  Technique                               Accuracy (%)  Time (s)
1      Sequential Minimal Optimization (SMO)   84.07         0.02
2      Bayes Net (BN)                          81.11         0.02
3      Multi Layer Perceptron (MLP)            77.4          0.75
4      Naive Bayesian (NB)                     89.77         0.01

[Figure: performance comparison, Accuracy (%) and Time (s), for SMO, BN, MLP and NB]

S. No  Technique                               Security (%)
1      Advanced Encryption Standard (AES)      98.2
2      PHEA                                    92.21

[Figure 3: security (%) comparison of AES and PHEA]

Figure 3 above depicts a security comparison between AES and PHEA (Parallel Homomorphic Encryption Algorithm). The recommended AES algorithm yields greater security in contrast to the remaining techniques.

SIMULATION RESULTS

Figures 4 to 8 below present the design and implementation of the smart heart disease prediction system. This system continuously monitors the coronary heart patient and updates the data to the object converse database, and if any abnormalities are observed.

Fig 6: Patient has to enter the user id and password to login
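
The accuracy (%) and time (s) columns reported for each technique could be measured along the following lines. This is a hedged sketch rather than the evaluation procedure used in the paper: the function `evaluate`, the threshold rule and the sample values are hypothetical. Sensitivity and specificity are included because the related work reports them.

```python
import time


def evaluate(predict, rows, labels, positive=1):
    """Return accuracy (%), sensitivity, specificity and prediction time (s)."""
    start = time.perf_counter()
    predictions = [predict(row) for row in rows]
    elapsed = time.perf_counter() - start

    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    tn = sum(1 for p, y in pairs if p != positive and y != positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy_pct": ratio(100 * (tp + tn), len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "time_s": elapsed,
    }


def threshold_rule(row):
    """Hypothetical stand-in classifier: flag cholesterol above 240."""
    return 1 if row[0] > 240 else 0


# Hypothetical test values, not taken from the paper's dataset.
rows = [(200,), (250,), (300,), (180,), (260,)]
labels = [0, 1, 1, 0, 0]

report = evaluate(threshold_rule, rows, labels)
print(report["accuracy_pct"], report["sensitivity"])  # prints: 80.0 1.0
```

Any classifier exposing a single-row prediction function can be passed in place of `threshold_rule`, so the same harness can compare several techniques on one test set.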