Design And Implementing Heart Disease Prediction Using Naives Bayesian

Conference Paper · April 2019

DOI: 10.1109/ICOEI.2019.8862604



3 authors, including:

Ramya. G. Franklin
Sathyabama Institute of Science and Technology


Anjan Nikhil Repaka
Computer Science and Engineering
Sathyabama Institute of Science and Technology
Chennai, India
repakaanjannikhil@gmail.com

Sai Deepak Ravikanti
Computer Science and Engineering
Sathyabama Institute of Science and Technology
Chennai, India
ravikantisaideepak@gmail.com

Ramya G Franklin
Computer Science and Engineering
Sathyabama Institute of Science and Technology
Chennai, India
Mikella.prabu@gmail.com

Abstract—Data mining is a rapidly developing technique for exploring and extracting significant information from massive collections of data, which can then be examined for patterns that support business decisions. In the medical domain, data mining can uncover valuable patterns and information that benefit clinical diagnosis. This research focuses on heart disease diagnosis using previous data and information. To achieve this, SHDP (Smart Heart Disease Prediction) is built on a Naive Bayesian classifier in order to predict risk factors concerning heart disease. The rapid advancement of technology has led to a remarkable rise in mobile health technology, of which web applications are one form. The required data is assembled in a standardized form. To predict the chance of heart disease in a patient, attributes such as age, BP, cholesterol, sex and blood sugar are fetched from the medical profiles. The collected attributes act as input to the Naive Bayesian classification for predicting heart disease. The dataset is split into two parts: 80% is used for training and the remaining 20% for testing. The proposed approach includes the following stages: dataset collection, user registration and login (application based), classification via Naive Bayesian, prediction, and secure data transfer using AES (Advanced Encryption Standard). Thereafter the result is produced. The research presents multiple knowledge abstraction techniques based on data mining methods adopted for heart disease prediction. The output reveals that the established diagnostic system effectively assists in predicting risk factors concerning heart diseases.

Keywords—Data Mining; Smart Heart Disease Prediction (SHDP); Web and Mobile Application; Naive Bayesian; Advanced Encryption Standard (AES); Data Collection; Classification; Prediction.

I. INTRODUCTION
The data mining process involves extracting significant, hidden and valuable information from large databases [1]. The healthcare sector usually holds abundant data related to patients, diagnoses of diseases, etc. [2]. Nowadays hospitals are adopting hospital IMS (information management systems) in order to handle their own and their patients' data systematically and effectively [3]. Such systems produce a large quantity of data represented as charts, numbers, text and images, though this data is hardly ever used for making clinical decisions [4]. The current research emphasizes heart disease diagnosis. Various data mining techniques have been incorporated for diagnosing the disease, thereby obtaining several probabilities [5]. For heart disease prediction, numerous systems have been recommended and deployed using various techniques and algorithms.

Providing quality service at an affordable price remains the prime and most challenging concern for healthcare establishments. Offering quality services requires accurate diagnosis of patients along with effective dosage of medicines. Low-quality clinical diagnosis and treatment can yield undesired and inadequate results. One way for healthcare establishments to cut costs is to use computer-generated data or DSS (decision support systems). As noted, the healthcare sector holds abundant data related to patients, diagnoses of diseases, resource management, etc. This data must be further broken down by the health services. Using a computerized system, patients' treatment records can be stored, and using mining methods one can acquire significant information and answer queries concerning the hospital.

Supervised and unsupervised learning are the two data mining methods. Supervised learning uses a training set to learn model parameters, whereas unsupervised learning requires no training set. Classification and prediction are the basic approaches of data mining. Classification models assign discrete, unordered class labels, whereas prediction models estimate continuous values. The analysis results are then offered to users through a web/mobile application.

The stages of the proposed approach are: user registration and login (application based), dataset collection, classification via Naive Bayesian, prediction, secure data transfer by means of AES (Advanced Encryption Standard), and lastly output in PDF format. AES helps in transmitting user data to the database in a secure manner. From the security point of view, the patient's personal data is replaced with mock values. The study evaluates performance on medical datasets for predicting heart disease and compares it against other machine learning techniques. The proposed technique promises to be effective in handling classification with the Naive Bayesian ML (machine learning) model.

The rest of the paper is organized as follows. Section 2 reviews the work of previous authors. Section 3 puts forth the proposed system for heart disease classification and prediction and gives an overview of its stages. Section 4 presents the experimental outcome. Lastly, Section 5 presents the conclusion and proposes future research work.

II. RELATED WORK
Kaan Uyar et al. propose computational techniques for analyzing heart disease by employing RFNN (recurrent fuzzy neural networks) and a genetic algorithm, which must be assisted by medical experts to handle several parameters that may impact the decision making process. A total of 297 instances of patient data are taken into account, of which 45 are assigned for testing and 252 for training. Testing yields an accuracy of 97.78%. Investigations are carried out successfully on the heart disease testing dataset. The following measures are calculated: accuracy, RMSE (root mean square error), probability of misclassification error, specificity, sensitivity, precision and F-score [6].

Anuradha Lamgunde et al. recommend a genetic algorithm with the back propagation technique to predict heart disease effectively. The research uses numerous input features for examining the heart disease prediction system. Overall, 13 medical attributes such as gender, BP and cholesterol are used by the system to predict the chances of a patient contracting heart disease [7].

Theresa Princy R. et al. conduct a survey of various classification techniques that can predict the risk level of each individual considering factors such as gender, age, BP, cholesterol and pulse rate. A patient's risk level can be classified using data mining classification techniques such as NB (Naive Bayes), Decision Tree, KNN and NN (Neural Network). Since a lot of attributes are taken into account, high accuracy is achieved for the risk level [8].

S. Indhumathi et al. recommend the Naive Bayes algorithm for predicting high risk of heart disease in patients. Pre-processed data is used for the training set. Classification and prediction form the prime data mining phases. The classification phase involves pre-processing, in which the following tasks are performed: data cleaning, normalization, data reduction, etc. The prediction phase involves classification and prediction of disease types. Hence the training set includes the disease type, and the testing set is built using the questions. The output generated is forwarded to the doctor/specialist [9].

S. Dangare et al. present the three main layers: the input, hidden and output layers. The input is fed to the input layer and the output layer projects the result acquired. Thereafter, the actual and expected outputs are compared. Using back propagation, the error can be determined and the weights between the output and the preceding hidden layers can be adjusted. After back propagation completes, the forward process commences and is repeated until the error is reduced [10].

K. Pramanik et al. recommend a hybrid algorithm that blends the ID3 and KNN algorithms for predicting heart disease. The data is pre-processed using the KNN algorithm, hence it is also referred to as the pre-processing algorithm. The pre-processed data forms the training set, which is then classified in the form of a tree structure. For predicting heart disease, the ID3 algorithm is implemented as the classifier. The KNN algorithm is used to classify incorrect values [11].

Rishabh Saxena et al. discuss how heart diseases have become quite rampant and common in today's society. The HDD (Heart Disease Dataset) of Cleveland is taken into account for exploring issues in terms of complexity and analyzing the patients effectively. CVD (cardiovascular disease) is the scientific term for heart diseases. The dataset, drawn from the medical test results of around 303 angiography patients (Cleveland Clinic, Ohio), was applied to around 425 patients (Hungarian Institute of Cardiology, Budapest, Hungary), with a frequency of 38% [12].

Ashok Kumar Dwivedi presents a model that identifies the occurrence of heart disease in thousands of samples immediately. The capability of 6 machine learning techniques is assessed for heart disease prediction. The methods are evaluated using 8 different classification performance indices and also using the receiver operating characteristic curve. Logistic regression achieves a high classification accuracy of 85%, with 89% sensitivity and 81% specificity [13].

Animesh Hazra et al. discuss how the healthcare sector produces an abundant amount of information every day, most of which remains unexplored and unused. Adequate and effective tools do not exist for extracting meaningful information from such data repositories for detection of clinical diseases or any other task. The work summarizes a few prevailing studies on heart disease prediction via data mining techniques, examines hybrids of different mining algorithms and derives a conclusion on the most effective technique(s) [14].

Ashish Chhabbi et al. have studied several data mining techniques for extracting and exploring unknown patterns from databases, which can assist in answering complicated queries concerning heart disease prediction. The dataset is collected from the UCI repository. Naive Bayes and an improved k-means algorithm have been deployed. According to the results, the advanced k-means algorithm yields higher accuracy than simple k-means (in which the number of clusters is defined in advance) [15].

Kamal Kant et al. recommend a heart disease prediction model employing the Naive Bayes data mining technique. The technique is a statistical classifier that treats the attributes as independent of each other. To decide on the class, the posterior probability must be maximized. Here, the Naive Bayes classifier yields great performance and is quite efficient for predicting disease in statistical probability and real-time expert systems, followed by Neural Network and Decision Trees [16].

Sharan Monica L et al. carry out a survey of prevailing KDD (knowledge discovery in databases) techniques, adopting the mining techniques J48, NB Tree and simple CART for accurately predicting heart disease using the least number of attributes in the WEKA tool. J48 is an open source Java implementation of C4.5 that acquires information for making decisions. The Naive Bayes (NB) classifier builds models with predictive capabilities, mostly for continuous datasets. Significant data relationships can be promptly projected using CART (Classification and Regression Trees). The three DT (decision tree) algorithms mentioned above are deployed using WEKA. CART exhibited an accuracy of 92.2%, and J48 was the fastest, built in just 0.08 sec [17].

Sumitra Sangwan et al. have built a hybrid algorithm that employs the k-means and A-priori algorithms, which are capable of extracting abundant data along with significant information. Initially, clustering is performed via the k-means clustering
algorithm. Thereafter, frequent item-sets are determined using the A-priori algorithm, along with extraction of frequent term-sets for Boolean association rules. A "bottom up" approach is adopted, wherein frequent subsets are extended one item at a time and entire groups of candidates are tested against the data. The output shows that clustering followed by A-priori results in high performance for heart disease prediction [18].

Rishi Dubey et al. carry out heart disease prediction by surveying various data mining techniques. Various studies showed that, in terms of accuracy, hybrid techniques surpass a single classification technique. It was ascertained that neural networks prove quite effective for prediction. The system yields promising outcomes when trained well using genetic algorithms. Moreover, appropriate treatment could be selected for patients in the near future, rather than merely anticipating the likelihood of contracting heart disease [19].

Monika Gandhi et al. study how data mining techniques, along with other techniques, help in drawing out unknown patterns from large databases, enabling healthcare establishments to make decisions. Data mining classification techniques such as decision trees, neural networks and Naive Bayes are employed for data discovery, extraction and classification [20].

Arun R et al. discuss CVD, or cardiovascular illness. The presence of numerous kinds of symptoms and strong interventions makes it quite complicated. Secure execution and analysis demonstrate the feasibility of cloud computing via Naive Bayes and AES. Through a social networking site, restricted patient information can be shared with the patient's family members. The AES encryption algorithm safeguards sensitive but accessible patient information [21].

III. PROPOSED WORK
A. Overview
With today's advanced, hi-tech lifestyle, a majority of people are contracting heart disease, which strikes so suddenly that at times there is no time to get treated immediately. Hence timely and early diagnosis is essential, and it is a quite challenging concern for the medical community. Poor and incorrect analysis carried out by a hospital can bring down its reputation and functioning. The research focuses on building a cost-cutting and effective approach by means of data mining techniques so that DSS (decision support systems) can be enhanced. Predicting heart disease from numerous attributes/symptoms is quite complicated. The present research uses the Naive Bayesian data mining classification technique for effectively enabling heart disease diagnosis and thereby offering appropriate treatment. Supervising different medical factors and the post-operative period is crucial. AES encrypts the patients' records/data and saves them in the database. The results reveal that the diagnostic system successfully predicts the risk level associated with heart diseases.

Fig 1: Proposed Architecture (flowchart stages: User Registration through Mobile; User Enters System IP Address; Login into Web Application; Collection of Data; Classification using Naive Bayesian, with Training Dataset and Testing Dataset; Data Encryption using AES; Result in PDF Format)
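
The training/testing partition shown in Fig 1 follows the 80%/20% split stated in the abstract. The snippet below is a minimal sketch of such a split, assuming the records are held in a Python sequence and assigned at random; the paper does not state how records are assigned to each part, and the function name `split_80_20` and its `seed` parameter are hypothetical.

```python
import random


def split_80_20(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and return (training, testing) lists."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible (an assumption for illustration).
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records (integers), standing in for patient profiles.
train, test = split_80_20(range(100))
print(len(train), len(test))
```

Every record lands in exactly one of the two parts, so the testing part is never seen during training.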

B. Data collection
Medical data of patients with heart disease is collected from the UCI dataset. All subjects are assumed to be CAD cases and are registered for angiography. For every patient, demographic, historical and laboratory features are assembled, such as sex, age, hypertension, smoking history, diabetes mellitus, chest pain type, dyslipidemia, random blood sugar, low and high density lipoprotein, cholesterol, triglycerides, systolic and diastolic blood pressure, weight, height, BMI (body mass index), central obesity, waist circumference, ankle-brachial index, duration of exercise, METS obtained, rate pressure product, recovery duration with persistent ST changes, Duke treadmill test and angiography result.

C. User Registration and Login
Data mining techniques have proven extremely advantageous in the healthcare sector by assisting in effective diagnosis and identification of diseases, protecting and extending patients' life spans, helping medical caretakers with treatment plans and cutting down medical cost. The first step is user registration, in which the user fills in the registration form through a mobile application. After successful registration, the user can log in at any time through the system IP (internet protocol) address with his/her own username and password. Every registered user's credentials are saved in the database. After this, the complete symptoms list is given, including the relevant clinical features such as age, sex, cholesterol, sugar, ECG, chest pain, resting blood pressure, etc.

D. Classification
This classification algorithm employs conditional independence: the value of an attribute for a given class does not depend on the values of the other attributes. The algorithm relies on Bayes' theorem.

E. Naive Bayesian based Classification
A Naive Bayesian (NB) classifier, also termed an "independent feature model", relies on Bayes' theorem and acts as a simple probabilistic classifier with a strong independence hypothesis. Generally, the NB classifier presumes that the presence or absence of a specific class feature is independent of the presence of any other class feature. NB classifiers are usually used in supervised learning. The classifier is based on conditional independence, which implies that the value of a variable for a given class is independent of the values of the other variables. The classifier is highly appropriate for high-dimensional input. Using Naive Bayesian, models with predictive potential can be designed.

Algorithm
Step 1: Let D be the training set, with each record represented by an n-dimensional attribute vector $X = (x_1, x_2, \ldots, x_n)$ holding n measurements from n attributes $A_1, \ldots, A_n$.

Step 2: Consider m classes for prediction, $C_1, C_2, \ldots, C_m$. By Bayes' theorem:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

Step 3: Since $P(X)$ is constant for every class, only $P(X \mid C_i)\, P(C_i)$ needs to be maximized.

Step 4: Class conditional independence is then presumed. Thus,
$$P(X \mid C_i) = P(x_1 \mid C_i) \cdot P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$$

Step 5: To predict the class of X, $P(X \mid C_i)\, P(C_i)$ is computed for every class $C_i$. The Naive Bayes classifier predicts that the class label of X is $C_i$ if
$$P(X \mid C_i)\, P(C_i) > P(X \mid C_j)\, P(C_j) \quad \text{for } 1 \le j \le m,\ j \ne i$$

F. Prediction
With the launch of automated medical diagnosis systems, the medical domain has developed considerably while costs have reduced. Numerous factors are involved in heart attack diagnosis, and mostly the patient's test records are referred to and analyzed for carrying out the diagnosis. To enhance the diagnosis process, the experience and knowledge of various medical experts/doctors as well as patients' medical screening data are collected in databases, resulting in an extremely significant system. By combining clinical decision support with computerized patient records, medical faults can be reduced, patient safety can be enhanced and unwanted variation in practice can be minimized, thereby improving overall patient outcomes. In addition, a prediction algorithm is established using heart disease level predictions.

G. Security for AES
AES is used for transmitting the data to the database in a secure manner. AES is a very popular and widely used encryption algorithm. It operates on bytes rather than bits. For instance, a 128-bit plaintext block is treated as 16 bytes arranged in a matrix of four columns and four rows for processing. The algorithm is used for cryptography in both software and hardware. No practical cryptanalytic attack has been reported against the AES algorithm so far. The result is generated in PDF format. All patient details are encrypted using the above encryption algorithm.

IV. RESULT AND DISCUSSION
This section puts forth a standard model for performing classification and prediction of composite web services. The data mining process involves extracting significant, hidden and valuable information from large databases of the healthcare sector. The work illustrates that better results are achieved when a hybrid of data mining techniques is deployed instead of a single mining technique on a data set. The data samples are partitioned using ten-fold cross-validation: each fold is used once for testing while the remaining folds are used for training.

Table 1 presents the performance of the proposed classification technique compared with the existing techniques SMO (Sequential Minimal Optimization), Bayes Net and MLP (Multi-Layer Perceptron). The proposed Naive Bayesian shows better performance than the other techniques.
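
The evaluation procedure described in this section, ten-fold cross-validation of the Naive Bayesian decision rule from Section III-E (Steps 1 to 5), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attributes are treated as categorical, add-one (Laplace) smoothing is added to avoid zero probabilities, log-probabilities replace the raw product for numerical stability, and the function names and toy records are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(records, labels):
    """Count what is needed to estimate P(Ci) and P(xk | Ci) from training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for x, c in zip(records, labels):
        for k, v in enumerate(x):
            value_counts[(k, c)][v] += 1
            values_seen[k].add(v)
    return class_counts, value_counts, values_seen


def predict_nb(model, x):
    """Return the class Ci that maximizes P(X | Ci) P(Ci), as in Steps 3 to 5."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log P(Ci)
        for k, v in enumerate(x):
            # Step 4: class conditional independence, one factor per attribute.
            # Add-one smoothing is an assumption, not stated in the paper.
            numerator = value_counts[(k, c)][v] + 1
            denominator = n_c + len(values_seen[k])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def ten_fold_accuracy(records, labels, folds=10, seed=0):
    """Use each fold once for testing and the remaining folds for training."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_nb([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_nb(model, records[i]) == labels[i]
                       for i in test_idx)
    return correct / len(records)


# Hypothetical toy records (BP level, chest pain) -> label; not the paper's data.
toy_records = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")] * 10
toy_labels = ["risk", "risk", "no-risk", "no-risk"] * 10
print(ten_fold_accuracy(toy_records, toy_labels))
```

In the toy data the label depends only on the first attribute, so the sketch mainly shows how the priors, the per-attribute conditional probabilities and the fold rotation fit together. Figures such as those reported in Table 1 would require the actual dataset.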

Table 1
S. No   Technique                               Accuracy (%)   Time (s)
1       Sequential Minimal Optimization (SMO)   84.07          0.02
2       Bayes Net (BN)                          81.11          0.02
3       Multi-Layer Perceptron (MLP)            77.4           0.75
4       Naive Bayesian (NB)                     89.77          0.01

Fig 2: Performance of Heart disease classification techniques (chart of Accuracy (%) and Time (s) against No. of Techniques)

Fig. 2 above compares the Naive Bayesian classification technique with other techniques: SMO (Sequential Minimal Optimization), Bayes Net and MLP (Multi-Layer Perceptron). The proposed heart disease classification technique reports better performance than the remaining ones.

Table 2 compares AES (Advanced Encryption Standard) with PHEA (Parallel Homomorphic Encryption Algorithm). The AES algorithm offers more effective security than the other.

Table 2
S. No   Technique                            Security (%)
1       Advanced Encryption Standard (AES)   98.2
2       PHEA                                 92.21

Fig 3: Comparison of Security techniques (chart of Security (%) against No. of Techniques)

Fig. 3 above depicts a security comparison between AES and PHEA (Parallel Homomorphic Encryption Algorithm). The recommended AES algorithm yields greater security than the other technique.

Figures 4 to 8 below present the design and implementation of the smart heart disease prediction system. The system continuously monitors the coronary heart patient and updates the data to the object converse database if any abnormalities are observed.

Fig 4: Patient has to enter the details to create an account

Fig 5: Patient has to enter the IP address to connect to the web application

Fig 6: Patient has to enter the user id and password to log in

