Design and Implementing Heart Disease Prediction Using Naives Bayesian Dept. of Cse
ABSTRACT
Data mining is a rapidly developing technique for exploring and extracting significant information
from massive collections of data; this information can then be used to examine patterns and support
business decisions. In the medical domain, data mining can discover and extract valuable patterns
and information that are beneficial for clinical diagnosis.
The research focuses on heart disease diagnosis based on previous data and information. To achieve
this, SHDP (Smart Heart Disease Prediction) is built using Naive Bayesian classification to predict
risk factors for heart disease. The rapid advancement of technology has led to a remarkable rise in
mobile health technology, including web applications. The required data is assembled in a
standardized form. To predict the likelihood of heart disease in a patient, the following attributes
are fetched from medical profiles: age, blood pressure (BP), cholesterol, sex, blood sugar, etc.
The collected attributes act as input to the Naive Bayesian classifier for predicting heart disease.
The dataset is split into two parts: 80% is used for training and the remaining 20% for testing. The
proposed approach includes the following stages: dataset collection, user registration and login
(application based), classification via Naive Bayesian, prediction, and secure data transfer using
AES (Advanced Encryption Standard). Finally, the result is produced. The research presents several
knowledge abstraction techniques that use data mining methods for heart disease prediction. The
output shows that the diagnostic system effectively assists in predicting risk factors for heart
disease.
Keywords: Data Mining; Smart Heart Disease Prediction (SHDP); Web and Mobile Application;
Naive Bayesian; Advanced Encryption Standard (AES); Data Collection; Classification; Prediction.
CHAPTER 1
INTRODUCTION
1.1 Overview: Angina, chest pain and heart attack are symptoms of Coronary Heart Disease (CHD).
Cardiovascular Diseases (CVDs) accounted for around one fourth of all deaths in India in 2008.
India may have 30 million CHD patients, of whom 14 million live in urban and 16 million in rural
areas [1]. The risk of heart attack increases with smoking, lack of exercise, high blood pressure,
high cholesterol, improper diet, high blood sugar levels, etc. [2]. Early detection and treatment
can keep heart disease from getting worse. In the past few decades, medical data mining has played
an important role in exploring hidden patterns that can be used for the clinical diagnosis of any
disease dataset.
1.2 Objective: Classification is a data mining technique used to classify a patient as normal or as
having heart disease. However, classification uses all attributes, relevant and irrelevant alike,
which may reduce classification performance. Feature subset selection is a dimensionality
reduction technique used to improve accuracy. Particle swarm optimization (PSO) is a feature
selection technique that removes redundant features to improve classifier performance. Our
proposed model identifies the relevant features and removes the irrelevant ones in order to
predict heart disease effectively.
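The PSO-based feature selection described above can be sketched as a binary particle swarm in which each particle is a 0/1 mask over the attributes. This is a minimal sketch under stated assumptions: the sigmoid-transfer binary PSO variant, the parameter values (`w`, `c1`, `c2`, `v_max`, swarm size, iterations) and the name `binary_pso_select` are illustrative choices, not settings taken from this work. The caller supplies the `fitness` function; in practice it might be, for example, the cross-validated accuracy of a Naive Bayes classifier trained on the selected attributes.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = attribute kept, 0 = attribute dropped


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],  # higher is better (assumed)
    n_particles: int = 20,
    n_iterations: int = 30,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Binary PSO over feature masks; returns the best mask found and its fitness."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests per particle and the swarm-wide (global) best.
    pbest = [p[:] for p in positions]
    pbest_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (pbest[i][d] - positions[i][d])
                     + c2 * r2 * (gbest[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp so every bit can still flip
                velocities[i][d] = v
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                positions[i][d] = 1 if rng.random() < sigmoid(v) else 0
            score = fitness(positions[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = positions[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = positions[i][:], score
    return gbest, gbest_score
```

For instance, with a hypothetical fitness that rewards attributes 0 and 2 and penalizes attributes 1 and 3, `binary_pso_select(4, lambda m: sum(wt * b for wt, b in zip([1.0, -1.0, 1.0, -1.0], m)))` searches for the mask that keeps only attributes 0 and 2.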
1.3 Related Work: Data mining plays an important role in heart disease prediction. Medical data
mining has great potential for exploring hidden patterns that can be used in the clinical diagnosis
of any disease dataset. Several data mining techniques have been applied to heart disease
diagnosis, such as Naive Bayes, decision trees, neural networks, kernel density, bagging and
support vector machines, with varying levels of accuracy. Naive Bayes is one of the most successful
classification techniques used in diagnosing heart disease patients. Peter et al. [4] proposed a
new hybrid feature selection method that combines CFS and Bayes' theorem (CFS+Filter Subset Eval)
and reported an accuracy of 85.5%. Shouman integrated k-means clustering with Naive Bayes, using
different initial centroid selection methods to improve Naive Bayes accuracy in diagnosing heart
disease patients, and achieved an accuracy of 84.5%. Rupali et al. developed a decision support
Heart Disease Prediction System (HDPS) using Naive Bayesian classification together with the
Jelinek-Mercer smoothing technique. The smoothing builds an approximating function that attempts to
capture important patterns in the data while avoiding noise; the reported accuracy is 86%. Elma et
al. proposed a classifier (cNK) that combines the distance-based k-nearest neighbour algorithm with
the statistics-based Naive Bayes classifier, achieving an accuracy of 85.92% on a heart disease
dataset.
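To illustrate why smoothing matters for Naive Bayes, the following sketch contrasts Laplace (add-alpha) smoothing with Jelinek-Mercer smoothing, which linearly interpolates a class-conditional estimate with a background estimate taken over all classes. It is a minimal illustration, not the cited authors' implementation; the function names, the interpolation weight `lam=0.7` and the chest-pain labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def laplace_estimate(class_values: Sequence[Hashable], v: Hashable,
                     n_categories: int, alpha: float = 1.0) -> float:
    """Add-alpha (Laplace) estimate of P(x = v | C) from the values observed in class C."""
    count = Counter(class_values)[v]
    return (count + alpha) / (len(class_values) + alpha * n_categories)


def jelinek_mercer_estimate(class_values: Sequence[Hashable], all_values: Sequence[Hashable],
                            v: Hashable, lam: float = 0.7) -> float:
    """Jelinek-Mercer: lam * P_ML(v | C) + (1 - lam) * P_ML(v) over all records."""
    p_class = Counter(class_values)[v] / len(class_values)
    p_background = Counter(all_values)[v] / len(all_values)
    return lam * p_class + (1.0 - lam) * p_background
```

With four hypothetical chest-pain categories, a value never seen among the diseased patients gets probability 0 from raw counts but (0 + 1) / (4 + 4) = 0.125 under Laplace smoothing. Under Jelinek-Mercer it borrows from the background frequency instead, so a single unseen value no longer forces the whole product of class-conditional probabilities to zero.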
CHAPTER 2
LITERATURE SURVEY
2.1 Existing system: Classification is a data mining technique used to classify a patient as
normal or as having heart disease, but it uses all attributes, whether relevant or irrelevant,
which may reduce classification performance. The proposed system aims to improve this
performance.
2.2 Proposed Work: The present research uses Naive Bayesian classification, a data mining
technique, to enable effective heart disease diagnosis and thereby appropriate treatment.
Monitoring various medical factors and the post-operative period is crucial. AES encrypts the
patients' records and saves them in the database. The results show that the diagnostic system
successfully predicts the risk level associated with heart disease.
CHAPTER 3
PROPOSED MODEL
3.1 Overview: With today's advanced, high-tech lifestyle, many people are developing heart
disease, which can strike so suddenly that there is sometimes no time to obtain treatment. Timely
and early diagnosis is therefore essential, yet it remains a challenging concern for the medical
community. Poor or incorrect analysis by a hospital can damage its reputation and operations. The
research focuses on building a low-cost, effective approach based on data mining techniques so
that the DSS (decision support system) can be enhanced. Predicting heart disease from numerous
attributes/symptoms is quite complicated.
3.2 Data Collection: Medical data of patients with heart disease is collected from the UCI
dataset. All cases are suspected of CAD (coronary artery disease) and registered for angiography.
Demographic, historical and laboratory attributes are collected for every patient, such as sex,
age, hypertension, smoking history, diabetes mellitus, chest pain type, dyslipidemia, random blood
sugar, low- and high-density lipoprotein, cholesterol, triglycerides, systolic and diastolic blood
pressure, weight, height, BMI (body mass index), central obesity, waist circumference,
ankle–brachial index, duration of exercise, METs obtained, rate pressure product, recovery duration
with persistent ST changes, Duke treadmill test and angiography result.
3.4 User Registration and Classification: Data mining techniques have proven highly beneficial in
the healthcare sector: they help diagnose and identify diseases effectively, protect and extend
patients' life spans, help caregivers plan treatment, and cut medical costs. The first step is user
registration, in which the user fills in a registration form through a mobile application. After
registering successfully, the user can log in at any time with his/her own username and password,
connecting to the system through its IP (Internet Protocol) address. Every registered user's
credentials are saved in the database. The user is then given the complete symptom list, covering
clinical features such as age, sex, cholesterol, blood sugar, ECG, chest pain and resting blood
pressure.
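The text states only that registered users' credentials are saved in the database. A minimal sketch of how registration and login could do this safely, assuming salted PBKDF2 password hashing (so the password itself is never stored or decryptable), an assumed iteration count, and a hypothetical `users` table in an in-memory SQLite database:

```python
import hashlib
import hmac
import os
import sqlite3

PBKDF2_ITERATIONS = 200_000  # assumed work factor; choose per current guidance and hardware


def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
    conn.execute(
        "CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )
    return conn


def _hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def register(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Store a random salt and the salted hash; return False if the username is taken."""
    salt = os.urandom(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
                (username, salt, _hash_password(password, salt)),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, _hash_password(password, salt))
```

Hashing rather than encrypting is a deliberate choice here: the server never needs to recover the password, only to check it.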
CHAPTER 4
4.1 Naive Bayesian based Classification: A Naive Bayesian (NB) classifier, also termed an
“independent feature model”, is a simple probabilistic classifier based on Bayes' theorem with a
strong independence assumption. The NB classifier assumes that, given the class, the presence or
absence of a particular feature is independent of the presence or absence of any other feature. NB
classifiers are trained by supervised learning. Because the classifier relies on class-conditional
independence, the value of one variable for a given class is independent of the values of the
other variables. The classifier is well suited to high-dimensional input. Naive Bayesian
classification can be used to design models with predictive capability.
4.2 Algorithm:
Step 1: Let $D$ be the training set, where each record is represented by an $n$-dimensional
attribute vector $X = (x_1, x_2, \ldots, x_n)$ holding $n$ measurements of the $n$ attributes
$A_1, \ldots, A_n$.
Step 2: Suppose there are $m$ classes $C_1, C_2, \ldots, C_m$. By Bayes' theorem:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
Step 3: Since $P(X)$ is the same for every class, only $P(X \mid C_i)\,P(C_i)$ needs to be
maximized.
Step 4: Class-conditional independence is then assumed, so
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i)\,P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$$
Step 5: To predict the class of $X$, $P(X \mid C_i)\,P(C_i)$ is computed for every class $C_i$.
The Naive Bayes classifier assigns $X$ to class $C_i$ if and only if
$P(X \mid C_i)\,P(C_i) > P(X \mid C_j)\,P(C_j)$ for all $1 \le j \le m$, $j \ne i$.
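Steps 1 to 5 can be sketched for categorical attributes as follows. Assumptions: attributes are categorical (continuous ones such as age or cholesterol would first need to be discretized, or modelled with a Gaussian likelihood instead); add-alpha (Laplace) smoothing is added so that an unseen value does not force the product in Step 4 to zero; and log-probabilities replace the raw product to avoid numerical underflow. The class name and the toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes, following Steps 1-5."""

    def __init__(self, alpha: float = 1.0) -> None:
        # Assumed add-alpha (Laplace) smoothing; not part of Steps 1-5 themselves.
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        # Step 1: X holds n-dimensional attribute vectors, y the class of each record.
        n_attrs = len(X[0])
        self.n_records = len(y)
        self.class_counts = Counter(y)  # Step 2: the m classes and their frequencies
        # counts[c][k][v] = number of class-c records whose k-th attribute equals v
        self.counts = {c: [Counter() for _ in range(n_attrs)] for c in self.class_counts}
        self.values = [set() for _ in range(n_attrs)]  # distinct values seen per attribute
        for row, c in zip(X, y):
            for k, v in enumerate(row):
                self.counts[c][k][v] += 1
                self.values[k].add(v)
        return self

    def log_scores(self, x: Sequence[Hashable]) -> Dict[Hashable, float]:
        """log(P(X|C_i) P(C_i)) per class; P(X) is omitted because it is constant (Step 3)."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_records)  # log P(C_i)
            for k, v in enumerate(x):
                # Step 4: the product of P(x_k | C_i) becomes a sum of logs.
                p = (self.counts[c][k][v] + self.alpha) / (n_c + self.alpha * len(self.values[k]))
                score += math.log(p)
            scores[c] = score
        return scores

    def predict(self, x: Sequence[Hashable]) -> Hashable:
        # Step 5: pick the class with the largest P(X|C_i) P(C_i).
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy records: (sex, chest pain type, high cholesterol)
    X = [("M", "asymptomatic", "yes"), ("M", "asymptomatic", "yes"),
         ("F", "asymptomatic", "no"), ("M", "typical", "yes"),
         ("F", "non-anginal", "no"), ("F", "typical", "no"), ("M", "non-anginal", "no")]
    y = ["disease"] * 4 + ["normal"] * 3
    model = CategoricalNaiveBayes().fit(X, y)
    print(model.predict(("M", "asymptomatic", "yes")))  # → disease
```

On this toy data the unnormalized scores for ("M", "asymptomatic", "yes") are 64/441 for "disease" and 6/1050 for "normal", so Step 5 picks "disease".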
4.3 Prediction: The introduction of automated medical diagnosis systems has advanced the medical
domain considerably while reducing costs. Numerous factors bear on heart attack diagnosis, and
diagnosis mostly relies on referring to and analyzing patients' test records. To improve the
diagnostic process, the experience and knowledge of various medical experts/doctors, together
with patients' medical screening data, are collected in databases, producing a highly valuable
system. Combining clinical decision support with computerized patient records can reduce medical
errors, improve patient safety and minimize unwanted variation in practice, thereby improving
patient outcomes. In addition, a prediction algorithm is established that predicts the patient's
heart disease level.
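Because Step 3 of the algorithm drops $P(X)$, the scores compared in Step 5 are not probabilities. One way to turn them into a heart disease level is to renormalize them into posteriors $P(C_i \mid X)$ and then band the disease probability. The document does not define the levels, so the three bands, the cut-offs 0.3 and 0.7, and the function names below are hypothetical.

```python
import math
from typing import Dict, Hashable


def posteriors_from_log_scores(log_scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Normalize log(P(X|C_i) P(C_i)) into P(C_i|X) using the log-sum-exp trick."""
    top = max(log_scores.values())  # shift by the max so exp() cannot overflow
    shifted = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(shifted.values())  # proportional to P(X)
    return {c: e / total for c, e in shifted.items()}


def risk_level(p_disease: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map P(disease | X) to a coarse risk band; the cut-offs are hypothetical."""
    if p_disease >= high:
        return "high"
    if p_disease >= low:
        return "medium"
    return "low"
```

For example, log scores of $\log 3$ for "disease" and $0$ for "normal" normalize to posteriors of 0.75 and 0.25, which this banding reports as "high".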
4.4 Security Using AES: AES is used to transmit data to the database securely. AES is a very
popular and widely used encryption algorithm. It performs all of its operations on bytes rather
than bits: for instance, a 128-bit plaintext block is treated as 16 bytes arranged in a matrix of
four rows and four columns for processing. The algorithm is implemented in both software and
hardware. No practical cryptanalytic attack against AES has been reported so far. The result is
generated in PDF format. All patient details are encrypted using this algorithm.
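The byte-oriented layout described above can be made concrete: the AES specification (FIPS-197) loads a 16-byte block into its 4×4 state column by column, so that `state[r][c] = block[r + 4*c]`. The sketch shows only this layout, not the encryption rounds; the helper names are hypothetical, and actual encryption of patient records would rely on a vetted library implementation of AES rather than hand-written code.

```python
from typing import List

State = List[List[int]]  # 4 rows x 4 columns of byte values


def bytes_to_state(block: bytes) -> State:
    """Arrange a 16-byte (128-bit) block into the AES state, filling column by column."""
    if len(block) != 16:
        raise ValueError("AES operates on 128-bit (16-byte) blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: State) -> bytes:
    """Inverse mapping: read the state back out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

With the block `bytes(range(16))`, the first row of the state is `[0, 4, 8, 12]`, because consecutive bytes run down each column.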
CHAPTER 5
RESULT AND DISCUSSION
5.1 Result and Discussion: This section puts forth a standard model for performing classification
and prediction of composite web services. The data mining process involves extracting significant,
hidden and valuable information from large healthcare databases. The work shows that better
results are achieved when a hybrid of data mining techniques is deployed instead of a single mining
technique on a dataset. The data samples are partitioned using tenfold cross-validation: each fold
is used once for testing, while the remaining folds are used for training.
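The tenfold partitioning described above, together with the 80/20 hold-out split mentioned in the abstract, can be sketched as index generators. Assumptions: folds are plain shuffled folds (the text does not say whether they are stratified by class), and the function names and the fixed `seed` are illustrative.

```python
import random
from typing import List, Tuple


def holdout_split(n: int, train_fraction: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and split them, e.g. 80% training / 20% testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return idx[:cut], idx[cut:]


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """k-fold cross-validation: each fold is the test set once; the other k-1 folds train."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

Each record appears in exactly one test fold, so averaging the ten per-fold accuracies uses every record for testing once.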
Fig. 2 compares Naive Bayesian classification with other techniques such as SMO (Sequential
Minimal Optimization), Bayes Net and MLP (Multi-Layer Perceptron). The proposed heart disease
classification technique performs better than the remaining ones.
Figure 3 depicts a security comparison between AES and PHEA (Parallel Homomorphic Encryption
Algorithm). The recommended AES algorithm yields greater security performance than PHEA.
CHAPTER 6
CONCLUSION
Data is collected from numerous sources covering the primary factors responsible for heart
disease, and a structured database is constructed from it. The research establishes SHDP (Smart
Heart Disease Prediction), which combines NB (Naive Bayesian) classification with the AES (Advanced
Encryption Standard) algorithm to address heart disease prediction. The results show that, in terms
of accuracy, the proposed technique surpasses plain Naive Bayes, yielding an accuracy of 89.77%
despite using a reduced set of attributes. AES shows higher security performance than PHEA
(Parallel Homomorphic Encryption Algorithm).
CHAPTER 7
FUTURE WORK
Application developers should work together with healthcare professionals and researchers to
deliver disease apps that improve healthcare outcomes. Streamlining the overall research process to
reduce delays would help ensure that application-based heart disease prevention research is not
entirely left behind by advances in technology.
CHAPTER 8
REFERENCES
[1] Purushottam, Kanak Saxena, Richa Sharma (2016), “Efficient Heart Disease Prediction System”,
Elsevier, Procedia Computer Science, Vol. 85, pp. 962-969.
[2] Kipp W. Johnson, Jessica Torres Soto, Benjamin S. Glicksberg (2018), “Artificial Intelligence
in Cardiology”, Elsevier, Journal of the American College of Cardiology, Vol. 71, No. 23,
pp. 2668-2679.
[3] Chala Beyene, Pooja Kamat (2018), “Survey on Prediction and Analysis the Occurrence of Heart
Disease Using Data Mining Techniques”, International Journal of Pure and Applied Mathematics,
Vol. 118, No. 8, pp. 165-174.
[4] Manpreet Singh, Levi Monteiro Martins, Patrick Joanis, Vijay K. Mago (2016), “Building a
Cardiovascular Disease Predictive Model using Structural Equation Model & Fuzzy Cognitive Map”,
IEEE, ICFS (FUZZ), pp. 1377-1382.
[5] Shalet K.S., V. Sabarinathan, V. Sugumaran, V. J. Sarath Kumar (2015), “Diagnosis of Heart
Disease Using Decision Tree and SVM Classifier”, International Journal of Applied Engineering
Research, Vol. 10, No. 68, pp. 598-602.