Heart Disease Prediction With Machine Learning Approaches

Abstract: Heart disease prediction is one of the most complicated tasks in the medical field. In the modern era,
approximately one person dies per minute due to heart disease. Data science plays a crucial role in processing the
huge amounts of data generated in healthcare. Because heart disease prediction is a complex task, there is a need
to automate the prediction process in order to reduce the associated risks and alert the patient well in advance.
This paper uses the heart disease dataset available in the UCI Machine Learning Repository. We apply six machine
learning algorithms to the features of this dataset: support vector classifier, random forest, k-nearest neighbors
(KNN), naïve Bayes, decision tree and logistic regression. We then compare the accuracy that these algorithms
achieve in predicting heart disease.
1. Introduction
The work proposed in this paper focuses mainly on the data mining practices that are employed in heart disease
prediction. The heart is a principal organ of the human body. It regulates blood flow throughout the body, so any
irregularity in the heart can cause distress in other parts of the body. Any disturbance to the normal functioning
of the heart can be classified as a heart disease. Today, heart disease is one of the primary causes of death. It
may result from an unhealthy lifestyle, smoking, alcohol and a high intake of fat, which may cause hypertension [2].
According to the World Health Organization, more than 10 million people die of heart disease around the world every
year. A healthy lifestyle and early detection are the only ways to prevent heart-related diseases.
The main challenge in today's healthcare is the provision of high-quality services and effective, accurate
diagnosis [1]. Although heart diseases have been the prime cause of death in the world in recent years, they can
also be controlled and managed effectively. How well a disease can be managed depends on how early it is detected.
The proposed work attempts to detect heart disease at an early stage to avoid disastrous consequences.
Large sets of medical records created by medical experts are available for analysis and for extracting valuable
knowledge. Data mining techniques are the means of extracting valuable, hidden information from this large amount
of data. Medical databases consist mostly of discrete information, so decision making with such data is a complex
and tough task. Machine Learning (ML), which is closely related to data mining, handles large, well-formatted
datasets efficiently. In the medical field, machine learning can be used for the diagnosis, detection and prediction
of various diseases. The main goal of this paper is to provide a tool that helps doctors detect heart disease at an
early stage [5]. This in turn helps to provide effective treatment to patients and avoid severe consequences. ML
plays a very important role in detecting hidden discrete patterns and thereby analysing the given data. After this
analysis, ML techniques help in heart disease prediction and early diagnosis. This paper presents a performance
analysis of various ML techniques, such as Naive Bayes, Decision Tree, Logistic Regression and Random Forest, for
predicting heart disease at an early stage.
2. Methodology
In this paper, we apply different machine learning algorithms to the dataset to identify whether a person has heart
disease or not. We first handle the missing values in the dataset, then visualize the dataset, and finally observe
the accuracy obtained by the different machine learning algorithms. The machine learning algorithms used are
described below.
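The missing-value step can be sketched as follows. This is only a minimal illustration: it assumes that missing entries are marked with "?" and that they are filled with the median of the known values in the same column. The paper does not state which strategy was used, and the helper name `impute_median` and the sample values are hypothetical.

```python
from statistics import median

MISSING = "?"  # assumed marker for a missing entry


def impute_median(column):
    """Replace missing markers in one column with the median of the known values."""
    known = [float(v) for v in column if v != MISSING]
    fill = median(known)
    return [fill if v == MISSING else float(v) for v in column]


# Hypothetical raw values for the "ca" attribute, with one missing entry.
ca_raw = ["0", "2", "?", "1", "0", "1"]
ca_clean = impute_median(ca_raw)
print(ca_clean)
```

Dropping the incomplete records instead of imputing them is an equally plausible choice when only a few values are missing.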
Data Collection
In this paper, the dataset is obtained from the Cleveland Heart Disease database in the UCI Machine Learning
Repository. There are 14 attributes in the dataset.
The description of the dataset is given as follows:
1) Age: describes the age of a person.
2) Sex: describes the sex of a person; 1 for male, 0 for female.
3) Cp: describes the chest pain type of a person (1 for typical angina, 2 for atypical angina, 3 for non-anginal
pain, 4 for asymptomatic).
4) Trestbps: describes the resting blood pressure.
5) Chol: describes the serum cholesterol.
6) FBS: describes whether the fasting blood sugar is above 120 mg/dl (1 for true, 0 for false).
7) Restecg: describes the resting electrocardiographic results (0 for normal, 1 for ST-T wave abnormality, 2 for
left ventricular hypertrophy).
8) Thalach: describes the maximum heart rate achieved.
9) Exang: describes exercise-induced angina (1 for yes, 0 for no).
10) Oldpeak: describes the ST depression induced by exercise relative to rest.
11) Slope: describes the slope of the peak exercise ST segment (1 for up sloping, 2 for flat, 3 for down sloping).
12) Ca: describes the number of major blood vessels (0 to 3) colored by fluoroscopy.
13) Thal: describes the thal feature (3 for normal, 6 for fixed defect, 7 for reversible defect).
14) Target: describes the target class (0 for no heart disease; 1, 2, 3 or 4 for having heart disease).
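The attribute list above can be turned into a small parsing routine. The sketch below assumes comma-separated records in the attribute order given above and collapses the disease levels 1 to 4 into a single positive class, which matches the binary target (0 or 1) used in the results. The function name `parse_record` and the sample record are illustrative assumptions, not taken from the paper.

```python
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]


def parse_record(line):
    """Turn one comma-separated line into a dict keyed by attribute name."""
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d values, got %d" % (len(COLUMNS), len(values)))
    record = dict(zip(COLUMNS, values))
    # Collapse disease levels 1-4 into one positive class (assumed binarization).
    record["target"] = 1 if record["target"] > 0 else 0
    return record


# Illustrative record in the attribute order listed above.
sample = "67,1,4,160,286,0,2,108,1,1.5,2,3,3,2"
record = parse_record(sample)
print(record["age"], record["cp"], record["target"])
```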
Flow Diagram
3. Results and Discussion
Correlation Matrix
From the correlation matrix of the features (Figure 1), we can observe that some pairs of features are highly
correlated and some are not.
Figure 1: Correlation matrix of the features.
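For reference, each entry of such a matrix can be computed as a Pearson correlation coefficient between two feature columns. The sketch below is a plain-Python illustration under that assumption (the paper does not name the correlation measure), and the three short columns are hypothetical values rather than the real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three attributes.
data = {
    "age": [63, 37, 41, 56, 57],
    "thalach": [150, 187, 172, 178, 163],
    "target": [0, 1, 1, 1, 0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Values close to 1 or -1 indicate strongly correlated pairs, and values close to 0 indicate weakly correlated pairs.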
Bar plot for target class with different features:
It is very important that the dataset we use is pre-processed and cleaned. This graph shows the count of each
target class.
Figure 2: Target versus Count Feature.
The above graph shows the number of records in each target class, that is, how many people in the dataset have
heart disease and how many do not (0 = no heart disease, 1 = having heart disease).
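The counts behind such a bar plot can be obtained as sketched below. The label list is hypothetical, and the grouping of the raw values 1 to 4 into class 1 is an assumption based on the attribute description.

```python
from collections import Counter

# Hypothetical raw target labels (0 = no disease, 1-4 = disease).
raw_targets = [0, 2, 1, 0, 0, 3, 0, 1, 4, 0]

# Group every non-zero label into the positive class.
binary_targets = [1 if t > 0 else 0 for t in raw_targets]
counts = Counter(binary_targets)

# Text version of the bar plot: one row per class.
for label in sorted(counts):
    print(label, "#" * counts[label], counts[label])
```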
Machine Learning Algorithms
Logistic Regression:
Logistic regression is a supervised classification algorithm used mostly for binary classification problems, that
is, for predicting a target variable with two possible values. Instead of fitting a straight line or hyperplane
directly to the output, logistic regression uses the logistic function to squeeze the output of a linear equation
to a value between 0 and 1. In this dataset, the 13 independent variables are the inputs of that linear equation.
Logistic regression is one of the simplest algorithms in machine learning and can be used for various problems such
as disease prediction and cancer detection. In this paper, we achieved an accuracy of 85.25% by using this model.
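The squeezing step described above can be illustrated with a few lines of code. The weights, bias and feature values below are hypothetical and are not the fitted parameters of the model in this paper.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical parameters for three scaled features.
weights = [0.8, -1.2, 0.5]
bias = -0.1
features = [1.0, 0.3, 0.6]

p = predict_proba(weights, bias, features)
label = 1 if p >= 0.5 else 0  # 0.5 is the usual decision threshold
print(round(p, 3), label)
```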
Naïve Bayes Classifier:
Naive Bayes is a statistical classifier based on Bayes' theorem. It estimates the conditional probability of every
feature value given the class, assumes that the features are independent of each other within a class, and combines
these probabilities to find the most probable class. Its performance is often comparable with that of decision
trees and other selected classifiers, and it is easy to implement. In this paper, we achieved an accuracy of 85.25%
by using this classifier.
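A minimal sketch of this calculation is given below for categorical features. It uses Laplace smoothing, which is an assumption added here so that unseen values do not produce zero probabilities; the variant used in the paper is not stated. The eight training rows (sex, exang) are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict(model, row, n_values=2):
    """Return the class with the highest prior times likelihood."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(feature value | class).
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical binary features (sex, exang) and binary targets.
rows = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (1, 0), (0, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_naive_bayes(rows, labels)
print(predict(model, (1, 1)), predict(model, (0, 0)))
```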
K Nearest Neighbors Classifier:
The k-nearest neighbors (KNN) algorithm is a simple and easy-to-implement supervised machine learning algorithm
that can be used to solve both classification and regression problems. The algorithm is non-parametric: instead of
fitting parameters, it uses the stored data points to derive the output. It is a lazy learning model, which means
that the work is deferred until a prediction is requested. The basic idea is that a new data point is assigned the
class that is most common among its k nearest stored data points, on the assumption that similar points belong to
the same class.
Figure 10: K Neighbors Classifier scores.
This graph shows that the maximum accuracy achieved by the K Neighbors classifier is 90.16%.
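A score curve of this kind can be produced by evaluating the classifier for several values of k. The sketch below is a plain-Python illustration that assumes Euclidean distance and a simple majority vote; the tiny two-feature dataset and the chosen values of k are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k):
    """Classify one point by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points that are classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical scaled features and binary targets.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.8), (0.7, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.15, 0.15), (0.85, 0.85)]
test_y = [0, 1]

# Odd values of k avoid tied votes in a two-class problem.
scores = {k: accuracy(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because distances drive the vote, features measured on very different scales are usually rescaled before KNN is applied.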
Support Vector Classifier:
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both regression and
classification, although it is mostly used for classification problems. The classifier finds the boundary (a line
in two dimensions, a hyperplane in general) that best segregates the two classes. The support vectors are the
coordinates of the individual observations that lie closest to this boundary. There are several kernels with which
the hyperplane can be determined. This paper focuses on four kernels, namely linear, polynomial (poly), radial basis
function (rbf) and sigmoid. This type of classifier uses less memory because it uses only a subset of the training
points in the decision function. In this paper, we achieved an accuracy of 81.97% by using this model.
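The four kernels can be written out as similarity functions between two feature vectors, as in the sketch below. The formulas follow the commonly used definitions; the parameter values (`gamma`, `coef0`, `degree`) and the two sample vectors are assumptions, since the paper does not report the settings it used.

```python
import math


def dot(x, y):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def poly_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical feature vectors.
x = (1.0, 2.0)
y = (3.0, 4.0)
print(linear_kernel(x, y), poly_kernel(x, y, degree=2),
      rbf_kernel(x, y, gamma=0.1), sigmoid_kernel(x, y, gamma=0.01))
```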
Decision Tree Classifier:
A decision tree has the form of a flowchart in which each inner node tests a dataset attribute, each branch
represents an outcome of that test, and each leaf holds a class label. Decision trees are chosen because they are
fast, reliable and easy to interpret, and because very little data preparation is required. In a decision tree, the
prediction of a class label starts from the root of the tree. The value of the root attribute is compared with the
corresponding attribute of the record. Based on the result of the comparison, the matching branch is followed and a
jump is made to the next node, until a leaf is reached. In this paper, we achieved an accuracy of 77.05% by using
this model.
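The traversal described above can be sketched with a small hand-written tree. The attributes, thresholds and labels below are hypothetical and are not the tree learned in this paper.

```python
# Hypothetical tree: inner nodes test an attribute, leaves hold a class label.
TREE = {
    "attribute": "cp",
    "threshold": 3.5,
    "left": {
        "attribute": "thalach",
        "threshold": 150,
        "left": {"label": 1},
        "right": {"label": 0},
    },
    "right": {"label": 1},
}


def predict(node, record):
    """Walk from the root to a leaf and return the label stored there."""
    while "label" not in node:
        goes_left = record[node["attribute"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]


print(predict(TREE, {"cp": 4, "thalach": 120}))
print(predict(TREE, {"cp": 2, "thalach": 170}))
```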
Random Forest Classifier:
A random forest is composed of multiple decision trees: it creates a forest of trees and predicts the output by
combining the predictions of the individual trees. It handles large amounts of data and usually gives more accurate
output than a single tree, although its computation is more expensive. In this paper, we achieved an accuracy of
85.25% by using this model.
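The two ingredients of a forest, bootstrap samples for training the trees and a majority vote for prediction, can be sketched as follows. The three one-rule "trees" are hypothetical stand-ins for trained decision trees, and the fixed seed is only there to make the sample reproducible.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as used to train each tree."""
    return [rng.choice(rows) for _ in rows]


def forest_predict(trees, record):
    """Return the class predicted by the majority of the trees."""
    votes = Counter(tree(record) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-rule trees standing in for trained decision trees.
trees = [
    lambda r: 1 if r["cp"] == 4 else 0,
    lambda r: 1 if r["thalach"] < 150 else 0,
    lambda r: 1 if r["oldpeak"] > 1.0 else 0,
]

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
print(len(sample))

record = {"cp": 4, "thalach": 160, "oldpeak": 2.3}
print(forest_predict(trees, record))
```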
Table 1: Accuracy Values
Algorithm                          Accuracy
Logistic Regression                85.25%
Naïve Bayes Classifier             85.25%
K Nearest Neighbors Classifier     90.16%
Decision Tree Classifier           77.05%
Support Vector Classifier          81.97%
Random Forest Classifier           85.25%
Table 1 shows that the K Nearest Neighbors classifier gives the best accuracy, 90.16%, in comparison with the other
machine learning algorithms used in this paper. KNN is based on feature similarity: it is a simple algorithm that
stores all the available cases and classifies new cases based on a similarity measure. Its simplicity and accuracy
have made it one of the most widely used classification algorithms in industry.
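As a sanity check, the reported accuracies can be converted back into counts of correctly classified records. The sketch below assumes a test set of 61 records (roughly a 20% hold-out of the 303 Cleveland records). This test-set size is not stated in the paper; it is an assumption chosen because every reported percentage is consistent with it.

```python
TEST_SIZE = 61  # assumption, not stated in the paper

# Accuracy values reported in Table 1, in percent.
reported = {
    "Logistic Regression": 85.25,
    "Naive Bayes": 85.25,
    "K Nearest Neighbors": 90.16,
    "Decision Tree": 77.05,
    "Support Vector": 81.97,
    "Random Forest": 85.25,
}


def implied_correct(accuracy_percent, n_test):
    """Number of correct predictions implied by an accuracy on n_test records."""
    return round(accuracy_percent / 100 * n_test)


recomputed = {}
for name, percent in reported.items():
    correct = implied_correct(percent, TEST_SIZE)
    recomputed[name] = round(100 * correct / TEST_SIZE, 2)
    print(f"{name}: {correct}/{TEST_SIZE} = {recomputed[name]}%")
```

Under this assumption, the gap between the best and the worst model corresponds to 8 of the 61 test records.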
4. Conclusion and Future Work

This paper presents heart disease prediction on the Cleveland dataset with proper data processing and the
implementation of machine learning algorithms. We used six machine learning algorithms for prediction.
Among all the machine learning algorithms used in this paper, the highest accuracy, 90.16% (Table 1), is achieved
by the K Nearest Neighbors classifier. This shows that machine learning algorithms can be used to predict heart
disease with different parameters and models. Machine learning is an effective way to make predictions and to solve
problems in this and other areas.
REFERENCES
[1] Avinash Golande, Pavan Kumar T, "Heart Disease Prediction Using Effective Machine Learning Techniques",
International Journal of Recent Technology and Engineering, Vol. 8, pp. 944-950, 2019.
[2] T. Nagamani, S. Logeswari, B. Gomathy, "Heart Disease Prediction using Data Mining with Mapreduce Algorithm",
International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume 8,
Issue 3, January 2019.
[3] Fahd Saleh Alotaibi, "Implementation of Machine Learning Model to Predict Heart Failure Disease",
International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 10, No. 6, 2019.
[4] Anjan Nikhil Repaka, Sai Deepak Ravikanti, Ramya G Franklin, "Design And Implementation Heart Disease
Prediction Using Naives Bayesian", International Conference on Trends in Electronics and Informatics
(ICOEI 2019).
[5] Theresa Princy R, J. Thomas, "Human Heart Disease Prediction System using Data Mining Techniques",
International Conference on Circuit Power and Computing Technologies, Bangalore, 2016.
[6] Himanshu Sharma, M A Rizvi, "Prediction of Heart Disease Using Machine Learning Algorithms: A Survey",
August 2017.
[7] I Ketut Agung Enriko, Muhammad Suryanegara, Dadang Gunawan, "Heart Disease Diagnosis System with k-Nearest
Neighbors Method Using Real Clinical Medical Records", 4th International Conference, June 2018.
[8] Lakshmanarao, Y. Swathi, P. Sri Sai Sundareswar, "Machine Learning Techniques For Heart Disease Prediction",
International Journal of Scientific & Technology Research, Volume 8, Issue 11, November 2019.
[9] Monika Gandhi, Shailendra Narayanan Singh, "Predictions in heart diseases using techniques of data mining",
2015.
[10] M. S. Amin, Y. K. Chiam, K. D. Varathan, "Identification of significant features and data mining techniques
in predicting heart disease", Telematics Inform., vol. 36, pp. 82-93, March 2019.
[11] Senthilkumar Mohan, Chandrasegar Thirumalai, Gautam Srivastava, "Effective Heart Disease Prediction Using
Hybrid Machine Learning Techniques", IEEE Access, Volume 7, 2019, Digital Object Identifier
10.1109/ACCESS.2019.2923707.
Sree Rony prosad shaha

ID : 16204001
Student of BAUET
