Novel Deep Learning Architecture For Heart Disease Prediction Using Convolutional Neural Network
Abstract: Healthcare is one of the most important aspects of human life. Heart disease is known to be one of the deadliest diseases, affecting the lives of many people around the world. Heart disease must be detected early so that loss of life can be prevented. The availability of large-scale data for medical diagnosis has helped develop complex machine learning and deep learning-based models for automated early diagnosis of heart diseases. The classical approaches have been limited in that they do not generalize well to new data that have not been seen in the training set. This is indicated by a large gap between training and test accuracies. This paper proposes a novel deep learning architecture using a 1D convolutional neural network for classification between healthy and non-healthy persons to overcome the limitations of classical approaches. Various clinical parameters are used for assessing the risk profile of the patients, which helps in early diagnosis. Various techniques are used to avoid overfitting in the proposed network. The proposed network achieves over 97% training accuracy and 96% test accuracy on the dataset. The accuracy of the model is compared in detail with other classification algorithms using various performance parameters, which demonstrates the effectiveness of the proposed architecture.

Keywords: Heart Disease Prediction, Healthcare, Deep Learning, 1D Convolutional Neural Network, Embedding Layer, Overfitting

I. INTRODUCTION

There has been considerable research in the field of healthcare in the last few years, particularly after the Covid pandemic. According to the World Health Organization, heart diseases are among the deadliest diseases and cause the largest number of deaths in the world [1]. It is also observed that more than 24% of the deaths in India are due to various forms of heart disease [2]. So there is a need to develop an early diagnosis system that prevents the deaths which are occurring due to heart diseases.

Heart diseases, also known as cardiac diseases, are generally caused by the narrowing of the coronary arteries which supply blood to the heart. There are methods like angiography which are used for detecting heart diseases, but they are very costly and can cause certain reactions in a patient's body. This prevents the widespread use of these techniques in countries with large poor populations.

There is a need to develop healthcare products that provide quality results at an affordable rate. Healthcare organizations are also looking for clinical tests which can be performed non-invasively at low cost. The development of a computer-based decision support system for the diagnosis of various diseases can help organizations cater to the needs of millions of people around the world.

The rapid growth of machine learning and deep learning algorithms has helped research in various industries, including medicine. The availability of large-scale medical diagnosis data has helped in training these algorithms. A clinical support system can be developed using these algorithms, which helps in reducing cost and increasing accuracy [3].

Various clinical features can be utilized by machine learning algorithms for categorizing the risk profile of the patients. There are certain features like age, sex, and heredity which are not in the patient's control, while features like blood pressure, smoking, and drinking habits are in the control of the patient [2]. The proposed algorithm uses a combination of these features for categorizing healthy and non-healthy patients.

The remainder of the paper is organized as follows: the existing methods of heart disease classification using machine learning solutions are discussed in Section II. The proposed architecture is explained in Section III. The implementation details and results are discussed in Section IV.

II. LITERATURE SURVEY

There has been a lot of research in developing a heart disease diagnosis system for early detection using various clinical parameters. Various classification algorithms like Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Artificial Neural Network, etc. are being used for classifying patients. This section summarizes those implementations.

S. Radhimeenakshi [4] proposed a Decision Tree and Support Vector Machine for heart disease classification. The author concluded that the decision tree classifier performs better than SVM in terms of accuracy measured using a confusion matrix. R. W. Jones et al. [5] proposed a heart disease prediction technique using an artificial neural network. They used a self-applied questionnaire for training the neural network. The neural network contained three hidden layers and was trained using a backpropagation algorithm. The architecture was validated using the Dundee rank factor score and achieved a 98% relative operating characteristic value on the dataset.

Ankita Dewan et al. [6] compared the performance of genetic algorithms and backpropagation for training the neural network architecture. They concluded that backpropagation performs better, with a very small error on the dataset. S. Y. Huang et al. [7] proposed a learning vector quantization algorithm for training the artificial neural network. They used 13 clinical features for training the network and achieved almost 80% accuracy on the dataset.

Jayshril S. Sonawane et al. [8] proposed a new artificial neural architecture that can be trained using a vector quantization algorithm with random order incremental training. They also used 13 clinical features for training and achieved 85.55% accuracy on the dataset. Majid Ghonji Feshki et al. [9] used four different classification algorithms, which include C4.5, Multilayer Perceptron, Sequential Minimal Optimization, and feed-forward backpropagation. They concluded that the particle swarm optimization (PSO) algorithm with neural networks achieved the best accuracy of around 91.94% on the dataset.

R. R. Manza et al. [10] proposed an Artificial Neural Network with a large number of Radial Basis Function neurons in the hidden layer. They obtained around 97% accuracy with this architecture. Saba Bashir et al. [2, 10] proposed a hybrid model for heart disease prediction which uses a combination of decision tree, SVM, and Naïve Bayes algorithms. They achieved 74% sensitivity, 82% accuracy, and 93% specificity.

P. Ramprakash et al. [1] proposed a deep neural network and a χ2 statistical model for feature selection. They used various techniques to avoid overfitting and underfitting. They achieved 94% accuracy, 93% sensitivity, and 93% specificity. Turay Karayilan et al. [2] studied the performance of artificial neural networks with various numbers of hidden layers. They achieved around 95.55% accuracy using five hidden layers.

It can be observed that most of the proposed systems use Artificial Neural Networks with some modifications. These architectures are prone to overfitting and therefore perform poorly on new data. So this paper proposes a new architecture using a one-dimensional convolutional neural network with dropout to avoid overfitting. It uses the Cleveland database [12], which has 13 features for classifying between healthy and non-healthy patients. Other classification algorithms are also implemented for verifying the performance of the proposed architecture using well-known performance measures.

A detailed explanation of the proposed architecture, with the algorithms and techniques used, is given in the next section.

III. PROPOSED ARCHITECTURE

This section describes the proposed architecture and all its constituent layers in detail, along with the techniques used to optimize the architecture. It also gives some theoretical background about the 1-D convolutional neural network (CNN), which is central to the proposed architecture.

Conventional 2D CNNs have become very popular in pattern recognition problems like image classification and object detection [13]. CNNs are similar to ANNs in that they consist of self-optimizing neurons which are trained to perform a certain task. This has led to the development of the 1-D CNN, which can operate on one-dimensional datasets or time series data [13]. The proposed architecture using this concept of 1D CNN is shown in Figure 1.

Figure 1: Proposed 1-D CNN Architecture

The input to the architecture is the 13 features that are important in the classification of heart disease. These features are converted to a new representation called a word embedding by a layer called the Embedding Layer. It is similar to the Bag of Words concept used for text data. It helps in a better representation of the dataset according to the unique values present in each of the features. The output of the Embedding layer is given to the 1D CNN layer for feature extraction.

A 1D CNN is very similar to a conventional 2D CNN, but the convolution operation is applied along only one dimension, which results in a shallow architecture that can be easily trained on a normal CPU or even on embedded development boards [13]. The convolution operation helps in finding useful hierarchical features from the dataset for classification. The dimensions of the output features after the 1D CNN can be calculated using the equation given below:

$$x = \frac{w + 2p - f}{s} + 1 \qquad (1)$$

where $x$ is the dimension of the output features and $w$ is the size of the input features. $f$ indicates the size of the filter used
for convolutions. $p$ indicates padding, the values added on the boundary before applying convolution. $s$ indicates stride, the step by which the filter moves after each convolution operation.

The 1D convolution operation is a linear operation, which on its own is not useful in classifying nonlinear data. Most real-world datasets are nonlinear, which requires a nonlinear operation after convolution. This nonlinear function is called an activation function. Sigmoid, hyperbolic tangent, and rectified linear unit (ReLU) are some of the widely used activation functions. The proposed architecture uses the ReLU activation function, which is easy to compute and allows faster computation. It is also less prone to the vanishing gradient problem than the sigmoid and hyperbolic tangent functions.

There can be multiple convolution layers in the architecture, each followed by an activation function. The proposed architecture uses two 1-D convolution layers with 64 filters each and a filter size of 3. The output of the final convolution layer is passed through a global max-pooling layer, which takes the maximum value of each channel and reduces the dimension of the output. The output of pooling is passed through a fully connected layer with 256 neurons, which extracts the useful features for classification. This layer is similar to a hidden layer in an ANN. The final layer contains a single neuron which gives the classification probability. The final layer uses the sigmoid activation function as it directly gives the probability for binary classification.

The layer-wise details, along with output feature dimensions and the number of trainable parameters, are shown in Table 1.

Table 1: Layerwise CNN Architecture

| Layer (type) | Output Shape | No. of Parameters |
|---|---|---|
| Embedding_1 (Embedding) | (None, 13, 300) | 45600 |
| dropout_1 (Dropout) | (None, 13, 300) | 0 |
| Conv1d_1 (Conv1D) | (None, 13, 64) | 57664 |
| dropout_2 (Dropout) | (None, 13, 64) | 0 |
| Conv1d_2 (Conv1D) | (None, 13, 64) | 12352 |
| Global_max_pooling1d_1 (GlobalMaxPooling1) | (None, 64) | 0 |
| dense_1 (Dense) | (None, 256) | 16640 |
| dense_2 (Dense) | (None, 1) | 257 |

Total parameters: 132,513
Trainable parameters: 132,513
Non-trainable parameters: 0

The proposed 1D CNN architecture contains around 0.13 million trainable parameters, which are adapted during the training of the network. It was observed that a general CNN architecture overfitted the training data, meaning that training accuracy was very high and validation accuracy was low. The dropout technique was introduced to reduce overfitting. It removes random neurons with a certain probability during training, which allows a different network to be trained at every iteration. This keeps the network from becoming too dependent on any single neuron. A dropout layer has been introduced after each trainable layer in the proposed architecture. The addition of the dropout layers brought the training and test accuracy close together, which indicates that the network adapts well to data that it has not seen.

The next section describes the implementation details and results obtained after training the proposed architecture.

IV. IMPLEMENTATION AND RESULTS

The proposed architecture for heart disease prediction has been implemented using the scikit-learn and Keras libraries, which allow the implementation of various machine learning and deep learning algorithms. The system used for development contains an Intel i5 CPU and 8 GB RAM. It also has a GeForce 940 GPU, which helps in training the architecture faster.

The paper uses the Cleveland database [12], which has 303 patient samples with 14 different features. The dataset is divided into two parts: 80% is used for training and the remaining 20% is used for validation. The features used for classification in the dataset are explained in Table 2.

Table 2: Dataset details

| Sr No. | Feature | Value Range |
|---|---|---|
| 1 | Age of Patient | 29-77 |
| 2 | Gender | 1 = Male; 0 = Female |
| 3 | Category of Chest Pain | 0 = Atypical Angina; 1 = Typical Angina; 2 = Asymptomatic; 3 = Non Angina |
| 4 | Blood Pressure | 94-200 |
| 5 | Serum Cholesterol Level | 126-564 |
| 6 | Fasting Blood Sugar | 0 if < 120; 1 if >= 120 |
| 7 | Resting ECG result | 0 = Normal; 1 = ST-T wave abnormalities; 2 = Left ventricular hypertrophy |
| 8 | Heart rate | 71-202 |
| 9 | Exercise-induced Angina | 0 = No; 1 = Yes |
| 10 | ST depression due to exercise relative to rest | 0-6.2 |
| 11 | The slope of the peak exercise ST segment | 0 = upsloping; 1 = flat; 2 = downsloping |
| 12 | Count of major vessels colored by fluoroscopy | 0-3 |
| 13 | Thallium Scan | 3 = normal; 6 = fixed; 7 = reversible effect |
| 14 | Heart Disease | 0 = No; 1 = Yes |

Some of the attributes have missing values for some of the examples. Those values have been replaced with the mean value of that attribute for training our architecture. Most traditional classification architectures require all the attributes to be in the same range. This dataset has attributes in different ranges, so a standardization technique is applied which brings all the attributes onto the same scale. It subtracts the mean value of the attribute from each attribute value and divides by the standard deviation of the attribute. The final attribute is the true label for the patient, indicating whether he/she has heart disease or not. The dataset is slightly unbalanced in the sense that there are more negative examples than positive ones, as shown in Figure 2.

Here, $m_t$ is the momentum term at timestep $t$, $\beta_1$ is a constant which is taken as 0.9, and $g_t$ is the gradient at timestep $t$. The exponentially decaying average of past squared gradients is calculated by:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (3)$$

where $v_t$ is the velocity term at timestep $t$, $\beta_2$ is a constant which is taken as 0.99, and $g_t$ is the gradient at timestep $t$.

The bias-corrected estimates are $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$ and $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$. The parameters are then updated using Adam's update rule:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \qquad (4)$$

where $\eta$ is the learning rate, $\epsilon$ is a constant with a very small value which avoids division by zero, and $\theta_t$ is the parameter value at timestep $t$.

The training and test accuracy after each epoch is shown in Figure 3.
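The Adam update given by equations (3) and (4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The first-moment update $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$ is the standard Adam form and is assumed here. The function name `adam_step`, the learning rates, the value of `eps`, and the toy quadratic objective are hypothetical choices. The values β1 = 0.9 and β2 = 0.99 follow the text.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """Apply one Adam update and return the new (theta, m, v).

    `t` is the 1-based timestep; it is needed for the bias correction.
    """
    # First moment (momentum term); standard Adam form, assumed here.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment (velocity term), equation (3).
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias-corrected estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Parameter update, equation (4).
    theta = theta - lr / (np.sqrt(v_hat) + eps) * m_hat
    return theta, m, v


# Hypothetical toy problem: minimise f(theta) = sum(theta ** 2), gradient 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)

print("final parameters:", theta)
```

Because of the bias correction, the very first step moves each parameter by approximately the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.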
The accuracy is measured as the ratio of the total number of correctly classified examples to the total number of examples. It can be seen from the table that the proposed architecture is the best performing architecture in terms of test accuracy. Some other architectures perform well on the training set but perform very poorly on the test set. The accuracy can alternatively be represented as a confusion matrix. It has the number of correctly classified examples on the diagonal and wrongly classified examples elsewhere. The confusion matrix for the given architecture is shown below in Table 4.

Table 4: Confusion Matrix

| | Predicted Class 0 | Predicted Class 1 |
|---|---|---|
| True Class 0 | 24 | 2 |
| True Class 1 | 0 | 36 |

actual positive examples are correctly identified. There is always a tradeoff between precision and recall, so a new performance measure, the F1 score, is introduced. The F1 score is the harmonic mean of precision and recall, which gives a balanced value between precision and recall.

The last parameter, AUC, is a measure of the area under the receiver operating characteristic (ROC) curve. The ROC curve is shown in Figure 5.
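The measures discussed in this section can be computed directly from the confusion matrix in Table 4. The sketch below assumes that class 1 (heart disease present) is the positive class. The helper name `binary_metrics` is hypothetical. The printed values are derived from Table 4 only and are not figures reported elsewhere in the paper.

```python
# Confusion matrix from Table 4: rows are true classes, columns are predicted classes.
confusion = [[24, 2],
             [0, 36]]


def binary_metrics(cm):
    """Return accuracy, precision, recall and F1 for a 2x2 confusion matrix.

    Assumes class 1 is the positive class (an assumption of this sketch).
    """
    tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
    fn, tp = cm[1]  # true class 1: predicted 0, predicted 1
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For the matrix in Table 4 this gives an accuracy of 60/62 (about 0.9677), a precision of 36/38 (about 0.9474), a recall of 1.0, and an F1 score of 72/74 (about 0.9730).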