Novel Deep Learning Architecture For Heart Disease Prediction Using Convolutional Neural Network

Shadab Hussain (1), Dr Santosh Kumar Nanda (2), Susmith Barigidad (3), Shadab Akhtar (4), Md Suaib (5)
(1) Computer Science and Mathematics, Liverpool John Moores University, UK
(2) Vice President of Artificial Intelligence and Data Science Services, Techversant
(3) Computer Science and Engineering, Santa Clara University, USA
(4) Computer Science and Engineering, GL Bajaj Institute of Technology and Management, India
(5) Computer Science and Engineering, Saroj Institute of Technology and Management, India

Abstract: Healthcare is one of the most important aspects of human life. Heart disease is known to be one of the deadliest diseases, hampering the lives of many people around the world. Heart disease must be detected early so that the loss of lives can be prevented. The availability of large-scale data for medical diagnosis has helped develop complex machine learning and deep learning-based models for automated early diagnosis of heart diseases. The classical approaches have been limited in that they do not generalize well to new data that has not been seen in the training set. This is indicated by a large gap between training and test accuracies. This paper proposes a novel deep learning architecture using a 1D convolutional neural network for classification between healthy and non-healthy persons to overcome the limitations of classical approaches. Various clinical parameters are used for assessing the risk profile of the patients, which helps in early diagnosis. Various techniques are used to avoid overfitting in the proposed network. The proposed network achieves over 97% training accuracy and 96% test accuracy on the dataset. The accuracy of the model is compared in detail with other classification algorithms using various performance parameters, which demonstrates the effectiveness of the proposed architecture.

Keywords: Heart Disease Prediction, Healthcare, Deep Learning, 1D Convolutional Neural Network, Embedding Layer, Overfitting

I. INTRODUCTION

There has been considerable research in the field of healthcare in the last few years, particularly after the Covid pandemic. According to the World Health Organization, heart diseases are among the deadliest diseases and cause the largest number of deaths in the world [1]. It is also observed that more than 24% of the deaths in India are due to various forms of heart disease [2]. So there is a need to develop an early diagnosis system that prevents the deaths which occur due to heart diseases.

Heart diseases, also known as cardiac diseases, are generally caused by the narrowing of the coronary arteries which supply blood to the heart. Methods like angiography are used for detecting heart diseases, but they are very costly and can cause adverse reactions in a patient's body. This prevents the widespread use of these techniques in countries with large poor populations.

There is a need to develop healthcare products that provide quality results at an affordable rate. Healthcare organizations are also looking for clinical tests which can be performed non-invasively at a low cost. The development of a computer-based decision support system for the diagnosis of various diseases can help organizations cater to the needs of millions of people around the world.

The rapid growth of machine learning and deep learning algorithms has helped research in various industries, including the medical industry. The availability of large-scale medical diagnosis data has helped in training these algorithms. A clinical support system can be developed using these algorithms, which helps in reducing cost and increasing accuracy [3].

Various clinical features can be utilized by machine learning algorithms for categorizing the risk profile of the patients. Certain features like age, sex, and heredity are not under the patient's control, while features like blood pressure, smoking, and drinking habits are under the patient's control [2]. The proposed algorithm uses a combination of these features for categorizing healthy and non-healthy patients.

The remainder of the paper is organized as follows: The existing methods of heart disease classification using machine learning solutions are discussed in Section II. The proposed architecture is explained in Section III. The implementation details and results are discussed in Section IV.

II. LITERATURE SURVEY

There has been a lot of research in developing a heart disease diagnosis system for early detection using various clinical parameters. Various classification algorithms like Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Artificial Neural Network, etc. are being used for classifying patients. This section summarizes those implementations.

S. Radhimeenakshi [4] proposed a Decision Tree and a Support Vector Machine for heart disease classification. The author concluded that the decision tree classifier performs better than the SVM in terms of accuracy measured using a confusion matrix. R. W. Jones et al. [5] proposed a heart disease prediction technique using an artificial neural network. They used a self-applied questionnaire for training the neural network. The neural network contained three hidden layers and was trained using a backpropagation algorithm. The architecture was validated using the Dundee rank factor score and achieved a 98% relative operating characteristic value on the dataset.

Ankita Dewan et al. [6] compared the performance of genetic algorithms and backpropagation for training the neural network architecture. They concluded that backpropagation algorithms perform better, with a very small error on the dataset. S. Y. Huang et al. [7] proposed a learning vector quantization algorithm for training the artificial neural network. They used 13 clinical features for training the network and achieved almost 80% accuracy on the dataset.

Jayshril S. Sonawane et al. [8] proposed a new artificial neural architecture that can be trained using a vector quantization algorithm with random order incremental training. They also used 13 clinical features for training and achieved 85.55% accuracy on the dataset. Majid Ghonji Feshki et al. [9] used four different classification algorithms, which include C4.5, Multilayer Perceptron, Sequential Minimal Optimization, and feed-forward backpropagation. They concluded that the PSO algorithm with neural networks achieved the best accuracy of around 91.94% on the dataset.

R. R. Manza et al. [10] proposed an Artificial Neural Network with a large number of Radial Basis Function neurons in the hidden layer. They obtained around 97% accuracy with this architecture. Saba Bashir et al. [11] proposed a hybrid model for heart disease prediction which uses a combination of decision tree, SVM, and Naïve Bayes algorithms. They achieved 74% sensitivity, 82% accuracy, and 93% specificity.

P. Ramprakash et al. [2] proposed a deep neural network and a χ2 statistical model for feature selection. They used various techniques to avoid overfitting and underfitting. They achieved 94% accuracy, 93% sensitivity, and 93% specificity. Turay Karayilan et al. [3] studied the performance of artificial neural networks with varying numbers of hidden layers. They achieved around 95.55% accuracy using five hidden layers.

It can be observed that most of the proposed systems use Artificial Neural Networks with some modifications. It is observed that these architectures are prone to overfitting and so perform poorly on new data. This paper therefore proposes a new architecture using a one-dimensional convolutional neural network with dropout to avoid overfitting. It uses the Cleveland database [12], which has 13 features for classifying between healthy and non-healthy patients. Other classification algorithms are also implemented for verifying the performance of the proposed architecture using well-known performance measuring parameters.

A detailed explanation of the proposed architecture, with the algorithms and techniques used, is given in the next section.

III. PROPOSED ARCHITECTURE

This section describes the proposed architecture and all its constituent layers in detail, along with the techniques used to optimize the architecture. It also gives some theoretical background about the 1-D convolutional neural network (CNN), which is central to the proposed architecture.

The conventional 2D CNN has become very popular in pattern recognition problems like image classification and object detection [13]. CNNs are similar to ANNs in that they consist of self-optimizing neurons which are trained to perform a certain task. This has led to the development of the 1-D CNN, which can operate on one-dimensional datasets or time series data [13]. The proposed architecture using this concept of 1D CNN is shown in figure 1 below.

Figure 1: Proposed 1-D CNN Architecture

The input to the architecture is the 13 features that are important in the classification of heart disease. These features are converted to a new representation called a word embedding by the Embedding Layer. It is similar to the Bag of Words concept used for text data. It helps in a better representation of the dataset according to the unique values present in each of the features. The output of the Embedding layer is given to the 1D CNN layer for feature extraction.

The 1D CNN is very similar to the conventional 2D CNN, but the convolution operation is applied along only one dimension. This results in a shallow architecture which can be easily trained on a normal CPU or even on embedded development boards [13]. The convolution operation helps in finding useful hierarchical features from the dataset which are useful in classification. The dimensions of the output features after the 1D CNN can be calculated using the equation given below:

$$x = \frac{w + 2p - f}{s} + 1 \qquad (1)$$

Here x is the dimension of the output features and w is the size of the input features. f indicates the size of the filter used
for convolutions. 'p' indicates padding, the values added on the boundary before applying convolution. 's' indicates stride, the number of positions the filter moves after each convolution operation.

The 1D convolution operation is a linear operation, which on its own is not sufficient for classifying nonlinear data. Most real-world datasets are nonlinear, which requires some nonlinear operation after convolution. This nonlinear function is called an activation function. Sigmoid, hyperbolic tangent, and rectified linear unit (ReLU) are some of the widely used activation functions. The proposed architecture uses the ReLU activation function, which is easy to compute and allows faster computation. It is also less prone to the vanishing gradient problem than the sigmoid and hyperbolic tangent functions.

There can be multiple convolution layers in the architecture, each followed by an activation function. The proposed architecture uses two 1-D convolution layers with 128 filters and a filter size of 3. The output of the final convolution layer is passed through the global max-pooling layer, which takes the maximum value of each channel and reduces the dimension of the output. The output of pooling is passed through the fully connected layer with 256 neurons, which extracts the useful features for classification. This layer is similar to the hidden layer in an ANN. The final layer contains a single neuron which gives the classification probability. The final layer uses the sigmoid activation function as it directly gives the probability for binary classification.

The layer-wise details, along with output feature dimensions and the number of trainable parameters, are shown in Table 1.

Table 1: Layerwise CNN Architecture

Layer (type)                                Output Shape      No. of Parameters
Embedding_1 (Embedding)                     (None, 13, 300)   45600
dropout_1 (Dropout)                         (None, 13, 300)   0
Conv1d_1 (Conv1D)                           (None, 13, 64)    57664
dropout_2 (Dropout)                         (None, 13, 64)    0
Conv1d_2 (Conv1D)                           (None, 13, 64)    12352
Global_max_pooling1d_1 (GlobalMaxPooling1)  (None, 64)        0
dense_1 (Dense)                             (None, 256)       16640
dense_2 (Dense)                             (None, 1)         257

Total parameters: 132,513
Trainable parameters: 132,513
Non-trainable parameters: 0

The proposed 1D CNN architecture contains around 0.13 million trainable parameters which are adapted during the training of the network. It was observed that a general CNN architecture overfitted the training data, meaning that training accuracy was very high and validation accuracy was low. The dropout technique was introduced to remove overfitting. It removes random neurons with a certain probability during training, so that a different sub-network is trained at every iteration. This keeps the network from becoming too dependent on any single neuron. The dropout layer has been introduced after each trainable layer in the proposed architecture. The addition of the dropout layer brought the training and test accuracy very close together, which points to the network adapting well to data that it has not seen.

The next section describes the implementation details and results obtained after training the proposed architecture.

IV. IMPLEMENTATION AND RESULTS

The proposed architecture for heart disease prediction has been implemented using the scikit-learn and Keras libraries, which allow the implementation of various machine learning and deep learning algorithms. The system used for development contains an Intel i5 CPU and 8GB RAM. It also has a GeForce 940 GPU, which helps in training the architecture faster.

The paper uses the Cleveland database [12], which has 303 samples of patients with 14 different features. The dataset is divided into two parts: 80% is used for training and the remaining 20% is used for validation. The features used for classification in the dataset are explained in Table 2.

Table 2: Dataset details

Sr No.  Feature                                          Value Range
1       Age of Patient                                   29-77
2       Gender                                           1 = Male, 0 = Female
3       Category of Chest Pain                           0 = Atypical Angina, 1 = Typical Angina, 2 = Asymptomatic, 3 = Non Angina
4       Blood Pressure                                   94-200
5       Serum Cholesterol Level                          126-564
6       Fasting Blood Sugar                              0 if < 120, 1 if >= 120
7       Resting ECG result                               0 = Normal, 1 = ST-T wave abnormalities, 2 = Left ventricular hypertrophy
8       Heart rate                                       71-202
9       Exercise-induced Angina                          0 = No, 1 = Yes
10      ST depression due to exercise relative to rest   0-6.2
11      The slope of the peak exercise ST segment        0 = upsloping, 1 = flat, 2 = downsloping
12      Count of major vessels colored by Fluoroscopy    0-3
13      Thallium Scan                                    3 = normal, 6 = fixed defect, 7 = reversible defect
14      Heart Disease                                    0 = No, 1 = Yes

Some of the attributes have missing values for some of the examples. Those values have been replaced with the mean value of that attribute for training our architecture. Most of the traditional classification architectures require all the attributes to be in the same range. This dataset has attributes in different ranges, so a standardization technique is applied which brings all the attributes onto the same scale. It subtracts the mean value of the attribute from each attribute value and divides the result by the standard deviation of the attribute. The final attribute is the true label for the patient, indicating whether he/she has heart disease or not. The dataset is slightly unbalanced in the sense that there are more negative examples than positive ones, as shown in figure 2 below.

Figure 2: Distribution of labels in the dataset

As can be seen in figure 2, there are 163 negative examples and 140 positive examples in the dataset. The first 13 attributes are used as input features for classification. The proposed architecture is trained for 150 epochs with a batch size of 32. The binary cross-entropy function is used as the loss function to calculate the loss between the true value and the predicted value. This function has to be minimized using some optimization algorithm to achieve convergence. The Adam optimization algorithm is used for training as it provides faster convergence and does not zigzag around the local minima [14].

Adam uses exponentially weighted gradients as well as exponentially weighted squared gradients for updating the weights at each iteration. The exponentially decaying average of past gradients is calculated by:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \qquad (2)$$

where m_t is the momentum term at timestamp 't', β1 is a constant which is taken as 0.9, and g_t is the gradient at timestamp 't'. The exponentially decaying average of past squared gradients is calculated by:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (3)$$

where v_t is the velocity term at timestamp 't', β2 is a constant which is taken as 0.99, and g_t is the gradient at timestamp 't'. Bias correction is applied as

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \quad \text{and} \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

and then the parameters are updated using Adam's update rule:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \qquad (4)$$

where η is the learning rate, ε is a constant with a very small value which avoids division by zero, and θ_t is a parameter value at timestamp 't'.

The training and test accuracy after each epoch is shown in figure 3 below:

Figure 3: Accuracy after each epoch using Adam Optimizer without Dropout

The proposed architecture achieves 98.9% accuracy on the training set and 90.32% on the test set. There is a large gap between training and test accuracies, which indicates overfitting in the architecture. Dropout layers are added after every trainable layer in the architecture, with the probability of a neuron being removed set to 0.3. The train and test accuracy at every epoch for the modified architecture is shown in figure 4.
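
As an illustration of the Adam update described by equations (2) to (4), the following is a minimal pure-Python sketch, not the Keras implementation used in this paper. The function names `adam_step` and `minimize_quadratic`, the learning rates, the value of ε, and the toy quadratic objective are assumptions made for the example; β1 = 0.9 and β2 = 0.99 follow the values stated in the text.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """Apply one Adam update to a list of parameters.

    theta, grad, m, v are lists of floats of equal length; t is the
    1-based iteration counter. Returns the updated (theta, m, v).
    """
    new_theta, new_m, new_v = [], [], []
    for th, g, m_prev, v_prev in zip(theta, grad, m, v):
        m_t = beta1 * m_prev + (1 - beta1) * g        # equation (2)
        v_t = beta2 * v_prev + (1 - beta2) * g * g    # equation (3)
        m_hat = m_t / (1 - beta1 ** t)                # bias-corrected first moment
        v_hat = v_t / (1 - beta2 ** t)                # bias-corrected second moment
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))  # equation (4)
        new_m.append(m_t)
        new_v.append(v_t)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy usage (hypothetical objective): minimise f(x) = (x - 3)^2 from x = 0."""
    theta, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0)]  # derivative of (x - 3)^2
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta[0]
```

Because of the bias correction, the very first step moves each parameter by approximately the learning rate in the direction opposite to the gradient, regardless of the gradient's magnitude.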
Figure 4: Accuracy after each epoch using Adam Optimizer

The proposed architecture achieves 97.79% accuracy on the training set and 96.77% on the test set. Some other well-known classification algorithms are also implemented to compare against the performance of the proposed architecture. The detailed comparison is given in Table 3.
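
The layer sizes reported in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only: the helper names are hypothetical, and the embedding vocabulary size of 152 is an assumption inferred from 45600 / 300, not a value stated in the paper. The 'same' padding (p = 1, s = 1) is likewise inferred from the unchanged sequence length of 13. Note that the parameter counts in Table 1 correspond to 64 filters per convolution layer.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry, no bias."""
    return vocab_size * embed_dim


def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_features, units):
    """Weight matrix plus one bias per unit."""
    return in_features * units + units


def conv1d_output_length(w, f, p=0, s=1):
    """Equation (1): x = (w + 2p - f) / s + 1, using integer division."""
    return (w + 2 * p - f) // s + 1


# Layer-by-layer counts in the order listed in Table 1
# (dropout and pooling layers have no parameters).
layer_params = [
    embedding_params(152, 300),   # Embedding_1; vocabulary size 152 is assumed
    conv1d_params(300, 64, 3),    # Conv1d_1
    conv1d_params(64, 64, 3),     # Conv1d_2
    dense_params(64, 256),        # dense_1
    dense_params(256, 1),         # dense_2
]
total_params = sum(layer_params)

# With a filter size of 3, padding 1 and stride 1 the sequence length is preserved.
same_padding_length = conv1d_output_length(w=13, f=3, p=1, s=1)
```

Under these assumptions the per-layer counts and the total agree with the figures in Table 1.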
Table 3: Performance Comparison of Proposed Architecture with other Classifiers (all values in %)

Algorithm                        Training Accuracy  Test Accuracy  Precision  Recall  F1 Score  AUC
Logistic Regression              86.36              80.32          85         65      73.5      78.0
Naïve Bayes                      86.77              78.68          78.26      69.23   73.46     77.47
SVM                              92.56              80.32          85         65.3    73.9      78.4
Decision Tree                    100                77.04          73.07      73.07   73.07     76.53
Random Forest                    99.17              77.04          77.23      65.38   70.83     75.54
LightGBM                         99.58              77.04          83.33      57.69   68.18     74.56
XGBoost                          100                78.68          84.21      61.53   71.11     76.48
Artificial Neural Network        88                 78.68          78.26      69.23   73.46     77.47
Proposed Architecture (1D CNN)   97.79              96.77          94.73      100     97.29     96.15

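
The figures in the last row of Table 3 can be reproduced from the confusion-matrix counts reported in Table 4 (24 true negatives, 2 false positives, 0 false negatives, 36 true positives). The sketch below is a plain-Python illustration with a hypothetical helper name, `classification_metrics`. Treating the AUC as the mean of recall and specificity is an assumption: that is what the area under a ROC curve reduces to when it is computed from hard 0/1 predictions, and it happens to coincide with the value in Table 3.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)      # share of positive predictions that are correct
    recall = tp / (tp + fn)         # share of actual positives that are found
    specificity = tn / (tn + fp)    # share of actual negatives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    # Area under a single-threshold ROC curve (assumption, see lead-in).
    auc_hard_labels = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_hard_labels": auc_hard_labels,
    }


# Counts taken from Table 4.
metrics = classification_metrics(tn=24, fp=2, fn=0, tp=36)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

The rounded percentages agree with Table 3 to within 0.01, the small differences in precision and F1 being consistent with truncation rather than rounding in the table.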
The accuracy is measured as the ratio of the total number of correctly classified examples to the total number of examples. It can be seen from the table that the proposed architecture is the best performing architecture in terms of test accuracy. Some other architectures perform well on the training set but perform very poorly on the test set. The accuracy can alternatively be represented as a confusion matrix. It has the number of correctly classified examples on the diagonal and wrongly classified examples elsewhere. The confusion matrix for the given architecture is shown below in Table 4.

Table 4: Confusion Matrix

                Predicted Class 0   Predicted Class 1
True Class 0    24                  2
True Class 1    0                   36

When the dataset is unbalanced, accuracy sometimes does not give a correct idea of the performance of the architecture. So, the performance is also measured in terms of other performance measuring parameters like precision, recall, F1 Score, and area under the ROC curve (AUC).

Precision is an indication of how many positive predictions are correct, whereas recall identifies how many actual positive examples are correctly identified. There is always a tradeoff between precision and recall, so the F1 score is used as a combined performance measuring parameter. The F1 score is the harmonic mean of precision and recall, which gives a balanced value between precision and recall.

The last parameter, AUC, is a measure of the area under the receiver operating characteristic (ROC) curve. The ROC curve is shown in Figure 5.

Figure 5: ROC curve

The ROC curve is the plot of the true positive rate against the false positive rate. The proposed architecture has a ROC
curve that is very near to the ideal curve, which indicates the good performance of the architecture on the test set.

The proposed architecture performs well in terms of all of these performance parameters. The proposed architecture is also verified on new data which is not present in either the training or the validation set. It achieves good performance on new data as well. The statistical importance of each feature in classification is also observed.

V. CONCLUSION

The goal of this paper is the early diagnosis of heart disease using a computer-assisted system. This paper proposes a 1D convolutional neural network architecture for predicting heart disease. It also contains an Embedding layer which converts the feature vector into a new vector embedding, which helps in classification. The proposed architecture is implemented as a software system on a computer that can help in the early diagnosis of heart disease at a low cost and with high accuracy. The architecture uses overfitting avoidance techniques which help the performance on unseen data. The performance of the 1D CNN architecture is the best among all the classification algorithms compared, namely Logistic Regression, Naïve Bayes, SVM, Decision Tree, Random Forest, LightGBM, XGBoost, and ANN. More parameters can be included in the system, which can help in classifying heart disease more accurately. It can also be integrated with wearable sensor readings for real-time prediction of heart diseases.

REFERENCES

[1] "New initiative launched to tackle cardiovascular disease, the world's number one killer," World Health Organization. [Online]. Available: http://www.who.int/cardiovascular_diseases/global-hearts/Global_hearts_initiative/en/.
[2] Ramprakash, P., Sarumathi, R., Mowriya, R. and Nithyavishnupriya, S., 2020, February. Heart Disease Prediction Using Deep Neural Network. In 2020 International Conference on Inventive Computation Technologies (ICICT) (pp. 666-670). IEEE.
[3] Karayılan, T. and Kılıç, Ö., 2017, October. Prediction of heart disease using neural network. In 2017 International Conference on Computer Science and Engineering (UBMK) (pp. 719-723). IEEE.
[4] Radhimeenakshi, S., 2016, March. Classification and prediction of heart disease risk using data mining techniques of Support Vector Machine and Artificial Neural Network. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 3107-3111). IEEE.
[5] Shen, Z., Clarke, M., Jones, R.W. and Alberti, T., 1993, October. Detecting the risk factors of coronary heart disease by use of neural networks. In Proceedings of the 15th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 277-278). IEEE.
[6] Dewan, A. and Sharma, M., 2015, March. Prediction of heart disease using a hybrid technique in data mining classification. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 704-706). IEEE.
[7] Chen, A.H., Huang, S.Y., Hong, P.S., Cheng, C.H. and Lin, E.J., 2011, September. HDPS: Heart disease prediction system. In 2011 Computing in Cardiology (pp. 557-560). IEEE.
[8] Sonawane, J.S. and Patil, D.R., 2014, March. Prediction of heart disease using learning vector quantization algorithm. In 2014 Conference on IT in Business, Industry, and Government (CSIBIG) (pp. 1-5). IEEE.
[9] Feshki, M.G. and Shijani, O.S., 2016, April. Improving the heart disease diagnosis by evolutionary algorithm of PSO and Feed Forward Neural Network. In 2016 Artificial Intelligence and Robotics (IRANOPEN) (pp. 48-53). IEEE.
[10] Hannan, S.A., Mane, A.V., Manza, R.R. and Ramteke, R.J., 2010, December. Prediction of heart disease medical prescription using radial basis function. In 2010 IEEE International Conference on Computational Intelligence and Computing Research (pp. 1-6). IEEE.
[11] Bashir, S., Qamar, U. and Javed, M.Y., 2014, November. An ensemble-based decision support framework for intelligent heart disease diagnosis. In International Conference on Information Society (i-Society 2014) (pp. 259-264). IEEE.
[12] Cleveland heart disease Dataset. Available: http://archive.ics.uci.edu/ml/datasets/heart+Disease. Principal Investigator responsible for data collection: V.A. Medical Center, Long Beach, and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
[13] Mamun, M.M.R.K. and Alouani, A., 2020. FA-1D-CNN Implementation to Improve Diagnosis of Heart Disease Risk Level. In 6th World Congress on Engineering and Computer Systems and Sciences (pp. 122-1).
[14] Kingma, D.P. and Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
