IC3E 2018 Paper 181

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/329484705
Machine Learning Algorithms for Anemia Disease Prediction: Select Proceedings of IC3E 2018
Chapter · January 2019

DOI: 10.1007/978-981-13-2685-1_44

Machine Learning Algorithms for Anemia Disease Prediction
Manish Jaiswal (1), Anima Srivastava (2), and Tanveer J. Siddiqui (3)

(1) Department of Electronics & Communication, University of Allahabad,
manish.jk50@gmail.com
(2, 3) Department of Electronics & Communication, University of Allahabad,
{animasparklestar,siddiqui.tanveer}@gmail.com
Abstract. Remarkable advances in the health industry have led to the production of
large amounts of data every day. These data must be processed to extract useful
information for analysis, prediction, recommendation, and decision-making. Data
mining and machine learning techniques are used to transform the available data
into valuable information. In medical science, timely disease prediction is a
central problem for professionals planning prevention and effective treatment; in
some cases, an inaccurate prediction may lead to death. In this study, we
investigate three supervised machine learning algorithms (Naive Bayes, random
forest, and the C4.5 decision tree) for prediction of anemia using CBC (complete
blood count) data collected from pathology centers. The results show that Naive
Bayes outperforms C4.5 and random forest in terms of accuracy.
Keywords: Anemia, classification algorithms, Decision Making, Complete Blood Count (CBC)
1 Introduction
The modern health care system generates huge volumes of data every day. There is
a need to mine and analyze these data to extract useful information and to reveal
hidden patterns. Data mining is the process of discovering new patterns in data
collected from varying sources. A number of machine learning algorithms have been
used successfully for prediction in domains such as healthcare, weather
forecasting, stock price prediction, and product recommendation. An important
aspect of medical science research is the prediction of diseases and of the factors
that cause them. In the medical domain, healthcare data are used to predict
epidemics, to detect disease, to improve quality of life, and to avoid early
deaths [1]. In this work, we investigate three different classification algorithms
for the prediction of anemia.
Anemia is defined as a decrease in the amount of red blood cells (RBCs) or
hemoglobin in the blood [2]. It has significant adverse health consequences, as
well as adverse impacts on economic and social development. Although the most
reliable indicator of anemia is blood hemoglobin concentration, a number of
factors can cause anemia: iron deficiency; chronic infections such as HIV,
malaria, and tuberculosis; vitamin deficiencies, e.g. of vitamins B12 and A;
cancer; and acquired disorders that affect red blood cell production and
hemoglobin synthesis.
Anemia causes fatigue and low productivity [3, 4, 5] and, when it occurs in
pregnancy, may be associated with increased risk of maternal and perinatal
mortality [6, 7]. According to the World Health Organization (WHO), maternal and
neonatal mortality were responsible for 3.0 million deaths in 2013 in developing
countries.
Anemia prediction plays an important role in detecting other associated diseases.
Anemia is classified on the basis of morphology or on the basis of its underlying
cause (Figure 1).
Based on morphology, anemia is divided into three types: normocytic, microcytic,
and macrocytic. Based on cause, anemia is classified into three types: blood loss,
inadequate production of normal blood cells, and excessive destruction of blood
cells.
Fig. 1. Classification of Anemia
In this paper, we investigate the performance of the Naive Bayes, random forest,
and decision tree algorithms for anemia prediction on a dataset collected from
local pathology centers. The need for this investigation arises from the fact that
the underlying cause of the disease varies from one region to another. Although
the random forest classifier has previously been investigated for predicting heart
disease and chronic kidney disease, to the best of our knowledge it has not been
investigated for anemia prediction. This adds novelty to the work.
The rest of the paper is organized as follows. Section 2 briefly reviews existing
related work. In Section 3, we discuss the various types of anemia diagnosis
tests. Section 4 presents the proposed methodology. Section 5 presents
experimental details and discussion. Finally, we conclude in Section 6.
2 Related Works
In the last decade, numerous data mining and machine learning techniques have been
applied to anemia. The most notable ones are the following:
In [8], the SMO support vector machine and the C4.5 decision tree algorithm are
used for the prediction of anemia, and the performance of the two algorithms is
compared.
In [11], WEKA is used to find a suitable classifier for developing a mobile app
that can predict and diagnose hematological data comments. The authors compared
neural network classification algorithms with the J48 and Naive Bayes classifiers.
The results show that the J48 classifier exhibits the maximum accuracy.
Dogan & Turkoglu [12] developed a decision support system for detecting
iron-deficiency anemia using a decision tree algorithm. The algorithm uses three
hematology parameters: serum iron, serum iron-binding capacity, and ferritin. The
evaluation was done on data from 96 patients, and the results matched the
physician's decisions.
Abdullah and Al-Asmari [13] experimented with four WEKA algorithms (Naive Bayes,
Multilayer Perceptron, J48, and SMO) in an attempt to predict anemia types using
CBC reports. The evaluation was done on real data constructed from the CBC reports
of 41 anemic persons. Similar to [11], the J48 decision tree algorithm, along with
SMO, was the best performer, with an accuracy of 93.75%.
Unlike the work in [11] and [13], we have chosen a different set of classifiers
and local data in our work.
3 Diagnostic tests Classification
Four main tests are ordered to diagnose anemia: complete blood count (CBC),
ferritin, PCR (polymerase chain reaction), and hemoglobin electrophoresis.
• The CBC test is the most frequently ordered blood test. It is used to assess
overall health and to detect a wide range of diseases [8], including anemia,
infection, and leukemia. A complete blood count reports about 15 parameters,
including hemoglobin (Hb), red blood cells (RBC), hematocrit (HCT), mean
corpuscular hemoglobin (MCH), mean corpuscular volume (MCV), and so on [8].
• A ferritin test measures the amount of iron stored in the body. High levels of
ferritin indicate an iron storage disorder, such as hemochromatosis. Low levels of
ferritin indicate iron deficiency, which causes anemia.
• The PCR test is a molecular test used to diagnose genetic disorders.
• A hemoglobin electrophoresis test is a blood test used to measure and identify
the different types of hemoglobin in the bloodstream.
4 Methodology
We have used three classifiers, namely random forest, Naive Bayes, and the C4.5
decision tree algorithm. Figure 2 depicts the flowchart of the proposed method.
4.1 Random Forest Algorithm
The random forest (RF) algorithm derives from the decision tree classifier. It is
a combination of tree predictors that aggregates the results of all the trees in
the collection and uses majority voting for prediction.
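
The aggregation step described above can be sketched in a few lines. This is only an illustration of majority voting, not the authors' implementation; the function name and the per-tree outputs are hypothetical, and how each tree is trained is left out.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the most trees.

    Ties are resolved in favour of the label that appears first in the list.
    """
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of five trees for a single patient record.
votes = ["anemic", "non-anemic", "anemic", "anemic", "non-anemic"]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```

Three of the five hypothetical trees vote for the same label, so that label becomes the forest's prediction.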
4.2 Decision Tree Algorithm

A decision tree is a tree in which each branch node represents a choice between a
number of alternatives, and each leaf node represents a decision. It has been
extensively used in various fields [9] [10]. C4.5 (J48 in WEKA) is a decision tree
algorithm developed by Ross Quinlan.
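
To make the branch-node and leaf-node structure concrete, here is a minimal sketch of a hand-written tree and its traversal. The attribute names (HGB, HCT) come from the CBC data used in this paper, but the split thresholds and the tree shape are hypothetical: they are neither learned by C4.5 nor clinical cutoffs.

```python
# Toy tree: each dict is a branch node, each string is a leaf (a decision).
# Thresholds are made up for illustration only.
TREE = {
    "attribute": "HGB",
    "threshold": 12.0,
    "below": "anemic",
    "at_or_above": {
        "attribute": "HCT",
        "threshold": 36.0,
        "below": "anemic",
        "at_or_above": "non-anemic",
    },
}


def classify(node, record):
    """Walk from the root to a leaf, choosing one branch at every branch node."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        node = node[branch]
    return node


print(classify(TREE, {"HGB": 10.5, "HCT": 33.0}))
print(classify(TREE, {"HGB": 14.0, "HCT": 42.0}))
```

A learning algorithm such as C4.5 would choose the attributes and thresholds from training data instead of having them written by hand.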
4.3 Naive Bayes Algorithm

The Naive Bayes algorithm is based on Bayes' rule of conditional probability. It
uses all the attributes contained in the data and analyzes them individually, as
though they are equally important and independent of each other. It requires only
a small amount of training data.
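
The independence assumption can be illustrated with a small from-scratch classifier. This is a sketch under stated assumptions: it models each numeric attribute with a Gaussian per class, which the paper does not specify, and the function names and the six training rows (HGB, MCV) are made up for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and a per-attribute (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant guards against a zero variance.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Attributes are treated as independent, so their log densities add up.
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows: [HGB, MCV].
rows = [[9.5, 72], [10.2, 75], [8.9, 70], [14.1, 90], [13.5, 88], [15.0, 92]]
labels = ["anemic"] * 3 + ["non-anemic"] * 3
model = fit_gaussian_nb(rows, labels)
print(predict(model, [9.8, 73]), predict(model, [14.2, 89]))
```

Because each attribute contributes its own term to the score, only a mean and a variance per attribute and class need to be estimated, which is why a small training set can suffice.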
Fig. 2. Flowchart of Proposed Model (steps, in order: Data Collection,
Pre-processing, Classifier Learning, Anemia Disease Prediction, Performance
Evaluation)

5 Experimental Results and Discussion
5.1 Dataset
We collected data from different pathology centers and laboratory test centers in
the nearby area. The collected dataset consists of 200 test samples of CBC test
data. The dataset contains 18 attributes, out of which we have selected only those
required for anemia detection: Age, Gender, MCV, HCT, HGB, MCHC, and RDW.
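
The attribute selection can be sketched as a simple projection of each record onto the seven chosen fields. The seven names are taken from the text above; the extra fields in the example record (WBC, PLT) and all values are hypothetical, since the paper does not list the remaining attributes.

```python
SELECTED = ["Age", "Gender", "MCV", "HCT", "HGB", "MCHC", "RDW"]


def select_attributes(record, selected=SELECTED):
    """Keep only the attributes used for anemia detection, in a fixed order."""
    return {name: record[name] for name in selected}


# Hypothetical raw CBC record; WBC and PLT stand in for unused attributes.
raw = {
    "Age": 34, "Gender": "F", "MCV": 78.0, "HCT": 33.5, "HGB": 10.8,
    "MCHC": 32.1, "RDW": 15.2, "WBC": 6.4, "PLT": 250,
}
features = select_attributes(raw)
print(features)
```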
5.2 Experimental setup
The proposed method uses CBC test values. First, the data are pre-processed to
extract the seven attributes mentioned in Section 5.1. Then, we apply the random
forest, decision tree, and Naive Bayes (NB) classifiers to them. Performance is
evaluated in terms of accuracy and mean absolute error (MAE). The MAE measures how
close the predictions are to the eventual outcomes. Table 1 shows the results of
the three classifiers. Ten-fold cross-validation has been used to obtain accuracy.
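
The two evaluation ingredients can be sketched as follows. This is an illustration under assumptions, not the authors' tooling: MAE is taken here as the mean of |actual - predicted| over numeric outcomes (for example, 0/1 labels against predicted probabilities), and the folds are contiguous and unshuffled. Function names and the four example predictions are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between outcomes and predictions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# With 200 samples and k=10, each fold tests on 20 and trains on 180.
folds = list(k_fold_indices(200, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# Hypothetical outcomes (1 = anemic) and predicted probabilities.
print(mean_absolute_error([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
```

In each round a classifier would be trained on the training indices and scored on the held-out fold, and the ten scores would then be averaged.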
Table 1. Comparison of algorithms

                      Random Forest   Naive Bayes   C4.5
Mean absolute error   0.0332          0.0333        0.0347
Accuracy (%)          95.3241         96.0909       95.4602
The comparative performance of each classification algorithm based on accuracy and
MAE is shown in Figure 3 and Figure 4. The Naive Bayes classifier exhibits the
best accuracy on our dataset, which is unlike [11] and [13]. This is not
surprising, because the datasets used in these works are different and the cause
of the disease might differ between countries. We achieve a maximum accuracy of
96.09% with the NB classifier, which is higher than the 93.75% accuracy of the
best performing classifiers, SMO and J48, reported in [13].
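
The comparison can be checked directly from the values in Table 1. The snippet below only re-reads the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Values copied from Table 1.
results = {
    "Random Forest": {"mae": 0.0332, "accuracy": 95.3241},
    "Naive Bayes": {"mae": 0.0333, "accuracy": 96.0909},
    "C4.5": {"mae": 0.0347, "accuracy": 95.4602},
}

best_by_accuracy = max(results, key=lambda name: results[name]["accuracy"])
lowest_mae = min(results, key=lambda name: results[name]["mae"])

# Gap, in percentage points, to the 93.75% accuracy reported in [13].
gap_to_13 = round(results["Naive Bayes"]["accuracy"] - 93.75, 2)

print(best_by_accuracy, lowest_mae, gap_to_13)
```

Naive Bayes has the highest accuracy, while random forest has a marginally lower MAE (0.0332 against 0.0333), so the ranking depends slightly on which metric is used.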
Fig. 3. MAE using each Algorithm (bar chart of mean absolute error for Random
Forest, C4.5, and Naive Bayes)
Fig. 4. Comparison of Accuracy using each Algorithm (bar chart of accuracy for
Random Forest, Naive Bayes, and C4.5)
6 Conclusion & Future Work
In this paper, we have compared the performance of three different classifiers in
the prediction of anemia. The experimental results on a sample dataset suggest
that the Naive Bayes classification algorithm provides the best performance in
terms of accuracy, as compared to C4.5 and random forest. Automatic prediction can
reduce the manual effort involved in diagnosis. In the future, automated tools can
be developed that use the prediction results to suggest further diagnosis. Such
automated tools can prove valuable in the timely detection of more serious
diseases. Furthermore, such a disease prediction system can be extended to
recommend a treatment plan.
References
1. Arun, V., et al.: Privacy of Health Information in Telemedicine on Private Cloud.
International Journal of Family Medicine & Medical Science Research. (2015)
2. Provenzano, R., Lerma, E.V., Szczech, L.: Management of Anemia. Springer. (2018)
3. Ezzati, M., Lopez, A.D., Rodgers, A., Murray, C.J.L.: Comparative quantification of
health risks: global and regional burden of disease attributable to selected major
risk factors. Geneva: World Health Organization. (2004)
4. Balarajan, Y., et al.: Anaemia in low-income and middle-income countries. (2011)
5. Haas, J.D., Brownlie, T.: Iron deficiency and reduced work capacity: A critical
review of the research to determine a causal relationship. J Nutr. (2001)
6. Kozuki, N., Lee, A.C., Katz, J., Child Health Epidemiology Reference Group: Moderate
to severe, but not mild, maternal anemia is associated with increased risk of
small-for-gestational-age outcomes. J Nutr. (2012)
7. Steer, P.J.: Maternal hemoglobin concentration and birth weight. Clin Nutr. (2000)
8. Sanap, S.A., Nagori, M., Kshirsagar, V.: Classification of Anemia Using Data Mining
Techniques. Swarm, Evolutionary, and Memetic Computing, pp. 113-121. Springer (2011)
9. Jerez-Aragonés, J.M., et al.: A combined neural network and decision trees model for
prognosis of breast cancer relapse. Artif Intell Med. (2003) pp. 45-63
10. Podgorelec, V., et al.: Decision trees: an overview and their use in medicine. J Med
Syst. (2002) pp. 445-463
11. Amin, N., Habib, A.: Comparison of different classification techniques using WEKA
for hematological data. American Journal of Engineering Research, Volume 4, Issue 3,
pp. 55-61 (2015)
12. Dogan, S., Turkoglu, I.: Iron deficiency anemia detection from hematology parameters
by using decision tree. International Journal of Science and Technology. (2008)
pp. 85-92
13. Abdullah, M., Al-Asmari, S.: Anemia types prediction based on data mining
classification algorithms. Communication, Management and Information Technology.
Taylor & Francis Group, London (2016)