IC3E 2018 Paper 181
3 authors, including:
All content following this page was uploaded by Manish Jaiswal on 01 April 2019.
1 Introduction
The modern health care system generates huge volumes of data every day. These data need to be mined and analyzed to extract useful information and to reveal hidden patterns. Data mining is the process of discovering new patterns in data collected from varying sources. A number of machine learning algorithms have been used successfully to make predictions in domains such as healthcare, weather forecasting, stock price prediction and product recommendation. An important aspect of medical science research is the prediction of diseases and of the factors that cause them. In the medical domain, healthcare data are used to predict epidemics, to detect disease, to improve quality of life and to avoid early deaths [1]. In this work, we investigate three different classification algorithms for the prediction of anemia.
Anemia is defined as a decrease in the number of red blood cells (RBCs) or in the amount of hemoglobin in the blood [2]. It has significant adverse health consequences, as well as adverse impacts on economic and social development. Although blood hemoglobin concentration is the most reliable indicator of anemia, it does not identify the cause, and many factors can lead to anemia: iron deficiency; chronic infections such as HIV, malaria and tuberculosis; vitamin deficiencies (e.g. vitamins B12 and A); cancer; and acquired disorders that affect red blood cell production and hemoglobin synthesis.
Anemia causes fatigue and low productivity [3, 4, 5] and, when it occurs in pregnancy, may be associated with an increased risk of maternal and perinatal mortality [6, 7]. According to the World Health Organization (WHO), maternal and neonatal mortality were responsible for 3.0 million deaths in developing countries in 2013.
Predicting anemia plays an important role in detecting other associated diseases. Anemia is classified either on the basis of morphology or on the basis of its underlying cause (Figure 1).
Based on morphology, anemia is divided into three types: normocytic, microcytic and macrocytic. Based on cause, anemia is classified into three types, namely blood loss, inadequate production of normal blood cells and excessive destruction of blood cells.
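Morphological classification is usually read off the mean corpuscular volume (MCV), i.e. the average size of the red cells. The sketch below assumes the commonly used reference limits of 80 fL and 100 fL (individual laboratories use slightly different cut-offs) and assumes the patient has already been found anemic from the hemoglobin value; the function and constant names are hypothetical.

```python
# Hypothetical helper. The 80 fL / 100 fL cut-offs are commonly used
# reference limits, assumed here; individual laboratories may differ.
MICROCYTIC_BELOW_FL = 80.0
MACROCYTIC_ABOVE_FL = 100.0


def morphology_from_mcv(mcv_fl: float) -> str:
    """Map an MCV value (femtolitres) to a morphological anemia type.

    Only meaningful for a patient already identified as anemic.
    """
    if mcv_fl < MICROCYTIC_BELOW_FL:
        return "microcytic"   # small red cells, e.g. iron deficiency
    if mcv_fl > MACROCYTIC_ABOVE_FL:
        return "macrocytic"   # large red cells, e.g. vitamin B12 deficiency
    return "normocytic"       # normal-sized red cells


if __name__ == "__main__":
    for mcv in (72.0, 88.0, 108.0):
        print(f"MCV {mcv} fL -> {morphology_from_mcv(mcv)}")
```

The boundary values themselves (80 and 100 fL) fall in the normocytic range under this assumed convention.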
In the last decade, numerous data mining and machine learning techniques have been applied to anemia prediction. The most notable ones are the following:
In [8], a support vector machine trained with sequential minimal optimization (SMO) and the C4.5 decision tree algorithm were used to predict anemia, and the performance of the two algorithms was compared.
In [11], WEKA is used to select a suitable classifier for developing a mobile app that can predict and diagnose hematological data comments. The authors compared neural network classification algorithms with the J48 and Naïve Bayes classifiers. The results show that the J48 classifier achieves the highest accuracy.
Dogan and Turkoglu [12] developed a decision support system that detects iron-deficiency anemia using a decision tree algorithm. The algorithm uses three hematology parameters: serum iron, serum iron-binding capacity and ferritin. It was evaluated on data from 96 patients, and its results matched the physicians' decisions.
Abdullah and Al-Asmari [13] experimented with the WEKA algorithms Naive Bayes, Multilayer Perceptron, J48 and SMO to predict anemia types from CBC reports. The evaluation was done on real data constructed from the CBC reports of 41 anemic persons. Similar to [11], the J48 decision tree algorithm, along with SMO, was the best performer, with an accuracy of 93.75%.
Unlike the work in [11] and [13], we use a different set of classifiers and locally collected data.
Four main tests are ordered to diagnose anemia: complete blood count (CBC), ferritin, PCR (polymerase chain reaction) and hemoglobin electrophoresis.
The CBC is the most frequently ordered blood test; it is used to assess overall health and to detect a wide range of diseases [8], including anemia, infection and leukemia. A complete blood count comprises about 15 measurements, including hemoglobin (Hb), red blood cells (RBC), hematocrit (HCT), mean corpuscular hemoglobin (MCH) and mean corpuscular volume (MCV) [8].
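Several of these CBC values are derived rather than measured directly. The standard red-cell index formulas give MCV = 10·HCT/RBC, MCH = 10·Hb/RBC and MCHC = 100·Hb/HCT. The sketch below assumes Hb in g/dL, HCT in percent and the RBC count in 10^12 cells/L; the function and field names are illustrative, not part of any laboratory system.

```python
from dataclasses import dataclass


@dataclass
class RedCellIndices:
    mcv_fl: float     # mean corpuscular volume, femtolitres
    mch_pg: float     # mean corpuscular hemoglobin, picograms
    mchc_g_dl: float  # mean corpuscular hemoglobin concentration, g/dL


def red_cell_indices(hb_g_dl: float, hct_percent: float,
                     rbc_1e12_per_l: float) -> RedCellIndices:
    """Compute MCV, MCH and MCHC from Hb (g/dL), HCT (%) and RBC (10^12/L)."""
    if rbc_1e12_per_l <= 0 or hct_percent <= 0:
        raise ValueError("RBC count and hematocrit must be positive")
    return RedCellIndices(
        mcv_fl=hct_percent * 10.0 / rbc_1e12_per_l,
        mch_pg=hb_g_dl * 10.0 / rbc_1e12_per_l,
        mchc_g_dl=hb_g_dl * 100.0 / hct_percent,
    )


if __name__ == "__main__":
    # Illustrative adult values: Hb 15 g/dL, HCT 45 %, RBC 5.0 x 10^12/L.
    print(red_cell_indices(15.0, 45.0, 5.0))
```

These relations also show that attributes such as MCV, HCT, HGB and MCHC are not independent of each other.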
A ferritin test measures the amount of iron stored in the body. High levels of ferritin indicate an iron storage disorder, such as hemochromatosis; low levels indicate iron deficiency, which causes anemia.
PCR is a molecular test used to diagnose genetic disorders.
A hemoglobin electrophoresis test is a blood test used to measure and identify the
different types of hemoglobin in the bloodstream.
4 Methodology
We use three classifiers, namely random forest, Naive Bayes and the C4.5 decision tree algorithm. Figure 2 depicts the flowchart of the proposed method.
The random forest (RF) algorithm is derived from the decision tree classifier. It is an ensemble of tree predictors that aggregates the outputs of all the trees in the collection by majority voting.
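The majority-voting idea can be illustrated with a small sketch; it is not the implementation used in this work. Each tree is trained on a bootstrap sample of the rows using a randomly chosen attribute, and the forest predicts by majority vote. To keep the example short, each "tree" is a single-split decision stump, whereas a full random forest grows deeper trees and re-draws the attribute subset at every split. The number of trees, the attribute encoding and the synthetic [HGB, MCV] rows are all assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Fit a one-split decision tree: pick the (feature, threshold) pair
    with the fewest training errors among the allowed features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lbl = Counter(left).most_common(1)[0][0]
            right_lbl = Counter(right).most_common(1)[0][0] if right else left_lbl
            errors = (sum(lab != left_lbl for lab in left)
                      + sum(lab != right_lbl for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lbl, right_lbl)
    _, f, t, left_lbl, right_lbl = best
    return lambda row: left_lbl if row[f] <= t else right_lbl


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows drawn with replacement
        features = rng.sample(range(d), n_features)    # random subset of attributes
        trees.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], features))
    return trees


def predict_forest(trees, row):
    """Aggregate the individual tree predictions by majority vote."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic [HGB g/dL, MCV fL] rows, invented for illustration only.
    X = ([[9.0 + 0.1 * i, 70.0 + i] for i in range(10)]
         + [[14.0 + 0.1 * i, 88.0 + i] for i in range(10)])
    y = ["anemic"] * 10 + ["normal"] * 10
    forest = fit_forest(X, y)
    print(predict_forest(forest, [9.5, 75.0]), predict_forest(forest, [14.5, 93.0]))
```

Because each tree sees a different bootstrap sample and attribute, individual trees can err near the class boundary, and the vote smooths those errors out.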
[Figure 2: Flowchart of the proposed method: Data Collection → Pre-processing → Classifier Learning → Performance Evaluation]
5.1 Dataset
We collected data from different pathology and laboratory test centers in the nearby area. The collected dataset consists of 200 test samples, all of which are CBC test results. The dataset contains 18 attributes, of which we selected only those required for anemia detection: Age, Gender, MCV, HCT, HGB, MCHC and RDW.
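The attribute selection step can be sketched as follows. The paper does not describe its file format, so this sketch assumes a CSV file with one header row; the header spellings, the 0/1 encoding of Gender and the policy of dropping incomplete records are all assumptions, and the extra columns in the sample (ID, WBC) and its values are invented. The class label column, which the text does not name, would be carried along in the same way.

```python
import csv
import io

# The seven attributes listed in Section 5.1; the exact CSV header
# spellings are assumed for illustration.
SELECTED = ["Age", "Gender", "MCV", "HCT", "HGB", "MCHC", "RDW"]


def load_selected(csv_text):
    """Keep only the selected attributes and drop incomplete records."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        values = {name: (record.get(name) or "").strip() for name in SELECTED}
        if any(v == "" for v in values.values()):
            continue  # assumed policy: skip records with a missing attribute
        # Assumed encoding: Gender as 1 for male, 0 otherwise; the rest numeric.
        row = {name: float(v) for name, v in values.items() if name != "Gender"}
        row["Gender"] = 1 if values["Gender"].upper().startswith("M") else 0
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = (
        "ID,Age,Gender,MCV,HCT,HGB,MCHC,RDW,WBC\n"
        "1,34,F,72.5,33.0,9.8,29.7,17.1,6.2\n"
        "2,51,M,90.1,44.0,14.6,33.2,13.0,7.9\n"
        "3,28,F,,36.0,11.2,31.0,14.5,5.5\n"   # missing MCV: dropped
    )
    for row in load_selected(sample):
        print(row)
```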
The proposed method uses CBC test values. First, the data are pre-processed to extract the seven attributes listed in Section 5.1. Then, we apply the random forest, C4.5 decision tree and Naive Bayes classifiers. Performance is evaluated in terms of accuracy and mean absolute error (MAE); MAE measures how close the predictions are to the actual outcomes. Table 1 shows the results of the three classifiers. Ten-fold cross-validation was used to obtain the accuracy.
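The evaluation procedure can be sketched with one of the three classifiers. This is a minimal illustration, not the authors' setup: Naive Bayes is assumed to use Gaussian class-conditional densities for the numeric attributes; the folds are unstratified for brevity; and MAE is assumed to follow one common definition for probabilistic classifiers, the mean of |p_c − y_c| over the class-probability vector (y_c is 1 for the true class and 0 otherwise), which for two classes reduces to |p − y| for either class. The synthetic [HGB, MCV] rows are invented.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-attribute Gaussian means/variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps a zero-variance attribute from dividing by zero.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_proba(model, row):
    """Posterior class probabilities, computed in log space for stability."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for x, m, v in zip(row, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_post[label] = s
    top = max(log_post.values())
    unnorm = {k: math.exp(s - top) for k, s in log_post.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}


def cross_validate(X, y, k=10, seed=0):
    """k-fold cross-validation returning (accuracy, MAE)."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # unstratified folds, for brevity
    classes = sorted(set(y))
    correct, abs_err = 0, 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            proba = predict_proba(model, X[i])
            if max(proba, key=proba.get) == y[i]:
                correct += 1
            # Mean |p_c - y_c| over classes; a class absent from training gets p_c = 0.
            abs_err += sum(abs(proba.get(c, 0.0) - (1.0 if c == y[i] else 0.0))
                           for c in classes) / len(classes)
    return correct / n, abs_err / n


if __name__ == "__main__":
    # Synthetic [HGB g/dL, MCV fL] rows, invented for illustration only.
    X = ([[9.0 + 0.1 * i, 70.0 + i] for i in range(10)]
         + [[14.0 + 0.1 * i, 88.0 + i] for i in range(10)])
    y = ["anemic"] * 10 + ["normal"] * 10
    acc, mae = cross_validate(X, y, k=10)
    print(f"accuracy={acc:.3f}  MAE={mae:.4f}")
```

Every sample is used for testing exactly once, so the reported accuracy is the fraction of the whole dataset classified correctly by models that never saw those samples.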
[Figure: Accuracy (%) of the Random Forest, Naïve Bayes and C4.5 classifiers]
In this paper, we have compared the performance of three different classifiers for predicting anemia. The experimental results on a sample dataset suggest that the Naive Bayes classifier achieves the best accuracy, compared with C4.5 and random forest. Automatic prediction can reduce the manual effort involved in diagnosis. In future, automated tools could be developed that use the prediction results to suggest further diagnosis. Such tools could prove valuable in the timely detection of more serious diseases. Furthermore, such a disease prediction system could be extended to recommend a treatment plan.
References
1. Arun, V., et al.: Privacy of Health Information in Telemedicine on Private Cloud. International Journal of Family Medicine & Medical Science Research (2015)
2. Provenzano, R., Lerma, E.V., Szczech, L.: Management of Anemia. Springer (2018)
3. Ezzati, M., Lopez, A.D., Rodgers, A., Murray, C.J.L.: Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors. World Health Organization, Geneva (2004)
4. Balarajan, Y., et al.: Anaemia in low-income and middle-income countries (2011)
5. Haas, J.D., Brownlie, T.: Iron deficiency and reduced work capacity: a critical review of the research to determine a causal relationship. J Nutr (2001)
6. Kozuki, N., Lee, A.C., Katz, J., Child Health Epidemiology Reference Group: Moderate to severe, but not mild, maternal anemia is associated with increased risk of small-for-gestational-age outcomes. J Nutr (2012)
7. Steer, P.J.: Maternal hemoglobin concentration and birth weight. Clin Nutr (2000)
8. Sanap, S.A., Nagori, M., Kshirsagar, V.: Classification of Anemia Using Data Mining Techniques. In: Swarm, Evolutionary, and Memetic Computing, pp. 113–121. Springer (2011)
9. Jerez-Aragonés, J.M., et al.: A combined neural network and decision trees model for prognosis of breast cancer relapse. Artif Intell Med, pp. 45–63 (2003)
10. Podgorelec, V., et al.: Decision trees: an overview and their use in medicine. J Med Syst, pp. 445–463 (2002)
11. Amin, N., Habib, A.: Comparison of different classification techniques using WEKA for hematological data. American Journal of Engineering Research 4(3), pp. 55–61 (2015)
12. Dogan, S., Turkoglu, I.: Iron deficiency anemia detection from hematology parameters by using decision tree. International Journal of Science and Technology, pp. 85–92 (2008)
13. Abdullah, M., Al-Asmari, S.: Anemia types prediction based on data mining classification algorithms. In: Communication, Management and Information Technology. Taylor & Francis Group, London (2016)