
2019 5th International Conference for Convergence in Technology (I2CT)

Pune, India. Mar 29-31, 2019

Early Prediction of Lung Diseases

Anuradha D. Gunasinghe, Informatics Institute of Technology, 57, Ramakrishna Rd, Colombo 06, Sri Lanka. Anuradhadenuwan@gmail.com
Achala C. Aponso, Informatics Institute of Technology, 57, Ramakrishna Rd, Colombo 06, Sri Lanka. Achala.a@iit.ac.lk
Harsha Thirimanna, WSO2, 20, Palm Grove, Colombo 3. harsha.thirimanna@gmail.com

Abstract- Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. Machine learning offers a principled approach for developing sophisticated, automatic and objective algorithms for the analysis of high-dimensional and multimodal biomedical data. Machine learning plays an important role in medical systems. Identifying diseases earlier and more accurately can save many lives and reduce the pressure on the health system. Lung diseases are one of the leading causes of death. Early identification and prediction of lung diseases has become a research necessity, as it can facilitate the subsequent clinical management of patients. A machine learning based decision support system assists doctors in their diagnostic decisions. This project considers patients' breathing problems as well as asthma, chronic obstructive pulmonary disease (COPD), tuberculosis, pneumothorax and lung cancer. Machine learning and deep learning are used to process the data and to create models for diagnosing patients. The methods used in this project to identify lung diseases combine the processing of patient information with data from chest X-rays, using a CNN with a well-known pre-trained model and a capsule network (CapsNet) for this form of data. The data set was first studied and analyzed, and machine learning and deep learning were then applied to predict whether or not the patient has a lung disease. The task is a binary classification: the input is the patient's data (age, gender, chest X-ray image and view position) and the output is whether or not the disease is present. The aim of the paper is to detect and diagnose lung diseases as early as possible, which will help the doctor to save the patient’s life. This paper describes how lung diseases were predicted and controlled using machine learning.

Key words: Deep Learning, Lung Diseases, Machine Learning

I. INTRODUCTION

The pressure on health is increasing as the environment, the climate, people's lifestyles and the risks the public faces change every day. In 2015, 56 million people died, 68 percent of them from diseases that develop slowly. Two of the top 10 causes of death are diseases related to the lungs. Because lung diseases are a leading cause of death, this article focuses on lung disease.

When you breathe, your lungs take in oxygen from the air and deliver it to the blood. The cells of your body need oxygen to work and grow [2]. During a normal day, a person breathes nearly 23,000 times [18]. People with lung disorders have difficulty breathing. Millions of people in the United States have lung disease. Taken together, all kinds of lung disease are the number three killer in the United States. Lung disease covers many disorders of the lungs, such as asthma, COPD, influenza, pneumonia, tuberculosis, lung cancer and other respiratory problems. Some lung diseases can cause respiratory distress [1].

A. Lung Diseases

Tuberculosis is an infectious disease, caused in most cases by microorganisms called Mycobacterium tuberculosis. The microorganisms usually enter the body by inhalation through the lungs [4]. Lung cancer is one of the most serious diseases in the world [8]. Moreover, it can be completely cured if it is detected early. Er et al. [9] state that pneumonia is an inflammation or infection of the lungs, most commonly caused by a bacterium or a virus. Pneumonia can also be caused by inhaling vomit or other foreign substances. According to Er et al. [9], asthma is a chronic disease characterized by recurrent attacks of breathlessness and wheezing. During an asthma attack, the lining of the bronchial tubes swells, causing the airways to narrow and reducing the flow of air into and out of the lungs. COPD is a preventable and treatable disease state characterised by airflow limitation that is not fully reversible [4]. The airflow limitation is usually progressive and is associated with an abnormal inflammatory response of the lungs to noxious particles or gases, primarily caused by cigarette smoking.

B. Structuring Technologies

This section describes the popular technologies available for structuring and analyzing lung disease data, and discusses their features.

(a) Machine Learning

Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets [6]. Machine learning offers a principled approach for developing sophisticated, automatic and objective algorithms for the analysis of high-dimensional and multimodal biomedical data [19].

978-1-5386-8075-9/19/$31.00 ©2019 IEEE


(b) Deep Learning

Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government [12].

C. Algorithmic Analysis (IT aspect)

The problem involves new data sets that have never been fully modeled. To avoid getting stuck, we adopted the following approach:

➢ Convolutional Neural Network

This is a powerful algorithm for processing image data of this kind. Given the large amount of data in the full data set, it is an appropriate method to apply. The following parameters are considered and used:

● Neural network architecture: choose the appropriate architecture. A neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors, which may represent fuzzy or crisp sets of features [4].
● Preprocessing parameters
● Fine tuning
● Spatial transformer: a differentiable module which applies a spatial transformation to a feature map during a single forward pass, where the transformation is conditioned on the particular input, producing a single output feature map [11].
● Training parameters: these are important because they emphasize the performance and the accuracy required from the neural network [3].
● Add more data to the network, not only images

➢ Capsules Network

Capsule networks can distinguish objects seen from different perspectives, which is useful here because our image data has two types of view position. As with the CNN, the following are considered:

● Capsule network architecture: choose the appropriate architecture
● Preprocessing parameters
● Training parameters

II. METHODOLOGY

A large data set of lung X-ray images, together with labeled lung disease data, was recently made public on Kaggle and the UCI Machine Learning Repository.

A literature survey was conducted to gain background knowledge and to learn the related techniques and technologies, such as machine learning, deep learning and convolutional neural networks (CNN). It focused mainly on lung disease prediction systems. An optimized CNN is the main method of this project. The architecture for this approach is shown below.

Fig. 1: VGG16 architecture used for feature extraction

Fig. 2: Full architecture

The architecture consists of three main groups of layers, in the following order:

➢ Spatial transformer layers (the first three layers)
● The first is a Lambda layer that rescales the input features to the range [-0.5, 0.5], so that the features of the image have an average value of about 0.
● The second is Batch Normalization.
● The third layer is the Spatial Transformer, which is used to extract the most valuable features for classification.

➢ Feature extraction layers (VGG16 pretrained model)
● The set of 13 layers of VGG16 shown in Fig. 1 acts as the feature extractor. Many pretrained models are available; we started with VGG16 because it is a simple model that is quick to learn and train.
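As an illustration of the spatial transformer stage, the following minimal Python sketch shows the two operations it relies on: rescaling 8-bit pixel values to [-0.5, 0.5], and resampling an image through an affine spatial transformation with bilinear interpolation, as in the spatial transformer of [11]. It is only a sketch under stated assumptions, not the project's implementation: the function names are hypothetical, pixel values are assumed to lie in [0, 255], the image is assumed to be a list of rows of at least 2x2 pixels, and the affine matrix theta is fixed by hand here, whereas in the model it would be predicted by the localization network.

```python
import math


def rescale_pixels(image):
    """Map pixel values from [0, 255] to [-0.5, 0.5] (assumed input range)."""
    return [[p / 255.0 - 0.5 for p in row] for row in image]


def affine_grid(theta, height, width):
    """For every output pixel, compute the source location to sample from.

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] acting on
    coordinates normalized to [-1, 1]. Requires height >= 2 and width >= 2.
    """
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1)
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1)
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample the image at the grid locations; outside the image counts as 0."""
    height, width = len(image), len(image[0])
    out = []
    for grid_row in grid:
        out_row = []
        for xs, ys in grid_row:
            # Convert normalized coordinates back to pixel coordinates.
            px = (xs + 1.0) * (width - 1) / 2.0
            py = (ys + 1.0) * (height - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            value = 0.0
            # Blend the four neighbouring pixels by their distance weights.
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    if 0 <= xx < width and 0 <= yy < height:
                        weight = (1.0 - abs(px - xx)) * (1.0 - abs(py - yy))
                        value += weight * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


def spatial_transform(image, theta):
    """Apply the affine transform theta to the image (hand-picked theta here)."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return bilinear_sample(image, grid)
```

With the identity matrix [[1, 0, 0], [0, 1, 0]] the image is returned unchanged, up to rounding. Other matrices shift, scale, crop or rotate the input before it reaches the feature extractor.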

➢ Classification layers (last three layers)
● The first layer is the Flatten layer, which takes the output of the VGG16 layers plus five additional features: 'Age', 'Gender Male', 'Gender Female', 'View position AP' and 'View position PA'. These additional features also affect the classification, so they are added at this layer. This layer is followed by a Dropout layer.
● The next two layers are Dense layers, each placed after a Dropout layer, with gradually decreasing size.

Capsule Network

For the Capsule Network, we made a slight change to the original Hinton architecture so that it would work well with this data set. The architecture below is taken from the article by Hinton.

Fig. 3: Capsule Network

Optimized CNN

After experimenting with many image sizes, we found that a 64x64 image is small enough, yet good enough for the model to capture the pattern of the image. The spatial transformer is used with some supporting layers in front of it, such as the Lambda layer. The spatial transformer layer uses a fairly simple locnet (localization network) model to separate key features from the image. The additional non-image data was tried at many places in the architecture, and the first layer of the classification stage proved the most appropriate. The following were also tuned:

● The thresholds for precision, recall and F-beta score
● The dropout rate of the Dropout layers in the classification stage
● The optimizer parameters: gradient descent with momentum, decay and learning rate

III. PROBLEM IDENTIFICATION

People with lung disease have difficulty breathing. Millions of people in the U.S. have lung disease. If all types of lung disease are lumped together, it is the number three killer in the United States. The term lung disease refers to many disorders affecting the lungs. At present there is no computer system for identifying all lung diseases from chest X-rays, although there is a system for identifying pneumonia. This project provides the diagnosis of lung diseases from the patient's chest X-ray data plus some additional information.

IV. OVERVIEW OF A POSSIBLE SOLUTION

This project proposes diagnosing lung diseases from the patient's chest X-ray data plus some additional information. The best solution is a complex CNN, developed through the following steps:

● Research resolved issues, domain information, supporting data, methods and solutions from similar projects. Some potential techniques are listed and investigated.
● Download and analyze sample data, then carry out preprocessing and metric selection.
● Test multiple architectures, optimizing and testing them on a sample data set.
● Use the good architectures to test on the full data set, and continue optimizing and collecting statistics.

This project is based on a very new data set that few people have explored. It is a very good problem and, if done well, it will make a big contribution to the community. The project has tested several new and interesting methods, such as the Spatial Transformer, and these have recorded remarkable results.

The project is also difficult: the problem is new, chest X-rays are hard to read clearly, the data is not standardized, and the disease labels are obtained through NLP labeling. The newer neural network methods are also hard to apply, because there is little documentation on how to optimize them. The size of the full data set is a further challenge given our limited computing power.

The results of this project met our initial expectations, but more improvements are needed to increase the precision of the model before it can be applied in hospitals.

V. CONCLUSION AND FUTURE WORK

This paper reviewed how lung diseases affect patients and the damage they cause to the lungs, as explained by various researchers. Since these diseases (asthma, COPD, pneumonia, tuberculosis and lung cancer) can be cured, identifying them has become essential according to many researchers. One of the main concerns of this research was to identify and select proper data sets and a technique for analyzing lung diseases. Chest X-rays were selected based on the comparisons and discussions stated in this paper.

Next, a suitable feature extraction algorithm was chosen, since a chest X-ray image may contain a lot of unnecessary data. This selection was based on the advantages and disadvantages of common algorithms such as CNN and Capsule Network. Finally, a classification algorithm was discussed based on its characteristic qualities. In short-term experiments, it was seen that the CNN algorithm added additional benefits in predicting lung
diseases in advance with better results. Ultimately, lung disease can be diagnosed.

In the future, we hope to train with more data sets and to change some parameters to make the model faster. Some additional evaluation metrics will also be tested. We can also experiment with pre-trained models to improve the accuracy.

REFERENCES

[1] Anon., 2018. Two lung diseases killed 3.6 million in 2015: study. [Online] [Accessed 4 November 2018]
[2] Anon., 2007. Understanding how your lungs work and how COPD affects your body. s.l.:s.n.
[3] Barghash, M. A. & Santarisi, N. S., 2004. Pattern recognition of control charts using artificial neural networks - analyzing the effect of the training parameters. Journal of Intelligent Manufacturing, Volume 15, pp. 635-644.
[4] Carpenter, G. A. et al., 1992. Fuzzy ARTMAP: A Neural Network Architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3(5).
[5] Celli, B. R. & MacNee, W., 2004. Standards for the diagnosis and treatment of patients with COPD: a summary of the ATS/ERS position paper. European Respiratory Journal, Volume 23, pp. 932-946.
[6] Cruz, J. A. & Wishart, D. S., 2006. Applications of machine learning in cancer prediction. Volume 2, pp. 59-78.
[7] Dept. of Health and Human Services Office on Women's Health: https://medlineplus.gov/lungdiseases.html
[8] Durga, S. & Kasturi, K., 2017. Lung disease prediction system using data mining techniques. Jour of Adv Research in Dynamical & Control Systems, 9(5).
[9] Er, O., Yumusak, N. & Temurtas, F., 2010. Chest diseases diagnosis using artificial neural networks. Expert Systems With Applications, Volume 37, pp. 7648-7655.
[10] https://www.womenshealth.gov/a-z-topics/lung-disease
[11] Jaderberg, M., Simonyan, K., Zisserman, A. & Kavukcuoglu, K., 2015. Spatial Transformer Networks. s.l., Neural Information Processing Systems Conference.
[12] LeCun, Y., Bengio, Y. & Hinton, G., 2015. Deep learning. International journal of science, pp. 436-444.
[13] Matrix capsules with EM routing. https://openreview.net/forum?id=HJWLfGWRb&noteId=HJWLfGWRb
[14] Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. Spatial Transformer Networks. https://arxiv.org/abs/1506.02025
[15] Misra, A., Rudrapatna, M. & Sowmya, A., 2004. Automatic Lung Segmentation: A Comparison of Anatomical and Machine Learning Approaches.
[16] NIH full Chest X-rays dataset, https://www.kaggle.com/nih-chest-xrays/data
[17] NIH sample Chest X-rays dataset, https://www.kaggle.com/nih-chest-xrays/sample
[18] Niwa, H. (2007) ‘[ No Title ]’, Development, 134(4), pp. 635-646.
[19] Sajda, P., 2006. Machine Learning for Detection, New York: s.n.
[20] Sara Sabour, Nicholas Frosst, Geoffrey E Hinton. Dynamic Routing Between Capsules. https://arxiv.org/abs/1710.09829
[21] Van, D. D., 2018. Diseases detection from Chest X-ray data, s.l.: s.n.
[22] Vinitha, S., Sweetlin, S., Vinusha, H. & Sajini, S., 2018. Disease prediction using machine learning over big data. Computer Science & Engineering: An International Journal (CSEIJ), 8(1).
[23] Wang, X. et al., n.d. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases.
