Asthma Management Report
ABSTRACT
CHAPTER 1
INTRODUCTION
PROPOSED SYSTEM
The purpose of this study was to determine the efficacy of deep neural networks (DNNs) for increasing the accuracy of asthma diagnosis and to compare the prediction performance of several machine learning methods, in particular Random Forest.
MOTIVATIONS:
Retrospective analysis of patients ages 2 to 18 years seen at two urban pediatric EDs with asthma exacerbation over 4 years. Asthma exacerbation was defined as receiving both albuterol and systemic corticosteroids. We included patient features, measures of illness severity available in triage, weather features, and Centers for Disease Control and Prevention influenza patterns.
OBJECTIVES:
• The objective of this paper is to provide models for identifying the features that are most indicative of the development of asthma in children.
• The project will provide predictive models that can be applied to many different data sets, with the data used in this project as an example, and will show how the data can be used to formulate intervention strategies and early medical attention for children.
• Six classifiers were evaluated, including traditional linear regression, logistic regression, k-nearest neighbours (KNN), decision tree, and Random Forest.
• All showed high accuracy on the current dataset. Choosing the best prediction model for this problem can be complex because of the vast amount of health information that can be collected for each child and the number of options there are for creating these models.
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
SYSTEM REQUIREMENTS SPECIFICATION
A good SRS defines how the software system will interact with all internal modules, hardware, and other programs, and how human users will interact with it across a wide range of real-life scenarios. Using the software requirements specification (SRS) document, the QA lead and managers create the test plan. It is very important that testers are clear about every detail specified in this document in order to avoid faults in test cases and their expected results.
It is highly recommended to review and test the SRS document before starting to write test cases or making any plan for testing. Let us see how to test an SRS and the important points to keep in mind while testing it.
1. Correctness of the SRS should be checked. Since the whole testing phase depends on the SRS, it is very important to check its correctness. There are standards against which we can compare and verify it.
2. Ambiguity should be avoided. Sometimes in an SRS, some words have more than one meaning, and this might confuse testers, making it difficult to get the exact reference. It is advisable to check for such ambiguous words and make their meaning clear for better understanding.
3. Requirements should be complete. When a tester writes test cases, the requirements must state exactly what is expected from the application. Most of the defects we find during testing are caused by either incomplete requirements or ambiguity in the SRS. To avoid such defects it is very important to test the software requirements specification before writing the test cases. Keep the latest version of the SRS with you for reference and keep yourself updated with the latest changes made to it. Best practice is to go through the document very carefully, note down all confusions, assumptions and incomplete requirements, and then have a meeting with the client to clear them before the development phase starts, as it becomes costly to fix bugs after the software is developed. Once all the requirements are clear to a tester, it becomes easy for them to write effective test cases and accurate expected results.
Functional Requirements
GUI User-facing apps
We are planning to develop user-interface apps for both smartphones and desktops so that our clients can use these apps to gain access to our network.
Pre-Processing unit
This unit will pre-process the data obtained from the UCI repository. Processing includes data cleaning (removing data that is not labelled), stemming, lemmatization and various other functions.
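As a loose illustration only, not the project's actual code, the cleaning part of this unit could be sketched with pandas as follows; the file name asthma_raw.csv and the columns label and symptom_notes are assumptions made for the sketch.

```python
import pandas as pd

# Load the raw dataset downloaded from the UCI repository (file name assumed).
raw = pd.read_csv("asthma_raw.csv")

# Data cleaning: drop records that carry no label and remove exact duplicates.
clean = raw.dropna(subset=["label"]).drop_duplicates()

# Normalise the free-text field (assumed name) so that later steps such as
# stemming or lemmatization work on consistent tokens.
if "symptom_notes" in clean.columns:
    clean["symptom_notes"] = clean["symptom_notes"].str.lower().str.strip()

clean.to_csv("asthma_clean.csv", index=False)
```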
Non-Functional Requirements
Security
Information stored and shared on our platform is highly secure, since the information is divided into chunks, encrypted and stored on various systems. Hence attacks on the system are difficult.
Scalability
As the number of nodes in our network increases, the scalability of our platform in terms of space and accessibility increases exponentially.
Performance
As our network is peer-to-peer rather than relying on a single data store, the single point of failure is removed and performance is increased.
User friendly
The user-facing apps that are used by the clients to access our network are designed in such
a way that they are user friendly and very easy to use.
Cost
The cost of constructing data centres, which usually runs into billions, and the maintenance cost of these data centres are eliminated.
Availability
Since our system is not centralized and there is no single point of failure in our system, the availability of our system is increased.
HARDWARE REQUIREMENTS
Processor : i3 or above
RAM : 4 GB
Hard Disk : 500 GB
Input device : Standard keyboard and mouse
Compact Disk : 650 MB
Output device : High-resolution monitor
SOFTWARE REQUIREMENTS
PYTHON
OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage. The library's stated goals are to:
Advance vision research by providing not only open but also optimized code for basic vision infrastructure. No more reinventing the wheel.
Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readily readable and transferable.
Advance vision-based commercial applications by making portable, performance
optimized code available for free – with a license that did not require code to be open or
free itself.
Structure of OpenCV
Once OpenCV is installed, the OPENCV_BUILD\install directory is populated with the library's header files, libraries and binaries. The library itself provides the following functionality:
Image and video I/O (file and camera based input, image/video file output).
Matrix and vector manipulation and linear algebra routines (products, solvers, SVD).
Basic image processing (filtering, edge detection, corner detection, sampling and
interpolation, color conversion, morphological operations, histograms, image pyramids).
Structural analysis (connected components, contour processing, distance transform,
various moments, template matching, Hough transform, polygonal approximation, line
fitting, ellipse fitting, Delaunay triangulation).
Camera calibration (finding and tracking calibration patterns, calibration, fundamental
matrix estimation, homography estimation, stereo correspondence).
Motion analysis (optical flow, motion segmentation, tracking).
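Purely as a hedged illustration of the basic image-processing routines listed above (the image path is an assumption), a minimal OpenCV call sequence in Python looks like this:

```python
import cv2

# Read an image, convert it to grayscale, and run Canny edge detection,
# one of the basic image-processing routines listed above.
image = cv2.imread("sample.jpg")          # assumed image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)
```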
DATA ANALYSIS
Data analysis is the process of analyzing raw data so that the processed/analyzed data can be used in a system or a method/process. It mainly involves three steps: data acquisition, data preprocessing and exploratory data analysis. Data acquisition is collecting the data from various sources, such as agencies, for further analysis. While acquiring the data it is important to collect data which is relevant to the system or the process.
Data preprocessing is a methodology in data mining that is used to convert the raw data into a meaningful and efficient format. Many unrelated and missing parts may be present in the results, and data cleaning is done to handle them; this includes managing details which are incomplete, noisy, etc. Exploratory data analysis is a significant process to carry out data investigations in order to detect patterns and irregularities, test hypotheses and check conclusions using summary statistics and graphical representations.
The main objective of the exploratory phase of data analysis is to understand the important characteristics of the data by using descriptive statistics, correlation analysis, visual inspection and other simple modeling.
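A minimal exploratory sketch along these lines, assuming the cleaned data sits in a CSV file with a binary target column named label (both names are assumptions), could be:

```python
import pandas as pd

df = pd.read_csv("asthma_clean.csv")  # assumed file name

# Descriptive statistics for every numeric column.
print(df.describe())

# Class balance of the (assumed) binary target column.
print(df["label"].value_counts(normalize=True))

# Pairwise correlations between numeric features, useful for spotting
# redundant attributes before modelling.
print(df.select_dtypes("number").corr())
```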
CHAPTER 4
SYSTEM DESIGN
A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. A system architecture can consist of system
components and the sub-systems developed that will work together to implement the overall
system. There have been efforts to formalize languages to describe system architecture,
collectively these are called architecture description languages (ADLs).
Various organizations can define systems architecture in different ways, including:
The fundamental organization of a system, embodied in its components, their relationships to
each other and to the environment, and the principles governing its design and evolution. A
representation of a system, including a mapping of functionality onto hardware and software
components, a mapping of the software architecture onto the hardware architecture, and human
interaction with these components.
An allocated arrangement of physical elements which provides the design solution for a
consumer product or life-cycle process intended to satisfy the requirements of the functional
architecture and the requirements baseline.
Architecture consists of the most important, pervasive, top-level, strategic inventions, decisions,
and their associated rationales about the overall structure (i.e., essential elements and their
relationships) and associated characteristics and behavior.
A description of the design and contents of a computer system. If documented, it may include
information such as a detailed inventory of current hardware, software and networking
capabilities; a description of long-range plans and priorities for future purchases, and a plan for
upgrading and/or replacing dated equipment and software.
A formal description of a system, or a detailed plan of the system at component level to guide
its implementation.
The composite of the design architectures for products and their life-cycle processes.
The structure of components, their interrelationships, and the principles and guidelines governing their design and evolution over time.
System Architecture:
Random Forest
One big advantage of random forest is that it can be used for both classification and regression
problems, which form the majority of current machine learning systems. Let's look at random
forest in classification, since classification is sometimes considered the building block of
machine learning. Below you can see what a random forest with two trees would look like:
Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, there's no need to combine a decision tree with a bagging classifier because you can simply use random forest's classifier class. With random forest, you can also handle regression tasks by using the algorithm's regressor.
Random forest adds additional randomness to the model, while growing the trees. Instead of
searching for the most important feature while splitting a node, it searches for the best feature
among a random subset of features. This results in a wide diversity that generally results in a
better model.
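As a hedged sketch of how this looks in scikit-learn, with a synthetic dataset standing in for the real asthma features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asthma diary features and alert labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Classification: predict the alert / no-alert class.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# Regression tasks use the regressor variant of the same algorithm.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
```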
REAL-LIFE ANALOGY
Andrew wants to decide where to go during a one-year vacation, so he asks the people who know
him best for suggestions. The first friend he seeks out asks him about the likes and dislikes of his
past travels. Based on the answers, he will give Andrew some advice.
This is a typical decision tree algorithm approach. Andrew's friend created rules to guide his
decision about what he should recommend, by using Andrew's answers.
Afterwards, Andrew starts asking more and more of his friends to advise him and they again ask
him different questions they can use to derive some recommendations from. Finally, Andrew chooses the places that were recommended to him most often, which is the typical random forest algorithm approach.
FEATURE IMPORTANCE
Another great quality of the random forest algorithm is that it is very easy to measure the relative
importance of each feature on the prediction. Sklearn provides a great tool for this that measures
a feature's importance by looking at how much the tree nodes that use that feature reduce
impurity across all trees in the forest. It computes this score automatically for each feature after
training and scales the results so the sum of all importance is equal to one.
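A self-contained sketch of reading those importances from a fitted scikit-learn forest (the synthetic data and feature names are placeholders):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is already scaled so that the values sum to one.
names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
importances = pd.Series(clf.feature_importances_, index=names)
print(importances.sort_values(ascending=False))
```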
If you don’t know how a decision tree works or what a leaf or node is, here is a good description from Wikipedia: "In a decision tree each internal node represents a 'test' on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). A node that has no children is a leaf."
DIFFERENCE BETWEEN DECISION TREES AND RANDOM FORESTS
While random forest is a collection of decision trees, there are some differences.
If you input a training dataset with features and labels into a decision tree, it will formulate some
set of rules, which will be used to make the predictions.
For example, to predict whether a person will click on an online advertisement, you might collect
the ads the person clicked on in the past and some features that describe his/her decision. If you
put the features and labels into a decision tree, it will generate some rules that help
predict whether the advertisement will be clicked or not. In comparison, the random forest
algorithm randomly selects observations and features to build several decision trees and then
averages the results.
Another difference is "deep" decision trees might suffer from overfitting. Most of the time,
random forest prevents this by creating random subsets of the features and building smaller trees
using those subsets. Afterwards, it combines the subtrees. It's important to note this doesn’t work
every time and it also makes the computation slower, depending on how many trees the random
forest builds.
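A small sketch contrasting the two on a synthetic dataset (a stand-in, not the study data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=25, n_informative=5, random_state=0)

# A single deep tree tends to overfit, while the forest averages many
# randomised trees and usually generalises better.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())
```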
IMPORTANT HYPERPARAMETERS
The hyperparameters in random forest are either used to increase the predictive power of the
model or to make the model faster. Let's look at the hyperparameters of sklearn's built-in random forest function.
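As a non-exhaustive illustration, these are some of the commonly tuned arguments of scikit-learn's RandomForestClassifier; the particular values are placeholders, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,      # number of trees: more trees, more stability, slower training
    max_features="sqrt",   # size of the random feature subset tried at each split
    max_depth=None,        # let trees grow fully, or cap the depth to speed things up
    min_samples_leaf=1,    # larger values smooth the model and reduce overfitting
    n_jobs=-1,             # train trees in parallel on all available cores
    random_state=42,       # reproducible results
)
```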
ADVANTAGES AND DISADVANTAGES OF THE RANDOM FOREST ALGORITHM
One of the biggest advantages of random forest is its versatility. It can be used for both
regression and classification tasks, and it’s also easy to view the relative importance it assigns to
the input features.
Random forest is also a very handy algorithm because the default hyperparameters it uses often
produce a good prediction result. Understanding the hyperparameters is pretty straightforward, and there also aren't that many of them.
One of the biggest problems in machine learning is overfitting, but most of the time this won't happen with a random forest, because averaging over many trees keeps the model from fitting noise in the training data.
The main limitation of random forest is that a large number of trees can make the algorithm too
slow and ineffective for real-time predictions. In general, these algorithms are fast to train, but
quite slow to create predictions once they are trained. A more accurate prediction requires more
trees, which results in a slower model. In most real-world applications, the random forest
algorithm is fast enough but there can certainly be situations where run-time performance is
important and other approaches would be preferred.
And, of course, random forest is a predictive modeling tool and not a descriptive tool, meaning if
you're looking for a description of the relationships in your data, other approaches would be
better.
The random forest algorithm is used in a lot of different fields, like banking, the stock market,
medicine and e-commerce.
In finance, for example, it is used to detect customers more likely to repay their debt on time, or
use a bank's services more frequently. In this domain it is also used to detect fraudsters out to
scam the bank. In trading, the algorithm can be used to determine a stock's future behavior.
In the healthcare domain it is used to identify the correct combination of components in medicine
and to analyze a patient’s medical history to identify diseases.
Random forest is used in e-commerce to determine whether a customer will actually like the
product or not.
SUMMARY
Random forest is a great algorithm to train early in the model development process, to see how it
performs. Its simplicity makes building a “bad” random forest a tough proposition.
Random forests are also very hard to beat performance wise. Of course, you can probably always
find a model that can perform better, like a neural network for example, but these usually take
more time to develop, though they can handle a lot of different feature types, like binary,
categorical and numerical.
Overall, random forest is a (mostly) fast, simple and flexible tool, but not without some
limitations.
UML diagrams:
Data Acquisition:
The datasets used in our analysis were obtained from the database of the Climate Agency. Such databases are obtained from various locations across the country. Every sample comprises 12 characteristics deemed to be important attributes for forecasting intensity, as seen in Table I. The key reason for selecting the intensity attribute instead of the other attributes is that the severity attribute is perceived to be of great significance to the flood control centre.
Data Pre-processing:
It is a methodology in data mining that is used to convert raw information into a meaningful and efficient format. Many unrelated and missing parts may be present in the results, and data cleaning is performed to handle them; this includes managing details which are incomplete, noisy, etc. The steps involved in this process are:
Data cleaning: this step deals with records in which some data is missing or noisy in the datasets.
Data transformation: a data mining process used to transform the data into suitable forms. It is usually done in order to convert the data into the required format before the further steps of analysis are carried out.
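A hedged sketch of these two steps with pandas and scikit-learn; the file name and the column names are assumptions.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("asthma_clean.csv")      # assumed file name
numeric_cols = ["pef", "night_symptoms"]  # assumed numeric attributes

# Data cleaning: fill missing numeric values with the column median.
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

# Data transformation: bring the attributes onto a comparable scale
# before they are passed to the classifiers.
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```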
Dataflow Diagram:
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step to
create an overview of the system without going into great detail which can later be elaborated.
Level-0
Level-1
CHAPTER 5
SYSTEM IMPLEMENTATION
MODULES:
• Data collection
• Classification algorithms
• Evaluation
Data collection
The dataset used in this study consisted of 7001 records collected from asthma patients using previously prescribed home telemanagement [52]. The severity of asthma varied from mild persistent to severe persistent [31]. Patients used a laptop computer at home to fill in their asthma
diary on a daily basis. The diary included information about respiratory symptoms, sleep
disturbances due to asthma, limitation of physical activity, presence of cold, and medication
usage. The patients also measured peak expiratory flow (PEF) using a peak expiratory flow
meter that communicated PEF values to the laptop automatically. The laptop sent the results of
patient self-testing to a central server on a daily basis. Once the self-testing results were received,
the patient’s current asthma status was automatically assigned to one of four levels of asthma
severity on the basis of a widely accepted clinical algorithm promulgated by current clinical guidelines [53]: green zone (zone 1): “doing well;” high yellow zone (zone 2): “asthma is getting
worse;” low yellow zone (zone 3): “dangerous deterioration;” and red zone (zone 4): “medical
alert.” For the purposes of this study, we merged zones 1 and 2 in one class named “no-alert” and
merged zones 3 and 4 into another class named “high-alert.” For each day of patient self-testing,
our research database included a set of variables from patient asthma diaries and corresponding
asthma severity for this day expressed as “no-alert” zone or “high alert” zone.
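A small sketch of the zone-merging step described above, assuming the daily records sit in a table with a numeric zone column taking values 1 to 4 (the file and column names are assumptions):

```python
import pandas as pd

diary = pd.read_csv("asthma_diary.csv")  # assumed file name

# Zones 1-2 ("doing well" / "getting worse") become "no-alert";
# zones 3-4 ("dangerous deterioration" / "medical alert") become "high-alert".
diary["alert_class"] = diary["zone"].map(
    {1: "no-alert", 2: "no-alert", 3: "high-alert", 4: "high-alert"}
)
print(diary["alert_class"].value_counts())
```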
Predictive modeling of asthma exacerbation
Classification algorithms
Three classification algorithms were used for building classification models: adaptive Bayesian
network, naive Bayesian classifier, and support vector machines. We briefly describe each
algorithm and corresponding analytical workflow below.
The naive Bayesian classifier looks at historical data and calculates conditional probabilities for each class given the observed attribute values, and assigns a new record to the class with the highest resulting probability.
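The adaptive Bayesian network used in the study has no direct scikit-learn counterpart, but the naive Bayes and support vector machine steps can be sketched roughly as follows; the synthetic data stands in for the prepared features and alert labels.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the prepared asthma features and alert labels.
X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# Naive Bayesian classifier: assumes the attributes are conditionally
# independent given the class and combines their likelihoods via Bayes' rule.
nb = GaussianNB().fit(X, y)

# Support vector machine with an RBF kernel.
svm = SVC(kernel="rbf", probability=True).fit(X, y)
```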
Evaluation
The results of the experiments were captured in terms of the number of values for true positive
(TP), true negative (TN), false negative (FN), and false positive (FP), which subsequently were
converted into overall accuracy, sensitivity, and specificity values as follows:
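These are the standard definitions; a short sketch computing them from a confusion matrix (the example labels are placeholders, with 1 standing for high-alert and 0 for no-alert):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder true classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(accuracy, sensitivity, specificity)
```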
Results
The resultant dataset had 146 (49%) high-alert records and 152 (51%) no-alert records. The
distribution could not be made exactly equal across the two classes because of software
limitations.
Three sets of experiments were conducted to evaluate the effect of using a stratified sample to
train a classifier as opposed to using only the original skewed data set. In each of these
experiments, we divided the data into two disjoint sets for training and testing so that there were no
common records in the two sets. Furthermore, about 70% of the data in each set was used for
training and the remaining 30% was used for testing. Each set of experiments was conducted on
three classification algorithms. The dataset for the first case consisted of the whole dataset with
2435 records, of which 1452 records were used for training and the remaining 983 records were
used for testing. The second dataset consisted of a stratified sample of 298 records, of which 205
were used for training and the remaining 93 were used for testing. In the third set of experiments
the classifiers were trained using the training data for the stratified sample of 205 records and
were tested against the remaining 2230 of the original records. The breakdown of the dataset
sizes is provided in Table 2. In each of the abovementioned experimental datasets, three
classifiers were trained and tested: adaptive Bayesian network, naive Bayesian classifier, and
support vector machines.
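A rough sketch of the 70/30 split used in these experiments, with stratification on the class label; the synthetic data and class weights are placeholders, not the study's figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2435, n_features=20,
                           weights=[0.88, 0.12], random_state=0)

# 70% of the records for training, 30% for testing, with no overlap,
# keeping the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print(len(X_train), len(X_test))
```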
Based on a 7-day window with 21 self-report variables generated daily by an asthma patient,
there were 147 attributes initially, of which 63 had a positive value in our attribute importance
model, which were kept in our subsequent analyses. However, these 63 attributes were spread
over 7 days, and, hence, there were nine attributes with values on each of the 7 days prior to the
target attribute, namely, the value of Zone on the eighth day. The selection of the 7-day
preceding window was based on clinical factors. The most important attribute was the value of
Zone with reduced importance from the seventh day to the first day. This implies that the Zone
value on the seventh day was the best predictor of the condition of the patient on the eighth day.
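A hedged sketch of how such a 7-day window can be built with pandas for a single patient's diary, using shifted copies of each daily variable as predictors of the zone on the following day; the file and column names are assumptions.

```python
import pandas as pd

diary = pd.read_csv("asthma_diary.csv").sort_values("date")  # assumed columns

daily_vars = ["zone", "pef", "night_symptoms"]  # assumed subset of the daily variables
window = pd.DataFrame({"target_zone": diary["zone"]})

# For each variable, add its value on each of the 7 preceding days.
for var in daily_vars:
    for lag in range(1, 8):
        window[f"{var}_lag{lag}"] = diary[var].shift(lag)

window = window.dropna()  # drop the first 7 days, which lack a full window
```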
Machine learning approaches have significant potential in the prediction of asthma exacerbations. Future
steps in developing predictive models for asthma will utilize a comprehensive predictive
framework combining multiple data streams, and further optimization of feature selection, data
set preparation, and machine learning approaches may significantly enhance resulting
algorithms. Predictive modeling of asthma may be relevant to developing effective predictive
frameworks in other chronic health conditions.