Asthma Management Report
ABSTRACT
CHAPTER 1
INTRODUCTION
PROPOSED SYSTEM
The purpose of this study was to determine the efficacy of deep neural networks (DNNs) for increasing the accuracy of asthma diagnosis and to compare the prediction performance of several machine learning methods, in particular Random Forest.
MOTIVATIONS:
Retrospective analysis of patients ages 2 to 18 years seen at two urban pediatric EDs with asthma exacerbation over 4 years. Asthma exacerbation was defined as receiving both albuterol and systemic corticosteroids. We included patient features, measures of illness severity available in triage, weather features, and Centers for Disease Control and Prevention influenza patterns.
OBJECTIVES:
• The objective of this paper is to provide models for identifying the features that are most indicative of the development of asthma in children.
• The project will provide predictive models that can be applied to many different data sets, with the data used in this project as an example, and will show how the data can be used to formulate intervention strategies and early medical attention for children.
• Six classifiers were evaluated, including traditional linear regression, logistic regression, k-nearest neighbours (KNN), decision tree, and Random Forest.
• All showed high accuracy on the current dataset. Choosing the best prediction model for this problem can be complex because of the vast amount of health information that can be collected for each child and the number of options there are for creating these models.
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
SYSTEM REQUIREMENTS SPECIFICATION
A good SRS defines how the software system will interact with all internal modules, hardware, and other programs, and how human users will interact with it across a wide range of real-life scenarios. Using the software requirements specification (SRS) document, the QA lead and managers create the test plan. It is very important that testers are clear about every detail specified in this document in order to avoid faults in test cases and their expected results.
It is highly recommended to review and test the SRS document before starting to write test cases or making any plan for testing. Let us see how to test an SRS and the important points to keep in mind while testing it.
1. Correctness of the SRS should be checked. Since the whole testing phase depends on the SRS, it is very important to check its correctness. There are standards against which we can compare and verify it.
2. Ambiguity should be avoided. Sometimes in an SRS, some words have more than one meaning, and this might confuse testers, making it difficult to get the exact reference. It is advisable to check for such ambiguous words and make their meaning clear for better understanding.
3. Requirements should be complete. When a tester writes test cases, the requirements must state exactly what is expected from the application. Most of the defects we find during testing are caused by either incomplete requirements or ambiguity in the SRS. To avoid such defects it is very important to test the software requirements specification before writing the test cases. Keep the latest version of the SRS with you for reference and keep yourself updated with the latest changes made to it. Best practice is to go through the document very carefully, note down all confusions, assumptions and incomplete requirements, and then have a meeting with the client to clear them before the development phase starts, as it becomes costly to fix bugs after the software is developed. Once all the requirements are clear to a tester, it becomes easy for them to write effective test cases and accurate expected results.
Functional Requirements
GUI User-facing apps
We are planning to develop user-interface apps for both smartphones and desktops so that our clients can use these apps to gain access to our network.
Pre-Processing unit
This unit will pre-process the data obtained from the UCI repository. Processing includes data cleaning (removing data that is not labelled), stemming, lemmatization and various other functions.
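As a loose illustration only, not the project's actual code, the cleaning part of this unit could be sketched with pandas as follows; the file name asthma_raw.csv and the columns label and symptom_notes are assumptions made for the sketch.

```python
import pandas as pd

# Load the raw dataset downloaded from the UCI repository (file name assumed).
raw = pd.read_csv("asthma_raw.csv")

# Data cleaning: drop records that carry no label and remove exact duplicates.
clean = raw.dropna(subset=["label"]).drop_duplicates()

# Normalise the free-text field (assumed name) so that later steps such as
# stemming or lemmatization work on consistent tokens.
if "symptom_notes" in clean.columns:
    clean["symptom_notes"] = clean["symptom_notes"].str.lower().str.strip()

clean.to_csv("asthma_clean.csv", index=False)
```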
Non-Functional Requirements
Security
Information stored and shared on our platform is highly secure, since the information is divided into chunks, encrypted and stored on various systems. Hence attacks on the system are difficult.
Scalability
As the number of nodes in our network increases, the scalability of our platform in terms of space and accessibility increases exponentially.
Performance
As our network is peer-to-peer rather than relying on a single data store, the single point of failure is removed and performance is increased.
User friendly
The user-facing apps that are used by the clients to access our network are designed in such
a way that they are user friendly and very easy to use.
Cost
The cost of constructing data centres, which usually runs into billions, and the maintenance cost of these data centres are eliminated.
Availability
Since our system is not centralized and there is no single point of failure in our system, the availability of our system is increased.
HARDWARE REQUIREMENTS
Processor : i3 or above
RAM : 4 GB
Hard Disk : 500 GB
Input device : Standard keyboard and mouse
Compact Disk : 650 MB
Output device : High-resolution monitor
SOFTWARE REQUIREMENTS
PYTHON
OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage. The library's stated goals are to:
Advance vision research by providing not only open but also optimized code for basic vision infrastructure. No more reinventing the wheel.
Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readily readable and transferable.
Advance vision-based commercial applications by making portable, performance
optimized code available for free – with a license that did not require code to be open or
free itself.
Structure of OpenCV
Once OpenCV is installed, the OPENCV_BUILD\install directory is populated with the library's header files, libraries and binaries. The library itself provides the following functionality:
Image and video I/O (file and camera based input, image/video file output).
Matrix and vector manipulation and linear algebra routines (products, solvers, SVD).
Basic image processing (filtering, edge detection, corner detection, sampling and
interpolation, color conversion, morphological operations, histograms, image pyramids).
Structural analysis (connected components, contour processing, distance transform,
various moments, template matching, Hough transform, polygonal approximation, line
fitting, ellipse fitting, Delaunay triangulation).
Camera calibration (finding and tracking calibration patterns, calibration, fundamental
matrix estimation, homography estimation, stereo correspondence).
Motion analysis (optical flow, motion segmentation, tracking).
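Purely as a hedged illustration of the basic image-processing routines listed above (the image path is an assumption), a minimal OpenCV call sequence in Python looks like this:

```python
import cv2

# Read an image, convert it to grayscale, and run Canny edge detection,
# one of the basic image-processing routines listed above.
image = cv2.imread("sample.jpg")          # assumed image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)
```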
DATA ANALYSIS
Data analysis is the process of analyzing raw data so that the processed/analyzed data can be used in a system or a method/process. It mainly involves three steps: data acquisition, data preprocessing and exploratory data analysis. Data acquisition is collecting the data from various sources, such as agencies, for further analysis. While acquiring the data it is important to collect data which is relevant to the system or the process.
Data preprocessing is a methodology in data mining that is used to convert the raw data into a meaningful and efficient format. Many unrelated and missing parts may be present in the results, and data cleaning is done to handle them; this includes managing details which are incomplete, noisy, etc. Exploratory data analysis is a significant process to carry out data investigations in order to detect patterns and irregularities, test hypotheses and check conclusions using summary statistics and graphical representations.
The main objective of the exploratory phase of data analysis is to understand the important characteristics of the data by using descriptive statistics, correlation analysis, visual inspection and other simple modeling.
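A minimal exploratory sketch along these lines, assuming the cleaned data sits in a CSV file with a binary target column named label (both names are assumptions), could be:

```python
import pandas as pd

df = pd.read_csv("asthma_clean.csv")  # assumed file name

# Descriptive statistics for every numeric column.
print(df.describe())

# Class balance of the (assumed) binary target column.
print(df["label"].value_counts(normalize=True))

# Pairwise correlations between numeric features, useful for spotting
# redundant attributes before modelling.
print(df.select_dtypes("number").corr())
```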
CHAPTER 4
SYSTEM DESIGN
A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. A system architecture can consist of system
components and the sub-systems developed that will work together to implement the overall
system. There have been efforts to formalize languages to describe system architecture,
collectively these are called architecture description languages (ADLs).
Various organizations can define systems architecture in different ways, including:
The fundamental organization of a system, embodied in its components, their relationships to
each other and to the environment, and the principles governing its design and evolution. A
representation of a system, including a mapping of functionality onto hardware and software
components, a mapping of the software architecture onto the hardware architecture, and human
interaction with these components.
An allocated arrangement of physical elements which provides the design solution for a
consumer product or life-cycle process intended to satisfy the requirements of the functional
architecture and the requirements baseline.
Architecture consists of the most important, pervasive, top-level, strategic inventions, decisions,
and their associated rationales about the overall structure (i.e., essential elements and their
relationships) and associated characteristics and behavior.
A description of the design and contents of a computer system. If documented, it may include
information such as a detailed inventory of current hardware, software and networking
capabilities; a description of long-range plans and priorities for future purchases, and a plan for
upgrading and/or replacing dated equipment and software.
A formal description of a system, or a detailed plan of the system at component level to guide
its implementation.
The composite of the design architectures for products and their life-cycle processes.
The structure of components, their interrelationships, and the principles and guidelines governing their design and evolution over time.
System Architecture:
Random Forest
One big advantage of random forest is that it can be used for both classification and regression
problems, which form the majority of current machine learning systems. Let's look at random
forest in classification, since classification is sometimes considered the building block of
machine learning. Below you can see what a random forest with two trees would look like:
Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, there's no need to combine a decision tree with a bagging classifier because you can simply use random forest's classifier class. With random forest, you can also handle regression tasks by using the algorithm's regressor.
Random forest adds additional randomness to the model, while growing the trees. Instead of
searching for the most important feature while splitting a node, it searches for the best feature
among a random subset of features. This results in a wide diversity that generally results in a
better model.
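As a hedged sketch of how this looks in scikit-learn, with a synthetic dataset standing in for the real asthma features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asthma diary features and alert labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Classification: predict the alert / no-alert class.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# Regression tasks use the regressor variant of the same algorithm.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
```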
REAL-LIFE ANALOGY
Andrew wants to decide where to go during a one-year vacation, so he asks the people who know
him best for suggestions. The first friend he seeks out asks him about the likes and dislikes of his
past travels. Based on the answers, he will give Andrew some advice.
This is a typical decision tree algorithm approach. Andrew's friend created rules to guide his
decision about what he should recommend, by using Andrew's answers.
Afterwards, Andrew starts asking more and more of his friends to advise him and they again ask
him different questions they can use to derive some recommendations from. Finally, Andrew chooses the places that were recommended to him most often, which is the typical random forest algorithm approach.
FEATURE IMPORTANCE
Another great quality of the random forest algorithm is that it is very easy to measure the relative
importance of each feature on the prediction. Sklearn provides a great tool for this that measures
a feature's importance by looking at how much the tree nodes that use that feature reduce
impurity across all trees in the forest. It computes this score automatically for each feature after
training and scales the results so the sum of all importance is equal to one.
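A self-contained sketch of reading those importances from a fitted scikit-learn forest (the synthetic data and feature names are placeholders):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is already scaled so that the values sum to one.
names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
importances = pd.Series(clf.feature_importances_, index=names)
print(importances.sort_values(ascending=False))
```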
If you don’t know how a decision tree works or what a leaf or node is, here is a good description from Wikipedia: "In a decision tree each internal node represents a 'test' on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). A node that has no children is a leaf."
DIFFERENCE BETWEEN DECISION TREES AND RANDOM FORESTS
While random forest is a collection of decision trees, there are some differences.
If you input a training dataset with features and labels into a decision tree, it will formulate some
set of rules, which will be used to make the predictions.
For example, to predict whether a person will click on an online advertisement, you might collect
the ads the person clicked on in the past and some features that describe his/her decision. If you
put the features and labels into a decision tree, it will generate some rules that help
predict whether the advertisement will be clicked or not. In comparison, the random forest
algorithm randomly selects observations and features to build several decision trees and then
averages the results.
Another difference is "deep" decision trees might suffer from overfitting. Most of the time,
random forest prevents this by creating random subsets of the features and building smaller trees
using those subsets. Afterwards, it combines the subtrees. It's important to note this doesn’t work
every time and it also makes the computation slower, depending on how many trees the random
forest builds.
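A small sketch contrasting the two on a synthetic dataset (a stand-in, not the study data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=25, n_informative=5, random_state=0)

# A single deep tree tends to overfit, while the forest averages many
# randomised trees and usually generalises better.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())
```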
IMPORTANT HYPERPARAMETERS
The hyperparameters in random forest are either used to increase the predictive power of the
model or to make the model faster. Let's look at the hyperparameters of sklearn's built-in random forest function.
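As a non-exhaustive illustration, these are some of the commonly tuned arguments of scikit-learn's RandomForestClassifier; the particular values are placeholders, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,      # number of trees: more trees, more stability, slower training
    max_features="sqrt",   # size of the random feature subset tried at each split
    max_depth=None,        # let trees grow fully, or cap the depth to speed things up
    min_samples_leaf=1,    # larger values smooth the model and reduce overfitting
    n_jobs=-1,             # train trees in parallel on all available cores
    random_state=42,       # reproducible results
)
```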
ADVANTAGES AND DISADVANTAGES OF THE RANDOM FOREST ALGORITHM
One of the biggest advantages of random forest is its versatility. It can be used for both
regression and classification tasks, and it’s also easy to view the relative importance it assigns to
the input features.
Random forest is also a very handy algorithm because the default hyperparameters it uses often
produce a good prediction result. Understanding the hyperparameters is pretty straightforward, and there also aren't that many of them.
One of the biggest problems in machine learning is overfitting, but most of the time this won't happen with a random forest, because averaging over many trees keeps the model from fitting noise in the training data.
The main limitation of random forest is that a large number of trees can make the algorithm too
slow and ineffective for real-time predictions. In general, these algorithms are fast to train, but
quite slow to create predictions once they are trained. A more accurate prediction requires more
trees, which results in a slower model. In most real-world applications, the random forest
algorithm is fast enough but there can certainly be situations where run-time performance is
important and other approaches would be preferred.
And, of course, random forest is a predictive modeling tool and not a descriptive tool, meaning if
you're looking for a description of the relationships in your data, other approaches would be
better.
The random forest algorithm is used in a lot of different fields, like banking, the stock market,
medicine and e-commerce.
In finance, for example, it is used to detect customers more likely to repay their debt on time, or
use a bank's services more frequently. In this domain it is also used to detect fraudsters out to
scam the bank. In trading, the algorithm can be used to determine a stock's future behavior.
In the healthcare domain it is used to identify the correct combination of components in medicine
and to analyze a patient’s medical history to identify diseases.
Random forest is used in e-commerce to determine whether a customer will actually like the
product or not.
SUMMARY
Random forest is a great algorithm to train early in the model development process, to see how it
performs. Its simplicity makes building a “bad” random forest a tough proposition.
Random forests are also very hard to beat performance wise. Of course, you can probably always
find a model that can perform better, like a neural network for example, but these usually take
more time to develop, though they can handle a lot of different feature types, like binary,
categorical and numerical.
Overall, random forest is a (mostly) fast, simple and flexible tool, but not without some
limitations.
UML diagrams:
Data Acquisition:
The datasets used in our analysis were obtained from the database of the Climate Agency. Such databases are obtained from various locations across the country. Every sample comprises 12 characteristics deemed to be important attributes for forecasting intensity, as seen in Table I. The key reason for selecting the intensity attribute instead of the other attributes is that the severity attribute is perceived to be of great significance to the flood control centre.
Data Pre-processing:
It is a methodology in data mining that is used to convert raw information into a meaningful and efficient format. Many unrelated and missing parts may be present in the results, and data cleaning is performed to handle them; this includes managing details which are incomplete, noisy, etc. The steps involved in this process are:
Data cleaning: this step deals with records in which some data is missing or noisy in the datasets.
Data transformation: a data mining process used to transform the data into suitable forms. It is usually done in order to convert the data into the required format before the further steps of analysis are carried out.
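A hedged sketch of these two steps with pandas and scikit-learn; the file name and the column names are assumptions.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("asthma_clean.csv")      # assumed file name
numeric_cols = ["pef", "night_symptoms"]  # assumed numeric attributes

# Data cleaning: fill missing numeric values with the column median.
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

# Data transformation: bring the attributes onto a comparable scale
# before they are passed to the classifiers.
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```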
Dataflow Diagram:
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step to
create an overview of the system without going into great detail which can later be elaborated.
Level-0
Level-1
CHAPTER 5
SYSTEM IMPLEMENTATION
MODULES:
• Data collection
• Classification algorithms
• Evaluation
Data collection
The dataset used in this study consisted of 7001 records collected from asthma patients using previously prescribed home telemanagement [52]. The severity of asthma varied from mild persistent to severe persistent [31]. Patients used a laptop computer at home to fill in their asthma
diary on a daily basis. The diary included information about respiratory symptoms, sleep
disturbances due to asthma, limitation of physical activity, presence of cold, and medication
usage. The patients also measured peak expiratory flow (PEF) using a peak expiratory flow
meter that communicated PEF values to the laptop automatically. The laptop sent the results of
patient self-testing to a central server on a daily basis. Once the self-testing results were received,
the patient’s current asthma status was automatically assigned to one of four levels of asthma
severity on the basis of a widely accepted clinical algorithm promulgated by current clinical guidelines [53]: green zone (zone 1): “doing well;” high yellow zone (zone 2): “asthma is getting
worse;” low yellow zone (zone 3): “dangerous deterioration;” and red zone (zone 4): “medical
alert.” For the purposes of this study, we merged zones 1 and 2 in one class named “no-alert” and
merged zones 3 and 4 into another class named “high-alert.” For each day of patient self-testing,
our research database included a set of variables from patient asthma diaries and corresponding
asthma severity for this day expressed as “no-alert” zone or “high alert” zone.
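A small sketch of the zone-merging step described above, assuming the daily records sit in a table with a numeric zone column taking values 1 to 4 (the file and column names are assumptions):

```python
import pandas as pd

diary = pd.read_csv("asthma_diary.csv")  # assumed file name

# Zones 1-2 ("doing well" / "getting worse") become "no-alert";
# zones 3-4 ("dangerous deterioration" / "medical alert") become "high-alert".
diary["alert_class"] = diary["zone"].map(
    {1: "no-alert", 2: "no-alert", 3: "high-alert", 4: "high-alert"}
)
print(diary["alert_class"].value_counts())
```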
Predictive modeling of asthma exacerbation
Classification algorithms
Three classification algorithms were used for building classification models: adaptive Bayesian
network, naive Bayesian classifier, and support vector machines. We briefly describe each
algorithm and corresponding analytical workflow below.
The naive Bayesian classifier looks at historical data and calculates conditional probabilities for each class given the observed attribute values, and assigns a new record to the class with the highest resulting probability.
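The adaptive Bayesian network used in the study has no direct scikit-learn counterpart, but the naive Bayes and support vector machine steps can be sketched roughly as follows; the synthetic data stands in for the prepared features and alert labels.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the prepared asthma features and alert labels.
X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# Naive Bayesian classifier: assumes the attributes are conditionally
# independent given the class and combines their likelihoods via Bayes' rule.
nb = GaussianNB().fit(X, y)

# Support vector machine with an RBF kernel.
svm = SVC(kernel="rbf", probability=True).fit(X, y)
```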
Evaluation
The results of the experiments were captured in terms of the number of values for true positive
(TP), true negative (TN), false negative (FN), and false positive (FP), which subsequently were
converted into overall accuracy, sensitivity, and specificity values as follows:
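These are the standard definitions; a short sketch computing them from a confusion matrix (the example labels are placeholders, with 1 standing for high-alert and 0 for no-alert):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder true classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(accuracy, sensitivity, specificity)
```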
Results
The resultant dataset had 146 (49%) high-alert records and 152 (51%) no-alert records. The
distribution could not be made exactly equal across the two classes because of software
limitations.
Three sets of experiments were conducted to evaluate the effect of using a stratified sample to
train a classifier as opposed to using only the original skewed data set. In each of these
experiments, we divided the data into two disjoint sets for training and testing so that there were no
common records in the two sets. Furthermore, about 70% of the data in each set was used for
training and the remaining 30% was used for testing. Each set of experiments was conducted on
three classification algorithms. The dataset for the first case consisted of the whole dataset with
2435 records, of which 1452 records were used for training and the remaining 983 records were
used for testing. The second dataset consisted of a stratified sample of 298 records, of which 205
were used for training and the remaining 93 were used for testing. In the third set of experiments
the classifiers were trained using the training data for the stratified sample of 205 records and
were tested against the remaining 2230 of the original records. The breakdown of the dataset
sizes is provided in Table 2. In each of the abovementioned experimental datasets, three
classifiers were trained and tested: adaptive Bayesian network, naive Bayesian classifier, and
support vector machines.
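A rough sketch of the 70/30 split used in these experiments, with stratification on the class label; the synthetic data and class weights are placeholders, not the study's figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2435, n_features=20,
                           weights=[0.88, 0.12], random_state=0)

# 70% of the records for training, 30% for testing, with no overlap,
# keeping the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print(len(X_train), len(X_test))
```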
Based on a 7-day window with 21 self-report variables generated daily by an asthma patient,
there were 147 attributes initially, of which 63 had a positive value in our attribute importance
model, which were kept in our subsequent analyses. However, these 63 attributes were spread
over 7 days, and, hence, there were nine attributes with values on each of the 7 days prior to the
target attribute, namely, the value of Zone on the eighth day. The selection of the 7-day
preceding window was based on clinical factors. The most important attribute was the value of
Zone with reduced importance from the seventh day to the first day. This implies that the Zone
value on the seventh day was the best predictor of the condition of the patient on the eighth day.
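A hedged sketch of how such a 7-day window can be built with pandas for a single patient's diary, using shifted copies of each daily variable as predictors of the zone on the following day; the file and column names are assumptions.

```python
import pandas as pd

diary = pd.read_csv("asthma_diary.csv").sort_values("date")  # assumed columns

daily_vars = ["zone", "pef", "night_symptoms"]  # assumed subset of the daily variables
window = pd.DataFrame({"target_zone": diary["zone"]})

# For each variable, add its value on each of the 7 preceding days.
for var in daily_vars:
    for lag in range(1, 8):
        window[f"{var}_lag{lag}"] = diary[var].shift(lag)

window = window.dropna()  # drop the first 7 days, which lack a full window
```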
Machine learning approaches have significant potential in the prediction of asthma exacerbations. Future
steps in developing predictive models for asthma will utilize a comprehensive predictive
framework combining multiple data streams, and further optimization of feature selection, data
set preparation, and machine learning approaches may significantly enhance resulting
algorithms. Predictive modeling of asthma may be relevant to developing effective predictive
frameworks in other chronic health conditions.