CBA 8 Clinical Decision Support System: Capstone Project


Ravinderpal Wasu
Shailesh Vishwakarma
Sidharth Gupta
THE INDIAN SCHOOL OF BUSINESS (ISB) evolved from the need for a world-
class business school in Asia. The founders, some of the best minds from the corporate and
academic worlds, anticipated the leadership needs of the emerging Asian economies.

They recognized that the rapidly changing business landscape would require young leaders
who not only have an understanding of the developing economies but who also present a global
perspective. The ISB is committed to creating such leaders through its innovative programs,
outstanding faculty and thought leadership. Funded entirely by private corporations, foundations
and individuals from around the world who believe in its vision, the ISB is a not-for-profit
organization.

CBA is a rigorous and challenging program. The schedule will include full
days of teaching and evenings will be used for guest lectures, projects, and
group work. Participants will be required to stay on campus during those
classroom days.

SPONSORS

A healthcare ecosystem:
• Where access to good health is as easy as shopping online.
• That brings the healthcare provider, right at the doorstep.
• Where you are never out of medicines and basic medical supplies.
• Which provides care for loved ones so personal that you are never away from home.
• Where reports/findings are not mere facts but guides to overall good health.
• Where you aren’t just a patient but an inclusive partner to the success of the ecosystem.

TEAM
Student Name | PGID | About
Ravinderpal Singh Wasu | 71710004 | Senior Program Manager at Microsoft with 20 years of work experience; a high-performing information technology professional with a passion for driving and managing project deliveries with enterprise customers and partners through accelerated adoption and productive use of Microsoft technologies
Shailesh Vishwakarma | 71710069 | Senior Consultant at Brillio Technologies with 6 years of experience in Business Intelligence, Business Analytics and Data Science in the domains of Insurance, Ecommerce and Health
Siddharth Gupta | 71710106 | Associate Director at KPMG, India with 10 years of experience in Data Analytics, Visual Analytics, Reporting, Process Automation, Advanced Analytics, Predictive Analytics and Modelling

1. Motivation + Industry perspective
2. Project Descriptions
3. Data collection, visual exploration, transformation
   Part 1
      Step 1 - Data Collection
      Step 2 - Data Transformation
         • Creating the tables and uploading the data from CSV
         • Denormalized the data (Data cleaning)
      Step 3 – Flowchart of the algorithm (Analytical methods and Technology)
      Step 4 – Python code to provide outputs (Interpretation of Output/Visualization)
   Part 2
      Step 1 - Data Collection - Data from CallHealth
      Step 2 - Data Transformation - Data from CallHealth
         • Denormalized the data (Data cleaning)
      Step 3 – Flowchart of the algorithm (Analytical methods and Technology)
      Step 4 – Python code to provide outputs with CallHealth data and factoring Age and Gender (Interpretation of Output/Visualization)
      Step 5 - Change parameters/method and Iterate
4. Model Comparison
   i) Multi-Class Classifier (One-Vs-All)
   ii) IBM Watson DeepQA Technology for Healthcare
5. Challenges
6. Conclusion
7. References
8. Appendix

1. Motivation + Industry perspective
According to Deloitte’s 2018 Global Health Care Outlook1 - Evolution of smart health care “With
the ever-evolving policies, processes, and capabilities and the given magnitude and complexity
impacting the sector, smart health care is not going to come easy. Clinicians, usually, have
difficulty coordinating appointments and procedures, sharing test results, and involving patients
in their treatment plan. In other words, care providers may be working hard but are they working
“smart”? How is health care moving the barriers of the hospital walls? This 2018 outlook reviews
the current state of the global health care sector; explores trends and issues impacting health
care providers, governments, other payers, and patients; and suggests considerations for
stakeholders as they seek to deliver high-quality, cost-efficient, and smart health care.”

Fig 1

There are many reasons that healthcare facilities implement electronic health records (EHRs).
Among them are recommending the best-suited diagnosis and medicine to practitioners, avoiding
penalties, participating in value-based reimbursement, a desire to provide better care, and
fulfilling a requirement for quality recognition.

The healthcare sector incurs considerable expenditure on customer service for enquiries,
appointment booking, consultation, etc. The growing need to provide better customer service
brings rising costs.
To meet the objective of delivering high-quality, cost-efficient care, technology can step in and
save healthcare organizations time and money through medical chat-bots powered by data and analytics.
The health care sector has been investing a lot of time and money in technological development,
building AI and preventive solutions. Chat-bots could save organizations $8 billion annually
worldwide by 2022, up from $20 million this year, Juniper Research forecasted2.

With growth in the use of EHRs, the combined knowledge of data, devices and decisions will
enhance healthcare globally. The healthcare sector is undergoing a transformation with the
adoption of major technological advances, as depicted in Fig 2 and Fig 3.

Fig 2

Fig 3
Clinical Decision Support is a sophisticated health IT component with the capability to put various
pieces of data together and analyze them to generate valuable insights. It requires computable
biomedical knowledge, person-specific data, and a reasoning or inferencing mechanism that
combines knowledge and data to generate and present helpful information to clinicians as care is
being delivered. This information must be filtered, organized and presented in a way that supports
the current workflow, allowing the user to make an informed decision quickly and take action.

Clinical Decision Support has a number of important benefits, including:


• Single window for all medical history records of the patient
• Possibility to collate and analyze data across patients
• Increased quality of care and enhanced health outcomes
• Avoidance of errors and adverse events
• Improved efficiency, cost-benefit, and provider and patient satisfaction

The project is to build a Clinical Decision Support System for CallHealth with two interfaces,
one for patients and one for the internal team. The Clinical Decision Support System provides
real-time responses based on symptoms and other parameters provided by the patient.
The project will provide a portal where customers can share their symptoms and other
medical details interactively, along with contact details. These details can also be used
by the CallHealth team to devise a customized healthcare recommendation for the customer.

The project will be used by the CallHealth team to grow the customer base, which will in turn
increase revenue for the organization, and to enhance the efficiency and quality of service,
thus reducing cost to the organization.

The Interactive Customer Portal (ICP) will try to understand the intent of the user. Using
NLP techniques and algorithms, the ICP will do one of the following:

a. reply to the query of the user
b. ask a follow-up question
c. end the conversation if the intent is non-medical

Once the conversation is accepted, the ICP will present the proper response based on the
symptoms or medical conditions mentioned by the user. These details are analyzed in real time
before responding to the user. Once all the relevant details are captured, the information will
be passed on to the CallHealth team, who get in touch with the user with relevant and
customized medical care.
2. Project Descriptions
The objective of the project is to develop a solution that will help reduce errors in
consultation with the help of Data and Analytics. The solution will offer both preventive and
corrective recommendations. The CDS system is divided into 2 parts:
Interactive Customer Portal (ICP)
This facet of the CDS deals with patients, who can log in and interactively share their age,
gender and list of symptoms based on their medical condition on the portal. The CDS will
suggest relevant questions and symptoms during the conversation/consultation based on the
symptoms, demographics & medical history.
Interactive Customer Portal
An Interactive Customer Portal that helps provide for all the healthcare needs of the customers. The ICP will be an automated chatbot that asks dynamic questions of, and gives dynamic responses to, the customers based on their symptoms.

Interactive Customer Portal - Key functionalities

Free Customer Enquiries
• Artificial Intelligence: A rule-based AI system that will act as the brain behind the ICP
• Customer Interaction: Customers should be able to mention their symptoms and concerns
• Option Based Responses: The ICP will respond in the form of multiple options, where customers can select one of the options
• Dynamic Response Mechanism: Provides an interactive and dynamic response on the basis of symptoms and customer demographics, if any

Paid Subscription
• Medical History
• Doctors consulted
• Diagnosis Reports
• Recommendation

The idea is to provide a portal where customers can share their symptoms and other medical details interactively, along with contact details. These details can also be used by the CallHealth team to devise a customized healthcare recommendation for the customer.
Fig 4

Clinical Decision Support System (CDSS)

This facet of the CDS will be accessible to the internal team of CallHealth, who input the relevant
details such as age, gender and symptoms. Based on the inputs provided to the CDSS, a list of
probable diseases will be published to the user along with their likelihood.
Clinical Decision Support System
Clinical decision support (CDS) provides clinicians, staff, patients or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care.

Clinical decision support system - Key functionalities
• Recommendation Engine: The recommendation will be given on the basis of similar patterns of symptoms and medical history
• Detailed report / analysis: Doctors/physicians/clinicians should be able to get a detailed report and analysis based on symptoms & diagnosis report
• Preventive / Deductive Responses: The CDSS will respond to the doctors/physicians/clinicians in the form of preventive / deductive measures
• Dynamic Response Mechanism: Provides an interactive and dynamic response on the basis of the disease identified

The idea is to provide a portal to the doctors, physicians and clinicians that will support them in giving correct treatments to the patients.
Fig 5
The solution framework, as depicted in the image below, is connected to the data tables
of diseases, symptoms, demographic details and the medical history or record of the patient.
The live interaction will feed the symptoms and other relevant details into the ICP. The
information will be analyzed by the set of algorithms and the output will be displayed
according to the access rights of the user.

Fig 6

The key functionalities of CDS would include the following:

• Intent Analysis: The interactive customer portal will conduct an intent analysis of the initial queries entered by the user to ensure that the query relates to medical needs. For instance, if the user initiates a conversation with “Hi, Would you like to grab a cup of coffee with me?” then the portal will respond with “Your query is not related to medical needs. Have a good day!!” and the session will be closed.
• Demographic details: The CDS provides an option to capture and store details of the user such as Gender, Age, etc. Additionally, with the use of technology, the portal can also capture details such as device, location, IP address, login ID, etc.
• Collect and store symptoms: The ICP lets users input or select symptoms based on their medical conditions. These symptoms will be stored and analyzed not just for the session but for future reference as well
• Analyze symptoms: The list of symptoms provided by the user will be analyzed along with other demographic details and previous medical history, if applicable.
• Hypothesis testing and shortlist symptoms: A list of correlated symptoms will be generated based on the analysis of the symptoms, age, gender, etc. and then validated against the possibility of a user of the stated age and gender having these symptoms
• Publish top 5 best-suited symptoms: The portal will give the user the option to select from the published list of the top 5 symptoms or to type a specific symptom that is not on the list
• Store list of displayed symptoms: Every list of symptoms displayed to the user will be saved for further analysis
• Store list of selected symptoms: The list of symptoms selected by the user will be saved along with the demographic details of the user to generate the next list of symptoms
• Store list of displayed and non-selected symptoms: The symptoms that are displayed but have not been selected by the user will also be saved to make sure that these symptoms are not published again on the screen
• Once all the relevant and correlated symptoms are displayed, we will collect these symptoms to get the diseases
• The user will be able to see all the possible diseases related to the symptoms, age and gender provided by the user
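
The three "store" functionalities above can be pictured as a small per-session record. The following is only a sketch under stated assumptions: the class name SymptomSession and its methods are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass, field


@dataclass
class SymptomSession:
    """Hypothetical per-user record of what was shown and what was picked."""
    age: int
    gender: str
    selected: list = field(default_factory=list)   # symptoms the user chose or typed
    displayed: list = field(default_factory=list)  # every option shown so far

    def record_round(self, shown, chosen):
        """Store one round: the options displayed and the ones the user picked."""
        for symptom in shown:
            if symptom not in self.displayed:
                self.displayed.append(symptom)
        for symptom in chosen:
            if symptom not in self.selected:
                self.selected.append(symptom)

    def rejected(self):
        """Symptoms that were displayed but never selected."""
        return [s for s in self.displayed if s not in self.selected]

    def excluded(self):
        """Everything that should not be published again on the screen."""
        return set(self.displayed) | set(self.selected)


session = SymptomSession(age=60, gender="Male", selected=["fever"])
session.record_round(shown=["cough", "headache", "chills"], chosen=["cough"])
print(session.rejected())
```

Keeping the displayed and selected lists separate is what makes it possible to derive the non-selected symptoms later without storing a third list.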

The aforementioned services can be explained with the example below:
Key functionalities – For Patient e.g. Fever and cough

Understanding the questions and symptoms
✓ Ask patient for personal details – demographics, if provided
✓ Suppose the patient enters the first symptoms as “I have Fever and Cough”
✓ From the 5 tokens, decompose the symptoms as a) Fever b) Cough using NLP
✓ Identify the intent using SVM, SoftMax classifier, etc.

Generate hypothesis for each option using list of responses
✓ Get all the corresponding responses, possible matches of combinations, keywords and intent of the query for the tokens from the Answers source. Here we could get 1000s of responses
✓ Create the Hypothesis for each response. At this point quantity is more important than accuracy
✓ Here we will get all the next questions to be asked of the patient for Fever and Cough
✓ We filter out responses based on the combination of keywords and previous inputs. E.g. if cough is a response for fever but already exists in the combination of keywords in the previous input, we filter it out

Analyze and rank the hypothesis using evidences
✓ Based on the various Hypotheses generated, we will evaluate and test each hypothesis against the data of evidences available in the Evidence database for patients
✓ We will rank the hypotheses based on the p-value and weightages of the evidences based on the previous inputs – Fever and Cough combination and demographics provided by the patient
✓ Here we will filter out the hypotheses with low weightage. The highest ranked answers appear first

Provide Responses
✓ Train the model with the available inputs. Here the model will keep asking the next question until all the possible symptoms are provided by the patient, based on the top 5 or 10 correlated symptoms/diseases with a threshold confidence level
✓ Provide the response - next question OR probable diseases - to the patient
✓ For our example, we will get the top 5 or 10 responses in order of their ranks as the next questions or the probable diseases to be shown to the patient
✓ Ask for feedback and use this data to further improve the model
Fig 7
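
The first group of steps in Fig 7 (tokenizing “I have Fever and Cough” and decomposing it into symptoms) can be illustrated with a short sketch. It assumes a simple vocabulary lookup in place of the NLTK-based processing the project used; KNOWN_SYMPTOMS and both function names are hypothetical.

```python
import re

# Hypothetical symptom vocabulary; the real list would come from the symptom tables.
KNOWN_SYMPTOMS = {"fever", "cough", "chest pain", "itching"}


def tokenize(text):
    """Lower-case the input and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_symptoms(text, vocabulary=KNOWN_SYMPTOMS):
    """Return the known symptoms found in the text, in order of appearance."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        pair = tokens[i:i + 2]
        # Try the two-word phrase first so that "chest pain" is matched as a unit.
        if len(pair) == 2 and " ".join(pair) in vocabulary:
            found.append(" ".join(pair))
            i += 2
        elif tokens[i] in vocabulary:
            found.append(tokens[i])
            i += 1
        else:
            i += 1
    return found


print(tokenize("I have Fever and Cough"))
print(extract_symptoms("I have Fever and Cough"))
```

A vocabulary lookup like this cannot handle misspellings or synonyms, which is one reason stemming and a trained classifier are used in the actual modules.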

Impacted Stakeholders:
The project is to develop not just a support system for CallHealth’s IT team but a complete
solution that would help their marketing team, team of doctors and clinical experts, patient support
team and sales team.
a. Marketing Team: The marketing team will be able to collect and analyze more data about
prospective customers, which will help to develop and market more customized products and
services
b. Team of Clinical experts: The clinical experts would have access not just to the
symptoms and reports but also to the medical history of the patients, such as allergies, etc.
Aided by data and analytics for patients with similar symptoms and medical
requirements, the clinical experts would be able to attend to and consult the patient in
a more effective and efficient manner.
c. Patient support team: The support team will save time as they would already be aware of
the medical condition of the patient and can directly attend to the requirements
d. Sales team: The sales team will be able to make more targeted sales of customized
products and services to customers with specific medical requirements
3. Data collection, visual exploration, transformation
Part1
Step 1 - Data Collection
Searched for data on Diseases and Symptoms from various sites
SNOMED data collected from http://www.nature.com/articles/ncomms5212#supplementary-information site → Supplementary Data 7
SNOMED-CT symptom-disease relationships. File Name - ncomms5212-s8.xls
The data file has six sheets:
Sr. No. | Sheet Name (No. of Rows) | Column Name | Column Description
1 | disease terms (1623 rows) | rid | Row ID
  |  | disease_cui | Unique Disease Id
  |  | snomed_code | Unique SNOMED Disease Id
  |  | Terms | Disease names
2 | disease list (1623 rows) | rid | Row ID
  |  | disease_cui | Unique Disease Id
  |  | Number of symptoms | Number of Symptoms associated with the Disease
3 | symptom terms (817 rows) | rid | Row ID
  |  | symptom_cui | Unique Symptom Id
  |  | snomed_code | Unique SNOMED Symptom Id
  |  | Terms | Symptom names
4 | symptom list (817 rows) | rid | Row ID
  |  | symptom_cui | Unique Symptom Id
  |  | Number of diseases | Number of Diseases associated with the Symptom
5 | SNOMED semantic types (131 rows) | rid | Row ID
  |  | Semantic type code | Category code
  |  | Semantic type | Category Name
  |  | Number of distinct concepts | Number of distinct concepts
6 | disease-symptom relationships (2340 rows) | rid | Row ID
  |  | disease_cui | Unique Disease Id
  |  | symptom_cui | Unique Symptom Id

Step 2 - Data Transformation


• Creating the tables and uploading the data from CSV
From the above file, created 6 different csv files and uploaded the csv data to make different
tables in SQL Server using bulk insert
(SQL script - CapstoneProject_Callhealth.sql in appendix)

• Denormalized the data (Data cleaning)


Denormalized the data to create a table with all data in denormalized format, making it easy
to query (SQL script - CapstoneProject_Callhealth.sql in appendix)
Fig 8
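
The denormalization step joins the relationship sheet to the two term sheets so that each row carries the disease and symptom names. The sketch below shows the idea with Python's built-in sqlite3 module rather than SQL Server (a deliberate substitution so that it runs anywhere). The table names and sample rows are made up; only the column names disease_cui, symptom_cui and terms follow the sheet descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disease_terms (disease_cui TEXT, terms TEXT);
CREATE TABLE symptom_terms (symptom_cui TEXT, terms TEXT);
CREATE TABLE disease_symptom (disease_cui TEXT, symptom_cui TEXT);
""")

# Illustrative rows only; the real tables were bulk-loaded from the CSV files.
conn.executemany("INSERT INTO disease_terms VALUES (?, ?)",
                 [("D1", "Influenza"), ("D2", "Common cold")])
conn.executemany("INSERT INTO symptom_terms VALUES (?, ?)",
                 [("S1", "Fever"), ("S2", "Cough")])
conn.executemany("INSERT INTO disease_symptom VALUES (?, ?)",
                 [("D1", "S1"), ("D1", "S2"), ("D2", "S2")])

# One wide table with the names attached, so later queries need no joins.
conn.execute("""
CREATE TABLE symptom_disease_flat AS
SELECT d.disease_cui, d.terms AS disease_name,
       s.symptom_cui, s.terms AS symptom_name
FROM disease_symptom AS r
JOIN disease_terms AS d ON d.disease_cui = r.disease_cui
JOIN symptom_terms AS s ON s.symptom_cui = r.symptom_cui
""")

rows = conn.execute("""
SELECT disease_name, symptom_name
FROM symptom_disease_flat
ORDER BY disease_name, symptom_name
""").fetchall()
print(rows)
```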

Step 3 – Flowchart of the algorithm (Analytical methods and Technology)


Start
1. Take inputs from patient: Name, Age, Gender
2. Ask for health condition
3. Is it a medical condition? NO: Stop. YES: continue
4. Ask for Symptoms
5. For given Symptoms, get all Diseases
6. For the Diseases, get all related Symptoms
7. Query for the TOP 5 Symptoms as per frequency of Symptom
8. Remove the Symptoms already asked for before or provided by Patient
9. Display the TOP 5 Symptoms and ask for next Symptoms
Stop
Fig 9
Step 4 – Python code to provide outputs (Interpretation of Output/Visualization)
Libraries used – Pandas, NLTK for NLP, PySpark for SQL Context
Module 1 – Intent Analysis – trained the model on medical and non-medical inputs
So here, if the patient enters anything other than a medical query, as below, the code would
capture the intent

If it is a medical query, the code will request the Symptoms from the patient
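
The report later states that Naive Bayes was used for intent analysis. As an illustration of that technique only, here is a from-scratch multinomial Naive Bayes with add-one smoothing. The training sentences are invented and the functions train and classify are hypothetical; the project's own NLTK-based code and training data are not reproduced here.

```python
import math
import re
from collections import Counter

# Tiny made-up training set; the project's actual training data is not shown.
TRAINING = [
    ("i have fever and cough", "medical"),
    ("my head hurts and i feel dizzy", "medical"),
    ("i have chest pain", "medical"),
    ("would you like to grab a cup of coffee", "non-medical"),
    ("what is the weather today", "non-medical"),
    ("tell me a joke", "non-medical"),
]


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and examples per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)          # prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in words(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("I have a cough", model))
print(classify("Let's grab a coffee", model))
```

With so few training sentences the classifier is fragile; it is meant to show the mechanics, not realistic accuracy.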

Here the patient will select (or type in) the Symptoms from the UI.
The patient can select one of the given Symptoms or enter a Symptom after selecting Other.
Module 2 – Initialized the unique stemmed words to classify. Scored the words and tokenized the
input symptoms
Module 3 – To query the dataset and fetch the next top 5 symptoms.
Here we create a dataset by reading the data from the csv file, giving the table dfSymptomDisease.
For the first Symptoms, we created a table of distinct diseases, dfDistinctDiseaseName.
Then, for the distinct diseases, we selected all distinct Symptoms into the table dfDistinctSymptom.
From dfDistinctSymptom, we created another table, filtering out those Symptoms that were
already presented to / selected by the patient.
Finally, we selected the top 5 most frequent Symptoms and displayed them to the patient to select
the next possible symptoms.
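
The Module 3 query chain can be sketched in plain Python as follows. This is an assumption-laden illustration: it replaces the PySpark SQL tables with an in-memory list of made-up (disease, symptom) pairs, and next_symptoms is a hypothetical helper, not the project's function.

```python
from collections import Counter

# Made-up (disease, symptom) pairs standing in for the denormalized table.
SYMPTOM_DISEASE = [
    ("influenza", "fever"), ("influenza", "cough"),
    ("influenza", "headache"), ("influenza", "fatigue"),
    ("common cold", "cough"), ("common cold", "sore throat"),
    ("common cold", "headache"),
    ("pneumonia", "fever"), ("pneumonia", "cough"),
    ("pneumonia", "chest pain"), ("pneumonia", "fatigue"),
    ("migraine", "headache"), ("migraine", "nausea"),
]


def next_symptoms(selected, already_shown=(), pairs=SYMPTOM_DISEASE, top_n=5):
    """Suggest the most frequent co-occurring symptoms not yet shown or selected."""
    selected = set(selected)
    # Distinct diseases linked to any symptom given so far.
    diseases = {d for d, s in pairs if s in selected}
    # Drop symptoms already selected by, or presented to, the patient.
    exclude = selected | set(already_shown)
    counts = Counter(s for d, s in pairs if d in diseases and s not in exclude)
    # Most frequent first; ties are broken alphabetically to keep output stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symptom for symptom, _ in ranked[:top_n]]


print(next_symptoms(["fever"]))
print(next_symptoms(["fever"], already_shown=["cough"]))
```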
Part 2
Step 1 - Data Collection - Data from CallHealth
Here we got data providing relationship between Symptoms and Diseases and their relationship
to Age and Gender.

• Symptoms along with gender and age: A list of symptoms along with gender and age
categories was provided to analyze the symptoms that will be provided by the users. The list
of data fields are mentioned below:
Field | Description
SYMPTOM | Symptom Name
SYMP_MALE | Yes, if present in Males
SYMP_FEMALE | Yes, if present in Females
SYMP_INFANTS_LESS_THAN_'N1'YR | Yes, if present in infants
SYMP_CHILDREN_N1_TO_N2YRS | Yes, if present in children of certain age group
SYMP_TEENAGE_N2_TO_N3YRS | Yes, if present in teenagers of certain age group
SYMP_ADULT_N3_TO_N4YRS | Yes, if present in adults of certain age group
SYMP_ELDERLY_GREATER_THAN_N5YRS | Yes, if present in elders of certain age group

• Diseases along with gender and age: A list of diseases along with gender and age
categories was provided to analyze the symptoms that will be provided by the users. Each
category has the probability of having the disease rated as below:
o No
o Rare
o Uncommon
o Common
o Most common
The list of data fields is mentioned below:
Field | Description
DISEASE | Disease Name
Male_INFANTS_LESS_THAN_'N1'YR_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Male Infants
Female_INFANTS_LESS_THAN_N1YR_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Female Infants
Male_CHILDREN_N1_TO_N2YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Male Children
Female_CHILDREN_N1_TO_N2YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Female Children
Male_TEENAGE_N2_TO_N3YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Male Teenagers
Female_TEENAGE_N2_TO_N3YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Female Teenagers
Male_ADULTS_N3_TO_N4YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Male Adults
Female_ADULTS_N3_TO_N4YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Female Adults
Male_ELDERLY_GREATER_THAN_N5YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Male Elders
Female_ELDERLY_GREATER_THAN_N5YRS_DISEASE | Probability = No, Rare, Uncommon, Common, Most Common in Female Elders
• Diseases to Symptoms and their frequency/count

Step 2 - Data Transformation - Data from CallHealth


• Denormalized the data (Data cleaning)
Here we denormalized the data by having the Disease and Symptoms relationship to Age and
Gender, all merged into one sheet, making it easier to query and analyze
Step 3 – Flowchart of the algorithm (Analytical methods and Technology)
Start
1. Take inputs from patient: Name, Age, Gender
2. Ask for health condition. Ask for Symptoms
3. For given Symptoms, get all Diseases
4. For the Diseases, get all related Symptoms
5. Filter out Symptoms as per: Age, Gender
6. Query for the TOP 5 Symptoms as per frequency of Symptom
7. Remove the Symptoms already asked for before or provided by Patient
8. Display the TOP 5 Symptoms and ask for next Symptoms
9. For the Symptoms, check the correlation index
10. Is Correlation Index > 15%? YES: ask for more Symptoms and repeat. NO: continue
11. For all Symptoms, get Related Diseases
12. Filter the Diseases as per: Age, Gender
13. Query TOP 5 Diseases as per frequency of Disease
14. Display the TOP 5 Diseases
15. Suggest Diagnostics tests
16. Ask for Doctor's Appointment
Stop


Step 4 – Python code to provide outputs with CallHealth data and factoring Age and
Gender (Interpretation of Output/Visualization)
Libraries used – Pandas, NLTK for NLP, PySpark for SQL Context
Module 1 – Intent Analysis – trained the model on medical and non-medical inputs
So here, if the patient enters anything other than a medical query, as below, the code would
capture the intent
If it is a medical query, the code will request the Symptoms from the patient

Here the patient will select (or type in) the Symptoms from the UI.
The patient can select one of the given Symptoms or enter a Symptom after selecting Other.
Module 2 – Initialized the unique stemmed words to classify. Scored the words and tokenized the
input symptoms
Module 3 – To query the dataset and fetch the next top 5 symptoms.
Here we create a dataset by reading the data from the csv file, giving the table dfSymptomDisease.
For the first Symptoms, we created a table of distinct diseases, dfDistinctDiseaseName.
Then, for the distinct diseases, we selected all distinct Symptoms into the table dfDistinctSymptom.
From dfDistinctSymptom, we created another table, filtering the Symptoms as per Age and
Gender. We further filter out those Symptoms that were already presented to / selected by the patient.
Finally, we selected the top 5 most frequent Symptoms and displayed them to the patient to select
the next possible symptoms.
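
The extra Age and Gender filter in Part 2 can be sketched as below. The source leaves the age cut-offs N1 to N5 unspecified, so the thresholds in AGE_BANDS are invented purely for illustration, as are the sample rows in SYMPTOM_GROUPS and the helper names.

```python
# Hypothetical age cut-offs: the source does not give values for N1..N5.
AGE_BANDS = [
    ("INFANTS", 0, 1),
    ("CHILDREN", 1, 13),
    ("TEENAGE", 13, 20),
    ("ADULT", 20, 60),
    ("ELDERLY", 60, 200),
]

# Made-up rows: a group appears in the set when the matching SYMP_* column is "Yes".
SYMPTOM_GROUPS = {
    "chest pain": {"MALE", "FEMALE", "TEENAGE", "ADULT", "ELDERLY"},
    "colic": {"MALE", "FEMALE", "INFANTS"},
    "menstrual cramps": {"FEMALE", "TEENAGE", "ADULT"},
}


def age_band(age):
    """Map an age in years to one of the age-group labels."""
    for name, low, high in AGE_BANDS:
        if low <= age < high:
            return name
    raise ValueError(f"age out of range: {age}")


def applicable(symptoms, age, gender, groups=SYMPTOM_GROUPS):
    """Keep only symptoms flagged for both the patient's gender and age group."""
    band = age_band(age)
    sex = gender.upper()
    return [s for s in symptoms if sex in groups[s] and band in groups[s]]


candidates = ["chest pain", "colic", "menstrual cramps"]
print(applicable(candidates, 18, "Female"))
print(applicable(candidates, 60, "Male"))
```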
Module 4 – Correlation for Symptoms
Here we calculate the correlation between the selected symptoms and check whether it is
below 15%, in which case we stop asking for more symptoms.
For the distinct diseases matching the Age and Gender, we take the Symptoms selected /
entered by the patient and transform them into a table as below:

We then enter the frequency of each Symptom for each disease from the master data table

And then, for these Symptoms, we find the correlation as below:

As the correlation here is higher than 15%, we will ask for more symptoms.
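
A minimal sketch of the Module 4 stopping rule follows. It assumes Pearson correlation between the per-disease frequency columns of the selected symptoms, and it assumes the pairwise values are averaged into a single index; the source does not state how the index is formed, so both choices are illustrative only, as are the function names and sample frequencies.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (spread_x * spread_y)


def correlation_index(freq):
    """Mean pairwise correlation; freq maps symptom -> frequencies per disease."""
    pairs = list(combinations(freq, 2))
    if not pairs:
        return 0.0
    return sum(pearson(freq[a], freq[b]) for a, b in pairs) / len(pairs)


def should_ask_more(freq, threshold=0.15):
    """Keep asking for symptoms while the index stays above the 15% threshold."""
    return correlation_index(freq) > threshold


# Made-up frequencies of two symptoms across three diseases.
freq = {
    "chest pain": [10, 4, 1],
    "excessive sweating": [8, 5, 2],
}
print(round(correlation_index(freq), 3), should_ask_more(freq))
```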
Module 5 – This module is for doctors to get the probable diseases, with the probability of
occurrence of each Disease, for the provided Symptoms, Age and Gender
Here we were given 2 scenarios by CallHealth
Scenario 1:
Age: 60 yrs
Gender: Male
Symptoms: Chest Pain, Excessive Sweating, Chest discomfort, SOB (Shortness Of Breath)
Ran the Python code for this scenario

Below is the result for Scenario 1, for all probabilities of occurrence

We can further filter it on the basis of the Occurrence, where we can enter the occurrence
choice as below:
Here we will now display only Common and Most Common diseases

Step 5 - Change parameters/method and Iterate


Scenario 2:
Age: 48 yrs
Gender: Male
Symptoms: Confusion, Disorientation, Abdominal pain, Itching
Ran the Python code for this scenario

Below is the result for Scenario 2, for all probabilities of occurrence
We can further filter it on the basis of the Occurrence, where we can enter the occurrence
choice as below:

Here we will now display only Uncommon, Common and Most Common diseases

Scenario 3:
Age: 18 yrs
Gender: Female
Symptoms: Confusion, Disorientation, Abdominal pain, Itching
Ran the Python code with the same Symptoms for this scenario

Below is the result for Scenario 3, for all probabilities of occurrence

We can further filter it on the basis of the Occurrence, where we can enter the occurrence
choice as below:
Here we will now display only Most Common diseases
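
The occurrence filter applied in the three scenarios amounts to a threshold over the ordered ratings No, Rare, Uncommon, Common, Most common. The sketch below shows one way to express that; the disease names are placeholders because the result screens are not reproduced in this text, and filter_by_occurrence is a hypothetical helper.

```python
# Ratings in increasing order of likelihood, as listed for the disease sheet.
OCCURRENCE_ORDER = ["No", "Rare", "Uncommon", "Common", "Most common"]
RANK = {label: position for position, label in enumerate(OCCURRENCE_ORDER)}


def filter_by_occurrence(diseases, minimum):
    """Keep diseases rated at least `minimum`, most likely first."""
    floor = RANK[minimum]
    kept = [(name, rating) for name, rating in diseases if RANK[rating] >= floor]
    return sorted(kept, key=lambda item: -RANK[item[1]])


# Placeholder results for one patient; not taken from the CallHealth data.
results = [
    ("Disease A", "Rare"),
    ("Disease B", "Most common"),
    ("Disease C", "Common"),
    ("Disease D", "Uncommon"),
    ("Disease E", "No"),
]

print(filter_by_occurrence(results, "Common"))       # as in Scenario 1
print(filter_by_occurrence(results, "Uncommon"))     # as in Scenario 2
print(filter_by_occurrence(results, "Most common"))  # as in Scenario 3
```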
4. Model Comparison
While working on the framework design for the solution, we studied many models and
classification technologies. We found 2 methodologies that could provide intelligent insights for
this solution.
i) Multi-Class Classifier (One-Vs-All) (YouTube link for the tutorial)
ii) IBM Watson for Healthcare (YouTube link for the tutorial)

i) Multi-Class Classifier (One-Vs-All)

In this example we can consider each Class (Class 1, Class 2, …) as a Disease, and each feature
(F11, F12, F13, F21, …) as a Symptom

For training, in Case 1 we mark +1 (the +ve class symbol) for Class 1 and -1 for the other Classes. In
Case 2, we mark +1 for Class 2 and -1 for the other Classes.

Then we have 4 training sets.

Now we can use an SVM classifier to train the system.
This will create different models, M1, M2, …, one for each of these cases, and find the labels for
these models
Now for a new dataset – Fa1, Fa2, Fa3, … – we test it with all the classification models and find the
score against each model as below

As seen here, M3 gives the highest +ve score and hence is the suitable class to assign the
dataset to.
Now in our business problem, when we get a number of diseases for the given symptoms
(features), we can apply this one-vs-all classification technique to find the probability score for
each disease (class)

Here we also see that there are 2 classes with a +ve class score – M1 and M3. So, we can assign
the dataset to all possible classes
Similarly, given Symptoms (features) can map to more than one Disease (Class)
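
The label construction and the decision rule of the one-vs-all scheme can be sketched as follows. The sketch does not train an SVM; it assumes the per-model scores are already available, and the score values are invented to mirror the example above in which M3 scores highest and both M1 and M3 are positive.

```python
def one_vs_all_labels(classes, target):
    """+1 for samples of the target class, -1 for samples of every other class."""
    return [1 if c == target else -1 for c in classes]


def assign(scores, multi_label=False):
    """Pick the best-scoring model, or every model with a positive score."""
    if multi_label:
        return sorted(name for name, score in scores.items() if score > 0)
    return max(scores, key=scores.get)


# Class of each training sample; Case 2 treats Class 2 as the positive class.
sample_classes = ["Class 1", "Class 2", "Class 3", "Class 4"]
print(one_vs_all_labels(sample_classes, "Class 2"))

# Hypothetical scores from the four trained models for one new sample.
scores = {"M1": 0.4, "M2": -1.2, "M3": 1.7, "M4": -0.3}
print(assign(scores))
print(assign(scores, multi_label=True))
```

The multi-label variant corresponds to assigning the dataset to all classes with a positive score, which is how one set of symptoms can map to more than one disease.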
ii) IBM Watson DeepQA Technology for Healthcare

Step 1 – Question comes in the left, QUESTION ANALYSIS decides if we want to split the
question into parts, sub-clues, sub-questions or not. Each question / sub-question is handled in
parallel
Step 2 - It decides on the Question type, what is the question objective.
a) Type of question – numeric, puzzle, exclude, rhyming
b) Objective of the question / what the question is asking for – place, person, number.
Primary search = keyword search, and get top ranking documents
Each question / sub-question is handled in parallel
Candidate Answer generation – this looks for key words and then looks can be the Candidate
answers from the Top-Ranking documents. This is light-weight processing.
Hypothesis Generation – We then restrict the candidates to a smaller set by forming
hypotheses that go on to more detailed evidence processing. Here, from the top-ranking
documents, we try to identify the right answers in those documents.
Hypothesis and Evidence scoring – this evaluates the different types of evidence. Here we
put the candidate answers back together with the keywords in the question, run another search,
retrieve passages that contain all of this information, and then evaluate those passages in
detail. For example, the keyword overlap between a passage and the original question can serve
as one source of evidence.
Synthesis – this module combines the above pieces. If the question was split into different
parts, they are combined in this module to come up with a single answer.
Final Confidence merging & Ranking – Finally, once we have all the candidate answers and have
computed the different evidence scores, the merging and ranking happens here. Duplicates are
removed at this stage.
Main points – Things are done in parallel: candidate answers, evidence, ranking, merging.
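
As a rough illustration of two of the steps above, the sketch below scores evidence by keyword overlap and then merges duplicate candidates and ranks them. It is a simplified toy, not the DeepQA implementation; the function names, the question and the passages are all hypothetical.

```python
import re


def keywords(text):
    """Lower-case word tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question, passage):
    """Fraction of the question's keywords that also appear in the passage."""
    question_words = keywords(question)
    if not question_words:
        return 0.0
    return len(question_words & keywords(passage)) / len(question_words)


def merge_and_rank(scored_candidates):
    """Merge duplicate answers (case-insensitive), keep the best score,
    and return (answer, score) pairs sorted from highest to lowest score."""
    best = {}
    for answer, score in scored_candidates:
        key = answer.strip().lower()
        if key not in best or score > best[key][1]:
            best[key] = (answer.strip(), score)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)


# Hypothetical question keywords and supporting passages per candidate answer.
question = "fever headache stiff neck"
evidence = [
    ("Meningitis", "Meningitis often presents with fever, headache and a stiff neck."),
    ("Influenza", "Influenza typically causes fever, cough and body aches."),
    ("meningitis", "A stiff neck with fever can indicate meningitis."),
]

scored = [(answer, overlap_score(question, passage)) for answer, passage in evidence]
ranked = merge_and_rank(scored)
print(ranked)
```

Keyword overlap is only one possible evidence score; a fuller system would combine several such scores before the final ranking.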

In our business (medical) problem, the symptoms can be treated as the questions and the
diseases as the answers. For every symptom, in the first iteration, many other symptoms can
serve as the keywords, and there would be many diseases as answers. We then have to create the
Hypothesis and Evidence scoring to restrict and filter the results. Further synthesis happens on
the basis of age, gender, past medical history and other demographics. Final confidence merging
and ranking then provides the top-scoring results with a high level of confidence.
• Final recommendation
Based on the data provided, the above models could not be used. We instead used
Naive Bayes for intent analysis and queried the data using Python (PySpark SQL context).

• Additional Insights
We can improve the model with additional data points such as the patient’s past medical
history, location, genetic history and more.
More data on various symptom combinations, and evidence data for confirming the
hypotheses, would be very helpful in applying the above model techniques.

• Integration
This would be provided as a service that CallHealth can incorporate into their company
portal
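
The Naive Bayes approach named under Final recommendation can be illustrated with a small word-count classifier that returns intents ranked by probability. This is a minimal sketch under stated assumptions, not the project's actual script: the training sentences, the intent names (report_symptom, book_appointment) and the function names are all hypothetical, and the model uses add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per intent from (text, intent) pairs."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocabulary = set()
    for text, intent in examples:
        words = text.lower().split()
        word_counts[intent].update(words)
        intent_counts[intent] += 1
        vocabulary.update(words)
    return word_counts, intent_counts, vocabulary


def rank_intents(text, model):
    """Return (intent, probability) pairs, most probable first."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    log_scores = {}
    for intent, count in intent_counts.items():
        total_words = sum(word_counts[intent].values())
        log_p = math.log(count / total_examples)  # prior of the intent
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = word_counts[intent][word] + 1
            log_p += math.log(numerator / (total_words + len(vocabulary)))
        log_scores[intent] = log_p
    # Normalise the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {i: math.exp(s - top) for i, s in log_scores.items()}
    norm = sum(weights.values())
    return sorted(((i, w / norm) for i, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical training sentences and intent names.
examples = [
    ("i have fever and cough", "report_symptom"),
    ("my head hurts and i feel dizzy", "report_symptom"),
    ("book an appointment with a doctor", "book_appointment"),
    ("i want to book a consultation", "book_appointment"),
]

model = train_naive_bayes(examples)
ranking = rank_intents("i have a cough", model)
print(ranking)
```

Returning every class with its probability, rather than a single label, fits a setting where all candidates are shown to the user in ranked order.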
5. Challenges
• Limited availability of data: The data used for the project can be divided into training and
test data. The training data was downloaded from SNOMED and had limited data fields
relating to the patient. The only data available and approved by CallHealth contained
disease and symptom names. The training data used for the intent analysis was also
created by the project team. The test data provided by CallHealth was different from the
training data set and also included the patient's age and gender.

• Changes in the model and algorithms: Because the test data provided by CallHealth was
different from the training data set and also included the patient's age and gender,
the algorithms and scripts had to be modified.

6. Conclusion
CallHealth is planning, firstly, to use the model as a service for its internal team, through
which further data can be collected. Secondly, CallHealth intends to use the model in a
customer portal that can gather data from patients and book an appointment/time
with a doctor/health consultant.
CallHealth wanted to predict the disease based on the symptoms, age and gender without
compromising on accuracy, as even a slight error in prediction may have a significant impact
on the patient. We agreed to display all the diseases matching the symptoms, age and gender,
with the associated probability, ranked by probability. With more data and details about the
patient, diagnostic test results, etc., the list of possible diseases can shrink and the
associated probabilities can increase.
The model will have to be re-calibrated whenever a new data field or data table is
introduced to the model. However, extending the existing model with additional diseases
and symptoms with age and gender would not require any re-calibration.
7. References
CallHealth: https://www.callhealth.com/
Reference 1: Deloitte Global Healthcare Outlook 2018
https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Life-Sciences-Health-Care/gx-lshc-hc-outlook-2018.pdf
Multi-Class Classifier (One-Vs-All) (Youtube link for the tutorial)
IBM Watson for Healthcare (Youtube link for the tutorial)

8. Appendix:

SQL query for creating tables and inserting records in SQL
