Implementing Clinical Decision Support System Using Naïve Bayesian Classifier
Abstract: To shorten diagnosis time and improve diagnosis accuracy in today's healthcare system, it is important to provide a much cheaper and faster way of diagnosing. A system that does this is called a Clinical Decision Support System (CDSS). The application of data mining techniques to assist physicians in diagnosing patient diseases with similar symptoms has received a great deal of attention in recent years. The advantages of a clinical decision support system include not only improved diagnosis accuracy but also reduced diagnosis time. In this paper, a data mining technique named the Naïve Bayesian Classifier, which offers many advantages over traditional data mining methods, is used to open a new way for clinicians to predict patients' diseases. Because the system is built on sensitive patient data, it is necessary to add features that meet the security requirements of patient privacy. Specifically, with large amounts of healthcare data generated every day, classification can be used to extract valuable information that improves the clinical decision support system. Here the fuzzywuzzy string matching algorithm is used with the naïve Bayesian classifier to perform prediction from a large number of symptom records. The result analysis in the last section, performed on live data from five patients, suggests that the proposed technique makes the Clinical Decision Support System more helpful in diagnosing diseases accurately and efficiently.
Keywords: Clinical Decision Support System (CDSS), Privacy Preserving, Naïve Bayesian Classifier, Fuzzywuzzy algorithm.
__________________________________________________*****_________________________________________________
I. INTRODUCTION

Today's healthcare industry has a global scope in providing health services to patients. One part of it, the Clinical Decision Support System (CDSS), holds massive amounts of electronic data and has experienced sharp demand and growth. However, it is necessary to design and develop an appropriate technique to extract the great potential economic value from this large amount of data, and to speed up diagnosis and improve diagnosis accuracy [1]. CDSS is a new kind of system in the healthcare industry that can provide a much cheaper and faster way of diagnosing. Because a CDSS holds a huge amount of data, it is necessary to apply data mining techniques that assist physicians in diagnosing patient diseases with different data mining classification functions, an approach that has received a great deal of attention recently [2] [3] [4]. Among the available data mining classification techniques, the naive Bayesian classifier, one of the popular machine learning tools, has been widely used to perform prediction [5]. Despite its simplicity, it is more appropriate for medical diagnosis in healthcare than some more sophisticated techniques [6] [7].

The CDSS with a naive Bayesian classifier offers many advantages over traditional healthcare systems and opens a new way for clinicians to predict patients' diseases. However, one of the main challenges is how to keep patients' medical data away from unauthorized disclosure. The use of medical data can be of interest to a large variety of healthcare stakeholders [8]. Without good protection of the patient's medical data, the patient may fear that the data will be leaked and abused, and may refuse to provide it to the CDSS [9]. Therefore, to develop a clinical decision support system that also addresses the privacy issues, this paper proposes a Privacy Preserving Patient-Centric Clinical Decision Support System, called PPCDSS. To preserve the privacy of the patient's medical data, the sensitive data is first encrypted using a cryptographic approach and then stored in the database.

In addition, one objective is to perform efficient prediction of disease on the basis of the existing data set. For that, the system introduces a new classification and aggregation approach called fuzzywuzzy, which allows the service provider to build the naive Bayesian classifier [10]. This helps reduce the diagnosis time for disease prediction, and the encryption technique used helps preserve the privacy of the patient's data. Because symptoms vary from patient to patient and may not be present in the CDSS database, a new mechanism is proposed to review and retrain the CDSS data set [11]. The remainder of the paper is organized as follows. Section II gives the implementation procedure of the PPCDSS, along with the working and the functionality required to use the fuzzywuzzy algorithm. Section III gives the result analysis, performed by taking the actual symptoms of five different patients; this validates the efficiency of the proposed PPCDSS and shows the advantages of the proposed and developed system. Finally, Section IV concludes the paper.
IJRITCC | December 2017, Available @ http://www.ijritcc.org
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 12 220 - 225
II. IMPLEMENTATION STRATEGIES

This paper tries to improve the existing Clinical Decision Support System by basing it on the naïve Bayesian classifier. The system uses a data mining classification technique for the Clinical Decision Support System (CDSS), and with this technique it works faster and more efficiently [12] [13]. The classifier is widely used in real-life applications because of its simplicity and its good performance in both theory and practice. However, in large-scale problems where huge training data are available, such as road sign detection, the method's training and test phases might be prohibitively demanding in terms of computation. Thus, for large-scale problems, reducing the computational complexity is essential. For security, encryption with the AES algorithm is used to preserve the privacy of patients' data [14] [15]. The complete flow of the working system, step by step, is as follows:

A. Stepwise Work Flow of System:

Step 1: The doctor registers with the system.
Step 2: The doctor logs in to the system with an authentic email-id and password.
Step 3: The doctor can add / edit / update / delete any number of diseases, their symptoms, and their prescription information.
Step 4: The doctor adds the patient's information, along with the symptoms the patient is suffering from, to the database and checks for a diagnosis.
Step 5: The database provides the historical medical data it holds, which is processed with the help of the naïve Bayesian classifier and the fuzzywuzzy search algorithm.
Step 6: After calculation, the predicted result is sent to the next level. At this level the probability of each predicted disease risk is calculated, and the top three diseases having a probability of more than 50% are displayed. In this algorithm the maximum-probability disease risk is calculated.
Step 7: The doctor checks the patient's symptoms once again and, from the result generated in Step 6, suggests the most suitable prescription for the patient. In this way the predicted disease is properly diagnosed, which helps to give a proper prescription to the patient more effectively.
Step 8: For a better CDSS design, the doctor reviews the prescription suggested to the patient and checks whether or not the patient was cured by it. If the patient was cured, go to Step 9; otherwise stop.
Step 9: Check whether the patient is suffering from any symptoms that are new relative to those already in our CDSS data. If any new symptoms are identified, retrain the database by adding the symptom to that particular disease.

B. Algorithm Used

FUZZY SEARCHING with NAIVE BAYES CLASSIFIERS

Here the system uses the fuzzywuzzy searching algorithm together with the naïve Bayesian classifier for diagnosis of the patient. The complete description is as follows [16]:

FuzzyWuzzy Algorithm:

It is a simple library and command-line utility, similar to a general regular-expression tool, that can help when you need approximate string matching or substring searching with the help of primitive regular expressions.

About "approximate" or "fuzzy" string comparison and its need:

Just imagine that you deal with information (like orders) which is sent to you by many people. When these people mention names of places or persons, they can cause you problems of two kinds:

they make nasty typos;
they use different variants of names;

1. For example, if you are responsible for checking incoming mail, you may want to find letters addressed to the Indian Prime Minister. You try to find all that contain the words "Narendra Modi" on the envelope. But you soon discover that sometimes people address this person as "Priminister Narendra Modi", sometimes as "Mr. N. Modi", and also as "Narendrabhai Modi" (note the typos).
2. In another example, after reading Google and Wikipedia you found that you can compare "Barak" with "Barack" and "Baarck" etc. (possibly the name of former US president Barack Obama) with the help of an "approximate string matching algorithm", also called "fuzzy string matching". But after you use or implement some of these algorithms, you find that this is not sufficient. You need "approximate" substring search and the ability to specify some complex patterns (for example, a country could be specified as "Russia" or as "Russian Federation", but it should not be confused with "Belarussian Republic", etc.).

Naïve Bayesian classification with fuzzy matching:

Fuzzy matching is a general term for finding strings that are almost equal, or mostly the same. Of course "almost" and "mostly" are ambiguous terms themselves, so it is necessary to determine what they really mean for your specific needs. The best way to do this is to come up with a list of steps before starting to write any fuzzy matching code. Once you have performed all the steps, it is much easier to tailor your fuzzy
matching code to get the best results. These steps are summarized as follows [17]:

Get match ratios with the help of the following fuzzy search ratio expressions / functions:

1. Simple Ratio
FuzzySearch.ratio("mysmilarstring","myawfullysmilarstirng") - 72
FuzzySearch.ratio("mysmilarstring","mysimilarstring") - 97

Along with this there are other variants, such as Partial Ratio, Token Sort Ratio, Token Set Ratio, and Weighted Ratio.

Extract results from the calculations using the following fuzzy extract expressions / functions:

1. Extract One:
FuzzySearch.extractOne("cowboys", ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"])
(string: Dallas Cowboys, score: 90)

2. Extract Top:
FuzzySearch.extractTop("goolge", ["google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl"], 3)
[(string: google, score:83), (string: googleplus, score:63), (string: plexoogl, score:43)]

Graph 1: Accuracy of prediction for Patient 1

From the above graph we find that, in terms of diagnosis accuracy, our system with the naïve Bayes fuzzywuzzy algorithm finds the most accurate result. That is, the patient is already suffering from a thyroid disorder, and the disease predicted by our system is also Hypothyroidism, with the highest probability of having that disease at 92%.
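As a rough illustration of how fuzzy symptom matching and a naïve Bayesian score could combine to produce a ranked list of diseases with probabilities, such as the one reported for Patient 1, the following minimal Python sketch may help. It is not the implementation used in this paper. The standard-library difflib.SequenceMatcher stands in for the fuzzy ratio function, and the names ratio, extract_one, rank_diseases and KNOWLEDGE_BASE, the toy symptom lists, the matching threshold of 80, the uniform prior and the Laplace smoothing are all assumptions made for the example.

```python
from difflib import SequenceMatcher
import math


def ratio(a: str, b: str) -> int:
    """Similarity score from 0 to 100 (stand-in for a fuzzy 'simple ratio')."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract_one(query, choices):
    """Return the (choice, score) pair with the highest similarity to query."""
    return max(((c, ratio(query, c)) for c in choices), key=lambda pair: pair[1])


# Hypothetical toy knowledge base: disease -> symptoms stored by the doctor.
KNOWLEDGE_BASE = {
    "Osteoporosis": ["easy bone fractures", "stress fractures of feet", "lower back pain"],
    "Dengue": ["high fever", "pain behind the eyes", "skin rash"],
    "Tuberculosis": ["coughing up blood", "chest pain", "loss of appetite"],
}


def rank_diseases(reported, kb, match_threshold=80, top_n=3):
    """Rank diseases for free-text symptoms; returns [(disease, probability)]."""
    vocabulary = sorted({s for symptoms in kb.values() for s in symptoms})

    # Fuzzy step: map each reported symptom to the closest stored symptom.
    matched = set()
    for text in reported:
        best, score = extract_one(text, vocabulary)
        if score >= match_threshold:
            matched.add(best)

    # Naive Bayes step: uniform prior, Laplace-smoothed symptom likelihoods,
    # accumulated in log space to avoid underflow.
    log_scores = {}
    for disease, symptoms in kb.items():
        log_p = math.log(1 / len(kb))
        for symptom in matched:
            count = 1 if symptom in symptoms else 0
            log_p += math.log((count + 1) / (len(symptoms) + len(vocabulary)))
        log_scores[disease] = log_p

    # Normalise the scores so that they sum to 1 over all diseases.
    highest = max(log_scores.values())
    weights = {d: math.exp(v - highest) for d, v in log_scores.items()}
    total = sum(weights.values())
    ranked = sorted(((d, w / total) for d, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    reported = ["easy bone fracturs", "stres fractures of feet"]  # typos on purpose
    for disease, probability in rank_diseases(reported, KNOWLEDGE_BASE):
        print(f"{disease}: {probability:.0%}")
```

With the toy data above, both misspelled symptoms are mapped to stored Osteoporosis symptoms, so Osteoporosis is ranked first. A real system would estimate the priors and likelihoods from the historical patient records instead of from a hand-written dictionary.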
Patient 2: This patient is actually suffering from Osteoporosis
Selected Symptoms: easy bone fractures, stress fractures of feet at walking or stepping
Result given by the system:
Osteoporosis (91%) Chicken Pox (86%) Diabetes Mellitus (86%)

Patient 3: This patient is actually suffering from Dengue
Symptoms given by patient 3 are:
Selected Symptoms: easy bone fractures, stress fractures of feet at walking or stepping
Extra added Symptoms: lower back pain, pain in legs
Result given by the system:
Dengue (88%) Measles (86%) Chicken pox (57%)

Patient 4: This patient is actually suffering from Cardio vascular diseases
Symptoms given by patient 4 are:
Selected Symptoms: Persistent high blood pressure (BP), Sugar level is high
Extra added Symptoms: damage of arteries, fatigue
Result given by the system:
Cardio vascular diseases (89%) Chicken pox (86%) Dengue (86%)

Patient 5: This patient is actually suffering from Tuberculosis
Symptoms given by patient 5 are:
Chest pain and blood comes out with the sputum, cough lasting 2-3 weeks, loss of appetite, coughing up blood
Extra added Symptoms: general weakness
Result given by the system:
Tuberculosis (91%) Hepatitis (49%) Chicken pox (48%)

As in the case of patient 2, the predicted probabilities do not vary much in most cases: the first prediction is 91%, the second is 86% and the third is 86%. This is because the symptoms given by the patient are general and may be caused by all three of the diseases obtained in the result. Nevertheless, the first prediction, at 91%, is Osteoporosis, which is the actual disease from which patient 2 is suffering. Therefore the prediction rate of our proposed system is high. The overall analysis of the five patients is given in Graph 2 below.

[Graph 2: overall analysis of the five patients; axis scale 0 to 100]
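The per-patient figures reported in this section can be summarized with a few lines of arithmetic. The short Python sketch below only re-tabulates the numbers given in the text. The names RESULTS, top1_hits, mean_top_probability and apply_cutoff are hypothetical, and for Patient 1 only the top prediction is available from the text, with the thyroid disorder treated as matching Hypothyroidism as the paper does.

```python
# Reported results: actual disease and the system's ranked predictions (in %).
# Patient 1: only the top prediction is stated in the text (assumption: the
# patient's thyroid disorder is counted as matching Hypothyroidism).
RESULTS = {
    "Patient 1": ("Hypothyroidism", [("Hypothyroidism", 92)]),
    "Patient 2": ("Osteoporosis",
                  [("Osteoporosis", 91), ("Chicken pox", 86), ("Diabetes Mellitus", 86)]),
    "Patient 3": ("Dengue",
                  [("Dengue", 88), ("Measles", 86), ("Chicken pox", 57)]),
    "Patient 4": ("Cardio vascular diseases",
                  [("Cardio vascular diseases", 89), ("Chicken pox", 86), ("Dengue", 86)]),
    "Patient 5": ("Tuberculosis",
                  [("Tuberculosis", 91), ("Hepatitis", 49), ("Chicken pox", 48)]),
}


def top1_hits(results):
    """Count patients whose top-ranked prediction equals the actual disease."""
    return sum(1 for actual, ranked in results.values()
               if ranked[0][0].lower() == actual.lower())


def mean_top_probability(results):
    """Average probability (in %) assigned to the top-ranked prediction."""
    return sum(ranked[0][1] for _, ranked in results.values()) / len(results)


def apply_cutoff(ranked, cutoff=50, top_n=3):
    """Keep at most top_n predictions whose probability exceeds the cutoff,
    as described in Step 6 of the work flow."""
    return [(disease, p) for disease, p in ranked if p > cutoff][:top_n]


if __name__ == "__main__":
    print("Top-1 hits:", top1_hits(RESULTS), "of", len(RESULTS))
    print("Mean top probability:", mean_top_probability(RESULTS))
    print("Patient 5 above cutoff:", apply_cutoff(RESULTS["Patient 5"][1]))
```

With the figures above, the top-ranked disease agrees with the actual disease for all five patients, and the mean probability of the top prediction is 90.2%. Applying the 50% cutoff of Step 6 to Patient 5's list would keep only Tuberculosis.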
System for CVD Risk Assessment and Management”, 978-1-4577-0220-4/16/$31.00 ©2016 IEEE.
[12] Kulwinder Singh Mann, Navjot Kaur, “Cloud-deployable health data mining using secured framework for Clinical decision support system”, 978-1-4799-6908-1/15/$31.00 ©2015 IEEE.
[13] Jussi Mattila, Juha Koikkalainen, Arho Virkki, Mark van Gils, and Jyrki Lötjönen, “Design and Application of a Generic Clinical Decision Support System for Multiscale Data”, IEEE Transactions on Biomedical Engineering, vol. 59, no. 1, January 2012.
[14] C. Schurink, P. Lucas, I. Hoepelman, and M. Bonten, “Computer-assisted decision support for the diagnosis and treatment of infectious diseases in intensive care units”, The Lancet Infectious Diseases, vol. 5, no. 5, pp. 305–312, 2005.
[15] Tzu-cheng Chuang, Okan K. Ersoy, Saul B. Gelfand, “Boosting Classification Accuracy With Samples Chosen From A Validation Set”, ANNIE (2007), Intelligent Engineering Systems through Artificial Neural Networks, St. Louis, MO, pp. 455-461.
[16] Fuzzy-String-Matching, [Online] available: https://stackoverflow.com/questions/21057708/java-fuzzy-string-matching-with-names.