Foml Project 1
DECISION TREE
A PROJECT REPORT
Submitted by
Madhumitha M (221501072)
Menakaa V (221501077)
Nithya Shree AK (221501090)
This is to certify that the Mini project work titled “DISEASE PREDICTOR USING DECISION TREE”
done by Madhumitha M - 221501072 (AIML), Menakaa V - 221501077 (AIML), and
Nithya Shree AK - 221501090 (AIML) is a record of bonafide work carried out by them under my
supervision as a part of MINI PROJECT for the subject titled AI19442 - FUNDAMENTALS OF
MACHINE LEARNING by Department of Artificial Intelligence and Machine Learning.
SIGNATURE SIGNATURE
Thandalam, Thandalam,
Chennai-602 105. Chennai- 602 105.
Our project is a disease prediction system designed to diagnose illnesses based on a person's
symptoms. Using machine learning techniques, the model predicts diseases and provides users
with recommended precautions. A standout feature of our system is its ability to vocalize the
diagnosis and suggested precautions, making the information more accessible and
understandable. This audio feedback sets our project apart, ensuring that users receive clear
and immediate guidance. Our approach aims for high accuracy in disease prediction while
offering a user-friendly and inclusive solution, particularly benefiting those with visual
impairments or literacy challenges. The model is intended to complement traditional
diagnostic methods by providing both predictions and practical advice to support users
in managing their health.
CHAPTER 1
INTRODUCTION
The Disease Predictor project is a pioneering Python application designed to revolutionize the
way users seek medical advice based on reported symptoms. In an era of technological
advancements, this project harnesses the power of machine learning to assist individuals in
identifying potential medical conditions promptly. The primary focus lies in utilizing two
robust machine learning models: the Decision Tree Classifier and the Support Vector Machine
(SVM). These models enable the disease predictor to make accurate predictions about potential
diseases, provide information on those diseases, and suggest precautionary measures.
As part of this project, five datasets were employed, each serving a specific purpose. Two
datasets, namely "Data.csv" and "Dataset.csv," provide essential information about symptoms,
diseases, and their corresponding features. The remaining three datasets, located in the
"MasterData" folder, include "Symptom_Description.csv," "Symptom_Precaution.csv," and
"Symptom_Severity.csv," which offer detailed insights into symptom descriptions,
precautionary measures, and symptom severity, respectively.
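The three MasterData files act as lookup tables keyed by symptom or disease name. The sketch below shows one way such tables could be loaded into dictionaries. It is a minimal illustration, not the project's loader: the sample rows are made up, the column layout (name first, values after) is an assumption, and the helper names load_severity, load_descriptions and load_precautions are hypothetical.

```python
import csv
import io

# Made-up stand-ins for the three MasterData files (assumed layout: name, value(s)).
SEVERITY_CSV = "itching,1\nhigh_fever,7\nheadache,3\n"
DESCRIPTION_CSV = "Common Cold,A viral infection of the nose and throat.\n"
PRECAUTION_CSV = (
    "Common Cold,drink warm fluids,rest,avoid cold food,"
    "consult a doctor if it persists\n"
)


def load_severity(handle):
    """Map each symptom name to an integer severity weight."""
    return {row[0].strip(): int(row[1]) for row in csv.reader(handle) if len(row) >= 2}


def load_descriptions(handle):
    """Map each disease name to its description text."""
    return {row[0].strip(): row[1].strip() for row in csv.reader(handle) if len(row) >= 2}


def load_precautions(handle):
    """Map each disease name to a list of non-empty precaution strings."""
    return {
        row[0].strip(): [cell.strip() for cell in row[1:] if cell.strip()]
        for row in csv.reader(handle)
        if row
    }


# io.StringIO stands in for open("MasterData/...") so the sketch runs without files.
severity = load_severity(io.StringIO(SEVERITY_CSV))
descriptions = load_descriptions(io.StringIO(DESCRIPTION_CSV))
precautions = load_precautions(io.StringIO(PRECAUTION_CSV))

print(severity["high_fever"])
print(precautions["Common Cold"])
```

Keeping each file in its own dictionary means a predicted disease name can be used directly as the key for both its description and its precautions.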
This report delves into the intricacies of the TalkBot.AI disease predictor project, covering its
machine learning models, data analysis, implementation details, and the integration of voice
interaction for an enhanced user experience. The subsequent sections will provide an in-depth
exploration of the various components, methodologies, and outcomes achieved throughout the
development of this innovative healthcare solution.
The integration of machine learning in healthcare has emerged as a transformative force,
revolutionizing various aspects of medical research, diagnosis, and patient care. The
importance of utilizing machine learning in the healthcare sector lies in its ability to harness
the vast amounts of data available, enabling more accurate predictions, personalized
treatments, and efficient decision-making. Below are key aspects highlighting the significance
of machine learning in healthcare:
Predictive Analytics:
Machine learning algorithms can analyze large datasets to identify patterns and trends,
enabling the prediction of potential health issues and disease outcomes. Early detection of
diseases allows for timely intervention and improved patient outcomes.
Training Process:
Features: Symptoms reported by users.
Labels: Corresponding diseases.
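Decision tree nodes are split on symptom values so that each split yields the purest possible
child nodes. The report describes this as maximizing information gain; note that scikit-learn's
DecisionTreeClassifier uses Gini impurity unless criterion="entropy" is set.
The model iteratively partitions the data until it forms leaf nodes representing predicted
diseases.
Cross-validation is performed to evaluate the model's performance.
The average cross-validation score provides an indication of the model's predictive accuracy.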
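To make the splitting criterion concrete, the following sketch computes entropy-based information gain for a binary symptom column. It is a hand-written illustration on made-up data, not the library's internal implementation; the function names and the toy rows are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, symptom_index):
    """Entropy reduction obtained by splitting on one 0/1 symptom column."""
    present = [lab for row, lab in zip(rows, labels) if row[symptom_index] == 1]
    absent = [lab for row, lab in zip(rows, labels) if row[symptom_index] == 0]
    total = len(labels)
    weighted = sum(
        len(part) / total * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - weighted


# Made-up toy data: columns are [fever, cough].
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = ["Flu", "Flu", "Allergy", "Allergy"]

gains = [information_gain(rows, labels, i) for i in range(2)]
print(dict(zip(["fever", "cough"], gains)))
```

In this toy set, fever separates the two diseases completely while cough carries no information, so a tree built on it would split on fever first.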
· Training Process:
Features: Symptoms reported by users.
Labels: Corresponding diseases.
SVM identifies the hyperplane with the maximum margin between classes, ensuring optimal
separation.
The SVM model is evaluated based on its accuracy in predicting diseases on the test set.
A high accuracy score indicates the model's effectiveness in disease classification.
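The SVM workflow described above (fit on a training split, score on a held-out test split) can be sketched with scikit-learn as follows. The data is a made-up toy set, and the linear kernel is chosen here only because it corresponds directly to the separating hyperplane described in the text; the project code calls SVC() with its default settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Made-up toy data: columns are [fever, cough, rash], one row per patient.
patterns = {"Flu": [1, 1, 0], "Allergy": [0, 0, 1], "Cold": [0, 1, 0]}
X, y = [], []
for disease, vector in patterns.items():
    for _ in range(10):
        X.append(vector)
        y.append(disease)

# Hold out 30% of the rows for testing; stratify keeps every disease in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=20, stratify=y
)

model = SVC(kernel="linear")  # assumption: linear kernel for illustration
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of test rows classified correctly
print(f"SVM test accuracy: {accuracy:.2f}")
print(model.predict([[1, 1, 0]]))
```

Scoring on rows the model never saw during fitting is what makes the accuracy figure a meaningful estimate of how the classifier behaves on new symptom reports.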
CHAPTER 2
RELATED WORK
2) The authors use a Natural Language Processing (NLP) algorithm. The system provides
text-to-text assistance so that users can communicate with the bot in a user-friendly manner.
The disease predictor provides medical suggestions that can cure the disease based on user
symptoms. If it is a severe health problem, the user will be advised to consult a doctor for
better treatment. The disease predictor can also give medical prescriptions for health
problems. Along with the medicines, the disease predictor also provides ayurvedic remedies and
homeopathy treatments for related health problems.
3) The authors proposed a disease predictor, a software application through which patients
interact over the internet. Chatbots are playing an important role in today's world by
supporting and helping patients with appropriate information. The proposed idea is to create a
medical disease predictor based on Artificial Intelligence using NLP. Through chatbots, users
can communicate via a text or voice interface and receive replies. These bots connect with
potential patients visiting the site, book appointments, and also help them get proper
treatment. These chatbots are available 24/7 and are comfortable, efficient, beneficial,
and friendly to use.
4) The author proposes a medical disease predictor called MedBot, which can give proper
advice on leading a healthy lifestyle. The basic idea is to construct a healthcare disease
predictor (MedBot) based on Artificial Intelligence and Natural Language Processing, which
can identify the illness and provide necessary information about it prior to consulting or
visiting a doctor, thereby making the MedBot more reachable and reducing healthcare costs.
Some of these chatbots act as virtual medical assistants, teaching patients about their
sickness and motivating them to have better health. A text-to-text medical disease predictor
involves users in an online conversation about their medical issues and offers a range of
personalized diagnoses depending on the symptoms that have been presented. The MedBot
interacts with potential patients who come to the application.
5) This disease predictor may be used by anyone in any sort of emergency, where it can advise
people on primary care before seeing a doctor, or it can sometimes work as a doctor for small
and short-term health issues such as a cold, headache, and so on. Along with this chatbot,
there will be assistance for those in need who seek immediate solutions. A user can identify
the true condition by reporting its symptoms. The goal of this work is to act on the user's
symptoms and give medical advice based on them in order to decrease the time and expense
associated with the procedure.
CHAPTER 3
MODEL ARCHITECTURE
· User Input: Users interact with the disease predictor by inputting their symptoms.
· Machine Learning Models: The project employs two machine learning models, the Decision
Tree Classifier and Support Vector Machine, for disease prediction based on symptoms.
· Symptom Severity Assessment: Severity levels of reported symptoms are quantified using the
"Symptom_Severity.csv" dataset, educating users about the potential seriousness of their health
concerns.
· Description of the datasets used: The Disease predictor project relies on a set of five datasets
to facilitate accurate disease prediction, provide detailed information about diseases, and offer
personalized precautionary measures. Below is a comprehensive description of each dataset:
CHAPTER 4
IMPLEMENTATION
The interaction with the user and the symptom input process in the Healthcare disease
predictor is handled by the tree_to_code function. This function prompts the user to input
symptoms, utilizes the Decision Tree Classifier to make predictions, and provides information
on potential diseases, symptoms, and precautions.
Explanation:
· Symptom Input:
The user is prompted to input symptoms they are experiencing.
The input is processed, and if there is ambiguity or multiple matches, the user is asked to
choose the most relevant symptom.
· Secondary Prediction:
A secondary prediction is made using a Decision Tree based on the symptoms provided by the
user.
· Condition Calculation:
The severity of symptoms and the duration are used to calculate the condition and provide
appropriate recommendations.
· Result Presentation:
The predicted diseases, descriptions, and precautions are presented to the user.
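The condition calculation step combines symptom severity with duration. The sketch below shows one plausible form of that calculation. The formula, the threshold of 13, the severity weights and the function name assess_condition are all assumptions made for illustration; the report does not state the exact rule.

```python
def assess_condition(symptoms, days, severity, threshold=13):
    """Return advice based on a simple severity score.

    score = (sum of severity weights * days) / (number of symptoms + 1)
    Both the formula and the default threshold are illustrative assumptions.
    """
    total = sum(severity.get(symptom, 0) for symptom in symptoms)
    score = (total * days) / (len(symptoms) + 1)
    if score > threshold:
        return "You should consult a doctor."
    return "It might not be that bad, but you should take precautions."


# Made-up severity weights; in the project these come from Symptom_Severity.csv.
severity = {"high_fever": 7, "headache": 3, "itching": 1}

print(assess_condition(["high_fever", "headache"], days=5, severity=severity))
print(assess_condition(["itching"], days=2, severity=severity))
```

Multiplying by the number of days means that even mild symptoms eventually trigger the recommendation to see a doctor if they persist long enough.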
The models in the provided code, namely the Decision Tree Classifier and Support Vector
Machine (SVM), predict diseases based on symptoms using a machine learning approach.
· Model Training:
During the training phase, the Decision Tree Classifier learns patterns and relationships
between symptoms and diseases from the provided dataset (Data/dataset.csv).
· Feature Importance:
The Decision Tree assigns importance scores to each symptom, indicating how influential they
are in predicting the target variable (diseases).
Features with higher importance scores are considered more critical in making predictions.
· User Interaction:
When a user inputs symptoms, the disease predictor uses the trained Decision Tree to traverse
its nodes.
At each node, the model evaluates whether the input symptoms match the conditions defined
in the tree.
· Decision-Making Nodes:
Decision nodes in the tree compare the input symptoms to specific features (symptoms) and
make decisions based on their values.
For example, if the user reports a symptom like "headache," the Decision Tree might have a
node that checks if the input contains this symptom.
· Output Presentation:
The model outputs the predicted disease(s), associated descriptions, and recommended
precautions to the user.
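The feature-importance and node-traversal steps above can be illustrated on a small fitted tree. The sketch below is a hedged example on made-up data: the symptom names, the rows and the traverse helper are assumptions, while feature_importances_, classes_ and the tree_ arrays (feature, threshold, children_left, children_right, value) are attributes of scikit-learn's fitted DecisionTreeClassifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

symptom_names = ["fever", "cough", "rash"]  # hypothetical symptom columns
X = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]])
y = np.array(["Flu", "Flu", "Allergy", "Allergy", "Cold", "Cold"])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Rank symptoms by how much they contribute to the tree's splits.
ranking = sorted(
    zip(symptom_names, clf.feature_importances_), key=lambda pair: pair[1], reverse=True
)


def traverse(clf, sample):
    """Walk from the root to a leaf, recording the decision taken at each node."""
    tree = clf.tree_
    node, path = 0, []
    # In a leaf, both child pointers hold the same sentinel value.
    while tree.children_left[node] != tree.children_right[node]:
        idx = tree.feature[node]
        go_left = sample[idx] <= tree.threshold[node]  # 0 (absent) goes left
        path.append((symptom_names[idx], "absent" if go_left else "present"))
        node = tree.children_left[node] if go_left else tree.children_right[node]
    label = clf.classes_[np.argmax(tree.value[node][0])]
    return label, path


label, path = traverse(clf, [0, 0, 1])
print(ranking)
print(label, path)
```

Because the symptoms are encoded as 0 or 1, every threshold in the tree sits at 0.5, so each decision node is effectively the question "is this symptom present?".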
· Explanation of the text-to-speech functionality using pyttsx3:
· Importing the Library:
The statement import pyttsx3 brings the pyttsx3 library into the script, enabling access to its
text-to-speech capabilities.
· Speaking Text:
The “say()” function is used to specify the text that the engine should convert to speech. In
the project code, it speaks the prompt asking for the user's name. The “runAndWait()” function
ensures that the speech is delivered synchronously.
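The sequence of calls described above can be sketched as a small helper. This is an illustrative wrapper, not the project's exact function: the name speak and the RecordingEngine stand-in are assumptions, introduced so that the example runs on a machine without a sound device. With no engine argument, the helper falls back to a real pyttsx3 engine.

```python
def speak(text, engine=None, rate=130):
    """Speak text aloud and block until playback has finished."""
    if engine is None:
        import pyttsx3  # imported lazily so the sketch also runs without audio support

        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)                  # queue the utterance
    engine.runAndWait()               # process the queue; returns when speech is done
    return text


class RecordingEngine:
    """Hypothetical stand-in exposing the same three methods as a pyttsx3 engine."""

    def __init__(self):
        self.calls = []

    def setProperty(self, name, value):
        self.calls.append(("setProperty", name, value))

    def say(self, text):
        self.calls.append(("say", text))

    def runAndWait(self):
        self.calls.append(("runAndWait",))


fake = RecordingEngine()
speak("Enter your name", engine=fake)
print(fake.calls)
```

On a machine with a speech backend installed, calling speak("Enter your name") without the engine argument would voice the prompt before the program continues to the next line.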
Challenges Faced
· Domain-Specific Challenges:
Healthcare terminology and domain-specific nuances can be complex. Ensuring that the
disease predictor understands and communicates medical information accurately is crucial for
user trust.
· Domain-Specific Challenges:
Involve healthcare professionals in the development process to validate model outputs.
Implement continuous learning mechanisms to adapt to evolving medical knowledge.
Conclusion
In conclusion, the TalkBot.AI disease prediction project is a Python application designed to
assist users in identifying potential medical conditions based on reported symptoms. The
disease predictor utilizes machine learning techniques, specifically a Decision Tree Classifier
and Support Vector Machine (SVM), to make predictions. The project aims to provide users
with information on diseases, their descriptions, and suggested precautions.
Future Enhancements
Appendix-1: CODE
import re
import pandas as pd
import pyttsx3
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier,_tree
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
import csv
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
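# Load the training data; every column except the last one is a symptom feature.
data = pd.read_csv('Data/dataset.csv')
cols = data.columns
cols = cols[:-1]
x = data[cols]
y = data['prognosis']
y1 = y
# One row per disease: 1 wherever any record of that disease shows the symptom.
reduced_data = data.groupby(data['prognosis']).max()
# The listing omits the label encoding and the train/test split, although later
# code uses le, x_train and x_test. The next four lines are an assumed
# reconstruction; the split parameters mirror those used in sec_predict.
le = preprocessing.LabelEncoder()
le.fit(y)
y = le.transform(y)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=20)
clf1 = DecisionTreeClassifier()
clf = clf1.fit(x_train, y_train)
# print(clf.score(x_train,y_train))
# print ("cross result========")
scores = cross_val_score(clf, x_test, y_test, cv=3)
# print (scores)
print(scores.mean())
model = SVC()
model.fit(x_train, y_train)
print("for svm: ")
print(model.score(x_test, y_test))
# Rank the symptoms by their importance in the fitted tree.
importances = clf.feature_importances_
indices = np.argsort(importances)[::-1]
features = cols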
def readn(nstr):
    # Speak the given string aloud and wait until playback has finished.
    engine = pyttsx3.init()
    engine.setProperty('voice', "english+f5")
    engine.setProperty('rate', 130)
    engine.say(nstr)
    engine.runAndWait()
    engine.stop()
severityDictionary = dict()
description_list = dict()
precautionDictionary = dict()
symptoms_dict = {}
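def getDescription():
    global description_list
    with open('MasterData/symptom_Description.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            _description = {row[0]: row[1]}
            description_list.update(_description)
def getSeverityDict():
    global severityDictionary
    with open('MasterData/symptom_severity.csv') as csv_file:
        # The loop body is not shown in the listing. Assumed reconstruction that
        # mirrors getDescription and expects rows of the form symptom,severity.
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            if len(row) >= 2:
                severityDictionary[row[0]] = int(row[1])
def getprecautionDict():
    global precautionDictionary
    with open('MasterData/symptom_precaution.csv') as csv_file:
        # Body not shown in the listing; assumed rows: disease,precaution,...
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            if row:
                precautionDictionary[row[0]] = row[1:]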
def getInfo():
    print("-----------------------------------HealthCare ChatBot-----------------------------------")
    print("\nYour Name? \t\t\t\t", end="->")
    readn("Enter your name")
    name = input("")
    print("Hello, ", name)
    readn("Hello" + name)
def check_pattern(dis_list, inp):
    # Return (1, matches) if the input matches any known symptom, else (0, []).
    pred_list = []
    inp = inp.replace(' ', '_')
    patt = f"{inp}"
    regexp = re.compile(patt)
    pred_list = [item for item in dis_list if regexp.search(item)]
    if len(pred_list) > 0:
        return 1, pred_list
    else:
        return 0, []
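def sec_predict(symptoms_exp):
    # Second opinion: retrain a tree and classify the confirmed symptoms.
    df = pd.read_csv('Data/dataset.csv')
    X = df.iloc[:, :-1]
    y = df['prognosis']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20)
    rf_clf = DecisionTreeClassifier()
    rf_clf.fit(X_train, y_train)
    # input_vector is used but never built in the listing. Assumed reconstruction:
    # a 0/1 vector with a 1 in the column of every reported symptom.
    symptoms_dict = {symptom: index for index, symptom in enumerate(X)}
    input_vector = np.zeros(len(symptoms_dict))
    for item in symptoms_exp:
        input_vector[symptoms_dict[item]] = 1
    return rf_clf.predict([input_vector])
def print_disease(node):
    # Decode the class stored at a leaf node into disease name(s).
    node = node[0]
    val = node.nonzero()
    disease = le.inverse_transform(val[0])
    return list(map(lambda x: x.strip(), list(disease)))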
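def calc_condition(exp, days):
    # Called below but not defined in the listing. Assumed reconstruction: scale
    # the total severity of the confirmed symptoms by the duration. The
    # threshold of 13 is an assumption.
    total = sum(severityDictionary.get(item, 0) for item in exp)
    if (total * days) / (len(exp) + 1) > 13:
        strn = "You should take the consultation from doctor."
    else:
        strn = "It might not be that bad but you should take precautions."
    print(strn)
    readn(strn)
def tree_to_code(tree, feature_names):
    # The def line and the next two assignments are missing from the listing.
    # They are reconstructed (assumption) from the call tree_to_code(clf, cols)
    # and from the names tree_ and feature_name used inside recurse.
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    chk_dis = ",".join(feature_names).split(",")
    symptoms_present = []
    while True:
        strn = "Enter the symptom you are experiencing (if you can't explain, just press Enter and choose)"
        print("\n", strn, " \t\t", end="->")
        readn(strn)
        disease_input = input("")
        conf, cnf_dis = check_pattern(chk_dis, disease_input)
        if conf == 1:
            strn = "searches related to input: "
            print(strn)
            readn(strn)
            for num, it in enumerate(cnf_dis):
                print(num, ")", it)
            if num != 0:
                strn = f"Select the one you meant (0 - {num}); if you have more than 1 symptom, choose the most influential : "
                print(strn, end="")
                readn(strn)
                conf_inp = int(input(""))
            else:
                conf_inp = 0
            disease_input = cnf_dis[conf_inp]
            break
            # print("Did you mean: ",cnf_dis,"?(yes/no) :",end="")
            # conf_inp = input("")
            # if(conf_inp=="yes"):
            #     break
        else:
            strn = "Enter valid symptom."
            print(strn)
            readn(strn)
    while True:
        try:
            # Speak the prompt first, then read the answer.
            strn = "Okay. From how many days ? : "
            readn(strn)
            num_days = int(input(strn))
            break
        except ValueError:
            strn = "Enter valid number of days."
            print(strn)
            readn(strn)
    def recurse(node, depth):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            # The branch logic is missing from the listing. Assumed reconstruction:
            # the entered symptom counts as 1 and every other symptom as 0.
            val = 1 if name == disease_input else 0
            if val <= threshold:
                recurse(tree_.children_left[node], depth + 1)
            else:
                symptoms_present.append(name)
                recurse(tree_.children_right[node], depth + 1)
        else:
            # Leaf reached. The lines defining present_disease and symptoms_exp
            # are also missing. Assumed reconstruction: decode the leaf, then ask
            # the user to confirm each symptom associated with that disease.
            present_disease = print_disease(tree_.value[node])
            red_cols = reduced_data.columns
            row = reduced_data.loc[present_disease].values[0]
            symptoms_given = red_cols[row.nonzero()[0]]
            symptoms_exp = []
            for syms in list(symptoms_given):
                inp = ""
                while inp not in ("yes", "no"):
                    inp = input(f"Are you experiencing {syms}? (yes/no) : ")
                if inp == "yes":
                    symptoms_exp.append(syms)
            second_prediction = sec_predict(symptoms_exp)
            # print(second_prediction)
            calc_condition(symptoms_exp, num_days)
            if present_disease[0] == second_prediction[0]:
                strn = "You may have " + present_disease[0]
                readn(strn)
                print(strn)
                strn1 = description_list[present_disease[0]]
                print(strn1)
                readn(strn1)
            else:
                strn = "You may have " + present_disease[0] + " or " + second_prediction[0]
                print(strn)
                readn(strn)
                print(description_list[present_disease[0]])
                readn(description_list[present_disease[0]])
                print(description_list[second_prediction[0]])
                readn(description_list[second_prediction[0]])
            # print(description_list[present_disease[0]])
            precaution_list = precautionDictionary[present_disease[0]]
            print("Take following measures : ")
            readn("Take following measures : ")
            for i, j in enumerate(precaution_list):
                print(i + 1, ")", j)
                readn(str(i + 1) + j)
            # confidence_level = (1.0*len(symptoms_present))/len(symptoms_given)
            # print("confidence level is " + str(confidence_level))
    recurse(0, 1)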
getSeverityDict()
getDescription()
getprecautionDict()
getInfo()
tree_to_code(clf,cols)
print("----------------------------------------------------------------------------------------")
CHAPTER 8
OUTPUT
CHAPTER 8
REFERENCES
[1] L. Athota, V. K. Shukla, N. Pandey and A. Rana, "Chatbot for Healthcare System
Using Artificial Intelligence," 2020 8th International Conference on Reliability, Infocom
Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2020,
pp. 619-622, doi: 10.1109/ICRITO48877.2020.9197833.
https://ieeexplore.ieee.org/document/9197833
[2] https://www.researchgate.net/publication/372658029_Section_A-Research_paper_Personal_Healthcare_Chatbot_for_Medical_Suggestions_Using_Artificial_Intelligence_and_Machine_Learning_Eur
[3] Shetty, Riddhi and Bhosale, Ankita and Verma, Pankaj and Phalke, Ashwini, AI Based
Healthcare Chatbot (April 8, 2022). Proceedings of the 7th International Conference
on Innovations and Research in Technology and Engineering (ICIRTE-2022), organized by
VPPCOE & VA, Mumbai-22, INDIA, Available at SSRN: https://ssrn.com/abstract=4109100
or http://dx.doi.org/10.2139/ssrn.4109100
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4109100
[4] K. Anjum, M. Sameer and S. Kumar, "AI Enabled NLP based Text to Text Medical
Chatbot," 2023 3rd International Conference on Innovative Practices in Technology and
Management (ICIPTM), Uttar Pradesh, India, 2023, pp. 1-5, doi:
10.1109/ICIPTM57143.2023.10117966.
https://ieeexplore.ieee.org/document/10117966