B22IT031 Report
on
MACHINE LEARNING IN HEALTHCARE
Submitted to
In Partial fulfilment of the requirements
By
D. SATHWIKA
B22IT031
CERTIFICATE
ACKNOWLEDGEMENT
I wish to take this opportunity to express my deep gratitude to all the people who have
extended their cooperation in various ways during my Seminar. It is my pleasure to acknowledge
the help of all those individuals.
I thank Dr. K. Ashoka Reddy, Principal of Kakatiya Institute of Technology & Science,
Warangal, for his strong support.
In completing this Seminar successfully, all our faculty members extended excellent
cooperation by guiding me in every aspect. Their guidance helped me a great deal, and I am very
grateful to them.
D. SATHWIKA
B22IT031
ABSTRACT
CONTENTS
ABSTRACT
CONTENTS
LIST OF FIGURES
1 CHAPTER 1: INTRODUCTION
1.1 Introduction
2.1 Machine Learning Techniques for Chronic Kidney Disease Risk Prediction
2.2 MRI-based brain tumor detection using convolutional deep learning methods and
3 CHAPTER 3: METHODOLOGIES
6 FUTURE SCOPE
7 CONCLUSION
8 REFERENCES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
Machine learning is changing healthcare: its algorithms and statistical models analyze many
kinds of complex medical data and make predictions that offer actionable insight. By learning
patterns and relationships within data, it enables healthcare professionals to make informed
decisions, enhance patient care, and optimize clinical workflows.
1. Improved Diagnosis: Early and accurate detection of diseases (for example, cancer and
Alzheimer's) and analysis of medical images to identify abnormalities.
3. Personalized Medicine: Treatment planning from the genetics and history of the patient
CHAPTER 2: LITERATURE SURVEY
Title: MRI-based brain tumor detection using convolutional deep learning methods
Objective: The primary objective of the study was to enhance the detection and classification
of brain tumors using Magnetic Resonance Imaging (MRI) by leveraging convolutional deep learning
methods and traditional machine learning techniques. The authors aimed to develop a robust
computational approach for diagnosing three tumor types (glioma, meningioma, and pituitary gland
tumors) as well as distinguishing them from healthy brains. Recognizing the limitations of manual
biopsy, which requires invasive surgery, the study sought to create an automated, non-invasive
diagnostic tool with high accuracy and efficiency.
Methodology: The study utilized a dataset of 3264 T1-weighted contrast-enhanced MRI images,
categorized into glioma, meningioma, pituitary gland tumors, and healthy brains. Images were
preprocessed by resizing them to 80x80 pixels and augmenting the dataset through rotations and
vertical flipping, resulting in 9792 samples. Two deep learning models were designed: a 2D CNN
and a Convolutional Auto-Encoder Network. The 2D CNN featured a hierarchical structure with eight
convolutional layers, ...
Limitations and future scope: The study encountered several limitations. The dataset was
relatively small for training deep neural networks, with only 3264 images expanded to 9792
through augmentation, which may introduce noise and class imbalance. The MRI images lacked
diversity, as they were obtained from a single source and may not generalize well to broader
populations or other imaging modalities. Labeling issues were noted; annotating MRI data requires
expertise and is time-consuming, restricting the dataset size.
Title: Liver Disease Prediction using Machine Learning Classification Techniques
Objective: The research aims to develop a robust machine learning framework for predicting and
classifying liver diseases to reduce the workload on healthcare professionals and improve early
diagnosis accuracy. The study leverages multiple machine learning algorithms and ...
Methodology: Features such as age, bilirubin levels, albumin, and enzyme counts were normalized
to enhance uniformity. The Particle Swarm Optimization (PSO) feature selection technique was
employed to identify critical attributes that ...
Limitations and future scope: The dataset also presents feature correlations, such as between
Direct and Total Bilirubin, which might lead to biased model outcomes or overfitting. Moreover,
while machine learning models like Random Forest and MLP provide high ...
Title: Predicting Mental Health Illness using Machine Learning Algorithms
Objective: The primary objective of the study is to leverage machine learning techniques to
predict mental health issues and classify various mental health disorders efficiently. This
approach aims to address the global challenge of early detection of psychological issues such as
depression, anxiety, and other related disorders. By identifying potential mental health
conditions in their early stages, the study intends to enable effective intervention, improve the
quality of life, and reduce the social and economic burden of untreated mental illnesses.
Methodology: The study employed a structured data science workflow beginning with data
collection, utilizing a dataset with 27 attributes and 1,259 entries. Data preprocessing was
conducted to clean the dataset by addressing missing values, errors, and inconsistencies,
ensuring its readiness for analysis. This was followed by data encoding, particularly label
encoding, to convert categorical variables into numerical formats while retaining ordinal
relationships.
Limitations and future scope: This study's key limitations include the small dataset size, which
may restrict the generalizability of the results and the model's ability to perform well on
diverse populations. The dataset's reliance on structured attributes may fail to capture complex
mental health conditions that arise from unstructured data such as text, speech, or behavioral
patterns. Future research can address these limitations by incorporating ...
Title: Improved Prediction of Thyroid Diseases ...
Objective: ... evaluate predictive models that ... treatment trends for hypothyroidism patients.
By leveraging historical and real-time clinical data, the study aims to predict whether LT4
dosage should be increased, decreased, or maintained to enhance treatment precision and improve
patient outcomes. Using machine learning classifiers, the study compares the performance of ten
algorithms, such as Decision Trees, Extra Trees, and Neural Networks, on a dataset derived from
247 patients treated at the "AOU Federico II" Hospital in Naples.
Methodology: ... predict trends in sodium ... patients using machine learning approaches. It
begins with data collection from 247 patients treated at "AOU Federico II" Hospital in Naples,
incorporating demographic, clinical, and hormonal information into a dataset. The dataset is
preprocessed through cleaning, interpolation, normalization, and discretization to address
missing values and ensure consistency.
Limitations and future scope: ... the machine learning models ... with more advanced deep
learning approaches. To overcome these limitations, future work could expand the dataset to
include more diverse, multi-center populations and longitudinal data. Advanced techniques, such
as recurrent neural networks or transformers, could improve temporal analysis.
Title: Intelligent and Interactive Healthcare System (I²HS) Using Machine Learning
Objective: This system seeks to leverage predictive analytics for early disease detection,
personalized treatment recommendations, and real-time patient monitoring, ensuring timely
interventions and better health management. By integrating wearable devices, IoT technologies,
and conversational AI, the system will facilitate continuous patient engagement and support
clinicians with interpretable insights for diagnosis and treatment planning.
Methodology: ... encompassing data collection and preprocessing from diverse sources such as
EHRs, wearable devices, and imaging systems, ensuring data privacy and compliance with
regulations like HIPAA and GDPR. Machine learning models, including supervised, unsupervised, and
reinforcement learning. ...
Limitations and future scope: ... challenges with data quality and availability,
interoperability between different healthcare systems, and the security risks of data breaches
and cyberattacks. Machine learning models may also suffer from biases, and complex algorithms can
lack interpretability, which affects clinician trust. Additionally, high implementation costs,
regulatory hurdles, and difficulties with real-time data processing further hinder the system's
deployment.
Title: ... Deficiency Prediction by ...
Objective: ... as body mass index (BMI), waist-to-hip ratio, age, gender, and other physical
attributes. The study aims to develop a robust ML model that can predict the risk of Vitamin D
deficiency in individuals, enabling early detection and intervention. By identifying patterns
between anthropometric factors and Vitamin D levels, the research seeks to offer an affordable,
non-invasive approach to Vitamin D deficiency prediction.
Methodology: ... waist-to-hip ratio, age, and gender, along with corresponding serum Vitamin D
levels from clinical records. Data will be preprocessed by handling missing values, normalizing
features, and identifying outliers. Feature selection techniques ...
Limitations and future scope: ... lifestyle, and environmental exposure, adds another layer of
difficulty to prediction. Correlations between anthropometric features could lead to
multicollinearity issues, affecting the stability and interpretability of the models.
Title: Synergistic Analysis of Lung Cancer's Impact on Cardiovascular Disease using ML-based
Techniques
Objective: ... conduct a synergistic analysis of the impact of lung cancer on cardiovascular
disease (CVD) using machine learning (ML)-based techniques to identify ...
Methodology: ... dataset will be collected from clinical records, including patient
demographics, lung cancer stages, treatment regimens, ...
Limitations and future scope: ... datasets that include both lung cancer and cardiovascular
disease (CVD) information may be scarce, particularly for specific subpopulations. ...

Title: Prediction of Muscular Paralysis Disease ...
Objective: ... timely medical intervention and personalized treatment plans. Additionally, this
research seeks to explore the long-term effects of COVID-19 on muscle function, offering insights
into the underlying mechanisms contributing to paralysis. Ultimately, the research aims to
enhance the quality of life for COVID-19 and post-COVID-19 patients by providing predictive tools
for healthcare professionals to mitigate the risks of muscular paralysis and ensure better
management of these complications.
Methodology: ... assess their generalizability and robustness across different patient
demographics. Finally, the best-performing models will be integrated into a clinical decision
support system to help healthcare professionals detect and monitor muscular paralysis in COVID-19
and post-COVID-19 patients, ensuring timely interventions and improving patient outcomes.
Limitations and future scope: ... models into clinical decision support systems to help
healthcare professionals monitor, detect, and intervene early in cases of muscular paralysis,
ultimately improving patient outcomes, recovery times, and quality of life. As more data becomes
available and machine learning techniques continue to evolve, these tools could become integral
in managing the long-term effects of COVID-19 on muscle function.
Title: Machine Learning to Predict Pregnancy Outcomes: A Systematic Review, Synthesizing
Framework and Future Research Agenda
Objective: ... aims to identify key factors, such as maternal age, medical history, lifestyle,
genetic information, and clinical data, that contribute to the prediction of pregnancy-related
complications like preeclampsia, ...
Methodology: ... a rigorous search of relevant academic databases (e.g., PubMed, Scopus, IEEE
Xplore) will be conducted to identify studies published on ML models used for predicting
pregnancy ...
Limitations and future scope: ... interpretability of ML models, possibly through the use of
explainable AI techniques, which would help clinicians better understand how predictions are
made. Another important direction is the ...
CHAPTER 3: METHODOLOGIES
1. Supervised Learning
Definition: Supervised learning involves training a model on labeled data, in which both the
input features and the target labels (the outcomes) are known.
Techniques:
Classification: Used for applications such as disease diagnosis (e.g., predicting the
presence of cancer or classifying patients as having or lacking diabetes).
Algorithms: Linear Regression, Ridge and Lasso Regression, Decision Trees for
Regression.
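As an illustration of the classification setting described above, the following is a minimal sketch of logistic regression trained by batch gradient descent, written in plain Python. The biomarker readings and labels are made-up values used only for demonstration, and the function names `train_logistic` and `predict` are hypothetical; this is not a method taken from any of the surveyed papers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * (x - mean) + b) by batch gradient descent."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]      # centering keeps the updates stable
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(centered, ys):
            err = sigmoid(w * x + b) - y   # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean

def predict(model, x):
    w, b, mean = model
    return 1 if sigmoid(w * (x - mean) + b) >= 0.5 else 0

# Hypothetical, made-up readings of a single biomarker; 1 = disease present.
readings = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(readings, labels)
print([predict(model, x) for x in (2.0, 8.0)])
```

The labels supply the "known outcomes" from the definition: the model only adjusts `w` and `b` to reduce its error on them.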
Applications in Healthcare:
2. Unsupervised Learning
Definition: Unsupervised learning techniques are used when the data has no output labels; the
aim is to find hidden patterns or structure in the data.
Techniques:
Clustering: Groups patients or data points into clusters based on similarity (for example,
grouping patients with similar symptoms or risk factors).
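The clustering idea can be sketched with a small k-means implementation in plain Python. The two-dimensional points stand in for hypothetical, made-up patient measurements, and the initial centroids are passed in by the caller so that the run is deterministic; both choices are assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:     # converged: nothing moved
            break
        centroids = new_centroids
    return assignments, centroids

# Hypothetical measurements for six patients (two features each).
patients = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assignments, centroids = kmeans(patients, centroids=[patients[0], patients[3]])
print(assignments)
```

No labels are used at any point; the grouping emerges only from distances between the points.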
3. Semi-Supervised Learning
Definition: A learning paradigm that combines a small amount of labeled data with a large set
of unlabeled data, thereby improving accuracy when labeled data is limited. It draws on the
strengths of both supervised and unsupervised learning and is mainly used where obtaining
labeled data is expensive or time-consuming.
Applications in Healthcare:
Labeling a small number of medical images and propagating those labels to a larger, unlabeled
collection.
Predicting patient outcomes where only partial or noisy labels are available.
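One simple way to spread a few labels over an unlabeled collection is nearest-neighbour self-training, sketched below in plain Python. Each item is reduced to a single hypothetical feature score, and the labels "benign" and "malignant" as well as the function name `propagate_labels` are assumptions chosen for illustration; practical systems would use richer features and a confidence threshold.

```python
def propagate_labels(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbour rule.

    labeled:   list of (feature_value, label) pairs
    unlabeled: list of feature values without labels
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled item that lies closest to any already-labeled item.
        _, idx, label = min(
            ((abs(x - lx), i, lab)
             for i, x in enumerate(remaining)
             for lx, lab in labeled),
            key=lambda t: t[0],
        )
        # Adopt the neighbour's label and treat the item as labeled from now on.
        labeled.append((remaining.pop(idx), label))
    return labeled

# Two expert-labeled items and six unlabeled ones (made-up scores).
seed = [(0.0, "benign"), (10.0, "malignant")]
result = dict(propagate_labels(seed, [1.0, 2.0, 3.0, 9.0, 8.0, 6.5]))
print(result)
```

Because the most confident (closest) items are labeled first, the score 6.5 is assigned only after 8.0 has been labeled and then follows that neighbour.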
4. Reinforcement Learning
Definition: Reinforcement learning is learning through trial and error: an agent interacts with
an environment in order to maximize a cumulative reward.
Techniques:
Applications in Healthcare:
Robotic surgery: Teaching robots to optimize surgical sequences based on their past
performance.
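The trial-and-error idea behind reinforcement learning can be illustrated with tabular Q-learning on a toy problem. The four-state chain environment, the reward of 1.0 for reaching the last state, and all parameter values below are assumptions made purely for illustration; they do not model a real clinical task.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _step in range(max_steps):
            action = rng.choice(ACTIONS)   # purely random exploration
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate towards reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == N_STATES - 1:
                break
    return q

q_table = q_learning()
# Greedy policy for the non-terminal states: choose the action with the larger Q-value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it infers this from the rewards it collects while exploring.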
5. Deep Learning
Definition: A subset of machine learning focusing on neural networks that possess multiple
layers, i.e., deep networks. It excels at processing large, unstructured datasets such as
images, text, and time-series data.
Techniques:
Convolutional Neural Networks (CNNs): Used primarily in processing image data, such
as medical imaging (MRI, CT scans) to identify and classify diseases.
Recurrent Neural Networks (RNNs): Applied to sequential data, such as patient monitoring data
or clinical time-series data, for example ECG.
Autoencoders: Applied to anomaly detection and feature learning from high-dimensional
data, for instance identifying abnormal patterns in medical records.
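To make the core operation of a CNN concrete, the sketch below applies a single convolution filter followed by a ReLU activation to a tiny image, in plain Python. The 5x5 image and the vertical-edge kernel are made-up values chosen for illustration; a real network learns its kernels from data and stacks many such layers.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the operation computed by a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise ReLU activation: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Toy 5x5 "scan": dark region on the left, bright region on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
feature_map = relu(conv2d(image, vertical_edge))
print(feature_map)
```

The feature map is zero over the uniform dark region and positive where the window overlaps the edge, which is how a filter localizes a pattern in an image.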
Applications in Healthcare
Medical Imaging: Detecting tumors, fractures, or other issues from X-rays, MRIs, and
CT scans by using CNNs.
Natural Language Processing (NLP): Applied to the processing of clinical notes or medical
literature, from which pertinent information can be extracted or used to support diagnoses
derived from text.
6. Transfer Learning
Definition: Transfer learning takes a pre-trained model and fine-tunes it on a new but related
dataset, saving valuable time and computational resources.
Techniques:
Using deep learning models pre-trained on large general datasets such as ImageNet and then
adapting them for specific healthcare applications, for example, detecting disease in medical
images.
Applications in Healthcare:
Medical Imaging: Using models trained on general image datasets and fine-tuning them
for specific medical conditions, such as tumor detection in mammograms.
Text Classification: Adapting NLP models trained on massive text corpora for use
with the specific medical terminology in electronic health records.
7. Ensemble Learning
Definition: Ensemble methods combine multiple models for improved accuracy and robustness
of predictions.
Techniques:
Boosting: Improves weak models through iteratively adding corrections that gradually
improve accuracy. Examples: Gradient Boosting, AdaBoost.
Stacking: Blends predictions from many models using another model to make
the actual prediction.
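The boosting idea can be sketched with a compact AdaBoost implementation that uses one-dimensional decision stumps as weak learners. The ten data points are toy values chosen so that no single stump classifies them all correctly; the function names are hypothetical and the code assumes the inputs are sorted and that each stump's weighted error lies strictly between 0 and 1.

```python
import math

def stump_predict(x, threshold, polarity):
    # polarity=+1 predicts +1 left of the threshold and -1 right of it; -1 flips this.
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]   # assumes xs is sorted
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        alpha = 0.5 * math.log((1 - error) / error)   # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: labels alternate in blocks, so one threshold cannot separate them.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([ensemble_predict(model, x) for x in xs])
```

Each round focuses on the points the previous stumps got wrong, which is the "iteratively adding corrections" behaviour described above.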
Healthcare Applications:
8. Federated Learning
Techniques:
Federated Averaging: FedAvg is a method that lets the different instances update their
local models independently, with only the model weights being shared. The raw data itself
is never transferred.
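The aggregation step of federated averaging can be sketched as a sample-size-weighted mean of the clients' weight vectors. The two "hospitals", their weight values, and their record counts below are made-up numbers, and the function name `federated_average` is hypothetical; local training and communication are omitted.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical hospitals share only their locally trained weights.
hospital_a = [1.0, 2.0]   # trained on 1 local record (made-up)
hospital_b = [3.0, 4.0]   # trained on 3 local records (made-up)
global_weights = federated_average([hospital_a, hospital_b], [1, 3])
print(global_weights)
```

Only the weight vectors and record counts reach the server, so the patient records stay at each site.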
Applications in Healthcare:
2. Predictive Care: Machine learning predicts the rates of rehospitalization for patients,
enhancing care management.
4. Health Support: ML-based chatbots and virtual assistants provide health information
and guidance to users.
5. Data privacy and fairness: Maintaining the privacy of patient data and ensuring fair use of
ML technology.
CHAPTER 5: COMPARATIVE EVALUATION
1. Evaluation Metrics
Several evaluation metrics are used to compare machine learning models. The choice depends on
the type of problem (classification, regression, etc.) and the purpose of the particular task
at hand. Typical evaluation metrics include:
Accuracy: The number of correct classifications divided by the total number of instances.
Precision, Recall, and F1-score: These are especially helpful for imbalanced datasets. Precision
is the fraction of predicted positives that are true positives; recall is the fraction of all
actual positives that are correctly identified; the F1-score is the harmonic mean of precision
and recall.
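The metrics above can be computed directly from the counts of true and false positives and negatives. The sketch below uses a made-up, imbalanced set of 100 cases (5 positive, 95 negative) to show how accuracy can look high while recall stays low; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up imbalanced example: 5 patients with the disease, 95 without.
y_true = [1] * 5 + [0] * 95
# The model finds only one of the five positives and raises one false alarm.
y_pred = [1, 0, 0, 0, 0] + [1] + [0] * 94
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the accuracy is 0.95 even though four of the five diseased patients are missed, which is why precision, recall, and F1 are reported alongside accuracy.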
2. Model Comparison Criteria
Accuracy:
This measures the overall correctness of the model in making predictions. It is calculated as
the ratio of correctly predicted instances to the total instances. In healthcare, accuracy is
crucial, but it may not be sufficient for imbalanced datasets.
Computational Efficiency:
How much time and computational resources does the model require for training and
prediction? Some models, for example deep learning-based models, may need much
more computation than a simpler model such as logistic regression.
Dataset framework
FUTURE SCOPE
1. Explainable AI (XAI)
Current challenge: Most machine learning and deep learning models appear to be "black
boxes": their decision-making process is hard to interpret.
Future scope: The development of Explainable AI (XAI) aims to make machine learning
models more transparent and interpretable, allowing users to understand how and why a model
arrived at a particular decision. This is critical for applications in sensitive areas such as
healthcare, finance, and legal systems, where model transparency is necessary for trust and
accountability.
2. Federated Learning
Current challenge: There is significant concern over sharing medical, financial, or personal
data, which hinders the development of global machine learning systems.
Future scope: With Federated Learning, models can be trained on devices or servers holding
local data in decentralized locations without transferring the data itself. In this way,
machine learning can be done collaboratively while preserving the privacy and security of
sensitive information.
Personalized Medicine: ML algorithms will analyze genetic data, medical history, and
lifestyle to create tailored treatment plans.
Early Diagnosis and Predictive Analytics: ML can help detect diseases like cancer,
Alzheimer's, and heart disease earlier by identifying patterns in medical data such as imaging,
lab results, and patient history.
Drug Discovery: ML can speed up discovering and testing of drugs, saving time and money
by predicting how the drugs will react with the body and disease processes.
Algorithmic Trading: ML algorithms will become even more important to automate as well
as optimize trading decisions by identifying patterns in the market and forecasting stock trends.
Fraud Detection: ML systems will become more competent in detecting fraud by analyzing
transaction patterns, customer behavior, and other external data sources.
Climate Modeling: ML will improve the prediction of climate change by analyzing enormous
amounts of environmental data.
Smart Grids: AI will enable better management of the delivery of electricity and minimize waste.
6. Education
Automated Grading: Through ML, the grading of assignments, essays, and exams will be
automated, allowing for easier feedback on a broader scale.
CONCLUSION
REFERENCES
[1] American Brain Tumor Association (ABTA), “Brain tumor statistics,” accessed Aug. 12,
2017. http://www.abta.org/about-us/news/braintumor-statistics/
[2] World Health Organization, Mental health: a call for action by world health ministers.
Geneva: World Health Organization, Department of Mental Health and Substance Dependence, 2001.
http://www.foxnews.com/health/2012/02/10/hypothyroidismversuhyperthyroidism.html
(accessed Dec. 2015)
[3] Trends in maternal mortality 2000 to 2017: estimates by WHO, UNICEF, UNFPA, World Bank Group
and the United Nations Population Division.
https://www.unfpa.org/featured-publication/trends-maternal-mortality2000-2017. Accessed 10 Jan 2021.
[6] 2022 IEEE VLSI Device Circuit and System (VLSI DCS), 2022
[7] https://produccioncientifica.usal.es/documentos/6223bbf25af2aa3bfdb8679f?lang=en
[8] Choi, E., et al. (2016). Doctor AI: Predicting clinical events via recurrent neural networks.
Proceedings of the 2016 International Conference on Health Informatics, 301-307.
[9] https://pubmed.ncbi.nlm.nih.gov/38349828/