1 Department of Computer Science, Kent State University, Kent, OH 44242, USA; dbhati@kent.edu
2 Rutgers Business School, Rutgers University, Newark, NJ 07102, USA; ds1640@scarletmail.rutgers.edu
3 Department of Computer Science, West Chester University, West Chester, PA 19383, USA;
mamiruzzaman@wcupa.edu
* Correspondence: neha@kent.edu
Abstract: ChatGPT, developed by OpenAI, is a large language model (LLM) that leverages artificial
intelligence (AI) and deep learning (DL) to generate human-like responses. This paper provides a
broad, systematic review of ChatGPT’s applications in healthcare, particularly in enhancing patient
engagement through medical history collection, symptom assessment, and decision support for
improved diagnostic accuracy. It assesses ChatGPT’s potential across multiple organ systems and
specialties, highlighting its value in clinical, educational, and administrative contexts. This analy-
sis reveals both the benefits and limitations of ChatGPT, including health literacy promotion and
support for clinical decision-making, alongside challenges such as the risk of inaccuracies, ethical
considerations around informed consent, and regulatory hurdles. A quantified summary of key
findings shows ChatGPT’s promise in various applications while underscoring the risks associated
with its integration in medical practice. Through this comprehensive approach, this review aims to
provide healthcare professionals, researchers, and policymakers with a balanced view of ChatGPT’s
potential and limitations, emphasizing the need for ongoing updates to keep pace with evolving
medical knowledge.
Keywords: ChatGPT; large language models (LLM); artificial intelligence (AI); deep learning; digital
health; healthcare; health literacy; medicine
1. Introduction
Artificial Intelligence (AI) has transformed various sectors, including finance, education, transportation, manufacturing, retail, agriculture, entertainment, telecommunications, and cybersecurity. Among these, healthcare is one of the most significantly impacted areas [1]. In healthcare, AI enhances diagnostic accuracy, streamlines administration, and improves patient care by analyzing large amounts of data using machine learning (ML), natural language processing (NLP), and deep learning (DL) [2]. Applications range from early disease detection to automated patient management, significantly improving outcomes and reducing costs. However, despite these advancements, the healthcare sector continues to face persistent challenges such as limited accessibility to health information, shortages of healthcare personnel, and the need for enhanced patient engagement.

A key driver of AI’s impact in healthcare is the emergence of Large Language Models (LLMs), which are essential for NLP tasks [3]. These models mimic human language processing using neural networks trained on extensive text datasets and excel in tasks such as machine translation, text generation, and summarization. Integrating LLMs into healthcare enables professionals to efficiently process vast amounts of medical literature, make informed decisions, and improve communication with patients, building upon AI’s foundational contributions to the field. Moreover, LLMs can assist in mitigating healthcare personnel shortages by automating routine tasks and providing decision support, allowing healthcare professionals to focus on more critical aspects of patient care.
One notable LLM is the Generative Pre-trained Transformer (GPT), particularly Chat-
GPT, which has shown impressive results in healthcare-specific evaluations, including
medical exams and datasets like MedMCQA and PubMedQA [4]. This highlights the grow-
ing potential of conversational models in healthcare, where they can assist with patient
communication and decision support. For instance, ChatGPT can provide patients with
accessible health information, answer common medical queries, and offer preliminary
guidance, enhancing patient engagement and empowerment. In clinical settings, it can
aid clinicians by summarizing patient records, suggesting possible diagnoses, and offering
evidence-based recommendations, thus improving clinical decision-making.
ChatGPT stands out for its capacity to continuously learn and improve through inter-
actions, delivering increasingly accurate and context-aware responses [5]. This adaptability
makes it a valuable tool in healthcare, supporting tasks such as answering medical ques-
tions, resolving technical issues, and automating administrative functions. Additionally, its
scalability enhances healthcare operations, streamlining patient triaging and information
management, vital for improving overall healthcare delivery. In educational contexts,
ChatGPT can serve as a resource for medical students and professionals seeking to stay
updated with the latest research, supporting ongoing education and knowledge dissemina-
tion. In administrative applications, it can automate scheduling, handle billing inquiries,
and manage patient records, alleviating the burden on administrative staff and addressing
personnel shortages.
Despite its promise, challenges remain, particularly concerning real-world applicabil-
ity and the ethical implications of AI-driven decision-making in medicine. Ethical issues
such as patient privacy, data security, algorithmic bias, and the potential for misdiagnosis
must be carefully considered. There is also a need to establish clear regulatory frameworks
and guidelines to ensure the safe and effective integration of ChatGPT into healthcare practices.
Based on an extensive review of existing research on ChatGPT’s transformative role in
healthcare, our study offers several key contributions, each providing quantified insights
where applicable to strengthen the understanding of ChatGPT’s impact:
1. Comprehensive Background: We deliver an in-depth overview of NLP, LLMs, GPT
architecture, and ChatGPT, detailing their evolution and foundational technologies.
2. Clinical Relevance: Our analysis demonstrates how ChatGPT is reshaping patient
care, administrative workflows, and medical research. Studies have shown that
ChatGPT can reduce administrative workloads, freeing healthcare professionals to
focus more on direct patient care.
3. ChatGPT Applications Across Organ Systems: We systematically review ChatGPT’s
effectiveness across various medical specialties, including its roles in diagnostics,
treatment recommendations, patient education, and clinician-patient communication.
In fields such as dermatology and nephrology, ChatGPT has shown promising prelim-
inary accuracy in providing educational information and supporting patient self-care.
4. Risk Analysis: Our risk analysis addresses the reliability, accuracy, and ethical con-
cerns surrounding ChatGPT’s use in healthcare. We evaluate these challenges through
current methodologies, emphasizing the importance of mitigating misinformation
and ensuring patient safety.
5. Future Directions: Identifying research gaps, we propose a taxonomy categorizing
the literature on ChatGPT applications, facilitating a structured understanding of its
diverse healthcare roles and highlighting areas for future investigation.
The paper is structured as follows: Section 2 outlines the research methodology.
Section 3 provides background on NLP, LLMs, GPT, and ChatGPT. Section 4 discusses Chat-
GPT’s role in empowering patients. Section 5 reviews its applications across various organ
systems. Section 6 discusses potential risks of using ChatGPT in healthcare. Section 7 offers
a general discussion on ChatGPT in healthcare. Section 8 highlights the study’s limitations,
Section 9 suggests future research directions, and Section 10 concludes the paper.
2. Research Methodology
To explore the applications of ChatGPT in healthcare, we employed a systematic review methodology, following the established Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol to ensure a thorough and unbiased analysis, as shown in Figure 1. The review involved the following key steps:
Research Objectives
The primary objectives of this review were:
• RO1: To provide an overview of ChatGPT, detailing its functionalities and rationale
for integration into healthcare systems.
• RO2: To explore the scope of ChatGPT in healthcare, focusing on its assistance with
routine tasks like patient management.
• RO3: To examine how ChatGPT is utilized in managing various organ-related diseases, analyz-
ing its effectiveness in diagnostics, treatment recommendations, and patient education.
• RO4: To identify the significant applications and limitations of ChatGPT, examining
its impact on patient care, administrative efficiency, and potential inaccuracies in
record handling.
This study seeks to clarify the role of ChatGPT in enhancing healthcare delivery and
provide a robust framework for understanding the implications of integrating AI tools into
healthcare practices, particularly in the organ-specific context.
3. Background
In this section, we will cover key concepts related to NLP, Transformer architecture,
LLMs, enhancements through bidirectional language representation, and the development
of ChatGPT.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
In the formula, the superscript T denotes the transpose of the matrix K. In the context of the attention mechanism in transformers, taking the transpose of the keys matrix K is necessary for the matrix multiplication with the queries matrix Q. This operation aligns the dimensions appropriately, allowing each query to be compared against all keys, which is required for calculating the attention scores across the input sequence.
The range of the attention scores in a transformer model is from 0 to 1. These scores are
derived through the softmax function, which normalizes the computed scores so that
they add up to 1 across each position in the input sequence for a given output position.
A high attention score between a particular query (representing an output position)
and a key (an input position) indicates that the input at that position is very relevant
for producing the corresponding output. In other words, the transformer “focuses”
more on that part of the input sequence.
The model does assign scores to all parts of the input sequence, but not all parts are
weighed equally. The final output at each position is a weighted sum of all values V,
where the weights are the attention scores. This allows the transformer to dynamically
focus on different parts of the input sequence depending on the context required by
each output element. This mechanism allows transformers to focus on relevant parts
of the input sequence, enhancing performance across various NLP tasks.
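For readers who prefer code to notation, the following minimal NumPy sketch implements the scaled dot-product attention described above for a single head, with toy shapes and no masking; it is an illustration of the formula rather than an optimized implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the (n_queries, d_v) outputs and (n_queries, n_keys) weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # raw compatibility scores
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights

# Toy example: 2 output positions attending over 3 input positions.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(w.sum(axis=-1))  # each row of weights sums to 1, as described above
```

Each row of `w` is the distribution over input positions for one output position, so the output is exactly the weighted sum of values described in the text.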
Large Language Models (LLMs), such as Bidirectional Encoder Representations from
Transformers (BERT) and Generative Pre-trained Transformer (GPT), are built upon the
transformer architecture and leverage massive datasets to learn language patterns, struc-
tures, and semantics [13–15]. These models have demonstrated remarkable capabilities in
generating coherent text, answering questions, and performing language translations.
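These generative capabilities are straightforward to demonstrate with an open checkpoint. The brief sketch below is a minimal illustration using the Hugging Face transformers pipeline with the publicly available GPT-2 model; the prompt is an invented example, and the generated continuation is unfiltered model output, not curated medical content.

```python
from transformers import pipeline

# Load a small, publicly available GPT-style model (illustrative only;
# this is a general-domain checkpoint, not a medically tuned system).
generator = pipeline("text-generation", model="gpt2")

prompt = "Large language models can support clinicians by"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```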
BERT, in particular, enhances contextual understanding by processing text in both directions, left-to-right and right-to-left [17,18]. This feature improves the model’s understanding of polysemous words, contextual nuances, and intricate relationships between words in a sentence.
The core components of BERT’s architecture include:
• Multi-Head Self-Attention: BERT utilizes multiple attention heads to capture various
contextual meanings of words simultaneously. Each head learns to focus on different
parts of the input sequence, allowing the model to understand complex dependencies.
• Masked Language Modeling (MLM): During pre-training, BERT randomly masks a
percentage of input tokens and trains the model to predict these masked tokens based
on their context. This approach enables the model to learn a rich understanding of
language structures.
• Next Sentence Prediction (NSP): BERT is also trained with a next sentence prediction
objective, where it learns to predict whether a given sentence follows another in a text.
This helps the model understand sentence relationships, which is crucial for tasks like
question answering.
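To make the masked language modeling objective above concrete, the short sketch below queries a general-domain BERT checkpoint through the Hugging Face fill-mask pipeline; the example sentence is hypothetical, and the checkpoint is not medically fine-tuned, so its top predictions illustrate the mechanism rather than clinical knowledge.

```python
from transformers import pipeline

# BERT's MLM head predicts the [MASK] token from bidirectional context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The patient was prescribed [MASK] for the infection."):
    # Each prediction carries the candidate token and its probability.
    print(f'{pred["token_str"]:>15}  score={pred["score"]:.3f}')
```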
Influential medical LLMs, such as PubMedBERT [19], ClinicalBERT [20], and BioBERT [21],
leverage BERT’s architecture, fine-tuning it on medical datasets to achieve state-of-the-art
performance across various medical Natural Language Processing (NLP) tasks.
Incorporating existing medical knowledge bases, such as the Unified Medical Lan-
guage System (UMLS) [22], into language models further enhances their capabilities. The
integration of domain-specific terminologies and ontologies helps the model understand
medical jargon and relationships more effectively.
Moreover, studies have shown that pre-training LLMs on diverse datasets, even those
not directly related to healthcare, yields improved performance on medical NLP tasks
compared to training solely on domain-specific datasets [23]. This approach highlights the
importance of comprehensive data exposure, as it enables models to generalize better and
understand a wider range of language variations.
3.4. ChatGPT
ChatGPT is a conversational AI model developed by OpenAI, based on the GPT
architecture [13]. Its evolution traces back to the initial GPT model introduced in 2018,
followed by iterations such as GPT-2 and GPT-3 [24,25]. Each version exhibited exponential
growth in parameters and training data, enhancing language understanding and generation
capabilities. The architecture is designed to predict the next word in a sentence, leveraging
extensive training on diverse internet text.
The evolution of ChatGPT can be summarized as follows:
• GPT: The initial model applied the transformer architecture to generative pre-training, outperforming previous RNN-based models [26].
• GPT-2: Increased parameters from 117 million to 1.5 billion, showcasing the ability
to generate coherent text; its release was initially withheld due to concerns about
misuse [27].
• GPT-3: Expanded to 175 billion parameters, significantly enhancing language gen-
eration capabilities and gaining attention for its versatility across various tasks with
minimal fine-tuning [28].
ChatGPT itself was released in late 2022, specifically fine-tuned for conver-
sational tasks [28]. OpenAI later introduced subscription models like ChatGPT Plus to
provide users with access to more powerful versions while continuously improving the
model’s accuracy and reducing biases.
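For orientation, the minimal sketch below shows how an application might consume such a conversational model through OpenAI’s Python client; the model name, system prompt, and question are illustrative assumptions rather than a recommended clinical configuration, and any real deployment would need the safeguards discussed later in this paper.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical patient-education query; the reply is informational only
# and is not a substitute for advice from a clinician.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a health-education assistant. Always advise "
                    "users to consult a clinician for medical decisions."},
        {"role": "user",
         "content": "In plain language, what does an HbA1c test measure?"},
    ],
)
print(response.choices[0].message.content)
```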
To leverage these benefits, it is crucial to explore how ChatGPT can be deployed across various
healthcare settings and to identify potential infrastructure and resource challenges that may
arise. This section explores key applications of ChatGPT in healthcare, including patient
education and support, clinical monitoring, information access, administrative tasks, health
promotion, research, and emergency response. Figure 2 gives a snippet of the areas covered
in this section.
ChatGPT assists patients in monitoring their conditions, sending reminders for check-ups and
medications, and managing their daily health routines effectively.
Nonetheless, integrating ChatGPT into clinical workflows presents infrastructure and
resource challenges. Compatibility with existing Electronic Health Record (EHR) systems
is essential but may require significant IT investment and technical expertise. Ensuring
data privacy and compliance with regulations like Health Insurance Portability and Ac-
countability Act (HIPAA) adds complexity, necessitating secure servers and encryption
protocols. Smaller healthcare facilities often struggle with these requirements due to limited
budgets and technical staff. Overcoming these hurdles involves strategic planning, poten-
tial partnerships with technology providers, and seeking funding or grants dedicated to
healthcare innovation.
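As one illustration of the kind of safeguard such compliance requires, the toy sketch below redacts obvious identifiers from a clinical note before it leaves a secure environment. The patterns and the note are invented for illustration; a HIPAA-compliant pipeline would rely on a validated de-identification service, not ad hoc regular expressions.

```python
import re

# Rough, illustrative identifier patterns; real de-identification covers
# many more identifier classes (names, dates, addresses, and so on).
PHI_PATTERNS = {
    "[SSN]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "[MRN]":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(note: str) -> str:
    """Replace obvious identifiers with placeholder tags."""
    for tag, pattern in PHI_PATTERNS.items():
        note = pattern.sub(tag, note)
    return note

note = "Pt (MRN: 448812, phone 555-867-5309) reports chest pain since Tuesday."
print(scrub(note))  # identifiers are replaced before any external API call
```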
5.1. Kidney
The kidney is a vital organ in the human body, responsible for filtering blood to remove
waste products, excess fluids, and toxins [39–41]. It plays a crucial role in maintaining
overall health by regulating fluid balance, electrolytes (such as sodium and potassium), and
blood pressure. Given the complexity of kidney function and its central role in maintaining
health, advancements in AI, such as ChatGPT, are increasingly being explored to support
kidney disease diagnosis and management. Table 2 summarizes the recent studies which
highlight the integration of ChatGPT into the fields of kidney cancer and nephrology.
5.2. Pharynx
The pharynx, a muscular tube that connects the nose and mouth to the esophagus and
larynx, plays a crucial role in swallowing, breathing, and vocalization [49]. Disorders of the
pharynx and related structures, such as the larynx, are complex and often require precise
diagnosis and treatment strategies.
Lechien et al. evaluated ChatGPT’s performance in managing laryngology and head and neck cases [50]. The study found that ChatGPT achieved 90.0% accuracy in differential diagnoses and 60.0–68.0% accuracy in treatment options. However, ChatGPT tended to over-recommend tests and missed some important examinations. The findings suggest that
while ChatGPT can serve as a promising adjunctive tool in laryngology and head and neck
practice, there is a need for refinement in its recommendations for additional examinations.
5.3. Heart
The cardiovascular system, responsible for circulating blood and delivering oxygen and
nutrients to the body, is critical for maintaining overall health [51]. As cardiovascular
diseases remain a leading cause of mortality worldwide, improving patient care and
education in this area is essential.
Table 3 summarizes the recent studies which highlight the integration of ChatGPT
into the field of cardiovascular health advice, highlighting various studies that evaluate its
capabilities, limitations, and implications for patient education and care.
Study | Focus | Method | Key Findings | Conclusions
Anaya et al. (2024) [53] | Heart failure education | Readability evaluation | Answers longer but readable; low actionability score. | AI chatbots can enhance patient education; further research needed.
King et al. (2024) [54] | Heart failure question answers | Knowledge evaluation of GPT-3.5 and GPT-4 | GPT-4 showed 100.0% accuracy; GPT-3.5 had >94.0% accuracy. | ChatGPT could be a valuable educational resource for patients.
5.4. Brain
The brain, as the central organ of the nervous system, plays a crucial role in controlling
bodily functions, processing information, and facilitating cognitive processes such as
memory, learning, and emotional regulation [57]. Given the complexity of neurological
disorders and the increasing prevalence of brain-related health issues, enhancing our
understanding and management of brain health is essential. Keeping this in mind, Table 4
summarizes the recent studies which highlight the integration of ChatGPT into the field of
brain health.
Study | Focus | Method | Key Findings | Conclusions
Adesso (2023) [59] | Brain-related discovery | Method for theory evaluation | Demonstrated ChatGPT’s ability to benchmark physical theories through a gamified environment; promotes AI-human collaboration. | Highlights the importance of effective AI integration in research.
5.5. Thyroid
The thyroid, a butterfly-shaped gland located in the neck, plays a crucial role in regulating
metabolism, energy levels, and overall hormonal balance [61]. Given its importance in
various physiological processes, recent studies are highlighting the integration of ChatGPT
into the field of thyroid health. Table 5 provides a summary of these studies.
Study | Focus | Method | Key Findings | Conclusions
Stevenson et al. (2024) [64] | Thyroid function test interpretation | Comparison with practicing biochemists | ChatGPT and Google Bard interpreted only 33.3% and 20.0% of tests correctly, respectively. | Safety concerns highlighted; AI cannot replace human consultations for test interpretation.
Helvaci et al. (2024) [65] | Thyroid cancer information | Accuracy and reliability assessment | Moderately accurate (76.7%) for general information; effective in offering emotional support. | Useful for general inquiries but insufficient for specific case management.
Cazzato et al. conducted a review to evaluate the potential of ChatGPT in the field
of pathology, analyzing five relevant publications out of an initial 103 records [66]. The
findings indicated that while ChatGPT holds promise for assisting pathologists by provid-
ing substantial amounts of scientific data, it also faces significant limitations, including
outdated training data and the occurrence of hallucinations. The review featured a query
session addressing various pathologies, emphasizing that ChatGPT can aid the diagnostic
process but should not be relied upon for clinical decision-making. Overall, the study
concluded that ChatGPT’s role in pathology is primarily supportive, necessitating further
advancements to overcome its current challenges.
5.6. Liver
The liver, an organ responsible for numerous functions including detoxification, metabolism,
and production of essential proteins, plays a crucial role in maintaining overall health [67].
Due to its significance in various diseases, many research studies are exploring applications of ChatGPT to enhance the understanding and management of liver-related conditions.
In related works, Yeo et al. assessed ChatGPT’s accuracy and reproducibility in
answering questions about cirrhosis and hepatocellular carcinoma (HCC) management [68].
The study reported high overall accuracy, with ChatGPT scoring 79.1% for cirrhosis and
74.0% for HCC. However, comprehensive responses were limited, particularly in areas of
diagnosis and regional guidelines. Despite these limitations, ChatGPT provided practical
advice for patients and caregivers, suggesting its potential as an adjunct informational tool
to improve patient outcomes in cirrhosis and HCC management.
In another study, Yeo et al. compared the capabilities of ChatGPT and GPT-4 in
responding to cirrhosis-related questions across multiple languages, including English,
Korean, Mandarin, and Spanish [69]. The results indicated that GPT-4 significantly out-
performed ChatGPT in both accuracy and comprehensiveness, especially in non-English
responses, with notable improvements in Mandarin and Korean. This underscores GPT-4’s
potential to enhance patient care by addressing language barriers and promoting equitable
health literacy globally.
Table 6. Related Studies on ChatGPT in Gastrointestinal Pathology and Large Intestine Management.
Study | Focus | Method | Key Findings | Conclusions
Cankurtaran et al. (2023) [71] | Inflammatory bowel disease | Performance evaluation for healthcare professionals and patients | Professional-directed responses scored higher in reliability and usefulness than patient-directed ones. | Highlights potential as an informative tool; information quality needs improvement for both patients and professionals.
Ma (2023) [72] | Gastrointestinal pathology | Assessment of applications in digital pathology | Benefits in summarizing charts, education, and research; limitations include biases and inaccuracies from training datasets. | Emphasizes enhancement of human expertise in healthcare quality rather than replacement.
5.8. Pancreas
The pancreas, an essential gland located behind the stomach, plays a vital role in digestion
and glucose regulation by producing digestive enzymes and hormones such as insulin [74].
Recognizing its significance in metabolic and digestive disorders, ChatGPT is being applied
to improve the understanding and management of pancreatic health.
In one of the related works, Du et al. assessed the performance of ChatGPT-3.5 and
ChatGPT-4.0 in answering questions related to acute pancreatitis (AP) [75]. The study
found that ChatGPT-4.0 achieved a higher accuracy rate than ChatGPT-3.5, answering
94.0% of subjective questions correctly compared to 80%. It also performed better on
objective questions, with an accuracy of 78.1% versus 68.5%, with a statistically significant
difference (p = 0.01). The concordance rates between the two versions were reported as
80.8% for ChatGPT-3.5 and 83.6% for ChatGPT-4.0. Both models excelled particularly
in the etiology category, highlighting their potential utility in improving awareness and
understanding of acute pancreatitis.
Qiu et al. evaluated the accuracy of ChatGPT-3.5 in answering clinical questions
based on the 2019 guidelines for severe acute pancreatitis [76]. The results indicated that
ChatGPT-3.5 was more accurate when responding in English (71.0%) compared to Chinese
(59.0%), although the difference was not statistically significant (p = 0.203). Furthermore,
the model performed better on short-answer questions (76%) compared to true/false
questions (60.0%) (p = 0.405). While ChatGPT-3.5 shows potential value for clinicians
managing severe acute pancreatitis, the study suggests it should not be overly relied upon
for clinical decision-making.
5.9. Bladder
The bladder, a hollow muscular organ, plays a crucial role in storing and expelling urine,
which is essential for maintaining fluid balance and overall health [77]. Table 7 summarizes
recent studies which have explored the integration of ChatGPT into the field of bladder health.
Study | Focus | Method | Key Findings | Conclusions
Braga et al. (2024) [79] | Urological diagnoses (including bladder cancer) | Assessment of ChatGPT’s responses to specific urological conditions | ChatGPT provided partially correct answers, but critical details were missing for certain conditions. | Caution is needed when using AI for clinical decision-making due to incomplete responses.
Cakir et al. (2024) [83] | Urolithiasis information | Comparative analysis of FAQs and EAU guidelines | ChatGPT correctly answered 94.6% of FAQs with no completely incorrect responses; 83.3% top score for guideline questions. | Can be a valuable tool in urology clinics, aiding patient understanding when supervised by urologists.
5.10. Pituitary
The pituitary gland, often referred to as the “master gland”, plays a critical role in regulat-
ing various hormonal functions throughout the body, including growth, metabolism, and
stress response [85]. Given its central role in endocrine health, ChatGPT is being studied to
improve the understanding and management of pituitary disorders, including adenomas.
In one of the related works, Sambangi et al. evaluated the accuracy, readability, and
grade level of ChatGPT responses regarding pituitary adenoma resection, using different
prompting styles: physician-level, patient-friendly, and no prompting as a control [86]. The
study found that responses without prompting were longer, while physician-level and
patient-friendly prompts resulted in more concise answers. Patient-friendly prompting
led to significantly easier-to-read responses. The accuracy of responses was highest with
physician-level prompting, although the differences among prompting styles were not
statistically significant due to the small sample size. Overall, the study suggests that
ChatGPT has potential as a patient education tool, though further development and data
collection are needed.
Şenoymak et al. assessed ChatGPT’s ability to respond to 46 common queries re-
garding hyperprolactinemia and prolactinoma, evaluating accuracy and adequacy using
Likert scales [87]. The median accuracy score was 6.0, indicating high accuracy, while the
adequacy score was 4.5, reflecting generally adequate responses. Significant agreement
was found between two independent endocrinologists assessing the responses. However,
pregnancy-related queries received the lowest scores for both accuracy and adequacy, indi-
cating limitations in ChatGPT’s responses in medical contexts. The findings suggest that
while ChatGPT shows promise, there is a need for improvement, particularly regarding
pregnancy-related information.
Taşkaldıran et al. examined the accuracy and quality of ChatGPT-4’s responses to ten
hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings [88].
Two endocrinologists independently scored the responses for accuracy, completeness,
and overall quality. Results showed high mean accuracy scores (4.9 for diagnosis and
treatment) and completeness scores (3.0 for diagnosis, 2.6 for further examination, and
2.4 for treatment). Overall, 80.0% of responses were rated as high quality, suggesting
that ChatGPT can be a valuable tool in healthcare, though its limitations and risks should
be considered.
5.11. Uterus
The uterus, a vital organ in the female reproductive system, plays a crucial role in menstruation, pregnancy, and childbirth [89]. Recognizing its importance in women’s health, the application of ChatGPT to improve the understanding, diagnosis, and management of uterine and gynecological conditions is being explored.
Table 8 summarizes the integration of ChatGPT into the field of uterus and gyneco-
logic health.
Study | Focus | Method | Key Findings | Conclusions
Patel et al. (2024) [90] | Genetic counseling for gynecologic cancers | Assessment of 40 questions with oncologist input | ChatGPT achieved 82.5% accuracy overall; 100.0% in the genetic counseling category; 88.2% for hereditary breast/ovarian cancer; 66.6% for Lynch syndrome. | ChatGPT could be a valuable resource for patient information, needing further oncologist input for comprehensive education.
Peled et al. (2024) [91] | Obstetric questions from pregnant individuals | Evaluation by 20 obstetric experts | 75.0% of responses rated positive; mean accuracy of 4.2; completeness and safety lower, at means of 3.8 and 3.9. | ChatGPT can provide accurate obstetric responses but requires caution regarding maternal and fetal safety.
Psilopatis et al. (2024) [92] | Intrauterine growth restriction | Assessment of comprehension of the S2k guidelines (clinical practice guidelines developed by the German Society for Gynecology and Obstetrics, DGGG) | Most responses about definitions and timing were adequate; over half of the delivery mode suggestions needed correction. | ChatGPT could assist in clinical practice but responses require expert supervision for accuracy.
Winograd et al. (2024) [93] | Female puberty | Evaluation of responses to ten puberty questions | 60.0% of responses deemed acceptable; 40.0% unacceptable; no verifiable references provided. | While generally accurate, further study and development are needed before endorsing ChatGPT for adolescent health information.
5.12. Skin
The skin, the body’s largest organ, serves as a critical barrier protecting against external threats
while playing essential roles in thermoregulation, sensation, and immune response [94].
Given its importance, Lantz examined the use of ChatGPT in a case report involving
a critically ill African American woman diagnosed with toxic epidermal necrolysis (TEN),
which affected over 30.0% of her body surface area [95]. The condition, triggered by medi-
cations, poses a high mortality risk, and the report highlighted the challenges of identifying
the offending drug due to the patient’s complex medical history. It also discussed potential
genetic or epigenetic predispositions in African Americans to conditions such as Stevens-
Johnson syndrome (SJS) and TEN, underscoring the necessity for increased representation
of skin of color in medical literature. While the report acknowledged the advantages of
utilizing ChatGPT in medical documentation, it also pointed out its limitations and the
need for careful consideration of its use in clinical settings.
Table 9 summarizes the integration of ChatGPT into the field of dermatology and skin
health in the recent works.
Study | Focus | Method | Key Findings | Conclusions
Sanchez-Zapata et al. (2024) [96] | Inflammatory dermatoses | Evaluated ChatGPT’s quality in answering questions on conditions like acne and psoriasis, rated by dermatology residents | Responses were generally rated between “acceptable” and “very good”, with median scores around 4, indicating potential as a patient information tool. | Suggests that ChatGPT can provide valuable primary information on skin conditions when used cautiously by clinicians.
Passby et al. (2024) [97] | Dermatology examination performance | Assessed ChatGPT-3.5 and ChatGPT-4 on 84 questions from the Specialty Certificate Examination in Dermatology | ChatGPT-3.5 scored 63.0%, while ChatGPT-4 achieved 90.0%, exceeding typical pass marks and highlighting its potential in medical education. | Indicates that advanced AI can effectively answer clinical questions, though its limitations in complex cases must be acknowledged for patient safety.
Mondal et al. (2023) [99] | Dermatological disease education | Evaluated ChatGPT’s capability in generating educational content on dermatological diseases | Generated texts averaged 377 words with satisfactory accuracy; however, a high text similarity index (27.1%) raised plagiarism concerns. | Suggests that while ChatGPT can produce useful educational content, generated texts should be reviewed by doctors to mitigate plagiarism risks.
5.14. Mouth
The mouth is essential for functions like eating, speaking, and breathing, and its health
significantly impacts overall well-being [102]. Early detection of oral cancer is crucial for
improving treatment outcomes, and AI applications like ChatGPT are being explored for
their potential to enhance awareness and education about oral health.
In recent work, Hassona et al. evaluated the quality, reliability, readability, and useful-
ness of ChatGPT in promoting early detection of oral cancer [103]. The study analyzed a
total of 108 patient-oriented questions, with ChatGPT providing “very useful” responses
for 75.0% of the inquiries. The mean Global Quality Score was 4.24 out of 5, and the
reliability score was high, achieving 23.17 out of 25. However, the mean actionability score
was notably lower at 47.3%, and concerns were raised regarding readability, reflected in
a mean Flesch-Kincaid reading ease score of 38.4. Despite these readability
challenges, no misleading information was found, suggesting that ChatGPT could serve
as a valuable resource for patient education regarding oral cancer detection. Table 10
summarizes ChatGPT’s integration into mouth health.
Study | Focus | Method | Key Findings | Conclusions
Mago and Sharma (2023) [105] | Oral and maxillofacial radiology | Assessed ChatGPT-3’s ability to identify radiographic anatomical landmarks and understand pathologies using an 80-question questionnaire | Achieved 100.0% accuracy in describing radiographic landmarks, with mean scores of 3.94, 3.85, and 3.96 across categories. | Effective as an adjunct tool for information but lacks detail, limiting its use as a primary reference; can enhance knowledge and reduce patient anxiety.
Puladi et al. (2024) [106] | Oral and maxillofacial surgery (OMS) | Reviewed the impact of LLMs like ChatGPT in OMS, identifying 57 records with 37 relevant studies focusing on GPT-3.5 and GPT-4 | Current research is limited, primarily addressing scientific writing and patient communication, with classic OMS diseases underrepresented. | While LLMs may enhance certain healthcare aspects, ethical and regulatory concerns need to be resolved before widespread adoption.
5.15. Lung
Lungs are essential organs in the respiratory system, responsible for gas exchange
and oxygenating the blood. Lung health is critical, as conditions such as lung cancer can
significantly impact overall well-being and quality of life [107]. Effective diagnosis and
management of lung-related diseases require accurate data extraction and analysis from
medical records, an area where ChatGPT is being explored for its potential.
Fink et al. compared the performance of ChatGPT and GPT-4 in extracting oncologic
phenotypes from free-text CT reports for lung cancer [108]. The study analyzed a total of
424 reports and found that GPT-4 significantly outperformed ChatGPT in several key areas,
including extracting lesion parameters (98.6% for GPT-4 vs. 84.0% for ChatGPT), identifying
metastatic disease (98.1% vs. 90.3%), and labeling oncologic progression, where GPT-4
achieved an F1 score of 0.96 compared to 0.91 for ChatGPT. Additionally, GPT-4 scored higher
on measures of factual correctness (4.3 vs. 3.9) and accuracy (4.4 vs. 3.3) on a Likert scale, with
a notably lower confabulation rate (1.7% vs. 13.7%). Overall, the findings indicate that GPT-4
demonstrated superior capability in data mining from medical records related to lung cancer.
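To make this style of data mining concrete, the hedged sketch below prompts a general-purpose model to pull structured oncologic fields out of a free-text report. The report text, field schema, and model choice are illustrative assumptions, not the protocol used by Fink et al., and any extracted output would need validation before clinical use.

```python
from openai import OpenAI

client = OpenAI()

# Invented report for illustration; real reports contain PHI and must be
# de-identified and handled under the applicable privacy regulations.
REPORT = ("CT chest: 2.1 cm spiculated nodule in the right upper lobe; "
          "enlarged mediastinal nodes; no new lesions since prior study.")

prompt = (
    "Extract the following fields from the radiology report as JSON: "
    "lesion_size_cm, lesion_location, lymphadenopathy (true/false), "
    "progression (improved/stable/worsened).\n\nReport: " + REPORT
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # parse and validate downstream
```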
Table 11 summarizes the integration of ChatGPT into the field of lung health.
Study | Focus | Method | Key Findings | Conclusions
Rahsepar et al. (2023) [109] | Lung cancer | Comparison of ChatGPT-3.5, Google Bard, Bing, and Google search engines on 40 questions | ChatGPT-3.5 achieved 70.8% accuracy, outperforming Google Bard at 51.7%, Bing at 61.7%, and Google search at 55.0%; notably, ChatGPT-3.5 and Google search demonstrated greater consistency in their responses. | Highlights risks in ChatGPT accuracy, indicating a critical need for reliable health information tools that can enhance patient understanding and decision-making.
Schulte et al. (2023) [112] | Lung cancer treatment identification | Evaluated ability to identify therapies for 51 advanced solid cancer diagnoses | ChatGPT identified 91 distinct medications, achieving a valid therapy quotient (VTQ) of 0.77, demonstrating good concordance with National Comprehensive Cancer Network (NCCN) guidelines and providing at least one NCCN-recommended therapy for each malignancy. | Shows promise in assisting oncologists with treatment decision-making but underscores the need for accuracy improvements to maximize its clinical utility in oncology.
5.16. Bone
Bones are crucial components of the human skeletal system, providing structure,
support, and protection to vital organs [113]. Bone health is essential for overall well-being,
as conditions such as osteoporosis can lead to increased fracture risk and diminished quality
of life. Ensuring accurate information on bone health and associated disorders is vital for
patient education and management.
In a related study, Ghanem et al. evaluated the accuracy of ChatGPT-3.5 in providing
evidence-based answers to 20 frequently asked questions about osteoporosis [114]. The
responses were reviewed by three orthopedic surgeons and one advanced practice provider,
resulting in an overall mean accuracy score of 91.0%. The responses were categorized as
either “accurate requiring minimal clarification” or “excellent”, with no answers found to be
inaccurate or harmful. Additionally, there were no significant differences in accuracy across
categories such as diagnosis, risk factors, and treatment. While ChatGPT demonstrated
high-quality educational content, the authors recommend it as a supplement to, rather than
a replacement for, human expertise and clinical judgment in patient education. Table 12
summarizes recent work in the field of integrating ChatGPT with bone health.
Study | Focus | Method | Key Findings | Conclusions
Son et al. (2023) [115] | Bone metastases diagnosis | Developed a deep learning model using ChatGPT-3.5 and ResNet50 on bone scans from 4,626 cancer patients | The model achieved an AUC of 81.6%, sensitivity of 56.0%, and specificity of 88.7%; class activation maps revealed a focus on spinal metastases but confusion with benign lesions. | Suggests that clinicians with basic programming skills can effectively leverage AI for medical image analysis, potentially improving clinical decision-making and diagnostics.
Cinar (2023) [116] | ChatGPT’s knowledge of osteoporosis | Assessed responses to 72 FAQs based on National Osteoporosis Guideline Group guidelines | ChatGPT achieved an overall accuracy of 80.6%, highest in prevention (91.7%) and general knowledge (85.8%); however, only 61.3% of responses aligned with guidelines, indicating limitations. | While showing adequate performance, the study highlights the need for improvements in adherence to clinical guidelines for reliable patient education.
Yang et al. (2024) [117] | Diagnostic accuracy for bone tumors | Evaluated ChatGPT’s performance on 1,366 imaging reports diagnosed by experienced physicians | Initial diagnostic accuracy was 73.0%, improving to 87.0% with few-shot learning, with sensitivity of 99.0% and specificity of 73.0%; misdiagnoses included benign cases misidentified as malignant. | Highlights ChatGPT’s potential to enhance diagnostic processes for bone tumors while emphasizing the need for collaboration with experienced physicians in clinical settings to mitigate misdiagnosis risks.
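The few-shot improvement reported by Yang et al. reflects a general prompting technique: a handful of labeled example reports precede the unlabeled one, so the model infers the task from the demonstrations. The sketch below is a minimal illustration with invented reports and labels, not the prompt used in the study.

```python
from openai import OpenAI

client = OpenAI()

# Two invented demonstrations followed by the report to classify.
FEW_SHOT_PROMPT = """Classify each bone-imaging report as BENIGN or MALIGNANT.

Report: Well-defined lucent lesion with a sclerotic rim in the femur.
Label: BENIGN

Report: Aggressive periosteal reaction with a soft-tissue mass in the tibia.
Label: MALIGNANT

Report: Ill-defined lytic lesion with cortical destruction in the humerus.
Label:"""

reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": FEW_SHOT_PROMPT}],
)
print(reply.choices[0].message.content.strip())
```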
5.17. Muscles
Muscle health is vital for overall physical function, mobility, and quality of life [118].
It includes the maintenance and improvement of muscle strength, endurance, and flex-
ibility, which are essential for daily activities and overall well-being. Understanding
muscle-related conditions and their management is crucial for effective rehabilitation and
enhancing patient outcomes.
In related works, Sawamura et al. evaluated ChatGPT 4.0’s performance on Japan’s
national exam for physical therapists, specifically assessing its ability to handle complex
questions that involve images and tables [119]. The study revealed that ChatGPT achieved
an overall accuracy of 73.4%, successfully passing the exam. Notably, it excelled in text-
based questions with an accuracy of 80.5%, but faced challenges with practical questions,
achieving only 46.6%, and those requiring visual interpretation, where it scored 35.4%. The
findings suggest that while ChatGPT shows promise for use in rehabilitation and Japanese
medical education, there is a significant need for improvements in its handling of practical
and visually complex questions.
In a study, Agarwal et al. evaluated the capabilities of ChatGPT, Bard, and Bing in gen-
erating reasoning-based multiple-choice questions (MCQs) in medical physiology for MBBS
students [120]. ChatGPT and Bard produced a total of 110 MCQs, while Bing generated 100,
encountering issues with two competencies. Among the models, ChatGPT achieved the
highest validity score of 3, while Bing received the lowest, indicating notable differences in
performance. Despite these variations, all models received comparable ratings for difficulty
and reasoning ability, with no significant differences observed. The findings underscore
the need for further development of AI tools to enhance their effectiveness in creating
reasoning-based MCQs for medical education. Table 13 summarizes the integration of
ChatGPT into the field of muscle health.
Study | Focus | Method | Key Findings | Conclusions
Saluja and Tigga (2024) [121] | Anatomy education | Evaluated ChatGPT-4’s effectiveness in explaining anatomical structures, generating quizzes, and summarizing lectures | ChatGPT proved useful for clinical relevance explanations and summarizing material; however, it struggled with accurately depicting anatomical images, especially complex structures. | Highlights the potential of ChatGPT as an educational tool for medical students, emphasizing that while it can enhance teaching, it cannot replace the role of teachers in anatomy education.
Kaarre et al. (2023) [122] | Information on ACL surgery | Evaluated ChatGPT’s responses to ACL surgery-related questions aimed at patients and non-orthopaedic medical doctors, with assessments from four orthopaedic surgeons | ChatGPT achieved approximately 65.0% accuracy, demonstrating adaptability in providing relevant information, but should be viewed as a supplementary tool rather than a replacement for orthopaedic expertise due to its limitations in understanding complex medical concepts. | Reinforces the notion that ChatGPT can aid in patient education but cannot substitute the nuanced understanding of experienced medical professionals.
Meng et al. (2024) [123] | Genu valgum prediction | Developed a deep learning architecture for predicting genu valgum using non-contact pose analysis data combined with ChatGPT-generated features from subject images | The combined approach outperformed a baseline model, achieving an accuracy of 77.2%, showcasing ChatGPT’s effectiveness in semantic information extraction for medical imaging applications. | Promises a method for assessing genu valgum, emphasizing the potential of ChatGPT in enhancing the capabilities of traditional medical assessments through integrated technologies.
Mantzou et al. (2024) [124] | Quality of responses on musculoskeletal anatomy | Assessed the efficacy of ChatGPT in answering questions related to musculoskeletal anatomy at different time points, rated by three experts using a 5-point Likert scale | Results showed variability in response quality; 50.0% of responses were rated as good quality, and 66.6% were consistent across time points; however, low-quality responses frequently contained significant mistakes or conflicting information. | Indicates that while ChatGPT can provide useful insights, its reliability as an independent learning resource for musculoskeletal anatomy is limited, necessitating validation against established anatomical literature.
Li et al. (2023) [125] | Mobile rehabilitation for osteoarthritis | Evaluated the clinical efficacy and cost-effectiveness of a mobile rehabilitation system integrating ChatGPT-4 and wearable devices for patients with osteoarthritis and sarcopenia in a prospective randomized trial | 278 patients will be assigned to an intervention group receiving personalized exercise therapy through mobile platforms and wearables, while a control group receives traditional face-to-face therapy; outcome measures will include pain assessment and functional scores at multiple time points over six months. | Suggests that integrating ChatGPT with wearable technology could enhance rehabilitation service efficiency and availability, potentially improving therapeutic outcomes for patients with muscle-related conditions.
For instance, a patient seeking reassurance might receive a clinically accurate but
emotionally insensitive response, impacting their psychological well-being. Incorporat-
ing natural language processing techniques that recognize and respond appropriately to
emotional cues can enhance patient interactions. Complementing ChatGPT with human
support, especially in mental health contexts, can ensure that patients receive both accurate
information and the necessary emotional support.
7. Discussion
ChatGPT, developed by OpenAI, demonstrates substantial potential in healthcare by
offering a range of benefits alongside critical challenges that require careful consideration.
With its ability to generate human-like responses, ChatGPT can assist clinicians, improve
patient care, and facilitate communication in diverse healthcare settings. A primary advan-
tage of ChatGPT is its capacity for 24/7 patient support, allowing immediate assistance in
non-critical situations with minimal human intervention.
For instance, ChatGPT can help patients schedule appointments, answer inquiries,
and send medication reminders—enhancing patient engagement and overall satisfaction.
This functionality is particularly valuable in rural or underserved areas where access to
healthcare professionals may be limited. In addition, ChatGPT can act as a clinical decision-
support tool by analyzing symptoms and medical histories to provide evidence-based
recommendations. For example, in a case of suspected respiratory infection, ChatGPT can
guide clinicians through relevant diagnostic criteria and management options, thereby
streamlining care delivery and supporting timely decision-making.
In medical education, ChatGPT enhances learning by answering students’ questions
and generating quizzes. This interactive approach deepens students’ understanding of
complex medical concepts, which is crucial for preparing future healthcare professionals.
Studies have shown that students using AI-assisted tools in medical schools perform better
in assessments compared to those who use traditional learning methods.
Moreover, ChatGPT can automate administrative tasks, such as generating clinical
reports and discharge summaries, which reduces the workload on healthcare providers
and improves efficiency in clinical settings. For example, a hospital that implemented
ChatGPT for documentation reported a 30% reduction in administrative workload, allowing
physicians to allocate more time to direct patient care.
Despite these advantages, it is crucial to address the challenges and risks associated
with ChatGPT’s deployment in healthcare. Regular updates are essential to maintain
the model’s accuracy in light of rapid advancements in medical knowledge. Additionally,
ethical oversight mechanisms are needed to mitigate risks related to misinformation, patient
privacy, and potential biases in AI-generated content.
In examining ChatGPT’s effectiveness across different organ systems and specialties,
we observe that its utility varies based on clinical complexity and specialty-specific re-
quirements. For example, in fields such as dermatology and nephrology, ChatGPT has
demonstrated higher accuracy in providing preliminary educational information and as-
sisting in patient self-care. However, in specialties requiring nuanced interpretation, such
as neurology or oncology, ChatGPT may fall short without additional expert oversight.
This comparison underscores the importance of customizing ChatGPT’s use according to
the specific demands of each medical field. One key area of concern involves intellectual
property and regulatory issues surrounding AI in healthcare. Using ChatGPT to generate
medical content raises questions about ownership and authorship, especially regarding
the originality of AI-produced materials. Ensuring that ChatGPT’s outputs comply with
intellectual property laws is essential to maintaining the integrity of healthcare information.
Moreover, compliance with data privacy regulations, such as HIPAA and GDPR, is critical
when ChatGPT processes sensitive patient data. Implementing clear guidelines and robust
data protection measures can safeguard patient information while ensuring the ethical use
of AI in medical practice.
Furthermore, fostering effective AI-human collaboration is essential for maximizing
ChatGPT’s benefits in healthcare. Training healthcare professionals in the appropriate use
of AI tools can enhance decision-making and improve patient outcomes. For instance,
adopting collaborative frameworks where ChatGPT supports clinicians—while maintaining
human oversight as central to patient care—could lead to more accurate diagnoses and
tailored treatment plans.
By addressing these intellectual property, regulatory, and collaboration aspects, we
aim to provide a comprehensive understanding of ChatGPT’s potential, limitations, and
responsible integration in healthcare.
8. Limitations
Despite its potential, the integration of ChatGPT in healthcare presents significant
limitations. A primary concern is the risk of misinformation; outdated or incorrect informa-
tion could lead to ill-informed health decisions and potentially harm patients. Ensuring
the accuracy and reliability of AI-generated information is crucial, especially in critical
healthcare contexts.
While ChatGPT can automate support and administrative tasks, this could result in job losses for administrative staff. This raises ethical considerations regarding workforce displacement and the
need for retraining programs to help affected employees transition to new roles within the
healthcare system. Balancing automation with human employment is essential to maintain
a skilled and motivated workforce.
Ethical implications are paramount, including patient privacy and informed consent.
Mishandling sensitive data could undermine trust in the healthcare system and violate
regulations such as HIPAA or the General Data Protection Regulation (GDPR). Robust
data protection measures and transparent data handling policies must be implemented to
safeguard patient information. Additionally, clear communication about data usage and
obtaining explicit patient consent are necessary to uphold ethical standards.
Additionally, it cannot provide personalized medical advice, lacking the capacity to
tailor responses based on individual patient histories. This limitation can lead to generic
recommendations that may not be suitable for all patients, potentially resulting in ineffective
or harmful outcomes. Its responses may lack nuance, particularly in mental health contexts,
where empathy is essential. The absence of emotional intelligence in AI models means they
cannot replace the empathetic support provided by human professionals, which is critical
in mental health care.
While capable of assisting with general information, ChatGPT can struggle with fact-intensive queries, often requiring highly detailed prompts to yield useful results. Moreover, ChatGPT may sometimes produce
plausible-sounding but incorrect answers, known as “AI hallucinations”, which can mislead
users. Implementing verification mechanisms and cross-referencing AI outputs with trusted
medical sources can help mitigate this issue. Furthermore, it cannot fully replace the critical
thinking of human professionals, especially in rapidly evolving medical fields, as it may
miss crucial developments. AI models rely on their training data up to a certain cutoff date
and may not incorporate the latest research findings or clinical guidelines. Regular updates
and continuous training of the AI model are necessary to keep it current. Additionally,
encouraging a collaborative approach where AI supports but does not replace human
judgment can enhance decision-making while minimizing risks.
AI model biases also pose significant challenges. ChatGPT’s training data may contain
inherent biases that can lead to unequal treatment recommendations or misdiagnosis for
certain populations. For example, minority groups or those with rare conditions might
receive less accurate information due to underrepresentation in the data. Addressing these
biases requires diversifying training datasets and implementing algorithms that detect and
correct biased outputs.
Real-world implementation challenges include integrating ChatGPT into existing
healthcare systems, which may involve technical hurdles such as compatibility with elec-
tronic health records (EHRs) and ensuring data security. Regulatory compliance is another
critical factor. Healthcare AI applications must adhere to regulations like the FDA’s guidelines
on software as a medical device, necessitating rigorous validation and approval processes.
9. Future Directions
ChatGPT is set to significantly impact healthcare, particularly in patient engagement,
diagnostics, and medical education. It can enhance patient interactions by facilitating
real-time communication, providing information on treatment options, and answering
medication queries, thereby improving satisfaction and adherence to treatment plans.
As real-world data is integrated into healthcare systems, future research should focus
on developing advanced algorithms that enable ChatGPT to analyze this information
effectively. This includes refining natural language processing capabilities to interpret
complex medical data and patient histories, supporting physicians in identifying symptom
patterns, and suggesting appropriate diagnostic tests for personalized treatment plans.
Collaborative studies involving data scientists, clinicians, and AI specialists are essential to
create models that are both accurate and clinically relevant.
In medical education, ChatGPT can act as a virtual tutor, offering immediate feedback
on clinical scenarios and generating customized practice questions for exam preparation.
Research could explore the integration of ChatGPT into medical curricula, assessing its
impact on student learning outcomes, knowledge retention, and critical thinking skills.
Pilot programs in educational institutions can provide valuable insights into best practices
for AI-assisted learning.
10. Conclusions
ChatGPT demonstrates substantial potential to transform healthcare by enhancing
communication between patients and providers, supporting clinical decision-making,
and streamlining administrative processes. Its ability to generate structured, human-like
responses equips it to assist with tasks such as drafting medical reports, summarizing
patient interactions, and performing preliminary triage. Moreover, ChatGPT’s capacity to
process extensive medical literature and adverse event data allows it to support healthcare
professionals in recognizing critical trends that contribute to improved patient safety and
quality of care.
However, while the integration of ChatGPT into healthcare offers numerous benefits,
it is crucial to recognize the model’s limitations. ChatGPT cannot replace the expertise and
nuanced judgment of healthcare professionals, especially in complex medical scenarios.
Issues such as potential ethical concerns, data privacy challenges, lack of accountability,
and difficulties in interpreting specialized medical data underscore the need for rigorous
oversight. Reliance on AI-generated responses without expert validation could risk patient
care, particularly when specialized clinical judgment is required.
Future work should focus on evaluating ChatGPT’s performance alongside other large
language models (LLMs) to provide a broader understanding of how different AI tools
perform in healthcare settings. Comparative studies with healthcare-specific LLMs, such
as BioBERT or PubMedBERT, could reveal insights into model strengths, limitations, and
application-specific suitability. Additionally, developing a regulatory framework for the
responsible integration of LLMs into healthcare is essential, with an emphasis on patient
safety, accuracy, and adherence to ethical standards. Addressing these considerations will
enable ChatGPT and similar models to serve as valuable adjuncts to healthcare practice,
complementing human expertise and allowing healthcare professionals to deliver more
efficient, high-quality care.