Chat GPT For The Management of Obstructive Sleep Apnea: Do We Have A Polar Star?
https://doi.org/10.1007/s00405-023-08270-9
Abstract
Purpose This study explores the potential of the Chat-Generative Pre-Trained Transformer (Chat-GPT), a Large Language
Model (LLM), in assisting healthcare professionals in the diagnosis of obstructive sleep apnea (OSA). It aims to assess the
agreement between Chat-GPT's responses and those of expert otolaryngologists, shedding light on the role of AI-generated
content in medical decision-making.
Methods A prospective, cross-sectional study was conducted, involving 350 otolaryngologists from 25 countries who
responded to a specialized OSA survey. Chat-GPT was tasked with providing answers to the same survey questions.
Responses were assessed by super-experts and statistically analyzed for agreement.
Results The study revealed that Chat-GPT and expert responses shared a common answer in over 75% of cases for individual
questions. However, overall consensus was achieved in only four questions. Super-expert assessments showed a
moderate agreement level, with Chat-GPT scoring slightly lower than experts. Statistically, Chat-GPT's responses differed
significantly from experts' opinions (p = 0.0009). Sub-analysis revealed areas of improvement for Chat-GPT, particularly in
questions where super-experts rated its responses lower than expert consensus.
Conclusions Chat-GPT demonstrates potential as a valuable resource for OSA diagnosis, especially where access to specialists
is limited. The study emphasizes the importance of AI-human collaboration, with Chat-GPT serving as a complementary
tool rather than a replacement for medical professionals. This research contributes to the discourse in otolaryngology and
encourages further exploration of AI-driven healthcare applications. While Chat-GPT exhibits a commendable level of
consensus with expert responses, ongoing refinements in AI-based healthcare tools hold significant promise for the future
of medicine, addressing the underdiagnosis and undertreatment of OSA and improving patient outcomes.
European Archives of Oto-Rhino-Laryngology
Methods
“super-experts” rated all expert’s responses at a value of 4/5 or more, while this rating was achieved only for 6 Chat-GPT responses.
The mean agreement level, as determined by the super-experts using the Likert scale for Chat-GPT's responses, was 4.07 (Minimum 1; Maximum 5; Standard Deviation 1.22). For the experts, the mean agreement level was 4.56 (Minimum 2; Maximum 5; Standard Deviation 0.78). Notably, there was a significant difference between these values (p = 0.0009, as determined by a Student t test). Detailed agreement data for each question can be found in Table 3.
The kappa coefficient of agreement between super-experts for expert response assessment was R = 0.44 (CI95% [0.30; 0.58]). For ChatGPT response assessment, the kappa coefficient of agreement was R = 0.17 ([0.03; 0.30]).
Discussion
The integration of LLMs, particularly Chat-GPT, into the field of medicine has shown great promise, offering the potential to revolutionize the way healthcare professionals access and utilize medical knowledge [7, 8]. This study aimed to explore the applicability of Chat-GPT in the domain of obstructive sleep apnea (OSA), a significant health concern associated with various comorbidities and yet often underdiagnosed and undertreated [9, 10].
The results of our study, as presented in Tables 1 and 2, reveal a moderate global degree of consensus between Chat-GPT and the expert panel. In four questions the level of agreement between Chat-GPT and experts was high while in the remaining questions agreement was significantly lower. The consensus answers for the ten survey questions demonstrate that Chat-GPT might be capable of providing responses that align with those of human experts but still needs improvement.
Moreover, our findings indicate that the level of agreement between Chat-GPT and experts, as assessed by the super-experts, is substantial. The mean agreement levels, represented by a Likert scale, were 4.07 for Chat-GPT and 4.56 for the experts, with the latter showing slightly higher agreement levels. However, it is important to note that the differences in agreement between Chat-GPT and experts were statistically significant (p = 0.0009). This suggests that while Chat-GPT's responses are generally in concordance with expert opinions, there are instances where distinctions exist.
These distinctions may arise from the inherent limitations of AI models, including their reliance on data patterns and the potential absence of clinical intuition.
The data presented in Table 3 provide valuable insights into the super-expert assessments of Chat-GPT's answers compared to experts' consensual answers for each of the ten survey questions.
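To illustrate how a multi-rater kappa coefficient such as those reported in the Results (0.44 and 0.17) can be obtained, the following minimal sketch implements Fleiss' kappa in Python. The text does not state which multi-rater kappa variant was used, so the choice of Fleiss' kappa, the function name `fleiss_kappa`, the assumption of nine raters, and the example ratings are all assumptions made for illustration; they are not the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who gave answer i the Likert score j.
    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every answer must be rated by the same number of raters")

    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per answer, then averaged over answers.
    p_obs = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_obs) / n_subjects

    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical example: 3 answers, 9 raters, Likert scores 1 to 5 (not study data).
hypothetical_counts = [
    [0, 0, 1, 3, 5],
    [0, 1, 2, 4, 2],
    [1, 2, 3, 2, 1],
]
print(round(fleiss_kappa(hypothetical_counts), 3))
```

Values near 0 indicate agreement close to chance and values near 1 indicate near-perfect agreement, which is the sense in which the agreement for expert responses is described as intermediate and that for ChatGPT responses as low.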
Table 1
Columns: Question; Items; Experts’ most consensual items (n [%]); ChatGPT’s answer

1. Which of the following solutions would you take into consideration for a patient with moderate to severe Obstructive Sleep Apnea, without tonsils, who refuses to use continuous positive airway pressure (CPAP) treatment, taking into account the percentage of success and adherence to treatment reported in literature? (multiple choices are possible)
Items: (A) Septoplasty; (B) Lateral Pharyngoplasty; (C) Skeletal surgery; (D) Hypoglossal nerve stimulation
Experts’ most consensual items (n [%]): C (66 [68%]); D (62 [63.9%])
ChatGPT’s answer: B, D

2. In which of the following cases is Drug Induced Sleep Endoscopy (DISE) indicated? (multiple choices are possible)
Items: (A) To look for an alternative treatment to continuous positive airway pressure (CPAP); (B) Surgical treatment failure; (C) Medical treatment failure; (D) All of the above
Experts’ most consensual items (n [%]): A (87 [89.7%]); B (90 [92.8%])
ChatGPT’s answer: B, C

3. Which of the following surgical therapeutic indications would you choose in an adult patient with severe Obstructive Sleep Apnea and severe base of tongue hypertrophy, who does not tolerate continuous positive airway pressure (CPAP) treatment? (multiple choices are possible)
Items: (A) Transoral Robotic Surgery (TORS); (B) Soft Palate Surgery; (C) Maxillomandibular advancement (MMA); (D) Multilevel surgery
Experts’ most consensual items (n [%]): A (75 [77.3%]); D (59 [60.8%])
ChatGPT’s answer: A, B, C, D

4. Which of the following therapeutic indications would you choose in an adult patient, diagnosed with Obstructive Sleep Apnea, who does not tolerate continuous positive airway pressure (CPAP) treatment and in whom in the Drug Induced Sleep Endoscopy (DISE) you observe a Complete Circular Collapse (CCC) at the retropalatal area? (only one choice)
Items: (A) Lateral/Circular Pharyngoplasty; (B) Mandibular Advancement Device (MAD); (C) Hypoglossal nerve stimulation; (D) A + B
Experts’ most consensual items (n [%]): A (80 [82.5%]); B (43 [44.3%])
ChatGPT’s answer: A, C

5. Which of the following sentences regarding nasal surgery, in the context of treatment of Obstructive Sleep Apnea, is false?
Items: (A) It is, by itself, the treatment for sleep apnea; (B) Helps to improve the adherence to continuous positive airway pressure (CPAP) treatment; (C) Improves the adherence to Mandibular Advancement Device (MAD); (D) Improves the outcome of a Multilevel surgery
Experts’ most consensual items (n [%]): A (81 [83.8%])
ChatGPT’s answer: A

6. In a 5-Year-old patient who has already undergone adenotonsillectomy 1 year ago and keeps snoring with apneas, what would be your next step? (multiple choices are possible)
Items: (A) Lingual tonsillectomy under Transoral Robotic Surgery (TORS); (B) Drug Induced Sleep Endoscopy (DISE); (C) Polysomnography; (D) Maxillomandibular advancement (MMA)
Experts’ most consensual items (n [%]): B (79 [81.4%]); C (80 [82.5%])
ChatGPT’s answer: B, D

7. In an adult patient, with Retrognathia, small tonsils, Macroglossia and BMI of 23. Which of the following surgical treatments do you think is the most adequate?
Items: (A) Functional Septoplasty; (B) Maxillomandibular advancement (MMA); (C) Lingual tonsillectomy under Transoral Robotic Surgery (TORS); (D) Barbed reposition Pharyngoplasty
Experts’ most consensual items (n [%]): B (91 [93.8%])
ChatGPT’s answer: B

8. In an adult patient with a Body Mass Index (BMI) of 42, with severe Obstructive Sleep Apnea, no nasal obstruction and a poor adherence to continuous positive airway pressure (CPAP) therapy, which of the following surgical indications would you choose as a first line of treatment?
Items: (A) Barbed reposition pharyngoplasty; (B) Adenotonsillectomy; (C) Bariatric surgery; (D) Septoturbinoplasty
Experts’ most consensual items (n [%]): C (95 [97.9%])
ChatGPT’s answer: C

9. Which of the following surgical indications would you choose in an adult patient with severe Obstructive Sleep Apnea and normal weight (Body Mass Index of 23), in whom a Lateral wall collapse at the level of the oropharynx, without any retrobasilingual collapse nor septal deviation is observed during Drug Induced Sleep Endoscopy (DISE)?
Items: (A) Lateral pharyngoplasty; (B) Lingual tonsillectomy under Transoral Robotic Surgery (TORS); (C) Anterior pharyngoplasty; (D) Multilevel surgery
Experts’ most consensual items (n [%]): A (92 [94.8%])
ChatGPT’s answer: A

10. Which of the following treatments would you choose in an adult, female patient of 52 years, with a Body Mass Index (BMI) of 20 and a moderate Obstructive sleep apnea (Apnea/hypopnea Index of 28, AHI supine: 35, AHI non supine: 23), who does not tolerate continuous positive airway pressure (CPAP) therapy (multiple choices are possible)
Items: (A) Mandibular Advancement Device (MAD); (B) Hypoglossal nerve stimulation; (C) Septoplasty; (D) Positional therapy
Experts’ most consensual items (n [%]): A (75 [77.3%]); B (57 [58.8%])
ChatGPT’s answer: A, D
Table 2 Agreement between experts and Chat-GPT's answers
Question | At least one item common between experts and Chat-GPT’s answers (n [%]) | Total agreement between experts and Chat-GPT’s answers (n [%])
Q1 | 86 [88.7%] | 3 [3.1%]
Q2 | 90 [92.8%] | 4 [4.1%]
Q3 | 97 [100%] | 5 [5.2%]
Q4 | 81 [83.5%] | 3 [3.1%]
Q5 | NA | 78 [80.4%]
Q6 | 97 [100%] | 60 [61.9%]
Q7 | 91 [93.8%] | 76 [78.4%]
Q8 | 95 [97.9%] | 94 [96.9%]
Q9 | 92 [94.8%] | 89 [91.8%]
Q10 | 88 [90.7%] | 11 [11.3%]
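The two agreement measures in Table 2 can be read as set operations on the items selected by each respondent and by Chat-GPT. The short Python sketch below shows one possible reading and re-derives the reported percentages from the counts. The denominator of 97 respondents is inferred from the entry "97 [100%]" in the table, and the function names and the 75% cut-off used to flag high agreement are assumptions chosen for illustration rather than definitions taken from the study.

```python
N_RESPONDENTS = 97  # inferred from "97 [100%]" in Table 2 (assumption)

# (at least one common item, total agreement) counts per question; None = NA.
TABLE_2 = {
    "Q1": (86, 3), "Q2": (90, 4), "Q3": (97, 5), "Q4": (81, 3),
    "Q5": (None, 78), "Q6": (97, 60), "Q7": (91, 76), "Q8": (95, 94),
    "Q9": (92, 89), "Q10": (88, 11),
}


def shares_item(expert_items, chatgpt_items):
    """True if the respondent and Chat-GPT picked at least one common item."""
    return bool(set(expert_items) & set(chatgpt_items))


def total_agreement(expert_items, chatgpt_items):
    """True if the respondent and Chat-GPT picked exactly the same items."""
    return set(expert_items) == set(chatgpt_items)


def pct(count, total=N_RESPONDENTS):
    """Percentage rounded to one decimal, as printed in the table."""
    return round(100 * count / total, 1)


HIGH_AGREEMENT_CUTOFF = 75.0  # hypothetical threshold, for illustration only
high_agreement = [
    q for q, (_, total) in TABLE_2.items() if pct(total) > HIGH_AGREEMENT_CUTOFF
]

# Example with one respondent who chose C and D where Chat-GPT chose B and D.
print(shares_item({"C", "D"}, {"B", "D"}), total_agreement({"C", "D"}, {"B", "D"}))
print(high_agreement)
```

Under this assumed cut-off, the flagged questions are those in which total agreement exceeds three quarters of respondents.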
Table 3 Assessment provided by super-expert on Chat-GPT’s and experts’ consensual answers
Question | Super expert assessment of Chat-GPT’s answer (n [SD]) | Super expert assessment of experts’ consensual answer (n [SD]) | p-value (Student t test)
NS non-significant
Examining the data, we observe some key points. For questions Q1 and Q2 Super-experts rated Chat-GPT's responses lower than the experts' consensual answers, with means of 2.8 and 3.4 compared to 4.1 and 4.6, respectively. The p-values of 0.01 for both questions indicate a significant difference in these assessments. This suggests that while Chat-GPT provided responses that were generally aligned with expert consensus, super-experts found room for improvement in these particular cases. For question Q4: Super-experts rated Chat-GPT's response lower than experts' consensual answer, with a mean of 2 compared to 4.1. The p-value of 0.0003 indicates a significant difference in these assessments. This suggests that Chat-GPT struggled to align with expert consensus on this question, with room for improvement in its response quality.
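The comparisons above rest on Student t tests of mean Likert ratings. As a minimal sketch of that calculation, the Python code below computes a pooled-variance t statistic from summary statistics, using the overall means and standard deviations reported in the Results (4.07, SD 1.22 for Chat-GPT; 4.56, SD 0.78 for experts). The number of ratings per group is not given in this section, so the value of 90 is purely hypothetical, as are the function names; the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student (equal-variance) t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def normal_approx_p(t_value):
    """Two-sided p-value from the normal approximation (large df only)."""
    return math.erfc(abs(t_value) / math.sqrt(2))


N_RATINGS = 90  # hypothetical number of ratings per group, not from the study

t_stat, df = pooled_t_from_summary(4.07, 1.22, N_RATINGS, 4.56, 0.78, N_RATINGS)
p_approx = normal_approx_p(t_stat)
print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_approx:.4f}")
```

Because the number of ratings is assumed, the output is not expected to reproduce the reported p-values; the sketch only shows how a difference in means, its spread, and the sample size combine into the test statistic.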
Another aspect that warrants a more in-depth examination is the level of agreement among super-experts when assessing the responses provided by both experts and Chat-GPT. The degree of agreement was found to be intermediate for expert responses and low for ChatGPT responses. These findings underscore the intricate nature of managing obstructive sleep disorders, where numerous therapeutic choices exist, and there is a dearth of conclusive evidence in the literature to guide the selection of the optimal approach for a specific clinical presentation.
These results have several implications for the field of OSA diagnosis and treatment. Firstly, they highlight the potential of Chat-GPT as a valuable resource for general practitioners and medical specialists in the initial assessment of OSA cases. Chat-GPT's ability to provide accurate and consensus-driven responses can aid healthcare providers in making informed decisions and recommendations, especially in regions where access to sleep medicine specialists is limited.
Secondly, our study underscores the importance of collaboration between AI systems and human experts. While Chat-GPT can offer valuable insights, it should be seen as a complementary tool rather than a replacement for medical professionals [11–14]. Combining the strengths of AI, such as rapid data processing, with the clinical expertise of otolaryngologists can enhance the accuracy and efficiency of OSA diagnosis and management.
Finally, our findings contribute to the ongoing discourse in otolaryngology regarding OSA and the role of AI-generated content. By demonstrating the potential of Chat-GPT to align with expert opinions, this study encourages further research and development in AI-driven healthcare applications.
In conclusion, our study signifies the promise of AI, particularly Chat-GPT, in aiding healthcare professionals in the realm of OSA diagnosis. While Chat-GPT exhibits a commendable level of consensus with expert responses, the collaboration between AI and human experts is essential for optimal patient care. This research represents a significant step towards harnessing AI's capabilities to address the underdiagnosis and undertreatment of OSA, ultimately improving the health outcomes of affected individuals. Further investigations and refinements in AI-based healthcare tools hold great potential for the future of medicine.
Acknowledgements The authors would like to express their gratitude to the following super-experts for having participated in the study: Bhik Kotecha, Clemens Heiser, Nico De Vrie, Rodolfo Lugo Saldana, Joachim Maurer, Ofer Jacobowitz, Kenny Pang, Michel Cahali, Ewa Olszewska.
Ethics declaration The author Jerome R. Lechien is also guest editor of the special issue on ‘ChatGPT and Artificial Intelligence in Otolaryngology-Head and Neck Surgery’. He was not involved with the peer review process of this article.
References
1. Shen Y, Heacock L, Elias J et al (2023) ChatGPT and other large language models are double-edged swords. Radiology 307(2):e230163
2. Rajkomar A, Dean J, Kohane I (2019) Machine learning in medicine. N Engl J Med 380(14):1347–1358
3. Johnson D, Goodman R, Patrinely J et al (2023) Assessing the accuracy and reliability of AI-generated medical responses: an evaluation of the chat-GPT model. Res Sq. https://doi.org/10.21203/rs.3.rs-2566942/v1
4. Yeghiazarians Y, Jneid H, Tietjens JR et al (2021) Obstructive sleep apnea and cardiovascular disease: a scientific statement from the American Heart Association. Circulation 144(3):E56–E67
5. Peppard PE, Young T, Barnet JH et al (2013) Increased prevalence of sleep-disordered breathing in adults. Am J Epidemiol 177(9):1006–1014
6. Warrens MJ (2010) Inequalities between multi-rater kappas. Adv Data Anal Classif 4(4):271–286
7. Brown TB, Mann B, Ryder N et al (2020) Language models are few-shot learners. arXiv preprint arXiv:2005.14165
8. Radford A, Wu J, Child R et al (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9
9. Peppard PE, Young T, Palta M et al (2000) Prospective study of the association between sleep-disordered breathing and hypertension. N Engl J Med 342(19):1378–1384
10. Senaratna CV, Perret JL, Lodge CJ et al (2017) Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev 34:70–81
11. Lyons RJ, Arepalli SR, Fromal O et al (2023) Artificial intelligence chatbot performance in triage of ophthalmic conditions. Can J Ophthalmol. https://doi.org/10.1016/j.jcjo.2023.07.016
12. Xv Y, Peng C, Wei Z et al (2023) Can Chat-GPT a substitute for urological resident physician in diagnosing diseases?: a preliminary conclusion from an exploratory investigation. World J Urol 41(9):2569–2571
13. Chiesa-Estomba CM, Lechien JR, Vaira LA et al (2023) Exploring the potential of Chat-GPT as a supportive tool for sialendoscopy clinical decision making and patient information support. Eur Arch Otorhinolaryngol. https://doi.org/10.1007/s00405-023-08104-8
14. Chen S, Kann BH, Foote MB et al (2023) Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol 9(10):1459–1462
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Felipe Ahumada Mira1,13 · Valentin Favier2,13 · Heloisa dos Santos Sobreira Nunes3,13 · Joana Vaz de Castro4,13 ·
Florent Carsuzaa5,13 · Giuseppe Meccariello6 · Claudio Vicini6 · Andrea De Vito6 · Jerome R. Lechien7,13 ·
Carlos Chiesa Estomba8,13 · Antonino Maniaci9,13 · Giannicola Iannella10,13 · Eduardo Peña Rojas11 ·
Jenifer Barros Cornejo12 · Giovanni Cammaroto6,13
* Giovanni Cammaroto giovanni.cammaroto@hotmail.com
Felipe Ahumada Mira felipe.ahumada.m@gmail.com
Valentin Favier valentin_favier@hotmail.com
Heloisa dos Santos Sobreira Nunes helo2005@hotmail.com
Joana Vaz de Castro joanavazdecastro@gmail.com
Florent Carsuzaa florent.carsuzza@gmail.com
Jerome R. Lechien lechienj@gmail.com
Carlos Chiesa Estomba chiesaestomba86@gmail.com
Antonino Maniaci tnmaniaci29@gmail.com
Giannicola Iannella giannicola.iannella@uniroma1.it
1 ENT Department, Hospital of Linares, Linares, Chile
2 ENT Department, University Hospital of Montpellier, Montpellier, France
3 ENT and Sleep Medicine Department, Nucleus of Otolaryngology, Head and Neck Surgery and Sleep Medicine of São Paulo, São Paulo, Brazil
4 ENT Department, Armed Forces Hospital, Lisbon, Portugal
5 ENT Department, University Hospital of Poitiers, Poitiers, France
6 Head and Neck Department, ENT & Oral Surgery Unity, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121 Forlì, Italy
7 Division of Laryngology and Broncho‑Esophagology, Department of Otolaryngology and Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium
8 Department of Otorhinolaryngology, Biodonostia Research Institute, Donostia University Hospital, Osakidetza, 20014 San Sebastian, Spain
9 Department of Medical and Surgical Sciences and Advanced Technologies “GF Ingrassia”, ENT Section, University of Catania, Piazza Università 2, 95100 Catania, Italy
10 Department of ‘Organi di Senso’, University “Sapienza”, Viale Dell’Università 33, 00185 Rome, Italy
11 Clínica Lircay, Talca, Chile
12 Hospital Clínico UC Christus, Santiago, Chile
13 Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France