Neck dissection in head and neck surgery: An assessment of ChatGPT performance

² Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America

AIH, 025240053 https://doi.org/10.36922/AIH025240053

Received: 9 June 2025 | Revised: 1 September 2025 | Accepted: 11 September 2025 | Published online: 25 November 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )

Abstract

Artificial intelligence (AI) models such as chat generative pre-trained transformer (ChatGPT) are increasingly used to inform treatment-related decisions. Among otolaryngology subspecialties, there is a paucity of literature examining the role of ChatGPT within head and neck surgical oncology, and its utility in addressing questions related to surgically relevant anatomy and lymphadenectomy procedures remains poorly understood. The primary objective of this pilot study was to determine the reliability of ChatGPT in answering neck dissection-related inquiries compared to expert head and neck surgical oncologists. Five neck dissection-related questions were presented to ChatGPT v3.5. Three fellowship-trained head and neck surgeons compared the AI-generated responses to those of an expert head and neck surgeon. Raters, blinded to the author's identity, evaluated each response on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The median level of agreement across raters for the ChatGPT responses was 1.0 (interquartile range [IQR]: 1.0, 2.5; minimum = 1 and maximum = 4), while the median level of agreement across raters for the surgeon responses was 5.0 (IQR: 5.0, 5.0; minimum = 5 and maximum = 5). A Mann–Whitney U test comparing the level of agreement for ChatGPT and surgeon responses yielded p = 0.007. Raters showed minimal consistency when evaluating ChatGPT responses (intraclass correlation coefficient = 0.05; 95% confidence interval: 0.0–0.88), in contrast to the perfect agreement observed for the surgeon responses. In summary, ChatGPT is a promising tool in the acquisition of surgical knowledge. For neck dissection-related inquiries, however, a discrepancy exists between the reliability of ChatGPT-generated responses and surgeon expertise. Further refinement of AI models is needed to strengthen the utility of ChatGPT in head and neck oncologic surgery.
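
The comparison described in the abstract (median and IQR per group, followed by a Mann–Whitney U test on Likert ratings) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the ratings are made-up placeholders, not the study data, and the helper names (`midranks`, `mann_whitney_u`, `summarize`) are hypothetical. The p-value uses the normal approximation with a tie correction and no continuity correction, which may differ from the exact method used by the authors.

```python
import math
from statistics import median, quantiles


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())

    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every pooled value identical: no evidence of a shift
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p


def summarize(ratings):
    """Return (median, Q1, Q3) using inclusive quartiles."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q1, q3


# Hypothetical Likert ratings (1 = strongly disagree, 5 = strongly agree).
# These are illustrative placeholders, NOT the ratings collected in the study.
chatgpt_ratings = [1, 1, 1, 2, 4]
surgeon_ratings = [5, 5, 5, 5, 5]

u_stat, p_value = mann_whitney_u(chatgpt_ratings, surgeon_ratings)
print("ChatGPT (median, Q1, Q3):", summarize(chatgpt_ratings))
print("Surgeon (median, Q1, Q3):", summarize(surgeon_ratings))
print("U =", u_stat, "p =", round(p_value, 4))
```

A rank-based test suits this design because Likert ratings are ordinal and heavily tied. With samples this small, an exact test would usually be preferred to the normal approximation sketched here.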

Keywords

Artificial intelligence

Chat generative pre-trained transformer

Head and neck

Neck dissection

Lymphadenectomy

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing