DOI: 10.36922/AIH025240053
ORIGINAL RESEARCH ARTICLE

Neck dissection in head and neck surgery: An assessment of ChatGPT performance

Dustin A. Silverman1*, John S. Howard1, Priscilla F. A. Pichardo1, Yash J. Patil1, Mekibib Altaye2, Chad A. Zender1, Alice L. Tang1
1 Department of Otolaryngology, Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio, United States of America
2 Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
Received: 9 June 2025 | Revised: 1 September 2025 | Accepted: 11 September 2025 | Published online: 25 November 2025
© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )
Abstract

Artificial intelligence (AI) models such as Chat Generative Pre-trained Transformer (ChatGPT) are increasingly used to inform treatment-related decisions. Among otolaryngology subspecialties, little literature has examined the role of ChatGPT in head and neck surgical oncology, and its utility in addressing questions about surgically relevant anatomy and lymphadenectomy procedures remains poorly understood. The primary objective of this pilot study was to determine the reliability of ChatGPT in answering neck dissection-related inquiries compared to expert head and neck surgical oncologists. Five neck dissection-related questions were presented to ChatGPT v3.5. Three fellowship-trained head and neck surgeons compared the AI-generated responses with those of an expert head and neck surgeon. Raters, blinded to the author’s identity, scored each response on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The median level of agreement between raters for the ChatGPT responses was 1.0 (interquartile range [IQR]: 1.0, 2.5; minimum = 1 and maximum = 4), while the median level of agreement between raters for the surgeon responses was 5.0 (IQR: 5.0, 5.0; minimum = 5 and maximum = 5). The Mann–Whitney U test yielded p = 0.007 when comparing the level of agreement between ChatGPT and surgeon responses. Raters showed minimal consistency when evaluating ChatGPT responses (intraclass correlation coefficient = 0.05; 95% confidence interval: 0.0–0.88), in contrast to the perfect agreement observed for the surgeon responses. In summary, ChatGPT is a promising tool for acquiring surgical knowledge. For neck dissection-related inquiries, however, a discrepancy exists between the reliability of ChatGPT-generated responses and surgeon expertise. Further refinement of AI models is needed to strengthen the utility of ChatGPT in head and neck oncologic surgery.
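As a rough illustration of the summary statistics named in the abstract, the following sketch computes a median with quartiles and the Mann–Whitney U statistic for two small groups of Likert ratings. The ratings, the function names `median_iqr` and `mann_whitney_u`, and the choice of inclusive quartiles are all assumptions made for illustration; they are not the study's data or the authors' analysis code, and the p-value calculation is not reproduced here.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3); inclusive quartiles are an assumed convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic for sample x, using midranks for ties."""
    pooled = sorted(list(x) + list(y))
    # Give each distinct value the average of the 1-based ranks it occupies.
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_x = sum(midrank[v] for v in x)
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2


# Hypothetical Likert ratings (1-5) for illustration only, NOT the study's data.
chatbot_scores = [1, 1, 2, 3, 1]
surgeon_scores = [5, 5, 5, 5, 5]

print(median_iqr(chatbot_scores))
print(median_iqr(surgeon_scores))
print(mann_whitney_u(chatbot_scores, surgeon_scores))
```

With these made-up ratings the two groups do not overlap at all, so the U statistic for the lower-scoring group is 0, the most extreme value the test can take.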

Keywords
Artificial intelligence
Chat generative pre-trained transformer
Head and neck
Neck dissection
Lymphadenectomy
Funding
None.
Conflict of interest
The authors declare that they have no competing interests.
Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing