AccScience Publishing | Artificial Intelligence in Health (AIH) | Online First | DOI: 10.36922/AIH025400082
ORIGINAL RESEARCH ARTICLE

Beyond SMOTE: Evaluating large language models and mixture of experts for prediction of surgical site infections

Stephen Russell1*
1 Department of Intelligent Systems and Robotics, University of West Florida, Pensacola, Florida, United States of America
Received: 30 September 2025 | Revised: 23 October 2025 | Accepted: 24 October 2025 | Published online: 4 November 2025
© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )
Abstract

Handling severe class imbalance remains one of the most persistent barriers to deploying reliable artificial intelligence (AI) in healthcare. Conventional approaches such as the synthetic minority over-sampling technique (SMOTE) and other resampling strategies often inflate training performance but degrade under real-world distribution shift. We evaluated alternative modeling strategies, including classical machine learning, ensemble methods, imbalance-aware mixture of experts (MoE), and a fine-tuned large language model (LLM) (ModernBERT-large), for surgical site infection (SSI) prediction using structured electronic medical records. Across temporally shifted evaluation cohorts, the ModernBERT model consistently outperformed all baselines without synthetic oversampling or target ratio adjustments, achieving a Matthews correlation coefficient (MCC) of 0.71 versus 0.35 for the best SMOTE-resampled CatBoost model. In contrast, MoE architectures failed to deliver robustness gains, and resampled classical models deteriorated under distributional change. These results highlight a paradigm shift: pre-trained language models can serve as deployment-stable alternatives to synthetic imbalance correction in structured clinical prediction tasks. Beyond SSI, this finding underscores the potential of LLMs to improve the resilience of healthcare AI systems where minority-class prediction is critical.
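The headline comparison above is reported as a Matthews correlation coefficient rather than accuracy. The sketch below is a minimal pure-Python implementation of the standard MCC formula, included only to illustrate why MCC is informative under severe imbalance. MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). The cohort (1,000 patients, 3% SSI prevalence) and both "models" are hypothetical numbers invented for illustration, not the study's data, and the function names are illustrative rather than the author's code.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 marks the minority (SSI) class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal count is zero (a common convention), e.g. for
    a model that never predicts the minority class.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical cohort: 1,000 patients, 30 of whom develop an SSI (3% prevalence).
    y_true = [1] * 30 + [0] * 970

    # A degenerate model that always predicts "no infection".
    all_negative = [0] * 1000
    print(accuracy(y_true, all_negative), matthews_corrcoef(y_true, all_negative))

    # A hypothetical model that catches 20 of the 30 infections and raises 10 false alarms.
    some_skill = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 960
    print(round(accuracy(y_true, some_skill), 3), round(matthews_corrcoef(y_true, some_skill), 3))
```

Running the script prints `0.97 0.0` for the all-negative model and `0.98 0.656` for the second model. Accuracy barely separates the two because it is dominated by the 970 uninfected patients, whereas MCC uses all four cells of the confusion matrix and exposes the degenerate model as uninformative.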

Keywords
Imbalanced datasets
Artificial intelligence
Health informatics
Large language models
Machine learning
Minority class prediction
Mixture of Experts
Surgical site infection
Funding
None.
Conflict of interest
The author declares no conflict of interest.
Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing