AccScience Publishing / AIH / Online First / DOI: 10.36922/AIH025400082
ORIGINAL RESEARCH ARTICLE

Beyond SMOTE: Evaluating large language models and mixture of experts for prediction of surgical site infections

Stephen Russell1*
1 Department of Intelligent Systems and Robotics, University of West Florida, Pensacola, Florida, United States of America
Received: 30 September 2025 | Revised: 23 October 2025 | Accepted: 24 October 2025 | Published online: 4 November 2025
© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Abstract

Handling severe class imbalance remains one of the most persistent barriers to deploying reliable artificial intelligence (AI) in healthcare. Conventional approaches such as the synthetic minority over-sampling technique (SMOTE) and other resampling strategies often inflate training performance but degrade under real-world distribution shift. We evaluated alternative modeling strategies, including classical machine learning, ensemble methods, imbalance-aware mixture of experts (MoE), and a fine-tuned large language model (LLM) (ModernBERT-large), for surgical site infection (SSI) prediction using structured electronic medical records. Across temporally shifted evaluation cohorts, the ModernBERT model consistently outperformed all baselines without synthetic oversampling or target ratio adjustments, achieving a Matthews correlation coefficient (MCC) of 0.71 versus 0.35 for the best SMOTE-resampled CatBoost model. In contrast, MoE architectures failed to deliver robustness gains, and resampled classical models deteriorated under distributional change. These results highlight a paradigm shift: pre-trained language models can serve as deployment-stable alternatives to synthetic imbalance correction in structured clinical prediction tasks. Beyond SSI, this finding underscores the potential of LLMs to improve the resilience of healthcare AI systems where minority-class prediction is critical.
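
The abstract reports results as a Matthews correlation coefficient. As a minimal sketch of why that metric suits imbalanced prediction, the example below computes it from confusion-matrix counts alongside plain accuracy. The cohort size, prevalence, and both sets of counts are hypothetical illustrations and are not taken from the study; the function names `mcc` and `accuracy` are likewise assumptions made for this example.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical cohort: 1000 patients, 20 infections (2% prevalence).
# Model A predicts "no infection" for every patient.
always_negative = dict(tp=0, fp=0, fn=20, tn=980)
# Model B detects 15 of the 20 infections with 10 false alarms.
detector = dict(tp=15, fp=10, fn=5, tn=970)

for name, counts in [("always negative", always_negative), ("detector", detector)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
```

In this made-up cohort the two models have nearly identical accuracy, yet only the one that actually finds infections earns a nonzero coefficient, which is why the metric is often preferred when the minority class is the one that matters.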

Keywords
Imbalanced datasets
Artificial intelligence
Health informatics
Large language models
Machine learning
Minority class prediction
Mixture of Experts
Surgical site infection
Funding
None.
Conflict of interest
The author declares no conflict of interest.
Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing