Bridging clinical knowledge and machine learning: Leveraging large language models to predict in vitro fertilization outcomes

Introduction: In vitro fertilization (IVF) and frozen-thawed embryo transfer (FET) are vital components of assisted reproductive technology. However, predicting pregnancy outcomes remains challenging because of the many biological and clinical factors involved. Recent advances in artificial intelligence (AI) and machine learning (ML) have shown promise for forecasting reproductive success.

Objective: This study explores the use of a large language model, ChatGPT-4o, to optimize ML models for predicting pregnancy outcomes in IVF.

Methods: The clinical dataset comprised 1061 IVF patients who underwent FET from 2014 to 2017 and included variables such as age, body mass index, infertility duration, endometrial thickness, and serum beta-human chorionic gonadotrophin (β-HCG) level on the 7th day after FET. ChatGPT-4o was tasked with preprocessing the data, evaluating several ML models, and optimizing their performance.

Results: The random forest model performed best, achieving an area under the receiver operating characteristic curve of 0.8287 and, at the optimal classification threshold of 0.548, an accuracy of 85.45%, indicating strong predictive capability. Feature importance analysis identified serum β-HCG level on the 7th day after FET as the most influential predictor of pregnancy outcome. Despite these promising results, the study noted potential overfitting, likely due to the limited training dataset, a constraint largely attributable to the computational limitations of ChatGPT-4o.

Conclusion: ChatGPT-4o shows potential for enhancing ML models for IVF outcome prediction. While AI-driven models can substantially aid clinical decision-making, clinicians should retain a central role in predicting patient outcomes. Future work will focus on improving model generalization with larger datasets and greater computational resources.
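The abstract reports an AUC alongside an accuracy at a tuned threshold (0.548) but does not say how that threshold was selected. The sketch below, written in plain Python with no third-party libraries, assumes a Youden's J criterion (maximize sensitivity + specificity - 1). This is a common choice, but other criteria exist. The sketch also shows why the two metrics behave differently: AUC depends only on how the predicted probabilities rank positive cases against negative ones, so it is the same for every threshold, whereas accuracy changes with the cut-off. The function names and the eight-patient toy data are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation). The pairwise loop is
    O(n_pos * n_neg), which is acceptable for a cohort of about a thousand patients.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction classified correctly when score >= threshold is read as 'pregnant'."""
    correct = sum(int((s >= threshold) == bool(y)) for y, s in zip(y_true, scores))
    return correct / len(y_true)


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    Candidate thresholds are the observed scores; the first maximum wins on ties.
    Assumption: Youden's J is used here for illustration only.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Hypothetical outcomes (1 = clinical pregnancy) and model probabilities.
    y = [0, 0, 1, 1, 1, 0, 1, 0]
    p = [0.10, 0.40, 0.35, 0.80, 0.65, 0.55, 0.90, 0.20]

    t, j = youden_threshold(y, p)
    print(f"AUC = {roc_auc(y, p):.3f}")                      # AUC = 0.875 (threshold-free)
    print(f"threshold = {t:.2f}, J = {j:.2f}")               # threshold = 0.65, J = 0.75
    print(f"accuracy @ {t:.2f} = {accuracy_at(y, p, t):.3f}")  # accuracy @ 0.65 = 0.875
    print(f"accuracy @ 0.50 = {accuracy_at(y, p, 0.5):.3f}")   # accuracy @ 0.50 = 0.750
```

In practice, the threshold should be chosen on training or validation data and then applied unchanged to a held-out test set. Tuning it on the test set itself inflates the reported accuracy and can contribute to the kind of overfitting the study mentions.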