Bridging clinical knowledge and machine learning: Leveraging large language models to predict in vitro fertilization outcomes

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

Introduction: In vitro fertilization (IVF) and frozen-thawed embryo transfer (FET) are vital components of assisted reproductive technology. However, predicting pregnancy outcomes remains challenging because of the many biological and clinical factors involved. Recent advances in artificial intelligence (AI) and machine learning (ML) have shown potential for forecasting reproductive success.

Objective: This study explores the use of large language models, specifically ChatGPT-4o, to optimize ML models for predicting pregnancy outcomes in IVF.

Methods: The clinical dataset comprised 1061 IVF patients who underwent FET between 2014 and 2017 and included variables such as age, body mass index, infertility duration, endometrial thickness, and serum beta-human chorionic gonadotrophin (β-HCG) levels on the 7th day after FET. ChatGPT-4o was tasked with preprocessing the data, evaluating several ML models, and optimizing their performance.
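The abstract does not specify which preprocessing steps ChatGPT-4o applied. A common baseline for tabular clinical variables like these is median imputation of missing values followed by z-score standardization, with both statistics learned from the training split only. The minimal Python sketch below illustrates that baseline under stated assumptions: the field names (age, bmi, infertility_years, endometrium_mm, beta_hcg_day7) and the toy records are hypothetical and are not taken from the study.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional

# Hypothetical feature names; the study's actual column names are not reported.
FEATURES = ["age", "bmi", "infertility_years", "endometrium_mm", "beta_hcg_day7"]

Record = Dict[str, Optional[float]]
Params = Dict[str, Dict[str, float]]


def fit_preprocessor(train: List[Record]) -> Params:
    """Learn per-feature median (for imputation) and mean/std (for scaling)
    from the training records only, so no test-set information leaks in."""
    params: Params = {}
    for f in FEATURES:
        observed = [r[f] for r in train if r[f] is not None]
        med = median(observed)
        filled = [r[f] if r[f] is not None else med for r in train]
        sd = pstdev(filled) or 1.0  # avoid division by zero for a constant column
        params[f] = {"median": med, "mean": mean(filled), "std": sd}
    return params


def transform(records: List[Record], params: Params) -> List[List[float]]:
    """Fill missing values with the training median, then z-score each feature."""
    rows = []
    for r in records:
        row = []
        for f in FEATURES:
            p = params[f]
            x = r[f] if r[f] is not None else p["median"]
            row.append((x - p["mean"]) / p["std"])
        rows.append(row)
    return rows


# Toy, made-up records (None marks a missing value); not data from the study.
train = [
    {"age": 30, "bmi": 22.0, "infertility_years": 2, "endometrium_mm": 9.0, "beta_hcg_day7": 12.0},
    {"age": 35, "bmi": None, "infertility_years": 4, "endometrium_mm": 8.0, "beta_hcg_day7": 3.0},
    {"age": 40, "bmi": 26.0, "infertility_years": 6, "endometrium_mm": None, "beta_hcg_day7": 0.5},
]
params = fit_preprocessor(train)
X_train = transform(train, params)
```

Tree ensembles such as random forests are largely insensitive to feature scaling, so the standardization step mainly matters for scale-sensitive models such as logistic regression or support vector machines, if those were among the models compared. Test records would be passed through transform with the same params learned from the training split.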

Results: The random forest model was the best-performing model, achieving an accuracy of 85.45% and an area under the receiver operating characteristic curve of 0.8287 after applying the optimal threshold of 0.548, indicating strong predictive capability. Feature importance analysis showed that serum β-HCG levels on the 7th day after FET were the most influential predictor of pregnancy outcomes. Despite these promising results, the study noted potential overfitting, likely due to the small training dataset, a constraint largely attributable to the computational limitations of ChatGPT-4o.
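The abstract reports an optimal threshold of 0.548 but does not say how it was chosen. One common criterion is Youden's J statistic (sensitivity + specificity - 1), and the minimal Python sketch below assumes that criterion: it computes the ROC AUC from continuous predicted probabilities (the Mann-Whitney formulation), picks the threshold that maximizes J, and reports accuracy at that threshold. The labels and scores are made-up toy values, not the study's predictions. AUC is computed from the scores themselves and does not depend on any threshold, whereas accuracy does.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores higher than a
    random negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    where a case is predicted positive when score >= threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def accuracy_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of cases whose thresholded prediction matches the label."""
    correct = sum(1 for y, s in zip(y_true, scores) if (s >= threshold) == (y == 1))
    return correct / len(y_true)


# Toy labels (1 = pregnancy) and predicted probabilities; purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.55, 0.6, 0.3, 0.2, 0.4, 0.1]

auc = roc_auc(y_true, scores)
threshold, j = youden_threshold(y_true, scores)
acc = accuracy_at(y_true, scores, threshold)
```

On this toy data the sketch yields an AUC of 0.875, a threshold of 0.4 (J = 0.75), and an accuracy of 0.875 at that threshold. In practice the threshold would be selected on training or validation data and then applied unchanged to a held-out test set; tuning it on the test set itself gives an optimistic accuracy, which bears on the overfitting concern noted above.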

Conclusion: ChatGPT-4o shows potential for enhancing ML models for IVF outcome prediction. While AI-driven models can significantly aid clinical decision-making, clinicians should retain a central role in predicting patient outcomes. Future work will focus on improving model generalization with larger datasets and greater computational resources.

Keywords

In vitro fertilization

Pregnancy prediction

Large language models

Machine learning

Assisted reproductive technology

Funding

This work was funded by Heilongjiang Provincial Research Institutes Research Business Fund Project (CZKYF2024-1-B006).

Conflict of interest

The authors declare no conflicts of interest.

Eurasian Journal of Medicine and Oncology, Electronic ISSN: 2587-196X Print ISSN: 2587-2400, Published by AccScience Publishing