Leveraging convolutional neural networks to address overfitting and generalizability in automated bone fracture detection

¹ Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology at Arkansas, Arkansas State University, Jonesboro, Arkansas, United States of America

² Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, New York, United States of America

Global Translational Medicine 2025, 4(3), 83–95; https://doi.org/10.36922/gtm.8526

Received: 14 January 2025 | Revised: 21 March 2025 | Accepted: 12 August 2025 | Published online: 29 August 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )

Abstract

Bone fractures represent a significant health burden and demand precise, timely diagnosis to optimize patient outcomes. To address data scarcity, overfitting, and limited generalizability, this study investigates the use of convolutional neural networks (CNNs) for automated fracture detection in X-ray images. A dataset of 4,900 X-ray images was preprocessed and evenly divided into training, validation, and test subsets. The proposed CNN model targeted overfitting and generalizability by prioritizing training stability, using techniques such as batch normalization and dropout to stabilize training and mitigate overfitting. Five-fold cross-validation yielded an average accuracy of 95%. The model achieved accuracies of 95.8% on the validation set and 94.5% on the held-out test set, while external validation on an independent dataset reached 91.7%, supporting the model’s generalizability. High recall across all datasets underscores the model’s capacity to minimize missed fracture diagnoses, whereas slightly lower precision on external data indicates that false positives still need to be reduced. These findings suggest that artificial intelligence is best deployed as a screening tool: an initial triage mechanism that flags potential cases for further human-guided evaluation, enhancing clinical efficiency without replacing the diagnostic expertise of healthcare professionals.
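
The evaluation protocol summarized above (five-fold cross-validation, with recall and precision reported alongside accuracy) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `k_fold_indices` and `precision_recall` are hypothetical, the labels are invented toy values, and shuffling and stratification are omitted for brevity.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Folds are contiguous and differ in size by at most one sample.
    Shuffling and stratification are omitted to keep the sketch short.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if i < extra else 0)
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, val))
        start = stop
    return folds


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall), with label 1 = fracture and 0 = no fracture."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for illustration (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # no missed fractures, two false alarms
precision, recall = precision_recall(y_true, y_pred)
folds = k_fold_indices(len(y_true), k=5)
```

In this toy case every fracture is flagged (recall of 1.0) while two healthy images are flagged as well, which lowers precision. That is the trade-off the abstract describes for a screening tool: missed fractures are minimized at the cost of some false positives that a clinician then reviews.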

Keywords

Bone fracture detection

Artificial intelligence

Convolutional neural networks

Medical imaging

Deep learning

Funding

None.

Conflict of interest

The authors declare they have no competing interests.

Global Translational Medicine, Electronic ISSN: 2811-0021 Print ISSN: 3060-8600, Published by AccScience Publishing