Leveraging convolutional neural networks to address overfitting and generalizability in automated bone fracture detection

Bone fractures represent a significant health burden that demands precise and timely diagnosis to optimize patient outcomes. To address challenges such as data scarcity, overfitting, and limited generalizability, this study investigates the use of convolutional neural networks (CNNs) for automated fracture detection in X-ray images. A dataset of 4,900 X-ray images was preprocessed and evenly divided into training, validation, and test subsets. The proposed CNN model targeted overfitting and generalizability by prioritizing training stability, incorporating batch normalization and dropout to stabilize training and mitigate overfitting. Five-fold cross-validation yielded an average accuracy of 95%. The model achieved accuracies of 95.8% on the validation set and 94.5% on the held-out test set, and external validation on an independent dataset supported its generalizability with an accuracy of 91.7%. High recall across all datasets underscores the model’s capacity to minimize missed fracture diagnoses, whereas slightly lower precision on the external data indicates that false positives still need to be addressed. These findings suggest that artificial intelligence is best deployed as a screening tool: an initial triage mechanism that flags potential fractures for further evaluation by clinicians, thereby enhancing clinical efficiency without replacing the diagnostic expertise of healthcare professionals.
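The evaluation quantities named in the abstract (five-fold cross-validation, accuracy, precision, and recall) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `k_fold_indices` and `binary_metrics` are hypothetical, fracture is assumed to be the positive class (label 1), and the folds are plain shuffled splits rather than stratified ones.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: shuffle indices 0..n-1 once and cut them into k disjoint folds.

    Each fold is held out exactly once while the remaining k-1 folds form
    the training set. Returns a list of (train_indices, held_out_indices).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        held_out = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, held_out))
        start += size
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall (assumes 1 = fracture, 0 = no fracture)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy bone flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed fracture
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision drops as false positives accumulate.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall drops as fractures are missed (false negatives).
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, with ten hypothetical cases in which a screening model flags all four true fractures plus two healthy images, `binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0])` gives recall 1.0, precision of about 0.67, and accuracy 0.8: high recall bought at the cost of some false positives, which is the trade-off a triage tool accepts. For illustration, `k_fold_indices(4900)` produces five folds of 980 held-out indices each; averaging a metric over the five held-out folds gives a cross-validated estimate.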