EpilepsyLLM: Fine-tuning large language models for Japanese epilepsy knowledge representation

With massive training data and sufficient computing resources, large language models (LLMs) have demonstrated impressive capabilities. These models can rapidly respond to questions in almost all domains and are capable of retrieving, synthesizing, and summarizing information. Such capabilities can enhance daily life and foster innovation. Nonetheless, some professional domains demand not only fast responses but also highly reliable ones. In the medical domain, for example, unreliable information produced by a model poses a great risk to subsequent diagnosis and treatment, especially when the language is not English. Pre-trained LLMs can be refined with domain-specific knowledge to improve their performance on specialized tasks. In this study, we aimed to build an LLM for epilepsy, called EpilepsyLLM. We constructed a Japanese epilepsy knowledge dataset for fine-tuning, containing basic information on epilepsy, common treatment methods and drugs, and important notes on patients' daily lives. Using this dataset, we fine-tuned several different pre-trained models with supervised learning. In the evaluation, we applied multiple metrics to measure the reliability of the LLMs' output. The experimental results show that the fine-tuned EpilepsyLLM provides more reliable and specialized responses to epilepsy-related questions.
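The evaluation described above scores model output against reference answers with overlap-based metrics such as BLEU. As an illustration of how such a metric works (not the paper's exact evaluation pipeline), the following is a minimal, self-contained sketch of sentence-level BLEU: the geometric mean of modified n-gram precisions, scaled by a brevity penalty. Production evaluations would instead use an established implementation (e.g., NLTK or sacreBLEU) and a Japanese tokenizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU (illustrative only).

    Geometric mean of modified n-gram precisions up to max_n,
    multiplied by a brevity penalty that punishes candidates
    shorter than the reference.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)

    if min(precisions) == 0:
        return 0.0  # geometric mean collapses if any precision is zero

    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity_penalty * geo_mean
```

A candidate identical to its reference scores 1.0, while a shorter paraphrase that drops content words scores strictly between 0 and 1, which is the behavior that makes such metrics a rough proxy for response reliability.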