Accepted for/Published in: JMIR Medical Education
- Rongxi P, Weiru W, Xiaoli Z, Zhong Y, Youcai D
- Evaluating Large Language Models' Capabilities in Laboratory Hematology: A Case Study Based on the Chinese Medical Laboratory Technician Examination
- JMIR Medical Education
- DOI: 10.2196/11848
- PMID: 30303485
- PMCID: 6352016
Evaluating Large Language Models' Capabilities in Laboratory Hematology: A Case Study Based on the Chinese Medical Laboratory Technician Examination
Abstract
Background
Large language models (LLMs) have shown considerable promise in the medical field, yet their application in specialized areas such as laboratory hematology remains underexplored.
Objective
This study aims to evaluate the performance of two prominent LLMs, GPT-4o and Kimi, in laboratory hematology and explore their potential applications in Chinese medical education.
Methods
We selected 400 laboratory hematology questions from the Chinese Medical Laboratory Technician Examination (2015-2022), encompassing four subjects: basic knowledge, related professional knowledge, professional knowledge, and professional practice ability. GPT-4o and Kimi were tested using these questions combined with appropriate prompts, with each question administered twice independently. The accuracy and consistency of the models' responses were assessed by comparing them to standard answers, followed by statistical analysis.
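The two metrics described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that accuracy is the share of responses matching the standard answer with both administrations pooled, and that consistency is the share of questions on which the two administrations gave the same answer; the abstract does not state either definition explicitly. The `Item` class, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One exam question with its standard answer and two model responses."""
    key: str   # standard answer, e.g. "B"
    run1: str  # model's answer on the first administration
    run2: str  # model's answer on the second administration


def accuracy(items):
    """Share of responses matching the key, pooling both runs (assumed definition)."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum((it.run1 == it.key) + (it.run2 == it.key) for it in items)
    return correct / (2 * len(items))


def consistency(items):
    """Share of questions answered identically in both runs (assumed definition)."""
    if not items:
        raise ValueError("items must not be empty")
    return sum(it.run1 == it.run2 for it in items) / len(items)


# Made-up illustrative data, not taken from the study.
items = [
    Item(key="A", run1="A", run2="A"),  # correct twice, consistent
    Item(key="B", run1="B", run2="C"),  # correct once, inconsistent
    Item(key="C", run1="D", run2="D"),  # wrong twice, but consistent
    Item(key="D", run1="D", run2="D"),  # correct twice, consistent
]

print(f"accuracy    = {accuracy(items):.3f}")
print(f"consistency = {consistency(items):.3f}")
```

The third sample item shows why the two metrics are reported separately: a model can be consistent on a question while being wrong on it both times.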
Results
GPT-4o and Kimi achieved overall accuracy rates of 87.9% and 72.8%, respectively, with response consistencies of 93.0% and 83.5%. Both models demonstrated relatively weaker performance in the professional knowledge subject and in specific areas such as erythrocyte disorders and normal hematopoiesis. GPT-4o consistently outperformed Kimi across all evaluated aspects.
Conclusions
LLMs exhibit strong performance in laboratory hematology, despite certain limitations. These findings provide empirical evidence supporting the potential application of LLMs in Chinese medical education and highlight areas for future optimization and research.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.