JMIR Publications

Accepted for/Published in: JMIR Medical Education

Date Submitted: Oct 29, 2024

Open Peer Review Period: Oct 29, 2024 - Dec 24, 2024

Date Accepted: Not yet accepted

Date Submitted to PubMed: Not yet submitted to PubMed


Rongxi P, Weiru W, Xiaoli Z, Zhong Y, Youcai D
Evaluating Large Language Models' Capabilities in Laboratory Hematology: A Case Study Based on the Chinese Medical Laboratory Technician Examination
JMIR Medical Education
DOI: 10.2196/11848
PMID: 30303485
PMCID: 6352016


Evaluating Large Language Models' Capabilities in Laboratory Hematology: A Case Study Based on the Chinese Medical Laboratory Technician Examination

Abstract

Background

Large language models (LLMs) have shown considerable promise in the medical field, yet their application in specialized areas such as laboratory hematology remains underexplored.

Objective

This study aims to evaluate the performance of two prominent LLMs, GPT-4o and Kimi, in laboratory hematology and explore their potential applications in Chinese medical education.

Methods

We selected 400 laboratory hematology questions from the Chinese Medical Laboratory Technician Examination (2015-2022), encompassing four subjects: basic knowledge, related professional knowledge, professional knowledge, and professional practice ability. GPT-4o and Kimi were tested using these questions combined with appropriate prompts, with each question administered twice independently. The accuracy and consistency of the models' responses were assessed by comparing them to standard answers, followed by statistical analysis.
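The two metrics named above can be illustrated with a minimal sketch. The abstract does not give exact formulas, so this assumes that accuracy is the share of all responses (both runs pooled) matching the standard answer, and that consistency is the share of questions on which the two runs returned the same answer. The `QuestionResult` record, the function names, and the toy data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionResult:
    """Hypothetical record: one exam question answered in two independent runs."""
    standard_answer: str
    run1_answer: str
    run2_answer: str


def accuracy(results: List[QuestionResult]) -> float:
    """Assumed definition: share of all responses (both runs pooled) that are correct."""
    if not results:
        raise ValueError("no results to score")
    correct = sum(
        (r.run1_answer == r.standard_answer) + (r.run2_answer == r.standard_answer)
        for r in results
    )
    return correct / (2 * len(results))


def consistency(results: List[QuestionResult]) -> float:
    """Assumed definition: share of questions where both runs agree, right or wrong."""
    if not results:
        raise ValueError("no results to score")
    same = sum(r.run1_answer == r.run2_answer for r in results)
    return same / len(results)


# Toy data invented for illustration only; these are not the study's questions.
toy = [
    QuestionResult("A", "A", "A"),  # both runs correct and consistent
    QuestionResult("B", "B", "C"),  # one run correct, runs disagree
    QuestionResult("C", "D", "D"),  # both runs wrong but consistent
    QuestionResult("E", "E", "E"),  # both runs correct and consistent
]

print(f"accuracy:    {accuracy(toy):.1%}")
print(f"consistency: {consistency(toy):.1%}")
```

Under these assumed definitions a model can be consistent without being accurate (the third toy question), which is why the two figures are reported separately.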

Results

GPT-4o and Kimi achieved overall accuracy rates of 87.9% and 72.8%, respectively, with response consistencies of 93.0% and 83.5%. Both models demonstrated relatively weaker performance in the professional knowledge subject and in specific areas such as erythrocyte disorders and normal hematopoiesis. GPT-4o consistently outperformed Kimi across all evaluated aspects.

Conclusions

Both LLMs performed well on laboratory hematology examination questions, with GPT-4o more accurate and more consistent than Kimi, although each showed weaknesses in particular subjects and topic areas. These findings provide empirical evidence supporting the potential application of LLMs in Chinese medical education and highlight areas for future optimization and research.

As per the author’s request the PDF is not available.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Citation

Please cite as:

Rongxi P, Weiru W, Xiaoli Z, Zhong Y, Youcai D

Evaluating Large Language Models' Capabilities in Laboratory Hematology: A Case Study Based on the Chinese Medical Laboratory Technician Examination

JMIR Medical Education

DOI: 10.2196/14886

URL: https://preprints.jmir.org/preprint/11848

PMID: 31789598

PMCID: 6352016

