Search Articles

Search Results (1 to 10 of 65 Results)


A Practical Guide and Assessment on Using ChatGPT to Conduct Grounded Theory: Tutorial

Reliability is a key metric for evaluating ChatGPT’s coding performance, assessing whether different coders (human or machine) produce similar results [9-11]. Typical measures include percent agreement (the share of items that the human and artificial intelligence [AI] coders code identically) and the κ coefficient (which measures agreement beyond chance) [12,13].

Yongjie Yue, Dong Liu, Yilin Lv, Junyi Hao, Peixuan Cui

J Med Internet Res 2025;27:e70122
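
The two reliability measures named in the excerpt above can be illustrated with a minimal sketch. It assumes two coders and Cohen's κ (the excerpt does not say which κ variant the article uses), and the function names and sample labels are hypothetical, not taken from the article.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)  # observed agreement
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if each coder assigned codes independently
    # at their own observed category rates.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both coders used a single identical category; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for eight text segments.
human = ["A", "A", "B", "B", "A", "B", "A", "A"]
ai = ["A", "B", "B", "B", "A", "A", "A", "A"]

print(percent_agreement(human, ai))  # 6 of 8 items match
print(cohen_kappa(human, ai))
```

In this made-up example the coders agree on 75% of items, yet κ is noticeably lower, because both coders use code "A" most of the time and would often agree by chance alone.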

Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis

Recent studies have extensively evaluated the performance of generative AI models in medical question-answering scenarios. These models have shown promising results in medical licensing examinations [4,5], clinical case discussions, and diagnostic reasoning [6,7]. However, their performance varies significantly based on task complexity.

Mélanie Suppan, Pietro Elias Fubini, Alexandra Stefani, Mia Gisselbaek, Caroline Flora Samer, Georges Louis Savoldelli

JMIR AI 2025;4:e66796

The Applications of Large Language Models in Mental Health: Scoping Review

To evaluate the performance of these LLMs, most studies used various metrics, such as F1-score (54/95, 57%), precision (34/95, 36%), accuracy (45/95, 47%), and recall (32/95, 34%). The number of studies mapped by country is presented in Figure 3A (a higher-resolution version of the figure is also available in Multimedia Appendix 5).

Yu Jin, Jiayi Liu, Pan Li, Baosen Wang, Yangxinyu Yan, Huilin Zhang, Chenhao Ni, Jing Wang, Yi Li, Yajun Bu, Yuanyuan Wang

J Med Internet Res 2025;27:e69284
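
The four metrics listed in the excerpt above have standard definitions for binary classification, sketched below. The function name, the choice of label 1 as the positive class, and the sample labels are all assumptions made for illustration; none come from the review.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score, and accuracy for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    # Fall back to 0.0 when a denominator is zero (a common convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical ground-truth and predicted labels for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here precision (3/5) and recall (3/4) differ because the hypothetical model produces more false positives than false negatives; F1-score is their harmonic mean.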

Encouraging the Voluntary Mobilization of Mental Resources by Manipulating Task Design: Explorative Study

Thus, in this context, mental effort should be considered together with the corresponding perceived performance and frustration when performing the task. Positive mental effort should be accompanied by high perceived performance and low frustration. To make the notion of positive mental effort operational, we return to the definition of this concept, specifying that it is part of the MWL construct.

Lina-Estelle Louis, Saïd Moussaoui, Sébastien Ravoux, Isabelle Milleville-Pennel

JMIR Form Res 2025;9:e63491

Comparison of an Emergency Medicine Asynchronous Learning Platform Usage Before and During the COVID-19 Pandemic: Retrospective Analysis Study

Previous studies have demonstrated that podcasts have positive effects on knowledge retention and test performance [5,6]. Multiple studies have been published on the effectiveness of remote learning and web-based modules during the COVID-19 pandemic [7,8]. Most recently, 1 study aimed to measure podcast and blog utilization during the early months of the COVID-19 pandemic [9].

Blake Briggs, Madhuri Mulekar, Hannah Morales, Iltifat Husain

JMIR Med Educ 2025;11:e58100

Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study

Model performance is based on training datasets. The incorporation of AI into traditional health care and medical education has had a substantial impact on medical practices [3]. It has accelerated diagnostic processes in radiography [4], pathology, endoscopy, and ultrasonography, has improved clinical decision-making, and has decreased the workloads of health care personnel. AI has had an impact on pharmaceutical development and management and medical education, resulting in a new paradigm [5].

Naritsaret Kaewboonlert, Jiraphon Poontananggul, Natthipong Pongsuwan, Gun Bhakdisongkhram

JMIR Med Educ 2025;11:e58898

Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis

However, current research shows that chatbot performance is uneven. In some areas or tasks, chatbots reach an accuracy or satisfaction rate of more than 90% and can even outperform some doctors; in other tasks, the answers provided are invalid or even wrong [13,14]. Performance also differs among models and is affected by many factors, such as language, question type, and topic [15].

Yong Zhang, Xiao Lu, Yan Luo, Ying Zhu, Wenwu Ling

JMIR Med Inform 2025;13:e63924

Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study

This study was conducted in 2023, comparing AI model performance against medical student performance data from March 21, 2023 (German questions) and June 16, 2023 (English questions). Error bars represent 95% CIs of the mean. Sample sizes (n) are provided for each group. AI: artificial intelligence.

Jonas Roos, Ron Martin, Robert Kaczmarczyk

JMIR Form Res 2024;8:e57592

Electronic Health Record Data Quality and Performance Assessments: Scoping Review

Titles and abstracts were screened to include original research articles assessing the data quality (DQ) and performance of all or part of a hospital’s electronic health record (EHR) system. We looked for studies reporting on 1 or more aspects of DQ (the assessment of EHR data without consideration of follow-up actions) and data performance (the assessment of EHR data applications) as defined in Table 1 (data quality and performance indicator definitions, mitigation strategies, and references).

Yordan P Penev, Timothy R Buchanan, Matthew M Ruppert, Michelle Liu, Ramin Shekouhi, Ziyuan Guan, Jeremy Balch, Tezcan Ozrazgat-Baslanti, Benjamin Shickel, Tyler J Loftus, Azra Bihorac

JMIR Med Inform 2024;12:e58130

Development and Validation of a Computed Tomography–Based Model for Noninvasive Prediction of the T Stage in Gastric Cancer: Multicenter Retrospective Study

In addition, the feasibility and promising performance of machine learning approaches in assessing T staging in lung cancer have been demonstrated [14]. However, few studies have reported the combination of deep learning and radiomics in predicting T staging in gastric cancer (GC).

Jin Tao, Dan Liu, Fu-Bi Hu, Xiao Zhang, Hongkun Yin, Huiling Zhang, Kai Zhang, Zixing Huang, Kun Yang

J Med Internet Res 2024;26:e56851