Search Articles

Search Results (1 to 9 of 9 Results)

Comparative Evaluation of a Medical Large Language Model in Answering Real-World Radiation Oncology Questions: Multicenter Observational Study

Benchmarking and evaluation studies of LLMs and other forms of generative AI in medicine are of increasing relevance [21]. They are essential to ensure a responsible implementation of these systems in the clinical environment. Regardless of the current uncertainties, LLMs are already frequently used both by clinicians and patients [22]. These systems typically did not undergo a medicine-specific quality assurance process, nor did they receive formal approval as a medical device.

Fabio Dennstädt, Max Schmerder, Elena Riggenbach, Lucas Mose, Katarina Bryjova, Nicolas Bachmann, Paul-Henry Mackeprang, Maiwand Ahmadsei, Dubravko Sinovcic, Paul Windisch, Daniel Zwahlen, Susanne Rogers, Oliver Riesterer, Martin Maffei, Eleni Gkika, Hathal Haddad, Jan Peeken, Paul Martin Putora, Markus Glatzer, Florian Putz, Daniel Hoefler, Sebastian M Christ, Irina Filchenko, Janna Hastings, Roberto Gaio, Lawrence Chiang, Daniel M Aebersold, Nikola Cihoric

J Med Internet Res 2025;27:e69752


Toward Clinical Generative AI: Conceptual Framework

Comprehensive evaluation; real-world relevance; assessment of contextual understanding and probabilistic reasoning. Complex to design; resource intensive; potential bias in test creation. Real-world applicability; evidence-based evaluation; objective benchmarking. Dependent on data quality; historical bias; may not capture AI’s potential for novel insights. Leverages human expertise; valuable in complex cases; incorporates ethical judgment. Subjective; time-consuming; potential for expert bias. Comprehensive evaluation from

Nicola Luigi Bragazzi, Sergio Garbarino

JMIR AI 2024;3:e55957


How Health Care Organizations Approach Social Media Measurement: Qualitative Study

Three types of benchmarks were apparent in social media measurement: personal benchmarking, comparative benchmarking, and metric benchmarking. At least one participant alluded to using personal benchmarks to track progress, evaluate performance, and determine areas for improvement. This study conceptualized personal benchmarking as using self-set targets to evaluate social media performance.

Chukwuma Ukoha

JMIR Form Res 2020;4(8):e18518


Value of Eye-Tracking Data for Classification of Information Processing–Intensive Handling Tasks: Quasi-Experimental Study on Cognition and User Interface Design

Based on the results of this study, benchmarking D1 and D2 showed the following. Inserting the material seemed to be challenging for both UI designs in general. Therefore, the guiding material (manual and quick starting guide) and training should focus on this task. The lever of D1 seemed to result in lower mental workload. It has a more dominant appearance compared with D2, where the lever is integrated into the housing for protection in case of a fall.

Stephan Wegner, Quentin Lohmeyer, Dimitri Wahlen, Sandra Neumann, Jean-Claude Groebli, Mirko Meboldt

JMIR Hum Factors 2020;7(2):e15581


Internet-Based Cognitive Therapy for Social Anxiety Disorder in Hong Kong: Therapist Training and Dissemination Case Series

Benchmarking against the CTCS-SP, these scores therefore indicate a level of competency well above a 50% minimum standard. It should be noted that this assessment was completed following Phase 2 of the training, before the therapists’ pilot cases; this means they had further opportunities to practice and consolidate their skills in Phase 3. All patients showed excellent adherence. Each patient completed all of the core treatment modules.

Graham R Thew, Candice LYM Powell, Amy PL Kwok, Mandy H Lissillour Chan, Jennifer Wild, Emma Warnock-Parkes, Patrick WL Leung, David M Clark

JMIR Form Res 2019;3(2):e13446