Performance of DeepSeek-R1, ChatGPT (GPT-o3-mini), and Gemini 2.0 Flash on German Medical Multiple-Choice Questions: Comparative Evaluation

doi:10.2196/77357

Published on 18.Dec.2025 in Vol 9 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/77357, first published 12.May.2025.


Annika Meyer¹; Yassin Karay²; Andrea U Steinbicker¹; Thomas Streichert³; Remco Overbeek¹


Annika Meyer¹, Dr med; Yassin Karay², Dr rer med; Andrea U Steinbicker¹, Prof Dr Med; Thomas Streichert³, Prof Dr Med; Remco Overbeek¹, Dr med

¹ Department of Anesthesiology and Operative Intensive Care, Faculty of Medicine and University Hospital, University Hospital Cologne, Cologne, Germany

² Dean’s Office for Student Affairs, Faculty of Medicine, University Hospital Cologne, Cologne, Germany

³ Institute for Clinical Chemistry, Faculty of Medicine and University Hospital, University Hospital Cologne, Cologne, Germany

Corresponding Author:

Annika Meyer, Dr med
Department of Anesthesiology and Operative Intensive Care
Faculty of Medicine and University Hospital
University Hospital Cologne
Kerpener Str. 62
Cologne 50937
Germany
Phone: 1 0000000000
Email: annika.meyer1@uk-koeln.de

