Published in Vol 8 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/59267.
Evaluating ChatGPT-4’s Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases

Journals

  1. Dalal A, Plombon S, Konieczny K, Motta-Calderon D, Malik M, Garber A, Lam A, Piniella N, Leeson M, Garabedian P, Goyal A, Roulier S, Yoon C, Fiskio J, Schnock K, Rozenblum R, Griffin J, Schnipper J, Lipsitz S, Bates D. Adverse diagnostic events in hospitalised patients: a single-centre, retrospective cohort study. BMJ Quality & Safety 2025;34(6):377
  2. Barabucci G, Shia V, Chu E, Harack B, Laskowski K, Fu N. Combining Multiple Large Language Models Improves Diagnostic Accuracy. NEJM AI 2024;1(11)
  3. Sanchez Tena M, Alvarez‐Peregrina C, Martinez‐Perez C. Evaluation of the perception of information from ChatGPT in myopia education: Perspectives of students and professionals. Ophthalmic and Physiological Optics 2025;45(3):883
  4. Padovan M, Palla A, Marino R, Porciatti F, Cosci B, Carlucci F, Nerli G, Petillo A, Necciari G, Dell’Amico L, Lucisano V, Scarinci S, Foddis R. ChatGPT-4 vs. Google Bard: Which Chatbot Better Understands the Italian Legislative Framework for Worker Health and Safety? Applied Sciences 2025;15(3):1508
  5. Saglam S, Uludag V, Karaduman Z, Arıcan M, Yücel M, Dalaslan R. Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study. BMC Medical Informatics and Decision Making 2025;25(1)
  6. Sarı M, Tufenkci P. Evaluation of the Competency of Large Language Models GPT-4o and Claude 3.5 Sonnet in Endodontic Emergencies. European Annals of Dental Sciences 2025;52(1):10
  7. Bolgova O, Ganguly P, Mavrych V. Comparative analysis of LLMs performance in medical embryology: A cross‐platform study of ChatGPT, Claude, Gemini, and Copilot. Anatomical Sciences Education 2025;18(7):718
  8. Bridges J, Jiang X, Ige M, Toyobo O. Computerized diagnostic decision support systems—Isabel Pro versus ChatGPT-4 part II. JAMIA Open 2025;8(3)
  9. Zhang A, Chen J. AI-driven network biology identifies SRC as a therapeutic target in metastatic pancreatic adenocarcinoma. Intelligent Oncology 2025;1(3):233
  10. Gün M. Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation. BMC Emergency Medicine 2025;25(1)
  11. Umman V, Tosun B, Uygur A, Emre S. Evaluation of the Effectiveness, Safety, and Patient Satisfaction of Artificial Intelligence-Based Patient Education and Counseling for Both Recipients and Donors in the Preoperative and Postoperative Phases of Organ Transplantation. Transplantation Proceedings 2025;57(9):1832
  12. Wu X, Huang Y, He Q. Diagnostic performance of newly developed large language models in critical illness cases: A comparative study. International Journal of Medical Informatics 2025;204:106088
  13. Sarvari P, Al-fagih Z. Rapidly Benchmarking Large Language Models for Diagnosing Comorbid Patients: Comparative Study Leveraging the LLM-as-a-Judge Method. JMIRx Med 2025;6:e67661
  14. Cotfas L, Sandu A, Delcea C, Diaconu P, Frăsineanu C, Stănescu A. From Transformers to ChatGPT: An Analysis of Large Language Models Research. IEEE Access 2025;13:146889
  15. Günay Polatkan Ş, Sığırlı D, Durak V, Alak Ç, Kan I. Performance of Generative AI Models on Cardiology Practice in Emergency Service: A Pilot Evaluation of GPT-4.o and Gemini-1.5-Flash. Uludağ Üniversitesi Tıp Fakültesi Dergisi 2025;51(2):239
  16. Shimizu T, Hautz W, van Sassen C, Zwaan L. The global progress for improving diagnosis: what we’ve learned, what comes next. Diagnosis 2025;12(4):529
  17. Chen G, Lin C, Zhang L, Luo Z, Shin Y, Li X. Virtual case reasoning and AI-assisted diagnostic instruction: an empirical study based on body interact and large language models. BMC Medical Education 2025;25(1)
  18. Hou C, Zhang H, Zhao P, Lu J, Geng J, Li H, Sun X, He T, Zhang H, Tang Y, Zhang L, Xi Y, Li C, Gao C, Lu X. DeepSeek R1 excels in diagnosing previously misdiagnosed cases. Array 2025;28:100559
  19. Sales A, Gizaw C, Beck J, Grauvogel J. Evaluating Large Language Models in Interpreting MRI Reports and Recommending Treatment for Vestibular Schwannoma. Diagnostics 2025;15(22):2841
  20. Schroeder A, Tran Z, Sexton K, Salzberg A. Clinician’s Guide to Artificial Intelligence. Medical Clinics of North America 2025

Books/Policy Documents

  1. Hirosawa T. Artificial Intelligence in Medical Diagnostics.