@Article{info:doi/10.2196/64986, author="M'gadzah, Shona Alex Tapiwa and O'Malley, Andrew", title="Enhancing Diagnostic Accuracy of Ophthalmological Conditions With Complex Prompts in GPT-4: Comparative Analysis of Global and Low- and Middle-Income Country (LMIC)--Specific Pathologies", journal="JMIR Form Res", year="2025", month="Jun", day="30", volume="9", pages="e64986", keywords="artificial intelligence; AI; ophthalmology; clinical diagnostics; medical technology; data project; complex prompt; diagnostic accuracy; ophthalmological conditions; ophthalmological disorder; eyes; blindness; low- and middle-income countries; LMIC; low-income or middle-income economies; health care; LLMs; NLP; machine learning; statistical analysis; GPT-4", abstract="Background: The global incidence of blindness has continued to increase, despite the enactment of a Global Eye Health Action Plan by the World Health Assembly. This can be attributed, in part, to an aging population, but also to the limited diagnostic resources within low- and middle-income countries (LMICs). The advent of generative artificial intelligence (AI) within health care could offer a novel means of combating the prevalence of blindness globally. Objective: The objectives of this study are to quantify the effect of adding a complex prompt on the diagnostic accuracy of a commercially available large language model (LLM) and to assess whether such LLMs are better or worse at diagnosing conditions that are more prevalent in LMICs. Methods: Ten clinical vignettes representing globally prevalent and LMIC-prevalent ophthalmological conditions were presented to GPT-4-0125-preview using simple and complex prompts. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated. Statistical comparison between prompts was conducted using a chi-square test of independence.
Results: The complex prompt achieved a higher diagnostic accuracy (90.1{\%}) than the simple prompt (60.4{\%}), a statistically significant difference ($\chi^2$=428.86; P<.001). Sensitivity, specificity, PPV, and NPV were consistently improved for most conditions with the complex prompt. The simple prompt struggled with LMIC-prevalent conditions, diagnosing only 1 of 5 accurately, while the complex prompt successfully diagnosed 4 of 5. Conclusions: The study found that, overall, including a complex prompt improved the diagnostic accuracy of GPT-4-0125-preview, particularly for LMIC-prevalent conditions. This highlights the potential for LLMs, when appropriately tailored, to support clinicians in diverse health care settings. Future research should explore the generalizability of these findings across other models and specialties.", issn="2561-326X", doi="10.2196/64986", url="https://formative.jmir.org/2025/1/e64986" }