%0 Journal Article
%@ 2561-326X
%I JMIR Publications
%V 8
%N
%P e46800
%T Assessing ChatGPT’s Capability for Multiple Choice Questions Using RaschOnline: Observational Study
%A Chow,Julie Chi
%A Cheng,Teng Yun
%A Chien,Tsair-Wei
%A Chou,Willy
%+ Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, No. 901, Chung Hwa Road, Yung Kung District, Tainan, 710, Taiwan, 886 937399106, smilewilly@mail.chimei.org.tw
%K RaschOnline
%K ChatGPT
%K multiple choice questions
%K differential item functioning
%K Wright map
%K KIDMAP
%K website tool
%K evaluation tool
%K tool
%K application
%K artificial intelligence
%K scoring
%K testing
%K college
%K students
%D 2024
%7 8.8.2024
%9 Original Paper
%J JMIR Form Res
%G English
%X Background: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT’s competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a web-based visual report for evaluating performance in MCQ answering. Objective: This study aims to (1) showcase the utility of the website tool (Rasch analysis, specifically RaschOnline) and (2) determine the grade achieved by ChatGPT when compared with a normally distributed simulated sample. Methods: ChatGPT’s capability was evaluated using 10 items from the English tests of the 2023 Taiwan college entrance examinations. Under a Rasch model, responses from 300 students with normally distributed abilities were simulated for comparison against ChatGPT’s responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, item characteristic curves, a Wright map, and a KIDMAP, to address the research objectives. Results: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easiest to hardest, with logit values of –2.43, –1.78, –1.48, –0.64, –0.10, 0.33, 0.59, 1.34, 1.70, and 2.47; (2) evidence of differential item functioning was observed between gender groups for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square values below the threshold of 1.5; (5) no significant difference was found in the measures obtained between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT’s capability was graded A, surpassing grades B to E. Conclusions: Using RaschOnline, this study provides evidence that ChatGPT can achieve grade A when compared with a normally distributed simulated sample. It exhibits excellent proficiency in answering MCQs from the English tests of the 2023 Taiwan college entrance examinations.
%M 39115919
%R 10.2196/46800
%U https://formative.jmir.org/2024/1/e46800
%U https://doi.org/10.2196/46800
%U http://www.ncbi.nlm.nih.gov/pubmed/39115919
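
For readers who want to see the simulation step described in the abstract, the following is a minimal Python sketch, not the authors' code: the N(0, 1) ability distribution, the fixed random seed, and ChatGPT's raw score of 9 of 10 are illustrative assumptions; only the 10 item difficulties (in logits) are taken from the reported Results.

import numpy as np

rng = np.random.default_rng(42)  # fixed seed, an illustrative assumption

# Item difficulties in logits, as reported in the Results
difficulties = np.array([-2.43, -1.78, -1.48, -0.64, -0.10,
                          0.33, 0.59, 1.34, 1.70, 2.47])

# 300 simulated examinees with normally distributed abilities (assumed N(0, 1))
abilities = rng.normal(loc=0.0, scale=1.0, size=300)

# Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))
probs = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
responses = rng.random(probs.shape) < probs
raw_scores = responses.sum(axis=1)

# Hypothetical raw score for ChatGPT (the paper reports a grade A, not a raw score)
chatgpt_score = 9
pct_below = (raw_scores < chatgpt_score).mean() * 100
print(f"A score of {chatgpt_score}/10 exceeds {pct_below:.1f}% of simulated examinees")

The percentile printed here is only a stand-in for the paper's A-to-E grading; the study itself presents its results through RaschOnline's visual outputs, including the Wright map and KIDMAP.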