%0 Journal Article
%@ 2561-326X
%I JMIR Publications
%V 8
%N
%P e46800
%T Assessing ChatGPT’s Capability for Multiple Choice Questions Using RaschOnline: Observational Study
%A Chow,Julie Chi
%A Cheng,Teng Yun
%A Chien,Tsair-Wei
%A Chou,Willy
%+ Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, No. 901, Chung Hwa Road, Yung Kung District, Tainan, 710, Taiwan, 886 937399106, smilewilly@mail.chimei.org.tw
%K RaschOnline
%K ChatGPT
%K multiple choice questions
%K differential item functioning
%K Wright map
%K KIDMAP
%K website tool
%K evaluation tool
%K tool
%K application
%K artificial intelligence
%K scoring
%K testing
%K college
%K students
%D 2024
%7 8.8.2024
%9 Original Paper
%J JMIR Form Res
%G English
%X Background: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT’s competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a web-based visual report for evaluating performance in MCQ answering. Objective: This study aims to (1) showcase the utility of the website tool (Rasch analysis, specifically RaschOnline) and (2) determine the grade achieved by ChatGPT when compared with a normally distributed simulated sample. Methods: ChatGPT’s capability was evaluated using 10 items from the English tests of the 2023 Taiwan college entrance examinations. Under a Rasch model, responses from 300 students with normally distributed abilities were simulated for comparison against ChatGPT’s responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, item characteristic curves, a Wright map, and a KIDMAP, to address the research objectives. Results: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easiest to hardest, with logit values of –2.43, –1.78, –1.48, –0.64, –0.10, 0.33, 0.59, 1.34, 1.70, and 2.47; (2) evidence of differential item functioning was observed between gender groups for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square values below the threshold of 1.5; (5) no significant difference was found in the measures obtained between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT’s capability was graded A, surpassing grades B to E. Conclusions: Using RaschOnline, this study provides evidence that ChatGPT can achieve grade A when compared with a normally distributed simulated sample. It exhibits excellent proficiency in answering MCQs from the English tests of the 2023 Taiwan college entrance examinations.
%M 39115919
%R 10.2196/46800
%U https://formative.jmir.org/2024/1/e46800
%U https://doi.org/10.2196/46800
%U http://www.ncbi.nlm.nih.gov/pubmed/39115919
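
For readers who want to see the simulation step described in the abstract, the following is a minimal Python sketch, not the authors' code: the N(0, 1) ability distribution, the fixed random seed, and ChatGPT's raw score of 9 of 10 are illustrative assumptions; only the 10 item difficulties (in logits) are taken from the reported Results.

import numpy as np

rng = np.random.default_rng(42)  # fixed seed, an illustrative assumption

# Item difficulties in logits, as reported in the Results
difficulties = np.array([-2.43, -1.78, -1.48, -0.64, -0.10,
                          0.33, 0.59, 1.34, 1.70, 2.47])

# 300 simulated examinees with normally distributed abilities (assumed N(0, 1))
abilities = rng.normal(loc=0.0, scale=1.0, size=300)

# Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))
probs = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
responses = rng.random(probs.shape) < probs
raw_scores = responses.sum(axis=1)

# Hypothetical raw score for ChatGPT (the paper reports a grade A, not a raw score)
chatgpt_score = 9
pct_below = (raw_scores < chatgpt_score).mean() * 100
print(f"A score of {chatgpt_score}/10 exceeds {pct_below:.1f}% of simulated examinees")

The percentile printed here is only a stand-in for the paper's A-to-E grading; the study itself presents its results through RaschOnline's visual outputs, including the Wright map and KIDMAP.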