TY - JOUR
AU - Chow, Julie Chi
AU - Cheng, Teng Yun
AU - Chien, Tsair-Wei
AU - Chou, Willy
PY - 2024
DA - 2024/08/08
TI - Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study
JO - JMIR Form Res
SP - e46800
VL - 8
KW - RaschOnline
KW - ChatGPT
KW - multiple choice questions
KW - differential item functioning
KW - Wright map
KW - KIDMAP
KW - website tool
KW - evaluation tool
KW - tool
KW - application
KW - artificial intelligence
KW - scoring
KW - testing
KW - college
KW - students
AB - Background: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT's competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a web-based visualization for evaluating MCQ performance. Objective: This study aims to (1) showcase the utility of the web-based Rasch analysis tool RaschOnline, and (2) determine the grade achieved by ChatGPT when compared with a normally distributed sample. Methods: ChatGPT's capability was evaluated using 10 items from the English test of the 2023 Taiwan college entrance examination. Under a Rasch model, responses from 300 students with normally distributed abilities were simulated and compared with ChatGPT's responses. RaschOnline was used to generate 5 visual presentations (item difficulties, differential item functioning, item characteristic curves, a Wright map, and a KIDMAP) to address the research objectives. Results: The findings revealed the following: (1) the difficulties of the 10 items increased monotonically from easiest to hardest, with logits of –2.43, –1.78, –1.48, –0.64, –0.1, 0.33, 0.59, 1.34, 1.7, and 2.47; (2) evidence of differential item functioning between gender groups was observed for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, with infit mean square errors below the threshold of 1.5; (5) no significant difference was found in the measures between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT's capability was graded as A, surpassing grades B to E. Conclusions: Using RaschOnline, this study provides evidence that ChatGPT can achieve a grade of A when compared with a normally distributed sample. It exhibits excellent proficiency in answering MCQs from the English test of the 2023 Taiwan college entrance examination.
SN - 2561-326X
UR - https://formative.jmir.org/2024/1/e46800
UR - https://doi.org/10.2196/46800
UR - http://www.ncbi.nlm.nih.gov/pubmed/39115919
DO - 10.2196/46800
ID - info:doi/10.2196/46800
ER -
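
Note: As a minimal sketch of the simulation described in the abstract, the Python snippet below generates dichotomous responses for 300 students with normally distributed abilities under the Rasch model, using the 10 item difficulty logits reported in the Results. The ChatGPT raw score of 10/10 is a hypothetical assumption (the abstract does not report the response pattern), chosen only to illustrate how a score can be compared against the simulated sample; the study's actual grading used RaschOnline's KIDMAP, not this code.

import numpy as np

rng = np.random.default_rng(0)

# Item difficulties (logits) reported in the abstract, easiest to hardest.
b = np.array([-2.43, -1.78, -1.48, -0.64, -0.10, 0.33, 0.59, 1.34, 1.70, 2.47])

# 300 simulated students with normally distributed abilities (theta).
theta = rng.normal(loc=0.0, scale=1.0, size=300)

# Rasch model: P(correct) = 1 / (1 + exp(-(theta - b))).
p_correct = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

# Draw dichotomous (right/wrong) responses and compute raw scores.
responses = rng.random(p_correct.shape) < p_correct
raw_scores = responses.sum(axis=1)

# Hypothetical ChatGPT raw score; a perfect score would be consistent
# with the reported grade of A.
chatgpt_score = 10
pct_below = 100.0 * np.mean(raw_scores < chatgpt_score)
print(f"ChatGPT scores above {pct_below:.1f}% of the simulated sample")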