@Article{info:doi/10.2196/60215, author="Amagai, Saki and Kaat, Aaron J and Fox, Rina S and Ho, Emily H and Pila, Sarah and Kallen, Michael A and Schalet, Benjamin D and Nowinski, Cindy J and Gershon, Richard C", title="Customizing Computerized Adaptive Test Stopping Rules for Clinical Settings Using the Negative Affect Subdomain of the NIH Toolbox Emotion Battery: Simulation Study", journal="JMIR Form Res", year="2025", month="Mar", day="21", volume="9", pages="e60215", keywords="computerized adaptive testing; CAT; stopping rules; NIH Toolbox; reliability; test burden; clinical setting; patient-reported outcome; clinician", abstract="Background: Patient-reported outcome measures are crucial for informed medical decisions and evaluating treatments. However, they can be burdensome for patients and sometimes lack the reliability clinicians need for clear clinical interpretations. Objective: We aimed to assess the extent to which applying alternative stopping rules can increase reliability for clinical use while minimizing the burden of computerized adaptive tests (CATs). Methods: CAT simulations were conducted on 3 adult item banks in the NIH Toolbox for Assessment of Neurological and Behavioral Function Emotion Battery; the item banks were in the Negative Affect subdomain (ie, Anger Affect, Fear Affect, and Sadness) and contained at least 8 items. In the originally applied NIH Toolbox CAT stopping rules, the CAT was stopped if the score SE reached <0.3 before 12 items were administered. We first contrasted this with an SE-change rule in a planned simulation analysis. We then contrasted the original rules with fixed-length CATs (4--12 items), a reduction of the maximum number of items to 8, and other modifications in post hoc analyses.
Burden was measured by the number of items administered per simulation, precision by the percentage of assessments yielding reliability cutoffs (0.85, 0.90, and 0.95), and accurate score recovery by the root mean squared error between the generating $\theta$ and the CAT-estimated ``expected a posteriori''-based $\theta$. Results: In general, relative to the original rules, the alternative stopping rules slightly decreased burden while also increasing the proportion of assessments achieving high reliability for the adult banks; however, the SE-change rule and fixed-length CATs with 8 or fewer items also notably increased assessments yielding reliability <0.85. Among the alternative rules explored, the reduced maximum stopping rule best balanced precision and parsimony, presenting another option beyond the original rules. Conclusions: Our findings demonstrate the challenges in attempting to reduce test burden while also achieving score precision for clinical use. Stopping rules should be modified in accordance with the context of the study population and the purpose of the study.", issn="2561-326X", doi="10.2196/60215", url="https://formative.jmir.org/2025/1/e60215" }