@Article{info:doi/10.2196/37838, author="Di, Shuang and Petch, Jeremy and Gerstein, Hertzel C and Zhu, Ruoqing and Sherifali, Diana", title="Optimizing Health Coaching for Patients With Type 2 Diabetes Using Machine Learning: Model Development and Validation Study", journal="JMIR Form Res", year="2022", month="Sep", day="13", volume="6", number="9", pages="e37838", keywords="diabetes health coaching; artificial intelligence; reinforcement learning; health coaching; patient outcome; diabetes; community health; digital intervention; health outcome", abstract="Background: Health coaching is an emerging intervention that has been shown to improve clinical and patient-relevant outcomes for type 2 diabetes. Advances in artificial intelligence may provide an avenue for developing a more personalized, adaptive, and cost-effective approach to diabetes health coaching. Objective: We aim to apply Q-learning, a widely used reinforcement learning algorithm, to a diabetes health-coaching data set to develop a model for recommending an optimal coaching intervention at each decision point that is tailored to a patient's accumulated history. Methods: In this pilot study, we fit a two-stage reinforcement learning model on 177 patients from the intervention arm of a community-based randomized controlled trial conducted in Canada. The policy produced by the reinforcement learning model can recommend a coaching intervention at each decision point that is tailored to a patient's accumulated history and is expected to maximize the composite clinical outcome of hemoglobin A1c reduction and quality of life improvement (normalized to [0, 1], with a higher score being better). Our data, models, and source code are publicly available. Results: Among the 177 patients, the coaching intervention recommended by our policy mirrored the observed diabetes health coach's interventions in 17.5{\%} (n=31) of the patients in stage 1 and 14.1{\%} (n=25) of the patients in stage 2.
Where there was agreement in both stages, the average cumulative composite outcome (0.839, 95{\%} CI 0.460-1.220) was better than those for whom the optimal policy agreed with the diabetes health coach in only one stage (0.791, 95{\%} CI 0.747-0.836) or differed in both stages (0.755, 95{\%} CI 0.728-0.781). Additionally, the average cumulative composite outcome predicted for the policy's recommendations was significantly better than that of the observed diabetes health coach's recommendations ($t_{n-1}$=10.040; P<.001). Conclusions: Applying reinforcement learning to diabetes health coaching could allow for both the automation of health coaching and an improvement in health outcomes produced by this type of intervention.", issn="2561-326X", doi="10.2196/37838", url="https://formative.jmir.org/2022/9/e37838", url="https://doi.org/10.2196/37838", url="http://www.ncbi.nlm.nih.gov/pubmed/36099006" }