This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Diabetes management is complex, and program personalization has been identified as a way to enhance engagement and clinical outcomes in diabetes management programs. However, 50% of individuals living with diabetes do not achieve glycemic control, revealing a gap in the delivery of self-management education and behavior change. Machine learning and recommender systems, which are already used in health care settings, could feasibly be applied in diabetes management programs to personalize the user experience and improve engagement and outcomes.
This study aims to evaluate machine learning models that use member-level engagement to predict improvement in estimated A1c and to develop personalized action recommendations within a remote diabetes monitoring program to improve clinical outcomes.
A retrospective study analyzed Livongo for Diabetes member engagement data within five action categories (interacting with a coach, reading education content, self-monitoring blood glucose level, tracking physical activity, and monitoring nutrition) to build a member-level model predicting whether a specific type and level of engagement could lead to improved estimated A1c in members with type 2 diabetes. Because engagement and improvement in estimated A1c can be correlated, a naive comparison of engaged and nonengaged members would be biased; therefore, the doubly robust learning method was used to model the heterogeneous treatment effect of action engagement on improvements in estimated A1c.
The treatment effect of each of the five action categories on estimated A1c reduction was successfully computed for each member. Results showed that interacting with coaches and self-monitoring blood glucose levels were the actions that resulted in the highest average decrease in estimated A1c (1.7% and 1.4%, respectively) and were together the recommended actions for 54% of the population. However, they were not the optimal interventions for all members; 46% of members were predicted to have better outcomes with one of the other three interventions. Within the first 3 months of the program, members who engaged with their recommended actions had, on average, a 0.8% larger reduction in estimated A1c than those who did not.
Personalized action recommendations using heterogeneous treatment effects to compute the impact of member actions can reduce estimated A1c and be a valuable tool for diabetes management programs in encouraging members toward actions to improve clinical outcomes.
Diabetes is a chronic progressive disease affecting 34 million Americans with 1.5 million newly diagnosed each year [
Diabetes self-management efficacy and improved glycemic control are supported by programs that offer education, coaching, glucose monitoring, and physical activity [
Personalization has been identified as a key tool in digital health for enhancing user engagement and improving outcomes, yet it is often missing from the development of diabetes digital health programs [
Livongo for Diabetes is an RDMP focused on empowering members with education and tools to self-manage their diabetes through mobile technology. The program offers members a cellular-enabled, two-way messaging device that measures BG and delivers personalized insights into their glycemic management; free unlimited BG test strips; real-time support from diabetes response specialists 24 hours a day, 7 days a week, 365 days a year; and access to certified diabetes care and education specialists (CDCESs) for support and goal setting.
Livongo members’ glucose meter use was captured remotely through the cellular-enabled device. Members also had access to a mobile phone app that tracked historical SMBG readings; provided reminders for SMBG checking, physical activity, and food log tracking; offered asynchronous chat with coaches, the ability to schedule private coaching sessions with CDCESs, and educational content for diabetes self-management; and allowed members to send historical reports of SMBG readings to care providers, family members, and friends.
A retrospective feasibility study was conducted to compute the heterogeneous treatment effects of five action categories on the reduction of estimated A1c for members with type 2 diabetes enrolled in Livongo for Diabetes, and to identify which actions could be most effective for each member. Within each action category, members were classified into a treatment or control group according to their engagement level. The effectiveness of each action category was assessed by computing its heterogeneous treatment effect for each member.
Members enrolled in Livongo for Diabetes for at least 4 months with a baseline estimated A1c ≥7.5% at 30 days post enrollment were included in the study. Additional inclusion criteria were a self-reported diagnosis of type 2 diabetes at enrollment, ≥5 SMBG measurements between 50 and 400 mg/dL in both month one and month four of the program, and no self-reported use of a continuous glucose monitor (see
Study population funnel with inclusion and exclusion criteria. HbA1c: hemoglobin A1c.
Institutional review board approval was granted by Aspire IRB (#520160099), and the guidelines outlined in the Declaration of Helsinki were followed.
Program features were categorized and grouped into five action categories, referred to as interventions in the causal inference formulation (see
SMBG checks: number of self-monitoring blood glucose checks a member performed on the device
Coaching: number of scheduled certified diabetes care and education specialist coaching sessions or asynchronous chats with a coach
Physical activity: physical activity recorded using synced step data
Nutrition: member engagement with nutrition- and meal plan–related nudge recommendations and the food log
Content: member engagement with educational content nudge recommendations
Estimated A1c was calculated using the A1c-derived average glucose model, where estimated A1c (%) = (mean BG over the past 30 days in mg/dL + 46.7) / 28.7 [
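The formula inverts the A1c-derived average glucose relation (average glucose = 28.7 × A1c − 46.7). A minimal sketch of the calculation follows; the function name is hypothetical, and it assumes the caller passes only readings (in mg/dL) from the 30-day window of interest.

```python
from statistics import mean


def estimated_a1c(bg_readings_mg_dl):
    """Estimated A1c (%) from blood glucose readings in mg/dL.

    Assumes the readings already cover the past 30 days; the 30-day mean BG
    stands in for average glucose in the inverted relation
    A1c = (average glucose + 46.7) / 28.7.
    """
    if not bg_readings_mg_dl:
        raise ValueError("at least one blood glucose reading is required")
    return (mean(bg_readings_mg_dl) + 46.7) / 28.7


# A 30-day mean of 154 mg/dL corresponds to an estimated A1c of about 7.0%.
print(round(estimated_a1c([140, 168]), 2))
```

Under this relation, each 28.7 mg/dL drop in mean BG lowers estimated A1c by one percentage point.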
The intervention outcome,
Treatment effect was modelled using self-reported member information and member engagement during the first 3 months post enrollment. The following variables were used as covariates in the model:
Demographics: age, gender, BMI, race
Self-reported medical information: HbA1c at enrollment, self-efficacy in diabetes management, insulin use, oral diabetes medication use, flu vaccination, smoking behavior
Self-reported preferences: preferred channels of communication, interest level in becoming active and healthy
Engagement: average days between Livongo website use; average days between Livongo mobile app use; number of days of Livongo mobile app use; average days between SMBG checks; estimated A1c at months two and three; days with SMBG hypoglycemia readings in months one, two, and three; days with SMBG hyperglycemia readings in months one, two, and three
The sample was defined to be composed of members with covariates,
However, this was not possible in a real-world data set where a member could only be in one cohort, treatment or control, at a time and not in both. Therefore, observed samples were assumed to be from a joint distribution modeled by the equations:
and the treatment effect was expressed as:
where
With the assumption that all potential confounders were observed, the heterogeneous treatment effect for each member was computed using the doubly robust (DR) learning algorithm [
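For reference, a standard statement of the DR learner is given here in notation chosen for illustration (not necessarily the article's symbols): Y is the outcome (change in estimated A1c), T = 1 for the treatment cohort and 0 for control, and X holds the covariates listed above. The pseudo-outcome ψ remains consistent for the member-level effect if either the propensity model ê or the outcome models μ̂_t are correctly specified, which is the sense in which the method is doubly robust; regressing ψ on X yields the heterogeneous effect τ̂(x).

```latex
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]

e(x) = \Pr(T = 1 \mid X = x), \qquad
\mu_t(x) = \mathbb{E}\left[\, Y \mid X = x,\ T = t \,\right], \quad t \in \{0, 1\}

\psi_i = \hat\mu_1(X_i) - \hat\mu_0(X_i)
       + \frac{T_i \,\bigl(Y_i - \hat\mu_1(X_i)\bigr)}{\hat e(X_i)}
       - \frac{(1 - T_i)\,\bigl(Y_i - \hat\mu_0(X_i)\bigr)}{1 - \hat e(X_i)}

\hat\tau(x) \;=\; \text{regression of } \psi_i \text{ on } X_i
```

Because a lower estimated A1c is the goal, a negative τ̂(x) indicates an action predicted to help that member.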
Engagement level was used to split data into control and treatment groups. The application of the DR learning algorithm enabled intervention outcomes from the treatment and control groups to be representative of the same population because the propensity model,
Member engagement with each action category was measured during the first 3 months post enrollment. Each member’s level of engagement was compared against a defined threshold to assign the member to the treatment or control cohort: members whose engagement met the threshold were assigned to the treatment cohort, and members who did not were assigned to the control cohort. The treatment-control split considered only a member’s engagement within that action category. Members in the control cohort received communication from Livongo in the form of emails and newsletters.
Engagement thresholds were defined independently for each action category. The threshold value affected the size imbalance between the control and treatment groups and, in turn, the noise in the data set and the model performance. For this reason, the engagement threshold for each action category was selected to minimize modelling error while optimizing the treatment effect.
Treatment effects for the five action categories were modelled independently. The coaching action category is used to illustrate how an engagement threshold was selected and how the heterogeneous treatment effect model was evaluated. Treatment effects across all action categories are then reported, followed by a proposed method for personalizing action recommendations to optimize clinical outcomes.
Members who completed a sufficient number of scheduled coaching sessions or asynchronous coaching chat sessions were assigned to the treatment cohort, and members not meeting this criterion were assigned to the control cohort. This engagement threshold was observed to affect DR model performance: as the threshold increases, so do the control-treatment size imbalance and the noise in the data. The control-treatment size imbalance for different thresholds is shown in
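The size imbalance can be tabulated for candidate thresholds with a few lines of code. This is a minimal sketch, assuming a hypothetical input list of per-member coaching session counts from the first 3 months and counting a member as treated when the count meets the threshold; it reports the control-to-treatment ratio, which grows as the threshold rises.

```python
def cohort_sizes(session_counts, thresholds):
    """Map each candidate threshold to (control size, treatment size, control/treatment ratio).

    session_counts: hypothetical per-member coaching session counts (first 3 months).
    A member is treated when the count is >= the threshold.
    """
    sizes = {}
    for threshold in thresholds:
        treated = sum(1 for count in session_counts if count >= threshold)
        control = len(session_counts) - treated
        ratio = control / treated if treated else float("inf")
        sizes[threshold] = (control, treated, ratio)
    return sizes


counts = [0, 0, 0, 1, 1, 2, 3, 5, 8]  # toy data, not study data
for threshold, (n_control, n_treated, ratio) in cohort_sizes(counts, [1, 2, 3, 4]).items():
    print(threshold, n_control, n_treated, ratio)
```

In the study, this trade-off was weighed against model MSE and treatment effect rather than chosen from group sizes alone.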
For each threshold defining whether a member received treatment, the data were split into training and validation data sets at a ratio of 65:35. The treatment effect was modelled with a forest DR learner algorithm using a gradient boosting classifier and a random forest regressor to model the likelihood of the outcome,
The MSE of the heterogeneous treatment effect estimator for different thresholds and the computed average treatment effects are shown in
The treatment effect of the coaching action category computed with the DR learner algorithm was negative for most members, indicating that coaching reduced estimated A1c for most members (see
Control and treatment cohort sample sizes for different engagement thresholds (minimum number of coaching sessions).
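A minimal sketch of the 65:35 split and of the doubly robust pseudo-outcome step follows. It assumes the nuisance predictions (in the study, produced by a gradient boosting classifier and a random forest regressor) are already available as plain lists, ideally computed out-of-fold so that no member's pseudo-outcome uses a model trained on that member. A forest DR learner would then fit a random forest to these pseudo-outcomes; EconML's ForestDRLearner is one open-source implementation of that design, but this sketch does not call it and is not the study's code. All function names are hypothetical.

```python
import random
from statistics import mean


def train_validation_split(member_ids, train_frac=0.65, seed=0):
    """Shuffle member ids reproducibly and split them into training and validation sets."""
    ids = list(member_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_frac)
    return ids[:cut], ids[cut:]


def dr_pseudo_outcomes(y, t, mu0, mu1, e, clip=0.01):
    """Doubly robust (AIPW) pseudo-outcomes, one per member.

    y   : observed outcome (change in estimated A1c, %)
    t   : 1 if the member is in the treatment cohort, else 0
    mu0 : outcome-model prediction for the member under control
    mu1 : outcome-model prediction for the member under treatment
    e   : propensity-model prediction of P(t = 1 | covariates)
    """
    psi = []
    for yi, ti, m0, m1, ei in zip(y, t, mu0, mu1, e):
        ei = min(max(ei, clip), 1 - clip)  # clip propensities to avoid extreme weights
        if ti == 1:
            psi.append(m1 - m0 + (yi - m1) / ei)
        else:
            psi.append(m1 - m0 - (yi - m0) / (1 - ei))
    return psi


def average_treatment_effect(psi):
    """The mean pseudo-outcome is the doubly robust estimate of the average effect."""
    return mean(psi)
```

The clipping bound is an illustrative choice; without it, members with propensities near 0 or 1 can dominate the estimate.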
Top left: MSE of the doubly robust model to predict treatment effect of coaching intervention for different number of coaching sessions threshold. Top right: predicted treatment effect of coaching intervention for different thresholds. MSE: mean squared error.
Distribution of computed treatment effects of coaching intervention.
A causal model cannot be evaluated directly on observational data, because the true treatment effect is unknown: the same member cannot be observed both treated and untreated at the same time. Our causal model’s performance was therefore evaluated indirectly by comparing the cumulative gain in the outcome when members are ranked by model prediction with the cumulative gain under random sorting [
Cumulative gain is cumulative uplift multiplied by sample size, where uplift is defined as the difference between the average outcomes of the treatment and control cohorts. A model that performs well will have large uplift values in the first quantiles and decreasing values in later ones. Comparing the cumulative gain of members sorted by predicted treatment effect with that of randomly sorted members therefore indicates model performance: the larger the area under the uplift curve (AUUC) of the model’s ranking relative to random ordering, the better the model’s predictions.
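The gain curve and AUUC can be sketched as follows, assuming outcomes are changes in estimated A1c (so negative is better), members are ranked with the most negative predicted effect first, and the gain is negated as in the cumulative gain figures. The function names are hypothetical; uplift-modelling packages such as CausalML provide comparable metrics, but this sketch does not reproduce any library's API. The random baseline is approximated by its expectation, a straight line from zero to the final gain.

```python
def cumulative_gain(pred_effect, y, t):
    """Gain curve for members ranked by predicted effect (most negative first).

    Gain at k = -(mean treated outcome - mean control outcome) among the top k, times k.
    Points where the top k lacks either a treated or a control member get 0.
    """
    order = sorted(range(len(y)), key=lambda i: pred_effect[i])
    gains = []
    sum_t = sum_c = 0.0
    n_t = n_c = 0
    for k, i in enumerate(order, start=1):
        if t[i] == 1:
            sum_t += y[i]
            n_t += 1
        else:
            sum_c += y[i]
            n_c += 1
        if n_t and n_c:
            gains.append(-(sum_t / n_t - sum_c / n_c) * k)
        else:
            gains.append(0.0)
    return gains


def random_baseline(gains):
    """Expected gain under random ordering: a straight line from 0 to the final gain."""
    n = len(gains)
    return [gains[-1] * k / n for k in range(1, n + 1)]


def auuc(gains):
    """Area under the gain curve (trapezoid rule, unit spacing, starting at 0)."""
    points = [0.0] + list(gains)
    return sum((a + b) / 2 for a, b in zip(points, points[1:]))


model_gains = cumulative_gain([-2, -1, 0, 1], [-2, 0, 0, 0], [1, 0, 1, 0])
print(auuc(model_gains), auuc(random_baseline(model_gains)))
```

A model whose ranking is informative yields a curve above the straight-line baseline and hence a larger AUUC.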
The cumulative gain in the outcome for the coaching action category when members are ranked by model predicted treatment effect and when randomly ordered is shown in
Cumulative gain of coaching intervention when members are sorted by predicted treatment effect (solid blue line) and with random sorting (dashed orange line). The cumulative gain plotted is the negative of uplift from intervention outcome variable (change in A1c) so that higher gain values reflect better results.
The control-treatment cohort assignment for each of the five action categories was inferred independently for each member. From these independent analyses, engagement thresholds were defined for the five action categories to optimize treatment effect and sample size while lowering model MSE (see
SMBG checks: ≥70 days with self-monitoring blood glucose checks
Coaching: ≥3 coaching sessions (scheduled coaching sessions or asynchronous chat sessions with coach)
Physical activity: ≥30 days with 2000 daily steps
Nutrition: ≥2 food logs or ≥50% yes responses to nutrition-related nudges
Content: ≥50% yes responses to content-related nudges
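The thresholds above can be expressed as per-category rules. This is a minimal sketch; the record field names (for example `days_with_2000_steps`) are hypothetical, since the study's feature names are not given, and "2000 daily steps" is read here as a day with at least 2000 steps.

```python
# Hypothetical field names for a member's first-3-month engagement summary.
RULES = {
    "smbg": lambda m: m["smbg_days"] >= 70,
    "coaching": lambda m: m["coaching_sessions"] >= 3,
    "physical_activity": lambda m: m["days_with_2000_steps"] >= 30,
    "nutrition": lambda m: m["food_logs"] >= 2 or m["nutrition_yes_rate"] >= 0.5,
    "content": lambda m: m["content_yes_rate"] >= 0.5,
}


def assign_treatment(member):
    """Return {category: 1 if treated, 0 if control}, each category judged independently."""
    return {category: int(rule(member)) for category, rule in RULES.items()}


member = {
    "smbg_days": 80,
    "coaching_sessions": 2,
    "days_with_2000_steps": 30,
    "food_logs": 0,
    "nutrition_yes_rate": 0.6,
    "content_yes_rate": 0.4,
}
print(assign_treatment(member))
```

Because each category is judged on its own rule, one member can be in the treatment cohort for several categories at once, matching the independent modelling of the five interventions.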
As shown in
Top: distribution of members in the treatment cohort within the five different intervention categories or other category as a fraction of total data set sample size. Bottom: outcome (difference in estimated A1c level) for the different interventions.
Model performance was evaluated using a common validation data set across interventions. The cumulative gain of the outcome for each of the five action categories, with members sorted by model prediction versus randomly, is shown in
Cumulative gain of outcome for the five different interventions by model predictions (solid blue) and random sorting (dashed orange).
For each member, the intervention with the most negative treatment effect is the action that the model predicts will produce the largest reduction in estimated A1c (ie, the optimal intervention). The average change in estimated A1c for members in the validation set who were in the treatment cohort of at least one action category is shown in dark blue if the received intervention matched the predicted intervention and in light blue if it did not (see
Among members with negative predicted treatment effects, the predicted optimal interventions were distributed as coaching (28%), SMBG checks (26%), physical activity (18%), content (16%), and nutrition (12%), making interaction with coaches and SMBG checks the most frequently recommended interventions. This balanced distribution of recommendations for optimal clinical outcomes opens an opportunity to prioritize recommendations based on a heterogeneous causal effect model.
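Selecting the optimal intervention reduces to taking the minimum predicted effect per member. This is a minimal sketch, assuming each member's predicted effects arrive as a dictionary keyed by action category (a hypothetical input format); members with no negative predicted effect receive no recommendation, mirroring the restriction to members with negative predicted treatment effects.

```python
from collections import Counter


def recommend(effects):
    """Return the action with the most negative predicted effect, or None if none is negative.

    effects: hypothetical dict of action category -> predicted change in estimated A1c (%).
    """
    best = min(effects, key=effects.get)
    return best if effects[best] < 0 else None


def recommendation_shares(members_effects):
    """Fraction of recommended members assigned to each action category."""
    recs = [r for r in (recommend(e) for e in members_effects) if r is not None]
    if not recs:
        return {}
    return {action: n / len(recs) for action, n in Counter(recs).items()}


toy = [  # toy predictions, not study output
    {"coaching": -1.2, "smbg": -0.4, "content": 0.1},
    {"coaching": 0.1, "smbg": 0.3, "content": 0.2},
    {"coaching": -0.2, "smbg": -0.9, "content": -0.1},
]
print([recommend(e) for e in toy], recommendation_shares(toy))
```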
A comparison of average treatment effect with current interventions and with recommended optimal intervention predicted by the model is shown in
Change in estimated A1c of members in different intervention treatment cohorts if the member received the model predicted optimal intervention (dark blue) or not (light blue).
Computed treatment effect for members who received different interventions. The light blue bars denote the treatment effect for members who were in the treatment cohort of our data set. Dark blue bars denote treatment effect for members if they received the optimal intervention.
This study highlights the feasibility of analyzing member engagement in an RDMP to develop a causal inference-based recommender system that predicts the actions driving optimal clinical outcomes. Five action categories of member engagement were defined, and the causal inference model computed the heterogeneous treatment effect of each action for each member. Model predictions were evaluated by comparing the uplift gain when members were ranked by treatment effect with that under random sorting; the AUUC of the model-predicted gain curves was larger for all actions, supporting the method used to infer treatment effects. Coaching and glucose monitoring were the most frequently recommended actions for achieving optimal clinical outcomes. On average, members who engaged with their recommended actions had a 0.8% larger reduction in estimated A1c in their first 3 months of the program than those who did not, with coaching showing the largest reduction in estimated A1c (1.7%) when recommended and used.
Machine learning has been used to study precision medicine in diabetes care and complications, variables to predict the development of diabetes, and individual characteristics related to diabetes outcomes [
Our study observed better outcomes for members who engaged in their recommended actions than for members who did not, across all action categories. On average, members who engaged with their recommended actions had a significantly greater improvement in estimated A1c than those who did not. Therefore, we propose that RDMPs recommend actions that are more likely to lead to better outcomes based on computed heterogeneous treatment effects, with the optimal action being the one with the most negative treatment effect. Personalized recommendations can give members a more effective experience through both digital and human coaching, offering not only medical cost savings for the individual through improved health outcomes but also a more cost-effective approach for the RDMP by directing members to the most valuable program features.
Type 2 diabetes management is complex and depends on many factors, such as nutrition, physical activity, and medication adherence, which vary widely across this population; therefore, to generate successful personalized recommendations, all relevant variables must be gathered to match an individual’s specific needs [
This study has several strengths, including its use of real-world data and its insight into the demographics and program engagement of members participating in an RDMP. Members were not given incentives to participate in the program or study beyond the Livongo for Diabetes program being provided as a benefit through their employer or health plan. The study also had limitations, including its retrospective design. Members in the Livongo for Diabetes program received promotional engagement outreach in the form of mobile app nudges, emails, and text messages; therefore, the observational data collected for the study contained a diverse set of engagement behaviors across program features and did not provide a clean treatment and control cohort split. In addition, improvement in estimated A1c was calculated from participants’ SMBG values; estimated A1c has been shown to correlate with laboratory HbA1c values, but it has limitations and is best used as a population-level tool.
Nonetheless, this study demonstrates how engagement thresholds that minimize modelling errors can be used to create control-treatment samples from observational data and compute treatment effects. The recommended action in this study was based solely on the likelihood of the member attaining a better outcome; member preferences and propensity to engage in an action were not considered during prediction. Therefore, a real-life implementation of the recommender system would need to incorporate both the likelihood of engagement and the likelihood of outcome when personalizing action recommendations. In addition, although treatment effects were computed for every intervention, the interventions were assumed to be independent and were analyzed separately; Bayesian inference of treatment effects could account for dependencies between interventions and is recommended for future studies of RDMP personalization.
Personalized action recommendations that use heterogeneous treatment effects to compute the impact of member actions within an RDMP can significantly reduce estimated A1c and can be a valuable tool for driving member behavior toward actions that are more likely to improve clinical outcomes. Future research is recommended to implement and evaluate this model prospectively within an RDMP.
AUUC: area under the uplift curve
BG: blood glucose
CDCES: certified diabetes care and education specialist
DR: doubly robust
HbA1c: hemoglobin A1c
MSE: mean squared error
RDMP: remote diabetes monitoring program
SMBG: self-monitoring of blood glucose
All the authors were employed by Teladoc Health (formerly Livongo) at the time of the study.