Original Paper
Abstract
Background: Low engagement with mental health apps continues to limit their impact. New approaches to help match patients to the right app may increase engagement by ensuring the app they are using is best suited to their mental health needs.
Objective: This study aims to pilot how digital phenotyping, using data from smartphone sensors to infer symptom, behavioral, and functional outcomes, could be used to match people to mental health apps and potentially increase engagement.
Methods: After 1 week of collecting digital phenotyping data with the mindLAMP app (Beth Israel Deaconess Medical Center), participants were randomly assigned to the digital phenotyping arm, receiving feedback and recommendations based on those data to select 1 of 4 predetermined mental health apps (related to mood, anxiety, sleep, and fitness), or the control arm, selecting from the same apps but without any feedback or recommendations. All participants used their selected app for 4 weeks, with numerous metrics of engagement recorded, including objective screen time measures, self-reported engagement measures, and Digital Working Alliance Inventory scores.
Results: A total of 82 participants enrolled in the study; 17 (21%) dropped out of the digital phenotyping arm and 18 (22%) dropped out of the control arm. Across both groups, few participants chose or were recommended the insomnia or fitness app. The majority (39/47, 83%) used a depression or anxiety app. Engagement, as measured by objective screen time and Digital Working Alliance Inventory scores, was higher in the digital phenotyping arm. There was no correlation between self-reported and objective metrics of app use. Qualitative results highlighted the importance of habit formation in sustained app use.
Conclusions: The results suggest that digital phenotyping app recommendation is feasible and may increase engagement. This approach is generalizable to other apps beyond the 4 apps selected for use in this pilot, and practical for real-world use given that the study was conducted without any compensation or external incentives that may have biased results. Advances in digital phenotyping will likely make this method of app recommendation more personalized and thus of even greater interest.
doi:10.2196/62725
Keywords
Introduction
While COVID-19 accelerated interest in mental health smartphone apps, limited patient use and engagement with these apps have emerged as a primary barrier to successful uptake [ , ]. While the challenge of limited engagement has already been well documented and ascribed to numerous causes, ranging from individual patient preferences to health care system barriers, there have been fewer efforts seeking to actually improve engagement [ , ]. This study pilots 1 approach, digital navigator–guided app recommendation, to increase engagement and seeks to address methodological challenges of prior studies by objectively assessing app usage.

The challenges of low engagement with mental health apps have been well known for nearly a decade. A landmark 2019 study [ ] of 93 mental health–related smartphone apps found that the median 15-day retention rate was 3.9%. Numerous other studies confirm exponential decay in app engagement, regardless of the health condition, age, gender, or race of users [ , ]. These low engagement numbers are further exacerbated by the low initial uptake of mental health apps. A 2023 survey of US veterans noted that while up to 76% reported apps are important for their mental health, only 5% reported ever having tried an app at least once [ ].

Yet, appreciating the challenge of app engagement does not in itself offer actionable solutions. Recent reviews have covered broad reasons why people often do not download mental health apps, as well as why they rapidly stop using them if they do [ ]. Common themes raised to boost engagement include the need for personalized app experiences, social and therapeutic support, customization, in-app guidance, and the use of sensors to offer users real-time feedback [ ]. Yet, awareness of such themes raises the question: will implementing these themes actually increase engagement? And if such a solution can work, will that method of increasing engagement be generalizable, given that over 10,000 mental health apps exist today and research specific to each unique app is neither practical nor feasible [ ]?

One promising approach toward increasing engagement is digital navigator–guided app recommendation. A digital navigator is a member of the care team trained to perform digital health roles related to equity, digital literacy, app selection, and engagement [ ]. There are strong research data to suggest that patients would like guidance from clinical teams around selecting an app [ - ]. Yet, many clinical teams are not aware of where to find evidence-based mental health apps, and even fewer know how to evaluate them [ ]. Indeed, in numerous surveys, clinical teams note that they want education about app evaluation [ , , ]. While several large health care systems have begun to offer app toolkits for their clinical teams to use with patients [ , ], efforts to help clinical teams recommend apps remain limited. One prior study found little impact of guidance on sustained engagement, but in that study, participants were limited to picking exercises within a single app platform [ ] that was subsequently shown to suffer from low uptake and engagement [ ]. A prior study by our own team found that guided app recommendation did increase engagement with apps [ ] as compared to national rates.

However, no prior study has examined the impact of digital phenotyping on app recommendation. Digital phenotyping involves accessing sensors on a patient's smartphone to capture data related to behaviors (eg, sleep and mobility), cognition (eg, memory), and self-reported symptoms to better understand a patient's state and use that information to match them to the best app for that state. For example, a patient who reports depression while at home may benefit from a cognitive behavioral therapy–focused app, while another who reports anxiety at work may benefit from a different app offering brief mindfulness exercises. Digital phenotyping methods can also be used to predict changes in anxiety and depression [ ], meaning that it may be possible to suggest mental health app use early, as a preventive approach.

In piloting how digital phenotyping may help improve app recommendation, there are many metrics to consider. The most important may be engagement, as without engagement, even the most effective app will not be impactful. Unfortunately, recent reviews confirm that there is no standard approach to measuring engagement, with the most common method being the percentage of patients who complete available modules [ ]. This is problematic, as not all apps have modules and the completion of modules may not always signify clinically meaningful engagement. Alternative means of measuring engagement include time spent in the app and subjective reports of engagement. Other means include newer metrics like the Digital Working Alliance Inventory (D-WAI), which assesses the degree of alliance a user has with an app and has previously been shown to predict app engagement and outcomes [ ]. Thus, in this study, we focused on multiple means of measuring engagement, with the secondary aim of assessing how the measurement of engagement itself, via subjective and objective metrics, may impact clinical outcomes.

This study seeks to improve mental health app engagement through piloting digital phenotyping–based recommendations. We hypothesize that this recommendation approach will lead to greater app engagement as compared to a control condition of participant self-selection of apps. As a secondary outcome, we explore different metrics of engagement and how different measures of engagement may inform different clinical outcomes related to app use. We hypothesize that subjective measures like self-reported engagement and D-WAI scores will better correlate with clinical outcomes than objective measures of app use derived from screen-time logs.
Methods
Study Design
In this 5-week study, the first week was observational and used to gather digital phenotyping data on all participants. After this first week of data collection, all participants were randomly assigned to receive app recommendations based on their digital phenotyping data or to select an app without any assistance or data. Over the next 4 weeks, participants used their designated app and completed pre- or postintervention questionnaires via web-based study visits.
Participants
All participants were recruited from ResearchMatch. Inclusion criteria included being aged >18 years, being proficient in the English language, being able to sign an informed consent form through a web-based process, having a primary care physician or psychiatrist, owning an Apple or Android smartphone, and scoring higher than 5 on the General Anxiety Questionnaire-7 (GAD-7) at the initial visit.
Materials
All participants used 2 digital health apps throughout the study. The first app that every participant used was mindLAMP, an app developed by the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center (BIDMC) [
]. In this study, mindLAMP served solely as a digital phenotyping data collection tool. The second app varied between participants and served as an intervention tool. Participants downloaded 1 of 4 intervention apps: UCLA Mindful (University of California Los Angeles Health), How We Feel (HWF Project Inc), Insomnia Coach (US Department of Veterans Affairs), or Nike Training Club (Nike, Inc).

mindLAMP App
mindLAMP is a digital phenotyping app developed at BIDMC [
, ]. mindLAMP has a customizable interface with 5 main sections: feed, learn, assess, manage, and portal. While it can be customized to offer both interventional and data capacities, this study only used its data collection capacity, including custom surveys and sensors (GPS, accelerometer, and screen use metrics; ).

Interventional Mental Health Apps
All participants were designated to engage with 1 of 4 health apps for 4 weeks: UCLA Mindful, How We Feel, Insomnia Coach, and Nike Training Club. UCLA Mindful offers guided meditations for users [
]. How We Feel is a mood-tracking app that offers a range of emotions for users to choose from while also tracking aspects of their physical health such as sleep and exercise [ ]. Insomnia Coach guides users in improving their sleep through cognitive behavioral therapy and offers weekly training with a sleep coach, tips, a log, and a diary [ ]. Nike Training Club offers everything from home workouts to healthy recipes [ ]. All apps were found through the Mobile Health Index and Navigation Database (MINDapps) developed by the Division of Digital Psychiatry [ ]. For this study, we created a new filter on MINDapps to display the app(s) that participants were recommended or that they selected ( ). The selection of apps to include was based on feedback from patients in our clinic, our advisory board, volunteer MINDapps app raters, and our prior research on app engagement [ ].

Data Collection
Overview
Both active and passive data features were collected as a part of this study. Passive data were collected continuously through the mindLAMP app for the duration of the study. Passive data specifically refer to GPS, accelerometer, phone use (eg, screen on/off and phone on/off), and step count data. Active data were categorized as survey responses and were collected at different time points. We also collected objective data on the use of each app, which were reported via participants taking a screenshot of the screen time page in their device settings, as these data cannot be captured automatically with digital phenotyping across Apple and Android devices ( ). Participants were asked to take screenshots of their Screen Time page at the final study visit as part of the digital data collection.

Active Data (Surveys)
This study included a total of 11 surveys completed at different time points through REDCap (Research Electronic Data Capture; Vanderbilt University) and mindLAMP (
). On study visit days (3 times), participants completed a battery of standardized neuropsychiatric tests on symptoms and cognition to establish baseline, interim, and evaluation scores. The psychiatric scales consisted of the GAD-7, the Insomnia Index Scale (ISI) plus a question about the time of sleep onset or offset, the Social Interaction Anxiety Scale (SIAS), the UCLA (University of California, Los Angeles) Loneliness Scale, the Flourishing Scale, and the Perceived Stress Scale-10 (PSS-10). During the intake appointment, researchers completed the Clinical Global Impression Scale to evaluate the participant's illness severity. Four additional surveys were administered throughout the study: the Daily Survey, the System Usability Scale, the D-WAI, and the final engagement survey. The daily survey was developed by the Digital Psychiatry research team and consisted of 6 questions to briefly assess daily activity and mental health status (see for the full survey). Participants took the daily survey twice per day between visits 1 and 2. Between visits 2 and 3, participants reduced daily survey completion to 3 times per week. The System Usability Scale and D-WAI were completed during visits 1, 2, and 3. They are standardized scales used to assess app usability or perceived satisfaction and the therapeutic alliance in smartphone-based interventions, respectively [ , ]. The final engagement survey was also developed by the Digital Psychiatry team (see for the full survey) and was completed on the final day of the study to understand participants' perception of their engagement with the interventional mental health app downloaded during visit 2.

Group 1: Precision App Recommendation (Experimental)
After a week of capturing digital phenotyping data about the participant, the digital navigator reviewed visualizations of those data and shared them back with the participant. The data visualizations are shown in
and , consisting of a radar plot ( ) and correlation matrix ( ) of their data from the previous week. To recommend an app, the digital navigator assessed how mental health correlated with functioning. They selected mental health targets that showed elevated correlations with impaired functioning and persistence of this relationship over the week and ultimately recommended an app that targeted the identified symptoms.
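To make this recommendation step concrete, the following minimal Python sketch illustrates the general idea: correlate each self-reported symptom domain with a passive proxy for impaired functioning over the observation week and pick the most strongly related target. The data, column names, functioning proxy, and symptom-to-app mapping are illustrative assumptions, not the digital navigator's exact clinical algorithm.

```python
# Illustrative sketch only: correlate symptom domains with a functioning proxy
# and map the strongest target to an app. Data and names are hypothetical.
import pandas as pd

# One row per day of the 1-week digital phenotyping period (hypothetical values).
week = pd.DataFrame({
    "mood":        [6, 5, 7, 6, 8, 7, 6],   # daily survey scores, higher = worse
    "anxiety":     [4, 5, 4, 6, 5, 4, 5],
    "sleep":       [3, 2, 4, 3, 3, 2, 3],
    "home_time_h": [14, 15, 13, 16, 18, 17, 15],  # passive proxy for impaired functioning
})

symptom_domains = ["mood", "anxiety", "sleep"]
app_for_domain = {"mood": "How We Feel", "anxiety": "UCLA Mindful", "sleep": "Insomnia Coach"}

# Pearson correlation of each symptom domain with the functioning proxy.
correlations = week[symptom_domains].corrwith(week["home_time_h"])
target = correlations.abs().idxmax()          # domain most tied to impaired functioning

print(correlations)
print(f"Recommended target: {target} -> {app_for_domain[target]}")
```

In practice, the navigator also weighed the persistence of these relationships across the week and reviewed the shared visualizations with the participant before making a recommendation.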
Group 2: Unguided App Selection (Control)

Participants in the control group also collected digital phenotyping data for 1 week, but these data were not shared back with them until after the study. After that week, they were asked to select 1 of the same 4 mental health apps (UCLA Mindful, How We Feel, Nike Training Club, or Insomnia Coach) without guidance from their digital phenotyping data or the digital navigator. A standardized description of each app was given to the participants in the control group prior to their choosing.
Interventional App Use (Both Groups)
After downloading 1 of the 4 mental health apps, both groups were asked to use that app for the remaining 4 weeks of the study. With the goal of capturing naturalistic engagement, participants were not given specific instructions regarding how or when to interact with the app. Researchers instructed participants to “Engage with the app in your daily life as you see fit.” After 4 weeks, participants had their third and final meeting where their objective (screen time) and self-report (final engagement survey) engagement data were collected.
Data Analysis Techniques
To assess whether a personalized app recommendation would increase engagement, we assessed correlations between self-reported engagement and objective engagement. To measure this relationship, an ordinary least squares (OLS) linear regression model was implemented using the statsmodels library in Python (Python Software Foundation). To assess the relationship between participants' engagement with their respective apps and their anxiety symptoms, a series of OLS linear regressions were performed using the same library.
A regression model was created for each measure of engagement, either objective (mean app screen time) or subjective (self-reported engagement gathered from a survey). Each engagement measure was compared with the change in the structured surveys participants completed during the study. These surveys included the GAD-7, Flourishing Scale, UCLA Loneliness Scale, SIAS, ISI, and PSS-10. The change in structured survey scores was calculated from appointment 2 (when app use occurred) to appointment 3 (the final appointment).
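As an illustration of this analysis pipeline, the short sketch below fits one such OLS model with statsmodels, regressing the change in GAD-7 score from appointment 2 to appointment 3 on mean app screen time. The data and column names are hypothetical placeholders; the same pattern was repeated for each engagement measure and survey.

```python
# Minimal sketch of one engagement-versus-symptom-change regression (hypothetical data).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "mean_screen_time_min": [4.2, 1.5, 3.8, 0.9, 2.6, 5.1],  # objective engagement
    "gad7_visit2":          [12, 9, 15, 8, 11, 14],
    "gad7_visit3":          [9, 9, 12, 8, 10, 10],
})
df["gad7_change"] = df["gad7_visit3"] - df["gad7_visit2"]    # negative = improvement

X = sm.add_constant(df[["mean_screen_time_min"]])            # intercept + predictor
model = sm.OLS(df["gad7_change"], X).fit()

print(model.summary())   # reports the coefficient, R-squared, and P value
```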
Qualitative Analysis
Following the Braun and Clarke [
] framework, a group of 5 research assistants initially reviewed the raw responses to the open-ended questions in the final engagement survey. They identified themes associated with the use and engagement of the app and added them to a table: notifications, memory, ease of use, and the content of the app. Each individual's final engagement survey was printed out and rated by at least 2 research assistants to ensure interrater reliability. They marked where they saw the theme and indicated whether it seemed positive or negative (ie, "The notifications were annoying" → notifications → negative). In the case of dispute, an additional research assistant contributed a rating until a consensus was reached. A spreadsheet was developed to indicate themes and positive, negative, or both associations.

Ethical Considerations
This study was approved by the BIDMC institutional review board as protocol 2022P001143. Written informed consent regarding primary and potential secondary data analyses of research data was collected and documented for all participants via REDCap (version 14.0.30; REDCap Consortium). The protected health information of participants was securely stored on REDCap, a Health Insurance Portability and Accountability Act–compliant, web-based app specifically designed for research data collection and management. Study data were subsequently deidentified during the analysis process. No identification of individual participants or users is included in this paper. Participants did not receive any form of compensation for this study.
Results
Demographics and Groups
A total of 82 adults were recruited and enrolled in the study. There were no significant differences in sex between the control and experimental groups (P=1.0). There were no significant differences in baseline anxiety or depressive symptoms between the groups (P>.05).
A total of 35 (43%) participants dropped out; 22 (27%) dropped out of the study after the first meeting, divided evenly between the treatment (n=11, 13%) and control groups (n=11, 13%). Of those participants, 9 (11%) left for unknown reasons, 6 (7%) lost interest, 3 (4%) left due to the time commitment, 3 (4%) for data quality reasons, and 1 (1%) participant left for a family emergency. After the second meeting, 13 (16%) participants dropped out (control: n=7, 8%; treatment: n=6, 7%). Two (2%) left for data quality reasons while the rest (n=11, 13%) left for unknown reasons.
below shows the full breakdown of the demographics for all 47 remaining participants.

Of note, only How We Feel and UCLA Mindful had enough participants complete the study while using the app to produce meaningful results.
Sample characteristics | Values
Age (years), mean (SD) | 43.0 (16)
Sex, n (%) |
Female | 37 (79)
Male | 7 (15)
Other | 3 (6)
Race, n (%) |
Black or African American | 2 (4.2)
White | 42 (89)
Multiracial or other | 3 (6)
Education, n (%) |
High school graduate or GEDa | 1 (2)
Some college | 10 (21)
4-year college graduate | 20 (43)
Master's degree or higher | 14 (30)
Missing | 2 (4)
aGED: General Educational Development.
Engagement
To determine if a personalized app recommendation would increase engagement at a population level, we plotted the mean objective engagement (mean screen time) of the control and experimental groups broken down by app (
).

How We Feel App
In both the control and experimental groups for the How We Feel app, mean screen time was the highest during the first week of use and steadily declined throughout the 4 weeks (
A). While the difference was not statistically significant, the experimental group for How We Feel showed higher screen time overall (P=.36).

UCLA Mindful App
In the UCLA Mindful app group, participants in the control group barely used the app after week 2 of the study, while the experimental group tended to use the app more in the beginning, with a steep drop at the end of the study (
B).

Self-Reported Engagement Versus Objective Engagement
The mean screen time values across apps could not be directly compared to self-reported engagement. In order to compare all apps against each other, we used the MinMaxScaler function from the sklearn Python library to map all mean screen time values to a 0-1 range for each app separately before combining the data for analysis.
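The per-app rescaling can be sketched as follows, assuming per-participant mean screen time values in a pandas DataFrame; the data and column names are hypothetical, and only the MinMaxScaler step mirrors the method described above.

```python
# Minimal sketch of per-app min-max scaling of screen time before pooling apps.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "app": ["How We Feel"] * 3 + ["UCLA Mindful"] * 3,
    "mean_screen_time_min": [1.0, 4.0, 2.5, 8.0, 2.0, 5.0],  # hypothetical values
})

# Rescale screen time to a 0-1 range within each app, then pool for analysis.
df["scaled_screen_time"] = df.groupby("app")["mean_screen_time_min"].transform(
    lambda s: MinMaxScaler().fit_transform(s.to_frame()).ravel()
)

print(df)
```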
Through OLS regression we found no significant correlation (R2=0.0188; P=.39) between self-reported engagement and scaled screen time in our pilot results. When participants were asked to rate their engagement on a scale of 1-10 (10=highest engagement), the control group’s mean rating was 6.42 (SD 2.52) as compared to the experimental group’s mean rating of 6.30 (SD 2.1).
Engagement Versus D-WAI
In addition to comparing self-reported engagement to mean screen time, we also compared both self-reported engagement and mean screen time to the participant’s mean score on the D-WAI scale (
) using the MinMaxScaler function noted earlier.

There was a significant positive correlation between self-reported engagement and D-WAI scores (R2=0.1199; P=.02; coefficient=1.0238) but no equivalent correlation between scaled screen time and D-WAI values (R2=0.0013; P=.83; coefficient=–0.7849). In both cases, these findings were driven more by the control group than the experimental group.
Engagement Versus Change in Structured Survey Scores
In addition to comparing different types of engagement, a preliminary analysis compared engagement scores to changes in various clinical surveys. We explored how engagement metrics correlated with clinical symptom score changes after 1 month of app use in all participants. Overall, correlations between app engagement (via any metric) and clinical changes (via any survey) were small and most results were not statistically significant (P>.01). The small sample size precludes us from making any significant claims about the findings. A full table of results from our regression analyses for the How We Feel and UCLA Mindful apps can be found in
.

Qualitative Results
Overview
Following the Braun and Clarke [
] framework, a team of 5 blinded coders reviewed final engagement surveys for thematic analysis to understand how users interacted with the app and what factors affected their engagement. Through qualitative analysis, the team of coders initially identified 9 independent subjects linked to engagement with the interventional mental health apps. The subjects were perceived usefulness, perceived clinical value, shared data visuals, ease of use, cost or privacy, external factors or other apps, memory or reminders, habits or motivation, and use as needed or mental health status. Next, the team further categorized the 9 topics into 3 overarching themes: perceived therapeutic use, access and usability, and behavioral use ( ). Our main themes align with a longstanding theoretical framework known as the technology acceptance model (TAM). Initially developed by Fred Davis in the 1980s, the TAM identifies the key factors that influence user acceptance and subsequent engagement with a technological system [ ]. The model similarly suggests that perceived usefulness, perceived ease of use, and actual use behavior are highly relevant in determining engagement [ ].

Perceived Therapeutic Use
Referenced 51 times in total, perceived therapeutic use, or "the degree to which an individual believes that using a particular system (app)..." is valuable to them, emerged as the most prevalent theme [
]. Participants frequently commented on the subjective usefulness of the app, the clinical benefit they perceived it to have, or the personal value of reviewing their data. These topics appeared to have positive (n=29, 57%), negative (n=15, 29%), and neutral (n=7, 12%) influences on engagement. In some circumstances, perceived usefulness drove engagement, with multiple participants reporting that they found the mental health app "useful in moments of stress." However, in some cases, it also had the opposite effect if perceived usefulness was low: "I’d have engaged more if it had more information I needed."

Behavioral Use
Behavioral use, or the behavioral tendencies associated with use, was the second leading theme. Cited 44 times throughout 47 self-reports, this theme highlighted the role of memory, habit formation, and individual motivation in sustaining app use. These topics had positive (n=21, 48%) and negative (n=22, 50%) influences on engagement, with minimal neutral associations (n=1, 2%). Without motivation and a habit of use, engagement was not guaranteed in the long term: "It seemed to have useful tools…however, I just was not self-driven enough," and "Limited usage...mainly due to issues establishing a habit of entering data." Participants also had conflicting opinions on the role of reminders to use the app, with some suggesting a role in increasing engagement: "I forgot to use...If I had reminders, it would have been more useful," and others suggesting against them: "App push notifications were a bit disruptive and too frequent."
Access and Usability
Access and usability, encompassing the user experience and ease of use, was the third most prevalent theme. Referenced 33 times in total (see
), the participants commented on the fundamental factors associated with access and use of the mental health app in a positive (n=25, 76%), negative (n=7, 21%), and neutral (n=1, 3%) context. Referenced in more than half of the reports (25/33, 76%), ease of use was the most predominant subcategory under access and usability. Without being directly prompted to report on usability, the exact phrases "easy to use" or "easy to navigate" were used in 12 independent reports. However, while access and usability emerged as important for use, they did not ensure engagement. One participant reported: "The app was intuitive to use and had a pleasing user experience, but I didn’t feel particularly engaged by it." Another participant reported: "It was easy to use but meditation is not something that works well for me."

Association | Perceived therapeutic utility (n=51), n (%) | Behavioral usage (n=44), n (%) | Access and usability (n=33), n (%)
Positive | 29 (57) | 21 (48) | 25 (76) |
Negative | 15 (29) | 22 (50) | 7 (21) |
Neutral | 7 (12) | 1 (2) | 1 (3) |
Discussion
Principal Findings
Using digital phenotyping to guide mental health app recommendations is feasible and results in higher levels of engagement over 4 weeks, although assessing the clinical impact of that higher engagement remains complex due to challenges in measuring meaningful engagement. Through qualitative analysis, we were able to better understand that the participants' perception of engagement was driven less by perceived ease of use and perceived use and, instead, more by habit formation. These results have implications for the potential of clinical app recommendation to drive engagement, the importance of collecting both subjective and objective engagement metrics, and the role of habit formation in sustained app use.
In this study, objective engagement was highest in both groups during the first week of interventional app use and continuously decreased throughout the study period, with the largest drop from week 1 to week 2. The high engagement across both groups at the beginning suggests that common factors may drive initial engagement, but the differing course of engagement (
) suggests that distinct factors affect sustained use. As noted in the results ( ), mean weekly screen time decreased nearly every week, while the digital phenotyping app recommendation group sustained higher mean screen times, suggesting the potential of this approach.

Our study results also highlight differences between objective and subjective measures of engagement. Despite having lower objective engagement scores as measured by screen time, the control group self-reported slightly higher engagement than the digital phenotyping group. This lack of consistency between self-reported and actual use is well known [
] but rarely explored as a methodological consideration in mental health app use studies. Today, there remains no single accepted definition of engagement, with many proposals and explanations for why engagement often wanes [ , ] and methods like personalization and prompts to encourage engagement [ , ].

Inconsistencies in self-reported app use raise concerns about research methods relying solely on this as a measure of engagement. However, the value of either subjective or objective metrics of engagement is hard to determine, as there were few statistically significant correlations with any clinical changes after using these apps for 4 weeks. This negative finding could reflect the fact that the use of self-help apps is often not associated with large clinical changes and that our sample was underpowered to detect small effects.
Our results on the D-WAI scores (
) showed a correlation with subjective engagement but not with objective screen time, both of which were considered measures of engagement. This result is notable as prior studies have shown that this alliance metric may be a predictor of successful clinical outcomes with self-guided mental health apps but have not explored how it may impact engagement [ ]. If alliance is related to subjective engagement as our results suggest, this raises mechanistic questions about how apps function and the need for further research exploring the dual role of subjective and objective factors in driving outcomes.

Additionally, our qualitative results suggest facets of a more nuanced picture of engagement beyond metrics like screen time or alliance. Participants agreed that perceived ease of use and perceived use were important factors in engaging with an app, which aligns with prior research findings grounded in TAM [
]. But while these 2 core factors were necessary for initial engagement, results suggest that sustained engagement requires the addition of habit formation. The higher rates of engagement that we saw for the digital phenotyping recommendation group may have been driven by the feedback and digital phenotyping information that could have helped participants create routines and patterns around app use.Limitations
Only 2 apps were picked for the final analysis in order to produce meaningful results, because the other 2 apps (Insomnia Coach and Nike Training Club) did not produce an adequate sample size. The lack of uptake of those 2 apps was related to our clinical algorithm consistently recommending that participants target depression and anxiety symptoms, leading to disproportionate recommendations of their respective apps. Participants in the control group also most frequently chose to download the depression- and anxiety-focused apps. The predominant self-selection of these 2 apps may be due to the high prevalence of anxiety and depression symptoms among adults, coupled with participants' awareness that those apps targeted anxiety and depression. Additionally, both apps incorporate mindfulness techniques, an increasingly popular self-guided intervention. The visual appearance may have also influenced app selection, as it has been shown that app aesthetics play a large role in consumer appeal [
]. While we picked only 4 apps for participants to select from in this study, future studies could pick a larger number of different apps given the generalizability of this approach. Future studies should also use a method to ensure a more balanced representation across all app categories.

Another limitation of the study was the group design. The experimental group included the effect of both clinical and digital phenotypic recommendation, whereas the control group was based on a third condition of participant choice. This made it difficult to distinguish between the effect of clinical and data-based recommendations. Additionally, the control group still received app treatment, making it difficult to infer results in the absence of any intervention. Future studies should consider 4 distinct groups: (1) clinical recommendation; (2) digital phenotypic–based recommendation; (3) participant choice; and (4) no app (control).

The small sample size (n=47) and high dropout rate (35/82, 43%) may have introduced bias and limited the generalizability of study findings. Most participants dropped out after the first meeting for unknown reasons. A smaller number of participants reported loss of interest or time commitment issues. Other participants finished the study but were excluded for data quality issues, a problem ubiquitous in digital phenotyping research. The lack of compensation and the remote nature of the study were most likely contributing factors to the small sample size and high dropout rate. Paying participants to attend study visits would likely have increased engagement but would also have confounded results. Holding in-person study visits may have decreased the dropout rate but would simultaneously have made recruitment less feasible and led to a less geographically diverse sample.
The study was designed to assess engagement rather than clinical impact. Future studies, powered appropriately, can explore if high engagement (both subjective and objective) is actually associated with improvements in depression or anxiety.
Conclusions
Digital phenotyping app recommendation is feasible and may increase rates of engagement. However, such models need to be carefully assessed before use in larger-scale studies as they may bias recommendations toward a subset of apps. Assessing the mechanism of how this approach increases engagement, whether through digital working alliance or habit formation, can help advance the use of digital phenotyping for app recommendation.
Conflicts of Interest
JT is the Editor-in-Chief for JMIR Mental Health.
Scales developed by the Division of Digital Psychiatry.
DOCX File, 13 KB

Engagement versus change in the structured survey scores.
DOCX File, 17 KB

References
- Nwosu A, Boardman S, Husain MM, Doraiswamy PM. Digital therapeutics for mental health: is attrition the Achilles heel? Front Psychiatry. 2022;13:900615. [FREE Full text] [CrossRef] [Medline]
- Woolley MG, Klimczak KS, Davis CH, Levin ME. Predictors of adherence to a publicly available self-guided digital mental health intervention. Cogn Behav Ther. 2024:1-15. [CrossRef] [Medline]
- Balaskas A, Schueller SM, Cox AL, Doherty G. Understanding users' perspectives on mobile apps for anxiety management. Front Digit Health. 2022;4:854263. [FREE Full text] [CrossRef] [Medline]
- Forbes A, Keleher MR, Venditto M, DiBiasi F. Assessing patient adherence to and engagement with digital interventions for depression in clinical trials: systematic literature review. J Med Internet Res. 2023;25:e43727. [FREE Full text] [CrossRef] [Medline]
- Baumel A, Muench F, Edan S, Kane JM. Objective user engagement with mental health apps: systematic search and panel-based usage analysis. J Med Internet Res. 2019;21(9):e14567. [FREE Full text] [CrossRef] [Medline]
- Torous J, Nicholas J, Larsen ME, Firth J, Christensen H. Clinical review of user engagement with mental health smartphone apps: evidence, theory and improvements. Evid Based Ment Health. 2018;21(3):116-119. [FREE Full text] [CrossRef] [Medline]
- Torous J, Staples P, Slaters L, Adams J, Sandoval L, Onnela JP, et al. Characterizing smartphone engagement for schizophrenia: results of a naturalist mobile health study. Clin Schizophr Relat Psychoses. 2017. [CrossRef] [Medline]
- Jaworski BK, Ramsey KM, Taylor K, Heinz AJ, Senti S, Mackintosh MA, et al. Mental health apps and U.S. military veterans: perceived importance and utilization of the National Center for Posttraumatic Stress Disorder app portfolio. Psychol Serv. 2024;21(3):538-551. [CrossRef] [Medline]
- Torous J, Roberts LW. Needed innovation in digital health and smartphone applications for mental health: transparency and trust. JAMA Psychiatry. 2017;74(5):437-438. [CrossRef] [Medline]
- Chen K, Lane E, Burns J, Macrynikola N, Chang S, Torous J. The digital navigator: standardizing human technology support in app-integrated clinical care. Telemed J E Health. 2024;30(7):e1963-e1970. [CrossRef] [Medline]
- Posselt J, Baumann E, Dierks ML. A qualitative interview study of patients' attitudes towards and intention to use digital interventions for depressive disorders on prescription. Front Digit Health. 2024;6:1275569. [FREE Full text] [CrossRef] [Medline]
- Jongeneel A, Delespaul P, Tromp N, Scheffers D, van der Vleugel B, de Bont P, et al. Effects on voice hearing distress and social functioning of unguided application of a smartphone app—a randomized controlled trial. Internet Interv. 2024;35:100717. [FREE Full text] [CrossRef] [Medline]
- Khan W, Jebanesan B, Ahmed S, Trimmer C, Agic B, Safa F, et al. Stakeholders' views and opinions on existing guidelines on "How to Choose Mental Health Apps". Front Public Health. 2023;11:1251050. [FREE Full text] [CrossRef] [Medline]
- Rawnsley C, Stasiak K. Unlocking the digital toolbox—a mixed methods survey of New Zealand mental health clinicians’ knowledge, use and attitudes towards digital mental health interventions. J Technol Behav Sci. 2024. [CrossRef]
- McGee-Vincent P, Mackintosh M, Jamison AL, Juhasz K, Becket-Davenport C, Bosch J, et al. Training staff across the veterans affairs health care system to use mobile mental health apps: a national quality improvement project. JMIR Ment Health. 2023;10:e41773. [FREE Full text] [CrossRef] [Medline]
- Suggs B, Sanderfer Stull M, Baker S, Erwin K, Savinsky D. A tide of technical trends: technology competence among licensed counselors. J Technol Counsel Educ Supervision. 2022;2(1):2. [CrossRef]
- Mordecai D, Histon T, Neuwirth E, Heisler WS, Kraft A, Bang Y, et al. How Kaiser Permanente created a mental health and wellness digital ecosystem. NEJM Catalyst. 2021;2(1):1. [CrossRef]
- Hoffman L, Benedetto E, Huang H, Grossman E, Kaluma D, Mann Z, et al. Augmenting mental health in primary care: a 1-year study of deploying smartphone apps in a multi-site primary care/behavioral health integration program. Front Psychiatry. 2019;10:94. [FREE Full text] [CrossRef] [Medline]
- Mohr DC, Schueller SM, Tomasino KN, Kaiser SM, Alam N, Karr C, et al. Comparison of the effects of coaching and receipt of app recommendations on depression, anxiety, and engagement in the IntelliCare platform: factorial randomized controlled trial. J Med Internet Res. 2019;21(8):e13609. [CrossRef]
- Lattie EG, Cohen KA, Hersch E, Williams KD, Kruzan KP, MacIver C, et al. Uptake and effectiveness of a self-guided mobile app platform for college student mental health. Internet Interv. 2022;27:100493. [FREE Full text] [CrossRef] [Medline]
- Kopka M, Camacho E, Kwon S, Torous J. Exploring how informed mental health app selection may impact user engagement and satisfaction. PLOS Digit Health. 2023;2(3):e0000219. [FREE Full text] [CrossRef] [Medline]
- Currey D, Torous J. Digital phenotyping data to predict symptom improvement and mental health app personalization in college students: prospective validation of a predictive model. J Med Internet Res. 2023;25:e39258. [FREE Full text] [CrossRef] [Medline]
- Goldberg SB, Baldwin SA, Riordan KM, Torous J, Dahl CJ, Davidson RJ, et al. Alliance with an unguided smartphone app: validation of the Digital Working Alliance Inventory. Assessment. 2022;29(6):1331-1345. [FREE Full text] [CrossRef] [Medline]
- Torous J, Wisniewski H, Bird B, Carpenter E, David G, Elejalde E, et al. Creating a digital health smartphone app and digital phenotyping platform for mental health and diverse healthcare needs: an interdisciplinary and collaborative approach. J Technol Behav Sci. 2019;4(2):73-85. [CrossRef]
- Vaidyam A, Halamka J, Torous J. Enabling research and clinical use of patient-generated health data (the mindLAMP platform): digital phenotyping study. JMIR Mhealth Uhealth. 2022;10(1):e30557. [FREE Full text] [CrossRef] [Medline]
- UCLA Mindful. Apple. URL: https://apps.apple.com/us/app/ucla-mindful/id1459128935 [accessed 2019-05-23]
- How We feel. Apple. 2022. URL: https://apps.apple.com/us/app/how-we-feel/id1562706384 [accessed 2022-04-18]
- Insomnia Coach. Apple. 2018. URL: https://apps.apple.com/ca/app/insomnia-coach/id1341944736 [accessed 2018-02-06]
- Nike Training Club: Wellness. Apple. 2009. URL: https://apps.apple.com/us/app/nike-training-club-wellness/id301521403 [accessed 2009-01-15]
- Lagan S, Aquino P, Emerson MR, Fortuna K, Walker R, Torous J. Actionable health app evaluation: translating expert frameworks into objective metrics. NPJ Digit Med. 2020;3:100. [FREE Full text] [CrossRef] [Medline]
- Lewis JR. The system usability scale: past, present, and future. Int J Hum Comput Interact. 2018;34(7):577-590. [CrossRef]
- Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77-101. [CrossRef]
- Davis F. User acceptance of information systems: the technology acceptance model (TAM). University of Michigan Library. 1987. URL: https://deepblue.lib.umich.edu/handle/2027.42/35547 [accessed 2024-10-09]
- Griffioen N, van Rooij M, Lichtwarck-Aschoff A, Granic I. Toward improved methods in social media research. Technol Mind Behav. 2020;1(1). [CrossRef]
- Nahum-Shani I, Shaw SD, Carpenter SM, Murphy SA, Yoon C. Engagement in digital interventions. Am Psychol. 2022;77(7):836-852. [FREE Full text] [CrossRef] [Medline]
- Vaghefi I, Tulu B. The continued use of mobile health apps: insights from a longitudinal study. JMIR Mhealth Uhealth. 2019;7(8):e12983. [FREE Full text] [CrossRef] [Medline]
- Rabbi M, Aung MH, Zhang M, Choudhury T. MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones. 2015. Presented at: 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing; September 7-11, 2015:707-718; Osaka, Japan. [CrossRef]
- Militello L, Sobolev M, Okeke F, Adler DA, Nahum-Shani I. Digital prompts to increase engagement with the headspace app and for stress regulation among parents: feasibility study. JMIR Form Res. 2022;6(3):e30606. [FREE Full text] [CrossRef] [Medline]
- Lau N, O'Daffer A, Yi-Frazier JP, Rosenberg AR. Popular evidence-based commercial mental health apps: analysis of engagement, functionality, aesthetics, and information quality. JMIR Mhealth Uhealth. 2021;9(7):e29689. [FREE Full text] [CrossRef] [Medline]
Abbreviations
BIDMC: Beth Israel Deaconess Medical Center
D-WAI: Digital Working Alliance Inventory
GAD-7: General Anxiety Questionnaire-7
ISI: Insomnia Index Scale
MINDapps: Mobile Health Index and Navigation Database
OLS: ordinary least squares
PSS-10: Perceived Stress Scale-10
REDCap: Research Electronic Data Capture
SIAS: Social Interaction Anxiety Scale
TAM: technology acceptance model
UCLA: University of California, Los Angeles
Edited by A Mavragani; submitted 29.05.24; peer-reviewed by A Zhou, M Sobolev; comments to author 09.08.24; revised version received 22.08.24; accepted 20.09.24; published 19.11.24.
Copyright©Bridget Dwyer, Matthew Flathers, James Burns, Jane Mikkelson, Elana Perlmutter, Kelly Chen, Nanik Ram, John Torous. Originally published in JMIR Formative Research (https://formative.jmir.org), 19.11.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.