Published in Vol 6, No 6 (2022): June

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/35807.
Predicting Depression in Adolescents Using Mobile and Wearable Sensors: Multimodal Machine Learning–Based Exploratory Study


Original Paper

1Department of Engineering Systems and Environment, University of Virginia, Charlottesville, VA, United States

2Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, United States

3NuRelm, Pittsburgh, PA, United States

Corresponding Author:

Tahsin Mullick, MEng

Department of Engineering Systems and Environment

University of Virginia

Olsson Hall, 151 Engineer's Way

Charlottesville, VA, 22904

United States

Phone: 1 434 243 5823

Email: tum7q@virginia.edu


Background: Depression levels in adolescents have trended upward over the past several years. According to the 2020 National Survey on Drug Use and Health, 4.1 million US adolescents experienced at least one major depressive episode, constituting approximately 16% of adolescents aged 12 to 17 years. However, only 32.3% of these adolescents received some form of specialized or nonspecialized treatment. Identifying worsening symptoms earlier using mobile and wearable sensors may lead to earlier intervention. Most studies on predicting depression from sensor-based data have focused on the adult population; very few have examined predicting depression in adolescents.

Objective: The aim of our work was to study passively sensed data from adolescents with depression and investigate the predictive capabilities of 2 machine learning approaches to predict depression scores and change in depression levels in adolescents. This work also provided an in-depth analysis of sensor features that serve as key indicators of change in depressive symptoms and the effect of variation of data samples on model accuracy levels.

Methods: This study included 55 adolescents with symptoms of depression aged 12 to 17 years. Each participant was passively monitored through smartphone sensors and Fitbit wearable devices for 24 weeks. Passive sensors collected call, conversation, location, and heart rate information daily. Following data preprocessing, 67% (37/55) of the participants in the aggregated data set were analyzed. Weekly Patient Health Questionnaire-9 surveys answered by participants served as the ground truth. We applied regression-based approaches to predict the Patient Health Questionnaire-9 depression score and change in depression severity. These approaches were consolidated using universal and personalized modeling strategies. The universal strategies consisted of Leave One Participant Out and Leave Week X Out. The personalized strategy models were based on Accumulated Weeks and Leave One Week One User Instance Out. Linear and nonlinear machine learning algorithms were trained to model the data.

Results: We observed that personalized approaches performed better on adolescent depression prediction compared with universal approaches. The best models were able to predict depression score and weekly change in depression level with root mean squared errors of 2.83 and 3.21, respectively, following the Accumulated Weeks personalized modeling strategy. Our feature importance investigation showed that the contribution of screen-, call-, and location-based features influenced optimal models and were predictive of adolescent depression.

Conclusions: This study provides insight into the feasibility of using passively sensed data for predicting adolescent depression. We demonstrated prediction capabilities in terms of depression score and change in depression level. The prediction results revealed that personalized models performed better on adolescents than universal approaches. Feature importance provided a better understanding of depression and sensor data. Our findings can help in the development of advanced adolescent depression predictions.

JMIR Form Res 2022;6(6):e35807

doi:10.2196/35807


Background

According to the World Health Organization, half of all mental health conditions start by the age of 14 years, but most cases go undetected and untreated. Among mental health conditions, depression is one of the leading causes of illness and disability among adolescents [1] and the mental illness most likely to be a risk factor for suicide [2], which is the second leading cause of death among US adolescents [3] and among the top causes of death in adolescents worldwide [4].

Major depressive disorder, more commonly termed depression, is a medical disorder that negatively affects how a person feels, thinks, and acts. The effects of depression are both emotional and physical [5]. Its causes are varied and include biochemical changes, genetics, personality traits, and environmental factors [5]. Depression presents with a combination of effects that play a role in its diagnosis, such as alteration in mood, negative self-image, self-punitive desires, vegetative changes, and physiological changes such as activity retardation or agitation [6].

Depression is difficult to monitor or regulate in adolescents because the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, diagnosis [7] includes not only depressive symptoms but also irritability, which may be difficult to distinguish from typical adolescent behavior. As an internalizing disorder expressed more through thoughts than actions, worsening depressive symptoms can be difficult for others, such as parents or caregivers, to identify [8]. Adolescents also report using cognitive coping strategies far less often than adults [9]. In a study on adolescent mental health literacy, <50% of adolescents were able to identify depression [10]. Although earlier intervention on symptom worsening improves outcomes in depression, the inability of adolescents and their caregivers to identify these symptoms serves as a barrier [11]. This indicates a strong need for interventions that can assist adolescents and their caregivers in monitoring the symptoms of depression earlier.

The consequences of unaddressed depression can extend into adulthood, impairing both physical and mental health and limiting future employment opportunities and the potential to lead satisfying lives [12]. With the increased use of screening tools such as the Patient Health Questionnaire-9 (PHQ-9), mental health clinicians and primary care providers can screen for depression more efficiently. However, screening does not always lead to a substantial increase in treatment engagement [13]. Measurement-based care [14], which uses these validated screening tools recurrently to identify, monitor, and treat depressive symptoms, improves outcomes for patients with depression by identifying worsening or treatment-nonresponsive symptoms earlier and intervening accordingly [15].

The success of validated screening tools has provided mental health clinicians and primary care providers with better assessment tools for symptom severity and, especially with the ability to embed these tools in electronic health records, more frequent monitoring may result in earlier intervention and improved care. Unfortunately, constant monitoring of depression symptomatology is still far from reality. With the advent of mobile phones, fitness trackers, and their inbuilt sensors, this can be made possible. Our goal is to look closely into adolescent depression through the eyes of passively sensed data and evaluate machine learning (ML) approaches that offer predictive capabilities.

By exploring approaches to adolescent depression data, we want to enable the future development of apps geared toward the continuous monitoring of patients experiencing depression and allow clinicians, adolescents, and their parents the opportunity to take preventive or earlier actions.

This study was aimed at using passively sensed data to generate predictions on depression levels and change in depression levels. The predictions took on both universal and personalized modeling approaches. We then determined key contextual features that affected our ML models. Finally, we presented how the performance of personalized models changed over time and across data samples.

Related Work

Related work in this section takes an inverse pyramid approach to describe the state of the art in mobile sensing for health apps and then focuses on the impact in the space of mental health.

Mobile-Based Sensing for Health Apps

Mobile sensing has been an active research area in health apps. A number of studies [16-18] have analyzed areas of cardiovascular health and sensed participant heart rate and heart rate variability with the help of mobile camera sensors. Areas of study such as sleep have benefited from mobile sensing by using sensors to detect sleep quality and sleep states using ML [19]. Further studies on sleep have explored both supervised and unsupervised approaches to detect sleep variation in contextual settings [20-22]. Mobile sensors such as accelerometers, gyroscopes, and GPSs have been used to model human behavior and cognition through contextualized feature extraction [23,24]. Studies on overall health and well-being have combined the aforementioned sensing capabilities to help promote general health. For example, the use of health apps to monitor human behavior through sleep, physical activity, and social interaction [25] has been found to show improvement in behavior patterns. Another example of general well-being [26] generates an index as a medium of feedback for improving health through exercise-based goal setting. All of the aforementioned studies have shown the efficacy of using mobile sensing to predict or diagnose health-related changes. The next subsection delves into how mobile sensing is changing mental health.

Mobile Sensing in Mental Health Apps

Mobile sensing–based mental health studies have been conducted in the areas of bipolar disorder [27], schizophrenia [28], anxiety [29,30], stress [31,32], and depression [33-39]. These studies have shown that mobile sensing can play an integral role in detecting and predicting mental health–related problems. Daily mood, physical activity, and social communication tracking of participants helped predict symptoms of bipolar relapse [27]. This was achieved using random coefficient methods to analyze the relationship between phone-based data and the rating of manic and depressive symptoms. Schizophrenia is another mental condition in which passive sensing has demonstrated predictive capability by showing the relationships between tracked features as indicators of schizophrenia [28]. That study used bivariate analysis and tree-based methods to predict ecological momentary assessment scores. Depression and anxiety in college students were studied, in particular the effect of stress and self-esteem, using causal networks derived from time-series sensor data [29]. These data helped in understanding the causal relationship between anxiety, depression, and stress. Anxiety regulation using wearable devices was explored through false feedback of a slow heart rate [30] and was found to be beneficial for helping control anxiety symptoms. Researchers have been successful in tracking physiological changes during stress using voice sensing across different acoustic environments and individuals [31]. Patients undergoing chemotherapy were also studied using passively sensed data; this exploratory study used random forest classifiers and showed strong correlations among sedentary behavior, less time spent in light physical activity, and other factors such as longer on-screen time and app interactions [32]. All of these studies provide sufficient evidence to consider passively sensed data as an effective method to track mental health, which provides support for our approach in this study.

One of the first studies to use mobile phones for depression used GPS data to track participant mobility [33]. The study provided evidence of a correlation between location-based data and depressive mood. In addition to GPS, phone use has been another feature to exhibit a strong relationship with depression severity [34]. That study extracted features such as phone use frequency and duration along with GPS-based features such as location variance and normalized entropy to show the correlation with depression. Behavior in people with depression has also been investigated by monitoring additional features such as sleep and social interaction through smartphones [35]. Multimodal features were gradually introduced into the research sphere to derive a contextual filtering of features that detected depression in college students and showed that multimodal feature information could outperform unimodal features [36]. This study used association rule mining to choose features and applied standard ML to detect depression, showing the merit in using multimodal features. Detecting depression is dependent on the approach used; analyzing the problem from the perspective of longitudinal data and exploring changes in depressive symptoms were shown to generate good accuracy [37]. The work in the latter study is closely related to our endeavor and serves as an inspiration. A collaborative filtering–based study is yet another approach that has shown promise in using personalized models to derive better predictions [38]. Our study also proposes 2 personalized strategies to model individual participants using ML.

Narrowing down to adolescent depression studies, we present some existing works in the literature and later explain their differences from our work in Table 1. Studies on adolescent depression have been primarily survey-based, social sentiment–centric, and feasibility-centric [40-42]. The work by Cao et al [39] relates closely to our aim of detecting depression in adolescents. However, their study had a smaller sample size, used both parent and adolescent inputs, and relied more heavily on participant feedback. The differences between the highlighted studies and our work are further elaborated on in the Discussion section.

In this study, we first investigated the feasibility of universal and personalized ML modeling strategies to predict adolescent depression scores and change in depression levels. We then identified features that were more predictive of adolescents’ depression during the ML process. Finally, we studied how missingness of data affected model performance along with understanding how much data were required for our models to perform over a predetermined threshold.

Our findings revealed that a regression-based predictive modeling approach was able to capture more granular changes in depression scores. We also showed that personalized strategies were more effective predictors than universal strategies. The performance of personalized models did not improve steadily with additional weeks of data; instead, we observed fluctuations in the results. In an attempt to explain this phenomenon, we performed additional analyses that separated our participants into 2 pools based on the SD of their depression scores. Our results showed that the pool with a small SD in depression scores was modeled more accurately than the higher-SD pool.

Table 1. Papers on adolescent mental health prediction and how our work differs from the existing work.

Paper: Cao et al [39]
Study aim: Investigated the effectiveness of smartphone apps in evaluating and monitoring depression symptoms in a clinically depressed adolescent population compared with psychometric instruments (PHQ-9a, HAM-Db, and HAM-Ac); 13 participants aged 12 to 17 years.
Methods: Used self-evaluations by adolescents and parents together with smartphone data to improve predictions of PHQ-9 scores; used the SOLVD app, installed only on Android phones; used only a linear regressor and a support vector regressor with a polynomial kernel.
Results: Mood averaged over a 2-week period correlated with biweekly psychometric scores from the PHQ-9, HAM-D, and HAM-A; combining self-evaluations from both parents and children with smartphone sensor data improved PHQ-9 score prediction accuracy.
Difference from our work: Our work does not depend on self-evaluation by adolescents and parents to improve predictions; instead, we rely exclusively on the captured sensor values to predict PHQ-9 scores. We used universal and personalized modeling strategies with multiple machine learning algorithms.

Paper: Maharjan et al [43]
Study aim: Used the StandStrong app to assess the feasibility and acceptability of sensing technologies for maternal depression treatment in low-resource settings for mothers aged 15 to 25 years.
Methods: Explored possible explanations for differences in successful data collection by time of day and sensor type, along with qualitative results to illuminate these differences.
Results: The study mainly identified concerns related to technological barriers in passively sensed data collection.
Difference from our work: The study was based on passively sensed data collection and did not perform predictive modeling; its aim was to assess how well the app performed in data collection and the hurdles encountered. It had 11 participants with depression spanning younger and older ages, whereas our study focused on adolescents, all of whom had been diagnosed with some form of depression.

Paper: MacLeod et al [44]
Study aim: Explored whether passively collected smartphone sensor data can be used to predict internalizing symptoms among youths in Canada; participants aged 10 to 21 years.
Methods: Collected self-reports of anxiety, depression, and attention-deficit hyperactivity disorder; N=122 with 2 weeks of passively sensed data; used the CES-DCd and SCAREDe assessments.
Results: Depressive symptoms correlated with time spent stationary, less mobility, higher light intensity during the night, and fewer outgoing calls. Anxiety correlated with less time spent stationary, greater mobility, and more time on-screen. Adding passively collected smartphone data to prediction models of internalizing symptoms significantly improved their fit.
Difference from our work: This work was primarily focused on establishing correlations with self-reports and used passive sensor data to fit linear regression models predicting CES-DC and SCARED values. Nonlinear modeling approaches were not considered, whereas we explored them and produced better results.

aPHQ-9: Patient Health Questionnaire-9.

bHAM-D: Hamilton Depression Rating Scale.

cHAM-A: Hamilton Anxiety Rating Scale.

dCES-DC: Center for Epidemiological Studies Depression Scale for Children.

eSCARED: Screen for Child Anxiety Related Disorders.


Data Collection

Recruitment and Participant Breakdown

Adolescents aged 12 to 17.99 years and their parents were recruited from psychiatric clinics at the University of Pittsburgh Medical Center Western Psychiatric Hospital serving depressed and suicidal youth, an adolescent and young adult medicine clinic seeing youth for primary and subspecialty services, as well as through the University of Pittsburgh research registry. A total of 114 adolescents expressed an interest in this study. Of these 114 adolescents, 94 (82.5%) completed a screening assessment, and 31 (27.2%) were screened out because of minimal symptoms of depression (PHQ-9 score [45] of <5), no self-reported previous diagnosis of depression, not having a smartphone, and age restrictions. A total of 57 adolescents and their parents consented to the study, of whom 55 (96%) completed a baseline assessment and were entered into the study. The aggregated data set after exploratory data analysis (EDA) and initial cleaning consisted of 67% (37/55) of the participants. The reduction in participant number was due to missing data caused by sensor issues and irregular syncing, as well as some participants dropping out during the study. The data for each participant were collected over a period of 24 weeks.

Passively sensed data from mobile phones were collected using the AWARE app [24], which logs relevant sensor data and harnesses those data within the device. It was installed on the participants' phones and set up to record the sensor information at the desired sampling frequencies. We collected data from multiple sensors, including calls, conversations, location, Wi-Fi, and screen use. Features were classified into event-based features, which included phone use, calling, and conversational recording, and time series–based features, comprising Wi-Fi and GPS-based location. We used the Fitbit Inspire HR (software version 1.84.5) to collect heart rate, sleep, and steps. Sensor data from GPS and Wi-Fi were collected at 10-minute intervals. The Fitbit features were collected every minute and aggregated daily. The data collected from both AWARE and Fitbit were uploaded to the cloud and then hosted in a database for cleaning and further processing.

The AWARE passive sensing data and Fitbit were, on average, 69.11% and 32.36% complete, respectively. Missing Fitbit data were attributed to less than expected adherence to wearing the Fitbit because of several reasons, including forgetting to wear it, fatigue, rash (recurred in 1/55, 2% of the participants even after the band was changed), and the need to charge the device. The data collection process was approved by the University of Pittsburgh Institutional Review Board.

Weekly PHQ-9 surveys were sent over the 24 weeks, and the adolescents completed 69.01% (873/1265) of the weekly surveys. The PHQ-9 is an evaluative questionnaire used to assess depression severity and has been used effectively in multiple studies related to depression [38,39]. The questionnaire consists of 9 questions, each scored from 0 to 3, resulting in an overall score range of 0 to 27. For the purpose of our study, this was our ground truth of choice owing to its strength in categorizing depression severity levels and its effectiveness in yielding responses from participants when administered remotely [46,47]. The scores are divided into levels based on depression severity, allowing for easier interpretation by clinicians, parents, and adolescents [45].

Descriptive Statistics of Collected Participant Data

The adolescent sample included participants aged 12 to 17.99 years, with an average age of 15.5 years. Most of the sample was White (47/56, 84%), with 16% (9/55) of the individuals representing a minority population. There was variability in gender, with approximately 73% (41/56) of the adolescent sample identifying as female, 23% (13/56) identifying as male, and 9% (3/56) identifying as transgender or other. The demographic statistics associated with the collected data are presented in Figure 1.

Figure 1. Demographic statistics: (A) gender distribution, (B) race distribution of the adolescents, (C) sexual orientation, (D) depression score distribution for each week of observation, and (E) depression score distribution for each participant.

Figure 1 also contains box plots of depression scores by week (bottom left) and by participant (bottom right). The depression score versus participant box plot presents the variation in depression scores across participants. The data set comprised 507 data points. The PHQ-9 scores ranged from a minimum of 0 to a maximum of 27, with a mean of 11.21 (SD 5.23). For depression score versus week, we observed a mean PHQ-9 score of 10.63 (SD 4.92), with minimum and maximum values similar to those of the participant plot. The PHQ-9 scores are also expressed as levels of depression: minimal (0-4), mild (5-9), moderate (10-14), moderately severe (15-19), and severe (20-27). The distribution of depression levels according to the number of participants was as follows: minimal depression (12/55, 22%), mild depression (26/55, 47%), moderate depression (31/55, 56%), moderately severe depression (21/55, 38%), and severe depression (5/55, 9%); percentages sum to more than 100% because participants could contribute to multiple levels over the study. There were rare occurrences of participants traversing up to 4 levels of depression over the course of their time in the study. It is also important to mention that, owing to data limitations and survey completion rates, 5% (3/55) of the participants maintained a single level of depression in the data set. The observations also revealed that most participants fluctuated between 2 levels of depression.
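For readers implementing the same banding, the PHQ-9 cutoffs above translate directly into code. The following minimal Python sketch (the function name phq9_to_level is ours, purely illustrative) maps a total score to the 5 severity levels:

```python
def phq9_to_level(score: int) -> int:
    """Map a PHQ-9 total score (0-27) to the 5 severity levels used
    in this paper: 1=minimal, 2=mild, 3=moderate,
    4=moderately severe, 5=severe."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total scores range from 0 to 27")
    for level, upper in enumerate((4, 9, 14, 19, 27), start=1):
        if score <= upper:
            return level

# Example: the sample mean score of 11.21 falls in the moderate band.
assert phq9_to_level(11) == 3
```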

Feature Extraction

The collected sensor data were passed to the Reproducible Analysis Pipeline for Data Streams framework [25] for feature extraction. The data set retained 66 features, including calls, conversations, locations, screen, Wi-Fi, heart rate, sleep, and steps. The data were then compiled into an aggregated data set and used as input for our ML modeling operations.

The data set was in a 2D tabular format suitable for our supervised modeling approaches and was populated with the weekly PHQ-9 survey results, which served as the ground truth. To match the weekly ground truth depression score, we aggregated our features into daily and then weekly values. Figure 2 shows the combined harnessing framework comprising AWARE, Fitbit, and the Reproducible Analysis Pipeline for Data Streams. Each sensor-based feature set was used to extract a range of features.
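As an illustration of the daily-to-weekly alignment described above, the following pandas sketch aggregates hypothetical daily feature rows into weekly means. The column names (participant, calls_duration_sum, location_entropy) are stand-ins, not the study's actual feature names:

```python
import pandas as pd

# Hypothetical long-format frame: one row per participant-day.
daily = pd.DataFrame({
    "participant": ["p01"] * 14,
    "date": pd.date_range("2020-01-06", periods=14, freq="D"),
    "calls_duration_sum": range(14),
    "location_entropy": [0.5] * 14,
})

# Aggregate daily rows into weekly rows so that each row aligns with
# one weekly PHQ-9 survey (the ground truth cadence in this study).
weekly = (
    daily
    .set_index("date")
    .groupby("participant")
    .resample("W")
    .mean(numeric_only=True)
    .reset_index()
)
print(weekly.head())
```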

Figure 2. Feature extraction.

ML Modeling

Overview

The data processing pipeline started with extensive EDA to check for skewness and filter missing data. This step was followed by the calculation of Pearson correlation values for our feature set and the removal of highly correlated features. On the basis of our EDA, we set thresholds for missing data and adopted a robust imputation strategy, k-nearest neighbors (KNN) imputation, which is effective for multivariate time-series data. Our final data set consisted of 507 data points with 61 features representing 37 participants; the remaining participants were excluded owing to high data sparsity. An illustration of the EDA and final data set generation is presented in Figure 3.
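A minimal sketch of the two preprocessing steps named above, correlation-based feature removal followed by KNN imputation, is shown below. The 0.9 correlation threshold and n_neighbors=5 are illustrative assumptions, not the paper's reported settings:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute Pearson
    correlation exceeds the threshold (threshold is illustrative)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("abcde"))
X["f"] = X["a"] * 0.99        # nearly duplicate feature, will be dropped
X.iloc[::7, 2] = np.nan       # inject missing values

X = drop_correlated(X)
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X)
```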

The ML phase after the data preprocessing can be segmented into a model-fitting stage and a cross-validation (CV) stage.

In the model-fitting stage, we applied both the depression score prediction and change in depression level prediction approaches. This stage involved passing the feature sets through linear and nonlinear ML algorithms. The linear algorithms included the Least Absolute Shrinkage and Selection Operator (LASSO) and elastic net. Nonlinear modeling included tree-based algorithms such as random forest and decision trees as well as ensemble methods such as AdaBoost, extra trees, gradient boosting, and XGBoost.
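A sketch of this algorithm lineup using scikit-learn and the third-party xgboost package follows; default hyperparameters are used here purely for illustration, as the paper does not report its hyperparameter settings:

```python
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (
    RandomForestRegressor,
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
)
from xgboost import XGBRegressor  # requires the xgboost package

# One entry per algorithm family named in the text.
REGRESSORS = {
    "lasso": Lasso(),
    "elastic_net": ElasticNet(),
    "decision_tree": DecisionTreeRegressor(),
    "random_forest": RandomForestRegressor(),
    "adaboost": AdaBoostRegressor(),
    "extra_trees": ExtraTreesRegressor(),
    "gradient_boosting": GradientBoostingRegressor(),
    "xgboost": XGBRegressor(),
}
```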

The CV stage was responsible for the train-test splitting of data. This stage was also designed to consider universal and personalized modeling strategies. The universal strategies ensured that the modeling was based on the sample population data splits. The personalized strategies modeled based on individual data train-test splits. These strategies will be further elaborated on in the following subsection.

Figure 3. Machine learning (ML) pipeline comprising exploratory data analysis that includes (A) check for skewness of data, (B) missing value assessment, (C) check of depression level distribution, (D) generation of correlation matrix and removal of features that are highly correlated, (E) k-nearest neighbors (KNN)-based missing value imputation, (F) aggregated data set creation, and (G) nonlinear and linear ML modeling of data.
Prediction of Depression Score

To predict the depression score, we used linear and nonlinear regression-based ML algorithms, as shown in Figure 4. The algorithms included LASSO, elastic net, random forest, AdaBoost, extra trees, gradient boosting, and XGBoost for regression. The extracted features were used as input based on sensor combinations, and the ML algorithms modeled the data to output predictions of the depression score. The models followed both universal and personalized modeling strategies and were evaluated based on mean absolute error (MAE), mean squared error (MSE), root MSE (RMSE), and mean absolute percentage error (MAPE).
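The four evaluation metrics can be computed directly with scikit-learn; the helper below is a minimal sketch (note that MAPE is undefined when the true score is 0, which is possible on the PHQ-9 scale):

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
)

def score_model(y_true, y_pred):
    """Compute the four evaluation metrics used in this study."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        # Caution: MAPE blows up if y_true contains zeros.
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
    }
```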

Figure 4. Depression score prediction approach. MAE: mean absolute error; MAPE: mean absolute percentage error; ML: machine learning; MSE: mean squared error; RMSE: root mean squared error.
Prediction of Change in Depression Level

The prediction of change in depression level used the feature set combinations as input. The ML algorithms regressed on the feature data to predict the change in depression score, as shown in Figure 5. The change in depression level was then derived from the predicted change in depression score. This was a regression modeling approach with MAE, MSE, RMSE, and MAPE as evaluation metrics.

Figure 5. Machine learning approach for predicting change in depression level.

As mentioned previously, there were 5 depression levels. A jump to a level above (positive change) or a level below (negative change) was considered a change in level, with the magnitude of the change determined by the number of levels jumped. On this basis, there were 9 possible changes in level: positive changes (1, 2, 3, and 4), negative changes (−1, −2, −3, and −4), and no change (0). The changes in depression level observed in our data fell within the range of −3 to 3, so the change in depression scores was mapped to these 7 changes in depression levels. The establishment of levels helps health care providers interpret depression changes and aligns with standard medical diagnostics [45].
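One way to derive the ground truth change in level from two consecutive weekly scores is to difference their severity bands, as in this sketch (the function names are ours; the paper does not spell out its exact mapping):

```python
import bisect

def phq9_level(score: int) -> int:
    # Severity bands 0-4, 5-9, 10-14, 15-19, 20-27 map to levels 1-5.
    return bisect.bisect_right([4, 9, 14, 19], score) + 1

def level_change(prev_score: int, curr_score: int) -> int:
    """Change in depression level between consecutive weekly scores;
    −4..4 in principle, −3..3 observed in this data set."""
    return phq9_level(curr_score) - phq9_level(prev_score)

assert level_change(8, 16) == 2   # mild (2) -> moderately severe (4)
assert level_change(12, 12) == 0  # no change
```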

CV Strategy

We used multiple variations of the leave-one-out CV as presented in Figure 6. These strategies were designed to accommodate both personalization and generalization of the trained models.

Figure 6. Cross-validation strategies: (A) Leave One User Out, (B) Leave Week X Out, (C) Leave One Week One User Instance, and (D) Accumulated Weeks.
Leave One Participant Out

In this strategy, we held out a single participant for validation and trained the model on the other participants. This strategy reflects the cold start case where a new user starts using the health app. This is a generalized approach to model fitting that takes advantage of the existing data set participants.

Leave Week X Out

In Leave Week X Out, we held out a given week for all participants and trained on the rest of the weeks. This strategy evaluates the impact of time-specific segments of data on the prediction. The training phase captures the similarity and variation of the data during different weeks to build the models. This too is categorized as a general modeling strategy to detect patterns in weekly depressive behavior.
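Both universal strategies are instances of grouped leave-one-out cross-validation and can be expressed with scikit-learn's LeaveOneGroupOut; the arrays below are random stand-ins for the study's data, not its actual dimensions:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                  # stand-in weekly feature rows
y = rng.integers(0, 28, size=120)              # stand-in PHQ-9 scores
participant_id = np.repeat(np.arange(10), 12)  # 10 participants x 12 weeks
week_id = np.tile(np.arange(12), 10)

logo = LeaveOneGroupOut()
# Leave One Participant Out: each fold holds out all weeks of one participant.
for train_idx, test_idx in logo.split(X, y, groups=participant_id):
    pass  # fit on train_idx, evaluate on test_idx
# Leave Week X Out: each fold holds out one study week across all participants.
for train_idx, test_idx in logo.split(X, y, groups=week_id):
    pass
```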

Accumulate Weeks

A sliding window approach was followed in this CV strategy where, for each participant, the model was built with data from weeks t to t+n and tested on week t+n+1. This strategy examines the feasibility of the personalized ML models using data from individual users and evaluates the impact of longer-term data on prediction accuracy.
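A minimal sketch of this expanding-window split for a single participant follows; the min_train floor is an illustrative assumption, not a parameter reported in the paper:

```python
def accumulate_weeks_splits(n_weeks: int, min_train: int = 2):
    """Expanding-window splits over one participant's chronologically
    ordered weeks: train on weeks 0..t-1, test on week t."""
    for t in range(min_train, n_weeks):
        yield list(range(t)), [t]

for train, test in accumulate_weeks_splits(6):
    print(train, "->", test)  # [0, 1] -> [2], [0, 1, 2] -> [3], ...
```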

Leave One Week One User Instance Out

Here, we trained the models on all the weeks of a participant, leaving one of their weeks out for testing. This was done for all participants. This method also evaluates the feasibility of the personalized models using each individual user's data on a week-by-week basis without considering temporal and historical trends.
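This strategy amounts to ordinary leave-one-out cross-validation applied within each participant's data, as in the following sketch (random stand-in arrays again):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = rng.integers(0, 28, size=120)
participant_id = np.repeat(np.arange(10), 12)

# For each participant independently, every week serves once as the
# held-out test instance while that participant's remaining weeks
# form the training set; no temporal ordering is enforced.
loo = LeaveOneOut()
for pid in np.unique(participant_id):
    mask = participant_id == pid
    X_p, y_p = X[mask], y[mask]
    for train_idx, test_idx in loo.split(X_p):
        pass  # fit on X_p[train_idx], y_p[train_idx]; test on test_idx
```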

Baseline Performance

The idea of a baseline was to establish a reference for our accuracy levels. In this study, we used a naïve random baseline for the depression score approach and a majority baseline for the depression level change approach. This baseline analysis was carried out for all the CV strategies.
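A sketch of the two baselines follows, under the assumption that the random baseline draws uniformly from the observed PHQ-9 range and the majority baseline always predicts the most frequent change class; the paper does not detail either construction:

```python
import numpy as np

rng = np.random.default_rng(0)
y_train = rng.integers(0, 28, size=100)  # stand-in PHQ-9 training scores
y_test = rng.integers(0, 28, size=20)

# Naive random baseline for score regression: predict uniformly at
# random within the observed PHQ-9 range.
random_pred = rng.uniform(y_train.min(), y_train.max(), size=y_test.size)

# Majority baseline for level change: always predict the most
# frequent change class seen in training.
changes_train = rng.integers(-3, 4, size=100)  # stand-in change labels
values, counts = np.unique(changes_train, return_counts=True)
majority_pred = np.full(y_test.size, values[np.argmax(counts)])
```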

Feature Set–Based Detailed Modeling

Overview

As shown in Figure 2, we used 6 major feature sets: Fitbit, calls, conversations, location, screen, and Wi-Fi. The aggregated data set was used to generate 63 individual data sets comprising all possible nonempty combinations of these 6 feature sets. This approach was used to determine the most effective feature set combination for a specific modeling strategy. Each data set was passed through the modeling strategies and ML algorithms. The process of generating models for the various combinations of data sets is outlined in Textbox 1, with a short code sketch of the combination generation following it.


Process of model generation

  • Data set generation: from the aggregated data set that is inclusive of all the feature sets, we generated all possible combinations of the feature sets, including 1-feature sets, 2-feature sets, 3-feature sets, 4-feature sets, 5-feature sets, and 6-feature sets.
  • Depression score and change in depression level: we used the Patient Health Questionnaire-9 scores as the ground truth for depression score prediction. To predict the change in depression level, the ground truth was the actual transitions between the depression levels of the participants.
  • Machine learning modeling: the derived data sets were all passed through both the universal and personalized modeling strategies.
  • Model selection: for depression score prediction and change in depression level, both following regression-based approaches, we evaluated and selected the best models based on mean absolute error, mean squared error, root mean squared error, and mean absolute percentage error.
  • Feature importance: the models that performed best were further analyzed, and their feature importance was calculated. For each combination of sensors and their respective analyses under universal and personalized modeling strategies, the top 10 features were determined. This list of the top 10 features across the combinations was then converted into a frequency chart to help understand the features that had predictive capability.
Textbox 1. Process to generate models for the various combinations of data sets.
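The combination generation referenced in the textbox is a one-liner with itertools: the 6 feature sets yield 2^6 − 1 = 63 nonempty subsets. A minimal sketch (set labels are ours):

```python
from itertools import combinations

FEATURE_SETS = ["fitbit", "calls", "conversation", "location", "screen", "wifi"]

# All nonempty combinations of the 6 feature sets: 2**6 - 1 = 63 data sets.
all_combos = [
    combo
    for r in range(1, len(FEATURE_SETS) + 1)
    for combo in combinations(FEATURE_SETS, r)
]
assert len(all_combos) == 63
```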
Feature Importance Calculation

The feature importance in this study was calculated by observing the decrease in node impurity of our tree-based models, which included random forest, AdaBoost, and XGBoost. The impurity for regression tree-based modeling was determined by calculating the variance reduction of the model owing to a feature. Training the tree-based models allowed us to evaluate the contribution of each feature in decreasing the weighted impurity. This decrease in impurity was averaged over the ensemble of trees trained.
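This is the standard mean-decrease-in-impurity importance exposed by scikit-learn tree ensembles; a minimal sketch on synthetic data follows (the data and model settings are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] * 2 + rng.normal(scale=0.5, size=200)  # feature 0 is informative

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is the mean decrease in impurity (variance
# reduction, for regression) attributable to each feature, averaged
# over the ensemble of trees and normalized to sum to 1.
top = np.argsort(model.feature_importances_)[::-1][:10]
for i in top:
    print(f"feature {i}: {model.feature_importances_[i]:.3f}")
```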

Ethics Approval

We obtained approval for this study from the University of Pittsburgh Human Research Protections Office (STUDY18120176). After the participants showed interest in study involvement, they were screened based on the study criteria. The inclusion criteria were that the adolescents be aged 12 to 17.99 years, own an Android or iOS smartphone with access to a data plan, score ≥5 on the PHQ-9 consistent with at least mild symptoms, self-report a previous diagnosis of depression, understand English, and currently reside in the United States. The exclusion criteria were that the adolescents could not have current active suicidal ideation (thoughts with an intent to act on them), a history of a suicide attempt without having received mental health treatment, or a physical deformity or medical reason preventing them from wearing an activity tracker, or be simultaneously participating in a different research study using AWARE. Adolescents meeting the study criteria were offered study participation and, thereafter, if interested, provided their verbal assent, and their parents provided permission. A copy of the consent materials was emailed to all participants for review beforehand.


Overview

In this section, we present the performance of our approaches in predicting depression score and change in depression level in adolescents. The results are the mean values of our runs with the respective approaches. We further show the features that played the most significant role in predicting outcomes for change in depression level. We then report the effect of adding incremental weekly data on the accuracy of depression level prediction. We analyzed the data using both universal and personalized modeling strategies. The study also assessed the impact of missing data on personalized modeling performance. Finally, we conclude this section with a comparative study of classic time-series modeling and a personalized ML model.

Prediction of Depression Score

To understand how sensor features can help in predicting adolescents' depression, we applied regression-based ML algorithms to predict depression scores. The models were compared with a random baseline, and we tested all possible combinations of sensor features and ML algorithms. The evaluation metrics selected were MAE, MSE, MAPE, and RMSE; in particular, we paid close attention to MAE and RMSE. As shown in Table 2, nonlinear algorithms such as AdaBoost, random forest, and XGBoost performed best. Overall, the personalized models outperformed the universal models on all metrics, in particular MAE and RMSE. The best performance was recorded for the Accumulate Weeks personalized strategy, where the optimal model used a 4-feature combination consisting of the Fitbit, calls, screen, and location feature sets (MAE=2.39, MSE=10.28, RMSE=2.83, MAPE=0.27). The best results from the personalized models were derived from feature sets that included location, calls, and screen in the combination. This also shows that adding more features does not necessarily yield better results. An RMSE in this range indicates that our model's predictions were typically within approximately 3 points of the true depression score.

The results of the regression analysis of depression scores can also be interpreted as levels of depression, achieved by segmenting our predictions into intervals of PHQ-9 scores adhering to the established strategy [45]. The confusion matrix of depression levels in Figure 7 was derived from the depression score predictions to provide more insight. The model predicted levels 2 and 3 best, with 89 and 79 correct labels, respectively, whereas levels 1, 4, and 5 were correctly predicted 24, 33, and 15 times, respectively. These results reflect our ground truth distribution, where mild (level 2) and moderate (level 3) depression accounted for most of the samples, followed by the moderately severe (level 4), minimal (level 1), and severe (level 5) categorizations.

Table 2. Depression score regression resultsa.

Metric (SD)     LOPOb           LWXOc           ACCUd           LOWOUe
MAEf            4.46 (0.62)     3.43 (0.70)     2.39 (0.10)     2.53 (0.10)
MSEg            30.74 (0.41)    19.0 (0.39)     10.28 (0.21)    11.89 (0.25)
MAPEh           0.55 (0.65)     0.42 (0.52)     0.27 (0.15)     0.29 (0.20)
RMSEi           5.07 (0.71)     4.31 (0.65)     2.83 (0.11)     2.53 (0.17)

Feature set per strategy:
  LOPO: Fitbit, calls, conversation, screen, location, and Wi-Fi
  LWXO: calls, conversation, screen, location, and Wi-Fi
  ACCU: Fitbit, calls, screen, and location
  LOWOU: Fitbit, calls, conversation, screen, location, and Wi-Fi

MLj algorithm per strategy:
  LOPO: AdaBoost; LWXO: random forest; ACCU: XGBoost; LOWOU: random forest

aThe values presented display evaluation metrics for depression score regression models. The best-performing machine learning models were AdaBoost, random forest, and XGBoost.

bLOPO: Leave One Participant Out.

cLWXO: Leave Week X Out.

dACCU: Accumulate Weeks.

eLOWOU: Leave One Week One User Instance Out.

fMAE: mean absolute error.

gMSE: mean squared error.

hMAPE: mean absolute percentage error.

iRMSE: root mean squared error.

jML: machine learning.

Figure 7. Confusion matrix of depression levels based on depression score predictions.

Prediction of Change in Depression Level

Table 3 presents the results of predicting the change in depression score. In this approach, the change in depression score was calculated between consecutive participant weeks as change = depression score(week t) − depression score(week t−1). The best-performing models with the lowest MAE were the personalized models Accumulate Weeks (MAE=3.21, MSE=20.13, RMSE=3.86, MAPE=13.69) and Leave One Week One User Instance Out (MAE=3.12, MSE=20.14, RMSE=4.48, MAPE=7.16). Having the ability to predict change within an error margin of −3 to +3 can not only help in determining change in score but also aid in discerning change in levels of depression.

We used the change in depression predictions to create 7 different classes marking changes in levels of depression [45]. The classes (−3, −2, −1, 0, 1, 2, and 3) map the regressed change in depression score to the change in depression level. The signs of the classes represent the rise and fall of depression level, and their values represent the magnitude of change in depression level. Similar to our approach to understanding how depression scores can be interpreted in terms of depression levels, this enabled us to visualize how well our models performed in terms of detecting change in depression score and mapping it to change in depression level. The results from the confusion matrix Figure 8 allowed us to see that the model was able to predict the level jumps (−1, 0, and 1) more accurately than the higher jumps (−3, −2, 2, and 3). This can be explained by the distribution of the observations in the data. Most of the recorded cases witnessed a rise and fall in depression levels by 1 or were at the same level (0) for an extended period. The confusion matrix showed that the true occurrences of large depression level jumps were very rare events.

Table 3. Depression score change regression resultsa.

Metric (SD)     LOPOb           LWXOc           ACCUd           LOWOUe
MAEf            3.28 (0.70)     3.24 (0.67)     3.21 (0.20)     3.12 (0.15)
MSEg            21.35 (0.72)    19.43 (0.63)    20.13 (0.24)    20.14 (0.22)
MAPEh           8.33 (0.55)     15.79 (0.61)    13.69 (0.17)    7.16 (0.20)
RMSEi           4.2 (0.71)      4.26 (0.66)     3.86 (0.18)     4.48 (0.21)

Feature set per strategy:
  LOPO: Fitbit, calls, conversation, screen, location, and Wi-Fi
  LWXO: calls, conversation, screen, location, and Wi-Fi
  ACCU: Fitbit, calls, and location
  LOWOU: Fitbit, calls, conversation, screen, and location

MLj algorithm per strategy:
  LOPO: AdaBoost; LWXO: random forest; ACCU: XGBoost; LOWOU: random forest

aThe values presented display evaluation metrics for depression score change regression models. The best-performing machine learning models were AdaBoost, random forest, and XGBoost.

bLOPO: Leave One Participant Out.

cLWXO: Leave Week X Out.

dACCU: Accumulate Weeks.

eLOWOU: Leave One Week One User Instance Out.

fMAE: mean absolute error.

gMSE: mean squared error.

hMAPE: mean absolute percentage error.

iRMSE: root mean squared error.

jML: machine learning.

Figure 8. Confusion matrix for change in depression level into 7 classes (−3, −2, −1, 0, 1, 2, and 3) that represent transitions between higher and lower levels of depression.

Feature Importance Calculation

One of the main advantages of modeling and formulating prediction strategies by extracting features using tree-based approaches is interpretability. In this section, we share the results and provide key insights into the features that were most influential in formulating our ML models. In particular, we chose our depression score prediction results to understand and narrow down the features that played a crucial role in model performance. The results are presented in two parts: (1) plot of the features from the best models based on the frequency of their selection and (2) analysis of the top 10 features with relative importance plotted for each of the personalized and universal modeling strategies.

Most Frequent Features Selected During ML Modeling

Location, calls, and screen were the top 3 feature sets across all modeling strategies. Normalized location entropy and location entropy, which capture how a participant's time was distributed across locations, were the features most frequently selected during modeling for both personalized and universal models. The other most frequently selected location features included the outlier time percentage, which is the ratio of time spent in nonsignificant locations to the time spent in all locations. Static ratio and number of location transitions were also consistently included in the modeling strategies. Call features were the second most frequent feature set in the models. Top outgoing call features included the Shannon entropy of the duration of all calls and the minimum and mean duration of calls. The incoming call features showed a similar pattern, with the mean and minimum duration of calls joined by the incoming call count and the sum of the duration of incoming calls. Frequently selected screen features included first use after unlock, count of unlock episodes, and minimum and maximum duration of screen unlocked. For completeness, we note that the conversation, Fitbit, and Wi-Fi feature sets followed the aforementioned ones.

Figures S1-S4 in Multimedia Appendix 1 show the features that were selected most frequently by Accumulate Weeks, Leave One Week One User Instance Out, Leave Week X Out, and Leave One Participant Out for the best models predicted under them. These plots show the number of times particular features were associated with the best models for all combinations of feature sets under the respective modeling strategy. It is important to note that we modeled up to 6 combinations of sensors and feature sets; therefore, the presence of a feature with a count of 6 indicates that, for all feature combinations tested, that particular feature played a significant role in predictive model building.

Important Features Selected Based on Relative Importance From the Best Depression Score Prediction Models

In the previous section, we presented the results for the most frequently selected features that contributed to the modeling phase. In this section, Figures S5 and S6 in Multimedia Appendix 1 examine the feature importance of the modeling strategies. In particular, we look at the relative importance of the top 10 features that influenced the respective modeling strategy. Relative importance reflects the weight that the ML algorithm places on a particular feature when forming its predictions. Figure S5 in Multimedia Appendix 1 illustrates the feature importance for the Accumulate Weeks (left) and Leave One Week One User Instance Out (right) modeling strategies for depression score prediction. The feature set for Accumulate Weeks that performed best included Fitbit, calls, screen, and location. Screen first unlock had the maximum importance (0.175), followed by screen maximum duration unlock (0.115). These were followed by the call features count of most frequent call types (0.0754) and incoming call count (0.0752).

In the case of Leave One Week One User Instance Out, the 6-sensor combination of Fitbit, calls, conversation, screen, Wi-Fi, and location performed best overall. The best features in this modeling strategy also included screen first unlock (0.16) and screen maximum duration unlock (0.112). This was followed by call incoming count of most frequent call types (0.079) and screen count episode unlocks (0.052). An important observation in this result is the similarity of the feature importance of both personalized models. Both modeling strategies selected screen, call, and Fitbit features as important. The results only showed the top 10 features by relative importance; other features such as location followed but had low relative importance.

The universal models are shown in Figure S6 in Multimedia Appendix 1. Both Leave One Participant Out and Leave Week X Out showed the best performance for 5-feature and 6-feature combinations, respectively. We note that both of the generalized approaches ranked the Wi-Fi feature count of the most scanned access point for a time segment with high importance (Leave One Participant Out: 0.11; Leave Week X Out: 0.132). Universal models also displayed importance among the screen and Fitbit features. Screen features such as max duration unlock (Leave One Participant Out: 0.042; Leave Week X Out: 0.07) and standard deviation of duration screen unlocked (Leave One Participant Out: 0.041; Leave Week X Out: 0.038) were also common between the 2 strategies. The Fitbit features of maximum resting heart rate and maximum steps were both selected as important by Leave One Participant Out and Leave Week X Out.

Variation in Accuracy With Increase in Weeks of Data for Accumulated Modeling Strategy

In this section, we present our analysis of how incremental increases in weeks of data affected model accuracy, which was extracted by converting depression score predictions to levels. The results in this section are based on modeling done under the Accumulate Weeks approach. We present 3 plots in Figure 9. The first plot reflects the variation of accuracy based on data from all participants, the second is for participants who showed low SD between their weekly PHQ-9 scores, and the final plot is for participants with high SD between their reported weekly PHQ-9 scores. The blue line represents the accuracy values corresponding to the weeks of data available for modeling, and the red dotted line represents a 2-point moving average.

All 3 plots in Figure 9 show that average accuracy fluctuated between weeks of data available for modeling. Therefore, we looked closely at the trend of the moving average to guide inferences about accuracy variation. In the plot with all participants, looking at the 2-point moving average, we can see that the average accuracy fluctuates between 50% and 60%. The plot with data from participants with a low SD in the PHQ-9 score is higher and can be conservatively stated to be between 60% and 75%. For participants with a high SD in the PHQ-9 score, the moving average shows a significant variation between 20% and 40% accuracy.

We observed that variation in the reported PHQ-9 score by participants contributed to the overall fluctuations. Ideally, it would be expected that, with an increase in data, the accuracy of the models would tend to increase. Although a slight trend upward was observed, the constant rise and fall of participant PHQ-9 scores seemed to affect the accuracy.

Figure 9. Variation in accuracy with increase in weeks of data trained on with a 2-point moving average to map the trend.

Missingness of Data and the Impact on Accuracy and RMSE

In this section, we explore how missing data affected our accuracy and RMSE values across participants. Figure 10 plots the percentage of missing data for each participant and their accuracy based on personalized modeling (Accumulate Weeks).

We observed 30.89% missing data in the phone-based sensors and 67.64% in Fitbit. Missing Fitbit data were attributed to less than expected adherence to wearing the Fitbit for reasons including the need for regular charging, rash in some participants' cases, and forgetting to wear the device regularly.

The blue line in Figure 10 is the normalized missing percentage, and the orange line is the normalized RMSE of predicting the depression score. The figure shows how the missing data percentage relates to the RMSE value of individual participants for predicting depression score. To analyze both of these values, we normalized them onto the same scale of comparison. Observing a few participants, such as participant 22 (normalized missing percentage: 1; normalized RMSE: 0.04) and participant 24 (normalized missing percentage: 0.15; normalized RMSE: 0.72), we observed an inverse relationship between model performance and the amount of missing data.

Figure 10. Missing data percentage (missing%) versus depression score prediction root mean squared error (RMSE)-normalized.

Principal Findings

This study presented an in-depth analysis of passively sensed multimodal data collected over a period of 24 weeks from 37 adolescents to predict depression. The collection of data coincided with the COVID-19 outbreak and allowed for the observation of sensor data predictive capability in this scenario. Our models predicted both depression scores and change in the level of depression over weeks. The results showed reasonable improvements compared with the baseline models for both depression score and change in depression level prediction.

We explored universal and personalized modeling strategies. Overall, given the unpredictability of mental health patterns in individuals, personalized models were the most effective. The Accumulate Weeks modeling approach, which relied on previous windows of sensor observations, achieved an RMSE of 2.83 for depression score prediction and an RMSE of 3.21 for change in depression score prediction. This provides a strong intuition regarding the model's performance: in predicting the depression score, the model typically differs from the true score by approximately 2 to 3 points and, for change in depression score, by approximately 3 points. These results point toward future research and development of more sophisticated personalized predictive modeling to map individual behavioral traits between participants.

Investigating the modeling predictions by segmenting them into depression levels revealed that the model was good at predicting the mild, moderate, and moderately severe levels, whereas minimal and severe levels were difficult to detect because of the less frequent observations in the data collected. In the case of change in depression levels, the models detected decreases and increases with reasonable accuracy when the transitions were −1, 0, and 1. Rare changes such as −3, −2, 2, and 3 were detected with less accuracy. Data imbalance in terms of rare events such as severe changes in depression level, as shown in this study, can be a subject of further exploration, with possible strategies for synthetic data generation that can imitate sensor readings of participants with sudden or rare changes.

Our study also looked into feature frequency and feature importance. Understanding the features that were selected and highly ranked by optimal-performing models can help in determining what sensors to focus on when analyzing data from passive sensor studies. We see that location, calls, and screen sensor-based features appeared most frequently in the optimal-performing models. Individuals experiencing depressive symptoms tend to move less, which can be captured by location data. Depression also causes participants to reduce their interaction with friends and family, and call-related features can play a role in characterizing this behavior. The screen time of individuals has been seen to be a reflection of mood, as explored in an earlier study [48]. Feature importance helps narrow down the exact features that contributed to modeling. We noticed that, for personalized models, screen time was a strong determining factor that could be a consequence of the COVID-19 lockdown that prompted participants to use their phones with greater frequency. The feature importance presented in this study enabled us to make informed interpretable associations between sensor readings during changes in depression levels or scores. This can propel more research in the direction of more explainable or interpretable model building, especially for mental health–related diseases.

Personalized models performed best in our study of adolescent depression data. Therefore, it was important to understand how personalized modeling, in particular the Accumulate Weeks approach, performed when subjected to incremental data addition as well as looking at how missing values affected model performances. The Accumulate Weeks modeling approach performed better when the variation in the depression scores of the participants was low. By contrast, when the variation in depression scores was high, the accuracy decreased significantly. Exploring the relationship between missing values and the performance metrics of the models allowed us to discover an inverse relationship. This bolsters our understanding that completeness of data can be an important factor in improving model performance. Our experimental analysis of missing data suggests a requirement for strategies to improve the collection of sensor data that can include stronger adherence to protocols by participants or more robust data-generating processes.

Finally, we investigated how autoregressive integrated moving average (ARIMA) models performed in comparison with the ML modeling approaches used. The ML models performed better than the classic ARIMA models overall; however, the ARIMA models were more robust to sudden changes, whereas the ML models were better at predicting smoother transitions. This result leads us to believe that, with larger data sets, a combination of classic time-series approaches and ML-based approaches could be useful for participants with inherent trends or seasonality in their behavior.
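For reference, a one-step-ahead ARIMA evaluation on a weekly PHQ-9 series could be sketched as follows; the (1, 1, 1) order and the synthetic series are assumptions for illustration, not the configuration reported in this study.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
# Synthetic 24-week PHQ-9 series with a mild downward trend
scores = np.clip(15 - 0.2 * np.arange(24) + rng.normal(0, 2, size=24), 0, 27)

preds, truths = [], []
for t in range(8, len(scores)):  # expanding-window one-step forecasts
    result = ARIMA(scores[:t], order=(1, 1, 1)).fit()
    preds.append(result.forecast(steps=1)[0])
    truths.append(scores[t])

rmse = float(np.sqrt(np.mean((np.array(preds) - np.array(truths)) ** 2)))
print(f"ARIMA(1,1,1) one-step RMSE: {rmse:.2f}")
```

A hybrid design could, for example, feed the ARIMA forecast to the ML model as an additional feature for participants whose series show trend or seasonality.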

Comparison With Previous Research

In the Related Work section, we discussed a number of studies on mobile health, in particular those that explored adolescent depression to some degree. Most previous studies were conducted with varied age groups in which adolescents were either absent or only a very small part of the sample [33-38]; consequently, these studies cannot be considered representative of the adolescent age category. An in-depth review of passive sensing technology for predicting depression [49] focused mostly on college students and adults.

This study focused on adolescents, and all the results are representative of adolescent participants with a previous diagnosis of depression. To the best of our knowledge, this study had the largest sample of adolescents monitored passively to predict depression. In our work, we found location to be one of the feature sets most frequently recognized by the ML models. This agrees with previous studies that used GPS sensor data to detect depressive states [33,34], which also found a relationship between mobility metrics and depression; however, those studies were restricted to adults and used a limited set of sensors. Other studies that used multimodal sensor data were limited in participant recruitment, duration, or the population they studied [33-37]. Our study was an extensive 24-week endeavor, with 37 participants retained for the predictive analysis. A few previous studies of the adolescent population relied on survey-based approaches [40-42]; we differ from them in that we based our modeling strictly on multimodal sensor features and did not rely on any direct input from the participants or their parents.

The work by Cao et al [39] was the closest to our study. It was aimed at the adolescent population and combined survey inputs from parents and adolescents with multimodal features to improve accuracy. The differences between the study by Cao et al [39] and ours lie in the type of modeling approaches used (eg, universal and personalized), the duration of the trial, and the number of participants. Regarding modeling approaches, Cao et al [39] performed only a universal approach, with a best RMSE of 3.70 obtained by combining parents' inputs, steps, GPS, SMS text messages, and calls. We achieved an RMSE of 2.39 based only on the sensor combination of calls, screen, location, and Fitbit. Their trial lasted 8 weeks and included only 8 participants, both smaller than in our study. Overall, our study explored adolescent depression based on passively sensed data in greater depth owing to the modeling approaches, the study duration, and the number of participants involved. We also showed how the data affected our modeling and compared our models with classic techniques such as ARIMA. This information is pertinent to understanding the adolescent population and provides evidence of the types of modeling approaches and features that can generate the best results in predicting depression and change in depression score.

Limitations

Despite the exhaustive modeling approach and strong participant involvement, our study encountered some limitations. One of the primary limitations was the onset of the COVID-19 outbreak, which potentially caused deviations from adolescents' regular behavior. Schools were closed, and mobility was restricted to the confines of participants' homes or rare outings. Most adolescents were at home and limited to television, games, or cell phone use.

We also encountered missing data, partially because of the participants' lack of adherence to data syncing and management of the app and partially because of technical issues and the difficulty of remote troubleshooting. Missing data are a general concern in passive sensing; we investigated their impact on modeling in this work and showed that data completeness can greatly aid model performance.

Although our study was, to the best of our knowledge, one of the longest of its kind, the resulting data set can still be categorized as small. A small data set can affect the modeling of rare events; for example, large, sudden jumps in depression scores or extreme depression scores can be hard for ML models to track. Although these measurements are anomalies in the data set, a more focused study of participants exhibiting such traits is a possible future direction.

Finally, although our extensive analysis provides useful insights into the feasibility and challenges of using passive sensing to predict adolescents' depression, we emphasize that our study is exploratory and that further studies are needed to replicate these results.

Conclusions and Future Directions

In this exploratory study, we investigated the feasibility of using passively sensed data to predict adolescents' depression. We applied universal and personalized ML approaches to predict depression score and change in depression level in adolescents. Our best results showed RMSE values of 2.83 and 3.21 for the prediction of depression score and change in depression level, respectively, providing confidence in personalized modeling approaches for predicting depression in adolescents. We also investigated the features that the models frequently relied on; features related to screen, call, and location sensors were the most frequent in the optimal models. Our analysis showed better model performance for participants with low variation in depression scores, and we observed that a participant's percentage of missing data inversely affected model performance.

Modeling both depression scores and change in depression can help clinicians, parents, and adolescents take preventive measures and intervene early in the worsening of depressive symptoms, before they reach severe categories. This study will inform the development of an adolescent-facing mobile app, with parent and clinician components, to aid adolescents in self-managing and tracking their mood.

Future research based on our principal findings can help improve mental health prediction. Personalized modeling can be used to provide tailored feedback to patients. Rare event prediction in the face of the data imbalance seen in this study should act as an impetus to develop more realistic synthetic data. The feature importance results can be further explored for other mental illnesses and provide more interpretable analyses of passive sensing-based studies. Strategies to mitigate missing data in passive sensing will need to balance participant adherence with modeling strategies that account for missingness. A final promising area of research is the formulation of ML models that incorporate classic time-series approaches to account for possible future trends in patients.

Acknowledgments

This study was supported by a grant from the National Institute of Mental Health (1R44MH122067); the National Institute of Mental Health–funded Center for Enhancing Triage and Utilization for Depression and Emergent Suicidality in Pediatric Primary Care (P50MH115838); the Center for Behavioral Health, Media, and Technology; and a career development award (1K23MH111922-01A1). Research recruitment was supported by the Clinical and Translational Science Institute at the University of Pittsburgh through the National Institutes of Health Clinical and Translational Science Award program (grant UL1 TR001857).

Conflicts of Interest

SS is a cofounder of the US small business concern NuRelm, Inc, which was funded through the National Institutes of Health Small Business Innovation Research program to perform this research. Although NuRelm has not yet sold a product related to this work, it hopes to fulfill the Small Business Innovation Research program's goal of disseminating potentially beneficial research through commercialization.

Multimedia Appendix 1

Feature importance.

DOC File, 221 KB

  1. Adolescent mental health. World Health Organization.   URL: https://www.who.int/news-room/fact-sheets/detail/adolescent-mental-health [accessed 2021-01-02]
  2. Beautrais A. Risk factors for suicide and attempted suicide among young people. Aust N Z J Psychiatry 2000 Jun;34(3):420-436 [FREE Full text] [CrossRef] [Medline]
  3. 10 leading causes of death, United States. Centers for Disease Control and Prevention.   URL: https://wisqars.cdc.gov/data/lcd/home [accessed 2021-01-02]
  4. Dick B, Ferguson BJ. Health for the world's adolescents: a second chance in the second decade. J Adolesc Health 2015 Jan;56(1):3-6 [FREE Full text] [CrossRef] [Medline]
  5. What is depression? American Psychiatric Association.   URL: https://www.psychiatry.org/patients-families/depression/what-is-depression [accessed 2021-02-23]
  6. Beck AT, Alford BA. Depression: Causes and Treatment. Philadelphia: University of Pennsylvania Press; 2014.
  7. Bolton D. Overdiagnosis problems in the DSM-IV and the new DSM-5: can they be resolved by the distress-impairment criterion? Can J Psychiatry 2013 Nov 01;58(11):612-617. [CrossRef] [Medline]
  8. Logan D, King CA. Parental identification of depression and mental health service use among depressed adolescents. J Am Acad Child Adolesc Psychiatry 2002 Mar;41(3):296-304 [FREE Full text] [CrossRef] [Medline]
  9. Garnefski N, Legerstee J, Kraaij VV, Van Den Kommer T, Teerds J. Cognitive coping strategies and symptoms of depression and anxiety: a comparison between adolescents and adults. J Adolesc 2002 Dec;25(6):603-611 [FREE Full text] [CrossRef] [Medline]
  10. Burns JR, Rapee RM. Adolescent mental health literacy: young people's knowledge of depression and help seeking. J Adolesc 2006 Apr;29(2):225-239 [FREE Full text] [CrossRef] [Medline]
  11. Davey CG, McGorry PD. Early intervention for depression in young people: a blind spot in mental health care. Lancet Psychiatry 2019 Mar;6(3):267-272 [FREE Full text] [CrossRef]
  12. Keenan-Miller D, Hammen CL, Brennan PA. Health outcomes related to early adolescent depression. J Adolesc Health 2007 Sep;41(3):256-262 [FREE Full text] [CrossRef] [Medline]
  13. Hacker K, Arsenault L, Franco I, Shaligram D, Sidor M, Olfson M, et al. Referral and follow-up after mental health screening in commercially insured adolescents. J Adolesc Health 2014 Jul;55(1):17-23 [FREE Full text] [CrossRef] [Medline]
  14. Elmquist J, Melton TK, Croarkin P, McClintock SM. A systematic overview of measurement-based care in the treatment of childhood and adolescent depression. J Psychiatr Pract 2010 Jul;16(4):217-234. [CrossRef] [Medline]
  15. Guo T, Xiang YT, Xiao L, Hu CQ, Chiu HF, Ungvari GS, et al. Measurement-based care versus standard care for major depression: a randomized controlled trial with blind raters. Am J Psychiatry 2015 Oct;172(10):1004-1013 [FREE Full text] [CrossRef] [Medline]
  16. Bánhalmi A, Borbás J, Fidrich M, Bilicki V, Gingl Z, Rudas L. Analysis of a pulse rate variability measurement using a smartphone camera. J Healthc Eng 2018;2018:4038034 [FREE Full text] [CrossRef] [Medline]
  17. Nam Y, Kong Y, Reyes B, Reljin N, Chon KH. Monitoring of heart and breathing rates using dual cameras on a smartphone. PLoS One 2016;11(3):e0151013 [FREE Full text] [CrossRef] [Medline]
  18. Koenig N, Seeck A, Eckstein J, Mainka A, Huebner T, Voss A, et al. Validation of a new heart rate measurement algorithm for fingertip recording of video signals with smartphones. Telemed J E Health 2016 Aug;22(8):631-636 [FREE Full text] [CrossRef] [Medline]
  19. Min JK, Doryab A, Wiese J, Amini S, Zimmerman J, Hong JI. Toss 'n' turn: smartphone as sleep and sleep quality detector. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2014 Presented at: CHI '14: CHI Conference on Human Factors in Computing Systems; Apr 26- May 1, 2014; Toronto Ontario Canada   URL: https://doi.org/10.1145/2556288.2557220 [CrossRef]
  20. Huang K, Ding X, Xu J, Chen G, Ding W. Monitoring sleep and detecting irregular nights through unconstrained smartphone sensing. In: Proceedings of the 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom). 2015 Presented at: 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom); Aug 10-14, 2015; Beijing, China   URL: https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.30 [CrossRef]
  21. Montanini L, Sabino N, Spinsante S, Gambi E. Smartphone as unobtrusive sensor for real-time sleep recognition. In: Proceedings of the 2018 IEEE International Conference on Consumer Electronics (ICCE). 2018 Presented at: 2018 IEEE International Conference on Consumer Electronics (ICCE); Jan 12-14, 2018; Las Vegas, NV, USA   URL: https://doi.org/10.1109/ICCE.2018.8326220 [CrossRef]
  22. Lin Y, Wong BY, Lin SH, Chiu YC, Pan YC, Lee YH. Development of a mobile application (App) to delineate "digital chronotype" and the effects of delayed chronotype by bedtime smartphone use. J Psychiatr Res 2019 Mar;110:9-15 [FREE Full text] [CrossRef] [Medline]
  23. Seera M, Loo CK, Lim CP. A hybrid FMM-CART model for human activity recognition. In: Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2014 Presented at: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC); Oct 5-8, 2014; San Diego, CA, USA   URL: https://doi.org/10.1109/SMC.2014.6973904 [CrossRef]
  24. Doryab A, Togelius J, Bardram J. Activity-aware recommendation for collaborative work in operating rooms. In: Proceedings of the 2012 ACM international conference on Intelligent User Interfaces. 2012 Presented at: IUI '12: 17th International Conference on Intelligent User Interfaces; Feb 14 - 17, 2012; Lisbon Portugal   URL: https://doi.org/10.1145/2166966.2167023 [CrossRef]
  25. Lin M, Lane DN, Mohammod M, Yang X, Lu H, Cardone G, et al. BeWell+: multi-dimensional wellbeing monitoring with community-guided user feedback and energy optimization. In: Proceedings of the conference on Wireless Health. 2012 Presented at: WH '12: Wireless Health 2012; Oct 23 - 25, 2012; San Diego California   URL: https://doi.org/10.1145/2448096.2448106 [CrossRef]
  26. Liu C, Chan CT. Exercise performance measurement with smartphone embedded sensor for well-being management. Int J Environ Res Public Health 2016 Oct 11;13(10):1001 [FREE Full text] [CrossRef] [Medline]
  27. Beiwinkel T, Kindermann S, Maier A, Kerl C, Moock J, Barbian G, et al. Using smartphones to monitor bipolar disorder symptoms: a pilot study. JMIR Ment Health 2016 Jan 06;3(1):e2 [FREE Full text] [CrossRef] [Medline]
  28. Wang R, Aung MS, Abdullah S, Brian R, Campbell AT, Choudhury T, et al. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2016 Presented at: UbiComp '16: The 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing; Sep 12 - 16, 2016; Heidelberg Germany   URL: https://doi.org/10.1145/2971648.2971740 [CrossRef]
  29. Huckins JF, DaSilva AW, Hedlund EL, Murphy EI, Rogers C, Wang W, et al. Causal factors of anxiety and depression in college students: longitudinal ecological momentary assessment and causal analysis using peter and clark momentary conditional independence. JMIR Ment Health 2020 Jun 10;7(6):e16684 [FREE Full text] [CrossRef] [Medline]
  30. Costa J, Adams AT, Jung MF, Guimbretière F, Choudhury T. Emotioncheck:: a wearable device to regulate anxiety through false heart rate feedback. GetMobile 2017 Jun;21(2):22-25 [FREE Full text] [CrossRef] [Medline]
  31. Lu H, Frauendorfer D, Rabbi M, Mast MS, Chittaranjan GT, Campbell AT, et al. StressSense: detecting stress in unconstrained acoustic environments using smartphones. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing. 2012 Presented at: Ubicomp '12: The 2012 ACM Conference on Ubiquitous Computing; Sep 5 - 8, 2012; Pittsburgh Pennsylvania   URL: https://doi.org/10.1145/2370216.2370270 [CrossRef]
  32. Low C, Dey AK, Ferreira D, Kamarck T, Sun W, Bae S, et al. Estimation of symptom severity during chemotherapy from passively sensed data: exploratory study. J Med Internet Res 2017 Dec 19;19(12):e420 [FREE Full text] [CrossRef] [Medline]
  33. Canzian L, Musolesi M. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2015 Presented at: UbiComp '15: The 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing; Sep 7 - 11, 2015; Osaka Japan   URL: https://doi.org/10.1145/2750858.2805845 [CrossRef]
  34. Saeb S, Zhang MI, Karr CJ, Schueller SM, Corden ME, Kording KP, et al. Mobile phone sensor correlates of depressive symptom severity in daily-life behavior: an exploratory study. J Med Internet Res 2015 Jul 15;17(7):e175 [FREE Full text] [CrossRef] [Medline]
  35. Doryab A, Min JK, Wiese J, Zimmerman J, Hong J. Detection of behavior change in people with depression. In: Proceedings of the AAAI 14. 2014 Presented at: The Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI-14); Jul 27–31, 2014; Quebec Canada   URL: https://www.aaai.org/ocs/index.php/WS/AAAIW14/paper/viewPaper/8850
  36. Xu X, Chikersal P, Doryab A, Villalba DK, Dutcher JM, Tumminia MJ, et al. Leveraging routine behavior and contextually-filtered features for depression detection among college students. Proc ACM Interact Mob Wearable Ubiquitous Technol 2019 Sep 09;3(3):1-33 [FREE Full text] [CrossRef]
  37. Chikersal P, Doryab A, Tumminia M, Villalba DK, Dutcher JM, Liu X, Dey AK. Detecting depression and predicting its onset using longitudinal symptoms captured by passive sensing. ACM Trans Comput Hum Interact 2021 Feb 28;28(1):1-41 [FREE Full text] [CrossRef]
  38. Xu X, Chikersal P, Dutcher JM, Sefidgar YS, Seo W, Tumminia MJ, et al. Leveraging collaborative-filtering for personalized behavior modeling. Proc ACM Interact Mob Wearable Ubiquitous Technol 2021 Mar 19;5(1):1-27 [FREE Full text] [CrossRef]
  39. Cao J, Truong AL, Banu S, Shah AA, Sabharwal A, Moukaddam N. Tracking and predicting depressive symptoms of adolescents using smartphone-based self-reports, parental evaluations, and passive phone sensor data: development and usability study. JMIR Ment Health 2020 Jan 24;7(1):e14045 [FREE Full text] [CrossRef] [Medline]
  40. Kauer S, Reid SC, Crooke AH, Khor A, Hearps SJ, Jorm AF, et al. Self-monitoring using mobile phones in the early stages of adolescent depression: randomized controlled trial. J Med Internet Res 2012 Jun 25;14(3):e67 [FREE Full text] [CrossRef] [Medline]
  41. Jung H, Park HA, Song TM. Ontology-based approach to social data sentiment analysis: detection of adolescent depression signals. J Med Internet Res 2017 Jul 24;19(7):e259 [FREE Full text] [CrossRef] [Medline]
  42. Cliffe B, Croker A, Denne M, Smith J, Stallard P. Digital cognitive behavioral therapy for insomnia for adolescents with mental health problems: feasibility open trial. JMIR Ment Health 2020 Mar 03;7(3):e14842 [FREE Full text] [CrossRef] [Medline]
  43. Maharjan SM, Poudyal A, van Heerden A, Byanjankar P, Thapa A, Islam C, et al. Passive sensing on mobile devices to improve mental health services with adolescent and young mothers in low-resource settings: the role of families in feasibility and acceptability. BMC Med Inform Decis Mak 2021 Apr 07;21(1):117 [FREE Full text] [CrossRef] [Medline]
  44. MacLeod L, Suruliraj B, Gall D, Bessenyei K, Hamm S, Romkey I, et al. A mobile sensing app to monitor youth mental health: observational pilot study. JMIR Mhealth Uhealth 2021 Oct 26;9(10):e20638 [FREE Full text] [CrossRef] [Medline]
  45. Kroenke K, Spitzer R. The PHQ-9: a new depression diagnostic and severity measure. Psychiatric Annals 2002 Sep;32(9):509-515 [FREE Full text] [CrossRef]
  46. Sun Y, Fu Z, Bo Q, Mao Z, Ma X, Wang C. The reliability and validity of PHQ-9 in patients with major depressive disorder in psychiatric hospital. BMC Psychiatry 2020 Sep 29;20(1):474 [FREE Full text] [CrossRef] [Medline]
  47. Pinto-Meza A, Serrano-Blanco A, Peñarrubia MT, Blanco E, Haro JM. Assessing depression in primary care with the PHQ-9: can it be carried out over the telephone? J Gen Intern Med 2005 Aug;20(8):738-742 [FREE Full text] [CrossRef] [Medline]
  48. Rosenthal SR, Zhou J, Booth ST. Association between mobile phone screen time and depressive symptoms among college students: a threshold effect. Human Behav Emerg Tech 2021 Feb 18;3(3):432-440 [FREE Full text] [CrossRef]
  49. Meyerhoff J, Liu T, Kording KP, Ungar LH, Kaiser SM, Karr CJ, et al. Evaluation of changes in depression, anxiety, and social anxiety using smartphone sensor features: longitudinal cohort study. J Med Internet Res 2021 Sep 03;23(9):e22844 [FREE Full text] [CrossRef] [Medline]


ARIMA: autoregressive integrated moving average
CV: cross-validation
EDA: exploratory data analysis
MAE: mean absolute error
MAPE: mean absolute percentage error
ML: machine learning
MSE: mean squared error
PHQ-9: Patient Health Questionnaire-9
RMSE: root mean squared error


Edited by A Mavragani; submitted 17.12.21; peer-reviewed by E Sükei, M Kraus; comments to author 03.05.22; revised version received 21.05.22; accepted 22.05.22; published 24.06.22

Copyright

©Tahsin Mullick, Ana Radovic, Sam Shaaban, Afsaneh Doryab. Originally published in JMIR Formative Research (https://formative.jmir.org), 24.06.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.