This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included.
This study aimed to explore the properties of passively recorded environmental audio from a subject’s smartphone to find potential correlates of symptom severity of social anxiety disorder, generalized anxiety disorder, depression, and general impairment.
An Android app was designed, together with a centralized server system, to collect periodic measurements of the volume of sounds in the environment and to detect the presence or absence of English-speaking voices. Subjects were recruited into a 2-week observational study during which the app was run on their personal smartphone to collect audio data. Subjects also completed self-report severity measures of social anxiety disorder, generalized anxiety disorder, depression, and functional impairment. Participants were 112 Canadian adults from a nonclinical population. High-level features were extracted from the environmental audio of 84 participants with sufficient data, and correlations were measured between the 4 audio features and the 4 self-report measures.
The regularity in daily patterns of activity and inactivity inferred from the environmental audio volume was correlated with the severity of depression (
In this study group, the environmental audio was shown to contain signals that were associated with the severity of depression and functional impairment. Associations with the severity of social anxiety disorder and generalized anxiety disorder were much weaker in comparison and not statistically significant at the 5% significance level. This work also confirmed previous findings that the presence of voices is associated with depression. Furthermore, this study suggests that sparsely sampled audio volume could provide potentially relevant insight into subjects’ mental health.
Depression and anxiety disorders are some of the most prevalent mental health disorders [
The health care process can be modeled as beginning with assessment and measurement, followed by diagnosis, and finally treatment. Subsequent rounds of measurement or assessment occur with the final goal of achieving remission. This work focuses on the measurement and diagnosis components by working toward building an automated and objective severity measurement of anxiety, depression, and functional impairment associated with poor mental health.
Research in both psychiatry and clinical psychology traditionally involves assessments of subjects’ state (eg, mood and behavior) in clinical or research settings where they are removed from their natural home and living environment. Often, these assessments are performed retrospectively, with subjects asked to recollect behaviors and feelings over several weeks in the past. Ecological momentary assessment (EMA) [
The feasibility of building passive EMA systems has been greatly improved by the smartphone revolution. Smartphones are ubiquitous and affordable consumer electronics, which are equipped with a wide range of sensors that can enable the type of sensing or monitoring necessary to perform passive EMA [
A general methodology in these passive EMA or mobile sensing studies, and one that is used in this work, is to compute metrics, or
A systematic review by Rohani et al [
This general methodology of sampling objective data from subjects’ smartphones (or other digital sensors) to infer health characteristics has been used in numerous studies, across many conditions. Although we will provide a focused review of relevant works that have used audio data to predict or measure mood and anxiety disorders, there is a wealth of research that has looked at using many different data sources to investigate, predict, or measure the severity of many characteristics of health and mental health disorders. Interested readers are directed to work that has investigated subjects’ general mood and mental health [
As all smartphones are equipped with microphones, they can be used to detect audio-based features of a subject’s environment. Several works have investigated the recording and analysis of speech audio from subjects’ smartphones. The strategies used to record audio include (1) actively prompting users to speak into a microphone, (2) passively recording subjects’ phone calls, and (3) passively recording environmental audio with no interaction from the user.
Using the active prompt-style methodology, Dickerson et al [
Using the passive phone call recording–style methodology, Faurholt-Jepsen et al [
Finally, audio can be sampled in a much more passive and pervasive manner by using a smartphone’s microphone to record environmental (ambient) audio. The StudentLife study by Wang et al [
This exploratory study seeks to discover potential correlates of anxiety and depression symptomatology from environmental audio acquired using passive smartphone sensing. Although previous research has studied how some features of the audio environment relate to depression and bipolar mood disorders, we will extend this to include anxiety disorders. In addition, one aspect of our study is the exploration of the sampled
Subjects from a nonclinical population were recruited for a 2-week observational study in which a custom app was installed on their personal Android phone. Self-report measures of anxiety, depression, and general quality of life and impairment were collected at the beginning and end of the study. Throughout the duration of the study, the smartphone app passively collected the average volume of environmental audio and the presence of voice activity (whether or not speech was detected in the environment at the time of recording). A set of features was designed and used to extract higher-level information from this set of data, and a statistical analysis was performed to determine if a significant relationship existed between subjects’ self-reported anxiety, depression, and general impairment and these features. The study was approved by the University of Toronto Health Sciences Research Ethics Board (Protocol #36687).
Subjects were recruited from Prolific [
The study inclusion criteria were as follows: subjects should (1) reside in Canada, (2) be fluent in English, (3) own an Android phone, (4) have completed at least 95% of their previous Prolific studies successfully, and (5) have previously participated in at least 20 Prolific studies. The final criterion was used to ensure that subjects were proficient at using the Prolific system and were generally technology literate. There were no exclusion criteria for the study. Subjects were paid Can $18.50 (US $14) for participating in the study.
Members of the Prolific community who met the inclusion criteria could read a description of the study, which included an informed consent guide. Those who consented were directed to a webpage that acted as the study entry point. This webpage directed subjects to install the app from the Google Play app store and provided them with log-in credentials for the study app. Once installed, the study app guided subjects through a short setup, in which they were asked to grant the app the permissions needed to access their data, followed by a log-in. Immediately after setup and log-in, subjects were asked to complete a set of 4 self-report measures in digital form within the study app. Once the self-report measures were completed, the app began to periodically collect data in the background. No further interaction with the study app was required until the end of the study, exactly 14 days later, at the same time of day at which the app had been installed and the initial self-report measures completed. At that time, subjects received a notification on their phone informing them that the study had ended and requesting that they complete the same set of 4 self-report measures again in the smartphone app. After completing this task, subjects were directed to uninstall the app from their phone and mark their study tasks as complete on the Prolific website. Subjects were then paid through Prolific’s payment system.
Subjects completed 4 self-report measures in digital form within the study app at the beginning and end of the 14-day study. A review by Belisario et al [
Both the GAD-7 and PHQ-8 instruments ask subjects to evaluate their symptoms over the past 2 weeks, whereas the LSAS and SDS ask subjects to evaluate their symptoms over the past week. A 2-week study duration was therefore chosen as the shortest duration that encompasses the longest assessment window among the self-report measures.
An Android app was designed and created to collect all study data. This includes both the self-reported measures, described earlier, and the passively collected audio data—the volume of environmental audio and the presence or absence of speaking voices in the environment.
The study app records audio every 5 min, for a duration of 15 seconds, by turning on the microphone and recording the environment. This recording process occurs without any interaction from the user and with no notification to the user. Audio recordings are then securely transmitted from subjects’ smartphones over the internet to a computer server where 2 further processing steps are performed. First, the average volume of each 15-second audio recording was calculated using the FFmpeg audio processing software framework [
The audio sampling period was chosen to be 5 min as a trade-off between collecting more data (with a shorter period) and preserving the battery life of subjects’ smartphones (with a longer period). Internal testing before the study showed a 5-min sampling period to be satisfactory for preserving battery life. Although a shorter period could yield more data, versions of the Android operating system since version 6 prevent this. Specifically, while a device is sleeping, it is prevented from performing background processing (such as this type of microphone sampling) more than once in a 9-min period [
Preprocessing of the volume time series was performed before feature extraction to account for missing data and to perform normalization. The study app did not reliably produce audio recordings at a precise period of 5 min, so the volume time series were resampled to a period of 5 min, and missing samples were imputed using linear interpolation. After resampling and interpolation, volume samples were clipped at a ceiling and floor of 3 SDs above and below the mean of the volume time series to remove outliers (using each subject’s own mean and SD rather than group-level values). Finally, the volume time series were scaled linearly so that all volume measurements were within the range of 0 to 1. No preprocessing of the voice presence time series was performed.
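The preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the study's actual implementation: the function names, the (timestamp in seconds, volume) input format, the single-pass interpolation onto a regular grid, and the use of the population SD are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def resample_linear(samples, period_s=300):
    """Resample (timestamp_s, volume) pairs onto a regular grid.

    Assumes at least 2 samples with strictly increasing timestamps.
    Grid points that fall between recordings are linearly interpolated.
    """
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    out = []
    i = 0
    t = times[0]
    while t <= times[-1]:
        # Advance to the segment [times[i], times[i + 1]] that contains t.
        while i < len(times) - 2 and times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append(v0 + frac * (v1 - v0))
        t += period_s
    return out


def clip_outliers(volumes, n_sd=3.0):
    """Clip values to within n_sd SDs of this subject's own mean.

    Population SD is used here; the source does not say which SD was used.
    """
    mu = mean(volumes)
    sd = pstdev(volumes)
    lo, hi = mu - n_sd * sd, mu + n_sd * sd
    return [min(max(v, lo), hi) for v in volumes]


def scale_unit(volumes):
    """Linearly scale values into the range 0 to 1."""
    lo, hi = min(volumes), max(volumes)
    if hi == lo:
        return [0.0 for _ in volumes]  # constant series: no spread to scale
    return [(v - lo) / (hi - lo) for v in volumes]


def preprocess(samples, period_s=300):
    """Resample, clip, and scale one subject's volume time series."""
    return scale_unit(clip_outliers(resample_linear(samples, period_s)))
```

Calling `preprocess` on one subject's recordings returns a list of values between 0 and 1 spaced 5 min apart. Because clipping uses only that subject's own mean and SD, differences in microphone sensitivity between phones do not affect which samples are treated as outliers.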
This subsection describes the methods used to compute the 4 correlates of anxiety and depression symptomatology derived from subjects’ environmental audio recordings. These correlates, or
The daily similarity feature was designed to infer the consistency of the subjects’ sequence of daily activities. Visualizations of the volume time series clearly show distinct periods of activity (characterized by large spikes in volume) and inactivity (characterized by quieter volume with less variance). These periods coincide roughly with daytime and nighttime, respectively. Furthermore, these patterns are periodic and repeat daily.
A link between regularity in daily activities, including sleep, and anxiety and depression is commonly described in the literature [
Visualization of a subject's environmental audio volume data (7 of 14 days).
The majority of the periods of apparent inactivity that are visible in the volume time series appear to coincide with sleep. It was hypothesized that a proxy measure of sleep quality can be inferred by measuring how chaotic the volume of a subject’s environment is during sleep times, replicating the link between sleep and mental health reported in the literature. For example, the prevalence of mood disorders has been shown to be much higher in populations with chronic sleep problems [
To quantify sleep quality, the volume time series was examined with periods of quiet noted to be characterized by low variance in volume. Although the absolute value of the volume is also low at quiet times, the threshold for what can be considered quiet is greatly dependent on the specific microphone and phone placement; therefore, variance was considered a more appropriate measure of the noisiness of the environment. The
Previous studies of mental health using mobile sensing have computed proxy measures of social interaction as a feature predictive of depression severity [
A number of privacy considerations drove the design of the study procedure, app, and data collection. Prolific, the platform from which subjects were recruited, anonymizes subjects. Subjects were provided with app log-in credentials, which were provided on demand to each subject as they enrolled in the study to avoid subjects using their name, email address, or some other potentially identifying information as their log-in name. Audio recordings were encrypted both at rest (on subjects’ phones) and in transit to the server. Once processed on the server side, audio files were deleted. The speech transcripts generated to detect the presence of speech were processed in the following way: each transcript was broken into pairs of words (ie, bigrams) and then stored in random order for use in later studies. The stripping of ordering and time information from bigrams was done to prevent later re-creation of transcripts.
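The bigram step described above can be sketched in Python as follows. This is an illustrative sketch only: the function name, whitespace tokenization, and choice of random number generator are assumptions, and the study's actual server-side code is not described in the source.

```python
import random


def transcript_to_shuffled_bigrams(transcript, rng=None):
    """Break a transcript into adjacent word pairs and shuffle them.

    Shuffling discards the original ordering of the pairs so that the
    transcript is harder to reconstruct from what is stored.
    """
    # SystemRandom draws from the operating system's entropy source
    # (an assumption; the source does not name a generator).
    rng = rng or random.SystemRandom()
    words = transcript.split()  # naive whitespace tokenization (assumption)
    bigrams = list(zip(words, words[1:]))
    rng.shuffle(bigrams)
    return bigrams
```

A transcript of n words yields n - 1 bigrams, and a transcript with fewer than 2 words yields none. Only the shuffled pairs would be stored, without timestamps.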
From July 2019 to December 2019, 205 eligible Prolific members entered the study. Withdrawals were common, with 86 subjects choosing to withdraw at some point in the study (commentary on the high withdrawal rate is provided in the Limitations subsection of the Discussion section). Of the 119 subjects who did not withdraw, 112 completed both sets of self-report questionnaires. Finally, 84 of the 112 subjects who completed the study yielded sufficient audio data for analysis based on the criterion that at least 50% of the ex
Flow chart of study recruitment.
The study sample had an average age of 30 years (SD 8.6) and 42% (35/84) of subjects were female. The mean and SD of the 4 self-reported measures are presented in
To further characterize the mental health of our study subjects, self-report measures were used to screen for social anxiety disorder, generalized anxiety disorder, and major depressive disorder. A cutoff of 60 was used with the LSAS scores to screen for social anxiety disorder (generalized subtype), as recommended by Mennin et al [
Descriptive statistics for self-report measures of anxiety and depression (n=84).
Measures | Score, mean (SD) | Correlation with age, r | Correlation with age, P value | Difference between mean scores of the sexes, t (df) | Difference between mean scores of the sexes, P value |
Liebowitz Social Anxiety Scale | 53.7 (25.8) | −0.27 | .01 | −1.68 (82) | .10 |
Generalized Anxiety Disorder seven-item scale | 6.6 (4.6) | −0.29 | .01 | −1.37 (82) | .17 |
Patient Health Questionnaire eight-item scale | 8.5 (5.6) | −0.19 | .09 | −1.18 (82) | .24 |
Sheehan Disability Scale | 10.8 (7.7) | −0.26 | .02 | −1.12 (82) | .27 |
Results of screening the study sample for depression and anxiety disorders (n=84).
Disorders | Screening criteria | Positive screenings, n (%) |
Social anxiety disorder | Liebowitz Social Anxiety Scale score ≥60 | 32 (38) |
Generalized anxiety disorder | Generalized Anxiety Disorder seven-item scale score ≥10 | 22 (26) |
Major depressive disorder | Patient Health Questionnaire eight-item scale score ≥10 | 31 (37) |
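The screening rule in the table above amounts to a threshold check per scale. A minimal Python sketch follows; the dictionary layout and function names are hypothetical, while the cutoffs and counts are those reported in the table.

```python
# Screening cutoffs as listed in the table above.
CUTOFFS = {"LSAS": 60, "GAD-7": 10, "PHQ-8": 10}


def screen(scores):
    """Return, per scale, whether a subject's score meets the cutoff."""
    return {scale: scores[scale] >= cutoff for scale, cutoff in CUTOFFS.items()}


def positive_percentage(n_positive, n_total):
    """Percentage of positive screenings, rounded to a whole number."""
    return round(100 * n_positive / n_total)
```

For example, `positive_percentage(32, 84)` reproduces the 38% reported for social anxiety disorder, and a subject with scores at the sample means (53.7, 6.6, and 8.5) would screen negative on all 3 scales.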
The objective audio features described in subsection
Descriptive statistics for objective audio features (n=84).
Features | Mean (SD) | Minimum | Q1 | Q2 | Q3 | Maximum |
Daily similarity | 0.80 (0.07) | 0.45 | 0.77 | 0.83 | 0.85 | 0.90 |
Sleep disturbance—all nights | 0.14 (0.06) | 0.03 | 0.10 | 0.13 | 0.17 | 0.32 |
Sleep disturbance—weeknights | 0.14 (0.06) | 0.03 | 0.09 | 0.12 | 0.18 | 0.34 |
Speech presence ratio | 0.15 (0.06) | 0.01 | 0.11 | 0.16 | 0.20 | 0.30 |
To test the association between the audio features and the self-reported measures of anxiety, depression, and functional impairment, the Pearson correlation coefficient was computed between each feature and scale. The daily similarity feature is negatively correlated with all 4 self-report measures, which supports the hypothesis that regularity in daily activity and circadian rhythm is associated with more positive mental health (ie, lower scale scores). This feature was most strongly correlated with depressive symptoms. The sleep disturbance feature, whether computed using all nighttime audio or only weeknight audio, was positively correlated with all 4 self-report measures, which is in line with the hypothesis that better sleep quality (ie, less sleep disturbance) is associated with positive mental health. The strength of the correlation is improved when only the weeknight audio is considered. Finally, the speech presence ratio feature was negatively correlated with all 4 self-report measures, where the correlation with depressive symptoms was the strongest for all observed correlations (
Pearson correlation between objective audio features and self-reported measures of anxiety and depression (n=84).
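Features | Liebowitz Social Anxiety Scale, r | Liebowitz Social Anxiety Scale, P value | Generalized Anxiety Disorder seven-item scale, r | Generalized Anxiety Disorder seven-item scale, P value | Patient Health Questionnaire eight-item scale, r | Patient Health Questionnaire eight-item scale, P value | Sheehan Disability Scale, r | Sheehan Disability Scale, P value |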
Daily similarity | −0.20 | .07 | −0.19 | .09 | −0.37 | <.001 | −0.18 | .10 | |
Sleep disturbance—all nights | 0.00 | .99 | 0.07 | .52 | 0.17 | .13 | 0.15 | .17 | |
Sleep disturbance—weeknights | 0.05 | .65 | 0.12 | .26 | 0.23 | .03 | 0.18 | .11 | |
Speech presence ratio | −0.19 | .08 | −0.16 | .14 | −0.37 | <.001 | −0.29 | .01 |
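For readers who wish to reproduce this kind of analysis, the following Python sketch computes a Pearson correlation coefficient and the t statistic commonly used to test it against 0. The source does not state how its P values were obtained, so the t test with n - 2 degrees of freedom is an assumption here, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))
```

With r=−0.37 and n=84, `t_statistic` gives approximately −3.61 on 82 degrees of freedom. A two-sided P value can then be read from the t distribution, for example with a statistics package.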
The self-report measures completed by the subjects revealed that this study’s sample, despite being recruited from a healthy population, had a high prevalence of depression and anxiety. Data reported by the Government of Canada in 2006 estimate a 12-month prevalence of major depressive disorder at 4.8% [
Negative correlations were measured between age and self-reported measures of social anxiety (
The key finding of this work is the development and evaluation of a set of features, computed from subjects’ environmental audio, as potential correlates of symptoms of anxiety, depression, and functional impairment. Correlation analysis of these features and self-reported measures, summarized in
We note that while associations with the LSAS and GAD-7 are weak or nonexistent, the associations with the SDS are nearly as strong as those with the PHQ-8. This suggests that the impairment measured by the SDS may, in large part, be due to symptoms of depression. Indeed, we measured a stronger correlation between the SDS and the PHQ-8 scores of subjects (
The fact that some features are associated with depressive symptomatology but none are associated with the symptomatology of generalized anxiety or social anxiety disorder is interesting, and we offer some speculation as to why this may be the case. First, we must observe that our features are very coarse: they measure sleep, activity, and speech on a gross scale, with no specific context. Depression broadly reduces energy and activity, and if this impact is independent of context, our features are well suited to detecting it. This is in contrast to anxiety disorders, which often have specific, context-dependent triggers. Individuals with anxiety can avoid contexts that trigger their anxiety and, therefore, present as if they do not suffer from its effects as long as they continue to exhibit avoidance behaviors. Although it is unlikely that anxious individuals are able to avoid all triggers, especially those with generalized subtypes, avoidance behavior may be partially responsible for weakening the associations between inferred behavior and mental state. To passively measure the severity of anxiety disorders, it seems that a feature must capture avoidance behavior as a proxy for anxiety itself and also include some measure of state anxiety to detect when a subject is in a context that acts as an anxiety trigger.
To our knowledge, no other studies have inferred daily patterns of activity and inactivity solely from volume samples of ambient audio, which is captured by the daily similarity feature, so direct comparisons of the daily similarity feature with other known works are not possible. However, a feature called circadian movement, first proposed by Saeb et al [
The association between sleep and mental health has been investigated in a number of studies. Self-reported measures of sleep quality have been shown to be associated with state anxiety [
Finally, although the speech presence ratio feature does not appear in identical form in the literature, there are other studies that have used similar proxy measures of social interaction as correlates of depression severity. The pioneering study by Wang et al [
A fundamental limitation of the study design is that as a cross-sectional study, it is not possible to make any claim regarding causation between the observed features and severity of anxiety, depression, or functional impairment. It cannot be determined, for example, if avoiding social contact (as inferred by the speech presence ratio feature) causes increased depression severity or if an increase in depression severity owing to some other factor causes individuals to retreat socially and engage in less speaking.
A high proportion of subject withdrawals can also be considered as a limitation of this study (86/205 subjects, 42%, chose to withdraw). The majority of withdrawals, 87% (75/86), occurred before a successful log-in to the study app (see
A further limitation surrounding subject withdrawals is the possibility of sample bias. The 86 individuals who withdrew from the study may differ from those who remained in the study. We are unable to test this because Prolific removes researchers’ access to the demographic data (age and sex) of participants who withdraw from studies. Nondemographic data (ie, digital data and self-report measures) may also differ between the groups, yet this is difficult to test because 92% (79/86) of the subjects who withdrew did so early enough in the study that they provided no digital data or self-report measures. Among the 112 individuals who remained in the study and completed all tasks, it is possible to test for differences between the group included in the analysis (ie, the group with sufficient audio data) and the group excluded from the analysis. These 2 groups did not differ significantly in age or on any of the 4 self-report measures, either at intake or exit, as tested by
Some limitations also exist regarding the validity of the features. The speech presence ratio feature does not distinguish between recorded speech (eg, from a TV or radio) and live human speech; it simply detects intelligible speech. The method also does not distinguish between speakers, so in many cases, the subject may not be the person speaking. It detects only English speech, so it will potentially miss speech if, for example, a subject does not speak English at home. Finally, our technique for detecting speech using automatic speech recognition is biased toward false negatives rather than false positives: if speech is detected by the system, it is highly likely that speech is present, but the system is much more likely to miss speech in noisy environments or in environments with multiple people speaking concurrently.
The sleep disturbance feature is affected by the subjects’ specific mobile phone hardware, where different microphones with different automatic gain control functionality (which dynamically adjusts the volume and is not controllable by programmers) could produce different measurements in the same environment. To produce perfectly consistent features, one would be required to use a device such as a calibrated sound level meter, which measures volume as an absolute measure with no gain control.
Finally, it must be noted that completely passive mobile sensing with a high frequency of data sampling is becoming increasingly difficult on Android devices. Battery optimization features limit the rate at which apps can turn on in the background and sample their environment [
This work contributes toward the development of automated and objective severity measurements of anxiety, depression, and functional impairment associated with poor mental health. Focusing solely on environmental audio, which was passively sensed from subjects’ smartphones, this work presents 2 new correlates of depressive symptoms and general impairment, which we refer to as the daily similarity and sleep disturbance features. Furthermore, this work supports previous findings by reproducing a measured association between time spent proximal to speech and severity of depression.
EMA: ecological momentary assessment
GAD-7: 7-item Generalized Anxiety Disorder Scale
LSAS: Liebowitz Social Anxiety Scale
PHQ-8: Patient Health Questionnaire eight-item scale
SDS: Sheehan Disability Scale
This research was funded in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant number RGPIN-2019-04395.
MAK has been a consultant or advisory board member for GlaxoSmithKline, Lundbeck, Eli Lilly, Boehringer Ingelheim, Organon, AstraZeneca, Janssen, Janssen-Ortho, Solvay, Bristol-Myers Squibb, Shire, Sunovion, Pfizer, Purdue, Merck, Astellas, Tilray, Bedrocan, Takeda, Eisai, and Otsuka. MAK conducted research for GlaxoSmithKline, Lundbeck, Eli Lilly, Organon, AstraZeneca, Janssen-Ortho, Solvay, Genuine Health, Shire, Bristol-Myers Squibb, Takeda, Pfizer, Hoffmann-La Roche, Biotics, Purdue, Astellas, Forest, and Lundbeck. MK has received honoraria from GlaxoSmithKline, Lundbeck, Eli Lilly, Boehringer Ingelheim, Organon, AstraZeneca, Janssen, Janssen-Ortho, Solvay, Bristol-Myers Squibb, Shire, Sunovion, Pfizer, Purdue, Merck, Astellas, Bedrocan, Tilray, Allergan, and Otsuka. MAK has received research grants from the Canadian Institutes of Health Research, Sick Kids Foundation, Centre for Addiction and Mental Health Foundation, Canadian Psychiatric Research Foundation, Canadian Foundation for Innovation, and the Lotte and John Hecht Memorial Foundation.