This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Diagnosing major depressive disorder (MDD) is challenging, with diagnostic manuals failing to capture the wide range of clinical symptoms that are endorsed by individuals with this condition.
This study aims to provide evidence for an extended definition of MDD symptomatology.
Symptom data were collected via a digital assessment developed for the delta study. Random forest classification with nested cross-validation was used to distinguish between individuals with MDD and those with subthreshold symptomatology of the disorder using disorder-specific symptoms and transdiagnostic symptoms. The diagnostic performance of the Patient Health Questionnaire–9 was also examined.
A depression-specific model demonstrated good predictive performance when distinguishing between individuals with MDD (n=64) and those with subthreshold depression (n=140) (area under the receiver operating characteristic curve=0.89; sensitivity=82.4%; specificity=81.3%; accuracy=81.6%). The inclusion of transdiagnostic symptoms of psychopathology, including symptoms of depression, generalized anxiety disorder, insomnia, emotional instability, and panic disorder, significantly improved the model performance (area under the receiver operating characteristic curve=0.95; sensitivity=86.5%; specificity=90.8%; accuracy=89.5%). The Patient Health Questionnaire–9 was excellent at identifying MDD but overdiagnosed the condition (sensitivity=92.2%; specificity=54.3%; accuracy=66.2%).
Our findings are in line with the notion that current diagnostic practices may present an overly narrow conception of mental health. Furthermore, our study provides proof-of-concept support for the clinical utility of a digital assessment to inform clinical decision-making in the evaluation of MDD.
Major depressive disorder (MDD) is a common and heterogeneous condition representing the leading cause of disability worldwide [
In an attempt to improve current diagnostic practice, the search for objective diagnostic tests and valid biomarkers for depression has received considerable attention. However, despite substantial research expenditures and large-scale genome-wide studies, no pathognomonic biological markers of depression have been identified [
Another issue pertaining to psychiatric nosology is
Some authors suggest that assessing the presence of anxiety symptoms in patients with MDD is critical [
In this regard, a transdiagnostic view of MDD encompassing symptoms of anxiety and other commonly co-occurring disorders may improve early and accurate diagnosis, reflect biological disease understanding (eg, twin studies have shown shared genetic predisposition for MDD and GAD [
Critically, time is at a premium in primary care settings, where relying on brief symptom-count checklists, such as the Patient Health Questionnaire–9 (PHQ-9) [
This study aims to provide evidence for an extended definition of MDD symptomatology using a digital assessment that was developed for the delta study [
This study used data from the delta study that was conducted by the Cambridge Centre for Neuropsychiatric Research between April 2018 and November 2019. Olmert et al [
Over 5000 participants were recruited on the web through email, via paid Facebook (Facebook Inc) advertisements, and updates on the Cambridge Centre for Neuropsychiatric Research laboratory website. Eligible participants were invited to take part in the main study; of these, 3232 completed the digital assessment via the delta study website. The digital assessment was designed following an extensive analysis of validated questionnaires for mood disorders [
A subgroup of the original study cohort (n=1740) consented to provide dried blood spot samples and complete a telephone interview for MDD and BD using the
Participant characteristics and comorbidities were collected via digital assessments and are shown in
Participant characteristics and comorbidities: major depressive disorder versus subthreshold depression group comparisons.
Characteristics | Subthreshold depression (n=140) | MDDa (n=64) | U statisticb | P value | Effect sizec | Chi-square (df) | Effect sized |
Age (years), mean (SD) | 25.84 (6.66) | 25.94 (5.7) | 4209 | .49 | 0.05 | N/Ae | N/A | ||||||||
BMI, mean (SD) | 24.62 (4.73) | 28.29 (6.8) | 2901 | <.001 | 0.28 | N/A | N/A | ||||||||
|
|||||||||||||||
|
Male | 49 (35) | 11 (17.2) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Female | 91 (65) | 53 (82.8) | N/A | .01 | N/A | 6.7 (1) | 0.18 | |||||||
|
|||||||||||||||
|
Yes | 87 (62.1) | 37 (42.2) | N/A | N/A | N/A | N/A | N/A | |||||||
|
No | 53 (37.1) | 27 (57.8) | N/A | .56 | N/A | 0.4 (1) | 0.04 | |||||||
|
|||||||||||||||
|
Employed | 81 (57.9) | 35 (54) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Unemployed | 5 (3.6) | 8 (12.5) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Student | 54 (38.6) | 21 (32.8) | N/A | .05 | N/A | 6.0 (2) | 0.17 | |||||||
|
|||||||||||||||
|
Secure and stable relationship | 84 (59.6) | 133 (55.4) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Insecure and unstable relationship | 9 (6.4) | 14 (5.8) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Single | 48 (34) | 93 (38.8) | N/A | .54 | N/A | 1.3 (2) | 0.08 | |||||||
|
|||||||||||||||
|
Yes | 9 (6.4) | 9 (14.1) | N/A | N/A | N/A | N/A | N/A | |||||||
|
No | 131 (93.6) | 55 (85.9) | N/A | .07 | N/A | 3.2 (1) | 0.13 | |||||||
|
|||||||||||||||
|
GADg | 12 (8.6) | 47 (73.4) | N/A | <.001 | N/A | 89.9 | 0.66 | |||||||
|
Personality disorder | 0 | 3 (4.7) | N/A | N/A | N/A | N/A | N/A | |||||||
|
OCDh | 3 (2.1) | 3 (4.7) | N/A | .28 | N/A | 0.4 (2) | 0.07 | |||||||
|
Panic disorder | 0 | 7 (10.9) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Social anxiety | 1 (0.7) | 9 (14.1) | N/A | <.001 | N/A | 16.8 (2) | 0.29 | |||||||
|
Eating disorder | 2 (1.4) | 5 (7.8) | N/A | .03 | N/A | 5.4 (2) | 0.16 | |||||||
|
|||||||||||||||
|
Thyroid disease | 4 (2.9) | 2 (3.1) | N/A | .99 | N/A | 0.0 (3) | 0.01 | |||||||
|
Cardiovascular disease | 1 (0.7) | 0 | N/A | N/A | N/A | N/A | N/A | |||||||
|
Irritable bowel syndrome | 6 (4.3) | 2 (3.1) | N/A | .99 | N/A | 0.2 (3) | 0.03 | |||||||
|
Chronic pain | 30 (21.4) | 14 (21.9) | N/A | .94 | N/A | 0.0 (3) | 0.01 | |||||||
|
Migraines | 53 (37.9) | 31 (48.4) | N/A | .15 | N/A | 2.0 (3) | 0.10 | |||||||
|
|||||||||||||||
|
SSRIi antidepressants | 12 (8.6) | 32 (50) | N/A | <.001 | N/A | 44.6 (3) | 0.47 | |||||||
|
SNRIj antidepressants | 0 | 5 (7.8) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Tricyclic antidepressants | 0 | 3 (4.7) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Other antidepressants | 1 (0.7) | 8 (12.5) | N/A | <.001 | N/A | 14.5 (3) | 0.27 | |||||||
|
Anxiety medication | 4 (2.9) | 10 (15.6) | N/A | .002 | N/A | 11.2 (3) | 0.23 | |||||||
|
Antipsychotics | 0 | 3 (4.7) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Mood stabilizers | 0 | 2 (3.1) | N/A | N/A | N/A | N/A | N/A | |||||||
|
Psychotherapy | 4 (2.9) | 16 (25) | N/A | <.001 | N/A | 24.4 (3) | 0.35 |
aMDD: major depressive disorder.
bMann–Whitney U test.
cEffect size (
dEffect size (Cramer V).
eN/A: not applicable.
fUndergraduate degree or equivalent and above was coded as
gGAD: generalized anxiety disorder.
hOCD: obsessive-compulsive disorder.
iSSRI: selective serotonin reuptake inhibitor.
jSNRI: serotonin–norepinephrine reuptake inhibitor.
Random forest classification models were constructed in Python 3.7.4 (Python Software Foundation) using the scikit-learn library 0.21.3 to distinguish between MDD and subthreshold depression using (1) disorder-specific symptoms (ie, symptoms of depression), and (2) transdiagnostic symptoms (ie, cross-disorder symptoms). We constructed two models: a
Although some symptoms overlapped across disorders (eg, tiredness, low energy, and irritability), we did not feel that it would be appropriate to combine these as the questions were framed in the context of each condition. Furthermore, although all participants were asked about symptoms of depression, BD or mania, and hypomania, the questions for the remaining conditions were adaptive in nature, such that only relevant questions were asked based on responses to previous questions. This resulted in some participants having
For each of the models, nested cross-validation (NCV) was performed to obtain the highest algorithmic accuracy while ensuring the generalizability of the models. At each iteration of NCV, the data were randomly split into three folds; two-thirds of the data were used in the inner loop for model training and validation, and one-third was used for testing the model in the outer loop. The inner-loop data were further randomly split into three folds, on which the hyperparameters (ie, number of estimators and maximum depth) were tuned and the best cross-validated model was selected. To do this, two of the three folds were used to tune the model parameters and train the model, which was then validated on the third fold. This procedure was repeated with the remaining combinations of training and validation folds. The final model (ie, the optimized classifier) was obtained by fitting a model with the tuned parameters to all three data folds from the inner loop and then evaluating it on the hold-out test data in the outer loop. This procedure was repeated 100 times with different splits of the data (into train and test sets), resulting in a total of 300 unique models (3 outer folds × 100 repetitions) for each feature set (ie, depression model vs extended model).
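The nested cross-validation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic data, the hyperparameter grid values, the use of stratified folds, the AUC-based model selection in the inner loop, and the function name `nested_cv_auc` are all assumptions made for the example. Only the 3-fold outer and inner splits, the two tuned hyperparameters, and the repetition scheme come from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def nested_cv_auc(X, y, param_grid, n_repeats=100, seed=0):
    """Return one test-set AUC per outer fold (3 * n_repeats values)."""
    aucs = []
    for repeat in range(n_repeats):
        # Outer loop: two-thirds for training and validation, one-third for testing.
        # Stratification is an assumption; the text only says "randomly split".
        outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed + repeat)
        for train_idx, test_idx in outer.split(X, y):
            # Inner loop: 3-fold grid search over the hyperparameters.
            inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed + repeat)
            search = GridSearchCV(
                RandomForestClassifier(random_state=seed + repeat),
                param_grid,
                scoring="roc_auc",  # assumed selection criterion
                cv=inner,
                refit=True,  # refit the best setting on all inner-loop data
            )
            search.fit(X[train_idx], y[train_idx])
            # Evaluate the refitted model on the held-out outer fold.
            proba = search.predict_proba(X[test_idx])[:, 1]
            aucs.append(roc_auc_score(y[test_idx], proba))
    return np.array(aucs)


# Synthetic stand-in for the symptom data (204 participants, 36 features).
X, y = make_classification(n_samples=204, n_features=36, weights=[0.69], random_state=0)
grid = {"n_estimators": [25, 50], "max_depth": [3, None]}  # illustrative values only
aucs = nested_cv_auc(X, y, grid, n_repeats=2)  # the study used 100 repeats
print(f"{len(aucs)} models, mean AUC = {aucs.mean():.2f} (SD {aucs.std():.2f})")
```

With `n_repeats=100`, the function returns 300 AUC values, one per outer-fold model, which can then be averaged per feature set.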
Model performance was evaluated by measuring the area under the receiver operating characteristic curve (AUC) for the 300 models and averaging across all models for each feature set. The AUC shows the degree of separability between two conditions (ie, MDD vs subthreshold depression) and represents the probability that a randomly selected subject with the condition is rated or ranked as more likely to have the condition than a randomly selected individual without the condition (AUC: ≥0.9=excellent; ≥0.8=good; ≥0.7=fair; ≥0.6=poor; ≥0.5=fail) [
The mean sensitivity, specificity, and accuracy scores per model were also evaluated. Here, sensitivity refers to the model’s ability to classify MDD cases correctly (ie, true positives), whereas specificity refers to the model’s ability to classify subthreshold depression cases correctly (ie, true negatives). Accuracy corresponds to the model’s ability to classify all cases correctly (ie, both true positives and true negatives).
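As a small worked illustration of these evaluation measures, the sketch below computes sensitivity, specificity, and accuracy from a confusion matrix and checks the probabilistic reading of the AUC by counting correctly ranked case and noncase pairs. The toy labels and scores, the 0.5 decision threshold, and the helper names `diagnostic_metrics` and `auc_by_pairs` are assumptions for the example and do not come from the study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy, with 1=MDD and 0=subthreshold."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # MDD cases correctly classified
        "specificity": tn / (tn + fp),  # subthreshold cases correctly classified
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auc_by_pairs(y_true, scores):
    """Share of (case, noncase) pairs in which the case scores higher; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Toy example: 3 cases and 4 noncases with predicted probabilities.
y_true = np.array([1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])
y_pred = (scores >= 0.5).astype(int)  # assumed decision threshold

metrics = diagnostic_metrics(y_true, y_pred)
pair_auc = auc_by_pairs(y_true, scores)
print(metrics, pair_auc, roc_auc_score(y_true, scores))
```

Here 11 of the 12 case and noncase pairs are ranked correctly, so the pairwise estimate agrees with `roc_auc_score`.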
Relative feature importances (ie, Gini impurity [
Finally, to establish the diagnostic performance of the PHQ-9 on the basis of its intended use, we calculated the sensitivity, specificity, and accuracy in the current sample using the standard cut-off score of ≥10 [
This model comprised 36 features, with analyses demonstrating good discriminatory performance on both the training (AUC=0.89±0.03) and test sets (AUC=0.89±0.04;
Receiver operating characteristic curves showing the mean predictive performance of the depression model. The models were applied to predict the probability of major depressive disorder in the (1) training and (2) test sets. AUC: area under the receiver operating characteristic curve; CV AUC: cross-validated area under the receiver operating characteristic curve; MDD: major depressive disorder; ROC: receiver operating characteristic.
The top 20 features contributing to the depression model (averaged across all 300 models) were leaden paralysis, tiredness, low energy, harder to concentrate, functional impairment (work), restlessness, functional impairment (leisure), excessive or inappropriate guilt, short-tempered, easily annoyed, easily fatigued, functional impairment (home), decreased enjoyment, irritability, blaming oneself, significant weight change, functional impairment (relationships), decreased interest, large appetite, and unable to relax.
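Averaging feature importances across many fitted models and ranking them, as done for this list, might look like the sketch below. It is an illustration under assumptions: the data are synthetic, the feature names are placeholders, and the forests differ only by random seed, whereas the study's 300 models were fitted on different data splits. scikit-learn's `feature_importances_` attribute gives the impurity-based (Gini) importance, normalized to sum to 1 per forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names (hypothetical).
X, y = make_classification(n_samples=204, n_features=10, n_informative=4, random_state=0)
names = [f"symptom_{i}" for i in range(X.shape[1])]

# One row of impurity-based importances per fitted forest.
importances = np.vstack([
    RandomForestClassifier(n_estimators=50, random_state=s).fit(X, y).feature_importances_
    for s in range(5)
])

# Average across models, then rank from most to least important.
mean_importance = importances.mean(axis=0)
order = np.argsort(mean_importance)[::-1]
top = [(names[i], float(mean_importance[i])) for i in order[:3]]
for name, value in top:
    print(f"{name}: {value:.3f}")
```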
Top 20 mean relative importance for the depression-specific model. Features have been ordered from most to least important.
Next, we added 98 features to the model, resulting in an extended model comprising 134 features. Analyses demonstrated excellent discriminatory performance on both the training (AUC=0.94±0.03) and test sets (AUC=0.94±0.04;
On the basis of these findings, we then reran the analyses using a
The analyses revealed a significant improvement in the model’s discriminatory performance on both the training (AUC=0.95±0.02) and test sets (AUC=0.95±0.03;
Receiver operating characteristic curves showing the mean predictive performance of the truncated version of the extended model. The models were applied to predict the probability of major depressive disorder in the (1) training and (2) test sets. AUC: area under the receiver operating characteristic curve; CV AUC: cross-validated area under the receiver operating characteristic curve; MDD: major depressive disorder; ROC: receiver operating characteristic.
Mean relative importance for the 33 features in the truncated version of the extended model. Features have been ordered from most to least important and colored according to the disorder or symptom cluster they correspond to.
The sensitivity of the PHQ-9 for detecting MDD was 92.2%, whereas the specificity was 54.3%, and the overall diagnostic accuracy was 66.2%.
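To make these figures concrete, the sketch below applies the ≥10 cut-off to a set of PHQ-9 item scores and back-calculates the classification counts implied by the reported percentages. The counts are not reported in the text; they are inferred here under the assumption that PHQ-9 scores were available for all 64 participants with MDD and all 140 with subthreshold depression. The helper names are hypothetical.

```python
def phq9_screen_positive(item_scores, cutoff=10):
    """Apply the standard cut-off to nine PHQ-9 items, each scored 0 to 3."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores between 0 and 3")
    return sum(item_scores) >= cutoff


def counts_matching(rate_pct, n):
    """All integer counts k for which k/n rounds to the reported percentage."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == rate_pct]


n_mdd, n_sub = 64, 140  # group sizes reported in the text
tp_candidates = counts_matching(92.2, n_mdd)  # from the reported sensitivity
tn_candidates = counts_matching(54.3, n_sub)  # from the reported specificity

# Inferred counts (assumption: complete PHQ-9 data for all 204 participants).
tp, tn = tp_candidates[0], tn_candidates[0]
accuracy_pct = round(100 * (tp + tn) / (n_mdd + n_sub), 1)
false_positives = n_sub - tn

print(tp_candidates, tn_candidates, accuracy_pct, false_positives)
```

Under that assumption, the reported sensitivity and specificity each correspond to a single possible count (59 of 64 and 76 of 140), which reproduces the reported accuracy of 66.2% and implies that 64 of the 140 participants with subthreshold depression screened positive.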
This study provides evidence for an extended definition of MDD symptomatology and supports the use of a digital assessment as an aid to clinical decision-making in the identification of MDD. Relative to a disorder-specific model of MDD psychopathology, an extended model of symptomatology was better at distinguishing between individuals with MDD and those with subthreshold levels of the disorder. In particular, a truncated version of the model, comprising symptoms of depression, GAD, insomnia, emotional instability, and panic disorder, demonstrated excellent predictive performance (AUC=0.95; sensitivity=86.5%; specificity=90.8%; and accuracy=89.5%).
Critically, although the PHQ-9 was particularly good at detecting MDD in the current sample, it tended to overdiagnose MDD in subthreshold depression and, in turn, was associated with poor overall diagnostic performance. Overdiagnosis of MDD presents a significant problem and has the potential for antidepressant overprescription and adverse drug effects in individuals who may benefit from alternative treatment options [
Overall, the findings from our models are in line with the notion that current diagnostic practices may present a narrow conception of mental health that does not allow for the wide range of clinical signs and symptoms that are endorsed by individuals with MDD. Across our models, the most predictive symptom of MDD was leaden paralysis, which refers to an extreme form of fatigue or heavy, leaden feelings in the arms and legs. This finding is in line with a recent study by Han et al [
As expected, symptoms of GAD were among the most predictive features of MDD, with
Although our findings should be interpreted with caution, our view is that this should not detract from the importance of assessing for transdiagnostic symptoms of MDD, especially as these are likely to share common underlying pathophysiology and genetics [
Indeed, suicidality has particularly been associated with the co-occurrence of depression and panic disorder [
Finally, symptoms of emotional instability or personality disorder, including feeling empty, low self-esteem, and self-harm, were also seen to be important when distinguishing between MDD and subthreshold depression. Feeling empty or chronic emptiness has been closely related to depression and suicidal ideation [
Taken together, our findings indicate that the current diagnostic criteria for MDD may fail to evaluate relevant clinical information that is important for the diagnosis and treatment of individuals with the disorder. Although time is a luxury in the primary care setting, our study supports the use of digital technologies as a means for obtaining a more comprehensive depiction of MDD symptomatology in a time-efficient manner. Notably, related research using the same digital mental health assessment has highlighted the utility of the tool in distinguishing individuals with MDD from those with BD [
To our knowledge, this is the first study to provide evidence for an extended definition of MDD symptomatology using a digital assessment. Furthermore, the digital assessment was designed following an extensive analysis of existing validated questionnaires for psychiatric disorders and diagnostic manuals, as well as input from psychiatrists and a service user group. In addition, as opposed to the use of healthy controls as a reference population against which patients are compared, our subthreshold depression group represented a clinically relevant reference group. Finally, the use of ML methods meant that patterns in data could be more readily and accurately identified, whereas our NCV approach allowed us to obtain high algorithmic accuracy while ensuring the generalizability of the models.
This study also had several limitations. First, as with any supervised ML approach to psychopathology, our analyses were limited by the
In an attempt to answer the question “when does depression become a mental disorder?”, our study demonstrated that a data-driven view of MDD may improve our understanding of the condition. A more comprehensive conceptualization of the psychopathology of MDD, including symptoms of depression, GAD, insomnia, panic disorder, and emotional instability, may not only facilitate patient stratification but also allow for personalized treatment plans and strategies. Although further studies with larger sample sizes are required to replicate our findings, our study shines a positive light on the use of digital technologies as an innovative way to help develop and facilitate mental health care provision. In particular, digital technologies have the capacity to collect a vast range of key clinical information that may be important for the diagnosis and treatment of individuals with MDD.
Mean symptom severity per group was separated by disorder or symptom clusters.
Depression model: mean relative feature importance.
Depression model: percentage feature occurrences.
Receiver operating characteristic curves showing the mean predictive performance of the extended model. The models were applied to predict the probability of major depressive disorder in the training and test sets.
Extended model: mean relative feature importance colored by disorder or symptom clusters.
Extended model: percentage feature occurrences colored by disorder or symptom clusters.
Truncated model: percentage of feature occurrences colored by disorder or symptom clusters.
AUC: area under the receiver operating characteristic curve
BD: bipolar disorder
CIDI: Composite International Diagnostic Interview
DSM-IV: Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition
DSM-5: Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition
GAD: generalized anxiety disorder
ICD-10: International Statistical Classification of Diseases, Tenth Revision
ICD-11: International Statistical Classification of Diseases, Eleventh Revision
MDD: major depressive disorder
ML: machine learning
NCV: nested cross-validation
PHQ-9: Patient Health Questionnaire–9
This study was funded by the Stanley Medical Research Institute (grant number: 07R-1888). The authors thank the participants for taking part in the delta study, as well as the CIDI interviewers, without whom this work would not have been possible.
SB is a director of Psynova Neurotech Ltd and Psyomics Ltd. SB, DC, GBO, LF, and EB have financial interests in Psyomics Ltd, which provided funding and support for the delta study. SB, PE, and TO could benefit financially from any product that arises from work performed in the delta study.