Original Paper
Abstract
Background: There is a clear need for enhanced mental health assessment, and depressive symptom (DS) evaluation is no exception. A promising approach to this aim is virtual reality (VR), which offers the potential to add a wider set of assessment domains with enhanced ecological validity. However, while several studies have used VR for both diagnostic and treatment purposes, its acceptance, and in particular how exposure to virtual environments affects populations with psychiatric conditions, remains unknown.
Objective: This study aims to report on the acceptability, usability, and cybersickness levels of a pilot VR environment designed for the purpose of differentiating between individuals with DSs.
Methods: The exploratory study, conducted in Italy, included 50 healthy controls and 50 young adults with mild-to-moderate DSs (a formal diagnosis was not required). The study used an observational design with approximately 30 minutes of VR exposure followed by a self-report questionnaire battery. The battery included a questionnaire based on the Theoretical Framework of Acceptability, the System Usability Scale, and the Simulator Sickness Questionnaire.
Results: Results indicate that the majority found VR acceptable for the purposes of mental health screening and treatment. For diagnostics, however, there was a clear preference for VR to be used by mental health professionals as a supplementary tool rather than as a stand-alone solution. Following exposure to the pilot VR environment, generally good levels of acceptability and usability were reported, but areas in need of improvement were identified (such as self-efficacy). Self-reported cybersickness levels were comparable to literature averages but were considerably higher among those with DSs.
Conclusions: These findings raise questions about the potential interplay between underlying somatic symptoms of depression and VR-induced cybersickness and call for more attention from the scientific community, both in terms of methodology and in terms of potential clinical and theoretical implications. In conclusion, user support indicates a potential for VR to aid mental health assessment, but further research is needed to understand how exposure to virtual environments might affect populations with varying severity and other forms of psychiatric symptoms.
International Registered Report Identifier (IRRID): RR2-10.1186/ISRCTN16396369
doi:10.2196/68132
Keywords
Introduction
Background
In 2019, an estimated 970 million people worldwide, or 12.6% of the global population, were living with a mental disorder, with anxiety (301 million) and depression (280 million) being the most prevalent [
]. These conditions contribute substantially to the global burden of disease, with depression being one of the leading causes of disability worldwide [ ]. Despite considerable efforts to address this issue, there has been no substantial reduction in the burden of mental disorders since 1990 [ ].

Depressive disorders (hereafter referred to as depression) present significant challenges in research, among other reasons due to diagnostic uncertainty [
], characterized by both high heterogeneity and low reliability [ ]. The persistence of these challenges over different iterations of the DSM (Diagnostic and Statistical Manual of Mental Disorders) [ ] underscores the need to advance clinical practice, which currently relies predominantly on self-report questionnaires and clinical interviews [ ].

In response to the challenges mentioned above, suggestions have been made to include a broader range of measures that cover more assessment domains and are not solely dependent on verbal self-report [
, ]. Such measures include speech, text, and facial expression analysis [ ]; genomics, transcriptomics, proteomics, metabolomics, and imaging [ ]; and behavioral and physiological variables collected via wearable sensors [ ] or virtual reality (VR) systems [ ].

VR systems are particularly relevant to mental health disorders due to immersive, controlled, and customizable environments that can be tailored to individual needs [
]. VR has been used across psychiatric pathologies, but reviews consistently report that depression has remained understudied compared to other conditions [ , , ]. The gap in the literature is particularly evident for studies on assessment, of which we could find only three [ - ]. These focus specifically on attentional processes [ ], spatial navigation [ ], and social interactions [ ] in relation to depression. Despite these pioneering studies, the potential of VR to integrate multiple domains of data for enhancing assessment [ ] has yet to be explored.

While efforts are being dedicated to improving treatment and to testing various measures to enhance assessment, including the use of VR, even less attention has been paid to the evaluation of such novel approaches from a user perspective. Research on VR for depression treatment and assessment is limited, with even fewer studies focusing on its acceptability, usability, and tolerability. Beyond diagnostic accuracy, evaluating the acceptability and usability of diagnostic tools is crucial, particularly when involving individuals with mental health conditions, such as depression. In such cases, where patients might face additional treatment barriers, including motivational difficulties, stigma, delayed help-seeking [
], or poor treatment adherence [ ], acceptability and usability play a crucial role in the successful implementation of new diagnostic and treatment technologies.

Sekhon et al [
] define acceptability as a “multifaceted construct that reflects the extent to which people delivering or receiving a health care intervention consider it to be appropriate, based on anticipated or experienced cognitive and emotional responses to the intervention.” Usability, on the other hand, has various definitions and approaches [ ]; here, it refers to the extent to which the VR system can be used with ease for the purpose of assessing depression symptom severity. Third, tolerability may be defined in different ways, but here it refers to cybersickness severity. Cybersickness describes a visually induced motion sickness-like discomfort during or following exposure to VR. It is accompanied by changes in the activity of the central and autonomic nervous systems and results in symptoms such as headache, nausea, and eyestrain, among others [ ].

To the best of our knowledge, only 2 studies using VR have reported on the earlier concepts in relation to depression: one focusing on psychoeducation [
] and another on behavioral activation [ ]. Both found generally high acceptability and satisfaction levels. Migoya et al [ ] also found favorable ratings for ease of use and perceived usefulness, whereas Paul et al [ ] reported decreasing interest in continuing the program after multiple sessions. Paul et al [ ] also assessed cybersickness, noting decreasing symptoms as the sessions progressed, but their unconventional scoring method (adding up raw scores as opposed to using weight adjustments) complicates comparisons with studies on healthy controls (HCs) [ ] or other psychiatric conditions [ ]. To date, no studies have compared cybersickness levels between depressed individuals and HCs.

To tackle the aforementioned challenges, this study introduces a novel approach aimed at enhancing the quality—including the breadth and ecological validity—of depression assessment. A VR system and environment were developed to capture physiological data (heart rate, skin conductance, and eye tracking), behavioral data (curiosity-related behavior and attentional biases), and cognitive data (working memory, processing speed, sustained attention, cognitive flexibility, self-evaluation, and metacognitive sensitivity) hypothesized to be associated with depression [
]. These multidomain data are then analyzed via machine learning models and explainable artificial intelligence to assess depressive symptom (DS) severity. The categorization accuracy of this system is reported in a dedicated study [ ]; however, given the innovative nature of this approach and the gap in the literature, it is fundamental to consider the patient perspective as well. Therefore, the objective of this study was to evaluate the acceptability, usability, and cybersickness levels of this pilot virtual environment.

Aim
The aim of the study was to assess the acceptability, usability, and cybersickness levels of a pilot VR environment designed to aid and enrich the assessment of DSs.
Methods
Overview
The study had an exploratory nature and followed an observational design, in which all participants completed the same questionnaire batteries before and after VR exposure and followed the same VR exposure protocol.
Sample
A total of 266 individuals were screened at the University of Padua, Italy, with continuous enrollment between November 2022 and November 2023 (
). The screening concerned the inclusion and exclusion criteria detailed later and was done using self-report questionnaires administered in Qualtrics (Qualtrics, Inc). Recruitment was completed when the predefined target sample size of 100 was reached ( ), including 50 participants with DSs and 50 participants classed as HCs. Participants were aged 18-35 years, fulfilled all inclusion criteria, and provided written informed consent. Further demographic and clinical characteristics of the sample are outlined in .
 | DSa,b (N=50) | HCc (N=50) | Between-group difference, rrbd
Sex, n (%)
  Male | 10 (20) | 10 (20) | —e
  Female | 40 (80) | 40 (80) | —
Age (years), mean (SD) | 23.0 (1.86) | 23.3 (1.52) | 0.13
Education (years), mean (SD) | 16.1 (1.40) | 16.6 (0.78) | 0.14
Hand dominance, n (%)
  Left | 0 (0) | 3 (6) | —
  Right | 50 (100) | 47 (94) | —
Sight correction, n (%)
  None | 21 (42) | 26 (52) | —
  Lenses | 10 (20) | 10 (20) | —
  Glasses | 19 (38) | 14 (28) | —
Height (m), mean (SD) | 1.67 (0.07) | 1.66 (0.07) | 0.07
Weight (kg), mean (SD) | 62.7 (10.60) | 59.6 (7.85) | 0.12
BMI, mean (SD) | 22.4 (3.00) | 21.5 (2.06) | 0.13
Habitual sleep (hours/night), mean (SD) | 7.0 (0.98) | 7.3 (0.88) | 0.20
Habitual smoker, n (%)
  Yes | 19 (38) | 20 (40) | —
  No | 31 (62) | 30 (60) | —
Habitual consumption of alcohol, n (%)
  Yes | 11 (22) | 16 (32) | —
  No | 39 (78) | 34 (68) | —
Habitual user of psychoactive drugs, n (%)
  Yes | 11 (22) | 4 (8) | —
  No | 39 (78) | 46 (92) | —
PHQ-9f, mean (SD) | 11.3 (1.92) | 3.2 (1.48) | 0.87g
DASS-Dh, mean (SD) | 15.2 (7.40) | 3.5 (3.44) | 0.77g
DASS-Ai, mean (SD) | 9.6 (6.62) | 2.4 (2.89) | 0.61g
DASS-Sj, mean (SD) | 17.8 (6.73) | 7.7 (5.08) | 0.66g
PANAS-Posk before VRl, mean (SD) | 10.4 (3.53) | 12.2 (2.76) | 0.27m
PANAS-Pos after VR, mean (SD) | 8.9 (4.36) | 11.8 (3.61) | 0.36g
PANAS-Negn before VR, mean (SD) | 3.2 (2.98) | 1.9 (2.22) | 0.24o
PANAS-Neg after VR, mean (SD) | 3.2 (3.25) | 1.7 (2.24) | 0.24o
aThe categorization is based on current depressive symptom severity assessment via the Patient Health Questionnaire-9 (score equal to 9 or above) and not the presence of a clinical diagnosis.
bDS: depressive symptom.
cHC: healthy control.
dGroup differences were calculated via 2-tailed Mann-Whitney U tests; effect sizes reflect rank-biserial (rrb) correlations.
e—: not applicable.
fPHQ-9: Patient Health Questionnaire-9.
gP<.001.
hDASS-D: Depression Anxiety Stress Scale-Depression.
iDASS-A: Depression Anxiety Stress Scale-Anxiety.
jDASS-S: Depression Anxiety Stress Scale-Stress.
kPANAS-Pos: Positive and Negative Affect Schedule–Positive affect.
lVR: virtual reality.
mP<.01.
nPANAS-Neg: Positive and Negative Affect Schedule–Negative affect.
oP<.05.
Power Calculation
A post hoc power analysis was conducted to assess the statistical power of the study based on the observed sample size of 100 participants (50 per group). Assuming a moderate effect size (Cohen d=0.50) and an α level of .05, the analysis indicated that the study had a power of 0.70 to detect between-group differences, as visualized in
.
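The reported power can be reproduced from the stated parameters (Cohen d=0.50, n=50 per group, α=.05, two-tailed) using the noncentral t distribution. The sketch below uses Python with SciPy purely as an illustration; the study itself used G*Power.

```python
from math import sqrt
from scipy import stats

def two_sample_t_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test via the noncentral t distribution."""
    df = 2 * n_per_group - 2
    nc = d * sqrt(n_per_group / 2)            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability mass of the noncentral t beyond +/- t_crit under H1
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

power = two_sample_t_power(d=0.50, n_per_group=50)
print(round(power, 2))  # ~0.70, matching the reported value
```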
Inclusion and Exclusion Criteria
The screening for inclusion involved the assessment of current DS severity using the Patient Health Questionnaire-9 (PHQ-9; Kroenke et al [
]. Participants scoring 9 or above—constituting the upper limit for mild DSs [ ]—were assigned to the DS group, while those scoring 5 or below were registered for the HC group ( ).

For inclusion in either group, participants had to be free from any condition that would impair their ability to interact with the VR environment (eg, visual impairment without the possibility of correction via lenses) and able to provide written informed consent. Additional inclusion criteria for the DS group included the absence of a diagnosis of any psychiatric disorders other than depressive or anxiety disorders, and treatment stability over the last 4 weeks for those receiving psychological or pharmacological treatment at the time of assessment. Participants in the control group had to be free from any previous or current psychiatric disorders.
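The screening rule described above (PHQ-9 score of 9 or above for the DS group, 5 or below for HCs, intermediate scores in neither group) can be sketched as follows. The function name and error handling are illustrative only; the study administered the screening via Qualtrics questionnaires.

```python
def assign_group(phq9_total):
    """Screening rule used in this study: PHQ-9 >= 9 -> DS group,
    <= 5 -> HC group; scores of 6-8 fall between the thresholds and
    qualify for neither group. (Illustrative helper, not study code.)"""
    if not 0 <= phq9_total <= 27:  # PHQ-9 totals range from 0 to 27
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if phq9_total >= 9:
        return "DS"
    if phq9_total <= 5:
        return "HC"
    return None  # not eligible for either group

print(assign_group(11), assign_group(3), assign_group(7))  # DS HC None
```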

Study Procedure
After providing informed consent and completing the pre-VR battery questionnaires, all participants were seated in a room equipped with the VR system. Following headset and eye-tracking calibration, a tutorial was completed (aimed at helping participants understand how to move around and interact with the environment). Next, participants were free to explore the VR environment on their own terms, but in the presence of a study administrator. The study session was concluded once the post-VR battery was completed.
Technological Specifications
Overview
The VR system was developed using the HP Reverb G2 Omnicept Edition headset, which includes a variety of biometric sensors such as eye tracking, a facial camera (lower), a heart rate sensor, and sensors for brain activity, while maintaining the core features of the standard version. These features include a resolution of 2160 × 2160 px per eye, a 114° field of view, a 90 Hz refresh rate, and integrated audio.
Environment Development
The virtual environment was developed entirely in Unity 2020.3.39 LTS, using the HP Omnicept SDK and OpenXR to maximize compatibility and platform versatility. Several development processes were used to ensure a smooth and efficient environment, including occlusion culling, foveated rendering, baked lighting, and asset optimization.
The scenario consists of 4 rooms, each with distinct lighting parameters, as well as an additional room that can be visually explored through a keyhole in a door. Volume postprocessing was used to modify the lighting without affecting performance, and the keyhole effect was simulated using camera layers.
Within each room, various stimuli and interactive elements are present, such as sound boxes, a chalkboard, and interactable objects (eg, items that can be picked up and thrown, musical buttons, and photos that can be picked up for closer inspection). Every interaction between the participant and the environment is recorded, including eye tracking data, particularly in relation to elements designated as areas of interest or with which the subject can interact (eg, ENTERING_ROOM, LEAVING_ROOM, TAKE, DROP, TRIGGER, INTERACT).
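The interaction recording described above can be conveyed with a minimal, hypothetical log structure using the event types named in the text. The actual system is implemented in Unity, so this Python sketch only illustrates the shape of such a log, not the real implementation.

```python
import time
from dataclasses import dataclass, field

# Event types named in the text; the record layout is illustrative only.
EVENT_TYPES = {"ENTERING_ROOM", "LEAVING_ROOM", "TAKE", "DROP", "TRIGGER", "INTERACT"}

@dataclass
class InteractionEvent:
    event_type: str   # one of EVENT_TYPES
    target: str       # object or area of interest involved
    timestamp: float = field(default_factory=time.monotonic)

class InteractionLog:
    """Append-only log of participant-environment interactions (sketch)."""
    def __init__(self):
        self.events = []

    def record(self, event_type, target):
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        self.events.append(InteractionEvent(event_type, target))

log = InteractionLog()
log.record("ENTERING_ROOM", "living_room")
log.record("TAKE", "photo_frame")
log.record("DROP", "photo_frame")
print(len(log.events))  # 3
```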
Administrator Control
To manage the experiment, the study administrator is provided with an interface, enabling them to observe the participant’s actions in real time. The interface also offers various options to ensure that the experimental session proceeds smoothly (eg, it is possible to discontinue a cognitive task if the participant requests so). This control system is facilitated by an additional camera mounted on the subject, enabling the administrator to monitor the experiment’s progression effectively.
Virtual Environment
The goal of the pilot VR environment is to differentiate between HCs and those with DSs based on cognitive, behavioral, and physiological data. The development and performance of the system for this purpose will be discussed in more detail in another paper [
], but the main characteristics are summarized here for a general overview.

The environment was designed to elicit certain behaviors and record measures, some of which were previously shown, and others hypothesized, to be connected to depression [
]. The domains included cognition (attention, working memory, processing speed, executive functioning, and cognitive flexibility), metacognition, persistence or grit, and curiosity, as well as behavioral and attentional biases toward negative stimuli, speed, and movement measurements. Physiological data included skin conductance, heart rate variability, and eye-tracking–related measures.

The environment resembled a multiroom family home (
). Exposure started with a tutorial introducing participants to the controllers as well as how to move around in the environment and interact with objects. Following the completion of the tutorial, participants were free to explore the environment but had to solve cognitive tasks to open doors and move between rooms. There were 4 cognitive tasks in total: Rapid Visual Information Processing; N-back (2-back); Trail Making Test, parts A and B; and Wisconsin Card Sorting Test. The exposure ended on the participant’s own terms, but the exit door only opened once all cognitive tasks were completed.

Multimodal data collected during the VR sessions for the purpose of DS assessment were analyzed using a CatBoost machine learning algorithm (Yandex LLC) [
]. Model performance was evaluated based on accuracy, and feature importance was determined through an explainable artificial intelligence approach [ ], ensuring transparency in the decision-making process. For more details, see the study by Mura et al [ ].
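The study pairs CatBoost with an explainable artificial intelligence step to surface feature importance. As a library-agnostic illustration of the same idea, the sketch below computes permutation importance (the drop in accuracy after shuffling one feature column) on a toy classifier; all names and data here are hypothetical and this is not the study's pipeline.

```python
import random

def accuracy(model, X, y):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, rng):
    """Drop in accuracy when one feature column is shuffled: a simple,
    model-agnostic importance measure in the spirit of explainable AI."""
    baseline = accuracy(model, X, y)
    column = [x[feature_idx] for x in X]
    rng.shuffle(column)                      # break the feature-label link
    X_perm = [list(x) for x in X]
    for row, value in zip(X_perm, column):
        row[feature_idx] = value
    return baseline - accuracy(model, X_perm, y)

# Toy "model": the label depends only on feature 0, so shuffling
# feature 1 should matter far less than shuffling feature 0.
rng = random.Random(42)
X = [[i, rng.random()] for i in range(200)]
y = [1 if x[0] >= 100 else 0 for x in X]
model = lambda x: 1 if x[0] >= 100 else 0

imp0 = permutation_importance(model, X, y, 0, random.Random(0))
imp1 = permutation_importance(model, X, y, 1, random.Random(0))
print(round(imp0, 2), imp1)  # large drop for feature 0, 0.0 for feature 1
```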
Tools and Questionnaires
Pre-VR Questionnaires
The pre-VR battery included a background questionnaire (demographic data and diagnosis, medication); the Patient Health Questionnaire (PHQ-9) [
] to assess current DS severity; the Depression, Anxiety, Stress Scale (DASS) [ ] to assess current depressive, anxiety, and stress symptom severity; the Positive and Negative Affect Schedule (PANAS-SF) [ ] to evaluate emotional state; as well as the Curiosity and Exploration Inventory-II [ ].

Post-VR Questionnaires
The post-VR battery repeated the PANAS-SF questionnaire and included questionnaires regarding acceptability, usability, and cybersickness, described in more detail in the following sections.
Acceptability
The construct of acceptability was assessed based on the theoretical framework of acceptability (TFA) [
, ]. The acceptability of the concept and the acceptability of the pilot system were evaluated separately. The acceptability of the concept, that is, the use of VR for the purposes of mental health screening, diagnosis, and treatment (definitions were provided), was assessed via self-report Likert items. Scores ranged from 1 to 5, indicating answers from “Strongly disagree” to “Strongly agree” on statements about VR being acceptable ( ). As the primary purpose of the pilot system was the assessment of DSs, the diagnostic use case was investigated via 2 additional questions:

- Support tool: Willingness to engage with the system to inform a diagnosis made by a mental health professional.
- Stand-alone tool: Willingness to engage with the system to receive a diagnosis without extra input from a mental health professional.
The dimensions of acceptability concerning the pilot system were assessed within the TFA and focused on general acceptability, affective attitude, perceived effectiveness, intervention coherence, self-efficacy, burden, ethicality, and opportunity cost. The construct of affective attitude was further broken down into 2 questions: (1) to what extent a participant likes or dislikes the system, and (2) how comfortable or uncomfortable they find the system to be. For ease of interpretability, the rating scale was unified across all constructs, where “1” represented the “undesirable” rating (eg, low levels of comfort, or the need for high levels of effort indicating burden), and “5” reflected the “desirable” rating (eg, high levels of comfort, or low levels of effort reported under burden). See
for details about the specific items used to assess each of these constructs.

Usability
Usability was assessed by the System Usability Scale (SUS) [
]—which is a postinteraction self-report questionnaire assessing user perceptions of a system’s overall usability. The scale is composed of 10 items rated on a Likert scale from 1 to 5, from “strongly disagree” to “strongly agree.” The total score ranges from 0 to 100, with higher scores indicating higher usability ratings. There are multiple frameworks to interpret the SUS score (range, adjective, and grade) [ ], but the primary threshold for this study was the average score necessary for an “OK/satisfactory” experience (a score of 50.9 [ ]).

Cybersickness
To assess levels of cybersickness following VR exposure, the Simulator Sickness Questionnaire (SSQ) was used [
]. The questionnaire contains 16 symptoms, which are rated on a scale of subjective severity: 0=none, 1=slight, 2=moderate, and 3=severe. Beyond the total score, one can obtain separate scores for the nausea, oculomotor disturbance, and disorientation subscales.

However, while the SSQ was once considered the “gold standard,” it has increasingly been criticized, including the original thresholds for the interpretation of severity levels [
- ]. For this reason, the pilot system results were compared to the literature average [ ] instead of the original thresholds.

Analysis
All analyses were carried out in R, supplemented by RStudio (version 2023.12.1.402; R Foundation for Statistical Computing), except for the power calculation, for which G*Power (version 3.1; Heinrich Heine University Düsseldorf) was used. Multiple packages were used to supplement analysis and visualization; these include but are not limited to: dplyr [
], vioplot [ ], ggplot2 [ ], psych [ ], rstatix [ ], ggeffects [ ], ordinal [ ], and VGAM [ ]. See for the script, its output, and a full list of packages.

Within-group comparisons and comparisons to known literature values were conducted using Wilcoxon signed rank tests, while between-group comparisons used Mann-Whitney U tests. These nonparametric methods were chosen to account for the ordinal nature of the outcome measures, such as Likert scales, and to address the observed violations of the assumptions required for parametric tests (eg, normality assessed via the Shapiro-Wilk test). A nominal significance level (α) of .05 was applied. For ease of interpretability, effect sizes were consistently reported as r values. This includes the use of rank-biserial correlations for Mann-Whitney U tests as well as Wilcoxon signed rank tests for quantifying the magnitude of differences in a standardized manner. The full statistics are available in
.

Multivariate linear regression models were used to analyze predictors when the outcome variable was continuous, such as SUS or SSQ scores. For ordinal outcome measures, such as TFA constructs, proportional odds models were applied. In cases where the proportional odds assumption was not met (as noted in the Results section), results should be interpreted with caution. All regression models included age, sex, and education as covariates. When additional predictors, such as cybersickness severity scores, group labels, or self-reported anxiety and stress levels, were relevant, these were included alongside the demographic variables.
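The between-group analysis described in this section (a Mann-Whitney U test with a rank-biserial effect size) can be sketched in Python with SciPy. This illustrative helper uses one common convention, r_rb = 2U/(n1·n2) - 1, where U is the statistic for the first sample; it is not the study's R script, and the data below are invented.

```python
from scipy import stats

def mann_whitney_with_rrb(x, y):
    """Two-sided Mann-Whitney U test plus a rank-biserial effect size.
    With U computed for x, r_rb = 2U/(n1*n2) - 1 equals the proportion of
    (x, y) pairs favoring x minus the proportion favoring y (ties split)."""
    res = stats.mannwhitneyu(x, y, alternative="two-sided")
    r_rb = 2 * res.statistic / (len(x) * len(y)) - 1
    return res.statistic, res.pvalue, r_rb

# Toy example: two small groups of Likert-style ratings
hc = [4, 5, 4, 5, 3, 4]
ds = [2, 3, 2, 4, 3, 2]
u, p, r = mann_whitney_with_rrb(hc, ds)
print(u, round(r, 2))  # 32.5 0.81
```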
Ethical Considerations
The study was carried out in accordance with the Declaration of Helsinki and obtained ethical approval from the local ethical committees at all sites where data were collected (Italy) or analyzed (Italy and Sweden). In Italy, approval was granted by the Comitato Etico della Ricerca Psicologica, and in Sweden by the Etikprövningsmyndigheten (2023-00959-01).
Written informed consent was sought from all participants, with emphasis placed on the voluntary nature of participation and the possibility to withdraw from the study at any time, without reason, or any penalty. All participants were informed about data privacy and confidentiality, whereby all data obtained during the study were analyzed anonymously and used solely for the purpose of scientific research. Compensation included €25 (worth between US $24.38 and US $28.14 over the duration of the recruitment period).
Preregistration
The study was preregistered in the ISRCTN registry (ISRCTN16396369) [
]. The sole modification to the preregistered protocol concerned the threshold for inclusion based on current DS severity. Due to the difficulty of recruiting volunteers in line with the study’s timeline, the upper threshold for inclusion in the control group was increased from 4 to 5, while the lower threshold for inclusion in the group with DSs was decreased from 10 to 9 on the PHQ-9.

Results
Acceptability of the Concept
The use of VR for mental health treatment (62/84, 74%) and screening (65/84, 77%) was endorsed by the majority of those providing an answer, whereas only a minority indicated support for diagnostic purposes (28/83, 34%;
). There were no significant differences between the DS and HC groups.

Comparing the use of VR with and without input from a mental health professional, responses indicated higher support for the VR system to be used as a support tool (62/83, 75%) compared to a stand-alone solution (15/84, 18%). A Wilcoxon signed rank test indicated a significant difference of large size (V=1770; P<.001, 2-tailed; r=0.81, 95% CI 0.76-0.84; mediansupport 4, IQRsupport 3.5-4.5; medianstand-alone 2, IQRstand-alone 1.0-3.0). For the raw distribution of responses, see
. Again, no difference was observed between the DS and HC groups.

Acceptability of the Pilot System
Behavioral measures of acceptability included the discontinuation of participation for any reason, as well as the need to take a break from the study. While the possibility to discontinue participation at any point, without any consequences, was stated before enrollment, all 100 volunteers completed the study. The need for a break was expressed by 1 participant (1/100, 1%) due to discomfort; they decided to resume participation despite the possibility of opting out being reemphasized. On average, 31 minutes were spent in the virtual environment.
Acceptability was also assessed via a self-report questionnaire specifically formulated for the project based on the TFA [
, ]. Ratings, visualized in , indicated generally high levels of acceptability.
A majority of the participants found the system itself acceptable (81/84, 96%), liked the system (74/83, 89%), indicated that the engagement took little to no effort (71/84, 85%), found it clear how the system might help diagnose depressive disorders (60/84, 71%), perceived no moral or ethical issues with the system (53/84, 63%), found it comfortable (57/84, 68%), and expressed no concern about missing out on other alternatives when engaging with the system (43/84, 51%). The 2 constructs on which undesirable ratings (1 or 2) outnumbered desirable ratings (4 or 5) concerned the perceived effectiveness of the system in improving mental state (19/84, 23%), and self-efficacy regarding feeling confident while engaging with the system (21/99, 21%).
Multivariate ordinal regression models were used to explain variance in ratings of all the acceptability constructs as detailed in
. Sex was a significant predictor of general acceptability and subjective burden, where women reported considerably lower levels of acceptability (meanmen 4.65, SDmen 0.49; meanwomen 4.30, SDwomen 0.65) and lower ratings on burden—meaning that considerably higher effort was required on their part to engage with the system (meanmen 4.29, SDmen 0.50; meanwomen 3.69, SDwomen 0.87). Education did not reach significance for any of the TFA constructs, and age was marginally significant only when used to explain whether participants found moral or ethical consequences to engaging with the system.

In the next stage, group label (DS vs HC) was entered into the same models in addition to the demographic variables. Group label was a significant predictor of comfort levels (meanHC 3.83, SDHC 0.78; meanDS 3.34, SDDS 1.08) and marginally significant for the like or dislike rating (meanHC 4.38, SDHC 0.59; meanDS 4.00, SDDS 0.95)—meaning that participants in the DS group reported lower comfort levels and liked the system to a lesser extent.
Finally, cybersickness severity (total SSQ score) was entered into models explaining all TFA constructs, in addition to the demographic variables. Cybersickness was a significant predictor for comfort, burden, opportunity cost, and perceived effectiveness and marginally significant for self-efficacy (
). All showed an inverse relationship, whereby the higher the cybersickness symptom severity, the lower the acceptability scores reported on all TFA dimensions.

In summary, both the concept and the system received high acceptability ratings. Most participants supported the use of VR for mental health screening and treatment. However, regarding diagnosis, participants preferred VR as a supportive tool rather than a stand-alone solution. The primary area for improvement identified was enhancing users’ self-efficacy while interacting with the system. Sex and cybersickness emerged as significant predictors of acceptability ratings.
Model | Predictor | Estimate (log odds) | 95% CI | P value
General acceptability
  Model 1 | Age | 0.175 | –0.19 to 0.56 | .36
  Model 1 | Sex | –1.331 | –2.54 to –0.21 | .02a
  Model 1 | Education | –0.529 | –1.10 to 0.01 | .053
  Model 2 | Group (DSb vs HCc)d | –0.261 | –1.19 to 0.65 | .58
  Model 3 | Cybersickness | –0.004 | –0.02 to 0.01 | .54
Like or dislike
  Model 1 | Age | –0.298 | –0.67 to 0.05 | .10
  Model 1 | Sex | –0.461 | –1.52 to 0.58 | .39
  Model 1 | Education | 0.164 | –0.34 to 0.67 | .52
  Model 2 | Group (DS vs HC)d | –0.890 | –1.81 to 0.00 | .053
  Model 3 | Cybersickness | –0.010 | –0.03 to 0.01 | .20
Comfort
  Model 1 | Age | –0.045 | –0.38 to 0.29 | .79
  Model 1 | Sex | –0.718 | –1.84 to 0.37 | .20
  Model 1 | Education | 0.116 | –0.36 to 0.58 | .63
  Model 2 | Group (DS vs HC)d | –0.899 | –1.81 to –0.02 | .048a
  Model 3 | Cybersickness | –0.026 | –0.04 to –0.01 | .001e
Burden
  Model 1 | Age | 0.137 | –0.26 to 0.54 | .50
  Model 1 | Sex | –1.866 | –3.26 to –0.59 | .005e
  Model 1 | Education | –0.186 | –0.78 to 0.38 | .53
  Model 2 | Group (DS vs HC)d | –0.273 | –1.28 to 0.72 | .59
  Model 3 | Cybersickness | –0.036 | –0.06 to –0.02 | <.001f
Intervention coherence
  Model 1 | Age | 0.085 | –0.25 to 0.42 | .62
  Model 1 | Sex | –0.315 | –1.42 to 0.77 | .57
  Model 1 | Education | –0.193 | –0.69 to 0.29 | .44
  Model 2 | Group (DS vs HC)d | –0.567 | –1.44 to 0.29 | .20
  Model 3 | Cybersickness | –0.003 | –0.02 to 0.01 | .71
Ethicality
  Model 1 | Age | 0.320 | –0.00 to 0.65 | .053
  Model 1 | Sex | 0.241 | –0.74 to 1.23 | .63
  Model 1 | Education | –0.166 | –0.66 to 0.33 | .51
  Model 2 | Group (DS vs HC)d | 0.170 | –0.63 to 0.98 | .68
  Model 3 | Cybersickness | –0.010 | –0.02 to 0.00 | .15
Opportunity costg
  Model 1 | Age | –0.078 | –0.39 to 0.24 | .63
  Model 1 | Sex | –0.496 | –1.53 to 0.50 | .34
  Model 1 | Education | 0.023 | –0.44 to 0.49 | .92
  Model 2 | Group (DS vs HC)d | 0.336 | –0.46 to 1.14 | .41
  Model 3 | Cybersickness | –0.019 | –0.03 to –0.01 | .009e
Perceived effectiveness
  Model 1 | Age | –0.116 | –0.44 to 0.20 | .48
  Model 1 | Sex | –0.757 | –1.79 to 0.25 | .14
  Model 1 | Education | –0.045 | –0.53 to 0.43 | .85
  Model 2 | Group (DS vs HC)d | –0.548 | –1.38 to 0.27 | .19
  Model 3 | Cybersickness | –0.019 | –0.04 to –0.00 | .02a
Self-efficacyh
  Model 1 | Age | 0.063 | –0.24 to 0.37 | .68
  Model 1 | Sex | –0.872 | –1.86 to 0.08 | .08
  Model 1 | Education | 0.031 | –0.40 to 0.48 | .89
  Model 2 | Group (DS vs HC)d | 0.043 | –0.70 to 0.79 | .91
  Model 3 | Cybersickness | –0.01 | –0.03 to 0.00 | .08
aP<.05.
bDS: depressive symptom.
cHC: healthy control.
dThe categorization is based on current depressive symptom severity assessment via the PHQ-9 questionnaire (score equal to 9 or above for depressive symptoms [DS] and 5 or below for healthy controls [HC]) and not the presence of a clinical diagnosis.
eP<.01.
fP<.001.
gThe assumption of proportional odds was not fulfilled for the model. Results should be interpreted with caution.
hSelf-efficacy was measured with Item 9 of the System Usability Scale [ ], as opposed to the other items, which were specifically designed for the assessment of the pilot system based on the Theoretical Framework of Acceptability [ , ].

Usability
Descriptives
Results from 99 participants indicated average usability ratings, with a mean SUS score of 69 (SD 12.86) and a range of 37.5-97.5 [ ]. As visualized in , which shows the distribution of overall SUS scores, 88% (87/99) of participants provided a rating of 50.9 or above, indicating an OK experience or better [ ].
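For reference, the SUS total reported above is derived from ten 5-point Likert items using Brooke's standard scoring (odd-numbered items contribute response − 1, even-numbered items 5 − response, and the sum is scaled by 2.5 to a 0-100 range). A minimal sketch:

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten 1-5 Likert responses,
    following Brooke's standard scoring rules."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # odd items positive, even negative
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5  # scale the 0-40 sum to 0-100

print(sus_score([3] * 10))  # 50.0: an all-neutral response pattern
```

This makes explicit why the scale midpoint is 50 rather than 62.5, and why scores move in steps of 2.5.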
Analysis
A multiple linear regression analysis was conducted to predict usability ratings based on participants' age, education, sex, and group (DS vs HC). The regression model was not statistically significant (F4,94=1.23; P=.31), explaining only about 5% of the variance in SUS scores (R2=0.05, adjusted R2=0.01). The strongest predictor was sex, but this only showed a marginal effect (β=–5.79, 95% CI –12.38 to 0.80; P=.08), with men reporting higher usability ratings than women.
A second model was tested examining the relationship between usability scores and the predictors of age, education, sex, and total cybersickness severity. The model was significant (F4,94=2.97; P=.02), accounting for 11.2% of the variance in SUS scores (R2=0.11, adjusted R2=0.07). Cybersickness severity (SSQ scores) emerged as the only significant predictor (β=–0.12, 95% CI –0.21 to –0.03; P=.007), indicating that participants with higher SSQ scores tended to have lower SUS scores.
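The regression models above were fit in R (see Multimedia Appendix 2). Purely as an illustration of the model structure and of how R2 quantifies explained variance, the following is a pure-Python ordinary least squares sketch on synthetic data; all values here are invented for the example and are not the study's data:

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved with Gaussian elimination. X must include an intercept column."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    for c in range(p):  # forward elimination with partial pivoting
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j]
                                 for j in range(i + 1, p))) / A[i][i]
    return beta

def r_squared(X, y, beta):
    """Proportion of variance in y explained by the fitted model."""
    yhat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data only: 99 "participants" whose SUS-like rating declines
# with a cybersickness-like score (columns: intercept, age, SSQ-like).
random.seed(1)
X = [[1.0, random.uniform(18, 35), random.uniform(0, 120)] for _ in range(99)]
y = [75 - 0.12 * row[2] + random.gauss(0, 8) for row in X]
beta = ols(X, y)
print(round(beta[2], 3), round(r_squared(X, y, beta), 3))
```

With a true slope of −0.12 built into the synthetic outcome, the recovered coefficient on the cybersickness-like column is negative, mirroring the direction of the association reported above.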
In summary, the system’s overall usability was rated as sufficient, with the majority of participants reporting a satisfactory experience. Cybersickness severity emerged as the only predictor of usability ratings, showing a negative association.
Cybersickness
Descriptives
The overall mean SSQ score, combining all subscales, was 30.49 (SD 28.03), with individual scores ranging from 0 to 119.68. This is comparable to the literature average of 28.00 reported by Saredakis et al [ ], both for the overall sample and for the DS subsample. However, the HC group reported considerably lower levels of cybersickness severity (V=402; P=.02; rrb=–0.37, 95% CI –0.61 to –0.07), as summarized in .

For the SSQ subscales, the results of the total sample reflect severity levels not significantly different from the literature average [ ] for nausea and disorientation, whereas oculomotor disturbances were rated considerably higher in this study (V=3296; P=.004; rrb=0.33, 95% CI 0.12-0.52). When the comparison was split by study group, only the DS group deviated significantly from the literature average, specifically on the nausea (V=917; P=.002; rrb=0.50, 95% CI 0.22-0.70) and oculomotor disturbances (V=967; P<.001; rrb=0.58, 95% CI 0.33-0.75) subscales. The HC group showed near-average levels on all 3 subscales. The distributions of raw scores, as boxplots, for the total score and all subscales are illustrated in and analyzed for between-group differences in a later section.

SSQ scalea | Literature averaged | Total sample, mean (SD) | DSb,e, mean (SD) | HCc, mean (SD)
Total | 28.00 | 30.49 (28.03) | 39.39 (32.89) | 21.77 (18.84)f
Nausea | 16.72 | 21.59 (21.96) | 29.40 (24.39)g | 13.93 (16.15)
Oculomotor disturbances | 17.09 | 26.72 (24.63)g | 33.57 (27.50)h | 20.01 (19.48)
Disorientation | 23.50 | 32.90 (39.84) | 42.04 (48.34) | 23.94 (26.83)
aSSQ: Simulator Sickness Questionnaire.
bDS: depressive symptom.
cHC: healthy control.
dLiterature averages are taken from Saredakis et al [ ], based on a meta-analysis of 55 studies including approximately 3000 participants.
eThe categorization is solely dependent on the Patient Health Questionnaire-9 score and does not indicate the presence of a clinical diagnosis.
fP<.05.
gP<.01.
hP<.001.
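The SSQ scores in the table are weighted composites of raw item sums, following the scoring procedure of Kennedy et al [ ]. A sketch of that weighting, taking the unweighted subscale sums as inputs (the item-to-subscale mapping, in which some items load on more than one subscale, is omitted here):

```python
# Standard SSQ weights from Kennedy et al's scoring procedure; the inputs
# are the unweighted sums of the 0-3 item ratings loading on each subscale.
SSQ_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}

def ssq_scores(nausea_raw, oculomotor_raw, disorientation_raw):
    """Return the weighted SSQ subscale scores and the total severity score."""
    scores = {
        "nausea": nausea_raw * SSQ_WEIGHTS["nausea"],
        "oculomotor": oculomotor_raw * SSQ_WEIGHTS["oculomotor"],
        "disorientation": disorientation_raw * SSQ_WEIGHTS["disorientation"],
    }
    # The total uses the sum of the three raw subscores, scaled by 3.74
    scores["total"] = (nausea_raw + oculomotor_raw + disorientation_raw) * 3.74
    return scores

print(ssq_scores(0, 0, 0)["total"])  # 0.0: a symptom-free report
```

The differing weights explain why the subscale scores in the table are not directly comparable to one another in raw units.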

Analysis
A multiple linear regression was conducted to predict SSQ scores based on age, education, sex, and group (DS vs HC). The model was statistically significant (F4,94=3.20; P=.02), accounting for approximately 12% of the variance in SSQ scores (R2=0.12, adjusted R2=0.08). The only predictor reaching significance was group (β=17.58, 95% CI 6.57-28.60; P=.002), indicating that participants in the DS group had considerably higher SSQ scores (mean 39.39, SD 32.89) than the HC group (mean 21.77, SD 18.84). These between-group differences remained significant for all subscales analyzed separately.
When DS severity was entered as a continuous variable instead of a group label, a further 2% of the variance in cybersickness severity was explained (R2=0.14, adjusted R2=0.11), and a linear prediction of SSQ scores along the PHQ-9 scores was attainable, as visualized in A. The graph shows all participants along the SSQ and PHQ-9 scales and illustrates a positive association between the 2 scores when the regression line is plotted.

However, due to the baseline group differences in anxiety and stress levels ( ), this prediction might be biased. While DASS depression was not of primary interest, a multiple linear regression model with cybersickness severity as the outcome and DASS depression as a predictor was used, to be able to adjust for DASS anxiety and DASS stress levels in addition to age, sex, and education as covariates. This model was also significant (F6,92=4.05; P=.001) and explained about 21% of the variance in cybersickness severity (R2=0.21, adjusted R2=0.16). The only predictor emerging as significant was DASS depression (β=1.15, 95% CI 0.29-2.01; P=.009), whereas DASS anxiety reached marginal significance (β=1.2, 95% CI –0.07 to 2.47; P=.06). Cybersickness severity predictions showed a similar trend as for the PHQ-9 scores when the regression line was illustrated in B.
In summary, the severity of cybersickness was comparable to the literature averages for nausea and disorientation but significantly higher for oculomotor disturbances, particularly in the DS group. Across all subscales, the DS group reported higher SSQ scores than the HC group, with group membership emerging as the strongest predictor of cybersickness severity. A positive association was identified between DS severity (PHQ-9) and cybersickness, explaining a modest but significant proportion of variance. Even after adjusting for baseline anxiety and stress levels (DASS), DSs remained a significant predictor of cybersickness.
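The comparisons against the literature averages reported above use Wilcoxon signed-rank V statistics with matched-pairs rank-biserial correlations (rrb) as effect sizes. The authors' analysis was run in R; the following is an illustrative pure-Python sketch of these two statistics (a hypothetical helper, with simple midrank handling of ties and zero differences dropped):

```python
def signed_rank_v(sample, mu):
    """One-sample Wilcoxon signed-rank statistic V (the sum of the ranks of
    positive differences from mu) and the matched-pairs rank-biserial
    correlation r = 2V/S - 1, where S = n(n + 1)/2. Zero differences are
    dropped and tied absolute differences receive midranks."""
    diffs = [x - mu for x in sample if x != mu]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    v = sum(r for d, r in zip(diffs, ranks) if d > 0)
    s = n * (n + 1) / 2
    return v, 2 * v / s - 1

# Every observation above the reference mean gives V = S and r = +1
print(signed_rank_v([5, 6, 7, 8], 0))
```

Under this convention, r ranges from −1 (all observations below the reference value) to +1 (all above), which is how the negative rrb for the HC comparison above should be read.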
Discussion
Principal Findings
This study reports on the acceptability, usability, and cybersickness levels of a novel VR environment designed for the assessment of DSs. Following an approximately 30-minute engagement, 50 HCs and 50 individuals with mild-to-moderate self-reported DSs completed a self-report battery on the 3 constructs of interest.
Acceptability
In general, VR systems were rated acceptable by the majority of participants, both for the purpose of mental health screening and for treatment. For diagnostics, a clear preference emerged for VR to be used as a support tool by a health care professional, as opposed to a stand-alone solution. While there is no literature available on VR for depression with which a direct comparison can be made, these findings are in line with previous results concerning other digital solutions, such as mHealth apps [ ].

The specific pilot system tested here was designed for the purpose of differentiating between HCs and those with DSs [ ]. The system itself was rated acceptable by most participants, but the need for improvement was apparent, specifically regarding participants' confidence levels while using the system. Nevertheless, the high level of acceptance in practice found in the current study is in line with 2 studies that previously reported on the acceptability of specific VR systems for therapeutic purposes in depression [ , ].

Usability
Usability was rated satisfactory by most participants, with cybersickness severity emerging as a significant predictor of usability scores. We found 1 previous study reporting on the usability of a VR system for depression, which also indicated favorable ratings on ease of use and perceived usefulness [ ]. However, in comparison with other psychiatric conditions, such as social anxiety disorder [ ], psychosis [ ], or posttraumatic stress disorder [ ], it is important to emphasize that systematic evaluations of the usability of VR among those with depression are lacking.

Concerning the effect of cybersickness severity on usability, there are no previous studies reporting on this in relation to depression. Additionally, while comprehensive reviews on the relationship between cybersickness and presence are available [ ], there is no such overview for usability. Existing single studies report inconsistent findings across different conditions: for example, a negative association was reported in healthy individuals [ ], but no effect was found among those with Huntington disease [ ].

These findings highlight the need for greater attention from the scientific community, as depression remains relatively understudied in the context of VR use compared to other psychiatric conditions [ , ]. Notably, participants in this study expressed support for the technology both conceptually and in relation to this specific pilot system. Identifying novel approaches with user support is particularly important for groups that might struggle with treatment adherence and motivation to engage, such as those with depression [ ].

The high levels of acceptance and usability that users showed toward VR technology within the study, both in theory and in practice, lay a solid ground for investigations into how VR could be used in a mental health context. VR provides a premise for real-time data collection via direct observation of behavior, which contributes to higher ecological validity and allows for the consideration of a wider set of variables when assessing mental states [ , ]. While the potential is promising, reviews of real-life studies [ ], editorials [ ], as well as commentaries [ ] highlight a need for further scientific investigation to better inform how VR is best implemented within clinical practice.

Cybersickness
Finally, the results of the study show significant differences in self-reported cybersickness levels between HCs and those with DSs, with the latter reporting considerably higher levels. This difference brings forward many questions, some relating to the self-report instrument used as well as its administration; these will be discussed more thoroughly under the limitations. Beyond the critical appraisal of the instrument and optimizing the system to minimize the risk of cybersickness in general [ - ], this difference urges a consideration of how cybersickness might relate to depression. It is important to consider that our study lacked a pre-post comparison, which creates challenges when attempting to disentangle cybersickness symptoms evoked by VR exposure from potential baseline differences stemming from somatic symptoms related to depression. While "the language used in the medical literature to describe somatic symptoms in depression is both confusing and contradictory" [ ], one can pinpoint constructs related to depression that may also be captured by cybersickness scales, which may cause contamination when assessing cybersickness levels. These include, but might not be limited to, headache, fatigue, nausea, dizziness, and gastrointestinal disturbances [ , ]. This hypothesis needs further testing but is in line with the only study reporting on the SSQ in a pre-post manner among those with psychiatric conditions including depression (a heterogeneous sample of anxiety, psychotic, depressive, or bipolar disorder): baseline levels of cybersickness would already be classed as "severe," yet no significant elevation was reported post exposure [ ].

Nonetheless, it is important to consider that the elevated cybersickness in the group with DSs could still be, at least in part, a response to the VR environment itself. If this is the case, VR used with the right study design may serve as a tool for further exploring sensory abnormalities in depression, though ethical considerations are warranted. Investigating the underlying mechanisms of cybersickness in depression may also provide novel insights into the somatic manifestations of the condition and could help further characterize depression as well as refine interventions for mental health.
Additionally, the need to consider cybersickness severity becomes even more apparent in light of the results showing its association with acceptability and usability ratings, which might transfer into the real-life uptake and use of VR technology.
Limitations and Future Directions
The study is not without its limitations. First, one has to consider the characteristics of the participants: the sample consisted of young adults, the majority being women and highly educated, which limits generalizability [ ]. In general, lower age and higher education levels are associated with increased digital health literacy [ ]. However, further reports are needed to understand how such sample characteristics (sociodemographic variables as well as DSs) influence user attitudes toward digital technologies and engagement in a mental health context [ , ]. To this aim, there is a clear need to extend the research to diverse age groups and educational backgrounds to enhance the generalizability and applicability of the findings.

Second, the categorization into the HC and DS groups was based solely on self-reported symptoms at the time of the study (using the PHQ-9), as opposed to the presence or absence of a clinical diagnosis. Other forms of assessment might be worthy of consideration. Additionally, the majority of individuals labeled as DS reported only moderate depression severity, which questions whether the results are generalizable to those with more severe symptoms. Self-report measures in general are subject to response biases, such as social desirability or recall bias. The subjective evaluation of symptom severity in this case could potentially have resulted in over- or underreporting of severity.
Third, although we controlled for demographic covariates, baseline differences in anxiety and stress levels between the study groups may have impacted our results. Self-reported stress and anxiety significantly differed between the HC and DS groups, highlighting that between-group differences in our constructs of interest cannot unequivocally be attributed to differences in DS severity.
Fourth, the measurement of cybersickness levels leaves room for improvement on multiple fronts. The instrument (SSQ) has repeatedly been criticized, among other reasons for its low power to differentiate cybersickness from anxiety and for the questionable assumption of zero "symptoms" at baseline [ - ]. The difference in severity between the HCs and those with DSs cannot unequivocally be attributed to the VR exposure, as no baseline levels were recorded. It is possible that at least some degree of the difference originates from baseline differences in anxiety or somatic symptoms of depression, which the questionnaire most likely does not adequately differentiate from cybersickness symptoms. In the future, assessment of related symptoms necessitates pre-post comparisons, which is particularly relevant when the SSQ is used [ - ].

Fifth, when creating items to assess constructs of the TFA, 1 item was taken from the SUS questionnaire (item 9: confidence), while all other items were tailored to this study. Overall, single items created for a specific study lack the extensive validation work that underpins established questionnaires.
Sixth, the results of the ordinal regression models should be interpreted with caution, particularly in cases when the assumption of proportional odds was not fulfilled. This suggests that the relationship between predictors and the outcome variable may differ across categories and thus future studies could expand on these findings by exploring more flexible modeling approaches.
Seventh, we must acknowledge that our study was observational and exploratory in nature, aiming to assess acceptability and usability constructs without preestablished hypotheses, which warrants some considerations. An important consequence of the observational design is the lack of randomization. As a result, there is potential for selection bias and confounding factors that could influence the results. Concerning the exploratory nature: while this approach provides valuable insights, especially at early stages of development, confirmatory research is needed to test specific predictions and establish causal relationships. This is particularly relevant for the potential between-group differences on how VR exposure relates to cybersickness severity.
Finally, regarding future studies, it is important to highlight that clinician perspectives can form an important complement to patients’ user feedback. This dual approach provides insights that might not be accessible through the patient perspective alone, such as the feasibility and practicality of integrating VR solutions into clinical workflows [
, ]. As such, future studies could consider involving other user groups and extend the evaluation to the implementation, feasibility, and cost-effectiveness of VR-based solutions to enable a more comprehensive assessment.

In summary, despite the limitations and the need for further research, this study is the first to assess the acceptability, usability, and cybersickness levels of a novel VR system incorporating multimodal data for depression symptom assessment. The findings support the potential of VR as a tool for mental health screening, assessment (when used as a supplementary tool), and treatment. While the system's acceptability was generally favorable, areas for improvement were identified, particularly in enhancing users' self-efficacy during interactions. Usability ratings were satisfactory, and cybersickness levels, on average, were comparable to those reported for other head-mounted displays. Notably, this study is the first to report elevated cybersickness levels among individuals with DSs, raising important theoretical and methodological questions for future research.
Conclusions
There is a clear need to enhance mental health diagnosis, potentially by incorporating a broader range of variables. This study explored a novel approach to increase the ecological validity of depression assessments and found that, while VR technologies are generally acceptable as a supplementary tool, they are not seen as a replacement for routine mental health evaluations. The study, which tested the initial version of a novel VR system, revealed that the majority of participants found the acceptability and usability satisfactory, despite experiencing considerable levels of cybersickness—which was found to be negatively associated with both constructs. Notably, the results highlighted previously unreported differences in cybersickness severity between individuals with DSs and HCs, warranting replication and further investigation. While the call for enhanced reliability in mental health assessments is well-founded, and novel approaches are under investigation for their efficacy and accuracy, greater emphasis must be placed on evaluating acceptability and usability. Ensuring a safe, acceptable, and user-friendly approach is essential for the successful implementation of these technologies.
Acknowledgments
The study was carried out within the EXPERIENCE project, which is funded by the European Commission H2020 Framework Program (grant 101017727).
Generative AI tools, including ChatGPT (OpenAI) and Perplexity (Perplexity AI, Inc), were used to enhance the clarity of the text, correct grammatical errors, and refine analysis code.
Data Availability
Access to the raw data will be considered on a case-by-case basis upon request. Interested parties may contact the corresponding author.
Authors' Contributions
SS contributed to methodology development, formal analysis, data curation, visualization, and writing the original draft. ETE was responsible for conceptualization, methodology development, formal analysis, data curation, and drafting the original manuscript. FM contributed to methodology development, formal analysis, data curation, and investigation, as well as drafting the original manuscript. VO developed the software, contributed to methodology development, data curation, and visualization, and participated in drafting of the original manuscript. V Catrambone contributed to methodology development and manuscript review and editing. GH was involved in methodology development and participated in reviewing and editing the manuscript. IT contributed to methodology development and manuscript review and editing. ALA participated in manuscript review and editing. V Cardi assisted in methodology development and manuscript review and editing. MGCAC contributed to manuscript review and editing. GM participated in methodology development and manuscript review and editing. MAR contributed to study conceptualization, provided resources, secured funding, supervised the project, and participated in project administration and manuscript review. GV conceptualized the study, secured funding, provided resources, supervised the project, and contributed to project administration and manuscript review. V Carli contributed to conceptualization, methodology, validation, project administration, supervision, and funding acquisition. V Carli also provided resources and participated in reviewing and editing the manuscript. CG was involved in conceptualization, methodology, validation, supervision, and project administration. CG provided resources, acquired funding, and reviewed and edited the manuscript. All authors reviewed the final manuscript.
Conflicts of Interest
None declared.
Multimedia Appendix 1
Questionnaire on acceptability based on the Theoretical Framework of Acceptability.
DOCX File, 40 KB

Multimedia Appendix 2
Script used for the analysis and visualization, including its output.
PDF File (Adobe PDF File), 401 KB

References
- GBD 2019 Mental Disorders Collaborators. Global, regional, and national burden of 12 mental disorders in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Psychiatry. 2022;9(2):137-150. [FREE Full text] [CrossRef] [Medline]
- Friedrich MJ. Depression is the leading cause of disability around the world. JAMA. 2017;317(15):1517. [CrossRef] [Medline]
- Proudman D, Greenberg P, Nellesen D. The growing burden of major depressive disorders (MDD): implications for researchers and policy makers. Pharmacoeconomics. 2021;39(6):619-625. [FREE Full text] [CrossRef] [Medline]
- Lieblich SM, Castle DJ, Pantelis C, Hopwood M, Young AH, Everall IP. High heterogeneity and low reliability in the diagnosis of major depression will impair the development of new drugs. BJPsych Open. 2015;1(2):e5-e7. [FREE Full text] [CrossRef] [Medline]
- Liu X, Jiang K. Why is diagnosing MDD challenging? Shanghai Arch Psychiatry. 2016;28(6):343-345. [FREE Full text] [CrossRef] [Medline]
- Bell IH, Nicholas J, Alvarez-Jimenez M, Thompson A, Valmaggia L. Virtual reality as a clinical tool in mental health research and practice. Dialogues Clin Neurosci. 2020;22(2):169-177. [FREE Full text] [CrossRef] [Medline]
- Freeman D, Reeve S, Robinson A, Ehlers A, Clark D, Spanlang B, et al. Virtual reality in the assessment, understanding, and treatment of mental health disorders. Psychol Med. 2017;47(14):2393-2400. [FREE Full text] [CrossRef] [Medline]
- Mao K, Wu Y, Chen J. A systematic review on automated clinical depression diagnosis. Npj Ment Health Res. 2023;2(1):20. [FREE Full text] [CrossRef] [Medline]
- Bilello JA. Seeking an objective diagnosis of depression. Biomark Med. 2016;10(8):861-875. [CrossRef] [Medline]
- Abd-Alrazaq A, AlSaad R, Shuweihdi F, Ahmed A, Aziz S, Sheikh J. Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression. NPJ Digit Med. 2023;6(1):84. [FREE Full text] [CrossRef] [Medline]
- Wiebe A, Kannen K, Selaskowski B, Mehren A, Thöne AK, Pramme L, et al. Virtual reality in the diagnostic and therapy for mental disorders: a systematic review. Clin Psychol Rev. 2022;98:102213. [FREE Full text] [CrossRef] [Medline]
- Cieślik B, Mazurek J, Rutkowski S, Kiper P, Turolla A, Szczepańska-Gieracha J. Virtual reality in psychiatric disorders: a systematic review of reviews. Complement Ther Med. 2020;52:102480. [FREE Full text] [CrossRef] [Medline]
- Duan S, Valmaggia L, Lawrence AJ, Fennema D, Moll J, Zahn R. Virtual reality–assessment of social interactions and prognosis in depression. J Affect Disord. 2024;359:234-240. [FREE Full text] [CrossRef] [Medline]
- Camacho-Conde JA, Legarra L, Bolinches VM, Cano P, Guasch M, Llanos-Torres M, et al. Assessment of attentional processes in patients with anxiety-depressive disorders using virtual reality. J Pers Med. 2021;11(12):1341. [FREE Full text] [CrossRef] [Medline]
- Cornwell BR, Salvadore G, Colon-Rosario V, Latov DR, Holroyd T, Carver FW, et al. Abnormal hippocampal functioning and impaired spatial navigation in depressed individuals: evidence from whole-head magnetoencephalography. Am J Psychiatry. 2010;167(7):836-844. [FREE Full text] [CrossRef] [Medline]
- Ghio L, Gotelli S, Marcenaro M, Amore M, Natta W. Duration of untreated illness and outcomes in unipolar depression: a systematic review and meta-analysis. J Affect Disord. 2014;152-154:45-51. [CrossRef] [Medline]
- DiMatteo MR, Lepper HS, Croghan TW. Depression is a risk factor for noncompliance with medical treatment: meta-analysis of the effects of anxiety and depression on patient adherence. Arch Intern Med. 2000;160(14):2101-2107. [CrossRef] [Medline]
- Sekhon M, Cartwright M, Francis JJ. Acceptability of healthcare interventions: an overview of reviews and development of a theoretical framework. BMC Health Serv Res. 2017;17(1):88. [FREE Full text] [CrossRef] [Medline]
- Sauer J, Sonderegger A, Schmutz S. Usability, user experience and accessibility: towards an integrative model. Ergonomics. 2020;63(10):1207-1220. [CrossRef] [Medline]
- Kim YY, Kim HJ, Kim EN, Ko HD, Kim HT. Characteristic changes in the physiological components of cybersickness. Psychophysiology. 2005;42(5):616-625. [CrossRef] [Medline]
- Migoya-Borja M, Delgado-Gómez D, Carmona-Camacho R, Porras-Segovia A, López-Moriñigo J-D, Sánchez-Alonso M, et al. Feasibility of a virtual reality–based psychoeducational tool (VRight) for depressive patients. Cyberpsychol Behav Soc Netw. 2020;23(4):246-252. [CrossRef] [Medline]
- Paul M, Bullock K, Bailenson J, Burns D. Examining the efficacy of extended reality-enhanced behavioral activation for adults with major depressive disorder: randomized controlled trial. JMIR Ment Health. 2024;11:e52326. [FREE Full text] [CrossRef] [Medline]
- Saredakis D, Szpak A, Birckhead B, Keage HAD, Rizzo A, Loetscher T. Factors associated with virtual reality sickness in head-mounted displays: a systematic review and meta-analysis. Front Hum Neurosci. 2020;14:96. [FREE Full text] [CrossRef] [Medline]
- Lundin RM, Yeap Y, Menkes DB. Adverse effects of virtual and augmented reality interventions in psychiatry: systematic review. JMIR Ment Health. 2023;10:e43240. [FREE Full text] [CrossRef] [Medline]
- Mura F, Alfeo AL, Sutori S, Eliasson ET, Catrambone V, Ortiz VG, et al. The EXPERIENCE system: integrating virtual reality and multi-modal explainable artificial intelligence for enhanced depression recognition. NA. 2025. (forthcoming)
- Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613. [FREE Full text] [CrossRef] [Medline]
- Prokhorenkova L, Gusev G, Vorobev A, Dorogush A, Gulin A. CatBoost: Unbiased boosting with categorical features. ArXiv. Preprint posted online on June 28, 2017. 2019. [CrossRef]
- Alfeo AL, Zippo AG, Catrambone V, Cimino MG, Toschi N, Valenza G. From local counterfactuals to global feature importance: efficient, robust, and model-agnostic explanations for brain connectivity networks. Comput Methods Programs Biomed. 2023;236:107550. [FREE Full text] [CrossRef] [Medline]
- Lovibond P, Lovibond SH. The structure of negative emotional states: comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behav Res Ther. 1995;33(3):335-343. [CrossRef] [Medline]
- Watson D, Clark LA, Tellegen A. Development and validation of brief measures of positive and negative affect: the PANAS scales. J Pers Soc Psychol. 1988;54(6):1063-1070. [CrossRef] [Medline]
- Kashdan TB, Gallagher MW, Silvia PJ, Winterstein BP, Breen WE, Terhar D, et al. The Curiosity and Exploration Inventory-II: Development, factor structure, and psychometrics. J Res Pers. 2009;43(6):987-998. [FREE Full text] [CrossRef] [Medline]
- Sekhon M, Cartwright M, Francis JJ. Development of a theory-informed questionnaire to assess the acceptability of healthcare interventions. BMC Health Serv Res. 2022;22(1):279. [FREE Full text] [CrossRef] [Medline]
- Brooke J. SUS: A "quick and dirty" usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland IL, editors. Usability Evaluation in Industry. London, UK. Taylor & Francis; 1996:189-194.
- Kortum P, Peres SC. Evaluation of home health care devices: remote usability assessment. JMIR Hum Factors. 2015;2(1):e10. [FREE Full text] [CrossRef] [Medline]
- Bangor A, Kortum P, Miller J. Determining what individual SUS scores mean: Adding an adjective rating scale. J Usability Stud. 2009;4(3):114-123. [CrossRef]
- Kennedy RS, Lane NE, Berbaum KS, Lilienthal MG. Simulator Sickness Questionnaire: An enhanced method for quantifying simulator sickness. Int J Aerosp Psychol. 1993;3(3):203-220. [CrossRef]
- Bimberg P, Weissker T, Kulik A. On the usage of the simulator sickness questionnaire for virtual reality research. 2020. Presented at: IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW); March 22-26, 2020:464-467; Atlanta, GA.
- Bouchard S, Berthiaume M, Robillard G, Forget H, Daudelin-Peltier C, Renaud P, et al. Arguing in favor of revising the factor structure when assessing side effects induced by immersions in virtual reality. Front Psychiatry. 2021;12:739742. [FREE Full text] [CrossRef] [Medline]
- Brown P, Spronck P, Powell W. The Simulator Sickness Questionnaire, and the erroneous zero baseline assumption. Front Virtual Real. 2022;3:945800. [CrossRef]
- Sevinc V, Berkman MI. Psychometric evaluation of simulator sickness questionnaire and its variants as a measure of cybersickness in consumer virtual environments. Appl Ergon. 2020;82:102958. [CrossRef] [Medline]
- Wickham H, François R, Henry L, Müller K, Vaughan D. Dplyr: A grammar of data manipulation. Tidyverse. 2023. URL: https://dplyr.tidyverse.org [accessed 2025-03-21]
- Adler D, Kelly S, Elliott T, Adamson J. Vioplot: Violin plot. GitHub. 2024. URL: https://github.com/TomKellyGenetics/vioplot [accessed 2025-03-21]
- Wickham H. Ggplot2: Elegant graphics for data analysis. Tidyverse. 2016. URL: https://ggplot2.tidyverse.org [accessed 2025-03-21]
- Revelle W. Psych: Procedures for psychological, psychometric, and personality research. The Comprehensive R Archive Network. 2024. URL: https://CRAN.R-project.org/package=psych [accessed 2025-03-21]
- Kassambara A. Rstatix: Pipe-friendly framework for basic statistical tests. Datanovia. 2023. URL: https://rpkgs.datanovia.com/rstatix/ [accessed 2025-03-21]
- Lüdecke D. ggeffects: Tidy data frames of marginal effects from regression models. J Open Source Softw. 2018;3(26):772. [FREE Full text]
- Christensen RHB. Ordinal: Regression models for ordinal data. The Comprehensive R Archive Network. 2023. URL: https://CRAN.R-project.org/package=ordinal [accessed 2025-03-21]
- Yee TW. The VGAM package for categorical data analysis. J Stat Soft. 2010;32(10):1-34. [CrossRef]
- Carli V, Gentili C. EXPERIENCE: Is virtual reality suitable to identify differences between depressed and healthy individuals? International Standard Randomised Controlled Trial Number Registry. 2022. URL: https://doi.org/10.1186/isrctn16396369 [accessed 2025-03-21]
- Chan AHY, Honey MLL. User perceptions of mobile digital apps for mental health: acceptability and usability—an integrative review. J Psychiatr Ment Health Nurs. 2022;29(1):147-168. [CrossRef] [Medline]
- Shahid S, Kelson J, Saliba A. Effectiveness and user experience of virtual reality for social anxiety disorder: systematic review. JMIR Ment Health. 2024;11:e48916. [FREE Full text] [CrossRef] [Medline]
- Rus-Calafell M, Garety P, Sason E, Craig TJK, Valmaggia LR. Virtual reality in the assessment and treatment of psychosis: a systematic review of its utility, acceptability and effectiveness. Psychol Med. 2018;48(3):362-391. [CrossRef] [Medline]
- Jones C, Miguel Cruz A, Smith-MacDonald L, Brown MRG, Vermetten E, Brémault-Phillips S. Technology acceptance and usability of a virtual reality intervention for military members and veterans with posttraumatic stress disorder: mixed methods unified theory of acceptance and use of technology study. JMIR Form Res. 2022;6(4):e33681. [FREE Full text] [CrossRef] [Medline]
- Weech S, Kenny S, Barnett-Cowan M. Presence and cybersickness in virtual reality are negatively related: a review. Front Psychol. 2019;10:158. [FREE Full text] [CrossRef] [Medline]
- Voinescu A, Petrini K, Stanton Fraser D. Presence and simulator sickness predict the usability of a virtual reality attention task. Virtual Real. 2023;27(3):1-17. [FREE Full text] [CrossRef] [Medline]
- Migliore S, Casella M, Tramontano C, Curcio G, Squitieri F. Virtual reality tolerability, sense of presence and usability in Huntington disease: a pilot study. Neurol Sci. 2025;46(1):219-225. [CrossRef] [Medline]
- Selaskowski B, Wiebe A, Kannen K, Asché L, Pakos J, Philipsen A, et al. Clinical adoption of virtual reality in mental health is challenged by lack of high-quality research. Npj Ment Health Res. 2024;3(1):24. [FREE Full text] [CrossRef] [Medline]
- Torous J, Nicholas J, Larsen ME, Firth J, Christensen H. Clinical review of user engagement with mental health smartphone apps: evidence, theory and improvements. Evid Based Ment Health. 2018;21(3):116-119. [FREE Full text] [CrossRef] [Medline]
- Geraets CNW, Wallinius M, Sygel K. Use of virtual reality in psychiatric diagnostic assessments: a systematic review. Front Psychiatry. 2022;13:828410. [FREE Full text] [CrossRef] [Medline]
- Riva G, Serino S. Virtual reality in the assessment, understanding and treatment of mental health disorders. J Clin Med. 2020;9(11):3434. [FREE Full text] [CrossRef] [Medline]
- Biswas N, Mukherjee A, Bhattacharya S. “Are you feeling sick?”—A systematic literature review of cybersickness in virtual reality. ACM Comput Surv. 2024;56(11):1-38. [CrossRef]
- Page D, Lindeman RW, Lukosch S. Identifying strategies to mitigate cybersickness in virtual reality induced by flying with an interactive travel interface. Multimodal Technol Interact. 2023;7(5):47. [CrossRef]
- Papaefthymiou S, Giannakopoulos A, Roussos P, Kourtesis P. Mitigating cybersickness in virtual reality: impact of eye–hand coordination tasks, immersion, and gaming skills. Virtual Worlds. 2024;3(4):506-535. [CrossRef]
- Tylee A, Gandhi P. The importance of somatic symptoms in depression in primary care. Prim Care Companion J Clin Psychiatry. 2005;7(4):167-176. [FREE Full text] [CrossRef] [Medline]
- Veling W, Lestestuiver B, Jongma M, Hoenders HJR, van Driel C. Virtual reality relaxation for patients with a psychiatric disorder: crossover randomized controlled trial. J Med Internet Res. 2021;23(1):e17233. [FREE Full text] [CrossRef] [Medline]
- Borghouts J, Eikey E, Mark G, De Leon C, Schueller SM, Schneider M, et al. Barriers to and facilitators of user engagement with digital mental health interventions: systematic review. J Med Internet Res. 2021;23(3):e24387. [FREE Full text] [CrossRef] [Medline]
- Estrela M, Semedo G, Roque F, Ferreira PL, Herdeiro MT. Sociodemographic determinants of digital health literacy: a systematic review and meta-analysis. Int J Med Inform. 2023;177:105124. [FREE Full text] [CrossRef] [Medline]
- Chudy-Onwugaje K, Abutaleb A, Buchwald A, Langenberg P, Regueiro M, Schwartz DA, et al. Age modifies the association between depressive symptoms and adherence to self-testing with telemedicine in patients with inflammatory bowel disease. Inflamm Bowel Dis. 2018;24(12):2648-2654. [FREE Full text] [CrossRef] [Medline]
- Abouzeid N, Lal S. The role of sociodemographic factors on the acceptability of digital mental health care: a scoping review protocol. PLoS One. 2024;19(4):e0301886. [FREE Full text] [CrossRef] [Medline]
- Chung OS, Robinson T, Johnson AM, Dowling NL, Ng CH, Yücel M, et al. Implementation of therapeutic virtual reality into psychiatric care: clinicians' and service managers' perspectives. Front Psychiatry. 2021;12:791123. [FREE Full text] [CrossRef] [Medline]
- Chung OS, Robinson T, Johnson AM, Dowling NL, Ng CH, Yücel M, et al. Corrigendum: Implementation of therapeutic virtual reality into psychiatric care: Clinicians' and service managers' perspectives. Front Psychiatry. 2022;13:893637. [FREE Full text] [CrossRef] [Medline]
Abbreviations
DASS: Depression Anxiety Stress Scales
DS: depressive symptom
DSM: Diagnostic and Statistical Manual of Mental Disorders
HC: healthy control
PANAS-SF: Positive and Negative Affect Schedule-Short Form
PHQ-9: Patient Health Questionnaire-9
SSQ: Simulator Sickness Questionnaire
SUS: System Usability Scale
TFA: theoretical framework of acceptability
VR: virtual reality
Edited by A Mavragani; submitted 08.11.24; peer-reviewed by R Zahn, AN Ramaseri-Chandra, C Nega; comments to author 03.01.25; revised version received 21.01.25; accepted 11.03.25; published 16.04.25.
Copyright © Sara Sutori, Emma Therése Eliasson, Francesca Mura, Victor Ortiz, Vincenzo Catrambone, Gergö Hadlaczky, Ivo Todorov, Antonio Luca Alfeo, Valentina Cardi, Mario G C A Cimino, Giovanna Mioni, Mariano Alcañiz Raya, Gaetano Valenza, Vladimir Carli, Claudio Gentili. Originally published in JMIR Formative Research (https://formative.jmir.org), 16.04.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.