Published in Vol 6, No 1 (2022): January

Facial and Vocal Markers of Schizophrenia Measured Using Remote Smartphone Assessments: Observational Study


Original Paper

1 AiCure, New York, NY, United States

2 Merck & Co, Inc, Kenilworth, NJ, United States

3 Icahn School of Medicine at Mount Sinai, New York, NY, United States

4 Department of Psychiatry, New York University School of Medicine, New York, NY, United States

Corresponding Author:

Anzar Abbas, PhD


214 Sullivan Street 6C

New York, NY, 10012

United States

Phone: 1 8005700448


Background: Machine learning–based facial and vocal measurements have demonstrated relationships with schizophrenia diagnosis and severity. Demonstrating utility and validity of remote and automated assessments conducted outside of controlled experimental or clinical settings can facilitate scaling such measurement tools to aid in risk assessment and tracking of treatment response in populations that are difficult to engage.

Objective: This study aimed to determine the accuracy of machine learning–based facial and vocal measurements acquired through automated assessments conducted remotely through smartphones.

Methods: Measurements of facial and vocal characteristics including facial expressivity, vocal acoustics, and speech prevalence were assessed in 20 patients with schizophrenia over the course of 2 weeks in response to two classes of prompts previously utilized in experimental laboratory assessments: evoked prompts, where subjects are guided to produce specific facial expressions and speech; and spontaneous prompts, where subjects are presented stimuli in the form of emotionally evocative imagery and asked to freely respond. Facial and vocal measurements were assessed in relation to schizophrenia symptom severity using the Positive and Negative Syndrome Scale.

Results: Vocal markers including speech prevalence, vocal jitter, fundamental frequency, and vocal intensity demonstrated specificity as markers of negative symptom severity, while measurement of facial expressivity demonstrated itself as a robust marker of overall schizophrenia symptom severity.

Conclusions: Established facial and vocal measurements, collected remotely in schizophrenia patients via smartphones in response to automated task prompts, demonstrated accuracy as markers of schizophrenia symptom severity. Clinical implications are discussed.

JMIR Form Res 2022;6(1):e26276



Utilization of objective digital measurements of patient behavior is rapidly increasing in clinical research and practice. The development and validation of digital measurement tools in psychiatry come with both significant opportunities and risks. Significant opportunity arises as psychiatry is undergoing a paradigm shift toward the utilization of objective markers to assess illness and disease progression [1] and toward the widespread use of telehealth platforms for psychiatric care. This is particularly important when face-to-face medical care is not possible, such as during the COVID-19 pandemic [2-4].

Many behavioral and physiological markers are now accessible through digital technology such as wearables, mobile or web-based apps, and application programming interfaces [5]. Such advances could allow innovations in neuropsychiatry to scale to the point where they can be used to develop and implement assessment and treatment for patients with significant psychiatric impairment [6].

Schizophrenia represents a poignant example of both the benefits and challenges of remote digital measurement. Clinical trials for schizophrenia drug development are often site-centric, requiring patients to appear physically at the site for measurement of disease severity. The need to travel to sites can restrict study populations to those that live in geographical proximity to the site, restricting access to participation and limiting patient diversity [7]. Current approaches for measurement of disease rely on clinician-administered measures that are costly and time-consuming to administer, leading to infrequent assessment. The instruments themselves are not well-aligned with current neurobiological definitions of illness [8].

Digital assessments address the practical challenges associated with in-person measurement of disease severity. Given that they can be administered remotely, they allow for assessments to occur in the patient’s natural environment with reduced need for in-person consultations at a clinic. Additionally, the short length of the assessments allows for them to be administered with far greater frequency than would be possible with in-person assessments. Hence, digital assessments could provide care teams greater visibility into patient health and behavior outside the clinic with the potential to inform patient responses to treatment, or the lack thereof, earlier than would otherwise be possible [9,10]. There is a need to determine the viability of such assessment to accurately measure symptom severity when deployed in real-world settings, where differentiating between significant variability and noise can pose a challenge [11-13].

A number of behavioral characteristics of schizophrenia, such as alogia (poverty of speech) and affective flattening (diminished emotional expression or emotional withdrawal) [14], can be quantified directly using standardized tasks and coding schemes [15-19], which can be automated through use of computer vision [20] and vocal acoustic [21] machine learning models. In addition to digital measures that are directly analogous to core schizophrenia symptoms, there are a number of other acoustic measures including vocal loudness, pitch variability, fundamental frequency, and jitter, which have demonstrated validity as markers of schizophrenia [16,22-24]. These markers have demonstrated specificity as measures of the negative symptom cluster, which is of particular interest given the lack of available treatment options for negative symptoms [22].

In this study, we examine the ability to measure schizophrenia symptom severity through facial and vocal analysis using videos recorded during a remote smartphone-based assessment composed of both evoked and spontaneous prompts. We compared these measures against standard clinical assessments of overall schizophrenia symptom severity (ie, total score on the Positive and Negative Syndrome Scale [PANSS]) as well as specific domains of positive (P total), negative (N total), and general (G total) symptoms, measured during in-person study visits [25]. We further conducted an exploratory analysis on the relationship between digital measures and individual symptoms of schizophrenia.


Study participants were individuals who had received a DSM-5 clinical diagnosis of schizophrenia or schizoaffective disorder, had passed a telephone screening, and had been on a stable atypical antipsychotic treatment regimen for ≥2 months, with no intent to change medication during the 2-week study. A total of 20 individuals (8 male, 12 female), 15 with schizophrenia and 5 with schizoaffective disorder, were enrolled; ages ranged from 29 to 61 years (mean 45, SD 11). A subset of 11 individuals had their diagnosis confirmed through semistructured interviews. To be included in the study, participants needed to be able to speak, read, hear, and understand the language of the study team and the informed consent form; respond verbally to questions; follow instructions; and be willing and able to participate in all study activities, including the use of smartphones for data collection.

Given that the purpose of the study was to determine whether remote assessments would be able to appropriately collect behavioral data for assessment of disease severity in patients with schizophrenia by using digital biomarkers, data from healthy controls were not included. Data on healthy controls would have allowed for assessment of whether facial and vocal digital biomarkers can distinguish healthy individuals from patients with schizophrenia. However, we felt that past work on each of the biomarkers discussed in this paper provides sufficient evidence for this claim (Table 1).

The study was conducted at the Icahn School of Medicine’s Affective and Cognitive Therapeutics Research Lab and the protocol was approved by the Biomedical Research Alliance of New York.

Data Collection

All study participants were assessed for severity of schizophrenia symptoms using both in-person clinical assessments and remote smartphone-based assessments over the course of the 14-day observational period. All data were collected over 3 months, from July to September 2019.

In-Person Clinical Assessments

The PANSS was administered in person to all participants by a trained research team member on the first (day 1) and last (day 14) days of the study. For all subsequent analyses, the PANSS scores for each study participant were averaged across the 2 time points. Given that the study participants were clinically stable, averaging the two PANSS scores reduced noise in the measurement. Multimedia Appendix 1 shows the reliability of the PANSS scores for the two time points.

Remote Smartphone-Based Assessments

On the first day of the study, all study participants were trained by a research coordinator on how to use the smartphone app [26] for remote data collection, which would capture video and audio data of participant behavior using the front-facing smartphone camera as they responded to on-screen prompts (Figure 1). This software has been used in clinical research for reporting medication adherence, electronic patient-reported outcomes, and ecological momentary assessments [27,28]. Participants were allowed to use their own smartphones or those provisioned to them by the study team for the duration of the study. The assessments were taken at scheduled time points over the course of the 14 days, and the app would send a reminder to the participant at the participant’s chosen daily reminder time when an assessment had become available. All participants received US $1 per assessment they completed using a debit card that was provided to them during study enrollment. Subjects were also compensated with US $25 for the screening visit, US $75 for the initial training, and US $200 at the final visit for device return (with an optional additional US $20 reimbursement if they used their own device, to cover data costs). The assessments were designed to capture 2 main kinds of behaviors as described below.

Figure 1. Example screenshots from the smartphone assessment all study participants took for remote and automated collection of video and audio data. During each of the prompts, the app speaks the text displayed on the screen and awaits a verbal and visual response from the participant, all while recording video and audio from the front-facing camera and microphone. (A) Screen displayed before the participant begins the assessment. (B) Prompt for collection of free behavior in response to images, showing one example image. (C) Prompt for collection of evoked facial expression behavior. (D) Prompt for collection of evoked vocal expression behavior.
Free Speech and Spontaneous Expressivity

Participants were shown images from the Open Affective Standard Image Set [29] and asked to describe the images and talk about how they made them feel (Figure 1B). The participants’ speech and facial expressivity in response to the prompts were captured [15,16,18,19,30-32]. This assessment was conducted on days 2, 7, and 14 of the study.

Evoked Facial and Vocal Expressions

Participants were asked separately to make the most expressive face they could and hold it for 3 seconds (Figure 1C) and then recite the days of the week out loud (Figure 1D). These prompts were selected on the basis of prior experimental tasks used to examine emotional activity and speech in schizophrenia [31,33]. The captured video and audio were used to measure facial expressivity and acoustic characteristics of voice during the evoked expressions. These assessments were scheduled on days 1, 7, and 14 of the study.

Given that the study participants were clinically stable and remained on the psychiatric medications with which they entered the study, measurements acquired at each assessment time point were averaged before comparison with PANSS scores. Since we did not expect to observe significant clinical change, averaging reduces noise and accounts for within-subject variability. Multimedia Appendix 1 shows that the test-retest reliability of each digital measure across the 2 weeks was considerable, supporting the decision to average the measures.

Measurement of Digital Markers

Video and audio data of participant behavior collected during the remote assessments, which contained protected health information (PHI), were uploaded and stored using Health Insurance Portability and Accountability Act (HIPAA)–compliant backend services. These data were then processed to extract frame-by-frame measurements of behavior, generating the first level of non-PHI data. A combination of computer vision and digital signal processing tools was used for quantification of facial and vocal behavior and subsequent derivation of visual and auditory markers of schizophrenia as described below.

All analyses were conducted using Python, along with open-source tools. All digital biomarker variables analyzed were acquired through the use of OpenDBM, an open-source software package that combines tools for measurement of facial, vocal, and movement behavior, developed partially for our study [34] and made available freely for use by all researchers.

Measurement of Facial Expressivity

The software library OpenFace [35] was used to measure framewise facial expressivity through quantification of action units (AUs; Multimedia Appendix 2) using a computer vision–based implementation of the Facial Action Coding System. All framewise AU measurements were normalized through division by a timepoint-specific baseline value acquired at the beginning of each assessment when the participant is not presented with any stimulus. The normalization allows for correction of any inter- and intraindividual variability; this methodology has previously been demonstrated to be necessary for measurement of facial behavior using computer vision tools and for subsequent analyses of facial expressivity [36-38]. This normalization is also necessary to account for tardive dyskinesia or other movement disorders that may be present in patients receiving antipsychotics. The time point-specific baseline normalization addresses noise in facial expressivity measurements stemming from motor abnormalities. Facial expressivity was calculated by taking the mean framewise intensity of all AUs over the course of the video. The method for quantifying facial expressivity was the same for both spontaneous and evoked expressivity. For each frame of video, OpenFace provides a confidence score denoting the likelihood that it is accurately detecting a face; only frames with a confidence score of 80% or higher were used for all downstream analyses. While OpenFace provides large amounts of information on specific AUs and emotions, in the current investigation, we focused only on facial expressivity because of significant evidence that patients with schizophrenia display a decrease in overall affect (eg, blunted affect) [39,40].
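
The facial expressivity calculation described above can be sketched roughly as follows. This is a minimal illustration and not the OpenDBM implementation: the function names, the dictionary layout of each frame, the use of the mean over baseline frames as the baseline value, and the skipping of AUs whose baseline is zero are all assumptions made for the example. Only the 0.80 confidence threshold, the division by a baseline, and the averaging over AUs and frames come from the text.

```python
from statistics import mean

CONFIDENCE_THRESHOLD = 0.80  # frames below this face-detection confidence are dropped


def _reliable(frames, threshold):
    """Keep only frames whose face-detection confidence meets the threshold."""
    return [f for f in frames if f["confidence"] >= threshold]


def facial_expressivity(task_frames, baseline_frames, threshold=CONFIDENCE_THRESHOLD):
    """Mean baseline-normalized AU intensity across all retained task frames.

    Each frame is a dict {"confidence": float, "aus": {au_name: intensity}}.
    This layout is hypothetical and chosen only for illustration.
    """
    task = _reliable(task_frames, threshold)
    base = _reliable(baseline_frames, threshold)
    if not task or not base:
        return None  # no reliable frames to measure

    au_names = sorted(task[0]["aus"])
    # Assumption: the baseline value of an AU is its mean over the baseline frames.
    baseline = {au: mean(f["aus"][au] for f in base) for au in au_names}
    # Assumption: AUs with a zero baseline are skipped to avoid dividing by zero.
    usable = [au for au in au_names if baseline[au] > 0]
    if not usable:
        return None

    # Normalize each AU by its baseline, average over AUs, then over frames.
    per_frame = [mean(f["aus"][au] / baseline[au] for au in usable) for f in task]
    return mean(per_frame)
```

Under these assumptions, a value of 1.0 means the participant's AU activity during the prompt matched their own resting baseline, and larger values mean greater expressivity relative to that baseline.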

Measurement of Vocal Acoustics

The software library Parselmouth [41], which is a Python implementation of the Praat software library [42], was used for measurement of all vocal acoustic characteristics. All audio analyzed was first passed through the LogMMSE noise reduction algorithm for speech enhancement [21,43].

Despite the exploratory nature of this study, and given the small data sample, we attempted to be parsimonious in the selection of markers to reduce the likelihood of false discovery. The vocal markers analyzed were those that have previously demonstrated effects in studies of individuals with schizophrenia [16,23]. These markers, calculated separately during free speech and evoked vocal expressions, were vocal intensity, fundamental frequency mean, fundamental frequency stdev, vocal jitter, harmonics to noise ratio, and speech prevalence [22,24,43-45]. Descriptions of these vocal acoustic features are provided in Table 1.

Table 1. List of vocal acoustic variables extracted from audio files collected during participation in remote smartphone assessments and references to earlier work on their relevance in schizophrenia.
Variable | Description
Vocal intensity | Volume of participant’s speech, measured in decibels, which was previously shown to be decreased in individuals with schizophrenia compared to healthy controls [30].
Fundamental frequency mean | Average fundamental frequency of participant speech in hertz, which has been shown to be higher in individuals with schizophrenia and to decrease in response to treatment [24,44].
Fundamental frequency stdev | SD of fundamental frequency in hertz, which has been shown to be greater in individuals with schizophrenia [24].
Vocal jitter | Degree of irregularity in the frequency of the participant’s speech, measured in hertz, demonstrated to be higher in individuals with schizophrenia [45].
Speech prevalence | Percentage of the audio file where participant speech was detected as opposed to silence; individuals with schizophrenia demonstrate increased pauses and variability in pause duration [39,46].
Harmonics to noise ratio | Quantification of additive noise in the participant’s speech, which has been used to predict risk of psychosis and has been shown to be correlated with symptom severity in other neurological disorders such as Parkinson disease [12,47].
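
As a rough illustration of how three of the variables in Table 1 could be derived from a framewise pitch track, consider the sketch below. It is not the study's pipeline. It assumes the pitch tracker marks frames without detected voicing as 0 Hz, it approximates speech prevalence as the percentage of voiced frames (the speech detection method used in the study is not described here), and it uses the population SD. The function name and return keys are hypothetical.

```python
from statistics import mean, pstdev


def summarize_pitch_track(f0_hz):
    """Summarize a framewise fundamental frequency (F0) track.

    f0_hz: list of per-frame F0 values in hertz, where 0 marks a frame
    with no detected voicing (an assumed convention for this sketch).
    """
    if not f0_hz:
        raise ValueError("empty pitch track")

    voiced = [f for f in f0_hz if f > 0]
    # Assumption: voiced frames stand in for "speech detected".
    prevalence = 100.0 * len(voiced) / len(f0_hz)
    if not voiced:
        return {"f0_mean": None, "f0_stdev": None, "speech_prevalence": 0.0}

    return {
        "f0_mean": mean(voiced),        # fundamental frequency mean (Hz)
        "f0_stdev": pstdev(voiced),     # fundamental frequency stdev (Hz), population SD
        "speech_prevalence": prevalence,  # percentage of frames with voicing
    }
```

Jitter, intensity, and harmonics to noise ratio need the waveform itself and are left out of this sketch.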

Data Analysis

Both facial expressivity and vocal characteristics were assessed during free behavior following spontaneous prompts (Table 2). Facial expressivity was also assessed during evoked facial expressions and vocal characteristics were assessed during evoked vocal expression following evoked prompts. Evaluation of vocal characteristics during the evoked expression task allowed for measurement of specific characteristics that have been previously shown to be effective measures of schizophrenia during speech (eg, fundamental frequency mean and stdev, jitter, harmonics to noise ratio) while also measuring speech characteristics such as amount of time spoken (ie, speech prevalence) [22,24,43-45]. A large number of variables can be calculated from video and audio data sources; however, the analyses presented herein were limited to features that have evidence and a theoretical basis for a relationship with schizophrenia symptom severity in the scientific literature.

Table 2. All variables described in Measurement of Digital Markers were calculated separately for distinct behaviors captured during the remote smartphone assessments. Each of the behaviors that were elicited and captured during the smartphone assessment and the digital markers calculated from those behaviors are listed here.
Behavior | On-screen prompt | Digital markers measured
Free behavior | Please describe what you see in this image and talk about how it makes you feel (Figure 1B) | Facial expressivity; Fundamental frequency mean; Fundamental frequency stdev; Vocal jitter; Harmonics to noise ratio; Speech prevalence
Evoked facial expression | Please make the most expressive face you can and hold it for 3 seconds (Figure 1C) | Facial expressivity
Evoked vocal expression | Please say the names of the days of the week starting with Monday (Figure 1D) | Fundamental frequency mean; Fundamental frequency stdev; Vocal jitter; Harmonics to noise ratio; Speech prevalence

Correlation With PANSS Subscale Scores

As the primary analysis, digital measures were correlated with overall schizophrenia symptom severity considering the PANSS total score (PANSS Total) along with the 3 subscales reflecting N Total, P Total, and G Total using Pearson’s correlation. When comparing negative symptoms, we utilized the PANSS Marder Symptom Factor, which includes two symptoms that are traditionally included in the general severity score: Motor Retardation and Social Avoidance and Isolation [48].
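
For readers less familiar with the statistic, the Pearson correlation between a per-participant digital measure and a per-participant PANSS score can be computed as in the sketch below. The function name is illustrative and this is not the analysis code used in the study; it returns only the coefficient r, without a P value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return sxy / sqrt(sxx * syy)
```

In practice, a library routine such as `scipy.stats.pearsonr` returns both the coefficient and its P value, which is what a multiple-comparison correction would then operate on.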

Correlation With Individual PANSS Items

As an additional exploratory analysis, digital measurements that demonstrated significance in relation to specific subscales were then further explored in relation to the specific symptoms that derive those subscales, correcting for multiple comparisons using a Benjamini-Hochberg adjusted P value [49]. This was an exploratory analysis conducted to further disaggregate the heterogeneity within the symptom scales to understand more specifically which clinical features were reflected in the digital measurement. The results from these analyses are provided in the supplementary materials and are not included in the main text.
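
The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is a generic implementation of the standard step-up procedure, written for illustration and not taken from the study's analysis code; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, the raw P values 0.01, 0.04, 0.03, and 0.005 are adjusted to 0.02, 0.04, 0.04, and 0.02, respectively. A correlation is then treated as significant when its adjusted value falls below the chosen threshold.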

Participation in the in-app remote assessments across participants was high (Multimedia Appendix 3).

Correlation With PANSS Scores

Vocal Markers During Evoked Vocal Expression

Our results demonstrate that multiple digital measures are significantly correlated with overall N Total after correcting for multiple comparisons. This includes fundamental frequency mean (r=–0.64; adjusted P=.02), vocal jitter (r=0.56; adjusted P=.02), and harmonics to noise ratio (r=–0.61; adjusted P=.02). Two other features trended in the hypothesized direction with P values of <.10 after correction for false discovery: speech prevalence (r=–0.47; adjusted P=.06) and fundamental frequency stdev (r=–0.44; adjusted P=.07; see Table 3 for full results). Importantly, the directionality of results was consistent with prior research. For example, increased negative symptom severity was reflected in decreased speech prevalence, decreased tonal qualities of speech, and increased noise in speech sounds, consistent with the literature [16,22-24].

Table 3. Correlation between vocal markers during evoked vocal expression and Positive and Negative Syndrome Scale (PANSS) score showed a relationship between vocal characteristics and schizophrenia symptom severity.
Values are Pearson r, with the P value in parentheses; column numbers correspond to the numbered row variables.
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1. Negative symptom severity
2. Positive symptom severity | 0.452a (P=.045)
3. General severity | 0.572b (P=.008) | 0.806c (P<.001)
4. Total | 0.757c (P<.001) | 0.870c (P<.001) | 0.947c (P<.001)
5. Vocal intensity | –0.091 | –0.250 | –0.088 | –0.152
6. Fundamental frequency stdev | –0.436 | –0.068 | 0.098 | –0.090 | –0.081
7. Fundamental frequency mean | –0.644a | –0.253 | –0.218 | –0.373 | 0.475 | 0.577a
8. Vocal jitter | 0.563a | 0.229 | 0.122 | 0.293 | –0.176 | –0.695c (P<.001) | –0.823c (P<.001)
9. Speech prevalence | –0.470 | –0.247 | –0.292 | –0.362 | 0.611a | 0.043 | 0.781c (P<.001) | –0.373 (P=.12)
10. Harmonics to noise ratio | –0.610a | –0.195 | –0.126 | –0.297 | 0.154 | 0.773c (P<.001) | 0.868c (P<.001) | –0.965c (P<.001) | 0.422 (P=.07)




Evoked Facial Expression

Facial expressivity demonstrated significant relationships with the overall schizophrenia symptom severity PANSS total score (r=–0.71; adjusted P=.002) and on all PANSS subscales (N Total, r=–0.50; adjusted P=.04; P Total, r=–0.63; adjusted P=.006; G Total, r=–0.70; adjusted P=.009), in a direction consistent with the literature [15,18,19,37,38] (Table 4).

Table 4. Correlation between facial expressivity during evoked facial expression and the Positive and Negative Syndrome Scale score showed a relationship between facial affect and schizophrenia symptom severity.
Values are Pearson r, with the P value in parentheses; column numbers correspond to the numbered row variables.
Variable | 1 | 2 | 3 | 4
1. Facial expressivity
2. Negative symptom severity | –0.500a (P=.04)
3. Positive symptom severity | –0.628b (P=.01) | 0.452a (P=.045)
4. General severity | –0.695b (P=.009) | 0.572b (P=.008) | 0.806c (P<.001)
5. Total | –0.714b (P=.002) | 0.757c (P<.001) | 0.870c (P<.001) | 0.947c (P<.001)




Free Behavior in Response to Images

Spontaneous measurement of vocal and facial expressions, as elicited by emotionally valenced images, demonstrated relationships between multiple vocal markers and the negative symptom cluster. Highly consistent with results of vocal measurements in response to evoked prompts, the following measures demonstrated significant relationships with N Total: fundamental frequency mean (r=–0.61; adjusted P=.04), harmonics to noise ratio (r=–0.58; adjusted P=.03), and speech prevalence (r=–0.57; adjusted P=.03). Vocal jitter showed a trend in the hypothesized direction with a P value of <.10 (r=0.43; adjusted P=.09), and fundamental frequency stdev did not approach significance (Table 5). In contrast to measurement after the evoked task, vocal intensity measured during free behavior demonstrated significance (r=–0.50; adjusted P=.05).

Table 5. Correlation between facial and vocal markers during free behavior and PANSS score showed a relationship between facial affect and vocal characteristics with schizophrenia symptom severity.
Values are Pearson r, with the P value in parentheses; column numbers correspond to the numbered row variables.
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
1. Negative symptom severity
2. Positive symptom severity | 0.452a (P=.045)
3. General severity | 0.572b (P=.008) | 0.806c (P<.001)
4. Total | 0.757c (P<.001) | 0.870c (P<.001) | 0.947c (P<.001)
5. Facial expressivity | 0.142 | –0.113 | 0.090 | 0.056
6. Vocal intensity | –0.502a | –0.332 | –0.225 | –0.386 | 0.364
7. Fundamental frequency mean | –0.606a | –0.288 | –0.268 | –0.428 | 0.184 | 0.935c (P<.001)
8. Fundamental frequency stdev | –0.304 | –0.189 | –0.127 | –0.225 | 0.179 | 0.581b | 0.529a
9. Harmonics to noise ratio | –0.584a | –0.224 | –0.097 | –0.312 | 0.174 | 0.654b | 0.774c (P<.001) | 0.476a (P=.04)
10. Vocal jitter | 0.426 | 0.147 | 0.015 | 0.194 | –0.097 | –0.541a | –0.691b | –0.278 | –0.937c (P<.001)
11. Speech prevalence | –0.567a | –0.260 | –0.261 | –0.403 | 0.161 | 0.869c (P<.001) | 0.923c | 0.260 | 0.575b | –0.510a




Principal Findings

In this study, we tested the hypothesis that facial and vocal markers of schizophrenia can be captured remotely in patients using brief automated smartphone-based assessments, and that such measures would be correlated to standard clinical measures of schizophrenia symptom severity. The measures show promise as objective and automated methods of assessing illness severity in the context of treatment development and decision-making. Prompts and vocal or facial measures that have previously demonstrated accuracy in controlled research settings were simplified and deployed as a brief assessment via a smartphone app in an observational study involving patients with schizophrenia. Our results support the ability to measure meaningful clinical markers of schizophrenia symptom severity via a brief smartphone-based assessment that captures data remotely and processes it through a back-end of machine learning algorithms to identify vocal and facial markers.

Our results demonstrate that vocal characteristics such as fundamental frequency, loudness, nonverbal vocal tones, and the prevalence of speech serve as specific markers of symptom severity—particularly for negative symptoms—in a direction consistent with previous literature, which used laboratory-based measures. The majority of these markers demonstrate a robust signal of negative symptom severity regardless of whether prompts were evoked or spontaneous.

The observation that vocal markers provide specificity as a metric of negative symptom severity has significant practical implications in clinical research and decision-making. Recent advances in the mechanistic understanding of negative symptoms have led to a number of promising pharmacological and cognitive treatments for negative symptoms of schizophrenia [50-53]. Such initiatives are important given the lack of US Food and Drug Administration–approved treatments for negative symptoms [54]. However, measures of negative symptoms to assess the efficacy of these treatments on the basis of objective measurement of behavior rather than subjective clinician observation are sparse [55-58].

Facial expressivity demonstrated a relationship with schizophrenia symptom severity only when captured using evoked prompts. This may indicate either that greater structure is needed to assess this marker remotely or that the prompts utilized did not elicit a strong enough response. Indeed, prior work has demonstrated that videos, rather than still images, are stronger stimuli for assessing emotional variability in schizophrenia [59]. These findings suggest that care must be taken to determine the form of behavior from which facial expressivity is being quantified: facial expressivity during evoked prompts differs from facial expressivity during free behavior or in response to specific stimuli. Indeed, previous work has demonstrated how the context of behavior affects the measurements acquired [7]. In this study, we observed that facial expressivity in response to evoked prompts provides a robust signal for overall symptom severity.


This study presents a number of important limitations. While the primary hypotheses were supported, not all effects were consistent across prompts. Given the small sample size, it is impossible to conclude definitively which markers can be utilized to robustly assess schizophrenia symptom severity or impairment. Indeed, a number of relatively large correlation coefficients trended in the hypothesized direction but with P values of <.10, likely owing to sample size constraints. Further, despite the markers being hypothesized a priori, this work is exploratory in nature given the small sample size, limited number of assessments, and the short duration of the study. A larger assessment will be needed to replicate our findings and to assess reliability of the metrics more broadly. Additionally, the PANSS has well-documented shortcomings as a measurement tool for negative symptoms, and future work should conduct correlations with additional scales such as the Clinical Assessment Interview for Negative Symptoms or the Brief Negative Symptom Scale [60-63]. More specifically, future studies are required to individually compare specific aspects of negative symptoms with their correlates in digital measures (eg, comparison of clinician-observed blunted affect with digitally assessed facial expressivity, considering the hypothesis that greater blunted affect is correlated with reduced facial expressivity). Such studies would allow for a more direct assessment of digital assessment tools to quantify individual schizophrenia symptoms. Despite the aforementioned limitations, this study provides evidence that facial and vocal digital measures can be remotely captured in patients with schizophrenia, and that such measures demonstrate significant relationships with established measures of schizophrenia symptom severity, offering promise that these tools could be used to remotely measure and track disease severity in an objective manner.

While app-based video and audio capture utilizes a proprietary platform, this investigation utilized open-source Python-based software, available to all researchers [34]. This allows for the expansion of our study to a wider patient population, as mentioned above, and the independent validation of the methods and their implementation in this investigation by other researchers in academic and clinical research, following an open science framework for the development of digital tools for objective, accurate, and scalable measurement of disease symptoms for both mental and physical health.


This study shows that facial and vocal markers, measured using computer vision and vocal analytics from video data captured remotely via a smartphone app, demonstrate validity as markers of schizophrenia and are promising metrics for negative symptom severity. Use of such technology in clinical care and clinical research settings could allow for more frequent, remotely assessed, objective measurement of disease symptoms and treatment responses in a scalable and accessible manner, which can support the development of novel treatments and risk assessment among individuals with schizophrenia.


The authors appreciate the involvement of the clinical, research, and operations staff at both Mount Sinai and AiCure for the development, deployment, and implementation of the technology presented here and the participants who volunteered to be involved in the research.

Conflicts of Interest

IGL, AA, VY, and VK were employed by and own shares in AiCure, LLC, at the time of the study. Authors OP, MD, MM, LS, and BH are employees of Merck Sharp & Dohme Corp, a subsidiary of Merck & Co, Inc, and may own stocks/stock options at Merck & Co, Inc. MMPR has received research grant funding from Neurocrine Biosciences Inc, Millennium Pharmaceuticals, Takeda, Merck, and AiCure. She is an advisory board member for Neurocrine Biosciences Inc.

Multimedia Appendix 1

Descriptive statistics for Positive and Negative Syndrome Scale scores and digital biomarkers during free behavior.

DOCX File, 7 KB

Multimedia Appendix 2

List of facial action units (AUs) whose frame-wise intensity was quantified using computer vision; AU intensities were normalized and then combined to measure facial expressivity.

DOCX File, 7 KB

Multimedia Appendix 3

Amount of participation in the naturalistic assessments deployed through the AiCure app in the duration of the study.

DOCX File, 7 KB

  1. Insel TR. Digital phenotyping: a global tool for psychiatry. World Psychiatry 2018 Oct;17(3):276-277.
  2. Figueroa CA, Aguilera A. The Need for a Mental Health Technology Revolution in the COVID-19 Pandemic. Front Psychiatry 2020;11:523.
  3. Kopec K, Janney CA, Johnson B, Spykerman K, Ryskamp B, Achtyes ED. Rapid Transition to Telehealth in a Community Mental Health Service Provider During the COVID-19 Pandemic. Prim Care Companion CNS Disord 2020 Oct 01;22(5):20br02787.
  4. Brunette MF, Achtyes E, Pratt S, Stilwell K, Opperman M, Guarino S, et al. Use of Smartphones, Computers and Social Media Among People with SMI: Opportunity for Intervention. Community Ment Health J 2019 Aug;55(6):973-978.
  5. Insel TR. Digital Phenotyping: Technology for a New Science of Behavior. JAMA 2017 Oct 03;318(13):1215-1216.
  6. Insel TR. Bending the Curve for Mental Health: Technology for a Public Health Approach. Am J Public Health 2019 Jun;109(S3):S168-S170.
  7. Lecomte T, Potvin S, Corbière M, Guay S, Samson C, Cloutier B, et al. Mobile Apps for Mental Health Issues: Meta-Review of Meta-Analyses. JMIR Mhealth Uhealth 2020 May 29;8(5):e17458.
  8. Torous J, Staples P, Barnett I, Sandoval LR, Keshavan M, Onnela J. Characterizing the clinical relevance of digital phenotyping data quality with applications to a cohort with schizophrenia. NPJ Digit Med 2018;1:15.
  9. Marsch LA. Opportunities and needs in digital phenotyping. Neuropsychopharmacology 2018 Jul;43(8):1637-1638.
  10. Torous J, Keshavan M. A new window into psychosis: The rise digital phenotyping, smartphone assessment, and mobile monitoring. Schizophr Res 2018 Jul;197:67-68.
  11. Goltermann J, Emden D, Leehr EJ, Dohm K, Redlich R, Dannlowski U, et al. Smartphone-Based Self-Reports of Depressive Symptoms Using the Remote Monitoring Application in Psychiatry (ReMAP): Interformat Validation Study. JMIR Ment Health 2021 Jan 12;8(1):e24333.
  12. Tsanas A, Little M, McSharry P, Ramig L. Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests. IEEE Trans Biomed Eng 2010 Apr;57(4):884-893.
  13. Coravos A, Khozin S, Mandl KD. Erratum: Author Correction: Developing and adopting safe and effective digital biomarkers to improve patient outcomes. NPJ Digit Med 2019 May 10;2(1):40.
  14. Tandon R, Gaebel W, Barch DM, Bustillo J, Gur RE, Heckers S, et al. Definition and description of schizophrenia in the DSM-5. Schizophr Res 2013 Oct;150(1):3-10.
  15. Kohler CG, Martin EA, Milonova M, Wang P, Verma R, Brensinger CM, et al. Dynamic evoked facial expressions of emotions in schizophrenia. Schizophr Res 2008 Oct;105(1-3):30-39.
  16. Parola A, Simonsen A, Bliksted V, Fusaroli R. Voice patterns in schizophrenia: A systematic review and Bayesian meta-analysis. Schizophr Res 2020 Feb;216:24-40.
  17. de Boer JN, van Hoogdalem M, Mandl R, Brummelman J, Voppel A, Begemann M, et al. Language in schizophrenia: relation with diagnosis, symptomatology and white matter tracts. NPJ Schizophr 2020 Apr 20;6(1):10.
  18. Mandal MK, Pandey R, Prasad AB. Facial expressions of emotions and schizophrenia: a review. Schizophr Bull 1998;24(3):399-412.
  19. Mattes R, Schneider F, Heimann H, Birbaumer N. Reduced emotional response of schizophrenic patients in remission during social interaction. Schizophr Res 1995 Nov;17(3):249-255.
  20. Baltrušaitis T, Robinson P, Morency LP. OpenFace: An open source facial behavior analysis toolkit. 2016 Presented at: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV); March 7-10, 2016; Lake Placid, NY.
  21. Jadoul Y, Thompson B, de Boer B. Introducing Parselmouth: A Python interface to Praat. J Phon 2018 Nov;71:1-15.
  22. Covington MA, Lunden SLA, Cristofaro SL, Wan CR, Bailey CT, Broussard B, et al. Phonetic measures of reduced tongue movement correlate with negative symptom severity in hospitalized patients with first-episode schizophrenia-spectrum disorders. Schizophr Res 2012 Dec;142(1-3):93-95.
  23. Martínez-Sánchez F, Muela-Martínez JA, Cortés-Soto P, García Meilán JJ, Vera Ferrándiz JA, Egea Caparrós A, et al. Can the Acoustic Analysis of Expressive Prosody Discriminate Schizophrenia? Span J Psychol 2015 Nov 02;18:E86.
  24. Saxman JH, Burk KW. Speaking fundamental frequency and rate characteristics of adult female schizophrenics. J Speech Hear Res 1968 Mar;11(1):194-203.
  25. Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr Bull 1987;13(2):261-276.
  26. AiCure. URL: [accessed 2021-12-09]
  27. Labovitz DL, Shafner L, Reyes Gil M, Virmani D, Hanina A. Using Artificial Intelligence to Reduce the Risk of Nonadherence in Patients on Anticoagulation Therapy. Stroke 2017 May;48(5):1416-1419.
  28. Hanina A, Shafner L. Chapter 31 - Traveling Through the Storm: Leveraging Virtual Patient Monitoring and Artificial Intelligence to Observe, Predict, and Affect Patient Behavior in CNS Drug Development. In: Handbook of Behavioral Neuroscience. Amsterdam: Elsevier; 2019:427-434.
  29. Kurdi B, Lozano S, Banaji MR. Introducing the Open Affective Standardized Image Set (OASIS). Behav Res Methods 2017 Apr;49(2):457-470.
  30. Cohen AS, Mitchell KR, Docherty NM, Horan WP. Vocal expression in schizophrenia: Less than meets the ear. J Abnorm Psychol 2016 Feb;125(2):299-309.
  31. Kohler CG, Martin EA, Stolar N, Barrett FS, Verma R, Brensinger C, et al. Static posed and evoked facial expressions of emotions in schizophrenia. Schizophr Res 2008 Oct;105(1-3):49-60.
  32. Schwartz BL, Mastropaolo J, Rosse RB, Mathis G, Deutsch SI. Imitation of facial expressions in schizophrenia. Psychiatry Res 2006 Dec 07;145(2-3):87-94.
  33. Alpert M, Kotsaftis A, Pouget ER. At issue: speech fluency and schizophrenic negative signs. Schizophr Bull 1997;23(2):171-177.
  34. AiCure/open_dbm. GitHub. URL: [accessed 2021-12-09]
  35. TadasBaltrusaitis/OpenFace. GitHub. URL: [accessed 2021-12-10]
  36. Alvino C, Kohler C, Barrett F, Gur RE, Gur RC, Verma R. Computerized measurement of facial expression of emotions in schizophrenia. J Neurosci Methods 2007 Jul 30;163(2):350-361.
  37. Wang P, Kohler C, Barrett F, Gur R, Gur R, Verma R. Quantifying Facial Expression Abnormality in Schizophrenia by Combining 2D and 3D Features. 2007 Presented at: 2007 IEEE Conference on Computer Vision and Pattern Recognition; June 17-22, 2007; Minneapolis, MN.
  38. Wang P, Barrett F, Martin E, Milonova M, Gur RE, Gur RC, et al. Automated video-based facial expression analysis of neuropsychiatric disorders. J Neurosci Methods 2008 Feb 15;168(1):224-238.
  39. Berenbaum H, Oltmanns TF. Emotional experience and expression in schizophrenia and depression. J Abnorm Psychol 1992 Feb;101(1):37-44.
  40. Henry JD, Green MJ, de Lucia A, Restuccia C, McDonald S, O'Donnell M. Emotion dysregulation in schizophrenia: reduced amplification of emotional expression is associated with emotional blunting. Schizophr Res 2007 Sep;95(1-3):197-204.
  41. YannickJadoul/Parselmouth. GitHub. URL: [accessed 2021-12-10]
  42. Praat: doing phonetics by computer. URL: [accessed 2021-12-10]
  43. Cannizzaro MS, Cohen H, Rappard F, Snyder PJ. Bradyphrenia and bradykinesia both contribute to altered speech in schizophrenia: a quantitative acoustic study. Cogn Behav Neurol 2005 Dec;18(4):206-210.
  44. Kliper R, Vaizman Y, Weinshall D, Portuguese S. Evidence for depression and schizophrenia in speech prosody. 2010 Presented at: 3rd Tutorial and Research Workshop on Experimental Linguistics; August 25-27, 2010; Athens.
  45. Kayi ES, Diab M, Pauselli L, Compton M, Coppersmith G. Predictive Linguistic Features of Schizophrenia. 2017 Presented at: 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017); August 3-4, 2017; Vancouver, BC.
  46. Çokal D, Zimmerer V, Turkington D, Ferrier N, Varley R, Watson S, et al. Disturbing the rhythm of thought: Speech pausing patterns in schizophrenia, with and without formal thought disorder. PLoS One 2019;14(5):e0217404.
  47. Agurto C, Pietrowicz M, Norel R, Eyigoz E, Stanislawski E, Cecchi G, et al. Analyzing acoustic and prosodic fluctuations in free speech to predict psychosis onset in high-risk youths. Annu Int Conf IEEE Eng Med Biol Soc 2020 Jul;2020:5575-5579.
  48. Marder SR, Davis JM, Chouinard G. The effects of risperidone on the five dimensions of schizophrenia derived by factor analysis: combined results of the North American trials. J Clin Psychiatry 1997 Dec;58(12):538-546.
  49. Li A, Barber RF. Multiple testing with the structure-adaptive Benjamini–Hochberg algorithm. J R Stat Soc B 2018 Nov 02;81(1):45-74.
  50. Erhart SM, Marder SR, Carpenter WT. Treatment of schizophrenia negative symptoms: future prospects. Schizophr Bull 2006 Apr;32(2):234-237.
  51. Fusar-Poli P, Papanastasiou E, Stahl D, Rocchetti M, Carpenter W, Shergill S, et al. Treatments of Negative Symptoms in Schizophrenia: Meta-Analysis of 168 Randomized Placebo-Controlled Trials. Schizophr Bull 2015 Jul;41(4):892-899.
  52. Millan MJ, Fone K, Steckler T, Horan WP. Negative symptoms of schizophrenia: clinical characteristics, pathophysiological substrates, experimental models and prospects for improved treatment. Eur Neuropsychopharmacol 2014 May;24(5):645-692.
  53. Singh SP, Singh V, Kar N, Chan K. Efficacy of antidepressants in treating the negative symptoms of chronic schizophrenia: meta-analysis. Br J Psychiatry 2010 Sep;197(3):174-179.
  54. Kirkpatrick B, Fenton WS, Carpenter WT, Marder SR. The NIMH-MATRICS consensus statement on negative symptoms. Schizophr Bull 2006 Apr;32(2):214-219.
  55. King DJ. Drug treatment of the negative symptoms of schizophrenia. Eur Neuropsychopharmacol 1998 Feb;8(1):33-42.
  56. Möller HJ. Clinical evaluation of negative symptoms in schizophrenia. Eur Psychiatry 2007 Sep;22(6):380-386.
  57. Prikryl R, Kasparek T, Skotakova S, Ustohal L, Kucerova H, Ceskova E. Treatment of negative symptoms of schizophrenia using repetitive transcranial magnetic stimulation in a double-blind, randomized controlled study. Schizophr Res 2007 Sep;95(1-3):151-157.
  58. Walther S, Koschorke P, Horn H, Strik W. Objectively measured motor activity in schizophrenia challenges the validity of expert ratings. Psychiatry Res 2009 Oct 30;169(3):187-190.
  59. Bersani G, Polli E, Valeriani G, Zullo D, Melcore C, Capra E, et al. Facial expression in patients with bipolar disorder and schizophrenia in response to emotional stimuli: a partially shared cognitive and social deficit of the two disorders. Neuropsychiatr Dis Treat 2013;9:1137-1144.
  60. Kring AM, Gur RE, Blanchard JJ, Horan WP, Reise SP. The Clinical Assessment Interview for Negative Symptoms (CAINS): final development and validation. Am J Psychiatry 2013 Feb;170(2):165-172.
  61. Kirkpatrick B, Strauss GP, Nguyen L, Fischer BA, Daniel DG, Cienfuegos A, et al. The brief negative symptom scale: psychometric properties. Schizophr Bull 2011 Mar;37(2):300-305.
  62. White L, Harvey PD, Opler L, Lindenmayer JP, The PANSS Study Group. Empirical assessment of the factorial structure of clinical symptoms in schizophrenia. A multisite, multimodel evaluation of the factorial structure of the Positive and Negative Syndrome Scale. Psychopathology 1997;30(5):263-274.
  63. van der Gaag M, Cuijpers A, Hoffman T, Remijsen M, Hijman R, de Haan L, et al. The five-factor model of the Positive and Negative Syndrome Scale I: confirmatory factor analysis fails to confirm 25 published five-factor solutions. Schizophr Res 2006 Jul;85(1-3):273-279.

AU: action unit
G Total: general severity
N Total: negative symptom severity
P Total: positive symptom severity
PANSS: Positive and Negative Syndrome Scale
PHI: protected health information

Edited by G Eysenbach; submitted 04.12.20; peer-reviewed by E Achtyes, D Fulford; comments to author 26.01.21; revised version received 02.03.21; accepted 22.11.21; published 21.01.22


©Anzar Abbas, Bryan J Hansen, Vidya Koesmahargyo, Vijay Yadav, Paul J Rosenfield, Omkar Patil, Marissa F Dockendorf, Matthew Moyer, Lisa A Shipley, M Mercedez Perez-Rodriguez, Isaac R Galatzer-Levy. Originally published in JMIR Formative Research, 21.01.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.