Abstract
Machine learning–based audiovisual phenotyping can reveal hidden discrepancies between patients’ self-reported experiences and nonverbal expressions, offering a promising tool for objectively assessing communication quality and advancing health equity.
JMIR Form Res 2026;10:e85906. doi: 10.2196/85906
Introduction
Although depression is highly prevalent, many patients do not engage with prescribed treatments, particularly racial and ethnic minority individuals in primary care settings where clinicians lack time and infrastructure for effective communication []. Shared decision-making (SDM) can enhance engagement [], but SDM is not yet the norm []. Social desirability bias, power dynamics, and cultural norms may lead patients to report high SDM and trust [] despite feeling otherwise.
Objective measurements may better capture true experiences. Current SDM assessments rely on subjective self-reports or observer ratings with no objective alternatives. Audiovisual digital phenotyping (ADP) is useful in monitoring depression [,] and could assess communication quality. This study evaluated multimodal ADP’s usability for assessing health communication, SDM, and trust in depression care. We compared ADP outputs from audio, visual, and language modalities with validated patient-reported measures to identify patterns of discrepancies and alignments.
Methods
Study Design
Twenty-four participants were recruited from primary care practices. Eligible adults had a depressive disorder diagnosis (ICD-10-CM F33.xx) and a recent primary care visit. Participants completed recorded video interviews about patient-provider communication and decision-making experiences, as well as self-report measures: the SDM-Q-9-Psy [] (low to high SDM), CollaboRATE [] (low to high provider engagement effort), and the Trust scale [] (low to high trust). Mean and sum scores were calculated, and participants were categorized as having negative or positive communication experiences based on lower or higher scores, respectively.
From the interviews, two short video clips per participant were extracted: one reflecting positive communication experiences and one reflecting negative experiences. Verbal and nonverbal responses were analyzed with three on-premises Hume AI expression models for ADP [,]: (1) a facial expression model (FaceNet Inception-ResNet V1), capturing facial movement and nonverbal expression; (2) a speech prosody model (Whisper-Small), assessing tone and vocal dynamics from audio; and (3) a natural language processing (NLP) model (BERT), identifying the emotionality of the spoken transcript. For each participant, the top three emotions per modality were extracted.
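The per-modality "top three emotions" step can be sketched as follows. The frame-level score format, the emotion labels, and the use of simple averaging across frames are illustrative assumptions for this sketch, not the actual Hume AI output schema or aggregation rule.

```python
from collections import defaultdict

def top_emotions(frames, k=3):
    """Aggregate per-frame emotion scores for one modality and return
    the k highest-scoring emotions. `frames` is a list of
    {emotion: score} dicts, one per analyzed frame (an assumed,
    simplified format)."""
    totals = defaultdict(float)
    for frame in frames:
        for emotion, score in frame.items():
            totals[emotion] += score
    means = {e: s / len(frames) for e, s in totals.items()}
    return sorted(means, key=means.get, reverse=True)[:k]

# Hypothetical frame-level scores from a facial expression model
frames = [
    {"calmness": 0.6, "confusion": 0.3, "joy": 0.1},
    {"calmness": 0.5, "confusion": 0.4, "joy": 0.2},
]
print(top_emotions(frames))  # ['calmness', 'confusion', 'joy']
```

The same aggregation would be run separately per participant, per clip, and per modality (facial expression, prosody, language), yielding the three-emotion summaries compared against the surveys.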
Alignment or discrepancy between self-report and ADP was assessed using face validity by comparing self-report scores with emotional outputs. Participants with positive experiences were matched to positive clips; those with negative experiences were matched to negative clips. Alignment required concordance between reported experience and extracted emotions (eg, negative emotions in negative clips for low scorers). Exploratory analyses examined clips with opposite experience types.
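The face-validity alignment rule described above can be sketched as a valence-concordance check. The valence lookup table and the majority-vote rule are assumptions of this sketch; the paper describes the comparison qualitatively rather than algorithmically.

```python
# Illustrative valence lookup; mapping specific emotion labels to
# positive/negative/neutral valence is an assumption for this sketch.
VALENCE = {
    "joy": "positive", "excitement": "positive", "amusement": "positive",
    "distress": "negative", "disappointment": "negative", "anxiety": "negative",
    "confusion": "neutral", "calmness": "neutral",
}

def is_aligned(reported_experience, top_emotions):
    """Face-validity check: a reported experience ('positive' or
    'negative') aligns with a matched clip when the majority of the
    clip's top extracted emotions share that valence."""
    matches = sum(1 for e in top_emotions if VALENCE.get(e) == reported_experience)
    return matches > len(top_emotions) / 2

print(is_aligned("negative", ["distress", "disappointment", "calmness"]))  # True
print(is_aligned("positive", ["confusion", "distress", "joy"]))            # False
```

Under this rule, a low scorer whose negative clip yields mostly negative emotions counts as aligned, matching the example given in the text.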
Ethical Considerations
This study was approved by the Temple University Institutional Review Board (Protocol #29435). Patients provided informed consent for participation. Participants received a US $20 gift card. Self-reported and ADP data were deidentified.
Results
Of the 24 participants who completed the study, data from 6 were analyzable after excluding cases with simultaneous on-screen appearances of the participant and interviewer or poor video quality. The final sample included 3 women (50%), 5 Black participants (83%), and 4 unemployed participants (67%). Because the 6 interviews lasted a mean of 48 (SD 13.1) minutes, shorter clips were selected for ADP analysis. Selected clips were 14‐58 seconds long (mean 29.4, SD 12.7 seconds), each containing approximately 30 analyzable frames per second (about 840‐3480 frames per participant across the two clips). Participants were categorized as having low or high communication experience based on an SDM-Q-9-Psy score higher than 2.5 and a Trust score higher than 27.5; CollaboRATE was not used for categorization because most scores were above average.
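The categorization rule above can be sketched as follows. The thresholds (2.5 and 27.5, the midpoints of the 0-5 and 0-55 ranges) come from the text, but requiring both scores to exceed their midpoints is an assumption of this sketch; the paper does not state exactly how the two measures were combined.

```python
def categorize(sdm_q9_mean, trust_sum):
    """Classify overall communication experience using the midpoint
    thresholds reported in the paper: SDM-Q-9-Psy mean > 2.5 and
    trust sum > 27.5. Requiring both is an assumption of this sketch."""
    return "positive" if sdm_q9_mean > 2.5 and trust_sum > 27.5 else "negative"

# SDM-Q-9-Psy mean and Trust sum scores for the 6 participants (Table 1)
participants = {
    "P1": (1.11, 22), "P2": (1.89, 24), "P3": (3.00, 37),
    "P4": (4.56, 43), "P5": (5.00, 54), "P6": (5.00, 55),
}
for pid, (sdm, trust) in participants.items():
    print(pid, categorize(sdm, trust))
# P1 and P2 classify as negative; P3-P6 as positive, matching the text
```

For these six participants the two thresholds agree, so the combination rule does not change any classification.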
| Participant characteristics | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| Age (y) | 24 | 56 | 58 | 58 | 68 | 39 |
| Sex | Male | Female | Female | Male | Female | Male |
| Hispanic or Latino | No | No | No | No | Yes | No |
| Race | Black or African American | White or Caucasian | Black or African American and American Indian/Native American or Alaska Native | Black or African American | Black or African American and Other | Black or African American |
| Employment status | Employed | Unemployed | Unemployed | Unemployed | Unemployed | Unemployed |
| Decision made at the consultation | Referral to outpatient center | Refills, no new decisions were made | Referrals | Keep current medication | Stop therapy | Change in medications |
| SDM-Q-9-Psy, mean score (range 0‐5) | 1.11 | 1.89 | 3.00 | 4.56 | 5.00 | 5.00 |
| CollaboRATE, mean score (range 0‐9) | 5.67 | 4.00 | 6.67 | 9.00 | 9.00 | 9.00 |
| Trust in provider, sum score (range 0‐55) | 22 | 24 | 37 | 43 | 54 | 55 |
Four participants (P3-P6) reported positive communication experiences. However, for 3 (75%) of these participants, ADP analysis revealed discrepancies between self-reported positive experiences and the presence of negative (eg, distress or disappointment) or neutral (eg, confusion) emotion outputs in positive clips. Disappointment, awkwardness, and annoyance were common negative emotions in negative clips from participants who reported positive overall experiences. These relationship-related emotions may reflect disappointment with specific aspects of the patient-provider communication. Among the ADP modalities, the greatest discrepancies between verbal content and ADP were observed in facial expression and NLP (in positive clips), whereas speech prosody aligned more closely with survey results in 2 participants (P4 and P6), with emotional outputs such as excitement and amusement.
| Participant | Clip type and modality | | | | | |
| --- | --- | --- | --- | --- | --- | --- |
| | Negative, mean (SD) | | | Positive, mean (SD) | | |
| | FEa | SPb | NLc | FE | SP | NL |
| P1d | | | | | | |
| P2d | | | | | | |
| P3e | | | | | | |
| P4e | | | | | | |
| P5e | | | | | | |
| P6e | | | | | | |
aFE: facial expression (FaceNet Inception-ResNet V1).
bSP: speech prosody (Whisper-Small).
cNL: natural language (BERT).
dLow SDM-Q-9-Psy score.
eHigh SDM-Q-9-Psy score.
Two participants (P1 and P2) reported negative communication experiences on surveys. In their negative clips, NLP and prosody reflected these experiences (eg, anxiety), while facial expressions showed mixed patterns: P1 displayed positive emotions (eg, amusement) and P2 displayed neutral emotions (eg, calmness). In positive clips, P1 showed predominantly positive emotions across all modalities, whereas P2 displayed a mix of neutral and negative emotions (eg, confusion) across all modalities, indicating a discrepancy with the positive clip classification but alignment with P2's overall negative self-reported communication experience. Notably, P1 exhibited similar facial expressions across positive and negative clips.
Discussion
This pilot study demonstrated the usability of multimodal ADP for evaluating patient-provider communication, SDM, trust, and engagement, with prosody showing the strongest alignment with self-reported experiences and facial expression showing the weakest alignment. Discrepancies between self-reports and nonverbal expressions may help explain high rates of service disengagement and treatment nonadherence among patients, whose nonverbal communication cues may be clinically overlooked despite reported trust and engagement []. Nonverbal expressions aligned with self-reports for negative experiences but contradicted self-reports for positive experiences, highlighting the need for providers to be mindful of social desirability bias and patient-provider power imbalances.
To protect privacy, analyses used on-premises technology, which offers fewer advantages than cloud-based artificial intelligence (AI) models. This created challenges with simultaneous on-screen appearances, poor lighting, and nonstandard camera angles, reducing the sample available for comparing ADP with SDM and trust measures. Despite these constraints, ADP provided hundreds to thousands of analyzable frames per clip, offering extensive repeated measurements. Postappointment data collection was another limitation.
Technologically, facial expression sensitivity in depression requires optimization, as limited facial expression may affect provider responses and ADP emotion extraction. Future research should address how to implement commercial AI tools while respecting ethical requirements when handling protected health information []. Additional considerations for on-premises AI studies should ensure sufficient computing capacity to support analyses.
Given our predominantly Black patient sample, findings highlight providers’ need to recognize how social desirability bias, power dynamics, and cultural norms may lead patients to report positive experiences despite feeling disengaged. This demonstrates multimodal ADP’s promise for objectively assessing communication quality and advancing health equity.
Acknowledgments
The authors thank Macie Sullivan, BA, a research assistant at the Shared Decision Making Laboratory, for her help with data collection.
Artificial intelligence (AI) tools were used solely for copy editing, grammar checking, and spelling corrections during manuscript preparation. No generative content was created by AI.
Funding
This study was partly supported by the Temple University Grant-in-Aid Program.
Data Availability
Deidentified data supporting the findings of this study are available from the corresponding author upon reasonable request, subject to approval by the Temple University Institutional Review Board.
Authors' Contributions
Conceptualization: YZ-I
Analysis: SK
Funding acquisition: YZ-I
Methodology & Resources: SK, VT, JB, AB, YZ-I
Project administration: SK, ACG-M, LH, JC, AP
Supervision: YZ-I
Writing – original draft: SK, YZ-I
Writing – review & editing: all authors
Conflicts of Interest
VT, JB, and AB have worked for Hume AI. The remaining authors declare no competing interests.
References
- Schillok H, Gensichen J, Panagioti M, et al. Effective components of collaborative care for depression in primary care: an individual participant data meta-analysis. JAMA Psychiatry. Sep 1, 2025;82(9):868-876. [CrossRef] [Medline]
- Zisman-Ilani Y, Roth RM, Mistler LA. Time to support extensive implementation of shared decision making in psychiatry. JAMA Psychiatry. Nov 1, 2021;78(11):1183-1184. [CrossRef] [Medline]
- Matthews EB, Savoy M, Paranjape A, et al. Shared decision making in primary care based depression treatment: communication and decision-making preferences among an underserved patient population. Front Psychiatry. 2021;12:681165. [CrossRef] [Medline]
- Zisman-Ilani Y, Peek ME. Improving equity in shared decision-making. JAMA Intern Med. Sep 1, 2024;184(9):1130-1131. [CrossRef] [Medline]
- Birnbaum ML, Abrami A, Heisig S, et al. Acoustic and facial features from clinical interviews for machine learning-based psychiatric diagnosis: algorithm development. JMIR Ment Health. Jan 24, 2022;9(1):e24699. [CrossRef] [Medline]
- Abbas A, Sauder C, Yadav V, et al. Remote digital measurement of facial and vocal markers of major depressive disorder severity and treatment response: a pilot study. Front Digit Health. 2021;3:610006. [CrossRef] [Medline]
- Zisman-Ilani Y, Roe D, Scholl I, Härter M, Karnieli-Miller O. Shared decision making during active psychiatric hospitalization: assessment and psychometric properties. Health Commun. Jan 2017;32(1):126-130. [CrossRef] [Medline]
- Elwyn G, Barr PJ, Grande SW, Thompson R, Walsh T, Ozanne EM. Developing CollaboRATE: a fast and frugal patient-reported measure of shared decision making in clinical encounters. Patient Educ Couns. Oct 2013;93(1):102-107. [CrossRef] [Medline]
- Hall MA, Camacho F, Dugan E, Balkrishnan R. Trust in the medical profession: conceptual and measurement issues. Health Serv Res. Oct 2002;37(5):1419-1439. [CrossRef] [Medline]
- Baird A, Tzirakis P, Brooks JA, et al. The ACII 2022 affective vocal bursts workshop & competition. Presented at: 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW); Oct 17-21, 2022:1-5; Nara, Japan. [CrossRef]
- Demszky D, Movshovitz-Attias D, Ko J, Cowen A, Nemade G, Ravi S. GoEmotions: a dataset of fine-grained emotions. Presented at: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Jul 5-10, 2020. URL: https://www.aclweb.org/anthology/2020.acl-main [CrossRef]
- Richardson BT, Jackson J, Marable G, et al. The role of Black churches in promoting mental health for communities of socioeconomically disadvantaged Black Americans. Psychiatr Serv. Aug 1, 2024;75(8):740-747. [CrossRef] [Medline]
- Galatzer-Levy IR, Tomasev N, Chung S, Williams G. Generative psychometrics-an emerging frontier in mental health measurement. JAMA Psychiatry. Jan 1, 2026;83(1):5-6. [CrossRef] [Medline]
Abbreviations
| ADP: audiovisual digital phenotyping |
| AI: artificial intelligence |
| NLP: natural language processing |
| SDM: shared decision-making |
Edited by Amy Schwartz, Matthew Balcarras; submitted 15.Oct.2025; peer-reviewed by Mayuko Ito Fukunaga, Ziyang Gong; accepted 28.Jan.2026; published 03.Mar.2026.
Copyright© Shely Khaikin, Vineet Tiruvadi, Jeffrey Brooks, Alice Baird, Anne-Catherine Grela-Mpoko, Lindsey Hoffman, Jadyn Crossley, Menachem Leasy, Jaime Fineman, Margot Savoy, Laura Igarabuza, Anuradha Paranjape, Cheryl YS Foo, Michael L Birnbaum, Yaara Zisman-Ilani. Originally published in JMIR Formative Research (https://formative.jmir.org), 3.Mar.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.