This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Neck surface accelerometer (NSA) wearable devices have been developed for voice and upper airway health monitoring. Rather than acoustic sound, an NSA senses mechanical vibrations propagated from the vocal tract to the neck skin, which are indicative of a person’s voice and airway conditions. NSA signals do not carry identifiable speech information, so a speaker’s privacy is protected, which is necessary for continuous wearable monitoring. Our device has already been tested for durability and for its signal processing algorithms under controlled laboratory conditions.
This study aims to further evaluate both instrument and analysis validity in a group of occupational vocal users, namely, voice actors, who use their voices extensively at work in an ecologically valid setting.
A total of 16 professional voice actors (age range 21-50 years; 11 females and 5 males) participated in this study. An NSA was mounted on each participant’s sternal notch during the voice acting and voice assessment sessions. The voice acting session was 4 hours long and was directed by a voice director in a professional sound studio. Voice assessment sessions were conducted before, during, and up to 48 hours after the acting session. The assessment included phonation tasks of passage reading, sustained vowels, maximum vowel phonation, and pitch glides. Clinical acoustic metrics (eg, fundamental frequency, cepstral measures) and a vocal dose measure (ie, accumulated distance dose from acting) were computed from NSA signals. A commonly used online questionnaire (Self-Administered Voice Rating questionnaire) was also administered to track participants’ perception of vocal fatigue.
The NSA wearables stayed in place for all participants despite active body movements during the acting. The resulting body noise did not interfere with the NSA signal quality. All planned acoustic metrics were successfully derived from NSA signals and their numerical values were comparable with literature data. For the 4-hour voice acting session, the average distance dose was about 8354 m with no gender differences. Participants perceived vocal fatigue as early as 2 hours after the start of voice acting, with recovery 24-48 hours after the acting session. Among all acoustic metrics across phonation tasks, cepstral peak prominence and spectral tilt from the passage reading most closely mirrored trends in perceived fatigue.
The ecological validity of an in-house NSA wearable was vetted in a workplace setting. One key application of this wearable is to alert occupational voice users when their vocal safety limits are reached so that they can protect their voices in time. Signal processing algorithms can thus be further developed for near real-time estimation of clinically relevant metrics, such as accumulated distance dose, cepstral peak prominence, and spectral tilt. This functionality will enable continuous self-awareness of vocal behavior and protection of vocal safety in occupational voice users.
Neck surface accelerometers (NSAs), a type of mechano-acoustic sensor, have been adopted as mobile health (mHealth) wearables for voice and upper airway health monitoring [
Compared with other mHealth wearables embedded with acoustic microphones, NSA-based wearables have advantages of protecting speaker’s privacy and increasing signal quality for remote, continuous voice monitoring. For instance, the neck tissue acts as a low-pass filter by nature and restricts the signal bandwidth to 1.5 kHz at maximum [
From the clinical perspective, an individual’s voice condition is evaluated through an array of acoustic and aerodynamic metrics such as fundamental frequency (
Voice-related metrics obtained from NSA devices were not found to differ from those obtained with conventional instruments [
Growing evidence further supports the robustness of NSA signals in discerning normal versus deviated vocal health conditions. For example, one study collected NSA-derived acoustic metrics from a group of female patients with hyperfunctional voice disorders and their matched controls for over a week [
One target clinical population for voice monitoring wearables includes those who use their voices heavily in the workplace. Voice actors, singers, and teachers are examples of occupational voice users who tend to develop vocal fatigue and disorders [
This study represents our ongoing work to develop and validate an in-house NSA wearable system for voice and upper airway health monitoring. Our device has already been tested in controlled laboratory settings [
Briefly, all participants were subject to a voice acting session in an ecologically valid setting plus 2 follow-up sessions of voice assessments. Vocal doses and acoustic metrics were collected with our in-house NSA wearables and self-perceived vocal fatigue was assessed by an online questionnaire. NSA acoustic metrics were extracted from both sustained vowels and passage reading tasks. Furthermore, certain voice actors have a routine of practicing vocal warm-up exercise as part of their acting. Participants were thus randomized to either a warm-up group or no warm-up group before their acting session to protect the ecological validity while minimizing potential confounding effects from an individual’s warm-up history.
We hypothesized that NSA-derived acoustic metrics and self-perceived vocal fatigue ratings would show similar trends indicative of vocal fatigue and recovery. We also hypothesized that the acoustic metrics derived from passage readings would be comparable to those derived from sustained vowels. We further hypothesized that distance dose and NSA-derived acoustic metrics would be comparable between female and male participants in this study.
Our in-house NSA system consisted of (1) an accelerometer (BU-27135; Knowles Inc.) set into a circular silicon pad with a diameter of 28 mm, thickness of 1.2 mm, and weight less than 20 g; and (2) a peripheral circuit containing 1 power supply module and 1 amplifier module on a printed circuit board (
The NSA Wearable Device. (A) Hardware instrument, and (B) Schematic design. Adapted from “Figure 1. The physical prototype and schematic of the NSA”, by Lei et al, 2019 [
Participants were recruited via the Alliance of Canadian Cinema, Television and Radio Artists (ACTRA) (Montreal Chapter) network. A total of 16 professional voice actors aged 21-50 years consented to participate in the experiment. Participants were randomly assigned to either a
The experimental protocol spanned across 4 consecutive days for various tasks (
Human Protocol of Voice Assessments and Voice Acting. Voice assessments included the Self-Administered Voice Rating questionnaire (SAVRa) and neck surface accelerometer (NSA)-derived acoustic voice evaluation.
Participants were required to wear an NSA during the whole session. Before the voice acting, the warm-up group participants practiced a 30-minute vocal warm-up routine with a trained speech-language pathologist to ensure the warm-up exercise stayed consistent across participants. The no warm-up group participants were instructed to take vocal rest by refraining from using their voices completely during the 30 minutes preceding the acting session.
After that, all participants performed a 4-hour voice acting session directed by a professional voice director. The acting was based on a standardized script from the Assassin’s Creed© video game. Participants were instructed to keep a mouth-to-microphone distance of 50 cm as much as possible without hindering their acting. The air microphone sound was used purely for on-site coaching purposes. Given the confidentiality in video game development, the air microphone data were prohibited for research use.
The acting session consisted of 2 parts: (1) part 1, consisting of low-intensity (eg, casual dialog) voice-over work; and (2) part 2, consisting of medium- (eg, barks, oh-noes) and high-intensity (eg, death cries) voice-over work. The voice director provided feedback to participants on their performance, in an effort to ensure that intensity levels and acting styles were consistent across participants. As is common practice in voice acting, a 15-minute break was provided between parts 1 and 2. Further, participants had access to water and were encouraged to drink throughout sessions. The voice director would remind actors to take a drink during sessions when audible “mouth noises” were heard, as the resulting sounds could not be used in the game videos for technical reasons.
The voice assessment protocol included self-perceptual ratings of vocal fatigue and acoustic voice evaluations derived from NSA measurements. The protocol was conducted at 6 study time points: (1) 24 hours before the voice acting session, as a baseline measure; (2) immediately prior to the voice acting session (presession); (3) halfway through the voice acting session (midsession, ie, the 15-minute break between part 1 and part 2 of acting); (4) immediately after the voice acting session (postsession); (5) 24 hours after the voice acting session; and (6) 48 hours after the voice acting session. Participants were also asked to complete the self-perceptual rating questionnaire remotely every 2 waking hours following the voice acting session until the end of the study (
The Self-Administered Voice Rating (SAVRa) questionnaire was administered to evaluate participants’ perception of vocal fatigue [
To approximate a standard clinical protocol of acoustic voice evaluation, 4 phonation tasks were elicited from participants wearing an NSA (
Phonation tasks.
Task number | Phonation task | Acoustic metrics |
1 | 1-minute reading of the Rainbow Passage | Cepstral peak prominence; fundamental frequency; H1 – H2a; harmonic richness factor; spectral entropy; spectral tilt; surface/skin acceleration level |
2 | Vowel phonation /a/ for 5 seconds | Cepstral peak prominence; fundamental frequency; H1 – H2; harmonic richness factor; spectral entropy; spectral tilt; surface/skin acceleration level; jitter; shimmer |
3 | Deep breath and vowel phonation /a/ | Maximum phonation time |
4 | Glide on vowel /a/ from low to high pitch | |
aH1 – H2: difference between the first and second harmonic magnitudes.
Task 1 (Rainbow Passage task) was used to assess acoustic metrics during running speech. Participants were required to read the standard Rainbow Passage for a duration of 1 minute using a pitch, loudness, and pace similar to a natural conversational context. Seven metrics, namely, CPP,
Task 2 (sustained vowel task) was used to assess acoustic metrics in a more steady-state phonation style. Participants were asked to sustain the vowel sound /a/ for 5 seconds while maintaining a steady pitch and loudness. In addition to the aforesaid metrics, jitter and shimmer were quantified to measure pitch and loudness stability, respectively. Of note, the extraction of jitter and shimmer is only applicable to relatively stable and periodic signals, such as the sustained vowels used here.
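As an illustration of the two stability measures, the following minimal Python sketch computes relative jitter and shimmer from cycle-by-cycle measurements, using the general definition of "mean absolute difference between consecutive cycles divided by the mean". It is not the authors' MATLAB implementation. The function names and the example period and amplitude values are hypothetical, and the sketch assumes that glottal cycles have already been segmented from a sustained vowel.

```python
from statistics import fmean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value, expressed in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * fmean(diffs) / fmean(values)


def jitter_percent(periods_s):
    """Relative jitter from a list of cycle periods (seconds)."""
    return relative_perturbation(periods_s)


def shimmer_percent(cycle_amplitudes):
    """Relative shimmer from a list of per-cycle amplitudes."""
    return relative_perturbation(cycle_amplitudes)


# Hypothetical cycle measurements (assumed values, not study data)
periods = [0.0050, 0.0051, 0.0049, 0.0050]
amplitudes = [1.00, 0.98, 1.02, 1.00]
print(jitter_percent(periods), shimmer_percent(amplitudes))
```

Because both measures rely on consecutive, well-defined cycles, the sketch would not be meaningful for running speech, which is consistent with restricting jitter and shimmer to the sustained vowel task.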
Task 3 (maximum phonation task) was used to measure the maximum time (in seconds) that a person can sustain phonation. Participants were instructed to take a deep breath and produce the vowel /a/ as long as possible, using a comfortable pitch and loudness.
Task 4 (pitch glide task) was used to evaluate an individual’s pitch range. Participants were instructed to start saying /a/ at the lowest pitch possible and slowly glide their voice as high in pitch as possible. Minimum pitch (
All NSA-related data extraction and calculation were performed using the MATLAB (MathWorks) software. For the computation of acoustic metrics (see
Mathematical formulas and definitions of acoustic metrics.
Acoustic metrics | Mathematical formula | Units | Definition |
CPPa | Peak_max – (b0+b1*|q|) | Decibels | The difference in amplitude between the cepstral peak and the corresponding value on the trend line through the overall spectrum, which represents how far the cepstral peak emerges from the cepstrum background. |
Fundamental frequency | 1/ | Hertz | Frequency of vocal fold vibration that is the lowest of all the frequencies in the voice spectrum and is obtained by the reciprocal of the smallest period. |
H1 – H2c | 20log(A1/A2) | Decibels | The log-magnitude difference between the amplitudes of the first and second harmonics in the spectrum. |
HRFd | | Decibels | Ratio of the sum of the amplitudes at the harmonics above the fundamental frequency to the amplitude of the component at the fundamental frequency. |
SEe | | Relative value | Estimates the uniformity of signal energy distribution in the frequency domain. |
Tiltf | | Decibels/Hertz | Tilt of the trend line of the long-term average spectrum, which represents the degree to which intensity drops off as frequency increases. |
SALg | 20log(max[data_frame]/A_noise) | Decibels | The calculation is based on the maximum of each voiced segment amplitude for every 45-ms segment window. |
Jitter(relative) | | Percent | Average absolute difference between consecutive periods divided by average period, indicating the cycle-to-cycle variation of the fundamental frequency. |
Shimmer(relative) | | Percent | Average absolute difference between the amplitudes of consecutive periods divided by average amplitude, indicating the cycle-to-cycle variation of vocal amplitude. |
MPTh | T2 – T1 | Seconds | Measure of a maximally sustained vowel following a maximal inspiration, which provides an indication of the efficiency of the respiratory mechanism. |
aCPP: cepstral peak prominence.
b
cH1 – H2: difference between the first and second harmonic magnitudes.
dHRF: harmonic richness factor.
eSE: spectral entropy.
fTilt: spectral tilt.
gSAL: skin acceleration level.
hMPT: maximum phonation time.
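To make the SAL entry in the table above concrete, the following Python sketch applies 20log(max[data_frame]/A_noise) to consecutive 45-ms windows. It is a sketch under stated assumptions, not the authors' MATLAB code: the function name, the voiced/unvoiced flags supplied by the caller, and the interpretation of A_noise as a fixed noise-floor reference amplitude are all assumptions made for illustration.

```python
import math


def skin_acceleration_level(signal, sample_rate_hz, noise_amplitude,
                            frame_ms=45.0, voiced_flags=None):
    """Return one SAL value (dB) per voiced 45-ms frame.

    signal          : list of accelerometer samples
    noise_amplitude : assumed noise-floor reference amplitude (A_noise)
    voiced_flags    : optional list of booleans, one per frame; frames
                      flagged False are skipped (voicing detection itself
                      is outside this sketch)
    """
    frame_len = max(1, int(round(sample_rate_hz * frame_ms / 1000.0)))
    levels = []
    for i in range(len(signal) // frame_len):
        if voiced_flags is not None and not voiced_flags[i]:
            continue
        frame = signal[i * frame_len:(i + 1) * frame_len]
        peak = max(abs(x) for x in frame)  # max[data_frame]
        if peak > 0:  # skip all-zero frames to avoid log(0)
            levels.append(20.0 * math.log10(peak / noise_amplitude))
    return levels


# Hypothetical two-frame signal sampled at 1 kHz (45 samples per frame)
example = [0.1] * 45 + [-1.0] * 45
print(skin_acceleration_level(example, 1000, noise_amplitude=0.01))
```

Taking the absolute peak per frame keeps the measure insensitive to signal polarity; trailing samples that do not fill a complete frame are ignored in this sketch.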
Conventionally, the computation of these acoustic metrics is based on glottal flow waveforms, which are derived from mouth-radiated acoustic pressure or airflow signals using inverse filtering estimation. However, as the NSA signals are based on skin acceleration, no mouth-radiated pressure components are present for inverse filtering to obtain glottal flow pulses and thus the resulting waveforms. As such, algorithms of H1 – H2, SE, Tilt, and SAL were customized and parameterized in this study [
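As a generic illustration of the H1 – H2 definition, 20log(A1/A2), the Python sketch below estimates the amplitudes of the first and second harmonics directly from a signal frame by projecting it onto sinusoids at the fundamental frequency and at twice that frequency. This is not the customized NSA algorithm referred to above, whose parameters are not given here. The function names are hypothetical, and the sketch assumes that the fundamental frequency of the frame is already known and that the frame spans a whole number of cycles.

```python
import cmath
import math


def harmonic_amplitude(frame, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, x in enumerate(frame))
    return 2.0 * abs(acc) / len(frame)


def h1_h2_db(frame, sample_rate_hz, f0_hz):
    """H1 - H2 in dB: 20*log10(A1/A2) for harmonics at f0 and 2*f0."""
    a1 = harmonic_amplitude(frame, sample_rate_hz, f0_hz)
    a2 = harmonic_amplitude(frame, sample_rate_hz, 2.0 * f0_hz)
    return 20.0 * math.log10(a1 / a2)


# Synthetic check: 200-Hz tone whose second harmonic has half the amplitude
fs = 8000
frame = [math.sin(2 * math.pi * 200 * k / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * k / fs)
         for k in range(800)]
print(round(h1_h2_db(frame, fs, 200.0), 2))  # prints 6.02
```

With an amplitude ratio of 2, the expected value is 20log(2), or about 6.02 dB. On real NSA data the frame would need windowing and a reliable pitch estimate, which this sketch does not provide.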
Lastly, for distance dose, the algorithm was based on our previously published work [
JMP Pro software (version 16.1.0; JMP Statistical Discovery LLC) was used for all statistical analyses. With the high number of contrasts carried out throughout this analysis, a more conservative α value of .01 was used to minimize the chances of a type 1 error.
As SAVRa scores were obtained every 2 hours after the acting session, data were reduced by averaging the scores to the corresponding AM or PM of the day. For instance, day 3 scores obtained from 12:00 AM to 11:59 AM were averaged as day 3 AM, whereas those from 12:00 PM to 11:59 PM were averaged as day 3 PM. In addition, individual difference scores were computed for each participant by subtracting mean values at baseline (day 1) from means at each time point and then averaged as described above. Computing and analyzing differences helped to normalize individual variation and allowed for analyses to highlight changes in vocal measurements and fatigue over time. Both means and difference scores were used for statistical analyses.
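The data reduction described above can be sketched as follows in Python. This is a minimal illustration, not the authors' analysis pipeline: the record layout, participant identifier, and score values are hypothetical, and the baseline mean for each participant (day 1) is assumed to have been computed beforehand.

```python
from collections import defaultdict
from statistics import fmean


def half_day_label(day, hour):
    """Map a study day and 24-hour clock hour to a label such as 'day 3 AM'."""
    return f"day {day} {'AM' if hour < 12 else 'PM'}"


def reduce_scores(records, baseline):
    """Average scores per participant and half day, and subtract baseline.

    records  : iterable of (participant, day, hour, score)
    baseline : dict mapping participant -> mean day 1 score
    returns  : {(participant, label): (mean_score, difference_score)}
    """
    bins = defaultdict(list)
    for participant, day, hour, score in records:
        bins[(participant, half_day_label(day, hour))].append(score)
    return {
        key: (fmean(scores), fmean(scores) - baseline[key[0]])
        for key, scores in bins.items()
    }


# Hypothetical ratings for one participant (assumed values, not study data)
records = [("P01", 3, 8, 40), ("P01", 3, 10, 30), ("P01", 3, 14, 20)]
print(reduce_scores(records, baseline={"P01": 10}))
```

Subtracting the baseline mean after averaging gives the same result as averaging per-rating differences, because the baseline is constant within a participant.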
Mixed-effects ANOVA was performed on each SAVRa score (EFFT, DISC, and IPSV). Either study group (warm-up or no warm-up) or gender group (female or male) was treated as a between-subjects factor in separate mixed-effects ANOVAs. Full-factorial models were not conducted because of the uneven distribution of genders across study groups.
Accumulated distance doses for (1) the entire voice acting session (Total Dd), (2) the first part of the session (Dd part 1), and (3) the second part of the session (Dd part 2) were computed for each participant. No data normalization was performed for these data. Mixed-effects ANOVAs were conducted with session dose (Dd part 1 vs Dd part 2) as a within-subjects factor, and study group or gender group as a between-subjects factor. A separate
For NSA-derived acoustic metrics, mixed-effects ANOVAs were conducted using time as a within-subjects factor (day 1, day 2 presession, day 2 midsession, day 2 postsession, day 3, day 4) and study group or gender group as a between-subjects factor. Planned paired contrasts were performed for significant main effects (P<.01). For analyses involving study group, individual difference scores (magnitude of change compared with baseline) were used instead of mean values.
This human protocol (A04-B21-17A) was approved by the Institutional Review Board at McGill University. The full purpose of the study was not communicated to participants until after completing the study to minimize participant bias on self-perceptual rating measures.
The breakdown of participant demographics as functions of study group and gender group is shown in
Participant descriptive statistics.
Group | Age (years), mean (SD) | Voice acting experience (years), mean (SD) |
Study group | | |
No warm-up | 32 (5.1) | 4 (2.9) |
Warm-up | 32 (5.5) | 8 (5.7) |
Gender group | | |
Female | 32 (5.5) | 7 (5.4) |
Male | 33 (4.7) | 4 (3.2) |
Participants performed their voice acting with NSA wearables for 4 hours in an ecologically valid setting. The wearables stayed in place for all participants regardless of active body movements during the acting session. All planned acoustic metrics were successfully extracted from NSA signals. To further validate the NSA signal processing algorithm, numerical values of our acoustic metrics from the Rainbow Passage task were compared with those extracted from daily conversational speech by other research groups. Our values fell within a numerical range comparable to those reported by others, supporting both the ecological and external validity of our instrument and analyses (
Acoustic metrics comparison.a
Sources | Fundamental frequency, mode | Fundamental frequency, mean (SD) | CPPc, mean (SD) | H1 – H2d, mean (SD) | Tilte, mean (SD) | Tilt Absf, mean (SD) |
|
|
T1 | 104 | 156.1 (49.0) | 20.5 (3.7) | 5.4 (17.1) | –0.048 (0.009) | –6.0 (5.3) |
T2b | 108 | 150.2 (88.0) | 26.3 (8.9) | 3.8 (17.4) | –0.044 (0.008) | –4.6 (4.0) |
T3 | 99 | 151 (61.4) | 29.5 (7.8) | 5.2 (18) | –0.041 (0.007) | –6.0 (4.7) |
T4 | 81 | 140.6 (48.8) | 26.3 (7.7) | 3.8 (16) | –0.045 (0.009) | –4.7 (4.0) |
T6 | 85 | 146.4 (49.3) | 21 (3.8) | 4.6 (15.6) | –0.049 (0.009) | –4.9 (4.5) |
T7 | 82 | 143 (53.8) | 20.7 (3.7) | 10.3 (16.2) | –0.049 (0.010) | –5.6 (4.4) |
|
T1 | 176 | 183.3 (75.8) | 21.1 (3.8) | 8.2 (23.5) | –0.048 (0.011) | –1.2 (5.0) |
T2a | 173 | 184.9 (113.3) | 23.6 (7.6) | 7.4 (21.8) | –0.05 (0.011) | –.07 (5.2) |
T2b | 173 | 182.5 (76.5) | 23.1 (7.6) | 8.5 (23.5) | –0.048 (0.011) | –0.8 (4.8) |
T3 | 186 | 182.2 (69.5) | 26.7 (8.6) | 10.1 (24.8) | –0.045 (0.012) | –1.2 (5.2) |
T4 | 138 | 171.3 (75.0) | 24.4 (6.4) | 13 (23.3) | –0.045 (0.009) | –2.3 (5.8) |
T6 | 151 | 176.5 (63.9) | 21.4 (3.9) | 8.7 (23.7) | –0.05 (0.011) | –1.3 (5.2) |
T7 | 181 | 182.4 (70.2) | 20.8 (3.9) | 4.9 (22) | –0.052 (0.011) | –0.9 (5.7) |
|
Patients with PVFLg | 198.1 | —h (76.1) | — | — | — | — |
Matched controls | 202.9 | — (88.0) | — | — | — | — |
|
Patients with PVHi | 197.2 | — (75.3) | 23.2 (4.4) | — | — | –14.4 (2.4) |
PVH controls | 201.4 | — (89.6) | 22.9 (4.5) | — | — | –14.1 (2.4) |
Patients with NPVHj | 193.8 | — (73.5) | 21.4 (4.2) | — | — | –13.6 (2.5) |
NPVH controls | 192.9 | — (70.1) | 22.8 (4.4) | — | — | –14.1 (2.4) |
|
Patients with PVH | 196.1 | — (73.5) | 23.1 (4.4) | 4.4 (6.1) | — | — |
Matched controls | 199.4 | — (86.7) | 22.7 (4.4) | 5.1 (7.0) | — | — |
|
Combined phonation (healthy) | 205.7 | — (91.6) | 22.7 (4.5) | 5.5 (7.2) | — | — |
Singing (healthy) | 325.4 | — (94.6) | 21.5 (4) | 9.7 (7.3) | — | — |
Speech (healthy) | 203.5 | — (62.4) | 23.1 (4.5) | 4.2 (6.6) | — | — |
|
Patients with NPVH | 202.4 | — (68.1) | 20.6 (3.9) | 2.6 (6.7) | — | — |
Matched controls | 182.8 | — (68.6) | 22.1 (4.3) | 2.5 (6.5) | — | — |
aMode and mean (SD) data for the acoustic metrics
b
cCPP: cepstral peak prominence.
dH1 – H2: difference between the first and second harmonic magnitudes.
eTilt: spectral tilt.
fTilt Abs: tilt absolute.
gPVFL: phonotraumatic vocal fold lesions.
h—: data not available.
iPVH: phonotraumatic vocal hyperfunction.
jNPVH: nonphonotraumatic vocal hyperfunction.
No significant effects of study group or gender group were observed for SAVRa measures, but a main effect of time was found on all 3 SAVRa scores (all P<.001;
Means and standard errors (error bars) of the Self-Administered Voice Rating questionnaire (SAVRa) as functions of Time and Gender Group. The voice acting session is highlighted in the pink region. Asterisks denote statistically significant differences between a specific time point and Day 1 (**P≤.01, *** P≤.001). DISC: laryngeal discomfort level; EFFT: current speaking effort level; IPSV: inability to produce soft voice; n.s.=no significant differences.
No significant main effects of study group, gender group, or session dose were found in the separate tests of distance dose. Across participants, the mean Total Dd for the 4-hour voice acting session was 8354.35 m (SD 2301.84 m). The mean Dd was 4250.24 m (SD 1408.31 m) for part 1 and 4104.11 m (SD 1086.22 m) for part 2 of the acting session (
Means and standard errors (error bars) of accumulated distance dose (Dd) as functions of Study Group and Gender Group. (A) Total 4-hour sessions. (B) First and second parts of session. n.s.=no significant differences, ie,
Across all phonation tasks, no study group effects (ie, main effect of study group or interaction of study group and time) were noted for acoustic metrics. By contrast, main and interaction effects of gender group and time were found in certain acoustic metrics depending on the phonation task.
There was a main effect of time for CPP and Tilt (both P<.001) measures, but no significant gender group or interaction effects (
Means and standard errors (error bars) of neck surface accelerometer-derived acoustic metrics in the Rainbow Passage Task as functions of Time and Gender Group. Asterisks denote statistically significant differences (1) between the female (F) and the male (M) participant groups, as well as, (2) between a specific time point and Day 1 (*** P≤.001). CPP: cepstral peak prominence;
There was a significant main effect of time for shimmer (P<.01), with values peaking at day 2 postsession and then dropping off afterward (
Means and standard errors (error bars) of neck surface accelerometer-derived acoustic metrics in the Sustained Vowel Task as functions of Time and Gender Group. Asterisks denote statistically significant differences (1) between the female (F) and the male (M) participant groups, as well as, (2) between a specific time point and Day 1 (*** P≤.001). CPP: cepstral peak prominence;
No effects of gender group, time, or their interaction were noted for maximum phonation time (MPT),
Group-based means (SD) for the maximum phonation time and pitch glide tasks.a
Acoustic metrics and gender groups | Day 1, mean (SD) | Day 2 presession, mean (SD) | Day 2 midsession, mean (SD) | Day 2 postsession, mean (SD) | Day 3, mean (SD) | Day 4, mean (SD) | ANOVA: time | ANOVA: gender | ANOVA: time × gender |
MPTb | | | | | | | | | |
Female | 25.29 (7.53) | 22.60 (6.42) | 25.68 (6.84) | 24.12 (7.38) | 22.22 (6.86) | 24.90 (7.90) | | | |
Male | 30.08 (8.51) | 26.74 (11.35) | 30.41 (15.36) | 30.56 (10.65) | 30.87 (12.11) | 28.59 (10.50) | | | |
|
Female | 13.98 (4.88) | 12.85 (0.41) | 12.83 (0.62) | 12.87 (0.42) | 15.03 (7.62) | 12.63 (0.61) | | | |
Male | 12.77 (0.68) | 16.80 (8.80) | 14.06 (3.57) | 13.08 (0.56) | 18.81 (8.43) | 25.18 (16.20) | | | |
|
Female | 929.14 (335.34) | 870.47 (269.30) | 911.29 (211.76) | 934.05 (301.39) | 842.46 (242.24) | 864.20 (231.92) | | | |
Male | 694.61 (259.44) | 689.70 (281.84) | 661.19 (188.34) | 626.75 (191.97) | 823.09 (614.29) | 574.22 (249.39) | | | |
aThere are no statistically significant effects (P<.01).
bMPT: maximum phonation time.
c
d
Accumulated distance doses are used to estimate a person’s voice use [
Based on the SAVRa data, participants started to perceive significant increases in vocal effort and discomfort after the first part of the acting. The scores increased during acting, reached their peak right after acting, and gradually returned to baseline within 48 hours after acting. This arc-shaped trajectory replicated the same SAVRa variations obtained from our previous vocal loading study, in which participants were required to reach a distance dose of 500 m in each of the 6 consecutive voice sessions [
In sum, both female and male actors showed comparable accumulated distance doses from voice acting, suggesting a gender-specific vocal safety limit may not be necessary. Similar to the observation from air microphone signals, NSA-derived acoustic metrics performed differently between sustained vowels and running speech, whereby the latter is more ecologically valid [
The NSA system used in this study was a wired version, which poses challenges for users to wear it for long periods. The data transfer was also through a physical recorder and then to a personal computer. No user-device interaction such as biofeedback of voice use was built into the current NSA system. To address these issues critical to mHealth, a wireless version of NSA wearable is now under development in our group. The NSA data will be transmitted through Bluetooth low-energy technology to a smartphone device. An in-house mobile app is also in development with features of NSA data visualization and vocal health feedback. We have already developed machine learning algorithms that are lean and efficient enough to classify upper airway symptoms such as cough and throat clearing on the NSA board [
Laboratory NSA wearable devices were deployed to a group of professional voice actors who underwent a 4-hour voice acting session. The devices tolerated the strenuous body movements and the resulting body movement noise from voice acting. Vocal dose measures and a regular check of clinical evaluation metrics (SAVRa and NSA-derived acoustic metrics) were included in this investigation to validate the instrumentation of the device, the NSA-derived acoustic metrics, and the NSA algorithm for voice monitoring. Future field tests are warranted to evaluate the aforementioned new instrument and algorithm functions in predicting voice and airway health for occupational voice users and those with chronic airway diseases.
Group-based means for SAVRa scores. Means (standard deviation) for each SAVRa item are presented for females and males across time points. F-values, degrees of freedom, and
Post hoc testing results for SAVRa scores. t Ratios,
Group-based means for Distance Dose measures. Means (standard deviation) for each distance dose measure are presented by Study Group (No Warm-Up and Warm-Up) and Gender Group (Females and Males). F-values, degrees of freedom, and
Group-based means for the Rainbow Passage task. Means (standard deviation) for each voice metric are presented for females and males across time points. F-values, degrees of freedom, and
Post hoc testing results for the Rainbow Passage task. t Ratios,
Group-based means for the Sustained Vowel task. Means (standard deviation) for each voice metric are presented for females and males across time points. F-values, degrees of freedom, and
Post hoc testing results for the Sustained Vowel task. t Ratios,
ACTRA: Alliance of Canadian Cinema, Television and Radio Artists
CPP: cepstral peak prominence
CRBLM: Centre for Research on Brain, Language and Music Research
Dd: distance dose
DISC: laryngeal discomfort level
EFFT: current speaking effort level
f0: fundamental frequency
H1 – H2: difference between the first and second harmonic magnitudes
HRF: harmonic richness factor
IPSV: inability to produce soft voice
mHealth: mobile health
MPT: maximum phonation time
NSA: neck surface accelerometer
SAL: skin acceleration level
SAVRa: Self-Administered Voice Rating questionnaire
SE: spectral entropy
SPL: sound pressure level
Tilt: spectral tilt
We acknowledge Luc Mongeau, Nicholas Ogrodnik, and Laura Fasanella for their assistance with the initial study setup and data collection. We thank Maia Masuda for supervising the vocal warm-up exercise. We also sincerely thank The Alliance of Canadian Cinema, Television and Radio Artists for allowing their voice actors to work in sessions without their usual Union rates as compensation. We acknowledge research grants from the Canadian Institutes of Health Research (PJT-156412), The Centre for Research on Brain, Language and Music Research (CRBLM) Incubator Awards (NYKL-J), Canada Research Chair research stipend (NYKL-J), and National Institutes of Health (R01 DC 005788; LM). The CRBLM is funded by the Government of Quebec via the Fonds de Recherche Nature et Technologies and Société et Culture. The presented content is solely the responsibility of the authors and does not necessarily represent the official views of the aforesaid funding agencies.
None declared.