This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
To provide effective care for inpatients with COVID-19, clinical practitioners need systems that monitor patient health and subsequently allow for risk scoring. Existing approaches for risk scoring in patients with COVID-19 focus primarily on intensive care units (ICUs) with specialized medical measurement devices but not on hospital general wards.
In this paper, we aim to develop a risk score for inpatients with COVID-19 in general wards based on consumer-grade wearables (smartwatches).
Patients wore consumer-grade wearables to record physiological measurements, such as the heart rate (HR), heart rate variability (HRV), and respiration frequency (RF). Based on Bayesian survival analysis, we validated the association between these measurements and patient outcomes (ie, discharge or ICU admission). To build our risk score, we generated a low-dimensional representation of the physiological features. Subsequently, a pooled ordinal regression with time-dependent covariates inferred the probability of either hospital discharge or ICU admission. We evaluated the predictive performance of our developed system for risk scoring in a single-center, prospective study based on 40 inpatients with COVID-19 in a general ward of a tertiary referral center in Switzerland.
First, Bayesian survival analysis showed that physiological measurements from consumer-grade wearables are significantly associated with patient outcomes (ie, discharge or ICU admission). Second, our risk score achieved a time-dependent area under the receiver operating characteristic curve (AUROC) of 0.73-0.90 based on leave-one-subject-out cross-validation.
Our results demonstrate the effectiveness of consumer-grade wearables for risk scoring in inpatients with COVID-19. Due to their low cost and ease of use, consumer-grade wearables could enable a scalable monitoring system.
Clinicaltrials.gov NCT04357834; https://www.clinicaltrials.gov/ct2/show/NCT04357834
Health trajectories from patients with COVID-19 show large variability with sudden deterioration in the disease state and uncertain outcomes [
Prior research has developed systems for monitoring patients with COVID-19 in different settings. One research stream detects the onset of COVID-19 using wearables (eg, smartphones) [
In this paper, we develop a risk score for inpatients with COVID-19 in general wards based on scalable consumer-grade wearables (see
Monitoring and risk scoring. To develop a scalable risk score, physiological features were computed from wearable measurements. Next, Bayesian survival analysis was conducted to assess the association between the physiological features and patient outcomes. Lastly, a scalable risk score was developed. This study was designed to demonstrate the effectiveness of consumer-grade wearables for a scalable risk-scoring system in inpatients with COVID-19 in the general ward. ICU: intensive care unit.
In visit 1 (V1), a study investigator explained the nature, purpose, and risks of the study and provided eligible patients with a copy of the patient information sheet. If written informed consent was obtained and eligibility criteria were met, the remaining screening information was obtained. A patient number was assigned to each patient in ascending order.
Eligible patients were provided with a Garmin vívoactive 4S (Garmin International Inc., Olathe, Kansas, USA) smartwatch and a Xiaomi Redmi 9 (Xiaomi Corp., Beijing, China) smartphone. Patients wore the wearable on the wrist of the dominant hand, if possible (otherwise the other hand).
After mounting the devices, the study investigator verified that the devices were functioning and that data transfer was working properly. In addition, the patient was instructed to fully charge both devices once per day or as needed. The smartwatch was worn for the entire study duration, that is, from hospitalization in the general ward until the patient was admitted to the ICU or discharged home.
The study investigators were equipped with a monitoring dashboard allowing for observation of the charging status as well as functionality of the devices in use. If a patient was not capable of charging the devices themselves or the devices were not working properly, a member of the study team directly approached the patient and either charged the devices or solved possible technical issues.
In visit 2 (V2, close-out visit), the treating physician in the general ward informed the study team that 1 of the close-out criteria (admitted to the ICU or discharged home) had been met. A member of the study team then visited the patient and initiated the close-out visit. During V2, patients returned the wearable, the smartphone, and the charging cable. Completeness of data transfer to the back-end server was checked, and thereafter, all data on the devices were deleted.
The study followed the Declaration of Helsinki, the guidelines of good clinical practice, Swiss health laws, and the ordinance on clinical research. The study was approved by the local ethics committee of Bern, Switzerland (ID 2020-00874). Each patient provided informed written consent before any study-related procedure.
The technical backbone of our data collection comprised 2 components: (1) a smartwatch that continuously collected physiological parameters and (2) a custom smartphone app to transfer the data to our server. In particular, the collected data were first transferred via Bluetooth to our self-developed smartphone app. Subsequently, the data were sent to a central database.
The smartwatch measured several physiological parameters. The recorded sensor measurements were accelerometer (ACC) data, the interbeat interval (IBI), the HR, and the RF. The ACC was sampled at 25 Hz. The HR and RF were logged once per minute. The IBI was recorded by logging the time of each heartbeat. The HR, IBI, and RF were derived from the photoplethysmography (PPG) sensor of the wearable.
Additional patient demographics (ie, patient age and sex) were collected by the clinical practitioners.
Data gathered from sensors embedded in consumer-grade wearable devices come with inherent challenges for clinical use. In particular, consumer smartwatches are not certified medical devices, and their sensor data may be subject to noise and missing values. We thus performed customized preprocessing of the sensor data as follows.
The HRV was computed from a time series of IBIs. Of note, variability measures retrieved from an optical PPG signal should be referred to as pulse rate variability (PRV), whereas variability measures retrieved from an electrocardiogram (ECG) should be referred to as the HRV. Since the two types of variability measures are significantly correlated, we follow convention and refer to them here as the HRV [
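As an illustration of how variability features can be derived from an IBI series, the following minimal sketch computes two common time-domain measures, the SDNN and the RMSSD. The function names are hypothetical, and the sketch assumes the intervals have already been cleaned of artifacts; it is not the preprocessing pipeline used in this study.

```python
import numpy as np


def beat_times_to_ibi(beat_times_s):
    """Convert heartbeat timestamps (seconds) to interbeat intervals (milliseconds)."""
    return np.diff(np.asarray(beat_times_s, dtype=float)) * 1000.0


def hrv_features(ibi_ms):
    """Return SDNN and RMSSD (both in ms) for a sequence of interbeat intervals.

    Assumption: the intervals are artifact-free (normal-to-normal);
    no outlier removal or interpolation is performed here.
    """
    ibi = np.asarray(ibi_ms, dtype=float)
    if ibi.size < 3:
        raise ValueError("need at least 3 interbeat intervals")
    sdnn = ibi.std(ddof=1)                        # sample SD of the intervals
    rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))   # root mean square of successive differences
    return {"sdnn": float(sdnn), "rmssd": float(rmssd)}


if __name__ == "__main__":
    print(hrv_features([800, 810, 790, 800]))
```

The SDNN reflects overall variability within the window, whereas the RMSSD captures beat-to-beat changes.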
Our data showed substantial variation in the HR, HRV, and RF throughout the day, which was most likely due to changes in patient activity patterns. A confirmatory check showed strong dependence on the intensity of body movements throughout the day (see
The wearable-based measurements of the HR, HRV, and RF were aggregated into a single value per time window using feature engineering. For the HR and RF, we computed 15 statistical features that reflect different properties of the distribution over time (eg, mean, skewness, SD). For the HRV, there exists an extensive amount of research on the effect of the window size on HRV features [
For both preprocessing and feature engineering, we leveraged the publicly available Python package FLIRT, which is tailored to process wearable data [
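To make the aggregation step concrete, the sketch below reduces the per-minute readings of one time window to a handful of summary statistics. It uses plain NumPy rather than the FLIRT interface, the function name is hypothetical, and the features shown (mean, SD, minimum, maximum, 95% quantile, skewness) are only an illustrative subset of the 15 statistical features.

```python
import numpy as np


def window_features(values):
    """Summarize one window of per-minute HR or RF readings.

    Missing readings (NaN) are dropped before aggregation.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    if x.size < 2:
        raise ValueError("need at least 2 valid readings in the window")
    mean = x.mean()
    centered = x - mean
    m2 = np.mean(centered ** 2)  # second central moment
    # Moment-based skewness; defined as 0 for a constant window
    skewness = np.mean(centered ** 3) / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": float(mean),
        "sd": float(x.std(ddof=1)),
        "min": float(x.min()),
        "max": float(x.max()),
        "q95": float(np.quantile(x, 0.95)),
        "skewness": float(skewness),
    }


if __name__ == "__main__":
    print(window_features([60, 62, 64, np.nan, 66, 68]))
```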
To assess the association of physiological features with observed patient outcomes (ie, hospital discharge vs ICU admission), survival analysis was conducted. This allowed us to appropriately account for the time-to-event nature of the data and the presence of right censoring (see the Results section). Since the physiological features were updated each day, they were represented by time-dependent covariates in our survival analysis. Accordingly, a pooled regression approach [
In our explanatory analysis, we estimated univariate associations, which allowed us to identify the association of individual physiological features with patient health. Thus, a separate model was fitted for each physiological feature.
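A pooled regression with time-dependent covariates typically operates on a person-period data set, in which every patient contributes one row per hospital day. The following sketch shows one way to build such a table; the `Stay` class, the field names, and the outcome labels are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Stay:
    patient_id: int
    daily_feature: List[float]   # one feature value per observed hospital day
    outcome: Optional[str]       # "discharge", "icu", or None if right-censored


def to_person_days(stays: List[Stay]) -> List[Dict]:
    """Expand each stay into one row per patient-day.

    Assumption: the event (discharge or ICU admission) occurs on the last
    observed day; all earlier days, and every day of a right-censored
    patient, are labeled as a continued stay.
    """
    rows = []
    for stay in stays:
        last = len(stay.daily_feature) - 1
        for day, value in enumerate(stay.daily_feature):
            is_event_day = day == last and stay.outcome is not None
            rows.append({
                "patient_id": stay.patient_id,
                "day": day + 1,
                "feature": value,
                "outcome": stay.outcome if is_event_day else "stay",
            })
    return rows


if __name__ == "__main__":
    example = [Stay(1, [0.2, 0.5], "icu"), Stay(2, [0.4], None)]
    for row in to_person_days(example):
        print(row)
```

Each row can then be treated as one observation of the daily outcome, and a univariate model uses a single feature column at a time.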
To develop a risk score based on the physiological features, a parsimonious 2-step approach was chosen. That is, we first used feature engineering and principal component analysis (PCA) to obtain a low-dimensional but comprehensive representation of patients’ physiological state. This representation was then linked to patient outcomes through a Bayesian survival model that was similar to the models used in our explanatory analysis. We chose this approach over alternative methods (eg, deep learning) for several reasons. First, the use of a parametric model can effectively reduce the risk of overfitting, while the feature engineering still allows us to use high-dimensional sensor data. Moreover, by jointly modeling the probabilities of hospital discharge and ICU admission, the risk score makes efficient use of the available data and can be readily interpreted as an overall indicator of patient condition. Finally, the use of Bayesian modeling supports robust results even with limited amounts of data and appropriately quantifies uncertainty in the risk score.
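The first step of this approach, the low-dimensional representation, can be sketched as follows. The code standardizes the features and projects them onto the leading principal components via a singular value decomposition. The function names and the standardization choice are assumptions made for illustration, not a description of the exact implementation.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on standardized features (rows: patient-days, columns: features)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0                       # guard against constant columns
    Z = (X - mean) / scale
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)           # share of variance per component
    return {
        "mean": mean,
        "scale": scale,
        "components": Vt[:n_components],
        "explained_ratio": explained[:n_components],
    }


def transform(model, X):
    """Project new data onto the fitted components using the training mean and scale."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["scale"]
    return Z @ model["components"].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(30, 5))
    pca = fit_pca(X_train, n_components=2)
    print(transform(pca, X_train[:3]))
```

Keeping `fit_pca` and `transform` separate allows the projection to be estimated on training patients only and then applied to a held-out patient, which matters when the dimensionality reduction is part of the cross-validation.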
The risk score was constructed by combining multiple physiological features into an overall metric. For this, we proceeded as follows: (1) The coefficients of the explanatory models were used to select physiological features of the HR, HRV, and RF that showed a relevant association (80% credible interval [CrI], excluding 0) with patient outcomes. (2) Since many features of the same measurement were strongly correlated, dimensionality reduction via PCA [
All model parameters were estimated using a fully Bayesian framework [
The performance of the developed risk score was evaluated via leave-one-patient-out cross-validation. The cross-validation covered all relevant preprocessing steps, including PCA. We assessed the performance in terms of discrimination accuracy via time-dependent receiver operating characteristic (ROC) curves using the incident case approach with dynamic controls (I/D) [
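For intuition, the incident/dynamic definition of the time-dependent AUROC can be sketched for a single event type. On a given day, cases are patients who experience the event on that day, and controls are patients whose stay continues beyond that day; the AUROC is the probability that a case has a higher marker value than a control. The function below is a simplified, hypothetical illustration that ignores competing outcomes and smoothing.

```python
import numpy as np


def incident_dynamic_auc(marker, event_day, had_event, day):
    """Incident/dynamic AUROC on a given day for one event type.

    marker:    each patient's marker value on the evaluated day
    event_day: day of the event, or last observed day if censored
    had_event: True if the event was observed, False if censored
    Returns NaN if there are no cases or no controls on that day.
    """
    marker = np.asarray(marker, dtype=float)
    event_day = np.asarray(event_day)
    had_event = np.asarray(had_event, dtype=bool)
    cases = marker[(event_day == day) & had_event]   # incident cases
    controls = marker[event_day > day]               # dynamic controls, still at risk
    if cases.size == 0 or controls.size == 0:
        return float("nan")
    diff = cases[:, None] - controls[None, :]
    # Concordant pairs count 1, ties count 0.5
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


if __name__ == "__main__":
    print(incident_dynamic_auc([0.9, 0.2, 0.4, 0.7], [2, 3, 4, 2],
                               [True, True, True, False], day=2))
```

In a leave-one-patient-out setting, each marker value would come from a model that was refit without the respective patient. When the event of interest is discharge, a marker for which larger values indicate the event (eg, the negated risk score) would be used.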
To assess the added value of continuous physiological measurements for monitoring a patient’s health throughout their hospital stay, we compared our risk score model to an alternative model that uses only data from the first night of hospital stay but is otherwise identical. The time-dependent AUROC was computed for both risk score models and compared across a varying length of stay in the hospital. We evaluated the length of stay for which a sufficient number of observations was available (ie, up to 6 days), corresponding to 87% of all observed lengths of stay. In the Results section, we further report a smoothed AUROC over time using the nearest-neighbor estimator for time-dependent ROC curves [
We conducted an observational study (see the study flowchart in
Inclusion criteria were age greater than 18 years, suspicion of COVID-19 or patient testing positive for SARS-CoV-2, and hospitalization in the general ward. Exclusion criteria were direct transfer from an emergency ward or external institution to the ICU (ie, no hospitalization in the general ward of the study institution). Further exclusion criteria were that the smartwatch could not be attached around the wrist of the patient, known allergies to components of the smartwatch, and rejection of ICU admission in the patient decree.
After screening, 1 (2.2%) of 46 individuals was excluded due to a negative SARS-CoV-2 result, 4 (8.7%) patients were excluded due to technical problems during the recording (eg, persistent interruptions of the Bluetooth connection between wearable and smartphone) or nonadherence to the prescribed measurement regime, and 1 (2.2%) individual was excluded because hospital discharge occurred on the same day as hospitalization. In total, 40 (87%) patients remained. Of these, 7 (17.5%) were admitted to an ICU during their hospital stay (after a median of 2 days), and 31 (77.5%) were discharged without a subsequent ICU stay (after a median of 4 days). In addition, 2 (5%) patients dropped out before their outcome was recorded and were thus treated as right-censored in our analysis.
Overview of study with a study flowchart. Data were obtained according to the study flowchart. During visit 1 (V1), 46 eligible patients were recruited. After hospitalization in the general ward, patients were equipped with a consumer-grade wearable (smartwatch). We excluded patients with suspected COVID-19 in the case of a negative SARS-CoV-2 test (n=1). In addition, patients were excluded due to nonadherence to measurement principles or interruptions in connectivity (n=4) and self-discharge on the same day as hospital admission (n=1). During visit 2 (V2), we recorded the patient outcomes (ie, discharge, n=31, vs ICU admission, n=7). Patients with unknown outcomes were right-censored (n=2). ICU: intensive care unit.
Overall, 49 univariate associations were estimated: 15 (30.6%) for features related to the HR, 19 (38.8%) for features related to the HRV, and 15 (30.6%) for features related to the RF. For features related to the HR, a higher HR was associated with worsened patient outcomes. In particular, an increase in the mean HR indicated a deterioration in the patient’s condition (coefficient 0.71, 95% CrI 0.20-1.32). A similar association was found for several other features, including the maximum HR (coefficient 0.46, 95% CrI 0.03-0.94). For entropy-based features, however, the estimated relationship remained largely uncertain. For features related to the HRV, increases were associated with improved patient outcomes. For example, an increase in the standard deviation of normal-to-normal intervals (SDNN) indicated an improvement in the patient’s condition (coefficient –0.28, 95% CrI –0.82 to 0.21). Moreover, several features related to the RF showed a positive association, where larger values indicated a worsened patient outcome. For example, increases in the 95% quantile of the RF were associated with a deterioration in the patient’s condition (coefficient 0.77, 95% CrI 0.19-1.51). The same was observed for the RF SD (coefficient 0.46, 95% CrI –0.05 to 1.06). Altogether, these associations establish that the risk of a worsened condition among inpatients with COVID-19 can be identified through health measurements from consumer-grade wearables.
As part of our robustness checks, various alternative model specifications were tested (ie, changes with respect to the time window used for physiological measurements, the time trend, subject-specific variation, the cumulative distribution function, and wider priors; see
Association of physiological features with patient outcomes. Shown are the standardized coefficients of physiological features for the (a) HR, (b) HRV, and (c) RF. Features were computed based on daily physiological measurements from wearables (see the Feature Engineering section and
Next, the physiological measurements from the wearables were combined into an overall risk score. Here, our main aim was to demonstrate that a combination of different physiological features is of predictive value and thus jointly informative. For this, PCA [
The selected PCs were used to model patient outcomes as the dependent variable based on a survival model that is similar to that of the explanatory analysis.
Probability of hospital discharge and ICU admission for different values of the risk score. Shown is the estimated daily probability of hospital discharge (blue) and ICU admission (red) as a function of the risk score. A larger risk score implies a higher probability of ICU admission and a lower probability of hospital discharge. Posterior means (lines) and 95% CrIs (shaded areas) are reported. The probability of continued stay (ie, neither hospital discharge nor ICU admission) is not shown but can be computed as $P_{\text{continued stay}} = 1 - P_{\text{discharge}} - P_{\text{ICU}}$. CrI: credible interval; ICU: intensive care unit.
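The qualitative shape of these curves can be reproduced with a simple ordinal model. The sketch below assumes a cumulative logit link with the ordering discharge < continued stay < ICU admission and two cutpoints; the link function, the cutpoint values, and the function names are illustrative assumptions, not estimates or specifications from this study.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def outcome_probabilities(risk_score, cut_low=-2.0, cut_high=2.0):
    """Daily outcome probabilities under an assumed cumulative logit model.

    cut_low and cut_high are hypothetical cutpoints separating
    discharge | continued stay | ICU admission on the latent scale.
    """
    if cut_low >= cut_high:
        raise ValueError("cut_low must be smaller than cut_high")
    p_discharge = logistic(cut_low - risk_score)
    p_icu = 1.0 - logistic(cut_high - risk_score)
    p_stay = 1.0 - p_discharge - p_icu   # remainder: neither discharge nor ICU admission
    return {"discharge": p_discharge, "stay": p_stay, "icu": p_icu}


if __name__ == "__main__":
    for score in (-1.0, 0.0, 1.0):
        print(score, outcome_probabilities(score))
```

Under this parameterization, a larger risk score shifts probability mass from discharge toward ICU admission, in line with the relationship described in the caption.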
For comparison, we also reported the performance of a fixed risk score that used only data from the first night of hospital stay but was otherwise identical (
To further assess the added value of the physiological features used in our risk score, we also evaluated the performance of a risk score that uses demographic features (ie, patient age and sex) but no physiological features. The cross-validation results for this risk score indicated that demographic features alone have no relevant predictive value with regard to the time-varying health condition of the patients in our sample (see
Prediction performance of the risk score over time. Shown is the time-dependent AUROC of the risk score in predicting patient discharge over time. Two scenarios are compared: (1) main (blue solid line) and (2) fixed (gray dashed line). In the main scenario, the daily risk score is computed from updated wearable-based measurements recorded during the respective previous night. The AUROC is significantly above 0.5 for up to 6 days, which covers 87% of the patients’ length of stay. In the fixed scenario, the risk score is computed throughout the stay from recordings only from the first night. The comparison between these scenarios shows the added value of regularly updated health measurements provided by wearables. Out-of-sample predictions were obtained via leave-one-patient-out cross-validation. Dots show the individual time-dependent AUROC estimates for days with observed patient discharge. Smoothing was performed via a nearest-neighbor estimator (see the Performance Evaluation section) to obtain an estimate of the mean AUROC over time (lines) with 95% CIs (shaded areas). AUROC: area under the receiver operating characteristic curve.
This work presents a monitoring system that allows for risk scoring of inpatients with COVID-19 in the general ward using consumer-grade wearables (smartwatches).
For this, Bayesian survival analysis was used to establish that physiological measurements monitored by consumer-grade wearables are indicative of patient outcomes in the general ward (ie, hospital discharge vs ICU admission). We further showed that these different physiological measurements can be combined into a single, clinically meaningful risk score with high prediction performance regarding the health condition of patients (time-dependent AUROC of 0.73-0.90). Our results show the feasibility of a risk score for inpatients with COVID-19 in general wards based on scalable consumer-grade wearables. In the future, such risk scores may enable clinical practitioners to adapt to patient needs and, ideally, respond earlier when a patient trajectory progresses toward a critical condition.
We found that several physiological features derived from wearable-based measurements are associated with patient outcomes. For instance, a higher mean HR, a higher mean RF, and a lower HRV RMSSD are all indicative of a deterioration in the health condition of patients. The observed relationship between patient outcomes and cardiovascular features (HR and HRV), as well as patient outcomes and RF measurements, is consistent with previous research on digital biomarkers [
To derive our risk score, we intentionally chose a parsimonious approach using feature engineering and Bayesian survival modeling. Unlike other machine learning methods, a parametric, Bayesian approach like ours is especially viable in the case of newly emerged diseases, where data availability may be limited. Since our feature engineering is mostly disease independent, the physiological features could be integrated into models for other diseases too, further promoting scalability. More generally, our approach demonstrates how multiple competing patient outcomes can be flexibly linked to time-dependent measurements in a parametric, joint model of patient condition.
Prior research successfully explored vital signs measured by smartwatches (eg, the resting HR) as a basis to detect the onset of COVID-19 outside a clinical setting [
Lastly, several studies have focused on risk scoring in ICUs [
In summary, our study supports the clinical relevance of wearables exclusively based on consumer-grade technology. In contrast to specialized medical devices for health monitoring (eg, finger pulse oximetry or ECG sensor), consumer-grade technology comes at a comparatively low cost, can be deployed easily, and is thus scalable. Clinical practitioners simply need to attach the smartwatch to the wrist of a patient. In addition, smartwatches offer a familiar user interface.
A general concern may be that measurements from consumer-grade wearables are subject to noise or missing values. The results of our study, however, show that a wearable-based risk score can offer robust predictions of patient outcomes. Our study opens several possibilities for future research. The main limiting factor of our study is the sample size of 40, which naturally restricts the number of ICU admissions in the data set. To further assess the predictive performance of wearable-based risk scores, in particular with regard to ICU admission, future research might expand our data set with additional patient populations and different variants of SARS-CoV-2. The model incorporated only data from wearable sensors for risk scoring and did not integrate other data sources (eg, electronic health records). This choice was made to ensure scalable use in clinical practice. Further, our system builds upon dimensionality reduction via PCA to handle high-dimensional sensor data, which proved effective in avoiding overfitting. Nevertheless, future research may explore alternative machine learning methods for risk scoring.
Overall, our results show the promise of consumer-grade wearables as an effective, scalable, and low-cost technology for health monitoring in a general ward. In the future, consumer-grade wearables, such as smartwatches, may further offer monitoring capabilities for inpatients with other diseases.
Data description.
Method details.
Model Diagnostics.
Robustness of Explanatory Analysis.
Principal component analysis (PCA) results.
Comparison to a Risk Score Using Only Demographic Features.
ACC: accelerometer
AUROC: area under the receiver operating characteristic curve
CrI: credible interval
ECG: electrocardiogram
HR: heart rate
HRV: heart rate variability
IBI: interbeat interval
ICU: intensive care unit
LASSO: least absolute shrinkage and selection operator
PC: principal component
PCA: principal component analysis
PPG: photoplethysmography
RF: respiration frequency
ROC: receiver operating characteristic
S Föll, MM, KK, VL, TZ, DS, SJ, and FW contributed to the conception and design of the study. MM and S Föll developed the digital biomarker platform. KK, SJ, and DS screened and enrolled the patients and acquired the data. MM and S Föll processed the data. AL conducted the statistical analysis. S Feuerriegel supervised the statistical analysis. AE supervised the clinical study. S Föll and AL wrote the manuscript. MM, FW, EF, S Feuerriegel, KK, VL, TZ, DS, SJ, and AE critically reviewed and edited the manuscript. FW and AE jointly supervised the project and share the last authorship. All authors have approved the final draft of the manuscript for submission.
This work was partially funded by the Hasler Stiftung (Project 20039). The funding body had no control over the study design, data collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.
S Feuerriegel declares membership in a COVID-19 working group of the World Health Organization (WHO) but without competing interests. All other authors declare no competing interest.