This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Continuous glucose monitors have shown great promise in improving outpatient blood glucose (BG) control; however, continuous glucose monitors are not routinely used in hospitals, and glucose management is driven by point-of-care (finger-stick) and serum glucose measurements in most patients.
This study aimed to evaluate time series approaches for prediction of inpatient BG using only point-of-care and serum glucose observations.
Our data set included electronic health record data from 184,320 admissions, from patients who received at least one unit of subcutaneous insulin, had at least 4 BG measurements, and were discharged between January 1, 2015, and May 31, 2019, from 5 Johns Hopkins Health System hospitals. A total of 2,436,228 BG observations were included after excluding measurements obtained in quick succession, from patients who received intravenous insulin, or from critically ill patients. After applying the exclusion criteria, 2.85% (3253/113,976), 32.5% (37,045/113,976), and 1.06% (1207/113,976) of admissions had a coded diagnosis of type 1, type 2, and other diabetes, respectively. The outcome of interest was the predicted value of the next BG measurement (mg/dL). Multiple time series predictors were created and analyzed by comparing those predictors and the index BG measurement (sample-and-hold technique) with the next BG measurement. The population was classified by glycemic variability based on the coefficient of variation. To compare the performance of different time series predictors among one another,
The median number of BG measurements from 113,976 admissions was 12 (IQR 5-24). The
When analyzing time series predictors independently, increasing variability in a patient’s BG decreased predictive accuracy. Similarly, inclusion of older BG measurements decreased predictive accuracy. These relationships become weaker as glucose variability increases. Machine learning techniques marginally augmented the performance of time series predictors for predicting a patient’s next BG measurement. Further studies should determine the potential of using time series analyses for prediction of inpatient dysglycemia.
Current practice guidelines recommend scheduled insulin therapy for most hospitalized patients with diabetes or hyperglycemia [
In an attempt to combat hypoglycemia, continuous glucose monitors (CGMs) were developed to measure the glucose concentration on the time scale of minutes [
The objective of this study was to compare different regression windows and machine learning approaches for prediction of the next BG measurement in hospitalized patients using only POC and serum glucose data. As previous research has suggested that self-monitoring BG underpredicts low and high BG indices owing to the sparser data points compared with CGM, there is a compelling need to improve on predictive accuracy using POC BG measurements [
This was a retrospective cohort study derived from electronic health record data obtained from 5 hospitals within the Johns Hopkins Health System. Admissions were included if patients received at least one unit of either subcutaneous or intravenous insulin and had at least 4 BG measurements during the admission. Among 118,734 hospitalized patients discharged between January 1, 2015, and May 31, 2019, there were a total of 4,538,510 serum or POC BG measurements. Data extraction and processing methodology for this data set has been previously described [
As our interest was to learn about prediction in noncritical care settings where glucose measurements are typically obtained 4-6 times daily, we excluded admissions in which the patient received intravenous insulin or was admitted to the intensive care unit (typically hourly glucose checks), or had a BG measurement in which both the preceding and succeeding BG measurements were within ≤90 minutes. The rationale for this last exclusion criterion was 2-fold: first, the typical shortest interval between BG checks for most patients in a non-intensive care unit setting is 3-4 hours (between meals and at bedtime, or every 4-6 hours for select patients), and second, successive BG measurements correlate with one another, which would overestimate model performance. Furthermore, because the longest interval between 2 finger-stick BG measurements in hospitalized patients is typically 10 hours (window between 10 PM bedtime and 8 AM finger stick), we excluded any BG reading in which the succeeding BG reading occurred >10 hours after that BG reading. These exclusions decreased our data set from 184,320 admissions to 113,976 admissions.
For this time series analysis, each BG value was ordered sequentially in 5-minute intervals (
Example of data preprocessing to develop a data set suitable for time series analysis. Preprocessing of the time data required going from standard time (left) to blocks of 5 minutes. Only the first blood glucose (BG) measurement in each 5-minute block was included, causing the 51 mg/dL reading at 2:48 to be excluded as a repeat measure of the same hypoglycemic event. In addition, the BG measurement of 68 mg/dL at 6:45 PM was excluded because the preceding and succeeding BG measurements (at 6:33 PM and 6:52 PM) were both within 90 minutes. In the final data set (right), neither the 6:33 PM nor the 6:52 PM BG measurement was excluded, as each of them had at least one adjacent BG measurement separated in time by at least 90 minutes. *Repeat measurement in the same 5-minute block is dropped. **BG measurement for which the previous and next readings are both within 90 minutes of the index observation is dropped.
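The two preprocessing rules described in this example can be illustrated with a minimal Python sketch. It is not the authors' code (the study used Stata and R). The function names, the clock-aligned 5-minute blocks, the order in which the two rules are applied, and every BG value other than the 51 mg/dL and 68 mg/dL readings are assumptions made for illustration; the 2:48 reading is assumed to be in the morning. The separate exclusion of readings followed by a gap of more than 10 hours is omitted.

```python
from datetime import datetime, timedelta


def first_per_block(readings, block_minutes=5):
    """Keep only the first reading in each fixed clock block (default 5 minutes)."""
    seen, kept = set(), []
    for t, bg in sorted(readings):
        # Assumption: blocks are aligned to the clock (2:45-2:49, 2:50-2:54, ...).
        block = (t.date(), (t.hour * 60 + t.minute) // block_minutes)
        if block not in seen:
            seen.add(block)
            kept.append((t, bg))
    return kept


def drop_sandwiched(readings, gap=timedelta(minutes=90)):
    """Drop readings whose previous AND next readings both fall within `gap`."""
    kept = []
    for i, (t, bg) in enumerate(readings):
        close_prev = i > 0 and t - readings[i - 1][0] <= gap
        close_next = i < len(readings) - 1 and readings[i + 1][0] - t <= gap
        if not (close_prev and close_next):
            kept.append((t, bg))
    return kept


def at(hour, minute):
    """Hypothetical helper: a timestamp on an arbitrary illustrative date."""
    return datetime(2019, 1, 1, hour, minute)


# Illustrative readings (time, BG in mg/dL); most values are invented.
raw = [
    (at(2, 45), 48),
    (at(2, 48), 51),   # repeat within the same 5-minute block
    (at(12, 0), 110),
    (at(18, 33), 75),
    (at(18, 45), 68),  # both neighbors within 90 minutes
    (at(18, 52), 90),
    (at(22, 0), 130),
]

cleaned = drop_sandwiched(first_per_block(raw))
kept_times = [t.strftime("%H:%M") for t, _ in cleaned]
print(kept_times)
```

In this sketch the neighbors of each reading are evaluated once against the block-filtered list rather than iteratively, so dropping the 6:45 PM reading does not trigger a second pass over the 6:33 PM and 6:52 PM readings.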
The outcome of interest was the predicted value of a patient’s next BG measurement in mg/dL.
The primary predictor of interest was previous BG measurements. All approaches to predicting next BG used some subset of prior BG measurements based either on a prespecified window of time or a prespecified number of such measurements. Secondary predictors considered included sex; age; race; diabetes diagnosis;
All statistical analyses were performed using R statistical software (version 3.6.2; R Foundation for Statistical Computing) and Stata software (version 15.1; StataCorp LLC). Stata software was used for its preloaded functions in time series analysis that allowed for the generation of different time series variables. R statistical software was used to analyze these time series variables via machine learning algorithms. The
The first predictive approach was based on simple MA models. MA models predict the next glucose value as a simple average of all readings within a specified time window, including the index reading. These analyses were performed with windows of 30 minutes, 1 hour, 1.5 hours, 2 hours, 4 hours, 8 hours, 12 hours, 16 hours, 20 hours, 24 hours, 36 hours, 48 hours, 60 hours, and 72 hours. If there was no additional reading in the MA lookback window, the predicted next BG measurement is the index BG. We created MA values with windows as short as 30 minutes because despite excluding BG readings that had a preceding and succeeding reading within 90 minutes, it was still feasible to have 2 readings within 30 minutes of one another (
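As a rough illustration of the MA predictor described here, the following Python sketch averages every reading inside the lookback window, including the index reading, and falls back to the index BG when no other reading is in the window. The function name, the inclusive window boundary, and the example values are assumptions; the study itself generated these variables in Stata.

```python
from datetime import datetime, timedelta


def moving_average_prediction(readings, i, window):
    """Predict the next BG as the unweighted mean of all readings taken
    within `window` before the index reading `i`, index reading included."""
    t_index = readings[i][0]
    # Assumption: a reading exactly `window` old still counts as inside it.
    in_window = [bg for t, bg in readings[: i + 1] if t_index - t <= window]
    return sum(in_window) / len(in_window)


# Illustrative readings (time, BG in mg/dL), sorted by time.
readings = [
    (datetime(2019, 1, 1, 8, 0), 120),
    (datetime(2019, 1, 1, 12, 0), 180),
    (datetime(2019, 1, 1, 17, 0), 150),
]

ma_4h = moving_average_prediction(readings, 2, timedelta(hours=4))
ma_8h = moving_average_prediction(readings, 2, timedelta(hours=8))
ma_24h = moving_average_prediction(readings, 2, timedelta(hours=24))
print(ma_4h, ma_8h, ma_24h)
```

With a 4-hour window only the index reading qualifies, so the prediction reduces to the sample-and-hold value; widening the window pulls in progressively older readings.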
Rolling regression (RR) was used to predict the next BG measurement from a simple linear regression model estimated from the previous
Example calculation of rolling regression and moving average. Top: A patient’s blood glucose (BG) reading is graphically presented with BG value on the y-axis and time of BG reading on the x-axis. Bottom left: Rolling regression removes the temporal component of BG reading. The BG reading number is plotted on the x-axis as a discrete variable, and the BG value is plotted on the y-axis. A best-fit line is plotted based on the n BG readings included in the rolling regression, and a prediction (black drop) is made based on where the best-fit line intersects the next discrete BG reading. Bottom right: Moving averages allow for as many BG readings to be included in a given period. All BG readings are equally weighted, and the prediction (black drop) is made based on an unweighted average of all the BG readings in that period.
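The rolling regression idea in this figure can be sketched in a few lines of Python: fit an ordinary least squares line to the last n BG values against their reading number, then read off the line at the next reading number. This is only an illustration, not the Stata routine used in the study. The function name and the handling of admissions with fewer than n readings are assumptions, as is the treatment of recursive regression as a fit over all readings available so far.

```python
def rolling_regression_prediction(bg_values, n):
    """Extrapolate a least-squares line through the last `n` BG values,
    indexed 1..k by reading number, to reading number k + 1."""
    window = bg_values[-n:]  # assumption: use fewer readings if n are not available
    k = len(window)
    if k < 2:
        return float(window[-1])  # a line cannot be fit; fall back to the index BG
    xs = range(1, k + 1)
    x_mean = (k + 1) / 2
    y_mean = sum(window) / k
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, window))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (k + 1)


bg_history = [150, 100, 120, 140]  # illustrative values in mg/dL

rr_3 = rolling_regression_prediction(bg_history, 3)
# Assumption: recursive regression is taken here as an expanding window
# over every reading observed so far.
recursive = rolling_regression_prediction(bg_history, len(bg_history))
print(rr_3, recursive)
```

Because the x-axis is the reading number rather than the clock time, two readings 30 minutes apart and two readings 8 hours apart contribute the same slope, which is the loss of the temporal component noted in the caption.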
RR and recursive regression values were generated in Stata using the
To understand the predictive benefit of having non-BG predictors in the machine learning models, we reran the same models listed above including all other patient-level and time-specific predictors in addition to the BG time series measures.
To compare the performance of different machine learning algorithms, a random sample of 10,000 observations was selected for 5-fold cross-validation for each of the 5 methods. Owing to concerns regarding reporting model predictive accuracy with
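For readers unfamiliar with the procedure, the sketch below shows 5-fold cross-validation for a single simple model, with root mean squared error, R^2, and mean absolute error averaged over folds. It is a standard-library Python illustration on synthetic data, not the R workflow used in the study. The one-predictor linear model, the synthetic data-generating process, and the definition of R^2 as the squared correlation between observed and predicted values are all assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y on a single predictor x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fold_metrics(y_true, y_pred):
    """Return (RMSE, R^2, MAE) for one held-out fold."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov ** 2 / (var_t * var_p)  # assumption: squared Pearson correlation
    return rmse, r2, mae


def cross_validate(xs, ys, k=5, seed=0):
    """Shuffle once, split into k folds, and average the fold metrics."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    per_fold = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [intercept + slope * xs[i] for i in held_out]
        per_fold.append(fold_metrics([ys[i] for i in held_out], preds))
    return tuple(sum(col) / k for col in zip(*per_fold))


# Synthetic data only: index BG and a noisy "next BG" (mg/dL).
rng = random.Random(1)
index_bg = [rng.uniform(70, 300) for _ in range(500)]
next_bg = [0.6 * x + 60 + rng.gauss(0, 30) for x in index_bg]

rmse, r2, mae = cross_validate(index_bg, next_bg)
print(round(rmse, 1), round(r2, 3), round(mae, 1))
```

Each observation is held out exactly once, so the averaged metrics describe performance on data the fitted line never saw.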
The study protocol was approved by the institutional review board of the Johns Hopkins School of Medicine with a waiver of informed consent (IRB00117098).
Our cohort includes 2,436,228 BG measurements from 113,976 admissions.
Cohort characteristics by admission (N=113,976).
Factor  Value
Age (years), median (IQR)  65 (54-75)
Weight (lbs), median (IQR)  176 (145-213)
BMI (kg/m^{2}), median (IQR)  27.8 (23.6-33.2)
Sex, n (%)
  Female  56,256 (49.36)
  Male  57,720 (50.64)
Race, n (%)
  Black  36,371 (31.91)
  Other  13,088 (11.48)
  White  64,517 (56.61)
Length of stay (days), median (IQR)  5.03 (3.01-8.86)
Average admission BG^{a}, median (IQR)  141 (117-179)
Diabetes diagnosis, n (%)
  None  72,471 (63.58)
  T1D^{b}  3253 (2.85)
  T2D^{c}  37,045 (32.5)
  Other  1207 (1.06)
^{a}BG: blood glucose.
^{b}T1D: type 1 diabetes.
^{c}T2D: type 2 diabetes.
Distribution of time to next blood glucose (BG) reading in hours.
The data set was further divided into categories of CV to determine if BG measurement lability affected predictive accuracy of different time series predictors:
Pearson correlation coefficient of time series predictors plotted with next blood glucose (BG) measurement.
Model  All observations  Low glycemic variability^{a}  Medium glycemic variability^{a}  High glycemic variability^{a}  Very high glycemic variability^{a}
Moving average
  30 minutes  0.527  0.684  0.513  0.380  0.368
  1 hour  0.519  0.683  0.508  0.370  0.348
  1.5 hours  0.512  0.682  0.502  0.356  0.322
  2 hours  0.504  0.682  0.498  0.336  0.290
  4 hours  0.481  0.680  0.480  0.290  0.205
  8 hours  0.489  0.682  0.495  0.296  0.162
  12 hours  0.483  0.675  0.486  0.292  0.157
  16 hours  0.467  0.664  0.469  0.273  0.144
  20 hours  0.465  0.656  0.466  0.277  0.148
  24 hours  0.454  0.646  0.453  0.266  0.143
  36 hours  0.440  0.628  0.438  0.256  0.140
  48 hours  0.432  0.617  0.429  0.251  0.139
  60 hours  0.427  0.610  0.423  0.248  0.139
  72 hours  0.423  0.605  0.418  0.245  0.138
Rolling regression
  3 observations  0.529  0.679  0.540  0.398  0.370
  4 observations  0.495  0.672  0.499  0.341  0.321
  5 observations  0.484  0.670  0.488  0.321  0.298
  10 observations  0.491  0.667  0.501  0.334  0.257
  25 observations  0.465  0.647  0.478  0.304  0.199
  100 observations  0.433  0.609  0.444  0.276  0.177
  500 observations  0.427  0.601  0.437  0.272  0.175
Recursive regression  0.427  0.601  0.437  0.272  0.175
Current BG (sample-and-hold)  0.529  0.685  0.515  0.381  0.373
^{a}Glycemic variability defined based on coefficient of variation (CV) over the previous 24 hours including the index BG measurement: low (CV≤0.15), medium (0.15<CV≤0.30), high (0.30<CV≤0.45), and very high (CV>0.45). A good
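The glycemic variability categories in this footnote can be expressed as a short Python sketch, with cut points at CV values of 0.15, 0.30, and 0.45. The function names, the use of the sample standard deviation, and the choice to return a CV of 0 when only one reading is available are assumptions; the source does not state how these cases were handled.

```python
import statistics


def coefficient_of_variation(bg_values):
    """CV = standard deviation / mean of the BG values in the lookback period."""
    if len(bg_values) < 2:
        return 0.0  # assumption: a single reading carries no measurable variability
    # Assumption: sample (n - 1) standard deviation.
    return statistics.stdev(bg_values) / statistics.mean(bg_values)


def variability_category(cv):
    """Map a CV to the low / medium / high / very high categories."""
    if cv <= 0.15:
        return "low"
    if cv <= 0.30:
        return "medium"
    if cv <= 0.45:
        return "high"
    return "very high"


# Illustrative BG values (mg/dL) from the previous 24 hours, index reading included.
stable = [100, 110, 105, 95]
labile = [60, 200, 120, 300]

stable_category = variability_category(coefficient_of_variation(stable))
labile_category = variability_category(coefficient_of_variation(labile))
print(stable_category, labile_category)
```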
Next blood glucose (BG) measurement and predicted BG measurement throughout an example patient’s admission. A patient’s next BG value (blue line) compared with what would be predicted based on a time series predictor (red line). The time series predictors that were compared with the true next BG value were (A) 2-hour moving average (MA), (B) 4-hour MA, (C) 8-hour MA, (D) 24-hour MA, (E) 48-hour MA, (F) 4-BG rolling regression (RR), (G) 25-BG RR, and (H) recursive regression.
The performance of 4-hour and 24-hour MA, 3-observation and 25-observation RR, and recursive regression BG predictions is compared in the full population, patients with T1DM, and patients with type 2 diabetes mellitus with basal insulin on board (
Clarke Error Grid analysis results for different time series analyses by diabetes diagnosis.
Population and predictor  Zone A^{b}  Zone B^{c}  Zone C^{d}  Zone D^{e}  Zone E^{f}
Full population (n=2,436,228)
  4-hour MA  0.61  0.33  0.02  0.03  0.00
  24-hour MA  0.59  0.35  0.02  0.03  0.00
  3-observation RR  0.27  0.71  0.01  0.01  0.00
  25-observation RR  0.55  0.41  0.02  0.03  0.00
  Recursive regression  0.53  0.42  0.02  0.03  0.00
Type 1 diabetes (n=104,115)
  4-hour MA  0.43  0.40  0.09  0.07  0.01
  24-hour MA  0.39  0.43  0.10  0.06  0.02
  3-observation RR  0.26  0.69  0.02  0.03  0.01
  25-observation RR  0.39  0.47  0.05  0.07  0.01
  Recursive regression  0.37  0.49  0.05  0.08  0.02
Type 2 diabetes^{a} (n=351,252)
  4-hour MA  0.51  0.40  0.04  0.05  0.00
  24-hour MA  0.50  0.42  0.04  0.04  0.01
  3-observation RR  0.25  0.71  0.02  0.02  0.00
  25-observation RR  0.49  0.42  0.03  0.05  0.01
  Recursive regression  0.47  0.44  0.04  0.06  0.01
^{a}With basal insulin on board at time of blood glucose reading.
^{b}A: Values indicate proportion of predicted glucose values that are within 20% of true value.
^{c}B: Values indicate proportion of predicted glucose values that are outside of 20% but would not lead to inappropriate treatment.
^{d}C: Values indicate proportion of predicted glucose values that are within a range that would lead to unnecessary treatment.
^{e}D: Values indicate proportion of predicted glucose values that are within a range that indicates potentially dangerous failure to detect hypoglycemia or hyperglycemia.
^{f}E: Values indicate proportion of predicted glucose values that are within a range that would confuse treatment of hypoglycemia for hyperglycemia and vice versa.
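As a partial illustration of how the zone A proportions are obtained, the Python sketch below flags a prediction as zone A when it lies within 20% of the true value. It also counts pairs in which both values are below 70 mg/dL, following the standard Clarke Error Grid definition; that clause is not stated in the footnotes above and is included here as an assumption. Zones B through E require the full grid geometry and are not implemented. The function names and example pairs are hypothetical.

```python
def in_zone_a(true_bg, predicted_bg, hypo_threshold=70):
    """Return True if a (true, predicted) BG pair falls in Clarke zone A."""
    # Assumption (standard Clarke grid, not stated in the footnotes):
    # pairs where both values are in the hypoglycemic range count as zone A.
    if true_bg < hypo_threshold and predicted_bg < hypo_threshold:
        return True
    # Footnote definition: prediction within 20% of the true value.
    return abs(predicted_bg - true_bg) <= 0.2 * true_bg


def zone_a_proportion(pairs):
    """Proportion of (true, predicted) pairs that fall in zone A."""
    return sum(in_zone_a(t, p) for t, p in pairs) / len(pairs)


# Illustrative (true next BG, predicted BG) pairs in mg/dL.
pairs = [(100, 115), (100, 130), (60, 45), (200, 150), (200, 165)]
proportion_a = zone_a_proportion(pairs)
print(proportion_a)
```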
The 5-fold cross-validation statistics of time series predictors used to predict next blood glucose (BG) value in various machine learning models.
Algorithm  RMSE^{c} (95% CI)  R^{2} (95% CI)  MAE^{d} (95% CI)
Model A^{a}
  Cubist  44.9 (42.8-47.0)  0.561 (0.536-0.586)  29.3 (28.8-29.3)
  Linear model  44.8 (42.9-46.6)  0.562 (0.532-0.592)  29.7 (29.1-30.2)
  Random forest  45.4 (43.9-47.0)  0.547 (0.521-0.575)  30.4 (29.6-31.1)
  Partial least squares  45.4 (43.8-47.0)  0.548 (0.512-0.584)  30.2 (29.5-30.9)
  k-nearest neighbors  48.7 (46.7-50.8)  0.486 (0.469-0.503)  32.8 (32.6-33.0)
Model B^{b}
  Cubist  44.9 (43.2-46.6)  0.563 (0.533-0.593)  29.3 (29.0-29.6)
  Linear model  46.9 (42.5-51.2)  0.526 (0.461-0.591)  29.9 (29.2-30.7)
  Random forest  45.2 (42.9-47.4)  0.554 (0.535-0.574)  30.3 (29.7-30.8)
  Partial least squares  46.0 (44.3-47.6)  0.538 (0.502-0.574)  30.7 (30.0-31.5)
  k-nearest neighbors  49.4 (46.8-51.9)  0.472 (0.439-0.505)  33.4 (32.6-34.1)
^{a}Model A: predictor variables in all machine learning models above were 30-minute moving average (MA), 1-hour MA, 1.5-hour MA, 2-hour MA, 4-hour MA, 8-hour MA, 12-hour MA, 16-hour MA, 20-hour MA, 24-hour MA, 36-hour MA, 48-hour MA, 60-hour MA, 72-hour MA, 3-observation rolling regression (RR), 4-observation RR, 5-observation RR, 10-observation RR, 25-observation RR, 100-observation RR, 500-observation RR, recursive regression, index BG measurement, and previous BG measurement.
^{b}Model B: all variables included in model A and sex, age, race, diabetes diagnosis, nil per os status, home insulin, home antihyperglycemic medication, glomerular filtration rate, hydrocortisone equivalents on board, basal insulin units on board (units, U), combination insulin units on board (U), concentrated insulin units on board (U), intermediate-acting insulin units on board (U), rapid-acting insulin units on board (U), regular insulin units on board (U), and ultra–long-acting insulin units on board (U).
^{c}RMSE: root mean squared error.
^{d}MAE: mean absolute error.
In this retrospective cohort study using a large number of POC and serum glucose observations, we identified the correlation of different time-varying MA and RR predictors of a hospitalized patient’s next BG reading. We found that the most recent BG measurement provides the most predictive accuracy; adjusting for trends or increasing the lookback window negatively affects correlation. Interestingly, the addition of variables associated with glycemic control did not greatly modify the performance of machine learning algorithms that included all the MA and RR predictors, although the machine learning models performed marginally better compared with any individual time series predictor. Notably, the best-performing algorithm in model A (time series predictors only) was simple linear regression, whereas the best-performing algorithm in model B (time series predictors with additional nonglycemic data) was the Cubist model, suggesting that the additional information benefited the algorithms to different degrees.
In clinical practice, there is growing interest in developing machine learning algorithms to predict hypoglycemia in the inpatient setting. Although many of the published algorithms use categorical variables [
There has been interest in using CGM data to predict a patient’s BG over a short horizon of 60 minutes. For example, Gani et al [
Our study found that the predictive accuracy of MA and RR declined with increasing size of the lookback window. Although BG is generally obtained either every 4 hours or 4 times daily, the 30-minute MA had the highest predictive accuracy based on the
A secondary objective was to evaluate how performance accuracy differs when comparing a model using BG data alone with one that includes a broader number of clinical variables that can influence glucose homeostasis. A recent review of BG prediction strategies in patients with T1DM using CGM data found that most published models use CGM data, insulin dosing, and carbohydrate consumption [
On the basis of the relatively similar performances of the MA and RR, we were surprised how greatly the Clarke Error Grid analysis differed between the MA and RR results. For example, in the full patient population, 61% of the 4-hour MA predictions but only 27% of the 3-observation RR predictions were in zone A. These findings highlight limitations in deciding which performance metrics to report. As described previously, mean squared error and sum of squared errors are the most commonly reported performance metrics in BG prediction [
Our study has several strengths. Notably, we determined the correlation of different time series predictors with the next BG measurement. We also evaluated the predictive performance when all time series predictors were included in machine learning models that included other demographic and clinical parameters. Our analyses were based on a large, diverse sample. Although BG prediction algorithms published in the literature use CGM data, our analysis can be applied to hospitalized patients who do not have access to CGMs.
There are some limitations to our study. We did not have information about insulin doses from total parenteral nutrition formulations, amount of carbohydrates consumed with meals, or designation of BG as either random or fasting. Similarly, measures such as hemoglobin A1c were not included, as not all patients have this routinely measured during admission. Because we attempted to predict a patient’s next BG measurement based on POC or serum BG readings, the time to the next BG reading is not fixed as it is with CGMs. Thus, we were unable to define a discrete prediction horizon, as BG samples are not obtained at exact intervals in every inpatient. In addition, much of our analysis was based on the correlation between each BG reading’s time series predictors and the next BG measurement. This analysis has limited predictive value, as these time series predictors were not evaluated on a separate test cohort. The machine learning approaches presented to address this limitation may be prone to overfitting given the complexity of the models. Although the machine learning models were significantly more predictive than any individual time series predictor, the clinical significance of these findings is uncertain given the only modest increases in
To the best of our knowledge, this is the first study to evaluate different prediction models for the value of the next BG measurement using only POC and serum glucose measurements in hospitalized patients. Our results did not rely on data from CGMs and were agnostic to when the patient’s next BG would be measured. We found that BG prediction is highly dependent on the most recent BG observation, with diminishing performance as the lookback window increases. Future prospective studies need to evaluate prediction of BG using such time series models and determine whether quantitative prediction of glucose results in better clinical outcomes compared with previous studies that predict hypoglycemic and hyperglycemic events as binary or categorical outcomes.
BG: blood glucose
CGM: continuous glucose monitor
CV: coefficient of variation
MA: moving average
POC: point-of-care
RMSE: root mean squared error
RR: rolling regression
T1DM: type 1 diabetes mellitus
The authors would like to thank Sam Sokolinsky and Shamil Fayzullin from the Johns Hopkins Health System Quality and Clinical Analytics for their assistance with data extraction from the electronic medical record. This study was supported by grant K23DK111986 from the National Institute of Diabetes and Digestive and Kidney Diseases (NM, MSA, and JM).
NM and MSA contributed to study concept and design, data extraction and assessment, statistical analysis, interpretation of data, and drafted the manuscript. ADZ contributed to study concept and design, statistical analysis, interpretation of data, and drafted the manuscript. JM contributed to study design and concept, statistical analysis, and critical revision of the manuscript. All authors reviewed and edited the manuscript.
None declared.