Automated Analysis of Drawing Process to Estimate Global Cognition in Older Adults: Preliminary International Validation on the US and Japan Data Sets

Background: With the aging of populations worldwide, early detection of cognitive impairments has become a research and clinical priority, particularly to enable preventive intervention for dementia. Automated analysis of the drawing process has been studied as a promising means for lightweight, self-administered cognitive assessment. However, this approach has not been sufficiently tested for its applicability across populations.
Objective: The aim of this study was to evaluate the applicability of automated analysis of the drawing process for estimating global cognition in community-dwelling older adults across populations in different nations.
Methods: We collected drawing data with a digital tablet, along with Montreal Cognitive Assessment (MoCA) scores for assessment of global cognition, from 92 community-dwelling older adults in the United States and Japan. We automatically extracted 6 drawing features that characterize the drawing process in terms of the drawing speed, pauses between drawings, pen pressure, and pen inclinations. We then investigated the association between the drawing features and MoCA scores through correlation and machine learning–based regression analyses.
Results: We found that, with low MoCA scores, there tended to be higher variability in the drawing speed, a higher pause:drawing duration ratio, and lower variability in the pen’s horizontal inclination in both the US and Japan data sets. A machine learning model that used drawing features to estimate MoCA scores demonstrated its capability to generalize from the US data set to the Japan data set (R²=0.35; permutation test, P<.001).
Conclusions: This study presents initial empirical evidence of the capability of automated analysis of the drawing process as an estimator of global cognition that is applicable across populations. Our results suggest that such automated analysis may enable the development of a practical tool for international use in self-administered, automated cognitive assessment.


Introduction
With the aging of populations worldwide, early detection of cognitive impairments has become a research and clinical priority. In particular, early identification of prodromal dementia is essential for providing secondary prevention and disease-modifying treatments [1][2][3][4]. The cognitive screening tests most commonly used by clinicians are the Mini-Mental State Examination (MMSE) [5] and the Montreal Cognitive Assessment (MoCA) [6]. Both tests are designed to assess global cognition, and validated cutoff scores are used for detecting impairment [7,8]. One limitation of these tests is that they require administration by trained professionals. According to the World Alzheimer Report published in 2021 [1], 83% of clinicians reported that the COVID-19 pandemic has delayed access to cognitive screening tests. Consequently, self-administered, automated assessment may be more important in situations, like the current COVID-19 pandemic, that impose limitations on in-person evaluation in a clinical setting. Another limitation of these tests is related to issues with their use in multilingual populations, such as cross-linguistic artifacts in translation [1,9,10]. Recently, several nonlinguistic cognitive tests have been investigated to overcome the influence of language differences by mitigating the need for translation [11,12]. In sum, there is a clear need to develop a self-administered, automated assessment tool that can be used internationally, which would greatly increase the accessibility of screening in a variety of settings and populations. This would be particularly important for removing barriers to diagnosis and mitigating the gap between countries in diagnostic coverage: the rate of diagnosis of dementia was estimated to be only 25% worldwide, with less than 10% in low- and middle-income countries [1].
Drawing ability is a promising means for developing such an automated cognitive assessment tool. Drawing tests have been widely used for screening cognitive impairments and dementia (eg, trail making [13] and clock drawing [14]), and automated analysis of the drawing process has shown that features characterizing the drawing process are sensitive to cognitive impairments and diagnoses of dementia [15][16][17][18]. For example, reduction in the drawing speed and increases in its variability, as well as increased pauses between drawing motions, have been reported as statistically significant features for assessment of impaired global cognition [19,20], as well as for detecting Alzheimer disease (AD) and mild cognitive impairment (MCI) [21][22][23][24]. Machine learning models based on these drawing features have succeeded in estimating measures of global cognition [25,26] and classifying AD, MCI, and control individuals [23-25,27]. However, there has been little evidence of the capability of automated analysis of the drawing process for assessment of cognitive performance across different populations, even though applicability across the intended populations is a requirement for machine learning-based health care tools, including those for screening of dementia [1,28,29].
In this study, we evaluated the applicability of automated analysis of the drawing process for estimating global cognition in community-dwelling older adults across populations in different nations. Specifically, we collected drawing data with a digital tablet, along with MoCA scores for assessing global cognition, from community-dwelling older adults in the United States and Japan. We then investigated the associations between the MoCA scores and drawing features across the 2 data sets. Finally, we built a machine learning model that used the drawing features to estimate MoCA scores, and we evaluated the model's generalizability from the US data set to the Japan data set.

Ethical Review
The study was approved by the University of California San Diego Human Research Protections Program (HRPP; project number 170466) and the Ethics Committee of the University of Tsukuba Hospital (H29-065). All participants provided written consent to participate in the study after the procedures of the study had been fully explained.

Participants
The participants were community-dwelling older adults recruited in San Diego County, California and in Ibaraki prefecture, Japan. For the US data set, the participants were residents of the independent living sector of a continuing-care senior housing community and were recruited through short presentations using an HRPP-approved script and flyer. For the Japan data set, the participants were individuals recruited through local recruiting agencies or community advertisements in accordance with the approved protocol. Both data sets represented subsets of larger cohort studies [24,30]. The participant selection criteria were as follows: (1) English-speaking (for the United States) or Japanese-speaking (for Japan) individuals ≥65 years old, (2) completion of the MoCA, (3) no known diagnosis of dementia, and (4) no other diseases or disabilities that would interfere with the collection of drawing data. Table 1 summarizes the participants' characteristics. We collected and analyzed drawing data and MoCA scores from a total of 92 community-dwelling older adults in the United States and Japan. The US data set included 55 participants aged 67-98 years (female: 39/55, 71%; age: mean 83.4, SD 6.9 years). The Japan data set included 37 participants aged 65-80 years (female: 19/37, 51%; age: mean 73.3, SD 4.5 years). Regarding the demographics, the proportion of female participants did not differ statistically between the 2 data sets (χ²₁=3.63, P=.06), while the age and years of education were higher in the US data set than in the Japan data set (age: t₉₀=7.79, P<.001; years of education: t₉₀=5.25, P<.001).

Data Analysis
All participants performed the Trail Making Test part B (TMT-B) [13] and MoCA. The TMT-B drawing data were collected using a Wacom Cintiq Pro 16 tablet (sampling rate: 180 Hz; drawing area size: 252 × 186 mm; pen pressure levels: 8192; pen inclination resolution: 1 degree) and custom Windows software that we developed. The software was written in the C# language and was used to capture raw drawing data from the tablet via the Wacom Wintab .NET library (version: 1.2). The raw data consisted of a time series of the pen tip's x- and y-coordinates, the pen pressure, the pen's horizontal and vertical inclinations, and the distance of the pen tip from the drawing surface. All data were captured at the tablet's sampling rate.
The TMT-B was selected as a representative cognitive task that involves drawing motions and is commonly used in clinical practice for screening AD and MCI [31,32]. It requires participants to draw lines that alternately connect a total of 25 numbers and letters in their respective sequences [13]. For the MoCA, we used the original paper-and-pencil version [6] for the US participants and its Japanese version [33] for the Japan participants. The total possible score on the MoCA ranges from 0 to 30, where lower scores indicate lower global cognition. Both TMT-B and the MoCA were administered by neuropsychologists or trained study staff who were blind to the study hypothesis during data collection. The US data set was collected between May 2019 and January 2020. The Japan data set was collected between December 2018 and May 2019.
Next, we extracted drawing features from the drawing data and examined their associations with the MoCA scores. Specifically, we investigated the following 6 automatically extracted drawing features: the drawing speed and its variability, the pressure variability, the variabilities of the pen's horizontal and vertical inclinations, and the pause:drawing duration ratio. These features were selected because they have been reported as significant indicators of changes in cognitive or motor functions [15,16,24,34]. The drawing speed represented the speed of the pen tip on the surface during drawing motions. The drawing speed variability was calculated using the coefficient of variation to remove the influence of the absolute value, as the drawing speed itself was also a feature. For the pressure variability, we used the median absolute deviation, which is more robust against outliers than the standard deviation. In contrast, the variabilities of the pen's horizontal and vertical inclinations were calculated using standard deviations. The pause:drawing duration ratio was defined as the ratio of the total duration of pauses between drawing motions (ie, between strokes and within a stroke) to the total duration of drawing motions on the surface. Pauses within a stroke were detected when the pen tip remained inside a 0.25-mm radius on the drawing surface for more than 100 milliseconds.
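The feature definitions above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the study: the Sample record, the function names, and the anchor-based pause detector are assumptions introduced here, and only the 0.25-mm radius, the 100-millisecond threshold, and the choice of dispersion statistics come from the text. The sketch also assumes that coordinates are already expressed in millimeters, that hover time before the first stroke and after the last stroke has been trimmed, that the drawing speed is summarized by its mean, and that population standard deviations are used.

```python
import math
import statistics
from typing import Dict, List, NamedTuple, Sequence


class Sample(NamedTuple):
    """One tablet reading (hypothetical layout, not the study's file format)."""
    t: float          # time in seconds
    x: float          # pen-tip x position in mm
    y: float          # pen-tip y position in mm
    pressure: float   # raw pen pressure level
    h_incl: float     # horizontal inclination in degrees
    v_incl: float     # vertical inclination in degrees
    on_surface: bool  # True while the pen tip touches the drawing surface


PAUSE_RADIUS_MM = 0.25  # radius stated in the text
PAUSE_MIN_SEC = 0.100   # duration threshold stated in the text


def in_stroke_pause_flags(samples: Sequence[Sample]) -> List[bool]:
    """Flag samples belonging to a within-stroke pause.

    Assumption: a pause is a run of on-surface samples that all stay within
    PAUSE_RADIUS_MM of the first sample of the run for more than PAUSE_MIN_SEC.
    """
    n = len(samples)
    flags = [False] * n
    i = 0
    while i < n:
        if not samples[i].on_surface:
            i += 1
            continue
        j = i
        while (j + 1 < n and samples[j + 1].on_surface
               and math.hypot(samples[j + 1].x - samples[i].x,
                              samples[j + 1].y - samples[i].y) <= PAUSE_RADIUS_MM):
            j += 1
        if samples[j].t - samples[i].t > PAUSE_MIN_SEC:
            for k in range(i, j + 1):
                flags[k] = True
            i = j + 1
        else:
            i += 1
    return flags


def extract_features(samples: Sequence[Sample]) -> Dict[str, float]:
    """Compute the 6 drawing features from one recording."""
    paused = in_stroke_pause_flags(samples)
    speeds: List[float] = []
    draw_time = 0.0
    pause_time = 0.0
    for k in range(1, len(samples)):
        a, b = samples[k - 1], samples[k]
        dt = b.t - a.t
        if dt <= 0:
            continue
        drawing = a.on_surface and b.on_surface and not (paused[k - 1] and paused[k])
        if drawing:
            speeds.append(math.hypot(b.x - a.x, b.y - a.y) / dt)
            draw_time += dt
        else:
            # Pen lifted between strokes, or resting within a stroke.
            pause_time += dt

    on = [s for s in samples if s.on_surface]
    pressures = [s.pressure for s in on]
    median_pressure = statistics.median(pressures)
    mean_speed = statistics.fmean(speeds)
    return {
        "speed": mean_speed,
        # Coefficient of variation: SD divided by the mean.
        "speed_cv": statistics.pstdev(speeds) / mean_speed,
        # Median absolute deviation around the median pressure.
        "pressure_mad": statistics.median([abs(p - median_pressure) for p in pressures]),
        "h_incl_sd": statistics.pstdev([s.h_incl for s in on]),
        "v_incl_sd": statistics.pstdev([s.v_incl for s in on]),
        "pause_draw_ratio": pause_time / draw_time,
    }
```

Anchoring each candidate pause at its first sample keeps the detector short; a detector based on a sliding window or a minimum enclosing circle would be a reasonable alternative and could yield slightly different pause boundaries.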
To investigate the associations of each drawing feature with the MoCA scores, Pearson correlation coefficients were computed after controlling for the age, sex, and years of education for the entire data set and for the US and Japan data sets separately. The 3 sociodemographic variables were considered as covariates, because they have been suggested to affect performance on cognitive screening tests, including the MoCA [35]. The following Python 3.8 libraries were used for the correlation analysis: pandas (version 1.2.4), NumPy (version 1.20.1), SciPy (version 1.6.2), and pingouin (version 0.4.0).
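As an illustration of what "controlling for" the covariates means here, the following sketch computes a partial Pearson correlation with the standard recursive formula, removing one covariate at a time. It is written with the standard library only so that it is self-contained; the function names are hypothetical, and the study itself relied on the libraries listed above (pingouin, for instance, offers a partial_corr routine), so this should not be read as the authors' code.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x: Sequence[float], y: Sequence[float],
              covariates: List[Sequence[float]]) -> float:
    """Correlation of x and y after removing the linear effect of covariates.

    Uses the recursion
        r_xy.Z,z = (r_xy.Z - r_xz.Z * r_yz.Z) / sqrt((1 - r_xz.Z^2) * (1 - r_yz.Z^2)),
    which is equivalent to correlating the residuals of x and y after
    regressing each on all covariates.
    """
    if not covariates:
        return pearson_r(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In the setting described above, x would be one drawing feature, y the MoCA score, and covariates the three lists holding age, sex (coded numerically), and years of education.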
We also developed a supervised machine learning model that used drawing features to estimate MoCA scores, and we then evaluated the model's applicability across data sets. The analysis workflow is illustrated in Figure 1A. Specifically, the model was trained on the US data set and tested on the Japan data set. For the machine learning model, we used the random forest algorithm to capture nonlinear relationships, given that nonlinear interactions between drawing features and cognitive impairments were observed in previous studies [23,24]. The random forest hyperparameters in this study were as follows: search range of 2, 3, and 4 for the maximum tree depth; 2, 3, 4, and 6 for the maximum number of features; 1.0, 0.75, and 0.5 for the proportion of the maximum number of samples to train each base regressor; and 2, 3, 4, and 5 for the minimum number of samples required at a leaf node. The number of trees was set to 500, and all other parameters were kept at their default values. The hyperparameters were tuned through 10-fold cross-validation within the training data set. We statistically evaluated the observed performance through permutation testing (1000 iterations) by randomizing the MoCA scores. To better interpret the results, the importance of each feature in the resultant model was also evaluated using the Shapley Additive Explanations (SHAP) method [36]. Specifically, we compared the mean absolute SHAP values of each feature. The following Python 3.8 libraries were used to perform the machine learning analysis: scikit-learn (version 0.23.2) and SHAP (version 0.40.0).
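The permutation test can be sketched as follows. The text states only that the MoCA scores were randomized over 1000 iterations, so the details below are assumptions: the sketch holds the model's predictions fixed and shuffles the observed scores, whereas a variant that retrains the model on shuffled training scores is equally conceivable. The function names are hypothetical, and the P value uses the common (hits + 1) / (iterations + 1) convention.

```python
import random
from typing import Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_p_value(y_true: Sequence[float], y_pred: Sequence[float],
                        n_iter: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return the observed R^2 and its one-sided permutation P value.

    Assumption: predictions stay fixed and the observed scores are shuffled,
    which breaks any true pairing between a participant and a prediction.
    """
    rng = random.Random(seed)
    observed = r_squared(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if r_squared(shuffled, y_pred) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from an impossible P of 0.
    return observed, (hits + 1) / (n_iter + 1)
```

Under this convention, the smallest attainable P value with 1000 iterations is 1/1001, which is just below .001.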

Results
The mean MoCA score was 24.4 (SD 3.0; range for participants: 16-30; possible range: 0-30), and the scores did not differ statistically between the 2 data sets (t₉₀=0.02, P=.99; Table 1). For the collection of drawing data, each session took an average of 119.7 (SD 64.6) seconds per participant. The mean TMT-B time and number of errors were 117.9 (SD 61.7) seconds and 1.4 (SD 2.2), respectively. The TMT-B time was longer in the US data set (t₈₈=2.72, P=.008), while the number of errors did not differ statistically between the 2 data sets (t₈₈=1.82, P=.07). Two participants (US: 1; Japan: 1) could not complete the TMT-B trial. To include them in the analysis, we used features extracted from their partial drawing data.
For the correlation analysis between the MoCA scores and each drawing feature in the entire data set, we found that 4 of the 6 features were significantly associated after controlling for age, sex, and years of education (absolute Pearson r=0.33-0.49, P≤.002; see Figure 1B for a correlation example and Table 2 for the full list). With lower MoCA scores, there tended to be higher variability in the drawing speed and pen pressure, a higher pause:drawing duration ratio, and lower variability in the pen's horizontal inclination. As listed in Table 2, these tendencies were also observed when the 2 data sets were each analyzed separately. After correction for multiple comparisons, all the statistically significant correlations remained for the entire data set and the Japan data set (Benjamini-Hochberg adjusted P<.05), whereas those for the US data set lost significance (Benjamini-Hochberg adjusted P>.05).
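For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the adjustment can be sketched in a few lines. The function name is hypothetical, and the sketch illustrates the general procedure rather than the study's own code: each P value is multiplied by the number of tests divided by its rank, and the results are then made monotone from the largest P value downward.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values never
    # increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A correlation is then reported as significant when its adjusted P value is below .05.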
The random forest model trained on the US data set could estimate MoCA scores from drawing features for the Japan data set with an R² of 0.35 (Pearson r of 0.61, mean absolute error of 1.75, and root-mean-square error of 2.12; permutation test, P<.001; Figure 1C). Regarding the importance of each feature in the model, as indicated by the SHAP values, the variability of the pen's horizontal inclination had the highest importance, followed by the pressure variability and the drawing speed variability (Figure 1D).

Principal Findings
We collected drawing data from 92 community-dwelling older adults in the United States and Japan, and we investigated the associations between features characterizing the drawing process and global cognition as assessed by MoCA. We obtained 2 main findings, as follows. First, we found drawing features that showed consistent trends with respect to the changes in MoCA scores across the US and Japan data sets. Specifically, with low MoCA scores, there tended to be higher variability in the drawing speed, a higher pause:drawing duration ratio, and lower variability in the pen's horizontal inclination. Our second finding was that the automated machine learning model trained on the drawing data in the US data set could estimate the MoCA scores for the Japan data set with an R² of 0.35, particularly by leveraging variability-related features. We used drawing data from the TMT-B task in this study, but other types of drawing tasks may have a similar capability. For example, a previous study showed that MoCA scores could be estimated by using pause- and speed-based features from a clock drawing task [26], although the method's applicability across populations was not evaluated. The use of 2 or more tasks will be a promising area of future research for more reliable estimation of global cognition.
Regarding the correlations of drawing features with MoCA scores across the US and Japan data sets, the correlations persisted even after controlling for age, sex, and years of education. In post hoc power analysis, the power exceeded 0.90 with a significance level of .05 (2-sided). The trends were consistent with those observed in previous studies with individuals with impaired global cognition [19,20] or patients with AD or MCI [21][22][23][24]. One of our contributions lies in demonstrating consistent trends between drawing features and clinical cognitive scores across 2 different populations by using the same protocol. It is especially notable that the pause:drawing duration ratio and the drawing speed variability have been reported as representative features for use in AD or MCI screening models based on automated analysis of the drawing process [23,24]. To our knowledge, the models in those previous studies were not tested for their applicability across different populations, but our results suggest that these drawing features may help with the application of screening models across populations for international use.
We have presented preliminary evidence suggesting that automated analysis of the drawing process for estimation of global cognition can be applied across populations. We trained the machine learning model on drawing data in the US data set, and we then evaluated its performance on unseen drawing data in the Japan data set. In this context, the model could estimate MoCA scores with an R² of 0.35 (Pearson r of 0.61 and root-mean-square error of 2.12). Previous studies investigated models that used a single data set to estimate global cognition from the characteristics of drawing or other types of behaviors such as speech. The performance results for those models included a Pearson correlation coefficient of 0.55 for MoCA on a model using drawing features [26] and a root-mean-square error of 3.74 for MMSE on the best model using speech features in a competition [37]. Our model outperformed those recent results, although there are notable methodological differences in terms of the evaluation method and the sample size, for example. Our model's improved performance might have derived from the use of variability-related features, given that they were ranked as the most important features in our model. Variability-related features in drawing have recently been suggested as a potential marker for motor control deterioration in dementia [19,38,39], but they have rarely been used for estimating cognitive function, and they have not been tested across populations. Our results thus suggest that variability-related features in drawing may be a key behavioral marker for automatic assessment of global cognition across different populations.
With the aging of populations worldwide, there is a growing interest in using digital technology to assess cognitive function in nonclinical settings like the home for early detection of dementia [1]. Examples of such research include approaches using computerized cognitive tests [29,40-42] and using behavioral data such as drawing, speech, and gait data [24,32,43-45]. In either approach, a major challenge is to make the tool suitable for multinational and multilingual populations [1]. In this context, our results suggest that automated analysis of the drawing process may offer a promising approach for developing such a tool for international use.
Furthermore, the approach using behavioral data is expected to support future efforts toward the development of continuous, passive monitoring tools for early detection of dementia from data that can be collected in everyday life [43,45]. For example, multiple studies have demonstrated the feasibility of detecting cognitive impairments by using daily walking behavior collected from accelerometer sensors in a free-living setting [46][47][48] and by using daily conversational speech data [49][50][51][52]. To our knowledge, no study has investigated the associations of cognitive impairments with daily drawing data that are collected passively in a free-living setting. However, drawing may be a promising behavioral modality for reliable estimation of cognitive impairments: It is a common activity in everyday life, and drawing data can be easily and robustly collected with a commercial-grade device.
Regarding the device used for drawing data collection, previous studies have shown the usefulness of a range of devices, including a mobile tablet with a stylus [53][54][55][56][57], a smart pad [58], and a digital pen [23,26,38]; accordingly, our findings may be applicable to those devices as well. All such devices commonly allow capture of x- and y-coordinates and pressure data at similar sampling rates, and previous studies reported similar associations of pause-, speed-, and pressure-based features with cognitive measures. In a future study, as pen inclination data are not always available, we will need to examine whether a combination of other available data can achieve performance comparable to that of our model. Furthermore, the variability of the device placement (eg, holding the tablet with the nondominant hand) can affect the drawing performance in free-living settings. We will thus need further research in situ for the development of realistic applications.

Limitations
This study had several limitations. First, it was limited in terms of the numbers of participants, drawing tasks, and data sets. Our findings were based on drawing data from a single task, and the applicability to other types of drawing data thus remains unexplored. In addition, the international applicability of our model was only evaluated between 2 data sets, and the details of how the model performance is influenced by cultural differences have not been thoroughly investigated. Together, our findings have yet to be confirmed with larger samples that provide cross-cultural insights. Second, we did not investigate the participants' sensory and physical functions (eg, eyesight, grip strength), even though those functions might affect drawing performance. Moreover, other residual confounders might exist. Third, the drawing data were collected in a laboratory setting with a tester; accordingly, a future study will need to establish the validity of fully self-administered tasks. Finally, further research will also be needed to obtain a mechanistic understanding of how drawing features relate to the neural changes underlying cognitive impairments.

Conclusions
In summary, we have presented empirical evidence of the capability of automated analysis of the drawing process as an estimator of global cognition that is applicable across populations. Although no causality could be inferred from our results with cross-sectional data, the results nevertheless suggest that automated analysis of the drawing process could be a practical tool for international use in automated cognitive assessment. Consequently, this approach may help lower the barrier to early detection of cognitive impairments in a variety of settings and populations.