Background: Although hyperactivity is a core symptom of attention-deficit/hyperactivity disorder (ADHD), there are no objective measures that are widely used in clinical settings.
Objective: We describe the development of a smartwatch app to measure hyperactivity in school-age children. The LemurDx prototype is a software system for smartwatches that uses wearable sensor technology and machine learning to measure hyperactivity. The goal is to differentiate children with ADHD combined presentation (a combination of inattentive and hyperactive/impulsive presentations) or predominantly hyperactive/impulsive presentation from children with typical levels of activity.
Methods: In this pilot study, we recruited 30 children, aged 6 to 11 years, to wear a smartwatch with the LemurDx app for 2 days. Parents also provided activity labels for 30-minute intervals to help train the algorithm. Half of the participants had ADHD combined presentation or predominantly hyperactive/impulsive presentation (n=15), and half were in the healthy control group (n=15).
Results: The results indicated high usability scores and an overall diagnostic accuracy of 0.89 (sensitivity=0.93; specificity=0.86) when the motion sensor output was paired with the activity labels.
Conclusions: State-of-the-art sensors and machine learning may provide a promising avenue for the objective measurement of hyperactivity.
Attention-Deficit/Hyperactivity Disorder and the Need for Objective Measurement of Hyperactivity
ADHD is the most common neurodevelopmental disorder of early childhood, affecting over 5% of American children. There are 3 presentations of ADHD: (1) predominantly inattentive presentation, (2) predominantly hyperactive/impulsive presentation, and (3) combined presentation. In school-age children, ADHD predominantly hyperactive/impulsive presentation and combined presentation make up 55% of all ADHD cases [ ]. Although objective assessment tools such as the Conners Continuous Performance Test 3rd Edition (Conners CPT 3) exist to measure inattention (the core symptom of the predominantly inattentive presentation of ADHD), there are no comparable objective tools to measure hyperactivity (the core symptom of ADHD predominantly hyperactive/impulsive presentation). Instead, the current standard measurement of hyperactivity consists of subjective reports from parents and teachers via questionnaires such as the Vanderbilt Assessment Scales. Reliance on subjective questionnaires to measure hyperactivity is a significant public health concern because it leads to misdiagnosis, including both overdiagnosis and underdiagnosis [ - ]. Overdiagnosis can lead to unnecessary treatment, while underdiagnosis can lead to delayed treatment [ , ].
Sensor Technology and Machine Learning
Advances in sensor technology and machine learning provide opportunities to develop new diagnostic methods with greater objectivity and precision. Wearable technologies (eg, smartwatches) with state-of-the-art sensors are practical, cost-effective solutions for providing objective measures of hyperactivity in children. The wide array of sensors (eg, accelerometers) embedded in wearable technology offers new opportunities to develop objective and accurate measures of hyperactivity. Although actigraphy has been around for decades and has been used extensively in research settings, its use has largely been confined to sleep studies. Actigraphy has been used in a limited number of studies on children with ADHD, typically to measure sleep duration [ - ], rather than to quantify daytime levels of hyperactivity to aid in diagnosis.
Multilevel Classification to Determine the Context of Symptoms
Context is critical in correctly diagnosing hyperactivity. For example, symptoms of ADHD include “often leaves seat in classroom or in other situations in which remaining seated is expected” and “often runs about or climbs excessively in situations in which it is inappropriate”. While running and climbing on the playground do not contribute to a diagnosis of ADHD, running and climbing in a classroom do. Machine learning can be applied to sensor data to establish the context in which hyperactivity is present. Context is a combination of activity and situation. To assess context, we have developed a multilevel classification approach that first classifies the wearer’s activity, then contextualizes the level of motion, and finally evaluates activity level based on that context. The method analyzes hand motion to detect various activities, capturing the relationship between the wearer’s condition, activity, and magnitude of motion through accelerometer, time, and location data. Although it is not possible to have a class for every activity a user might perform (eg, fidgeting or other nonpurposeful motion), LemurDx classifies activities into common categories as a first layer of classification, which is sufficient to condition the algorithm parameters.
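To make the layering concrete, here is a toy Python sketch of the second and third layers: each window's motion is expressed relative to what is typical for its activity context, and windows with unusually high motion for that context are flagged. The activity for each window is assumed to come from the first layer (an activity classifier or a parent label), which is not shown. The context names, the reference statistics in `CONTEXT_NORMS`, and the 2-SD cutoff are hypothetical placeholders, not LemurDx parameters.

```python
from statistics import mean

# Hypothetical reference statistics (typical mean, typical SD of motion magnitude)
# per activity context. Placeholder values for illustration only.
CONTEXT_NORMS = {
    "sitting_quiet": (0.05, 0.02),
    "everyday": (0.20, 0.08),
    "exercise": (0.60, 0.25),
}


def contextual_z(activity: str, magnitudes: list[float]) -> float:
    """Layer 2: express a window's mean motion relative to the norm for its context."""
    ref_mean, ref_sd = CONTEXT_NORMS[activity]
    return (mean(magnitudes) - ref_mean) / ref_sd


def flag_excess_motion(windows: list[tuple[str, list[float]]], z_cut: float = 2.0) -> dict[str, float]:
    """Layer 3: per context, the fraction of windows whose motion exceeds z_cut SDs above the norm.

    Each window is an (activity, magnitudes) pair; the activity is assumed to be
    supplied by layer 1 (an activity classifier or a parent label).
    """
    flags: dict[str, list[bool]] = {}
    for activity, mags in windows:
        flags.setdefault(activity, []).append(contextual_z(activity, mags) > z_cut)
    return {act: sum(f) / len(f) for act, f in flags.items()}


windows = [
    ("sitting_quiet", [0.05, 0.06, 0.04]),
    ("sitting_quiet", [0.20, 0.25, 0.15]),
    ("exercise", [0.70, 0.80, 0.75]),
]
print(flag_excess_motion(windows))  # {'sitting_quiet': 0.5, 'exercise': 0.0}
```

The same mean magnitude of 0.20 is typical for everyday activity but more than 7 SDs above the placeholder norm for sitting quietly, which mirrors the playground-versus-classroom example above.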
We describe the development of a smartwatch app to measure hyperactivity in school-age children. The LemurDx prototype is a software system for smartwatches that uses built-in sensors and machine learning to measure hyperactivity, with the goal of differentiating children with ADHD combined presentation or predominantly hyperactive/impulsive presentation from children with typical levels of activity. In this pilot study, we used LemurDx and supervised machine learning models paired with activity data from 30 children to develop initial classification algorithms. We report on usability scores from the LemurDx prototype and accuracy results from the initial algorithms.
This pilot study tested the feasibility of collecting, storing, and analyzing motion data, as well as contextual data (ie, GPS, heart rate, Bluetooth), from children aged 6 to 11 years who wore an Apple smartwatch with the LemurDx app for 2 days. The final analyses included data from the days on which participants with ADHD were unmedicated, combined with the contextual data extracted from the smartwatch sensors and the activity labels.
The project was approved by the Institutional Review Board (19040006) at the University of Pittsburgh.
Participants were recruited via a web-based research registry called Pitt + Me, through the University of Pittsburgh’s Clinical and Translational Science Institute program. The research staff subsequently contacted interested participants via phone to complete the eligibility screening. The sample consisted of 30 children aged 6 to 11 years (ADHD combined presentation or hyperactive presentation=15; non-ADHD=15) and their families. Inclusion criteria for the ADHD sample included a formal diagnosis of ADHD combined presentation or hyperactive presentation, which was confirmed using the ADHD module of the Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime Version (K-SADS-PL) diagnostic interview and a score of ≥10 on the hyperactivity items of the Vanderbilt Assessment Scale–Parent report (VAS-P). Exclusion criteria included serious child psychopathology requiring alternative treatment (eg, bipolar disorder, major depressive disorder, psychosis, autism spectrum disorder).
The VAS-P is a 47-item survey that rates behaviors by their frequency of occurrence. Scores are broken down into inattentive, hyperactive/impulsive, and combined subtypes. For the purposes of this study, only the 5 hyperactive subtype questions were asked, to determine the level of the child’s hyperactive behavior. Symptoms are rated on a Likert scale from 0 to 3.
K-SADS-PL Diagnostic Interview
The K-SADS-PL is a semistructured diagnostic interview designed to assess current and past episodes of psychopathology in children and adolescents according to Diagnostic and Statistical Manual criteria. Probes and objective criteria are provided to rate individual symptoms. For the purposes of this study, the research staff asked only about the items related to the ADHD diagnostic criteria in the ADHD supplement.
Post-Study System Usability Questionnaire
The Post-Study System Usability Questionnaire (PSSUQ) is a 19-item survey that assesses 3 factors: system usefulness, information quality, and interface quality. The survey uses a 7-point Likert scale on which participants indicate the degree to which they agree or disagree with each item. Lower scores indicate stronger agreement (ie, more favorable usability ratings), while higher scores indicate weaker agreement.
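Because PSSUQ scores are item averages and lower values mean stronger agreement, a score near 1 indicates very favorable ratings. The Python sketch below scores one respondent. It assumes the item-to-subscale mapping commonly given for the 19-item version (system usefulness: items 1-8; information quality: items 9-15; interface quality: items 16-18; overall: items 1-19), which should be checked against the instrument, and it treats skipped items as missing rather than as zero.

```python
from statistics import mean
from typing import Optional

# Assumed 1-based item ranges for the 19-item PSSUQ (verify against the instrument).
SUBSCALES = {
    "overall": range(1, 20),              # items 1-19
    "system_usefulness": range(1, 9),     # items 1-8
    "information_quality": range(9, 16),  # items 9-15
    "interface_quality": range(16, 19),   # items 16-18
}


def score_pssuq(responses: list[Optional[int]]) -> dict[str, float]:
    """Mean 1-7 rating per subscale (lower = more favorable); None marks a skipped item."""
    if len(responses) != 19:
        raise ValueError("expected responses to all 19 items")
    if any(r is not None and not 1 <= r <= 7 for r in responses):
        raise ValueError("responses must be on the 1-7 scale")
    scores = {}
    for name, items in SUBSCALES.items():
        answered = [responses[i - 1] for i in items if responses[i - 1] is not None]
        scores[name] = mean(answered) if answered else float("nan")
    return scores
```

Scoring each respondent this way and then averaging across respondents gives group-level means of the kind reported in the Results.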
Parents also provided activity labels to help train the algorithm. The activities were logged at a 30-minute resolution. For each 30-minute block from 6 AM to midnight, the parent chose, from a drop-down menu of 5 activity categories, the one that best summarized the child’s activities during that block. The activity categories were as follows: sleeping (eg, napping, sleeping); sitting/quiet activity (eg, watching TV, reading a book, using the internet); everyday/household activity (eg, taking a walk, cleaning room, going shopping, playing an instrument); exercise (eg, playing a sport, running, playing on the playground); and not wearing the watch. The activity labels added a layer of context to the collected motion data. The deidentified activity labels were intended to be aligned with the sensor data so that we could check the fidelity of the sensor data and qualify outliers found in it.
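Pairing sensor samples with parent labels amounts to mapping each timestamp to the 30-minute block that contains it. Below is a standard-library sketch, assuming the labels are stored in a dictionary keyed by block start time and using hypothetical category identifiers such as `"not_wearing"`; neither detail comes from LemurDx. Samples before 6 AM, in unlabeled blocks, or labeled as not wearing the watch are dropped.

```python
from datetime import datetime
from typing import Optional

DAY_START_HOUR = 6  # labels covered 6 AM to midnight
NOT_WORN = "not_wearing"  # hypothetical identifier for the "not wearing the watch" option


def block_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 30-minute labeling block."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def label_for(ts: datetime, labels: dict[datetime, str]) -> Optional[str]:
    """Parent label covering `ts`, or None before 6 AM or when the block was not labeled."""
    if ts.hour < DAY_START_HOUR:
        return None
    return labels.get(block_start(ts))


def attach_labels(
    samples: list[tuple[datetime, float]], labels: dict[datetime, str]
) -> list[tuple[datetime, float, str]]:
    """Pair each (timestamp, value) sample with its label; drop unlabeled and not-worn samples."""
    out = []
    for ts, value in samples:
        label = label_for(ts, labels)
        if label is not None and label != NOT_WORN:
            out.append((ts, value, label))
    return out
```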
After obtaining informed consent from the parent and assent from the child, the participants wore an Apple smartwatch with the LemurDx app running on it for 2 days. One arm of the study (n=15) consisted of children diagnosed with ADHD predominantly hyperactive/impulsive presentation or combined presentation, confirmed using the K-SADS-PL. The children wore the smartwatch for at least 1 day when they did not take their medication (eg, Saturday, Sunday, medication holidays), given that properly titrated stimulant medication reduces hyperactivity. The control arm (n=15) included children without an ADHD diagnosis, confirmed using the K-SADS-PL. The parents also provided activity labels via an automated remote assessment.
Data Processing and Analyses
The data processing and analysis pipeline consisted of 3 main steps: (1) feature extraction to calculate a set of motion and behavioral features over different time periods, (2) feature selection to identify a set of useful features and reduce the dimensionality, and (3) modeling the final set of features to identify children with ADHD using a supervised machine learning approach.
We computed 3 sets of features from the motion data collected on the watch. The first set described the shape of the motion curves over time and included features such as skewness and kurtosis. These features allowed us to identify the type of motion the children were making. The second set included statistical summaries of the motion data, such as the mean, variance, median, magnitude, and hour quantiles of the observed motion. These features tend to capture both the amount of motion and changes in motion. The third set included the cumulative motion recorded by the watch, which captured the total amount of motion exhibited over a time window. We calculated all 3 sets of features for each of the 3 axes of the acceleration data and over 3 time windows (ie, 1, 5, and 10 minutes). In addition to calculating these features over the whole day, we also calculated them separately for the times of day when the children were performing specific activities, as recorded in the activity labels. We used features from 3 activity classes: sitting/quiet, exercise, and everyday/household activity. Next, we handled features that were missing because of missing sensor data. Sensor data were occasionally missing because of technical issues with the app or watch, or because of compliance and human factors issues (eg, the family forgot to charge the watch). We imputed all missing features with a value of –1.
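A minimal NumPy sketch of the per-window computation for one acceleration axis follows. It covers a subset of the features listed above (mean, variance, median, skewness, excess kurtosis, and cumulative motion) over consecutive, non-overlapping windows and imputes –1 for windows with no valid samples, as in the study. The sampling rate, the definition of cumulative motion as the sum of absolute acceleration, and the non-overlapping windows are assumptions; the paper does not specify them.

```python
import numpy as np

MISSING = -1.0  # imputation value for features that cannot be computed
FEATURES = ("mean", "variance", "median", "skewness", "kurtosis", "cumulative")


def window_features(x: np.ndarray) -> dict[str, float]:
    """Shape, summary, and cumulative features for one window of one acceleration axis."""
    x = x[~np.isnan(x)]  # drop samples lost to sensor dropouts
    if x.size == 0:
        return {name: MISSING for name in FEATURES}
    mu, sd = float(x.mean()), float(x.std())
    feats = {
        "mean": mu,
        "variance": float(x.var()),
        "median": float(np.median(x)),
        "cumulative": float(np.abs(x).sum()),  # assumed definition of cumulative motion
    }
    if sd > 0:
        z = (x - mu) / sd
        feats["skewness"] = float(np.mean(z**3))
        feats["kurtosis"] = float(np.mean(z**4) - 3.0)  # excess kurtosis
    else:
        # Shape statistics are undefined for a constant window; treat them as missing.
        feats["skewness"] = feats["kurtosis"] = MISSING
    return feats


def windowed_features(signal: np.ndarray, fs_hz: float, window_min: float) -> list[dict[str, float]]:
    """Compute features over consecutive, non-overlapping windows of `window_min` minutes."""
    n = max(1, round(fs_hz * 60 * window_min))  # samples per window
    return [window_features(signal[i:i + n]) for i in range(0, len(signal), n)]
```

In the study's design, this would be run for each of the 3 axes at window lengths of 1, 5, and 10 minutes, and again on the samples carrying each activity label.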
We used the randomized logistic regression (RLR) method to select an optimal set of features before classifying the data. RLR repeatedly subsamples the data and scores each feature by how often it is selected in a logistic regression classification task. This approach usually leads to a stable and reproducible set of selected features. We selected the top 20 features ranked by the RLR implementation in scikit-learn.
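scikit-learn's `RandomizedLogisticRegression` class, which implemented this idea, was deprecated in version 0.19 and removed in 0.21, so rerunning the procedure on a current release requires a reimplementation. The sketch below is one such reimplementation in the spirit of stability selection (Meinshausen and Bühlmann), not the removed class itself: it repeatedly subsamples half of each class, fits an L1-penalized logistic regression with randomly down-weighted features, and ranks features by how often they receive a nonzero coefficient. The regularization strength `C`, the scaling factor, and the number of resamples are illustrative defaults, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def stability_scores(X, y, n_resamples=200, sample_fraction=0.5, scaling=0.5, C=1.0, seed=0):
    """Selection frequency of each feature across randomized L1 logistic fits.

    X should be standardized so that a single C value is meaningful for every column.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    counts = np.zeros(p)
    for _ in range(n_resamples):
        # Subsample within each class so every fit sees both classes.
        rows = np.concatenate([
            rng.choice(idx, size=max(1, int(sample_fraction * idx.size)), replace=False)
            for idx in (np.flatnonzero(y == c) for c in np.unique(y))
        ])
        # Down-weight a random subset of features, which raises their effective L1 penalty.
        weights = rng.choice([scaling, 1.0], size=p)
        model = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        model.fit(X[rows] * weights, y[rows])
        counts += model.coef_[0] != 0
    return counts / n_resamples


def top_k_features(X, y, k=20, **kwargs):
    """Column indices of the k most frequently selected features, most stable first."""
    scores = stability_scores(X, y, **kwargs)
    return np.argsort(scores)[::-1][:k]
```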
We used Python’s scikit-learn library (Python Software Foundation) for model building and for all analyses. We tried 3 types of learning algorithms: random forests, support vector machines (SVMs), and logistic regression. We chose these algorithms for their ability to generalize, to capture inherent nonlinearity in the data (using specific kernels in the case of SVMs), and to model noisy data. Our analyses showed that random forests (with 2000 decision stumps or estimators) gave the best performance. Analyses were performed with leave-one-participant-out cross-validation so that performance was always estimated on participants unseen during training and was not inflated by overfitting. This approach builds a separate model for each held-out participant in the validation phase and ensures that no participant’s data are shared between training and testing.
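Leave-one-participant-out cross-validation corresponds to scikit-learn's `LeaveOneGroupOut` splitter, with participant IDs passed as `groups` so that all rows from one child are held out together. The sketch below assumes a feature matrix with one or more rows per child (a hypothetical layout) and sets only `n_estimators` to match the paper. It uses scikit-learn's default fully grown trees; if "decision stumps" meant depth-1 trees, `max_depth=1` would be the corresponding setting. The majority-vote helper that turns row-level predictions into one label per child is likewise an assumption, not a step described in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict


def lopo_predictions(X, y, participant_ids, n_estimators=2000, seed=0):
    """Out-of-fold predictions in which each fold holds out all rows of one participant."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    return cross_val_predict(model, X, y, groups=participant_ids, cv=LeaveOneGroupOut())


def participant_votes(pred, participant_ids):
    """Hypothetical aggregation: one label per participant by majority vote over its rows."""
    return {
        int(pid): int(np.mean(pred[participant_ids == pid]) >= 0.5)
        for pid in np.unique(participant_ids)
    }
```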
Key demographic variables and hyperactivity scores are summarized in the table below.
|Characteristics (N=30)|ADHDᵃ (n=15)|Non-ADHD (n=15)|
|---|---|---|
|Age (years), mean (SD)|9.6 (1.6)|10.1 (1.8)|
|Gender, n (%)| | |
|Female|6 (40.0)|9 (60.0)|
|Male|8 (53.3)|6 (40.0)|
|Other|1 (6.7)|0 (0)|
|Race, n (%)| | |
|White|11 (73.3)|14 (93.3)|
|Black or African American|1 (6.7)|1 (6.7)|
|More than one race|1 (6.7)|0 (0)|
|Chose not to answer|2 (13.3)|0 (0)|
|VAS-Pᵇ hyperactivity score, mean (SD)|11.5 (2.2)|1.9 (1.7)|

ᵃADHD: attention-deficit/hyperactivity disorder.
ᵇVAS-P: Vanderbilt Assessment Scale–Parent report.
Sensor data from the LemurDx app were successfully collected for 28 of the 30 child participants. Two participants accidentally turned off location recording in the watch settings, and for 3 other participants the watch failed to record heart rate. In addition, 5 participants (3 in the ADHD group and 2 in the control group) had trouble recording additional sensor data; we suspect this was due to a loose fit of the watch on the children’s small wrists. Overall PSSUQ usability was high (mean 1.81, SD 0.93; lower scores indicate stronger agreement), as were all 3 subscale scores: usefulness (mean 1.81, SD 1.13), information quality (mean 1.75, SD 0.89), and interface quality (mean 1.92, SD 1.16). Three common themes emerged from the qualitative survey responses: challenges with the app’s interface, low battery life, and enjoyment of the app. Fewer than 20% of the participants had some trouble with the watch’s interface. Qualitative feedback is summarized in the table below.
|Theme|Example quotes|Frequency, n (%)|
|---|---|---|
|Challenges with the interface| |5 (17)|
|Low battery life| |4 (13)|
|Enjoyed the app| |3 (10)|
The top 20 motion features extracted from the motion sensor data are summarized in the table below.
Using motion sensor data alone, the model was no better than chance at differentiating between ADHD and non-ADHD children (accuracy=0.46; χ²₁=0.14, P=.70). When the motion sensor data were paired with the contextual sensors (ie, GPS, heart rate, Bluetooth), model performance improved substantially (accuracy=0.71), and the model could differentiate between ADHD and non-ADHD children at better than chance level (χ²₁=5.25, P=.02). Model performance was best when the sensor data were paired with the activity labels (accuracy=0.89), reliably differentiating between ADHD and non-ADHD children (χ²₁=17.37, P<.001). Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each model are summarized in the model performance table. Finally, our analyses showed that the magnitude of motion when the child was expected to “sit quietly” was the biggest differentiator between ADHD and non-ADHD children, consistent with a clinical profile of hyperactivity.
|Number|Motion feature|Axis|Time interval|
|---|---|---|---|
|1|Cumulative variance|X-axis|10 minutes|
|2|Cumulative mean|X-axis|1 minute|
|3|Cumulative mean|X-axis|5 minutes|
|4|Cumulative mean|Y-axis|10 minutes|
|5|Cumulative variance|Z-axis|1 minute|
|6|Cumulative mean|All 3 axes|10 minutes|
|7|Cumulative variance|All 3 axes|5 minutes|
|8|Mean motion|X-axis|10 minutes|
|Model|Accuracy|Sensitivity|Specificity|PPV|NPV|
|---|---|---|---|---|---|
|Motion sensors plus activity labels|0.89|0.93|0.86|0.87|0.92|
|Motion sensors plus contextual sensorsᵃ|0.71|0.79|0.64|0.69|0.75|
|Motion sensors alone|0.46|0.50|0.43|0.47|0.46|

ᵃContextual sensors included GPS, heart rate, and Bluetooth.
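All 5 metrics in the table, and the χ² statistics reported above, can be computed from a 2×2 confusion matrix. The Python sketch below does this from raw counts. The counts themselves are not reported in the paper: they are back-calculated from the table under the assumption that the 28 children with usable sensor data were split evenly (14 per group). Under that assumption, the rounded metrics match every row of the table, and a Pearson χ² test of independence on the confusion matrix without continuity correction yields 17.37, 5.25, and 0.14, matching the reported statistics. This is offered as a consistency check, not as a statement of which test the authors ran.

```python
import math


def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def chi2_2x2(tp, fn, fp, tn):
    """Pearson chi-square statistic (df=1, no continuity correction) and its P value."""
    n = tp + fn + fp + tn
    num = n * (tp * tn - fn * fp) ** 2
    den = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    chi2 = num / den
    # Survival function of chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# (TP, FN, FP, TN) back-calculated from the reported metrics, assuming 14 children per group.
models = {
    "Motion sensors plus activity labels": (13, 1, 2, 12),
    "Motion sensors plus contextual sensors": (11, 3, 5, 9),
    "Motion sensors alone": (7, 7, 8, 6),
}
for name, counts in models.items():
    m = metrics(*counts)
    chi2, p = chi2_2x2(*counts)
    print(name, {k: round(v, 2) for k, v in m.items()}, f"chi2={chi2:.2f}, P={p:.3f}")
```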
This study examined the LemurDx smartwatch app prototype in a sample of 30 children. Usability scores were high, pointing to the potential clinical utility of this approach for providing an objective measure of hyperactivity. However, qualitative feedback pointed to some issues with the interface and battery life, indicating that further development is needed in these areas. Despite these limitations, the app performed well enough to collect usable sensor data from 93% (28/30) of the sample and, when motion data were paired with activity labels, to classify children with high accuracy.
As expected from past actigraphy studies, motion data alone were a poor classifier of hyperactivity. Using motion sensors alone, model performance was no better than chance at differentiating children with ADHD (hyperactive/impulsive or combined presentations) from those in the healthy control group. Accuracy improved substantially when contextual information and activity labels were added to the models. These results suggest that contextual information is important when using motion sensor data to make inferences about the presence of hyperactivity.
These promising results point to the value of further research on contextualizing motion data for clinical purposes. Using the full range of sensors in modern smartwatches could allow us to further refine the machine learning algorithms, and such refinements would likely increase accuracy by incorporating variables specific to each child. The LemurDx app also has the potential to provide an objective measure of response to stimulant medication, giving clinicians objective data to inform medication titration decisions. Overall, the data from this study support the further refinement of the LemurDx app and algorithms in order to provide an objective measure of hyperactivity that supplements subjective parent and teacher questionnaires.
A limitation of this study was the absence of children with borderline levels of hyperactivity. Only children who met specific cutoffs on the Vanderbilt hyperactivity screen were recruited: children had to score 8 or above for the ADHD condition and 5 or below for the control condition. Recruiting children who scored 8 or higher but still met criteria for the control condition would help diversify the data. Another limitation of this pilot study was the small sample size: only 30 families were included, with usable motion data from 28 children. The sample size was, however, sufficient for this preliminary pilot work. A larger sample in the future should allow for stronger indicators of context, better visualization tools for clinicians, and more precise machine learning models.
This study was supported by a grant to OL, MG, and SS from the National Institute of Mental Health (R41MH119644). JLH is now at the University of Iowa.
Conflicts of Interest
- Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition. Washington, DC: American Psychiatric Association; 2013.
- Willcutt EG. The prevalence of DSM-IV attention-deficit/hyperactivity disorder: a meta-analytic review. Neurotherapeutics 2012 Jul 15;9(3):490-499.
- Ford-Jones P. Misdiagnosis of attention deficit hyperactivity disorder: 'Normal behaviour' and relative maturity. Paediatr Child Health 2015 May;20(4):200-202.
- Watson GL, Arcona AP, Antonuccio DO, Healy D. Shooting the messenger: the case of ADHD. J Contemp Psychother 2014;44:43-52.
- Layton TJ, Barnett ML, Hicks TR, Jena AB. Attention deficit–hyperactivity disorder and month of school enrollment. N Engl J Med 2018 Nov 29;379(22):2122-2130.
- Batstra L, Nieweg E, Pijl S, Van Tol DG, Hadders-Algra M. Childhood ADHD: a stepped diagnosis approach. J Psychiatr Pract 2014 May;20(3):169-177.
- Manos MJ, Giuliano K, Geyer E. ADHD: overdiagnosed and overtreated, or misdiagnosed and mistreated? Cleve Clin J Med 2017 Nov 01;84(11):873-880.
- Sadeh A. The role and validity of actigraphy in sleep medicine: an update. Sleep Med Rev 2011 Aug;15(4):259-267.
- Paavonen E, Räikkönen K, Lahti J, Komsi N, Heinonen K, Pesonen AK, et al. Short sleep duration and behavioral symptoms of attention-deficit/hyperactivity disorder in healthy 7- to 8-year-old children. Pediatrics 2009 May;123(5):857-864.
- Cortese S, Faraone SV, Konofal E, Lecendreux M. Sleep in children with attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry 2009;48(9):894-908.
- Cohen-Zion M, Ancoli-Israel S. Sleep in children with attention-deficit hyperactivity disorder (ADHD): a review of naturalistic and stimulant intervention studies. Sleep Med Rev 2004 Oct;8(5):379-402.
- Wolraich M, Lambert W, Doffing M, Bickman L, Simmons T, Worley K. Psychometric properties of the Vanderbilt ADHD diagnostic parent rating scale in a referred population. J Pediatr Psychol 2003 Dec;28(8):559-567.
- Kaufman J, Birmaher B, Brent D, Rao U, Flynn C, Moreci P, et al. Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL): initial reliability and validity data. J Am Acad Child Adolesc Psychiatry 1997 Jul;36(7):980-988.
- Lewis JR. Psychometric evaluation of the PSSUQ using data from five years of usability studies. Int J Hum Comput Interact 2002 Sep;14(3-4):463-488.
- Meinshausen N, Bühlmann P. Stability selection. J R Stat Soc Series B Methodol 2010 Aug;72(4):417-473.
|ADHD: attention-deficit/hyperactivity disorder|
|K-SADS-PL: Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime Version|
|PSSUQ: Post-Study System Usability Questionnaire|
|RLR: randomized logistic regression|
|SVM: support vector machine|
|VAS-P: Vanderbilt Assessment Scale-Parent report|
Edited by A Mavragani; submitted 17.12.21; peer-reviewed by J Haavik, E Vanegas; comments to author 31.01.22; revised version received 11.02.22; accepted 19.02.22; published 25.04.22
Copyright
©Oliver Lindhiem, Mayank Goel, Sam Shaaban, Kristie J Mak, Prerna Chikersal, Jamie Feldman, Jordan L Harris. Originally published in JMIR Formative Research (https://formative.jmir.org), 25.04.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.