Published in Vol 5, No 12 (2021): December

Test-Retest Reliability of Home-Based Fitness Assessments Using a Mobile App (R Plus Health) in Healthy Adults: Prospective Quantitative Study


Authors of this article:

I-I Lin 1; You-Lin Chen 1; Li-Ling Chuang 2, 3, 4

Original Paper

1Recovery Plus Inc, Chengdu, China

2School of Physical Therapy & Graduate Institute of Rehabilitation Science, College of Medicine, Chang Gung University, Taoyuan, Taiwan

3Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan

4Healthy Aging Research Center, Chang Gung University, Taoyuan, Taiwan

Corresponding Author:

Li-Ling Chuang, PhD

School of Physical Therapy & Graduate Institute of Rehabilitation Science, College of Medicine, Chang Gung University

No. 259 Wen-hua 1st Rd, Guishan Dist

Taoyuan, 33302


Phone: 886 3 2118800 ext 3177

Fax: 886 3 2118700


Background: Poor physical fitness has a negative impact on overall health status. An increasing number of health-related mobile apps have emerged to reduce the burden of medical care and the inconvenience of long-distance travel. However, few studies have been conducted on home-based fitness tests using apps, and insufficient monitoring of physiological signals during fitness assessments has been noted. Therefore, we developed R Plus Health, a digital health app that incorporates all the components of a fitness assessment with concomitant physiological signal monitoring.

Objective: The aim of this study is to investigate the test-retest reliability of home-based fitness assessments using the R Plus Health app in healthy adults.

Methods: A total of 31 healthy young adults self-administered 2 fitness assessments using the R Plus Health app, with a 2- to 3-day interval between assessments. The fitness assessments included cardiorespiratory endurance, strength, flexibility, mobility, and balance tests. The intraclass correlation coefficient was computed as a measure of the relative reliability (consistency) of the fitness assessments. The SE of measurement, smallest real difference at a 90% CI, and Bland–Altman analyses were used to assess agreement, sensitivity to real change, and systematic bias, respectively.

Results: The relative reliability of the fitness assessments using R Plus Health was moderate to good (intraclass correlation coefficient 0.8-0.99 for raw scores, 0.69-0.99 for converted scores). The SE of measurement and smallest real difference at a 90% CI were 1.44-6.91 and 3.36-16.11, respectively, in all fitness assessments. The 95% CI of the mean difference indicated no significant systematic error between the assessments for the strength and balance tests. The Bland–Altman analyses revealed no significant systematic bias between the assessments for all tests, with a few outliers. The Bland–Altman plots illustrated narrow limits of agreement for upper extremity strength, abdominal strength, and right leg stance tests, indicating good agreement between the 2 assessments.

Conclusions: Home-based fitness assessments using the R Plus Health app were reliable and feasible in young, healthy adults. The results of the fitness assessments can offer a comprehensive understanding of general health status and help prescribe safe and suitable exercise training regimens. In future work, the app will be tested in different populations (eg, patients with chronic diseases or users with poor fitness), and the results will be compared with clinical test results.

Trial Registration: Chinese Clinical Trial Registry ChiCTR2000030905

JMIR Form Res 2021;5(12):e28040




Physical fitness plays an important role in overall health and quality of life and is directly related to physical activity [1]. Regular physical activity confers health benefits, such as increased life expectancy and reduced mortality [2,3]. However, the World Health Organization has reported that >80% of adolescents globally do not engage in sufficient physical activity, and the prevalence of insufficient physical activity was 27.5% among adults aged >18 years worldwide [4]. Studies have indicated that physical inactivity is associated with poor physical fitness and increases not only the incidence and mortality rates of chronic disease but also the medical and economic burden of disease [5-7]. Physical fitness influences daily activities and quality of life to varying degrees [1,8]. Poor physical fitness (below the 25th percentile of the fitness distribution) has a much greater impact on the risk of cardiovascular disease than insufficient physical activity [9]. Therefore, physical fitness needs to be considered as a fundamental assessment for people with a higher risk of chronic diseases.

The reliability and validity of several physical fitness assessment methods have been established. Physical fitness measures typically consist of cardiorespiratory fitness, muscle strength, endurance, agility, flexibility, and measures of body composition [1,10,11]. The 3-minute step test is one of the common cardiorespiratory fitness tests, consisting of stepping up and down a step with a height of 23.0-50.8 cm at a consistent step rate [12]. The 3-minute step test was shown to be reliable and valid in the general population and in patients with lung disease and rheumatoid arthritis [12-17]. Sufficient muscle power and endurance can reduce the risk of exercise injury and enhance cardiorespiratory capacity [18-20]. Wall squatting, push-up, and curl-up tests are common strength tests for the lower and upper extremities and abdominal muscles with established validity and reliability [11,21-23]. Balance and flexibility are important because poor stability may increase the risk of falls and limit functional activities [24-27]. Insufficient flexibility and mobility may restrict movement and cause pain [28-30]. Balance tests, the toe-touch test, the sit-and-reach test, and the Apley shoulder scratch test are common tests to assess balance, flexibility, and mobility [26,31-33]. However, most of these fitness tests are administered face to face by a professional, so patients or clients need to be present at a clinic or gym.

In consideration of cost and travel barriers, self-administered and home-based fitness tests may be more suitable for many people. However, there is currently little research on home-based fitness tests. One study showed that the home-based Senior Fitness Test, using inertial sensors and a depth camera, led to greater leg or arm strength, aerobic endurance, and flexibility [29]. The InterWalk Fitness Test incorporates indirect calorimetry and acceleration monitoring and was found to be accurate and reliable for persons with type 2 diabetes [34]. The self-administered Canadian Home Fitness Test was developed to assess cardiorespiratory endurance with a double 8-inch step and has an established record of safety and predictive ability [35-37]. Additional reliable home-based fitness tests that are easy to use and that record data on accessible software platforms are needed.

As mobile technologies have advanced, an increasing number of health-related apps have emerged [38]. Some health apps provide patient education about lifestyle and health behaviors, some provide pain management, and others provide physical fitness assessments or interventions [34,38-40]. Among commercial fitness apps, most focus on cardiorespiratory fitness assessments, such as those based on submaximal walking data collected by a smartphone’s accelerometer [40]. Some apps focus on functional performance, such as movement speed or leg strength during functional activities [41]. However, most commercial fitness apps lack supporting evidence [40]. Only a few fitness apps have been tested for validity and reliability, and most are rated as having moderate to good validity [34,42-44]. Insufficient monitoring of physiological signals (eg, heart rate) during cardiorespiratory fitness assessment was noted among the available apps [40]. Therefore, we designed R Plus Health (Recovery Plus Inc), a digital health app that incorporates all the usual components of a fitness assessment and also monitors physiological signals.


The aim of this study is to investigate the test-retest reliability of home-based fitness assessments using the R Plus Health app in healthy young adults.


A total of 31 participants were recruited with convenience sampling from 4 departments of a technology company in Chengdu, China. Sampling was performed via random draw. The inclusion criteria were healthy adults with normal health examinations, aged between 18 and 75 years, and with the ability to use smartphones. Participants were excluded if they rated their pain at more than 3 out of 10 on the visual analog pain scale; had poor compliance or were unwilling to cooperate with the assessment; had regular strengthening sessions during the study period; had a history of alcohol abuse or illegal drug use; were pregnant, lactating, or trying to become pregnant; had participated in other clinical trials within 3 months before this study; or had uncontrolled chronic diseases. The participants received oral and written information about the study, and informed consent was obtained from all participants. This study was approved by the Chinese Ethics Committee of Registering Clinical Trials (ChiCTR2000030905).

R Plus Health App

The R Plus Health app was developed as a tool for healthy adults and patients with chronic diseases. It includes fitness assessments and individualized exercise prescriptions with physiological signal monitors (eg, heart rate monitor). After downloading the R Plus Health app, the participants received an informed safety declaration and completed a health questionnaire, which was checked by doctors or other professional health care providers on the web. Through oral and video guidance, the participants were then instructed on how to perform the fitness assessments with maximal effort. The fitness assessments in the app included cardiorespiratory fitness, strength, balance, mobility, and flexibility tests (Figure 1). These fitness assessments have established clinical validity and reliability [12-25,30-33]. To complete the cardiorespiratory fitness test and record a real-time heart rate, the participants were required to wear a heart rate monitor below the sternum on a strap around the chest during testing (Figure 2). The heart rate monitor (Magene H64 dual protocol heart rate sensor) is compatible with the app and has Conformite Europeenne and Federal Communications Commission certification. Finally, according to the results of the fitness assessments and the overall health condition of each participant, a proper individualized exercise prescription was suggested by professional teams in the app.

Figure 1. Video demonstration of the push-up test.
Figure 2. Demonstration of how to wear the heart rate monitor strap.

Assessment Procedures

Eligible participants enrolled in the study, provided informed consent, downloaded the R Plus Health app, and filled in the health questionnaire with the assistance of a physiotherapist. The physiotherapist recorded the basic data, including pain level on a visual analog scale and the overall health condition of the participants, at the baseline and final assessments.

All participants self-administered 2 fitness assessments with a 2- to 3-day interval between assessments to provide the best reproducibility [45]. The fitness assessments were administered sequentially (cardiorespiratory endurance, strength, flexibility, mobility, and balance). The participants followed the guidance and instructions in the app for each fitness assessment.

The 3-minute step test measures cardiorespiratory endurance based on how quickly the heart rate returns to normal after a 3-minute step exercise [12,13]. First, the heart rate monitor strap was worn for a 5-minute rest period beside the 30-cm step (to establish a baseline). After watching the tutorial videos in the app, the participants stepped up and down at 96 beats per minute (bpm) using a metronome for a total of 3 minutes. After finishing the test, the participants rested for 1 minute. The participants could suspend the test if any discomfort occurred.

The push-up and curl-up tests measure muscle strength and endurance in the upper limbs and abdomen, respectively, based on the number of completed repetitions [11,21,23,46]. When performing the push-up test, there were 2 variations in the starting position. The standard push-up test involved having the knees off the ground in the push-up position and was used for male participants. The modified push-up test involved having the knees on the ground and was used for female participants. The participants performed as many push-ups as possible with the correct form within 40 seconds. The curl-up test began with the participants lying on their back, knees bent at approximately 90°, feet flat on the floor, and arms straight with the palms of their hands resting on their thighs. The participants curled up and down at 40 bpm using a metronome. If the participants could not continue or stopped for more than 5 seconds, they clicked the completed button and recorded the repetitions.

The wall squatting test measures muscle strength and endurance in the lower limbs based on the holding time [11,22]. The wall squatting test began with the participants in a standing position, feet shoulder width apart and back against the wall; then, both knees were bent at a 90° angle. The participants held this squatting position for as long as possible. When the participants were finished, they could click the completed button and record the total time. If the participants held the position for more than 150 seconds, the app finished the test automatically.

The sit-and-reach test measures the flexibility of the hamstrings and the lower back with a ruler based on the distance [11,30,32]. The participants sat on the floor with their legs straight and their heels in line with a ruler, hands stacked, and palms facing downward. They then reached forward as far as possible along the measuring line. After reaching forward, the participants recorded their distance in centimeters.

The Apley scratch test, or the upper extremity (UE) multipattern test, measures the mobility of the upper limbs based on the distance between the middle fingers [33]. There were 2 patterns of upper limb flexibility: (1) shoulder flexion, abduction, and external rotation and (2) shoulder extension, adduction, and internal rotation. The participants performed these 2 patterns of movement for each upper limb and recorded the distance between both middle fingers. The results were classified as above average, normal, or below average.

The one-leg stance test measures balance based on the holding time [26]. The participants stood on one leg with the other leg bent and raised 15-20 cm off the ground, their eyes open, and their arms beside the hips. The participants maintained their balance for as long as possible. If the participants lost balance, they clicked the completed button, and the time was recorded in the app automatically. If the participants maintained balance for more than 30 seconds, the app finished the test automatically.

To minimize possible diurnal variation in physical fitness, the participants were instructed to perform the 2 assessments at the same time of the day. They were asked to avoid resistance training and exhausting work between assessments to minimize the potential effects of fatigue. After each test, the participants immediately recorded the results on paper to avoid recall effects and then sent them to the researchers. The researchers concealed the data of the participants in an envelope for anonymity and encoded the names as numbers to protect the privacy of the participants.

Outcome Measures

At the baseline assessment, the descriptive data, pain score, and health condition of the participants were evaluated by a physiotherapist. Descriptive data included age, sex, height, and weight. The pain level was assessed using a visual analog scale from 0 (no pain) to 10 (worst pain). Health condition was assessed using a health-related questionnaire in the app and by a physiotherapist.

The outcomes of each fitness assessment included the raw data and converted score. The raw data were recorded as bpm, repetitions, seconds, and an ordinal scale. The converted scores (0-100) were computed using the app through normative data and a self-established score conversion system on the basis of the raw data.

The participants recorded the heart rate in bpm as raw data, and the converted scores used the same units. If someone could not complete the 3-minute step test, the reason was noted [12]. In each cardiorespiratory fitness test, 2 measurements were made: the average resting heart rate during the 5-minute rest period and the 1-minute recovery heart rate after the 3-minute step test.

The outcomes of the push-up and curl-up tests were recorded as completed repetitions, and the outcome of the wall squatting test was recorded as total holding time. The flexibility of the lower limbs and lower back was measured in centimeters from negative to positive values. The mobility of the upper limbs was classified as above average, normal, or below average. The one-leg stance test recorded the total time in seconds [26,31-33].

Data Analysis

Statistical analyses were conducted using SPSS 20.0 software (IBM Inc). Descriptive statistics were presented in the form of mean and SD, and the relative and absolute test-retest reliabilities of the fitness assessments were estimated separately.

Relative Reliability

The relative reliability of the fitness assessments was calculated using the intraclass correlation coefficient (ICC) with a 2-way mixed model (type absolute agreement). On the basis of the 95% CIs of the ICC estimates, agreement was rated as poor (<0.5), moderate (between 0.5 and 0.75), good (between 0.75 and 0.9), or excellent (>0.9) [47].
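As a rough illustration of the calculation described above, the following Python sketch computes an absolute-agreement ICC from a 2-way ANOVA decomposition and maps it onto the rating bands. It is not the SPSS procedure used in the study. The single-measure form of the coefficient, the function names, and the handling of values that fall exactly on a band boundary are assumptions made for this example.

```python
from typing import Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]]) -> float:
    """Single-measure, absolute-agreement ICC from a 2-way ANOVA decomposition.

    scores[i][j] is the result of participant i in assessment j.
    Assumption: the single-measure form is used; the paper does not say
    whether single or average measures were reported.
    """
    n = len(scores)      # number of participants (rows)
    k = len(scores[0])   # number of assessments (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants, assessments, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def rate_reliability(icc: float) -> str:
    """Rating bands from the text; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical scores for 4 participants over 2 assessments (not study data)
example = [[10, 11], [20, 19], [30, 31], [40, 41]]
icc = icc_absolute_agreement(example)
print(round(icc, 3), rate_reliability(icc))
```

In the study, the rating was based on the 95% CI of the ICC estimate rather than on the point estimate alone, so a full reproduction would also need the CI bounds, which this sketch does not compute.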

Absolute Reliability

The absolute reliability of the fitness assessments was evaluated using the SE of measurement (SEM), the smallest real difference (SRD), and Bland–Altman analyses [47,48]. The SEM expressed the measurement error variation between the assessments within a group and was calculated as SDpooled×√(1-ICC) [49]. In this formula, SDpooled indicates the pooled SD for the 2 assessments. The SRD is a measure of sensitivity to change, represented as the magnitude of the change detected at a certain CI [50]. The SRD90 is defined as the SEM of the difference scores at a 90% confidence level and was calculated as 1.65×√2×SEM [48]. If the difference between the 2 assessments was greater than the SRD, it was interpreted as a real change. For all measurements, the smaller the SEM and SRD90, the greater the reliability.
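The two formulas above can be sketched in a few lines of Python. The function names are hypothetical, and the pooled SD is assumed here to be the square root of the mean of the 2 assessment variances, because the text does not spell out how it was pooled.

```python
import math
from statistics import stdev
from typing import Sequence


def pooled_sd(first: Sequence[float], second: Sequence[float]) -> float:
    """Pooled SD of 2 assessments with equal n (assumed pooling rule)."""
    return math.sqrt((stdev(first) ** 2 + stdev(second) ** 2) / 2)


def sem(sd_pooled: float, icc: float) -> float:
    """SE of measurement: SDpooled x sqrt(1 - ICC)."""
    return sd_pooled * math.sqrt(1 - icc)


def srd90(sem_value: float) -> float:
    """Smallest real difference at a 90% confidence level: 1.65 x sqrt(2) x SEM."""
    return 1.65 * math.sqrt(2) * sem_value


def is_real_change(difference: float, srd: float) -> bool:
    """A between-assessment difference larger than the SRD counts as real."""
    return abs(difference) > srd


# Using the SEM reported for resting heart rate in Table 4 (4.28 bpm)
print(round(srd90(4.28), 2))  # 9.99, matching the reported SRD90
```

Because the reported SEM values are rounded to 2 decimal places, recomputing the SRD90 from them can differ from the table in the last digit for some tests.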

The Bland–Altman analyses and plots assessed the agreement or repeatability of the 2 assessments [49,51]. They estimated the mean and SD difference between the 2 assessments and established limits of agreement (LOA) within a 95% CI [51]. The 95% LOA was calculated as the mean difference±(SDdiff×1.96). The SDdiff indicates the SD of the difference between 2 measurements [48]. The scatter plots show the relationship between the difference between the 2 assessments (y-axis) and the mean of the 2 assessments (x-axis). More point scattering within the 95% LOA, along with a smaller range between the 2 limits, indicated a higher agreement [52,53].
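The LOA calculation can be illustrated with a short Python sketch. The function names are hypothetical, and the direction of the subtraction (first assessment minus second) is an assumption, because the text does not state which way the difference was taken.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(mean_diff: float, sd_diff: float) -> Tuple[float, float]:
    """95% limits of agreement: mean difference +/- 1.96 x SDdiff."""
    half_width = 1.96 * sd_diff
    return mean_diff - half_width, mean_diff + half_width


def bland_altman(first: Sequence[float],
                 second: Sequence[float]) -> Tuple[float, float, Tuple[float, float]]:
    """Mean difference, SD of the differences, and 95% LOA for paired scores.

    Assumption: differences are taken as first minus second.
    """
    diffs = [a - b for a, b in zip(first, second)]
    d = mean(diffs)
    sd_diff = stdev(diffs)
    return d, sd_diff, limits_of_agreement(d, sd_diff)


def count_outside(first: Sequence[float], second: Sequence[float]) -> int:
    """Number of paired differences that fall outside the 95% LOA."""
    _, _, (low, high) = bland_altman(first, second)
    return sum(1 for a, b in zip(first, second) if not low <= a - b <= high)


# Summary values reported for the right leg stance test (d = -0.84, SDdiff = 2.48)
low, high = limits_of_agreement(-0.84, 2.48)
print(round(low, 2), round(high, 2))  # -5.7 4.02
```

Points counted by `count_outside` correspond to the outliers that would appear beyond the dotted LOA lines in a Bland–Altman plot.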


The characteristics of the participants and the descriptive statistics of the fitness assessments at baseline are shown in Tables 1 and 2, respectively. The study enrolled 31 participants (Table 1), which exceeded the minimum sample size of 26 (effect size of 0.5 and power of 0.8) calculated using G*power 3.1 [54]. The mean age was 27.25 (SD 4.0) years, and the participants had negligible pain (mean 0.19 out of 10 on the visual analog scale), which did not worsen during testing.

Table 1. Characteristics of the participants (N=31).

| Characteristic | Value |
|---|---|
| Age (years), mean (SD) | 27.25 (4.0) |
| Sex, n (%) | |
| Female | 16 (52) |
| Male | 15 (48) |
| Height (cm), mean (SD) | 168.66 (7.61) |
| Weight (kg), mean (SD) | 60.23 (11.41) |
| BMI (kg/m2), mean (SD) | 21.03 (2.75) |
| Health status | Normal health examination |
| Pain assessment (range 0-10), mean (SD) | 0.19 (0.65) |

Table 2. Fitness assessments of the participants at baseline (N=31).

| Domain and test items | Raw data, mean (SD) | Converted score^a, mean (SD) |
|---|---|---|
| Cardiovascular fitness | | |
| HR^b at rest^c (bpm^d) | 74.81 (9.6) | 51.23 (16.5) |
| 1-minute HR after test^e (bpm) | 92.26 (18.3) | 60.19 (19.1) |
| UE^f strength: push-up (repetitions) | 12.94 (9.3) | 58.71 (18.3) |
| Abdominal strength: curl-up (repetitions) | 19.55 (13.7) | 51.29 (18.8) |
| LE^g strength: wall squatting (seconds) | 63.03 (26.1) | 53.23 (18.1) |
| LE flexibility: sit-and-reach (centimeters) | 2.85 (14.2) | 57.74 (25.3) |
| Balance ability | | 93.23 (19.4) |
| Right leg stance (seconds) | 31.77 (14.4) | N/A^h |
| Left leg stance (seconds) | 30.55 (9.9) | N/A^h |
| UE mobility^i | | 65.48 (18.8) |
| UE multipattern (above average), n (%) | | |
| Right UE | 23 (74) | |
| Left UE | 16 (52) | |
| UE multipattern (normal), n (%) | | |
| Right UE | 3 (10) | |
| Left UE | 6 (19) | |
| UE multipattern (below average), n (%) | | |
| Right UE | 5 (16) | |
| Left UE | 9 (29) | |

^a Converted score (0-100) from raw data in the app using normative data.

^b HR: heart rate.

^c Resting heart rate measurement.

^d bpm: beats per minute.

^e Heart rate recovery 1 minute after the 3-minute step test.

^f UE: upper extremity.

^g LE: lower extremity.

^h N/A: not available; no separate converted score was calculated for each leg because the scores were averaged in balance ability.

^i The upper extremity multipattern test was categorized into 3 classes (above average, normal, and below average).

Table 2 shows the results of the baseline fitness assessments as raw data (mean [SD]) and converted score (0-100). At the baseline assessments, the average heart rate was 74.81 bpm at rest, and the 1-minute recovery heart rate was 92.26 bpm after the 3-minute step test. In the strength tests, the average number of completed repetitions was 12.94 push-ups and 19.55 curl-ups, and the average holding time for the squatting test was 63.03 seconds.

Relative Reliability

Table 3 summarizes the test-retest reliability of all the fitness assessments. On the basis of the raw data, the ICCs for all tests were 0.8-0.99. On the basis of the converted scores, the ICCs for all tests were 0.69-0.99. For most tests, the lower bound of the 95% CI was at least 0.5.

Table 3. Test-retest reliability of the fitness assessments (N=31).

| Test items | ICC^a for the raw data | ICC for the converted score |
|---|---|---|
| HR^b at rest^c | 0.80 (0.58-0.90) | 0.69 (0.34-0.85) |
| 1-minute HR after test^d | 0.92 (0.84-0.96) | 0.82 (0.63-0.92) |
| UE strength^e | 0.97 (0.94-0.99) | 0.97 (0.93-0.99) |
| Abdominal strength^f | 0.98 (0.95-0.99) | 0.94 (0.87-0.97) |
| LE strength^g | 0.93 (0.85-0.96) | 0.82 (0.63-0.92) |
| LE flexibility^h | 0.89 (0.77-0.95) | 1 |
| UE mobility^i | N/A^j | 0.99 (0.98-0.99) |
| Right leg stance | 0.99 (0.98-0.99) | 0.75 (0.5-0.88) |
| Left leg stance | 0.89 (0.77-0.95) | 0.75 (0.5-0.88) |

^a ICC: intraclass correlation coefficient (at a 95% CI).

^b HR: heart rate.

^c Resting heart rate measurement.

^d 1-minute HR after test: heart rate recovery 1 minute after the 3-minute step test.

^e UE strength: upper extremity strength (push-up test).

^f Curl-up test.

^g LE strength: lower extremity strength (wall squatting test).

^h LE flexibility: lower extremity flexibility (sit-and-reach test).

^i UE mobility: upper extremity mobility (upper extremity multipattern test).

^j N/A: not applicable; no intraclass correlation coefficient value was calculated because the raw data of the upper extremity mobility test was the percentage of participants, not a continuous variable.

Absolute Reliability

The absolute reliability and Bland–Altman analyses are presented in Table 4. The SEM and SRD90 were 1.44-6.91 and 3.36-16.11, respectively, across the different assessments. The mean differences in UE strength, lower extremity (LE) flexibility, and right leg balance tests were close to 0. The 95% CI of the mean difference contained 0, indicating no significant systematic error between the 2 assessments in strength (−6.28 to 3.89 in the LE strength test and −1.54 to 0.89 in the UE strength test), flexibility (−2.65 to 3.57 in the LE flexibility test), and balance tests (−1.75 to 0.07 in the right leg stance test and −5.58 to 0.93 in the left leg stance test).
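The check for systematic error described above can be sketched as follows, using summary values from Table 4. This is a minimal illustration that assumes the CI is a t-based interval (d ± t × SE, with SE = SDdiff/√n). The critical value 2.042 (2-sided 95%, 30 degrees of freedom for n=31) is hardcoded to keep the example dependency free, and the function names are hypothetical.

```python
import math
from typing import Tuple

# Assumed 2-sided 95% critical value of Student t with 30 degrees of freedom
T_CRIT_DF30 = 2.042


def mean_diff_ci(d: float, sd_diff: float, n: int,
                 t_crit: float = T_CRIT_DF30) -> Tuple[float, Tuple[float, float]]:
    """Return the SE of the mean difference and its 95% CI."""
    se = sd_diff / math.sqrt(n)
    return se, (d - t_crit * se, d + t_crit * se)


def excludes_zero(ci: Tuple[float, float]) -> bool:
    """True when the CI does not contain 0, suggesting a systematic difference."""
    low, high = ci
    return not (low <= 0 <= high)


# (mean difference, SDdiff) taken from Table 4
table4 = {
    "HR at rest": (5.61, 5.58),
    "UE strength": (-0.32, 3.31),
    "Abdominal strength": (-1.74, 4.00),
    "Right leg stance": (-0.84, 2.48),
}

for test, (d, sd_diff) in table4.items():
    se, ci = mean_diff_ci(d, sd_diff, n=31)
    print(f"{test}: SE={se:.2f}, CI=({ci[0]:.2f}, {ci[1]:.2f}), "
          f"excludes 0: {excludes_zero(ci)}")
```

Under these assumptions, the intervals for UE strength and right leg stance contain 0, whereas those for resting heart rate and abdominal strength do not, in line with the CIs listed in Table 4.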

Table 4. Absolute reliability of the fitness assessments in raw data. Columns d to LOA report the Bland–Altman analyses.

| Raw data of test items | SEM^a | SRD90^b | d^c | SDdiff^d | SE of d^e | 95% CI of d | LOA^f |
|---|---|---|---|---|---|---|---|
| HR at rest^g (bpm^h) | 4.28 | 9.99 | 5.61 | 5.58 | 1.00 | 3.57 to 7.66 | −5.32 to 16.55 |
| 1-minute HR after test^i (bpm) | 5.18 | 12.08 | 7.19 | 6.91 | 1.24 | 4.66 to 9.73 | −6.34 to 20.73 |
| UE strength^j (repetitions) | 1.61 | 3.76 | −0.32 | 3.31 | 0.59 | −1.54 to 0.89 | −6.81 to 6.17 |
| Abdominal strength^k (repetitions) | 2.08 | 4.85 | −1.74 | 4.00 | 0.72 | −3.21 to −0.27 | −9.58 to 6.10 |
| LE strength^l (s) | 6.91 | 16.11 | −1.19 | 13.87 | 2.49 | −6.28 to 3.89 | −28.38 to 25.99 |
| LE flexibility^m (cm) | 4.71 | 10.99 | 0.46 | 8.47 | 1.52 | −2.65 to 3.57 | −16.14 to 17.06 |
| Right leg stance (s) | 1.44 | 3.36 | −0.84 | 2.48 | 0.45 | −1.75 to 0.07 | −5.70 to 4.02 |
| Left leg stance (s) | 3.28 | 7.66 | −2.32 | 8.87 | 1.59 | −5.58 to 0.93 | −19.70 to 15.06 |

^a SEM: SE of measurement.

^b SRD90: smallest real difference at a 90% confidence level.

^c d: mean difference between 2 trials.

^d SDdiff: SD of mean difference.

^e SE of d: SE of the mean difference.

^f LOA: limits of agreement (d±[SDdiff×1.96]).

^g HR at rest: resting heart rate measurement.

^h bpm: beats per minute.

^i 1-minute HR after test: heart rate recovery in 1 minute after the 3-minute step test.

^j UE strength: upper extremity strength (push-up test).

^k Abdominal strength: curl-up test.

^l LE strength: lower extremity strength (wall squatting test).

^m LE flexibility: lower extremity flexibility (sit-and-reach test).

Figures 3-10 show the Bland–Altman plots of the differences between the 2 measurements for all tests. Reference lines show mean differences between time 1 and time 2 (solid line) and 95% LOA for the mean difference (dotted lines). The differences for most of the tests were within the 95% LOA. The LOA were −5.32 to 16.55 for the heart rate at rest and −6.34 to 20.73 for the 1-minute heart rate after test. The LOA were −6.81 to 6.17 in the UE strength test, −9.58 to 6.10 in the abdominal strength test, and −28.38 to 25.99 in the LE strength test. The LOA were −16.14 to 17.06 in the LE flexibility test, −5.70 to 4.02 in the right leg stance test, and −19.70 to 15.06 in the left leg stance test. There were at most 3 outliers in the 1-minute heart rate after test, LE strength, and right leg stance tests.

Figure 3. The Bland–Altman plots of differences between the 2 measurements in heart rate at rest. HR: heart rate.
Figure 4. The Bland–Altman plots of differences between the 2 measurements in 1-minute heart rate recovery. HR: heart rate.
Figure 5. The Bland–Altman plots of differences between the 2 measurements in abdominal strength assessments. ab: abdominal.
Figure 6. The Bland–Altman plots of differences between the 2 measurements in upper extremity strength assessments. UE: upper extremity.
Figure 7. The Bland–Altman plots of differences between the 2 measurements in lower extremity strength assessments. LE: lower extremity.
Figure 8. The Bland–Altman plots of differences between the 2 measurements in LE flexibility tests. LE: lower extremity.
Figure 9. The Bland–Altman plots of differences between the 2 measurements in right leg stance tests. R: right.
Figure 10. The Bland–Altman plots of differences between the 2 measurements in left leg stance tests. L: left.

Principal Findings

This is the first study to investigate the test-retest reliability of home-based fitness assessments using a mobile health app in young, healthy adults. The results showed a moderate to good reliability of the fitness assessments. Therefore, through video and oral guidance, the app was shown to be reliable when applied to young users.

The self-administered fitness assessments in the app were feasible, with a low risk of injury. All participants completed the fitness assessments with the guidance of the R Plus Health app. Although some participants enrolled in the study had mild pain, it did not worsen after the fitness assessment. Other clinical research has shown that mobile apps can be used to conduct ecological momentary assessments, manage and monitor patients with good adherence, detect symptoms, and evaluate the condition of a patient [55-57]. Therefore, well-designed mobile apps could offer a feasible means of self-assessment for clients and clinicians.

The results of our study were consistent with those of previous reliability studies [14]. Among the fitness assessments in the app, the test-retest reliabilities were moderate to good in this study. On the basis of the raw data, the ICCs for all tests were 0.8-0.99, indicating good to excellent reliability. On the basis of the converted scores, the ICCs for all tests were 0.69-0.99, indicating moderate to good reliability. The 95% CIs were above 0.5 in most tests. One previous study investigated the reliability of web-based versus supervised cardiovascular fitness assessments using the Young Men’s Christian Association 3-minute step test for college students [14]. The results of that study showed that there were no significant differences in the recovery heart rate between the 2 groups and that self-assessed cardiovascular fitness measurements were reliable [14]. Another study investigating the reliability of the Chester Step Test in patients with chronic obstructive pulmonary disease showed good reliability (ICC>0.8) [58]. In a previous analysis of strength fitness tests, reliability was established in adolescents, with ICCs of 0.7-0.9 in push-up, curl-up, and wall squatting tests [59]. For balance tests, the ICCs of single-leg stance tests were found to be >0.77 in young adults using a computerized balance platform [60]. These findings suggest that, regardless of the methods of assessing fitness capacity (eg, web-based and supervised assessments), the use of standard procedures and precise guidance under signal monitoring can ensure an accurate measure of actual performance. Self-administered fitness assessments in the R Plus Health app can be one of these efficient and reliable methods.

In addition to the relative reliability, the absolute reliability can demonstrate the agreement and sensitivity of the mean differences between the assessments. The SRD is a measure of sensitivity to change and represents the magnitude of the change at a certain confidence level [50]. If the difference between 2 assessments was larger than the SRD, it could be considered a real change, and the smaller the SEM and SRD90 of the difference, the greater the reliability. In this study, the SEM and SRD90 ranged from 1.44 to 6.91 and from 3.36 to 16.11, respectively. For example, if the change was more than 16.11 seconds in the wall squatting test, it was considered real at a 90% confidence level. In this study, the SRD90 in the wall squatting, push-up, and curl-up tests were 16.11, 3.76, and 4.85, respectively. These values were greater than the between-assessment changes reported in a previous study, which were 6.2, 2.6, and 0.1 for the wall squatting, push-up, and curl-up tests, respectively [59]. The different results might be because of different populations, ages, and experimental designs.

The Bland–Altman analyses and plots were generated to measure the repeatability of 2 measurement systems or of several trials using one method [49,51]. The scattering of data points within the 95% LOA and a smaller range between the 2 limits indicated higher agreement [52,53]. The 95% CI of the mean difference contained 0 for the UE strength, LE strength, flexibility, and balance tests, indicating no significant systematic error between the 2 assessments for these tests. By contrast, the 95% CIs of the mean difference for the 2 heart rate measures and the abdominal strength test did not contain 0 (Table 4). The range of the LOA was slightly narrower in the UE strength (−6.81 to 6.17), abdominal muscle strength (−9.58 to 6.10), and right leg stance (−5.70 to 4.02) tests, indicating higher agreement. There was at least one outlier in each fitness assessment, and at most 3 outliers (in the 1-minute heart rate, LE strength, and right leg stance tests), which might be due to familiarization or fatigue in the second test.

Sufficient physical fitness is critical in daily life. It can decrease the risk of cardiovascular disease, pain, and injuries and improve the performance of life activities [9,18-20]. In the cardiorespiratory fitness assessment in this study, the mean heart rate after 1-minute recovery from the step test was 92.26 bpm, indicating above average fitness based on the normative data [61]. In the LE strength wall squatting test, the mean holding time was 63.03 seconds, indicating an average fitness level [62]. In the LE flexibility sit-and-reach test, the mean distance was 2.85 cm, categorized as an average fitness level [11]. Even though the enrolled participants were generally in good health, they were below average in some of the fitness components. In the push-up test for UE strength and the curl-up test for abdominal muscle strength, the mean numbers of repetitions were 12.94 and 19.55, respectively, which were below average based on the normative data and showed a need for improvement [63,64]. In the single-leg stance balance test, the mean holding time was approximately 30 seconds (31.77 seconds for the right leg and 30.55 seconds for the left leg), indicating a below average fitness level [65]. Lack of muscle strength and balance can increase the risk of falls, pain, and injuries and limit daily life activities [24-27]. Therefore, comprehensive fitness assessments are essential.

Performance in the physical fitness assessments varied within each participant between the 2 assessments, depending on the participant's condition at the time, and also varied from one participant to another. Even in healthy participants without chronic diseases, mild pain can lead to low strength in the extremities. Pain can inhibit muscle firing, and the resulting lack of muscle contraction can decrease the stability of the joints and, in turn, produce pain [66]. In other situations, insufficient muscle strength can lead to poor cardiorespiratory fitness. Evidence has shown that muscular fitness is related to cardiovascular prognosis and mortality [67]. It is therefore important to detect weaknesses in each individual's fitness profile and to provide proper assessments and advice to clients. Through a comprehensive fitness assessment composed of cardiovascular endurance, strength, flexibility, and balance tests, the R Plus Health app can provide clinicians with a complete picture of the clients’ fitness. Clinicians can then choose to provide other detailed assessments on the web or at the clinic, which not only increases the efficiency of the evaluations but also decreases the medical and economic burden.

Limitations and Future Studies

This study had several limitations. First, the same assessment posed a different level of difficulty for each participant, which may have led to ceiling or floor effects. Automatic adjustment of the grade of the assessments according to the individual situation will therefore be essential. Second, it is difficult to ascertain the accuracy of the performance assessments in the app without professional supervision. That is, the results of the app might not be identical to those obtained under professional supervision. With this study design, the results from different testing situations (with or without supervision) could not be compared. One solution to this problem would be to apply suitable monitoring equipment (eg, motion capture analysis and artificial intelligence techniques) to increase the precision of the assessments. However, this creates an additional technological burden. Cross-validation of the outcomes collected by the app versus professional staff will be the subject of future studies. Third, the study recruited young, healthy adults, so the results of the fitness assessments should not be generalized to other populations, such as older adults or patients with chronic diseases. The fitness assessments in the app therefore need to be conducted in other populations to compare the results between the app and clinical testing. Testing of the R Plus Health app in additional populations will be conducted in the future.


Conclusions

Home-based fitness assessments using a mobile health app were reliable and feasible in young, healthy adults. The results showed moderate to good reliability, and the testing process caused negligible pain effects. This study highlighted an important contribution of mobile health apps to health care: healthy adults can self-administer fitness tests and thereby reduce overall costs. The results of mobile fitness assessments can offer a reliable understanding of a person’s health condition and help prescribe a safe and suitable exercise training regimen. Expansion of the use of this technology to different populations (eg, patients with chronic diseases or users with poor fitness) will offer widespread benefits to both patients and the health care system.


Acknowledgments

This research was partially supported by the Ministry of Science and Technology (MOST-109-2314-B-182-030 and 110-2314-B-182-017) and Chang Gung Memorial Hospital (CMRPD1I0141 and CMRPD1I0142) in Taiwan.

The authors thank Si-jing Ye and Jing Wei for data collection and Chong Jiang for assistance in conducting the survey. All authors read and approved the final manuscript.

Conflicts of Interest

None declared.

References

  1. Caspersen CJ, Powell KE, Christenson GM. Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research. Public Health Rep 1985;100(2):126-131.
  2. Warburton DE, Nicol CW, Bredin SS. Health benefits of physical activity: the evidence. Can Med Assoc J 2006 Mar 14;174(6):801-809.
  3. Katzmarzyk PT. Physical activity, sedentary behavior, and health: paradigm paralysis or paradigm shift? Diabetes 2010 Nov 27;59(11):2717-2725.
  4. Guthold R, Stevens GA, Riley LM, Bull FC. Worldwide trends in insufficient physical activity from 2001 to 2016: a pooled analysis of 358 population-based surveys with 1·9 million participants. Lancet Glob Health 2018 Oct;6(10):1077-1086.
  5. Chronic diseases and their common risk factors. World Health Organization. 2005. URL: [accessed 2021-11-02]
  6. Anderson E, Durstine J. Physical activity, exercise, and chronic diseases: a brief review. Sport Med Health Sci 2019 Dec;1(1):3-10.
  7. Yin P, Qi J, Liu Y, Liu J, Li J, Zeng X, et al. Burden of disease in the Chinese population from 2005 to 2017. Chin Circ J 2019;34(12):1146-1147.
  8. Weening-Dijksterhuis E, de Greef MH, Scherder EJ, Slaets JP, van der Schans CP. Frail institutionalized older persons: a comprehensive review on physical exercise, physical fitness, activities of daily living, and quality-of-life. Am J Phys Med Rehabil 2011 Feb;90(2):156-168.
  9. Williams PT. Physical fitness and activity as separate heart disease risk factors: a meta-analysis. Med Sci Sports Exerc 2001 May;33(5):754-761.
  10. Wilder RP, Greene JA, Winters KL, Long WB, Gubler K, Edlich RF. Physical fitness assessment: an update. J Long Term Eff Med Implants 2006;16(2):193-204.
  11. American College of Sports Medicine. ACSM's Guidelines for Exercise Testing and Prescription. Philadelphia, PA: Lippincott Williams & Wilkins; 2013.
  12. Andrade C, Cianci R, Malaguti C, Corso SD. The use of step tests for the assessment of exercise capacity in healthy subjects and in patients with chronic lung disease. J Bras Pneumol 2012;38(1):116-124.
  13. Lee O, Kim SS, Kim YS, Son HJ, Kim YM, Choi BY. Correlation between YMCA step-test and maximum oxygen consumption (VO2max) as measurement tools for cardiorespiratory. Korean J Epidemiol 2008 Jun 30;30(1):73-81.
  14. Liguori G, Mozumdar A. Reliability of self assessments for a cardiovascular fitness assessment. Int J Fitness 2009;5(1):33-40.
  15. Cooney JK, Moore JP, Ahmad YA, Jones JG, Lemmey AB, Casanova F, et al. A simple step test to estimate cardio-respiratory fitness levels of rheumatoid arthritis patients in a clinical setting. Int J Rheumatol 2013;2013:174541-174548.
  16. Bennett H, Parfitt G, Davison K, Eston R. Validity of submaximal step tests to estimate maximal oxygen uptake in healthy adults. Sports Med 2016 May 15;46(5):737-750.
  17. Teren A, Zachariae S, Beutner F, Ubrich R, Sandri M, Engel C, et al. Incremental value of Veterans Specific Activity Questionnaire and the YMCA-step test for the assessment of cardiorespiratory fitness in population-based studies. Eur J Prev Cardiol 2016 Jul;23(11):1221-1227.
  18. Oliver GD, Adams-Blair HR. Improving core strength to prevent injury. J Phys Edu Recreat Dance 2010 Sep;81(7):15-19.
  19. Cho KH, Bok SK, Kim Y, Hwang SL. Effect of lower limb strength on falls and balance of the elderly. Ann Rehabil Med 2012 Jun;36(3):386-393.
  20. Watanabe M, Matsumoto T, Ono S, Koseki H, Watarai K. Relationship of lower extremity alignment during the wall squat and single-leg jump: assessment of single-leg landing using three-dimensional motion analysis. J Phys Ther Sci 2016 Jun;28(6):1676-1680.
  21. Hall GL, Hetzler RK, Perrin D, Weltman A. Relationship of timed sit-up tests to isokinetic abdominal strength. Res Q Exerc Sport 1992 Mar;63(1):80-84.
  22. Blazevich AJ, Gill N, Newton RU. Reliability and validity of two isometric squat tests. J Strength Cond Res 2002 May;16(2):298-304.
  23. Cogley RM, Archambault TA, Fibeger JF, Koverman MM, Youdas JW, Hollman JH. Comparison of muscle activation using various hand positions during the push-up exercise. J Strength Condit Res 2005;19(3):628-633.
  24. Harris J, Eng J, Marigold D, Tokuno C, Louis C. Relationship of balance and mobility to fall incidence in people with chronic stroke. Phys Ther 2005;85(2):150-158.
  25. Hrysomallis C. Relationship between balance ability, training and sports injury risk. Sports Med 2007;37(6):547-556.
  26. Schepens S, Goldberg A, Wallace M. The short version of the Activities-specific Balance Confidence (ABC) scale: its validity, reliability, and relationship to balance impairment and falls in older adults. Arch Gerontol Geriatr 2010;51(1):9-12.
  27. Urrunaga-Pastor D, Moncada-Mapelli E, Runzer-Colmenares F, Bailon-Valdez Z, Samper-Ternent R, Rodriguez-Mañas L, et al. Factors associated with poor balance ability in older adults of nine high-altitude communities. Arch Gerontol Geriatr 2018;77:108-114.
  28. Malliaras P, Hogan A, Nawrocki A, Crossley K, Schache A. Hip flexibility and strength measures: reliability and association with athletic groin pain. Br J Sports Med 2009 Oct 11;43(10):739-744.
  29. Kelley MJ, Shaffer MA, Kuhn JE, Michener LA, Seitz AL, Uhl TL, et al. Shoulder pain and mobility deficits: adhesive capsulitis. J Orthop Sports Phys Ther 2013 May;43(5):1-31.
  30. Mistry G, Vyas N, Sheth M. Comparison of hamstrings flexibility in subjects with chronic low back pain versus normal individuals. J Clin Exp Res 2014;2(1):85-88.
  31. Ayala F, de Baranda PS, De Ste Croix M, Santonja F. Reproducibility and criterion-related validity of the sit and reach test and toe touch test for estimating hamstring flexibility in recreationally active young adults. Phys Ther Sport 2012 Nov;13(4):219-226.
  32. Mayorga-Vega D, Viciana J, Cocca A, Merino-Marban R. Criterion-related validity of toe-touch test for estimating hamstring extensibility: a meta-analysis. J Hum Sport Exerc 2014 Jul;9(1):188-200.
  33. Sprague PA, Mokha GM, Gatens DR, Rodriguez R. The relationship between glenohumeral joint total rotational range of motion and the functional movement screen™ shoulder mobility test. Int J Sports Phys Ther 2014 Oct;9(5):657-664.
  34. Brinkløv CF, Thorsen IK, Karstoft K, Brøns C, Valentiner L, Langberg H, et al. Criterion validity and reliability of a smartphone delivered sub-maximal fitness test for people with type 2 diabetes. BMC Sports Sci Med Rehabil 2016;8:31.
  35. Jetté M, Campbell J, Mongeon J, Routhier R. The Canadian Home Fitness Test as a predictor for aerobic capacity. Can Med Assoc J 1976 Apr 17;114(8):680-682.
  36. Shephard RJ, Bailey DA, Mirwald RL. Development of the Canadian Home Fitness Test. Can Med Assoc J 1976 Apr 17;114(8):675-679.
  37. Shephard RJ. The current status of the Canadian Home Fitness Test. Br J Sports Med 1980 Jul 01;14(2-3):114-125.
  38. Robbins R, Krebs P, Jagannathan R, Jean-Louis G, Duncan DT. Health app use among US mobile phone users: analysis of trends by chronic disease status. JMIR Mhealth Uhealth 2017 Dec 19;5(12):e197.
  39. Thurnheer SE, Gravestock I, Pichierri G, Steurer J, Burgstaller JM. Benefits of mobile apps in pain management: systematic review. JMIR Mhealth Uhealth 2018 Oct 22;6(10):e11231.
  40. Muntaner-Mas A, Martinez-Nicolas A, Lavie CJ, Blair SN, Ross R, Arena R, et al. A systematic review of fitness apps and their potential clinical and sports utility for objective and remote assessment of cardiorespiratory fitness. Sports Med 2019 Apr;49(4):587-600.
  41. Ruiz-Cárdenas JD, Rodríguez-Juan JJ, Smart R, Jakobi J, Jones G. Validity and reliability of an iPhone App to assess time, velocity and leg power during a sit-to-stand functional performance test. Gait Posture 2018 Jan;59:261-266.
  42. Capela NA, Lemaire ED, Baddour NC. A smartphone approach for the 2 and 6-minute walk test. Annu Int Conf IEEE Eng Med Biol Soc 2014;2014:958-961.
  43. Brooks GC, Vittinghoff E, Iyer S, Tandon D, Kuhar P, Madsen KA, et al. Accuracy and usability of a self-administered 6-minute walk test smartphone application. Circ Heart Fail 2015 Sep;8(5):905-913.
  44. Altini M, Van Hoof C, Amft O. Relation between estimated cardiorespiratory fitness and running performance in free-living: an analysis of HRV4Training data. In: Proceedings of the IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). 2017 Presented at: IEEE EMBS International Conference on Biomedical & Health Informatics (BHI); Feb. 16-19, 2017; Orlando, FL, USA p. 16-19.
  45. Monteiro ER, Vingren JL, Neto VG, Neves EB, Steele J, Novaes JS. Effects of different between test rest intervals in reproducibility of the 10-repetition maximum load test: a pilot study with recreationally resistance trained men. Int J Exerc Sci 2019;12(4):932-940.
  46. Ferguson B. ACSM’s Guidelines for Exercise Testing and Prescription 9th Ed. 2014. J Can Chiropr Assoc 2014;58(3):328.
  47. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016 Jun;15(2):155-163.
  48. Chuang L, Wu C, Lin K, Hsieh C. Relative and absolute reliability of a vertical numerical pain rating scale supplemented with a faces pain scale after stroke. Phys Ther 2014 Jan;94(1):129-138.
  49. Bruton A, Conway J, Holgate S. Reliability: what is it, and how is it measured? Physiotherapy 2000 Feb;86(2):94-99.
  50. Schuck P, Zwingmann C. The 'smallest real difference' as a measure of sensitivity to change: a critical analysis. Int J Rehabil Res 2003 Jun;26(2):85-91.
  51. Giavarina D. Understanding Bland Altman analysis. Biochem Med (Zagreb) 2015;25(2):141-151.
  52. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986 Feb 08;1(8476):307-310.
  53. Myles P, Cui J. Using the Bland-Altman method to measure agreement with repeated measures. Br J Anaesth 2007 Sep;99(3):309-311.
  54. Buchner A, Erdfelder E, Faul F, Lang AG. G*Power : statistical power analyses for Mac and Windows. Heinrich-Heine-Universität Düsseldorf. 2001. URL: [accessed 2021-09-08]
  55. Holtz B, Whitten P. Managing asthma with mobile phones: a feasibility study. Telemed J E Health 2009 Nov;15(9):907-909.
  56. Juengst SB, Graham KM, Pulantara IW, McCue M, Whyte EM, Dicianno BE, et al. Pilot feasibility of an mHealth system for conducting ecological momentary assessment of mood-related symptoms following traumatic brain injury. Brain Inj 2015 Aug;29(11):1351-1361.
  57. Lee J, Song S, Ahn JS, Kim Y, Lee JE. Use of a mobile application for self-monitoring dietary intake: feasibility test and an intervention study. Nutrients 2017 Jul 13;9(7):748.
  58. de Camargo AA, Justino T, de Andrade CH, Malaguti C, Corso SD. Chester step test in patients with COPD: reliability and correlation with pulmonary function test results. Respir Care 2011 Jul 01;56(7):995-1001.
  59. Lubans DR, Morgan P, Callister R, Plotnikoff RC, Eather N, Riley N, et al. Test-retest reliability of a battery of field-based health-related fitness measures for adolescents. J Sports Sci 2011 Apr;29(7):685-693.
  60. Muehlbauer T, Roth R, Mueller S, Granacher U. Intra and intersession reliability of balance measures during one-leg standing in young adults. J Strength Cond Res 2011 Aug;25(8):2228-2234.
  61. YMCA of the USA. YMCA Fitness Testing and Assessment Manual. Champaign, Illinois: Human Kinetics Publishers; 2000:1-250.
  62. McIntosh G, Wilson L. Trunk and lower extremity muscle endurance: normative data for adults. J Rehabil Outcomes Meas 1998;2(4):20-39.
  63. Faulkner R, Sprigings E, McQuarrie A, Bell RD. A partial curl-up protocol for adults based on an analysis of two procedures. Can J Sport Sci 1989 Sep;14(3):135-141.
  64. Canadian Society for Exercise Physiology. Canadian Physical Activity, Fitness & Lifestyle Approach : CSEP - Health & Fitness Program's Health-related Appraisal & Counselling Strategy. Ottawa, Ont: Canadian Society for Exercise Physiology; 2004.
  65. Springer BA, Marin R, Cyhan T, Roberts H, Gill NW. Normative values for the unipedal stance test with eyes open and closed. J Geriatr Phys Ther 2007;30(1):8-15.
  66. Verbunt JA, Seelen HA, Vlaeyen JW, Bousema EJ, van der Heijden GJ, Heuts PH, et al. Pain-related factors contributing to muscle inhibition in patients with chronic low back pain: an experimental investigation based on superimposed electrical stimulation. Clin J Pain 2005;21(3):232-240.
  67. Artero EG, Lee D, Lavie CJ, España-Romero V, Sui X, Church TS, et al. Effects of muscular strength on cardiovascular risk factors and prognosis. J Cardiopulm Rehabil Prev 2012;32(6):351-358.

Abbreviations

bpm: beats per minute
ICC: intraclass correlation coefficient
LE: lower extremity
LOA: limits of agreement
SEM: standard error of measurement
SRD: smallest real difference
UE: upper extremity

Edited by G Eysenbach; submitted 18.02.21; peer-reviewed by C Jacob, G Fico, C Reis; comments to author 04.08.21; revised version received 16.09.21; accepted 12.10.21; published 08.12.21


©I-I Lin, You-Lin Chen, Li-Ling Chuang. Originally published in JMIR Formative Research, 08.12.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.