Objective: To validate and evaluate the intra- and inter-rater reliability of the 2-min step test (2MST) in measuring the
functional performance of patients with knee pain associated with osteoarthritis (OA).
Methods: Forty-one patients with knee OA were included. Two examiners assessed the patients on two occasions, with an
interval of 7 to 14 days between test and retest. All executions of the 2MST were recorded in real time by the
examiners and were also recorded on video. The intraclass correlation coefficient (ICC) with its 95% confidence interval (CI),
the standard error of measurement (SEM) and the minimum detectable difference (MDD) were used to determine reliability.
For construct validity, we correlated the 2MST score with the other instruments used in the study: the Western
Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Numerical Pain Scale (NPS), Pain-Related
Catastrophizing Thoughts Scale (PCTS) and Pain Self-Efficacy Questionnaire (PSEQ). The agreement between the face-to-face
assessment and the evaluation based on the video record was assessed using the Bland-Altman methodology for each of the
4 applications of the 2MST.
Results: The 2MST presented excellent intra-rater (ICC = 0.94, SEM = 4.47, MDD = 12.40) and inter-rater reliability (ICC = 0.97,
SEM = 3.07, MDD = 8.52). Agreement between the face-to-face assessments and the analyses performed on video was
acceptable. All instruments showed a statistically significant correlation with the 2MST, except the PCTS. A correlation
magnitude above 0.50 was found between the 2MST and the pain and function domains of the WOMAC, and a
correlation magnitude between 0.30 and 0.50 with the joint stiffness domain of the WOMAC, the NPS and the PSEQ.
Conclusion: 2MST proved to be valid for assessing functional capacity in patients with knee OA, with excellent reliability.
Keywords: Reproducibility of results, Chronic pain, Exercise test
de Morais Almeida et al. BMC Musculoskeletal Disorders (2022) 23:159
In combination with subjective methods, objective tests are used to assess functional capacity, commonly in assessment routines in rehabilitation centers and in research. Six-minute walk tests (6MWT), Timed Up and Go (TUG), gait speed tests, Chair-Stand Test (CST), and Stair-Climb Test are recommended for patients with knee OA [7, 8], although there is no test considered the gold standard for assessing functional capacity in patients with knee OA.
In addition to these, another test that measures functional capacity and is applicable in some populations, such as elderly people of both sexes [9, 10], hypertensive elderly and elderly people with mild cognitive impairment, is the two-minute step test (2MST) [9, 11–14]. Previous studies have found similar results between 2MST and 6MWT [10] and a weak to moderate correlation of 2MST with 6MWT and TUG in hypertensive older adults [14].
When compared to the TUG and 6MWT tests, the 2MST has the advantages of low cost, quick execution and not requiring large spaces or specific furniture. The 2MST only needs a wall for its execution, which simplifies assessment within the routine of a rehabilitation service, since it can be performed inside the outpatient clinic, inpatient room or in the corridor. However, there is still a lack in the literature on the construct validity and reliability of the 2MST in patients with knee OA, despite it being a promising tool to investigate functional capacity.
The clinical plausibility for the application of 2MST in patients with knee OA is related to the importance of the climbing up and down movements, as explained below: the movement during the 2MST performance involves the knee and hip joints; stair climbing is one of the first tasks affected in patients with knee OA [15]; and lower stair-climbing speed is commonly seen in patients with knee OA [16, 17].
Given this scenario, the present study aimed to assess the construct validity and intra- and inter-examiner reliability of the 2MST in measuring the functional performance of patients with knee pain associated with OA. The hypothesis of this study is that 2MST is a reliable and valid measure for the population tested.
Methods
Study design
This is a study of the construct validity and reliability of the 2MST. The research was carried out in the Adult Orthopedic Rehabilitation sector of Hospital Sarah (São Luís, MA, Brazil) from July 1, 2020 to January 30, 2021, approved by the institution’s research ethics committee (protocol number 3.962.645). All research participants validated their participation by signing an informed consent form.
Participants
The sample calculation considered a confidence coefficient of 0.95 and a confidence interval width of 0.30 for the intraclass correlation coefficient (ICC). In addition, the calculation was performed to detect adequate reliability (ICC = 0.75) according to the classification of Fleiss [18]. Thus, a sample size of 34 participants was estimated. To compensate for a possible sample loss, a minimum sample size of 40 volunteers was considered. The sample calculation was based on the study by Bonett [19].
We included in this study: patients of both genders; with a minimum age of 40 years and a maximum of 80 years; complaint of knee pain lasting more than 3 months; diagnosis of knee OA issued after evaluation by an experienced orthopedist, based on criteria established by the American College of Rheumatology with clinical evaluation and imaging. The criteria were presence of pain, presence of osteophytes and at least one of the 3 characteristics (age over 50 years, presence of crepitus and/or morning stiffness for less than 30 min). Patients with grade 2 or 3 in the classification of Kellgren and Lawrence were included in the study [20].
The non-inclusion criteria adopted in the study were: individuals with a history of lower limb surgery; use of mobility aids; neurological disorder (sensory and/or motor); hip OA; use of prosthesis or orthosis in the lower limbs; cardiopulmonary diseases or any other acute adverse health condition that may make it impossible to carry out the proposed tests. Patients who did not show up within the stipulated period of 7 to 14 days for the retest were excluded.
Assessment procedures
The present study involved two physical therapist examiners who performed the measurements with the 2MST independently at two moments (test and retest), resulting in a total of 4 test applications for each participant, two evaluations on the first day and two more on the second day. The assessments were carried out by two physiotherapists with more than 10 years of experience. In addition, a 1-month prior training was carried out to standardize the execution of the tests.
When measuring functional capacity using 2MST, the examiner measured the maximum number of knee lifts that the individual performs in 2 min. Before starting the test, a marking was made on the wall, at the midpoint between the patella and the anterosuperior iliac spine. The examiner counted the number of right knee elevations that reached this mark for patients who had pain
associated with right knee OA and for patients with bilateral symptomatic knee OA. The counting of left knee elevations was performed only in patients with exclusive symptoms of left OA.
Two previous runs of the test were performed for familiarization, for a period of 30 s (with a 1-min rest interval between them). After 1 min of rest, the first examiner (staying beside the patient for safety in case of imbalance) applied the test for 2 min, giving a verbal cue to start the test, another when 1 min had passed and another when there were 30 s to the end of the test. After a 10-min rest break, the second examiner performed the same procedure. The order of examiners was defined by drawing lots before each application of 2MST.
After a minimum interval of 7 days and a maximum of 14 days, the patients were evaluated with the 2MST again by the two examiners. The same procedure used in the test was maintained, at the same time of day, in the same environment, without the patient having performed any type of physical exercise on the day of the assessment, in order to avoid fatigue before the assessment.
All 2MST runs were recorded for review by the examiners. In addition, the tests were filmed for further analysis using an iPhone 8 cell phone (Cupertino, CA, USA) and a universal telescopic tripod set at the height of the marking made on the wall. A third independent examiner counted the number of steps using video recordings. This measure was taken to allow for the analysis of agreement, considering video-based counting as the reference measure.
To determine the construct validity through correlations, patients answered validated instruments, translated and adapted to Brazilian Portuguese, commonly used in patients diagnosed with knee OA. Therefore, we used several questionnaires and scales to better assess the pain of patients with knee OA, within a biopsychosocial model, considering pain intensity, physical function, joint stiffness, catastrophizing and self-efficacy.
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is a self-administered questionnaire designed specifically for individuals with knee or hip OA. It was culturally validated and adapted to Brazilian Portuguese [21]. The questionnaire has three domains: pain, with 5 items; joint stiffness, with 2 items; and physical function, with 17 items. For each item, the patient has 5 response options (none, mild, moderate, strong, very strong). The pain domain score ranges from 0 to 20, the stiffness domain ranges from 0 to 8 and the physical function domain ranges from 0 to 68 points. The higher the value, the worse the symptoms.
The Numerical Pain Scale (NPS) is a scale consisting of a sequence from 0 to 10, where the value 0 represents “no pain” and the number 10 represents “the worst pain imaginable”. Thus, individuals graded their pain based on this parameter. This scale is validated for Portuguese [22]. Each patient answered the scale twice: once for pain intensity at rest and once for pain intensity during active knee movements.
The Pain-Related Catastrophizing Thoughts Scale (PCTS) was used to assess catastrophizing in relation to pain. It is composed of 9 items rated on a Likert scale ranging from 0 to 5, anchored by the words “almost never” and “almost always”. The total score is the sum of the scores of the completed items, divided by the number of items answered, with the minimum score being 0 and the maximum 5. Higher scores indicate a greater presence of catastrophic thoughts. The scale was adapted and validated for Brazilian Portuguese [5].
The Pain Self-Efficacy Questionnaire (PSEQ) was developed to investigate the degree of confidence that patients with chronic pain have in their ability to perform daily activities or functions. It consists of 10 items, with response options ranging from 0 to 6, 0 being “not at all confident” and 6 “completely confident”, totaling a score from 0 to 60. The higher the score, the greater the self-efficacy. This instrument is validated for Brazilian Portuguese [6].
Statistical analysis
To characterize the sample, quantitative data were described as mean and standard deviation (SD), and qualitative data as number and percentage. The intraclass correlation coefficient (ICC2,3) was used to determine intra- and inter-examiner reliability, with its respective 95% confidence interval (CI), standard error of measurement (SEM) and minimal detectable difference (MDD) [23]. To interpret the ICC value, the study by Fleiss [18] was used as a reference: for values below 0.40, reliability was considered low; between 0.40 and 0.75, moderate; between 0.75 and 0.90, high; and, finally, for values greater than 0.90, reliability was considered excellent.
To determine the construct validity, the Shapiro-Wilk normality test was initially applied. Upon identification of non-normal distribution of data, Spearman’s correlation coefficient (rho) was used to verify the magnitude of correlation between 2MST and NPS, WOMAC, PCTS and PSEQ. As a hypothesis for the magnitudes of correlation, we expected a correlation ≥0.50 between 2MST and the physical function domain of the WOMAC (similar constructs) and a correlation ranging from 0.30 to 0.50 with the pain and joint stiffness domains of the WOMAC, NPS, PCTS and PSEQ (related but different constructs). It is expected that at least 75% of the hypotheses defined a priori are confirmed [24].
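As a rough illustration of the construct validity analysis described above, the sketch below computes Spearman's rho as the Pearson correlation of average ranks and sorts its magnitude into the bands used in the hypotheses. It is not the authors' SPSS procedure: the function names are hypothetical, the paired scores are invented for the example, and assigning a value of exactly 0.50 to the upper band is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def magnitude_band(rho):
    """Band of |rho| following the thresholds named in the hypotheses."""
    r = abs(rho)
    if r >= 0.50:  # assumption: exactly 0.50 counts in the upper band
        return ">= 0.50"
    if r >= 0.30:
        return "0.30 to 0.50"
    return "< 0.30"


if __name__ == "__main__":
    # Hypothetical paired scores, not data from the study.
    step_counts = [40, 55, 62, 70, 81]
    function_scores = [50, 41, 43, 30, 22]
    rho = spearman_rho(step_counts, function_scores)
    print(round(rho, 3), magnitude_band(rho))
```

The sign of rho only gives the direction of the association; the hypotheses refer to its absolute value, which is why the band is computed on |rho|.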
The agreement between the face-to-face evaluations of the 2MST and the evaluation performed based on the video recording was analyzed using the Bland-Altman methodology, considering 4 moments of the completion of the 2MST [25].
The software used for the analyses was SPSS (version 17, Chicago, IL, USA) and a significance level of 5% was considered.
Results
Forty-three patients diagnosed with knee OA were included in the study, with a sample loss of two individuals who did not attend within the recommended period for the retest. Thus, the final sample consisted of 41 patients. According to Table 1, the sample in this study was composed mostly of female adults, overweight and with bilateral knee OA, with grade 3 OA severity in the Kellgren and Lawrence classification. In addition, the mean duration of chronic knee pain symptoms was 50 months.
Table 2 describes the mean values and standard deviation of the scores obtained through the questionnaires applied in the study. Table 3 describes the 2MST values according to the two examiners and the measurement performed based on the recorded video of the test execution.
Regarding reliability, we observed excellent ICC values (≥ 0.94) when considering different times and different examiners, as shown in Table 4. In turn, when considering the validity of 2MST, we observed a magnitude of correlation above 0.50 between 2MST and the WOMAC pain and function domains, and a magnitude of correlation between 0.30 and 0.50 with the WOMAC joint stiffness domain, NPS at rest and during movement, and PSEQ (Table 5).
Using the Bland-Altman methodology, Figs. 1, 2, 3 and 4 show the graphs of acceptable agreement between the assessments made by the examiners and the analyses performed based on the video recording.
Table 1 Sociodemographic characteristics of patients with knee osteoarthritis (n = 41)
Variable                              Mean (standard deviation) or n (%)
Age (years)                           56.48 (7.60)
Sex (women)                           35 (85.4%)
Body mass index (kg/m²)               30.51 (3.96)
Knee osteoarthritis
  Bilateral                           37 (90.2%)
  Right                               2 (4.9%)
  Left                                2 (4.9%)
Basic education                       8 (19.5%)
Complete primary education            5 (12.2%)
Complete high school                  20 (48.8%)
Complete higher education             5 (12.2%)
Postgraduate                          3 (7.3%)
Kellgren-Lawrence classification
  Grade 2                             12 (29.3%)
  Grade 3                             29 (70.7%)
Chronicity of pain (months)           50.04 (44.34)
Table 2 Scores of the instruments applied in the study (n = 41)
Questionnaires                                                    Mean (standard deviation)
Western Ontario and McMaster Universities Osteoarthritis Index
  Pain domain                                                     11.21 (3.65)
  Joint stiffness domain                                          4.63 (2.25)
  Physical function domain                                        37.31 (11.54)
Numerical Pain Scale
  Rest                                                            5.56 (2.88)
  Movement                                                        8.12 (2.22)
Pain-Related Catastrophizing Thoughts Scale                       1.79 (1.37)
Pain Self-Efficacy Questionnaire                                  48.04 (11.81)
Discussion
The 2MST showed excellent intra- and inter-examiner reliability in patients with knee OA and was adequately correlated with the NPS, WOMAC and PSEQ instruments. Other studies have investigated the reliability of 2MST in other populations. The error values inherent to the test were less than 7%. The mean values of the execution in the 2MST were slightly higher in the retest, possibly due to the learning factor; however, the ICC values were adequate.
The scientific literature has only two studies investigating the reliability of the 2MST. Excellent intra-examiner reliability was found in the elderly, with an ICC value of 0.90 [9]. Reliability was considered high in individuals aged between 18 and 44 years, sedentary and active, with an ICC greater than or equal to 0.83 [26]. Our results show higher ICC values than previous studies (ICC ≥ 0.94), probably due to two factors: consistent clinical experience of the examiners (> 10 years) and the completion of training and standardization for 1 month before the start of data collection.
Our study was the first to assess clinimetric properties of 2MST in a population with knee OA. A systematic review conducted by Bohannon and Crouch [27] evaluating the clinimetric properties of 2MST in healthy elderly
and elderly people with specific diseases such as heart failure, chronic kidney disease, hypertension, depression and Alzheimer’s disease observed that 2MST was correlated with the level of functional ability, performance on psychocognitive measures, health status and age. However, only one study included in the review addressed the assessment of reliability [9]. Of the 30 articles analyzed, 8 studies showed an increase in repetitions after physical training.
In analyzing the construct validity, we correlated the 2MST score with the NPS, WOMAC, PCTS and PSEQ scores. With the exception of PCTS, all instruments showed a statistically significant correlation with 2MST, with a magnitude of correlation above 0.30. In addition, a correlation magnitude above 0.50 was found between 2MST and the WOMAC pain and function domains. More than 75% of our hypotheses were confirmed, demonstrating sufficient results for construct validation, according to international guidelines [24]. Therefore, the 2MST measures functionality, with the advantages of being a low-cost test, quick to execute and not requiring large spaces or specific furniture.
The relationship between physical function and pain measures has already been investigated in previous studies with patients with knee OA. In a similar way to the present study, Odole et al. [28] identified a correlation magnitude lower than 0.50 in the correlations between physical function and self-efficacy (r = 0.35), kinesiophobia (r = −0.43) and catastrophizing (r = −0.28). Investigating catastrophizing in patients with knee OA, Gomes et al. [29] found no correlation with lower limb function, balance and mobility (rho ranging from −0.22 to 0.25). In addition, other factors are associated with physical-functional performance, such as knee muscle strength, knee flexion range of motion, knee pain, and age [30].
We used self-report instruments (questionnaires and scales) already validated in the population with knee OA to assess the magnitude of correlation with 2MST, and this aspect is an important limitation of our study. Other 2MST validation studies used performance assessment instruments in different populations, showing an adequate correlation with the 1-mile walk time in healthy elderly [10], and weak to moderate correlation with the 6MWT and TUG in a population of hypertensive women [14].
Our study observed adequate agreement between the evaluations carried out in person and the evaluations carried out through video recording. We consider measurement through video as a reference value. The results showed that the mean difference between the two methods is close to 0, which reflects excellent agreement. Therefore, both forms of application of the 2MST can be used, and professionals involved in the
Table 3 Mean values and standard deviation of the execution of the 2-min step test (2MST) according to the examiners and in the video analysis (n = 41)
2MST                        Examiner 1                       Examiner 2
                            Test            Retest           Test            Retest
Face-to-face measurement    64.75 (19.03)   67.12 (18.27)    65.56 (18.70)   67.58 (19.11)
Measurement with video      64.73 (18.97)   67.39 (18.12)    65.46 (18.65)   67.68 (19.11)
Table 4 Intra- and inter-examiner reliability of the 2-min step test (2MST) (n = 41)
Reliability   ICC   95% CI   SEM (elevation)   SEM (%)   MDD (elevation)   MDD (%)
Table 5 Correlation between the scores of the 2-min step test (2MST) and the other questionnaires applied (n = 41)
Questionnaires                                                    2MST rho    p
Western Ontario and McMaster Universities Osteoarthritis Index
  Pain domain                                                     −0.503      0.001*
  Joint stiffness domain                                          −0.431      0.005*
  Physical function domain                                        −0.536      < 0.001*
Numerical Pain Scale
  Rest                                                            −0.347      0.026*
  Movement                                                        −0.478      0.002*
Pain-Related Catastrophizing Thoughts Scale                       −0.172      0.281
Pain Self-Efficacy Questionnaire                                  0.366       0.019*
* Significant correlation (p < 0.05, Spearman’s correlation coefficient)
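The reliability quantities reported in this study (the sample size estimate in the Participants subsection and the SEM and MDD values listed for Table 4) can be approximated with widely used formulas. The sketch below is an illustration under stated assumptions, not the authors' computation: it assumes a Bonett-type approximation for the sample size with k = 2 ratings per participant, SEM = SD × sqrt(1 − ICC), and MDD = 1.96 × sqrt(2) × SEM. All function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def bonett_sample_size(planned_icc, ci_width, k_ratings=2, z=Z_95):
    """Approximate n needed for an ICC confidence interval of a given width.

    Assumption: Bonett-type approximation with k_ratings ratings per subject.
    """
    numerator = (8 * z ** 2 * (1 - planned_icc) ** 2
                 * (1 + (k_ratings - 1) * planned_icc) ** 2)
    denominator = k_ratings * (k_ratings - 1) * ci_width ** 2
    return math.ceil(numerator / denominator + 1)


def sem_from_sd_icc(sd, icc):
    """Standard error of measurement from the score SD and the ICC."""
    return sd * math.sqrt(1 - icc)


def mdd_from_sem(sem, z=Z_95):
    """Minimum detectable difference at the 95% level from the SEM."""
    return z * math.sqrt(2) * sem


def percent_of_mean(value, mean_score):
    """Express an error value as a percentage of the mean score."""
    return 100 * value / mean_score


if __name__ == "__main__":
    # Planning values taken from the text: ICC = 0.75, CI width = 0.30.
    print(bonett_sample_size(0.75, 0.30))
    # SEM values taken from the abstract (intra- and inter-rater).
    print(round(mdd_from_sem(4.47), 2))
    print(round(mdd_from_sem(3.07), 2))
```

Under these assumptions, `bonett_sample_size(0.75, 0.30)` gives 34, the sample size stated in the text, and the MDD values come out at about 12.39 and 8.51, close to the reported 12.40 and 8.52; the small gap is consistent with rounding of the SEM.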
Fig. 1 Graph of agreement between the test of examiner 1 and the respective analysis through video recording
Fig. 2 Graph of agreement between the retest of examiner 1 and the respective analysis through video recording
Fig. 3 Graph of agreement between the test of examiner 2 and the respective analysis through video recording
Fig. 4 Graph of agreement between the retest of examiner 2 and the respective analysis through video recording
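Figs. 1 to 4 are Bland-Altman agreement graphs. As a rough illustration of what such a graph summarizes, the sketch below computes the mean difference between two methods (the bias) and the 95% limits of agreement, taken here as bias ± 1.96 × SD of the paired differences. The function name and the step counts are hypothetical and are not data from the study.

```python
import statistics


def bland_altman_limits(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits are
    bias -/+ z times the sample standard deviation of those differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods need the same number of measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical step counts for five participants, not study data.
    face_to_face = [60, 72, 55, 81, 66]
    video = [61, 72, 54, 80, 66]
    bias, lower, upper = bland_altman_limits(face_to_face, video)
    print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias close to 0 with narrow limits, as in this invented example, is the pattern the text describes as acceptable agreement between the face-to-face and video-based counts.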
form of assessment best suits their clinical context.
tioned. Our study performed the test and retest keeping the same conditions regarding location, time, fatigue, but we did not control the stability of the clini-
7 to 14-day interval. Reliability analysis by sex was not performed, although we did not observe this analysis in other reliability studies [23, 31, 32]. However, our sample was predominantly female and overweight, and these factors must be considered in the generalization of the results. We did not investigate the influence of the grade in the classification of Kellgren and Lawrence (grade 2 versus 3) on the results of the present study.
Conclusion
2MST proved to be valid for assessing functional capacity in patients with knee OA, with excellent reliability. The study supports the use of the 2MST in the clinical context and in research with patients with pain associated with knee OA.
TFMA, AVDF and CETC designed the study; TFMA, FFT and EAAL collected the data; TFMA, AVDF, FFT, EAAL and CETC analyzed and interpreted the data; TFMA, AVFC and CETC wrote the initial draft; All authors read and approved the final manuscript.
Bacanga, São Luís, MA 65080805, Brazil.
Instituto de Coloproctologia, São Luís, MA, Brazil.
Received: 9 September 2021 Accepted: 14 February 2022
References
1. Ferreira C d SB, Dibai-Filho AV, Almeida DO d S, et al. Structural validity of the Brazilian version of the western Ontario and mcmaster universities osteoarthritis index among patients with knee osteoarthritis. Sao Paulo Med J. 2020;138(5):400–6. https://doi.org/10.1590/1516-3180.2020.0046.R1.26062020.
2. Cui A, Li H, Wang D, Zhong J, Chen Y, Lu H. Global, regional prevalence, incidence and risk factors of knee osteoarthritis in population-based studies. EClinicalMedicine. 2020;29. https://doi.org/10.1016/J.ECLINM.2020.100587.
3. Marx FC, de Oliveira LM, Bellini CG, Ribeiro MCC. Tradução e validação cultural do questionário algofuncional de Lequesne para osteoartrite de joelhos e quadris para a língua portuguesa. Rev Bras Reumatol. 2006;46(4):253–60. https://doi.org/10.1590/S0482-50042006000400004.
4. Metsavaht L, Leporace G, Sposito MM de M, Riberto M, Batista LA. What is the best questionnaire for monitoring the physical characteristics of patients with knee osteoarthritis in the brazilian population? Rev Bras Ortop. 2011;46(3):256–261. https://doi.org/10.1590/S0102-36162
5. Sardá Junior J, Nicholas MK, Pereira IA, Pimenta CA d M, Asghari A, Cruz RM. Validation of the pain-related catastrophizing thoughts scale. Acta Fisiátrica. 2008;15(1):31–6. https://doi.org/10.5935/0104-7795.20080001.
6. Bonafé FSS, Marôco J, Campos JADB. Pain self-efficacy questionnaire and its use in samples with different pain duration time. Br J Pain. 2018;1(1):33–9 Accessed 8 Sept 2021. https://www.scielo.br/j/brjp/a/
based tests to assess physical function in people diagnosed with hip or knee osteoarthritis. Osteoarthr Cartil. 2013;21(8):1042–52. https://doi.org/
8. Dobson F, Hinman R, Hall M, et al. Reliability and measurement error of the osteoarthritis research society international (OARSI) recommended performance-based tests of physical function in people with hip and knee osteoarthritis. Osteoarthr Cartil. 2017;25(11):1792–6. https://doi.org/
9. Rikli RE, Jones CJ. Development and validation of a functional fitness test
10. Rikli RE, Jones CJ. Functional fitness normative scores for community-
11. Langoni CDS, Resende TDL, Barcellos AB, et al. Effect of exercise on cognition, conditioning, muscle endurance, and balance in older adults with
Ther. 2019;42(2):E15–22. https://doi.org/10.1519/JPT.0000000000000191.
