Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH,
UK
Reading Psychology
Publication details, including instructions for
authors and subscription information:
http://www.tandfonline.com/loi/urpy20
Reliability of Ratings of Children’s Expressive Reading
Gary P. Moser (a), Richard R. Sudweeks (b), Timothy G. Morrison (a) & Brad Wilcox (a)
(a) Department of Teacher Education, Brigham Young University, Provo, Utah
(b) Department of Instructional Psychology and Technology, Brigham Young University, Provo, Utah
Published online: 04 Nov 2013.
To cite this article: Gary P. Moser , Richard R. Sudweeks , Timothy G. Morrison &
Brad Wilcox (2014) Reliability of Ratings of Children’s Expressive Reading, Reading
Psychology, 35:1, 58-79, DOI: 10.1080/02702711.2012.675417
To link to this article: http://dx.doi.org/10.1080/02702711.2012.675417
Reading Psychology, 35:58–79, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 0270-2711 print / 1521-0685 online
DOI: 10.1080/02702711.2012.675417
RELIABILITY OF RATINGS OF CHILDREN’S EXPRESSIVE READING
GARY P. MOSER
Department of Teacher Education, Brigham Young University, Provo, Utah
RICHARD R. SUDWEEKS
Department of Instructional Psychology and Technology,
Brigham Young University, Provo, Utah

TIMOTHY G. MORRISON and BRAD WILCOX
Department of Teacher Education, Brigham Young University, Provo, Utah
This study examined ratings of fourth graders’ oral reading expression. Ran-
domly assigned participants (n = 36) practiced repeated readings using nar-
rative or informational passages for 7 weeks. After this period raters used the
Multidimensional Fluency Scale (MFS) on two separate occasions to rate
students’ expressive reading of four equivalent passages. Results of this general-
izability study showed that a minimum of two and preferably three equivalent
passages, two raters, and one rating occasion are recommended to obtain reliable
ratings. This research substantiates the reliability of the MFS and demonstrates
the importance of raters collaborating and finding texts at students’ independent
reading levels.
Early definitions of fluency focused primarily on processes of word
identification and rate of oral reading. More recently, researchers
have turned their attention toward two related issues concerning
reading fluency: its effects on comprehension and on expressive
oral reading.
Samuels (2002) explained, “Decades ago, no one thought of
automaticity (or fluency) in terms of processes that foster active
constructive comprehension” (p. 167). LaBerge and Samuels
(1974) were among the first to show that automatic identification
of words during reading characterizes the activity of fluent reading.
Some have explained that when readers can identify words effortlessly
using little attention, they are more able to attend to creating
meaning (Stanovich, 1980, 1986). The terms skills, meaning
processes that are automatic (e.g., letter and word identification),
and strategies, designating processes that are used more deliberately
and consciously (e.g., identifying causal relationships and
resolving inferences), are used to explain the value of automatic
word identification (Afflerbach, Pearson, & Paris, 2008). Using
those capabilities, skilled readers have automatized basic processes
of reading so that strategies can be used to build meaning.
Address correspondence to Timothy G. Morrison, 205 MCKB, Brigham Young University, Provo, UT 84602. E-mail: tim morrison@byu.edu
Those who can read quickly and accurately are considered
by many to be fluent readers. While reading fluency instruction
continues to emphasize efficiency in word identification accuracy
and reading rate (Chard, Pikulski, & McDonagh, 2006; Eldredge,
2005; Kuhn, 2004; National Reading Panel, 2000; Quirk, Schwanenflugel,
& Webb, 2009; Rasinski, 2000; Samuels, 2002; Samuels,
Schermer, & Reinking, 1992), elements of expressive reading that
contribute to improvement in comprehension are now empha-
sized as well (Dowhower, 1991; Kuhn, 2004; Pinnell et al., 1995;
Prescott-Griffin & Witherell, 2004; Rasinski, 2003; Reutzel, 2006;
Schreiber, 1991; Young, Bowers, & MacKinnon, 1996). Readers
who can read the words quickly and accurately can often read
with meaningful expression. Expressive reading, sometimes re-
ferred to as prosody, includes elements such as pacing, phrasing,
smoothness, and speech volume.
A multifaceted definition of fluency requires multiple assessment
tools to obtain a complete understanding. Measuring rate and
accuracy is relatively straightforward; however, assessing
expressive reading is more difficult because it requires
rater-mediated judgments.
Schwanenflugel et al. (2006) have studied children’s fluent
reading, focusing on word and text level fluency without measur-
ing prosody. Evaluating prosody is sometimes overlooked because
it can be difficult to assess reliably. Schwanenflugel, Hamilton,
Kuhn, Wisenbaker, and Stahl (2004) also examined children’s flu-
ency, focusing specifically on prosody. Instead of using a rubric,
they examined prosody by measuring pauses noted in record-
ings of children’s oral reading. This method can be used by re-
searchers who have the required technology; however, other re-
searchers and classroom teachers may need to use more accessible
means.
Several measures of oral reading expression that use rubrics
requiring rater judgments have been developed, including those
created by the National Assessment of Educational Progress (NAEP),
Johns and Berglund (2006), Leslie and Caldwell (2011), and
Rasinski, Blachowicz, and Lems (2006), who developed the
Multidimensional Fluency Scale (MFS; see Figure 1). The MFS rates
several indicators of prosody: (a) expression and volume, (b)
phrasing, (c) smoothness, and (d) pacing. Measuring expressive
reading using any of these assessment tools relies on ratings of
several aspects of expression. Scores derived from use of these
instruments can be influenced by many factors, such as the types
and topics of passages students read, the difficulty levels of the
passages, and the professional experience and collaboration of
those who rate the readings, as well as the abilities of the readers
themselves.
FIGURE 1 Multidimensional Fluency Scale.
Such ratings require that judgments be made about the quality
or quantity of some property or characteristic of an object or
event. In the context of expressive oral reading, ratings are evaluative
inferences about a student’s ability to read with expression.
Consequently, explicit steps must be taken to standardize the criteria
and minimize the subjectivity of the rating process. Otherwise
the rating that an individual reader receives may depend
more on who performed the rating or when it occurred than on
the quality of the reader’s expressive reading.
The purpose of this study was to determine the reliability of
scores obtained using the MFS (Rasinski et al., 2006) with fourth-
grade readers. Specifically we focused on three research ques-
tions:
1. What percentage of the variability in the ratings of elements of
expression in the MFS is contributed by passages, raters, rating
occasions, and interactions of these various sources of error?
2. What is the estimated reliability of the ratings of the four ele-
ments of expressive reading in the MFS for making decisions
about individual students?
3. How would the reliability of the rating process be changed by
using a less expensive and more practical design for collecting
ratings, for example, using different combinations of passages,
raters, and rating occasions?
Method
Participants
Participants were selected from two elementary schools in the
same school district located in a middle class community, with two
fourth-grade classrooms from each school participating. In one
school 3% of the students were eligible for free or reduced-price
school lunch and in the other 41% were eligible. Fewer than 10%
of the students at both schools represented minority populations.
Independent reading levels for all fourth graders were
obtained at the beginning of the school year using the
Developmental Reading Assessment (Beaver & Carter, 2003).
These students were sorted into groups that most closely approxi-
mated their independent reading levels—second, third, or fourth
grade. A stratified random sample of 36 of these fourth graders,
with equal numbers of males and females, was selected. Propor-
tional numbers of students at each reading level were then ran-
domly assigned to practice reading either narrative (n = 18) or
informational text (n = 18).
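The proportional random assignment described above can be sketched as follows. This is a minimal illustration rather than the procedure the authors actually ran: the function name, the student identifiers, and the number of students at each reading level are hypothetical (the article does not report the stratum sizes), and the balancing of males and females is not modeled.

```python
import random

def assign_by_stratum(students_by_level, seed=0):
    """Randomly split each reading-level stratum evenly between two conditions."""
    rng = random.Random(seed)
    assignment = {"narrative": [], "informational": []}
    for level, students in students_by_level.items():
        shuffled = list(students)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment["narrative"].extend(shuffled[:half])
        assignment["informational"].extend(shuffled[half:])
    return assignment

# Hypothetical strata; the counts per reading level are assumptions.
students_by_level = {
    "second": [f"s2_{i}" for i in range(10)],
    "third": [f"s3_{i}" for i in range(14)],
    "fourth": [f"s4_{i}" for i in range(12)],
}
groups = assign_by_stratum(students_by_level)
print(len(groups["narrative"]), len(groups["informational"]))
```

Splitting within each stratum, rather than across the whole sample, keeps the proportion of students at each reading level the same in both practice conditions.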

Three classroom teachers participated, with one teaching two
language arts classes. All were female professional teachers with
a range of 3 to 32 years of experience, and one held a master’s
degree. Based on observations in their classrooms, researchers
determined that all three teachers could be excellent models of
expressive oral reading. These teachers received training on ele-
ments of the MFS. They were instructed in their responsibilities
in implementing the procedures of repeated reading but did not
participate in scoring children’s expressive reading.
Two researchers independently scored all readings after re-
viewing the MFS and practicing together and separately on several
occasions. One rater held a master’s degree in literacy and was a
doctoral candidate at the time. He had been a classroom teacher
for over 20 years. The second rater was an associate professor with
a PhD in Reading Education. Earlier in his career he had been
a classroom teacher for 3 years. The professor was the doctoral
student’s dissertation chair at the time of the study.
Practice Materials
Reading materials for fluency practice were selected using four
criteria. First, an equal number of narrative and informational
passages were selected. Second, specific genres were chosen to re-
duce variability of text type. Narrative passages were all within the
contemporary realistic fiction genre, while informational passages
were all descriptive. Third, the length of each selected passage
was approximately 200 words (Dowhower, 1987; O’Shea, Sindelar,
& O’Shea, 1985; Samuels, 1979). Fourth, the readability level for
each passage was determined using the Spache or the Flesch Grade
Level readability formulas (Micro Power & Light, 2005).
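For readers unfamiliar with grade-level readability formulas, a minimal sketch follows. It assumes that the "Flesch Grade Level" reported by the software corresponds to the widely used Flesch-Kincaid Grade Level formula; the article does not state this, and the Spache formula is not shown. The word, sentence, and syllable counts are invented for illustration.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level computed from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a passage of about 200 words.
grade = flesch_kincaid_grade(words=200, sentences=20, syllables=260)
print(round(grade, 2))
```

Because the formula depends only on average sentence length and average syllables per word, two passages with the same score can still differ in topic familiarity and vocabulary, which is one reason passages were also matched on genre and length.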
Procedures
The 7-week practice period consisted of three consecutive sessions
per week (Kuhn, 2005), with a 20-minute period each day.
Each week all students received a packet of three narrative or
three informational passages matched to their independent reading
levels. Prior to having students read these passages for practice,
teachers modeled expressive oral reading of each. All fourth
graders then read their selections four times in a succession of
timed repeated readings. Following each reading, students calcu-
lated and recorded the number of words they read during each
1-minute interval (wpm; Rasinski, 2003; Reutzel, 2006).
Teachers made sure students practiced the appropriate pas-
sage for the day; modeled proper rate, accuracy, and expression
for all passages; timed students’ oral readings for 1 minute; in-
structed students to record their reading rates; and collected fold-
ers at the end of each session. Each teacher was observed by one
of the researchers on two separate occasions during the 7-week
period, assuring that all students practiced consistently.
At the end of the 7 weeks, the 36 students were assessed
as they read four equivalent assessment passages—two narrative
and two informational—that were matched to their independent-
reading level. A counterbalancing procedure was implemented
for introducing assessment passages, based on whether text prac-
ticed was narrative or informational (O’Shea et al., 1985; Saenz &
Fuchs, 2002).
Prior to the scoring, the two raters involved in the study spent
three sessions of approximately two hours each in practicing us-
ing the MFS to rate the fourth graders’ oral readings. The descrip-
tions given on the scoring rubric of the MFS (see Figure 1) were
reviewed and discussed. These practice sessions used recorded
readings of passages that were not included in the final data set.
The raters first listened to recordings of students’ readings
and discussed how they would rate them and why. They found it
easy to consistently determine what constituted extreme scores on
a four-point scale. It was more difficult to determine differences
in the middle scores. Each time there was a discrepancy the raters
discussed their reasoning until they were able to come to agree-
ment and establish guidelines to direct their future decisions.
Raters then scored a sample of readings independently, re-
turning to compare scores. The few discrepancies they experi-
enced were resolved through discussion, and scoring guidelines
were further clarified. For example, in the area of pacing it was
determined that higher scores would go to students who read at
a conversational rate rather than racing through the text. In the
area of phrasing, if a student regularly attended to punctuation
with a few notable exceptions, he received a score of three. But if
a student rarely paid attention to punctuation he received a score
of two, even though there were times when his phrasing was more
appropriate.
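One way raters might tally discrepancies during practice sessions like these is sketched below. The function and the scores are hypothetical; the article does not report the practice ratings or say that agreement was computed this way.

```python
def agreement_summary(rater_a, rater_b):
    """Summarize agreement between two raters scoring the same readings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of readings")
    pairs = list(zip(rater_a, rater_b))
    n = len(pairs)
    return {
        # Proportion of readings given identical scores.
        "exact": sum(a == b for a, b in pairs) / n,
        # Proportion of readings scored one point apart.
        "adjacent": sum(abs(a - b) == 1 for a, b in pairs) / n,
        # Positions of readings the raters would need to discuss.
        "discrepant_indices": [i for i, (a, b) in enumerate(pairs) if a != b],
    }

# Invented practice scores on the 1-4 MFS scale (not data from the study).
rater_a = [1, 2, 3, 4, 2, 3, 4, 1]
rater_b = [1, 3, 3, 4, 2, 2, 4, 1]
summary = agreement_summary(rater_a, rater_b)
print(summary)
```

In this invented example the disagreements fall on scores of 2 and 3, mirroring the raters' observation that the middle of the scale was the hardest to score consistently.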
Data Analysis
The data that were analyzed to answer the research questions
were the ratings of four readings by 36 fourth-graders, a total of
144. Two researchers independently rated the students’ expressive
reading of the two narrative and two informational passages using
the Multidimensional Fluency Scale (Rasinski et al., 2006; Zutell &
Rasinski, 1991). The raters scored all passages, using a scale of
1–4 for each element of expressive reading, for each participant
on two separate occasions for a total of 288 ratings each.
Traditional procedures for estimating reliability are not capa-
ble of simultaneously estimating the impact of multiple sources
of inconsistencies in ratings. However, Cronbach, Gleser, Nanda,
& Rajaratnam (1972) developed a framework and set of proce-
dures known as generalizability theory that provides a way to si-
multaneously estimate the effects of multiple sources of error
variability—including two-way and higher-order interactions—on
the reliability of a set of ratings.
Generalizability theory (G theory) is a measurement theory
that explicitly acknowledges different sources of measurement er-
ror and provides a way to estimate simultaneously the magnitude
of these multiple sources of error that may affect the dependent
variable. Derived from factorial analysis of variance, G theory pro-
vides a way of partitioning the total variability in a set of ratings
into multiple components, each associated with a different source
of variance. G theory is particularly appropriate for rating expressive
oral reading because it defines reliability as a variable that
takes on different values depending on the magnitude of the
variance components. Shavelson and Webb (1991) and Brennan
(2001) have shown generalizability theory to be effective in con-
ducting reliability studies.
G theory is analogous to random effect analysis of variance
in that both are used to estimate the main effects and interaction
effects through the analysis of mean ratings. Instead of computing
F-ratios to test hypotheses, the ANOVA in a G study produces an
estimate of the variance component for each main effect and for
each interaction. However, G theory goes beyond ANOVA in that
it can be used to estimate the relative percentage of measurement
error from each of these facets.
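To make the link between ANOVA mean squares and variance components concrete, the sketch below estimates components for a simplified one-facet design in which every reader is scored once by every rater. This is a deliberate simplification of the study's three-facet design, the function name and ratings are hypothetical, and the study itself used GENOVA rather than code of this kind. Negative estimates are set to zero, as in the notes to the tables.

```python
def variance_components(scores):
    """Estimate variance components for a crossed persons x raters design.

    scores[i][j] is the single rating that rater j gave to person i.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for persons, raters, and the residual.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; clamp negatives to zero.
    return {
        "persons": max(0.0, (ms_p - ms_res) / n_r),
        "raters": max(0.0, (ms_r - ms_res) / n_p),
        "residual": max(0.0, ms_res),
    }

# Invented ratings on a 1-4 scale: four readers (rows) by two raters (columns).
demo = [[2, 3], [3, 3], [1, 2], [4, 4]]
components = variance_components(demo)
print(components)
```

No F-ratio is computed: the mean squares are used only to solve for the variance attributable to each source, which is the sense in which a G study differs from a hypothesis-testing ANOVA.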
Ideally the variance component for readers should be larger
than any of the others, since they are the object of measurement,
while rater means and occasion means should remain constant.
However, the variance component for raters may be large, because
raters do not always agree. This could be due to rater fatigue,
disparate rater knowledge of the subject matter, a consistent ten-
dency of some raters to be lenient or stringent in their ratings, or
other sources of rater error.
Similarly, variance components for rating occasions may be
large, as a rater may judge differently at separate times. These dif-
ferences may be due to individuals changing their own standards
or strategies or to some outside experience or influence.
Generalizability theory can examine two-way interaction ef-
fects among passages, raters, and occasions in order to estimate
the amount of measurement error from these sources. While G
studies were done in this research to measure the variance com-
ponents involved in ratings of expressive reading in this study, de-
cision studies (D studies) were also performed to make informed
decisions about how many levels of each facet (passages, raters,
and rating occasions) should be used to obtain acceptable relia-
bility at a feasible cost (Shavelson & Webb, 1991).
Eight generalizability (G) studies were conducted for all four
elements of expressive reading for both narrative and informa-
tional text. Raters, rating occasions, and passages were treated as
random facets, while the type of text was classified as fixed. Nar-
rative and informational text conditions were analyzed separately
(Shavelson & Webb, 1991). Passage was nested in type of text, but
crossed with students, raters, and rating occasions. This produced
a fully-crossed, three-facet S × P × R × O design for each type
of text, where S designates students, P designates the passage,
R designates the raters, and O represents the rating occasions.
The G studies, using GENOVA computer software for data anal-
ysis (Brennan, 2001), computed the reliability coefficients for all
facets.
In addition, eight separate D studies were conducted for the
four aspects of expressive reading for both types of text. These D
studies were conducted to determine how many passages, raters,
and rating occasions would be necessary to optimize the reliability
of ratings for future studies. Shavelson and Webb (1991) explained
the benefits of implementing D studies with behavior
measurements:
By increasing the number of conditions of a facet in a measurement (e.g.,
increasing the number of observers who rate each behavior), the error
contributed by that facet can be decreased, much as adding items to a
test decreases error (and increases reliability) in classical test theory. In
this way, a cost efficient design for the application of the social science
measure can be developed. (p. 13)
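The principle in this passage can be illustrated with the standard relative generalizability coefficient for a fully crossed S × P × R × O design, in which each variance component that involves students is divided by the number of conditions sampled for the facets it contains. The sketch below is an assumption-laden illustration, not the GENOVA computation used in the study: the function name is hypothetical, the input values are the narrative Expression and Volume components from Table 1, and the coefficients it returns need not match those reported in the article.

```python
# Narrative text, Expression and Volume column of Table 1.
# Only components involving students (S) enter the relative error term.
EV_NARRATIVE = {
    "S": 0.6314,
    "SxO": 0.0000,
    "SxR": 0.0395,
    "SxP": 0.0000,
    "SxOxR": 0.0065,
    "SxOxP": 0.0000,
    "SxRxP": 0.0716,
    "Residual": 0.0425,
}

def g_coefficient(var, n_p, n_r, n_o):
    """Relative generalizability coefficient for a crossed S x P x R x O design."""
    relative_error = (
        var["SxP"] / n_p
        + var["SxR"] / n_r
        + var["SxO"] / n_o
        + var["SxRxP"] / (n_r * n_p)
        + var["SxOxP"] / (n_o * n_p)
        + var["SxOxR"] / (n_o * n_r)
        + var["Residual"] / (n_p * n_r * n_o)
    )
    return var["S"] / (var["S"] + relative_error)

# Project reliability for one to four passages with two raters and one occasion.
for n_p in (1, 2, 3, 4):
    print(n_p, round(g_coefficient(EV_NARRATIVE, n_p, n_r=2, n_o=1), 3))
```

Changing `n_p`, `n_r`, and `n_o` in the loop is the computational core of a D study: the G-study components stay fixed while the hypothetical design varies.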
Following generalizability theory guidelines, G studies were
conducted first to establish the sources of variation in the measurement
of expressive reading. These were followed by D studies
to determine the optimum configuration of passages, raters, and
rating occasions required to obtain reliable scores of expressive
oral reading (Brennan, 2001; Shavelson & Webb, 1991; Shavelson,
Webb, & Rowley, 1989).
Results
Results of the generalizability studies are presented first, followed
by results of the decision studies.
Generalizability Studies
Two separate generalizability studies were conducted as part of
this research. The first G study was conducted to estimate various
sources of variability in the ratings of students’ expressive oral
reading of narrative text. The second G study estimated the
sources of variance in ratings of students’ expressive reading
of informational text. The variance components for ratings of
reading of narrative passages are reported in the first line of
Table 1 and in the first line of Table 2 for informational text.
Ideally, the variance component for students should be large
relative to the other variance components, because of the inher-
ent differences in participants. For this study variability attributed
to differences in students ranged from 79.2% to 83.3% for narra-
tive text and 70.7% to 81.8% for informational text.
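The percentages in Tables 1 and 2 are each variance component divided by the sum of all components in the same column. A short check using the narrative Expression and Volume column of Table 1 is sketched below; the function name and dictionary keys are illustrative choices.

```python
def percent_of_total(components):
    """Express each variance component as a percentage of the total variance."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

# Narrative text, Expression and Volume column of Table 1.
ev_narrative = {
    "Students": 0.6314, "Occasions": 0.0002, "Raters": 0.0000,
    "Passages": 0.0000, "SxO": 0.0000, "SxR": 0.0395, "SxP": 0.0000,
    "OxR": 0.0004, "OxP": 0.0006, "RxP": 0.0048, "SxOxR": 0.0065,
    "SxOxP": 0.0000, "SxRxP": 0.0716, "OxRxP": 0.0000, "Residual": 0.0425,
}
percentages = percent_of_total(ev_narrative)
print(round(percentages["Students"], 1))  # 79.2, matching Table 1
```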
Raters, rating occasions, and passages contributed little to the
variability in the ratings for any of the four aspects of expressive
reading (see Tables 1 and 2). In fact, the percentage of total
variation attributed to rating occasions, raters, and passages for
narrative text was 0.0% for all elements of expression except pace,
for which raters accounted for only 1.4%. Slightly different
percentages were obtained for informational text, with most
percentages still only 0.0%.
These results indicated that the raters’ scores of expressive
reading were very consistent. Each rater’s scores were very simi-
lar on both rating occasions, showing strong intra-rater reliability.
In addition, their scores were similar to one another’s on both
rating occasions, demonstrating strong inter-rater reliability. Re-
sults also indicated that the assessment passages used in the study
appeared to be equivalent. While the facets of raters, rating oc-
casions, and passages did not affect the ratings given, individual
differences among the participating students did. Although stu-
dents had practiced reading narrative or informational text, their
expression scores were similar regardless of the type of text they
read for assessment. So differences among the students appeared
to be more related to the topic of the reading than the genre.
Decision Studies
Information from G studies was used to conduct D studies to
make decisions and applications for specific purposes (Shavelson
& Webb, 1991). In this research, the G studies established characteristics
of the measurement tool used to rate oral expressive
reading. In the D studies those characteristics were used to determine
the best combination of passages, raters, and rating
occasions for future studies.
TABLE 1 Variability in Ratings of Expressive Oral Reading of Narrative Text Attributed to Each Facet and Interactions Among Facets

Source of     Expression and Volume Phrasing              Smoothness            Pace
Variability   Variance   Percent    Variance   Percent    Variance   Percent    Variance   Percent
              Component  of Total   Component  of Total   Component  of Total   Component  of Total
Students      0.6314     79.2       0.5948     80.2       0.6478     80.7       0.7437     83.3
Occasions     0.0002     0.0        0.0010     0.0        0.0000     0.0        0.0014     0.0
Raters        0.0000     0.0        0.0000     0.0        0.0000     0.0        0.0121     1.4
Passages      0.0000     0.0        0.0000     0.0        0.0000     0.0        0.0004     0.0
S×O           0.0000     0.0        0.0000     0.0        0.0238     3.0        0.0000     0.0
S×R           0.0395     5.0        0.0133     1.8        0.0127     1.6        0.0573     6.4
S×P           0.0000     0.0        0.0000     0.0        0.0000     0.0        0.0000     0.0
O×R           0.0004     0.0        0.0006     0.0        0.0052     1.0        0.0000     0.0
O×P           0.0006     0.0        0.0000     0.0        0.0000     0.0        0.0002     0.0
R×P           0.0048     1.0        0.0204     2.8        0.0062     1.0        0.0000     0.0
S×O×R         0.0065     1.0        0.0133     1.8        0.0000     0.0        0.0141     1.6
S×O×P         0.0000     0.0        0.0077     1.0        0.0000*    0.0        0.0000     0.0
S×R×P         0.0716     9.0        0.0768     10.4       0.0216     2.7        0.0355     4.0
O×R×P         0.0000     0.0        0.0012     0.0        0.0000     0.0        0.0000     0.0
Residual      0.0425     5.3        0.0127     1.7        0.0853     10.6       0.0286     3.2

Note. * = The negative variance components were set to zero following Brennan’s (1992) guidelines.
TABLE 2 Variability in Ratings of Expressive Oral Reading of Informational Text Attributed to Each Facet and Interactions Among Facets

Source of     Expression and Volume Phrasing              Smoothness            Pace
Variability   Variance   Percent    Variance   Percent    Variance   Percent    Variance   Percent
              Component  of Total   Component  of Total   Component  of Total   Component  of Total
Students      0.5327     70.7       0.5377     79.6       0.6720     81.8       0.6730     81.2
Occasions     0.0010     0.0        0.0004     0.0        0.0000     0.0        0.0000     0.0
Raters        0.0038     1.0        0.0079     1.2        0.0000     0.0        0.0232     2.8
Passages      0.0018     0.0        0.0042     1.0        0.0018     0.0        0.0000     0.0
S×O           0.0056     1.0        0.0000     0.0        0.0006     0.0        0.0000*    0.0
S×R           0.0587     7.8        0.0000     0.0        0.0000     0.0        0.0532     6.4
S×P           0.0000     0.0        0.0000     0.0        0.0000     0.0        0.0000     0.0
O×R           0.0000     0.0        0.0000     0.0        0.0004     0.0        0.0000*    0.0
O×P           0.0000     0.0        0.0000     0.0        0.0006     0.0        0.0006     0.0
R×P           0.0000     0.0        0.0095     1.4        0.0071     1.0        0.0000     0.0
S×O×R         0.0016     0.0        0.0000*    0.0        0.0066     1.0        0.0000     0.0
S×O×P         0.0000     0.0        0.0000     0.0        0.0000     0.0        0.0000     0.0
S×R×P         0.1129     15.0       0.0738     10.9       0.0762     9.3        0.0502     6.1
O×R×P         0.0026     0.0        0.0004     0.0        0.0000     0.0        0.0000     0.0
Residual      0.0321     4.3        0.0413     6.1        0.0568     6.9        0.0282     3.4

Note. * = The negative variance components were set to zero following Brennan’s (1992) guidelines.
Results from the D studies showed very high reliability coefficients,
ranging from .94 to .97 for narrative text and from .92
to .98 for informational text when using four passages, two raters,
and two occasions, as in the current study. However, this design
placed a heavy load on each rater: rating 144 readings on two
separate occasions for a total of 288 ratings (see Figure 2). The
rating session on each occasion lasted approximately eight hours.
This may be an unrealistic expectation for raters. D studies examined
possible ways to decrease the time and effort for individual
raters to complete the task without compromising the reliability
of the results.
Four alternative designs were examined. The recommended
design is a nested, two-facet (R:S) × P design, by which two raters
score the expressive reading of half of the students on every
passage. In this design, two individuals would rate half of the stu-
dents’ readings of two passages on a single occasion. The effects
of raters, rating occasions, and passages would be negligible, but
the variability related to student differences would remain high.
The percentage of variability attributed to students in this design
would range from 77.2% to 86.6% for narrative text and 72.7% to
82.1% for informational texts (see Tables 3 and 4). The projected
range of reliability coefficients would be .95 to .98 for narrative
text and .92 to .97 for informational (see Figures 3 and 4).
With the recommended design, each of the two raters would
rate 72 passages on one rating occasion rather than 144 ratings
on two separate occasions as required for the original study (see
Figure 5). This design would significantly reduce the time re-
quired for rating oral reading (approximately 4 hours, rather than
16), without sacrificing reliability of scores.
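The workload comparison reduces to simple arithmetic, sketched below. The function name is hypothetical, and the time estimate assumes that rating time scales linearly with the number of readings, an assumption that happens to agree with the 16-hour and 4-hour figures given above.

```python
def readings_per_rater(students, passages, occasions, share_of_students=1.0):
    """Number of readings one rater scores under a given design."""
    return int(students * share_of_students) * passages * occasions

# Original design: every rater scores all 36 students on 4 passages, twice.
original = readings_per_rater(students=36, passages=4, occasions=2)

# Recommended design: each rater scores half of the students, once.
recommended = readings_per_rater(students=36, passages=4, occasions=1,
                                 share_of_students=0.5)

hours_original = 16  # two sessions of about eight hours each
hours_recommended = hours_original * recommended / original

print(original, recommended, hours_recommended)  # 288 72 4.0
```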
Discussion
Along with the above direction for scoring expressive oral read-
ing of fourth grade students, findings in this study substantiated
the reliability of scores obtained using the Multidimensional Flu-
ency Scale (Rasinski et al., 2006; Zutell & Rasinski, 1991). This re-
search also demonstrated the importance of raters collaborating
and coming to consensus and teachers locating reading materials
appropriate to students’ reading abilities.
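TABLE 3 Variability in Ratings of Expressive Oral Reading of Narrative Text Attributed to Each Facet and Interactions Among Facets: Recommended Design

Source of     Expression and Volume Phrasing              Smoothness            Pace
Variability   Variance   Percent    Variance   Percent    Variance   Percent    Variance   Percent
              Component  of Total   Component  of Total   Component  of Total   Component  of Total
Students (S)  0.5986     77.2       0.5946     80.3       0.6288     85.7       0.7617     86.6
Passages (P)  0.0        0.0        0.0095     1.3        0.0        0.0        0.0093     1.1
R:S           0.0        0.0        0.0        0.0        0.0139     1.9        0.0        0.0
SP            0.0587     7.6        0.0183     2.5        0.0145     2.0        0.0601     6.8
Residual      0.1181     15.2       0.1181     15.9       0.0764     10.4       0.0486     5.5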
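TABLE 4 Variability in Ratings of Expressive Oral Reading of Informational Text Attributed to Each Facet and Interactions Among Facets: Recommended Design

Source of     Expression and Volume Phrasing              Smoothness            Pace
Variability   Variance   Percent    Variance   Percent    Variance   Percent    Variance   Percent
              Component  of Total   Component  of Total   Component  of Total   Component  of Total
Students (S)  0.5355     72.7       0.5327     75.3       0.6308     79.0       0.6688     82.1
Passages (P)  0.0046     1.0        0.0083     1.2        0.0        0.0        0.0256     3.1
R:S           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
SP            0.0371     5.0        0.0        0.0        0.0008     0.0        0.0438     5.4
Residual      0.1600     21.7       0.1667     23.5       0.1667     20.9       0.0764     9.4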
FIGURE 2 Design used in study: Two raters rate 144 readings each on two occasions.
Reliability of Scores Using the Multidimensional Fluency Scale
Ratings of expressive reading require students to read multiple
passages to obtain reliable results. Depending on the purpose
of the study and the resources available, researchers could have
students read a minimum of two and preferably three equivalent
passages to obtain reliable scores using the Multidimensional
Fluency Scale with fourth grade students. If the raters demonstrate
consistency, two raters could score students’ expressive reading
of these passages independently on one rating occasion. This
study demonstrated that increasing the number of passages
read provided greater increases in reliability than increasing
the number of raters. However, increasing the number of raters
provided slightly larger increases in reliability than increasing the
number of rating occasions.
FIGURE 3 Generalizability coefficients for narrative expressive reading when varying the number of passages and raters (color figure available online).
FIGURE 4 Generalizability coefficients for informational expressive reading when varying the number of passages and raters (color figure available online).
FIGURE 5 Recommended design: Two raters rate 72 readings each on two occasions.
Findings from this study support claims made by Zutell and
Rasinski (1991) that the MFS can be used to obtain accurate
scores of expressive reading. While many different assessment
tools are available, we recommend that researchers use the MFS as
a reliable measure of students’ oral reading expression. Our study
was conducted with fourth grade students. Other work needs to
be done with students at various ages and conditions, such as English
Language Learners, students with reading difficulties, and
students of different grade levels. Additional research is needed
to examine the reliability of other oral reading expression assessment
tools as well.
Rater Collaboration
Rater effects were not evident in this study: both inter-rater and
intra-rater reliability were obtained, likely due to the careful collaboration
and practice of the two raters. The raters spent three
sessions of approximately two hours each practicing using the
MFS to rate fourth graders’ oral readings of narrative and infor-
mational passages. These readings were not among the sample for
the study but were done in preparation. Raters felt more confi-
dent in scoring readings on both the upper and lower ends of the
scale. Since more discrepancies in these sessions occurred with
the middle scores than with more distinct high or low readers, we
recommend that raters spend time discussing differences between
ratings in these middle ranges. Time spent in clarifying decisions
and resolving disagreements in the preparation phase of the study
allowed for more consistent results.
Difficulty of Informational Passages
One difficulty in this study was locating appropriate passages for
students to practice and for researchers to use in assessment. We
began by examining passages that fourth-grade teachers might
readily have available to them: basal reading passages and content
area textbooks. Unfortunately, we found that while these passages
were written for children reading at a fourth grade level,
most passages had much higher readability scores, especially the
informational passages. Allington (2002) also expressed concern
about the difficulty levels of reading materials students are ex-
pected to read and reported that content area textbooks are of-
ten written two to three grade levels higher than the target grade
level. For this reason, we sought appropriate passages from other
sources commonly found in classroom, school, and community
libraries. We recommend that steps be taken to ensure that students
have access to informational texts that they can successfully
read and learn from. Publishers need to take extra care in prepar-
ing materials for publication in content area textbooks and other
informational texts. Teachers need to look for trade books to sup-
plement textbooks.
Conclusion
Although it is important to examine the processes of accurate
word identification and rate of oral reading, these components
of fluency must not eclipse others simply because they are easy to
measure. Expressive oral reading is more difficult to assess, but it
is essential when trying to obtain a broader view of children’s flu-
ent reading abilities. Fluency is not an end; rather it is inseparably
connected to comprehension. While expressive reading requires
judgments by raters, this study shows that their decisions can be
made reliably.
References
Afflerbach, P., Pearson, P., & Paris, S. G. (2008). Clarifying differences between
reading skills and reading strategies. The Reading Teacher, 61(5), 364–373.
Allington, R. L. (2002). You can’t learn much from books you can’t read. Educa-
tional Leadership, 60(3), 16–19.
Beaver, J. M., & Carter, M. A. (2003). Developmental reading assessment. Parsippany,
NJ: Pearson Education.
Brennan, R. L. (1992). Elements of generalizability theory (2nd ed.). Iowa City, IA:
ACT Publications.
Brennan, R. L. (2001). Generalizability theory. New York, NY: Springer-Verlag.
Chard, D. J., Pikulski, J. J., & McDonagh, S. H. (2006). Fluency: The link between
decoding and comprehension for struggling readers. In T. Rasinski, C. Bla-
chowicz, & K. Lems (Eds.), Fluency instruction: Research-based best practices (pp.
39–61). New York, NY: Guilford.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The depend-
ability of behavioral measurements: Theory for generalizability of scores and profiles.
New York, NY: Wiley.
Dowhower, S. L. (1987). Effects of repeated reading on second-grade transi-
tional readers’ fluency and comprehension. Reading Research Quarterly, 22(4),
389–406.
Dowhower, S. L. (1991). Speaking of prosody: Fluency’s unattended bedfellow.
Theory Into Practice, 30(3), 165–175.
Eldredge, J. L. (2005). Foundations of fluency: An exploration. Reading
Psychology, 26, 161–181.
Johns, J., & Berglund, R. L. (2006). Fluency: Strategies and assessment. Dubuque,
IA: Kendall Hunt.
Kuhn, M. R. (2004). Helping students become accurate, expressive readers: Flu-
ency instruction for small groups. The Reading Teacher, 58(4), 338–344.
Kuhn, M. R. (2005). A comparative study of small group fluency instruction.
Reading Psychology, 26, 127–146.
LaBerge, D., & Samuels, S. J. (1974). Toward a theory of automatic information
processing in reading. Cognitive Psychology, 6, 293–323.
Leslie, L., & Caldwell, J. (2011). Qualitative reading inventory–5. Boston, MA: Pear-
son.
Micro Power and Light Co. (2005). Readability calculations (Version 7.0) [Computer
software]. Dallas, TX: Micro Power and Light Co.
National Reading Panel. (2000). Teaching children to read: An evidence-based assess-
ment of the scientific research literature on reading and its implications for reading
instruction. Washington, DC: National Institute of Child Health and Human
Development.
O’Shea, L. J., Sindelar, P. T., & O’Shea, D. J. (1985). The effects of repeated
readings and attentional cues on reading fluency and comprehension. Journal
of Reading Behavior, 27(2), 129–142.
Pinnell, G. S., Pikulski, J. J., Wixson, K. K., Campbell, J. R., Gough, P. B.,
& Beatty, A. S. (1995). Listening to children read aloud: Data from NAEP’s
integrated reading performance record (IRPR) at grade 4. Washington, DC:
Office of Educational Research and Improvement, U.S. Department of
Education.
Prescott-Griffin, M. L., & Witherell, N. L. (2004). Fluency in focus: Comprehension
strategies for all young readers. Portsmouth, NH: Heinemann.
Quirk, M., Schwanenflugel, P. J., & Webb, M. (2009). A short-term longitudinal
study of the relationship between motivation to read and reading fluency skill
in second grade. Journal of Literacy Research, 41(2), 196–227.
Rasinski, T. V. (2000). Speed in reading does matter. The Reading Teacher, 54(2),
146–151.
Rasinski, T. V. (2003). The fluent reader: Oral reading strategies for building word recog-
nition, fluency, and comprehension. New York, NY: Scholastic.
Rasinski, T. V., Blachowicz, C., & Lems, K. (Eds.). (2006). Fluency instruction:
Research-based best practices. New York, NY: Guilford.
Reutzel, D. R. (2006). “Hey, teacher, when you say ‘fluency,’ what do you mean?”:
Developing fluency in elementary classrooms. In T. Rasinski, C. Blachowicz,
& K. Lems (Eds.), Fluency instruction: Research-based best practices (pp. 62–85).
New York, NY: Guilford.
Saenz, L. M., & Fuchs, L. S. (2002). Examining the reading difficulty of sec-
ondary students with learning disabilities: Expository versus narrative text.
Remedial and Special Education, 23, 31–41.
Samuels, S. J. (1979). The method of repeated readings. The Reading Teacher, 32,
403–408.
Samuels, S. J. (2002). Reading fluency: Its development and assessment. In S. J.
Samuels & A. E. Farstrup (Eds.), What research has to say about reading instruction
(3rd ed., pp. 166–183). Newark, DE: International Reading Association.
Samuels, S. J., Schermer, N., & Reinking, D. (1992). Reading fluency: Techniques
for making decoding automatic. In S. J. Samuels & A. E. Farstrup (Eds.), What
research has to say about reading instruction (2nd ed., pp. 124–144). Newark, DE:
International Reading Association.
Schreiber, P. A. (1991). Understanding prosody’s role in reading acquisition.
Theory Into Practice, 30(3), 158–164.
Schwanenflugel, P. J., Hamilton, A. M., Kuhn, M. R., Wisenbaker, J., & Stahl, S. A.
(2004). Becoming a fluent reader: Reading skill and prosodic features in the
oral reading of young readers. Journal of Educational Psychology, 96, 119–129.
Schwanenflugel, P. J., Meisinger, E. B., Wisenbaker, J. M., Kuhn, M. R., Strauss,
G. P., & Morris, R. D. (2006). Becoming a fluent and automatic reader in the
early elementary school years. Reading Research Quarterly, 41, 496–522.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury
Park, CA: Sage.
Shavelson, R. J., Webb, N. M., & Rowley, G. L. (1989). Generalizability theory.
American Psychologist, 44(6), 922–932.
Stanovich, K. E. (1980). Toward an interactive-compensatory model of individual
differences in the development of reading fluency. Reading Research Quarterly,
1, 33–71.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of indi-
vidual differences in the acquisition of literacy. Reading Research Quarterly, 2,
360–406.
Young, A. R., Bowers, P. G., & MacKinnon, G. E. (1996). Effects of prosodic mod-
eling and repeated reading on poor readers’ fluency and comprehension.
Applied Psycholinguistics, 17, 59–84.
Zutell, J., & Rasinski, T. V. (1991). Training teachers to attend to their students’
oral reading fluency. Theory Into Practice, 30(3), 211–217.
