Running Head: EARLY-GRADES TEXT COMPLEXITY
Important Text Characteristics for Early-Grades Text Complexity
Jill Fitzgerald
MetaMetrics and The University of North Carolina at Chapel Hill
Emerita and Research Professor
Jeff Elmore
MetaMetrics
Heather Koons
MetaMetrics and The University of North Carolina at Chapel Hill
Elfrieda H. Hiebert
TextProject and The University of California at Santa Cruz
Kimberly Bowen
Eleanor E. Sanford-Moore
MetaMetrics
A. Jackson Stenner
MetaMetrics and The University of North Carolina at Chapel Hill
May 7, 2014
Please do not quote or disseminate without first author permission.
This is an Accepted Manuscript of an article that is in press in the
Journal of Educational Psychology, available online at
http://dx.doi.org/10.1037/a0037289.
Abstract
The Common Core set a standard for all children to read increasingly complex texts throughout
schooling. The purpose of the present study was to explore text characteristics specifically in
relation to early-grades text complexity. Three-hundred-fifty primary-grades texts were selected
and digitized. Twenty-two text characteristics were identified at four linguistic levels, and
multiple computerized operationalizations were created for each of the 22 text characteristics. A
researcher-devised text-complexity outcome measure was based on teacher judgment of text
complexity in the 350 texts and on text complexity as gauged from student responses using a maze
task for a subset of the 350 texts. Analyses were conducted using a logical analytical progression
typically used in machine-learning research. Random forest regression was the primary statistical
modeling technique. Nine text characteristics were most important for early-grades text
complexity including word structure (decoding demand and number of syllables in words), word
meaning (age of acquisition, abstractness, and word rareness), and sentence and discourse-level
characteristics (intersentential complexity, phrase diversity, text density/information load, and
non-compressibility). Notably, interplay among text characteristics was important to explanation
of text complexity, particularly for subsets of texts.
Keywords: Text complexity, early-grades reading, random forest regression, machine-learning
Important Text Characteristics for Early-Grades Text Complexity
The United States Common Core State Standards (CCSS) for English Language Arts
(National Governors Association Center for Best Practices [NGACBP] & Council of Chief State
School Officers [CCSSO], 2010) bring unprecedented attention to the nature of texts that
students read. The goal of the Standards is for high school graduates to be well prepared for
college and workplace careers. The ability to read college-and-workplace texts plays a prominent
role in the Standards for that preparation. Citing prior evidence of a current-day gap between the
text-complexity levels at high-school graduation and college and workplace (e.g., ACT, 2006;
Williamson, 2008), the CCSS authors set a challenging standard for all students to be able to
“comprehend texts of steadily increasing complexity as they progress through school . . .” (NGA
& CCSSO, 2010, Appendix A). The foundation for students’ ability to read increasingly complex
texts begins in early-reading exposure, and considerable controversy and debate have focused
attention on the potential impact of the text-complexity Standard for young readers (e.g., Hiebert,
2012; Mesmer, Cunningham, & Hiebert, 2012). As educators attempt to support youngsters to
read increasingly complex texts, early-grades teachers need a sound understanding of what
makes texts more or less complex for young students who are beginning to learn to read. An
empirically-based understanding of text complexity for early-grades readers is critical for
practical reasons and should also contribute to development of theoretical modeling of text
complexity. The purpose of the present study was to explore text characteristics specifically in
relation to early-grades text complexity. The research questions addressed in the study were: (a)
Which text characteristics are most important for early-grades text complexity? and (b) Is there
interplay of text characteristics in relation to text complexity, and if there is, can any aspects of
the interplay be described? The research questions were addressed using computer-based
analysis of texts. The present study makes an additional contribution to the educational research
literature in that a statistical approach and methodological sequence unique in the educational
research literature were used—random forest regression in conjunction with a machine-learning
research paradigm.
What is Text Complexity?
On a broad stage, in science writ large, “complexity” has overtaken “parsimony” as a
focal interest in both physical and social sciences. Scientists increasingly aim to understand
complexity as it exists naturally in the world—as opposed to more traditional efforts to reduce
natural occurrences to some fundamental simplicity (e.g., Bar-Yam, 1997). The seminal
philosophical definition of complexity may be attributed to Rescher (1998, p. 1)—“Complexity
is . . . a matter of the number and variety of an item’s constituent elements and of the
elaborateness of their interrelational structure, be it organizational or operational.” Complexity
theory suggests that although the complexity of some objects, events, or actions may not be fully
understood, three essential elements of complex systems can be pinpointed and characterized
(Bar-Yam, 1997; Kauffman, 1995). First, in general, complex systems involve a large number of
mutually interacting parts, but even a small number of interacting components can behave in
complex ways (Bar-Yam, 1997; Albert & Barabási, 2002). When complexity occurs, a reciprocal
relationship exists between parts and wholes. Ensembles are influenced by the distinct elements,
but the distinct elements are also influenced by the whole of the ensemble (Merlini Barbaresi,
2003). Second, however, there is usually a limit to the number of parts the researcher has primary
interest in, and paradoxically, for practical and research purposes, summative description of
a complicated system may often require description as a particular few-part system where the few-part
system retains the character of the whole (Bar-Yam, 1997). Third, most complex systems are
purposive, and there is often a sense in which the systems are engineered (Bar-Yam, 1997).
Following suit, for the present study, a dynamic systems definition of text complexity
was embraced. First, “text” is defined as “. . . an organized unit, whose various components or
levels are recognized to give autonomous contributions to the global effect . . .” (Merlini
Barbaresi, 2002, p. 120). Second, text complexity is “. . . a dynamic configuration resulting from
the contributions of complex phenomena, as they occur at the various text levels” and across text
levels (Merlini Barbaresi, 2003, p. 23). The CCSS text-complexity definition further undergirded
the present work—text complexity is “the inherent difficulty of reading and comprehending text
combined with consideration of the reader and task variables” (NGA & CCSSO, 2010, Appendix
A, Glossary of Key Terms, p. 43). The Common Core definition is embedded in a systems
outlook in which complexity arises among reader, printed text, and situation during the whole of
a reading act. That is, when a reader is engaged in a specific reading encounter, complexity is in some
degree relative to an individual and to contextual characteristics (such as age or developmental
reading level or degree of teacher support while reading). Concomitantly, complexity of
particular texts is relative to populations of readers at different ages or reading ability levels (cf.
Miestamo, 2006 and Kusters, 2008 on relative versus absolute complexity; van der Sluis & van
den Broek, 2010). That is, when viewed on a continuum of complexity in relation to many
readers’ developmental levels, texts have an emergent nature and can be assigned a “complexity
level” to situate them on an entire continuum. The stance is consistent with theories of reading
dating back to Rosenblatt’s expositions on reading as transactional (1938; 2005) and Rumelhart’s
(1985) explanation of reading as interactive, and more recently to the widely accepted Rand
Reading Study Group model of reading (Snow, 2002). For example, in the Rand Reading Study
Group model, text is squarely rooted in an interaction with the reader as reading happens during
an activity within a particular social context. The stance is also consistent with Mesmer,
Cunningham, and Hiebert’s (2012) exposition of early-grades text characteristics in that they also
address text complexity as situated within individual and social/instructional contexts.
Commensurate with the three essential elements named above for complex systems, for
the present study, we assumed: (a) that early-grades texts are complex systems consisting of
many mutually interacting characteristics and ensembles of characteristics that interplay to
impact text complexity, and the characteristics can be quantitatively measured; (b) to begin to
understand the text-characteristic functioning, we would need to consider an organizational
scheme for the characteristics and explore whether and how characteristics interact; and (c) the
complexity of early-grades texts purposefully exists, that is, it is in some sense engineered, to
support young children to learn to read with as much ease as possible. As well, exploration of
interplay among text characteristics would be essential to successful explanation of text
complexity.
Which Text Characteristics Might Matter Most for Early-Grades Text Complexity?
An “optimal” text is one in which text characteristics are configured such that readers can
construct meaning while engaged with the text with the greatest amount of ease and the greatest
depth of processing (cf. Merlini Barbaresi, 2003 on optimality theory and Juola, 2003 on the
necessity of complex systems to reflect “process,” including cognitive process). Text authors
may consciously or unconsciously use optimality when creating texts for particular audiences.
Generally, authors must make trade-off choices between favoring readers’ processing ease
(efficiency) and readers’ processing depth (effectiveness), and the point of balance between the
two is constrained by intended uses of the text, including intended readers of the text (cf. Merlini
Barbaresi, 2003 who references the trade-offs, but in recognition of how an author develops a
text rather than in reference to readers/audience). For example, in content-laden disciplinary
texts, readers’ processing depth (effectiveness) is often given preference over readers’ processing
ease (efficiency). Early-grades texts are generally created to heighten certain factors related to
children’s processing ease (such as word decodability), while simultaneously requiring a
relatively low level of processing depth, that is, requiring little effort for meaning creation.
Further, some evidence suggests that text characteristics do influence the early word-reading
strategies that young children develop (Compton, Appleton, & Hosp, 2004; Juel & Roper-Schneider, 1985). For example, in one study, when tested on novel words, young students who
read highly decodable texts outperformed other students who primarily read texts with repetition
of high-frequency words (Juel & Roper-Schneider, 1985).
The concept of optimality suggests that different text characteristics might be more
important at certain levels of students’ reading development than at others, leading directly to
consideration of which characteristics of text might be related to the development of students’
emergent reading ability. A deep research base suggests that, while meaning creation is at the
heart of learning to read, “cracking the code” requires focal effort for beginning readers, and
critical cognitive factors inherent in the early learning-to-read phase are development of
phonological awareness and word recognition (e.g., Adams, 1990; Fitzgerald & Shanahan,
2000). As a result, hypothetical critical text characteristics that would support early word-reading
development are, for example, texts that are comprised of: repetition of simple words which
likely facilitates sight word development and orthographic-pattern knowledge (e.g., Metsala,
1999; Vadasy, Sanders, & Peyton, 2005); words with relatively simple orthographic
configurations which facilitates orthographic-pattern knowledge (e.g., Bowers & Wolf, 1993);
rhyming words which may advance phonological awareness (e.g., Adams, 1990); words that are
familiar in meaning in oral language which likely reduce challenges to meaning creation while
reading, permitting more attention to word recognition (e.g., Muter, Hulme, Snowling, &
Stevenson, 2004); and repeated refrains or repetitive phrases which likely reinforce phonological
awareness and development of sight words along with varied word recognition strategies such as
using context to make guesses at unknown words (e.g., Ehri & McCormick, 1998; cf.
Bazzanella, 2011 on multiple functions of repetition in oral discourse, including cognitive
facilitation). Moreover, inclusion of several types of text-characteristic support might
exponentially boost students’ ease of learning about code-related facets of reading.
Consequently, to describe early-grades text complexity, it is theoretically necessary to
consider several text characteristics at multiple linguistic levels (Graesser & McNamara, 2011;
Graesser, McNamara, & Kulikowich, 2011; Kintsch, 1998; Snow, 2002). Studying linguistic
levels in text complexity is compatible with research that suggests that hierarchy is one of the
central architectures of complexity (Simon, 1962). The research base supporting the importance
of multiple levels of text characteristics for early phases of learning to read is extensive and
comprehensive (Mesmer, et al., 2012). Only illustrative citations are provided in the following
summary (cf. Mesmer, et al., 2012).
Beginning readers learn to attach specific sounds to graphemes and vice versa (e.g.,
Fitzgerald & Shanahan, 2000), and the research base on the importance of phonological activity
is extensive (e.g., Schatschneider, Fletcher, Francis, Carlson, & Foorman, 2004). Other aspects of
word-level features have also received wide attention in early-grades texts. In particular, word
structure (how a word is configured) and word frequency (the degree to which a word occurs in
spoken or written language) have deep research bases. With regard to word structure, letter-
sound regularity in words is highlighted in decodable and linguistic texts where significant
attention is paid to word rimes and bigrams and trigrams (two and three letter units). Such texts
have been shown to have positive impact on oral reading accuracy, but not on comprehension or
other global measures of reading (e.g., Compton, et al., 2004). With regard to word familiarity,
many early grades texts are designed to include repetition of high-frequency words. Children’s
accuracy and speed of recognition is influenced by word frequency (e.g., Howes & Solomon,
1951).
The importance of knowing key meanings in texts has been well substantiated in relation
to its impact on comprehension (e.g., Stanovich, 1986), and some evidence suggests that young
students may benefit from texts with easier and more familiar vocabulary (e.g., Hiebert & Fisher,
2007). However, current-day early-grades texts may contain a fairly large amount of challenging
word meanings (e.g., Foorman, Francis, Davidson, Harm, & Griffin, 2004). In general, words
that occur with higher frequency are processed more quickly and tend to be associated with
networks of knowledge (Graesser, McNamara, & Kulikowich, 2011). In addition to word
frequency, other word meaning factors, including imageability, concreteness, and age of word
acquisition, have been shown to be significant for students’ comprehension and/or word
recognition during reading (e.g., Woolams, 2005).
Within-sentence syntax is primarily related to the ease or challenge of creating meaning
while reading, as opposed to word recognition (Mesmer, et al., 2012). The importance of within-sentence syntax in texts is likely due to the extent to which complexity within a sentence places
demands on children’s working memory (Graesser, et al., 2011).
Discourse-level text characteristics impact aspects of reading in general (Graesser, et al.,
2011) and are likely to be related to early reading. For example, referential cohesion—occasions
when a noun, pronoun, or noun phrase reference another element in the text—has been shown to
be related to reading time and comprehension (e.g., McNamara & Kintsch, 1996). More cohesive
texts tend to facilitate comprehension, likely because they support mental model building
(Kintsch, 1998). It has long been known that even young readers have expectations for story
structures that they tend to use to guide comprehension, although young students tend to reveal
such expectations to a lesser extent than do older students (e.g., Whaley, 1981; Mandler &
Johnson, 1977). As well, better readers make use of informational text structures for
comprehension and recall (Britton, Glynn, Meyer, & Penland, 1982). A final potential discourse-level text characteristic is genre, generally considered by linguists and discourse analysts to be a
slippery construct (Rudrum, 2005; Steen, 1999). However, questions remain about the
relationship between genres and text complexity, especially with regard to identification of
various genres according to specific text features (e.g., Mesmer, et al., 2012). For instance,
findings on the view that narratives are easier texts than other genres are mixed (e.g., Langer,
Campbell, Neuman, Mullis, Persky, & Donahue [1995] supported the view, while Duke [2000]
did not).
In addition to considering which sorts of text characteristics might be especially
important for examining early-grades text complexity, it is essential to embrace potential
interplay among various text characteristics. Theoretically, the emergent nature of text
complexity is in part due to the challenge level of the constituent elements, but it may also
develop through the interplay of the elements (Merlini Barbaresi, 2003). Complex systems tend
to have subsystems that may conflict depending on their “targets,” and to attain a successful
result, subsystems need to co-operate towards a compromise solution (Merlini Barbaresi, 2003;
cf. Gamson et al., 2013 on text characteristic “trade offs”; Gervasi & Ambriola, 2003). That is,
text characteristics at different linguistic levels may have conflicting impact on readers (their
“targets”). For instance, an author may choose to write a text for second-grade students about a
content-area topic, such as sound waves, requiring heavily laden vocabulary meanings that may
make the text quite complex for young readers. But the words may also be technically
challenging for word recognition. As an ensemble, difficult vocabulary meanings coupled with
high decoding demand can magnify complexity exponentially. The author might consider ways
of lessening the burden on the reader by employing other text-level characteristics, such as using
a within-sentence syntactic pattern that is generally familiar to typically-developing second-grade students or inserting parenthetical definitions after difficult word meanings, or at the
discourse level, placing main ideas first in paragraphs. As another example, there is evidence that
concreteness/abstractness, or imageability interacts with structural complexity and word
familiarity to influence readers’ word recognition (e.g., Schwanenflugel & Akin, 1994). In short,
constellations of co-occurring linguistic characteristics may contribute to variation in text
complexity (Biber, 1988).
Measuring Text Complexity Quantitatively
Several established computerized systems address text complexity beyond the early
grades through quantitative measurement. They are summarized here to provide context for the
present study: readability formulae that are typically focused on word frequency, word length,
and/or sentence length (e.g., Klare, 1974-1975; ATOS, n.d.; REAP Readability Tool, n.d.);
conjoint measurement systems that relate students’ reading levels to text-complexity levels on
the same scale, identifying collections of text characteristics (typically a small set such as word
frequency and within-sentence syntax) that serve as “best predictors” of text complexity levels
(e.g., the Lexile Framework for Reading [Stenner, Burdick, Sanford, & Burdick, 2006] and
Degrees of Reading Power [DRP] [Koslin, Zeno, & Koslin, 1987]); and natural language
processing analyses involving multiple text characteristics (e.g., Coh-Metrix [Graesser,
McNamara, & Kulikowich, 2011; McNamara, Graesser, McCarthy, & Cai, 2014], Reading
Maturity Metric [n.d.], and SourceRater [Sheehan, Kostin, Futagi, & Flor, 2010]). The systems
may be differentiated in the following ways: (a) All measures except Coh-Metrix provide a
single quantitative judgment of texts’ complexity levels. Some do so using grade
levels, others use their own leveling system. (b) Only Lexile and DRP measures are relational to
readers, that is, they are originally based on individuals’ reading of the texts—except that the
SourceRater measure uses an “inheritance principle” in which the original outcome variable used
in the predictor equation was educators’/publishers’ assignment of text grade levels. Other
measures examine text characteristics and then use a form of dimension reduction, such as
Principal Components Analysis, to determine essential components of text complexity. (c) Coh-Metrix and SourceRater quantify the broadest number of text characteristics and include
discourse-level text characteristics in their analyses.
Across the various systems, the most common text characteristics that are best predictors
of text complexity are word familiarity, word length, sentence syntax, and/or sentence length.
The SourceRater system involves eight dimensions—syntactic complexity, vocabulary difficulty,
level of abstractness, referential cohesion, connective cohesion, degree of academic orientation,
degree of narrative orientation, and paragraph structure. Coh-Metrix employs 53 text
characteristic measures reduced to five dimensions—narrativity, syntactic simplicity, word
concreteness, referential cohesion, and deep cohesion. Importantly, none of the currently existing
common metrics specifically provides explanation of what constitutes early-grades text
complexity (cf. Graesser, et al., 2011 and van der Sluis & van den Broek, 2010).
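The readability formulae summarized in this section typically combine only word-level and sentence-level length or frequency counts. As an illustration of this class of measure (using the classic Flesch-Kincaid grade-level formula, which is not one of the systems named above), such a formula can be computed as:

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Classic Flesch-Kincaid grade-level readability formula.

    Like the readability formulae cited above, it relies only on average
    sentence length and average word length (in syllables).
    """
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# A short, simple passage (10 words per sentence, 1.2 syllables per word)
# yields a low estimated grade level.
grade = flesch_kincaid_grade(total_words=100, total_sentences=10,
                             total_syllables=120)
```

Formulae of this kind make concrete why such systems are limited for characterizing early-grades text complexity: they capture length and frequency but none of the discourse-level characteristics discussed above.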
Summary
As the Common Core text-complexity standard is implemented in schools, educators and
researchers alike need an empirically-based understanding of text complexity for early-grades
readers. Complexity theory provides a foundation for studying early-grades text complexity. Key
principles of complex systems are that: they involve a large number of mutually interacting parts;
interplay among components can be locally, rather than globally, relevant; they often may be
described by hierarchical organization; and they are purposive, that is, engineered for particular
purposes. A relational outlook on text complexity implies complexity of particular texts is
relative to particular individuals, reading occasions, and developmental reading levels. However,
theoretically, texts have an emergent “developmental” complexity such that they can be assigned
a complexity level in relation to an entire continuum of complexity. Using an “optimality”
concept in conjunction with what is known about critical cognitive factors for the early learning-to-read phase and prior findings about the importance of selected text characteristics during early
reading, not only should many text characteristics at multiple linguistic levels be investigated,
but interplay among text characteristics should be hypothesized. Few of the prior text-complexity
measurement systems encompass discourse-level characteristics, few address text complexity as
relational within either specific reading occasion or in the sense of student reading-ability
development, none addresses the interplay or potential interactive nature of text characteristics,
and importantly, none specifically addresses early-grades text complexity. In the present study, a
relational frame is used to explore text characteristics that matter most for early-grades texts, and
the potential interplay of text characteristics is naturally accounted for through use of a statistical
modeling technique that is prevalent in many fields, but novel to educational research, that is,
random forest regression.
Methods
Overview
Three-hundred-fifty primary-grades texts were selected and digitized. Twenty-two text characteristics were identified at four linguistic levels. Multiple computerized variable
operationalizations were created for each of the 22 text characteristics, totaling 238 variables.
The variables were automated so that a computer could examine the digitized texts and produce
text-complexity measures for each operationalization. Analyses were conducted using a logical
analytical progression typically used in machine-learning research (Mohri, Rostamizadeh, &
Talwalker, 2012). Three phases of analyses were: variable selection to find a subset of the most
important text characteristics out of the 238 operationalizations; using 80% of the texts,
“training” a random forest regression model (Breiman, 2001a) of the most important text
characteristics associated with text-complexity level; and validating the model on a 20% “hold-out” set of texts. Follow-up analyses were done to explore the data structure.
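The three-phase analytic progression described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the study's analysis: the predictor names, the number of candidate variables, and the outcome values are all invented for the example.

```python
# Sketch of the three-phase machine-learning progression: variable
# selection, training a random forest on 80% of texts, and validating
# on a 20% hold-out set. All data here are synthetic and illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(350, 20))  # 350 "texts" x 20 candidate variables
# Hypothetical outcome: depends on the first two variables plus noise.
y = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.5, size=350)

# Phase 1: variable selection -- keep the most important predictors.
selector = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(selector.feature_importances_)[-5:]  # five most important

# Phase 2: "train" a random forest on 80% of the texts.
X_train, X_test, y_train, y_test = train_test_split(
    X[:, top], y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(
    X_train, y_train)

# Phase 3: validate on the 20% hold-out set (R-squared).
holdout_r2 = model.score(X_test, y_test)
```

Random forests are well suited to the study's goals because variable importance falls out of the fitted ensemble and interactions among predictors are modeled without being specified in advance.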
Texts
Three-hundred-fifty texts (148,068 words in total) intended for kindergarten through
second-grade constituted the text base. An existing larger corpus of early-grades texts was made
available for the study (MetaMetrics, n. d.a), and maximum-variation purposive selection
(Patton, 1990) was used to choose texts from the corpus. As well, 18 kindergarten through
second-grade Common Core State Standards (NGA & CCSSO, 2010, Appendix B) exemplar
texts (that were not present in the available corpus) were purchased. The goal of maximum-variation purposive selection was to ensure comprehensive representation of a wide variety of
early-grades text types, text levels, and publishers that currently exist in U. S. early-grades
classrooms. We chose 350 texts for two main reasons: (a) to include a sufficiently large number
of texts that would adequately represent the domain and to ensure sound statistical analyses
(following the suggested sample size in Heldsinger & Humphry, 2010); and (b) to include a
manageable set of texts to accomplish teacher and student tasks needed for development of the
text-complexity-level variable (described below in the section, “Text-Complexity Level”). All
texts were reproduced in authentic form (including pictures) and digitized.
Six categories for commonly occurring early-grades text types for independent reading
were determined: code-based (decodable, phonics), whole-word (texts that include many words
that appear in early-grades texts with high frequency), trade books (books commonly sold for
library, supplementary materials for classroom use, or private sale), leveled books (texts that are
sequenced in difficulty level), texts of assessments, and other (e.g., label books). The first four
text types had been previously identified in studies of classroom texts as reasonably
comprehensive categories of early-grades texts intended for independent reading in primary-grade classrooms (Aukerman, 1984; Hiebert, 2011). The last two categories were included
because texts appearing in assessments also commonly occur in early-grades classrooms, and
texts of assessments may become even more prominent with the advent of the Common Core
State Standards (NGA & CCSSO, 2010). Some commonly occurring early-grades texts, such as
label books, do not fit well into the previous categories. The first four category labels are
common terms used by educators and publishers (Mesmer, 2006).
It was not possible to consider proportional representation of types as they exist in United
States classrooms because to our knowledge there is no direct evidence of the degree to which
different categories of early-grades texts are present or used in United States classrooms, though
at least one survey of United States primary grades teachers suggested that use of the first four
categories of texts is widespread (Mesmer, 2006). Consequently, we selected “prototypes” to
represent each category (Hiebert & Pearson, 2010), and, where series existed, texts were
sampled from across the range of the series. In reality, many early-grades texts fall into two or
more of the category types (Mesmer, 2006). For example, code-based texts are often “leveled.”
However, for our purposes of ensuring wide representation of text types, each text was assigned
to a single category. If a text was labeled “decodable” or “phonics” by the publisher, it was
labeled “code-based.” If a publisher characterized a text as primarily attending to high-frequency
words or sight words, it was labeled “whole word.” A text was labeled “trade book” if it was
available in the trade market and not just in the school market, and it was not identified by the
publisher as decodable, phonics, or high-frequency. A text was labeled “leveled” if the text was
assigned a level (other than grade level) by the publisher and was not labeled “decodable,”
“phonics,” or “high frequency.”
Text levels were determined by using publisher-designated grade, level, or age ranges.
Texts were labeled: easy if they were designated kindergarten, kindergarten levels (as noted on
publisher websites), or typical ages for kindergarten; moderately hard if designated first grade,
first-grade levels, or first-grade ages; and hard if designated second grade, second-grade levels,
or second-grade ages.
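The single-category and text-level assignments described above followed explicit precedence rules. The sketch below is a hypothetical rendering of those rules as code: the actual assignments were made by the researchers from publisher materials, and the keyword matching here is an illustrative simplification (for instance, trade-book status actually depended on market availability, not on a label).

```python
def assign_category(publisher_description: str) -> str:
    """Assign a single text-type category following the precedence
    described above: code-based, then whole word, then trade book,
    then leveled, else other. Keyword matching is a simplification."""
    desc = publisher_description.lower()
    if "decodable" in desc or "phonics" in desc:
        return "code-based"
    if "high-frequency" in desc or "sight word" in desc:
        return "whole word"
    if "trade" in desc:
        return "trade book"
    if "level" in desc:
        return "leveled"
    return "other"

def assign_level(designation: str) -> str:
    """Map publisher grade/level/age designations onto the three
    text levels (easy, moderately hard, hard)."""
    mapping = {
        "kindergarten": "easy",
        "first grade": "moderately hard",
        "second grade": "hard",
    }
    return mapping.get(designation.lower(), "unknown")
```

The if-order encodes the precedence: a text labeled both “decodable” and “leveled” is assigned to code-based, matching the rule that leveled status applies only when no code- or word-focused label is present.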
Thirty-two publishers were represented in the 350 texts, ranging from 3 to 15 different
publishers for each of five of the six text types, with one publisher for the text-of-assessment
type.
Text genre (narrative, informational, hybrid) was determined using a modification of
Duke’s (2000) procedures. Two primary text characteristics were used to discern narrative,
informational, and hybrid text—purpose and textual attributes. Narrative text was defined as
follows (Duke, 2000; Rudrum, 2005): It is a series or sequence of events, with the intention or
purpose to evoke an element of reader response. It tells a “story” and/or has characters, places,
events, and things that are familiar, and is closely related to oral conversation. Informational text was
defined as text that conveys information about the natural or social world, and is typically written
by someone who is presumed to know the information to someone who is presumed to not know
it (Duke, 2000). Textual attributes for narratives included, for instance, events, actions with
temporal or causal links, characters, and dialogue. Textual attributes for informational texts included,
for example, facts, timeless verb constructions, technical vocabulary, descriptions of attributes,
and definitions. A set of rules modified from Duke (2000) was devised for determining genre
classification, using a decision tree process that began by determining the purpose of the book
and then addressing attributes of the text. Inter-classifier reliability between two individuals for
20% of the 350 books was .96.
Finally, the text corpus could be described as follows. Caution should be exercised when
interpreting the following figures for the text categories—again, because the categories are not
mutually exclusive. Rather, using the publisher designation in concert with the researcher-devised system described above for when a text could belong to two or more categories, 41% of
the texts were leveled, 17% were code-based, 15% were trade books, 10% were whole-word, 9%
were texts of tests, and 8% were other. Approximately 36% of the 350 texts were labeled easy, 37% moderately hard, and 27% hard. Sixty-six percent were labeled narrative, 24%
informational, and 10% hybrid or other.
Variables
Text-Complexity Level. The outcome variable was early-reader text-complexity level
measured using a continuous, developmental scale, with scores ranging from 0 to 100. An
overview of the scale-building procedures is as follows. (Further details of the procedures are
provided in Journal of Educational Psychology Supplementary Material 1 online at [LINK].)
Because text complexity was defined at the intersection of printed texts with students reading
them for particular purposes and doing particular tasks, a multiple-perspective measure of text
complexity was created using student responses during a reading task and teachers’ ordering of
texts according to complexity. In doing so, the final measure incorporated students and teachers as readers, teachers as an important context for student reading instruction, and two different tasks. Then the magnitude and strength of the association between the two logit
scales was examined, and to arrive at a single scale, a linear equating linking procedure (Kolen &
Brennan, 2004) was used to bring the student results onto a common scale with the teacher
results. Finally, for ease of interpretability, the logit scale was linearly transformed to a 0 to 100
scale.
In the first substudy, through Rasch modeling (Bond & Fox, 2007) a text-complexity
logit scale was created from the interface of 1,258 children from 10 U.S. states reading passages
from a subset of the 350 texts and responding to a maze task (see Shin, Deno, & Espin, 2000, for task validity). Cronbach's alpha estimates of reliability for all test forms ranged from .85 to .96. Also, dimensionality assessments for text genre and for differential text ordering according to student ethnicity, gender, or free/reduced-lunch status suggested no evidence of measurement multidimensionality. After creation of the logit scale, each text in the subset was assigned a text-complexity level.
In the second substudy, also through Rasch modeling, a second text-complexity logit
scale was created from 90 practicing primary-grades teachers’ (from 33 states and 75 school
districts) evaluations of texts’ complexity. Teachers ordered random pairs of the 350 texts seen
side by side on a computer screen. For each pair, teachers clicked on the text they thought was
more complex. Using the Separation Index method (Wright & Stone, 1999), measurement
reliability was .99. After creation of the logit scale, each of the 350 texts was assigned a text-complexity level.
Next, the correlation between the two logit scales (N = 89 texts) was .79 (p < .01), suggesting that the texts were ordered similarly on text complexity whether teachers or students were
involved. The relatively high correlation was also evidence of concurrent validity in that it
suggested that the two logit scales were measuring the same construct. Consequently, a linking
equating procedure was used to link the two logit scales (Kolen & Brennan, 2004). Finally, a
linear transformation was done, resulting in measures that could range from 0 to 100 on a text-complexity scale. That is, the 350 texts ordered by teachers could be assigned a measure from 0 to 100, and the texts read by students could be assigned a measure from 0 to 100.
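The linking and rescaling steps can be sketched as follows. This is a minimal illustration of mean-sigma linear equating (one of the linear methods described by Kolen & Brennan, 2004) followed by a 0-to-100 rescaling; the logit values and function names are hypothetical, not the study's data.

```python
# Mean-sigma linear equating: place student-based logits onto the
# teacher-based logit scale, then rescale the result to 0-100.
# All numeric values are illustrative.
from statistics import mean, pstdev

def mean_sigma_equate(from_scores, to_scores):
    """Return slope a and intercept b mapping from_scores onto to_scores."""
    a = pstdev(to_scores) / pstdev(from_scores)
    b = mean(to_scores) - a * mean(from_scores)
    return a, b

def rescale_0_100(logits):
    """Linearly transform logits so they span 0 to 100."""
    lo, hi = min(logits), max(logits)
    return [100 * (x - lo) / (hi - lo) for x in logits]

teacher_logits = [-2.1, -0.8, 0.0, 1.2, 2.4]   # hypothetical, same texts
student_logits = [-1.5, -0.4, 0.3, 0.9, 1.8]   # hypothetical, same texts

a, b = mean_sigma_equate(student_logits, teacher_logits)
linked = [a * x + b for x in student_logits]   # student results on the teacher scale
combined_scale = rescale_0_100(teacher_logits)
```

After equating, the student-based values share the mean and standard deviation of the teacher-based values, which is the defining property of the mean-sigma method.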
Text characteristics and their variable operationalizations. Twenty-two text
characteristics were identified at four linguistic levels—sounds in words, words, within-sentence
syntax, and across-sentences or discourse level. Discourse-level characteristics captured
repetition, redundancy, and patterning (of letters, words, phrases, and/or sentences) that occurred
in the texts. In an effort to capture a wide variety of ways of representing the text characteristics,
multiple computerized variable operationalizations were created for many of the 22 text characteristics, totaling 238 variable operationalizations. The rationale for including as many
variable operationalizations as possible was that different metrics may pinpoint different aspects
of a text characteristic (Baca-Garcia, Perez-Rodriguez, Saiz-Gonzalez, Basurte-Villamor, Saiz-Ruiz, Leiva-Murillo, et al., 2007). By including as many operationalizations as possible, the
chances of capturing critical text characteristics for text complexity were increased.
Table 1 shows the 22 text characteristics according to linguistic level, along with
definitions, the number of variable operationalizations for each, and selected examples of
operationalizations and their possible score ranges and interpretations. A complete list and
description of operationalizations is available as Journal of Educational Psychology
Supplementary Material 2 online at (LINK).
Operationalizations were accomplished using four logical approaches.
First, several types of computational metrics were considered. In addition to traditional
metrics such as counts, mean, and percentage, six specialized computational linguistic techniques
were used to produce other metrics. One specialized computational linguistic technique was
distributional semantics (Landauer & Dumais, 1997), a method for quantifying semantic
similarities between linguistic items. Three additional specialized computational linguistic techniques were: part-of-speech tagging (Collins, 2002); syntactic parsing (Sleator & Temperley, 1991); and a Levenshtein (1965/1966) metric, which gauges the minimum number of
substitutions, insertions, or deletions required to turn one linguistic unit (e.g., a written word)
into another. Also, two unique metrics that specifically capture text characteristics in relation to
student readers were applied to all of the sounds-in-words variables and most of the word-level
variables—types- (unique words in a text) as-test and words- (all words in a text) as-test. Both
metrics treat the text characteristic of interest as test items, while considering a potential student
who might be reading the text to have a trait level for the characteristic of interest. Both represent
an alternative way to measure central tendency for a distribution of values, and both are more
impacted by outliers than an average. For instance, for a types-as-test operationalization for
syllables (the text characteristic of interest) in a text, the unique words in the text are listed, and
the number of syllables is counted in each word. Then one might hypothesize that a student has a
“syllable-level reading ability” for reading the text. The unique words (types) form a test for
measuring a student’s ability to use syllables to read the text. Each unique word is given an item
difficulty level that is the number of syllables in the word. A target level of hypothetical student performance is set (e.g., 50%, 75%, or 100% of the items predicted to be correct), and then, using Rasch
modeling (Bond & Fox, 2007) the metric determines what level of reader ability would be
expected to attain the percentage that was set. The overall metric (derived from a mathematical
formula) therefore summarizes a “syllable” level of complexity for the text.
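The types-as-test logic can be sketched as follows, under simplifying assumptions: a crude vowel-group syllable counter stands in for the database-derived syllable counts, the function names are hypothetical, and a basic Rasch item-response function solved by bisection stands in for the study's exact formula.

```python
import math

def count_syllables(word):
    """Crude syllable estimate: count vowel groups (an illustrative stand-in
    for the database-derived syllable counts used in the study)."""
    groups, prev_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return max(groups, 1)

def types_as_test_ability(text, target=0.75):
    """Solve for the Rasch ability at which a hypothetical reader would get
    `target` of the unique-word 'items' correct, where each item's difficulty
    is the word's syllable count."""
    difficulties = [count_syllables(w) for w in set(text.lower().split())]

    def expected_proportion(theta):
        # Rasch model: P(correct) = 1 / (1 + exp(difficulty - ability))
        return sum(1 / (1 + math.exp(b - theta)) for b in difficulties) / len(difficulties)

    lo, hi = -10.0, 20.0  # bisection on a monotone function of theta
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_proportion(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

easy = types_as_test_ability("the cat sat on the mat")
hard = types_as_test_ability("extraordinary celebrations punctuated unbelievable anniversaries")
```

A text of multisyllabic words demands a higher hypothetical "syllable-level reading ability" than a text of monosyllables, which is the sense in which the metric summarizes a syllable level of complexity for the text.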
A second logical approach was that discourse text characteristics were systematically
treated as follows. The main focus of discourse-level variables was to capture linkages among
words and meanings in text (e.g., cohesion), redundancy, and patterning that occur across a whole text or across parts of a text rather than just within sentences. For each discourse text
characteristic, first, variable operationalizations were considered that would reflect a lexical
emphasis or a syntactic (part of speech) emphasis. Second, whether an operationalization
employed lexical or syntactic emphasis, operationalizations could also involve linear activity,
that is, adjacent sentences, or they could involve a Cartesian product over sentences (that is,
context beyond adjacent sentences), or they could address both types of activity. As an example,
for the text characteristic, Linear Edit Distance, the lexical-emphasis operationalization uses the
words in two adjacent sentences whereas a syntactical-emphasis operationalization uses parts of
speech for replacement judgments. (Further detail is provided in Supplementary Material 3
online at [LINK].)
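A lexical-emphasis version of Linear Edit Distance can be sketched as a Levenshtein computation over the word tokens of adjacent sentences. The tokenization and scaling here are illustrative assumptions, not necessarily the study's exact formulation.

```python
def word_edit_distance(sent_a, sent_b):
    """Levenshtein distance over word tokens: the minimum number of word
    substitutions, insertions, or deletions turning one sentence into the other."""
    a, b = sent_a.lower().split(), sent_b.lower().split()
    # dp[i][j] = distance between the first i words of a and first j words of b
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a word
                           dp[i][j - 1] + 1,         # insert a word
                           dp[i - 1][j - 1] + cost)  # substitute (or match)
    return dp[-1][-1]

def mean_linear_edit_distance(sentences):
    """Average word-level edit distance over adjacent sentence pairs
    (the 'linear' treatment: context limited to neighboring sentences)."""
    dists = [word_edit_distance(s1, s2) for s1, s2 in zip(sentences, sentences[1:])]
    return sum(dists) / len(dists)

# A patterned early-grades text: adjacent sentences differ by a single word,
# so the metric stays low, reflecting high repetition.
patterned = ["I see a cat", "I see a dog", "I see a bird"]
score = mean_linear_edit_distance(patterned)
```

A syntactic-emphasis variant would run the same dynamic program over part-of-speech tags rather than word tokens.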
A third logical approach was to use existing databases and resources where possible to
create variable operationalizations. The following databases were used. The MRC
Psycholinguistic Database (Coltheart, 1981) “. . . is a machine usable dictionary containing
150,837 words with up to 26 linguistic and psycholinguistic attributes for each . . .” (MRC
Psycholinguistic Database, n. d.). Number of phonemes in words, number of syllables in words,
and indices of word abstractness were extracted from the MRC Psycholinguistic Database. The
Carnegie Mellon University Pronouncing Dictionary (Carnegie Mellon University, n. d.) “ . . . is
a machine-readable pronunciation dictionary for North American English that contains over
125,000 words and their transcriptions.” It was used for variable operationalizations of the text
characteristic, mean internal phonemic predictability. The Kuperman, Stadthagen-Gonzalez, and Brysbaert (2012) age-of-acquisition ratings for 30,000 English words were used for
operationalizations of the age-of-acquisition text characteristic. The rating indicates the age at
which a word's meaning is first known. Word frequencies for running text in a corpus of 1.39 billion words from 93,000 kindergarten through university texts (MetaMetrics, n.d.b), normalized to link to Carroll, Davies, and Richman (1971) word frequencies, were used to create
operationalizations for word rareness. The Link Grammar Parser (Link Grammar, n. d.; Sleator
& Temperley, 1991) was used for operationalizations of Grammar. The Parser “. . . is a syntactic
parser of English, based on link grammar, an original theory of English syntax. Given a
sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links
connecting pairs of words" (Link Grammar, n. d.).
Additional existing resources were as follows. The Menon and Hiebert (1999)
decodability scale was slightly modified for operationalizations of the text characteristic,
decoding demand. The scale provides numeric values for varying degrees of within-word
structural complexity. The Dolch (n. d.) lists and the first 660 words on the Fry (n. d.) lists were
used in operationalizations of the text characteristic, Sight Words.
A fourth logical approach was to use techniques to control for factors that might be
considered irrelevant to the measurement of specific text characteristics. One technique used for
some operationalizations of sounds-in-words and word-level text characteristics was stop listing
(Luhn, 1958), which is commonly used in natural language processing computations. Stop listing
means deletion of the highest frequency words that tend to have low semantic value. However,
because it is not known in advance whether deleting highly frequent words matters for
examining text complexity, when stop listing was used for selected text characteristic
operationalizations, the same text characteristics were also operationalized without stop listing.
Another technique was aimed at addressing the possible impact of text length on a text-characteristic value. In general, longer discourse units can be related to increased complexity in
part because inclusion of more material offers more opportunity for additional text characteristics
or higher-levels of individual text characteristics, but also because each addition in a longer
progression of discourse may require additional cognitive integration on the part of the reader
(Merlini Barbaresi, 2003). Many text-characteristic operationalizations employed length control
by using “slices” or “chunks” of text. When slices/chunks were employed, multiple slices/chunks
were obtained from a text, covering the entire text, and then the final metrics were averaged over
slices/chunks.
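Both controls can be sketched together. Here mean word length is an arbitrary stand-in metric, the stop list is illustrative rather than the frequency-derived list used in practice, and the slice size is hypothetical.

```python
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}  # illustrative

def mean_word_length(words):
    return sum(len(w) for w in words) / len(words)

def sliced_metric(text, metric, slice_size=10, stop_list=False):
    """Apply `metric` to fixed-size word slices covering the whole text, then
    average over slices to control for text length."""
    words = text.lower().split()
    if stop_list:
        # Stop listing: drop high-frequency, low-semantic-value words.
        words = [w for w in words if w not in STOP_WORDS]
    slices = [words[i:i + slice_size] for i in range(0, len(words), slice_size)]
    return sum(metric(s) for s in slices) / len(slices)

text = ("the enormous dinosaur wandered across the ancient valley and "
        "the tiny lizard followed it to the edge of a shimmering river")

with_stops = sliced_metric(text, mean_word_length, stop_list=False)
without_stops = sliced_metric(text, mean_word_length, stop_list=True)
```

Because it is not known in advance whether stop listing matters, both versions of the metric would be retained as separate operationalizations, mirroring the study's approach.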
Analyses
Analyses were accomplished using a machine-learning logical analytical progression
(Mohri et al., 2012). Random forest regression was used for statistical modeling. The analyses performed for the present study are among the first of their kind to appear in the educational research literature and therefore deserve some added attention and description here.
The statistical modeling approach. The interdisciplinary team of researchers who
accomplished the present study worked from a statistical modeling approach that is not
commonly used in educational research but one that holds promise for some kinds
of educational problems (Strobl, Malley, & Tutz, 2009). Two cultures of statistical modeling
derive from diverse epistemological terrains in which different ways of knowing undergird
different paradigms and procedures (Breiman, 2001b). A classical statistical modeling paradigm
in educational research progresses in a top-down fashion. A theory is created detailing which
constructs hypothetically matter in relation to some outcome(s) and how the constructs are
related to one another. Consideration is given to how the constructs can be measured, a relatively
small set of “predictors” is selected, and the relationships are examined. Often a few interactions
among predictors are hypothesized and represented in the statistical model. The resulting model
is tested statistically through fit of the data to the originating model.
The other statistical culture, the one used in the present research, is a counter-culture to the predominant educational statistical paradigm: although theory can be involved initially (and was in our work), modeling works in a bottom-up fashion, starting with data (Breiman, 2001b).
In recent years, multivariate data exploration methods have become increasingly popular in
many scientific fields, including health sciences, biology, biostatistics, medicine, epidemiology,
genetics, and most recently, psychology, and in machine-learning communities (Grömping, 2009;
Strobl et al., 2009). “Machine learning” references construction, exploration, and study of
algorithms and models that are “learned” or “trained” from data (Mitchell, 1997). Large amounts
of data are processed, patterns are discovered, and predictor models are built. While some
theoretical background is certainly helpful in discerning key constructs involved in a particular
problem, there is no limit on the number of variables. Rather, all variables that can be imagined
and measured are included as potential predictors. Sometimes, depending on modeling choice,
any and all possible interactions among variables can be accounted for. The result is a model of
the important predictors (and interactions) associated with the outcome. The “goodness” of the
model is tested through its predictive capacity using a previously “unseen” set of data.
Random forest regression. The statistical modeling technique used in the present
research was random forest regression—a non-parametric statistical analysis that involves an
ensemble (or set) of regression trees (often referred to as CART—Classification and Regression
Tree) (Breiman, 2001a; Breiman, Friedman, Olshen, & Stone, 1984). Random forest regression
overcomes limitations of a single regression tree and linear regression for particular
circumstances such as when large numbers of variables are involved (Hastie, Tibshirani, &
Friedman, 2009; Strobl, et al., 2009). It is called an ensemble procedure because predictions
from many decision trees are aggregated to produce a single prediction. Decision tree regression
is based on the principle of recursive partitioning, where the feature space (defined by the
predictor variable operationalizations) is recursively split into regions containing observations (in
our case, texts) with similar response values. The predicted value for a text in a region is the
mean of the response variables for all texts in that region. For example, in our study, the many
regressions produce regions or classes where texts have similar text characteristics in relation to
their text-complexity levels. (For a detailed explanation of recursive partitioning, see Strobl, et
al., 2009.) The procedure is called random forest because each individual decision tree is
“trained” using a different random bootstrap sample of the texts and because each split within
each tree is created using a random subset of candidate variables (Grömping, 2009).
(Bootstrapping is a process of repeated resampling of the data, with each sample randomly
obtained with replacement from the original dataset.) Ultimately, from the forest (ensemble) of
trees, a single prediction can be made by calculating a mean of predictions output by the
individual trees (Grömping, 2009).
Essentially, using the available data (in our case, the text-complexity level as outcome
and 238 variable operationalizations for each text as predictors), random forest regression builds
a final model “from the ground up” by aggregating over many individually “trained” models. (To
better understand random forest regression, and partly to better understand why it is potentially
beneficial for analyzing text complexity, comparison to linear regression can be informative. A
detailed comparison is provided in the Journal of Educational Psychology Supplementary
Material 4 online at [LINK].)
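The core mechanics (bootstrap samples of texts, a random variable subset at each split, and averaging of tree predictions) can be sketched by hand using scikit-learn's decision trees, the library used in the study. This hand-rolled loop and its simulated data are illustrative, not the study's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                    # 200 simulated "texts", 8 predictors
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=200)

n_trees, mtry = 50, 3
tree_preds = []
for seed in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample (with replacement)
    tree = DecisionTreeRegressor(max_features=mtry,  # random mtry-subset at each split
                                 random_state=seed)
    tree.fit(X[idx], y[idx])
    tree_preds.append(tree.predict(X))

# The ensemble prediction is the mean over the individual trees' predictions.
forest_pred = np.mean(tree_preds, axis=0)
mean_tree_mse = float(np.mean([np.mean((p - y) ** 2) for p in tree_preds]))
forest_mse = float(np.mean((forest_pred - y) ** 2))
```

By convexity of squared error, the averaged prediction can never have a higher mean squared error than the average of the individual trees' errors, which is one intuition for why the ensemble outperforms a single tree.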
Steps in analyses. Initially, an automated computer analysis was conducted for the 350
digitized texts and the 89 passages that students read, resulting in values for each text and
passage for text-complexity level and for the 238 text-characteristic variable operationalizations.
Then, four analytical phases were accomplished. (a) The first step in analysis was to set baseline
performance. Eighty percent of the texts were randomly selected, and a three-pronged training
phase was conducted using random forest regression. Three random forest regressions were
conducted for: the 80% of the 350 texts that teachers ordered (n = 279 [one text was discarded
due to poor digitization]); the 80% of the 89 student passages (n = 71); and the two sets of texts
combined (n = 350). Each of the three random forest regressions yielded Importance values for
each of the 238 variables in relation to the text-complexity outcome variable. Model prediction
capacity (correlation) and prediction error were calculated for each of the three models on "out-of-bag" samples (Grömping, 2009). (b) To determine whether a more parsimonious set of variables could predict text complexity as well as, or nearly as well as, the 238 variables, a two-stage iterative variable-selection procedure was used (Grömping, 2009). First, for each of the
three models, the least important variable was removed from the model, random forest regression
was re-run, and prediction error was re-calculated. The process was repeated until model
prediction error began to increase, resulting in a moderately sized set of predictors for each of the
three models. Then the union of predictors in the three models was selected creating a
moderately sized set of predictors. Second, in a next round of variable elimination, redundant
operationalizations of text characteristics in the moderately sized set were identified, and the
least important of the correlated redundant variables were trimmed out using a correlation-strength cut-point for redundant operationalizations while maintaining model prediction capacity. (c) In a validation phase, the predictive capacity of the trimmed model was investigated using texts not employed for the variable-selection and "training" phases—a 20% hold-out set of texts. (d) Follow-up analyses were done to explore the data structure.
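The backward-elimination loop in the second phase can be sketched as follows. The simulated data, the exact stopping rule, and the scikit-learn parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, p = 150, 20
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=n)  # 2 real signals

features = list(range(p))
rf = RandomForestRegressor(n_estimators=100, oob_score=True,
                           random_state=0).fit(X, y)
best_error = float(np.sqrt(np.mean((rf.oob_prediction_ - y) ** 2)))

while len(features) > 1:
    # Drop the least Important variable and refit.
    drop = features[int(np.argmin(rf.feature_importances_))]
    keep = [f for f in features if f != drop]
    rf_new = RandomForestRegressor(n_estimators=100, oob_score=True,
                                   random_state=0).fit(X[:, keep], y)
    error = float(np.sqrt(np.mean((rf_new.oob_prediction_ - y) ** 2)))
    if error > best_error:      # stop when out-of-bag error begins to increase
        break
    features, rf, best_error = keep, rf_new, error
```

The loop terminates with a more parsimonious feature set whose out-of-bag error has not yet begun to rise, the same stopping logic as the study's first-stage elimination.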
Results
Preliminary Random Forest Regression Decisions
The following decisions were made for conducting the random forest regressions using
scikit-learn (Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, et al., 2011): (a) At each
node, the computer selected just one variable to make a split. (b) A constant predicted value was used in each leaf. (c) Mean Square Error was used as the splitting objective to optimize in
each node. (d) Randomness was injected into the trees using “bagging,” a method that allows all
variables to be available for selection at a given node. During the training phase, “mtry” (the
number of predictors available for selection) was set at 238. During the validation phase, "mtry" was set at three (the square root of "p," where "p" was nine predictors). The larger "mtry" was used when there was a moderate or large number of correlated predictors because, in the case of many predictors, more power is concentrated in a relatively small subset of predictors. For
variable selection, concentration of power is desirable; as well, a large "mtry" results in more stable variable selection because the most powerful variables tend to emerge repeatedly. (e) For
variable selection, each random forest model was conducted with 100 trees. In the validation
phase, random forest regressions were conducted with 500 trees. (f) The Importance values were random-permutation-based and normalized. (g) During training, out-of-bag model error (Root Mean
Square Error [RMSE], for which error is normalized relative to the number of texts) was
calculated as an estimate of generalizability error (Breiman, 2001a). During the validation phase,
non-out-of-bag RMSE was calculated (Breiman, 2001a).
Phase 1: Training Phase Results: Baseline Model Performance
For the model using the 279 texts that teachers ordered and all 238 text-characteristic operationalizations, the mean correlation between model-predicted text complexity and the empirical text-complexity measures, across 10 analytical runs of 100 trees each, was .89, and the model error (RMSE) was 8.66. For the model using the 71 passages that students read and the
238 text-characteristic operationalizations, the mean correlation was .69, and the RMSE was
10.58. For the model combining the two sets of texts (n = 350), the mean correlation was .87,
and the RMSE was 8.72. For each of the three models, predictive power was high, and error was
low. (Importance values were computed for all 238 predictor variables in each of the three
models, but given the large number of variables, only the final model variable Importance values
are reported in a following section.)
Phase 2: Trimmed Model and Final Operationalization Descriptives
First, Figure 1 shows that as the least Important operationalizations were dropped from the model one by one, model correlation (that is, predictive capacity) began to visibly drop for the teacher and combined models when approximately 25 variable operationalizations were left in the model. For the student model, it dropped with approximately 10 variables remaining. The union of the top 25 operationalizations
in each of the three models was then selected, resulting in 45 predictor operationalizations. Then
one model was created for the next step using the 45 predictor variable operationalizations.
Second, the set remaining after the first trim still included redundant variable operationalizations for single text characteristics. To eliminate highly correlated redundancies, the inter-correlations of all 45 predictors were computed using the combined dataset. Then, in the top of Figure 2, potential correlational thresholds are shown on the x-axis, and the y-axis shows what the model correlation would be if redundant variable operationalizations were removed using different threshold-correlation magnitudes as cut-points. Through visual
inspection of the top graph, .70 was chosen as the correlational cut-point because it appeared that doing so would result in only a very slight drop in model correlation while removing a substantial number of redundant predictors.
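A correlation-threshold trim of this kind can be sketched as a greedy pass that keeps the more Important member of any highly correlated pair. The greedy ordering and the simulated data are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def trim_redundant(X, importances, threshold=0.70):
    """Greedily drop the less Important member of any predictor pair whose
    absolute correlation exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    order = np.argsort(importances)[::-1]      # most Important first
    kept = []
    for j in order:
        # Keep predictor j only if it is not too correlated with anything kept.
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return sorted(kept)

rng = np.random.default_rng(3)
base = rng.normal(size=(100, 4))
# Predictor 4 is a near-duplicate of predictor 0 (correlation near 1).
X = np.column_stack([base, base[:, 0] + rng.normal(scale=0.05, size=100)])
importances = np.array([0.30, 0.20, 0.15, 0.10, 0.25])

kept = trim_redundant(X, importances, threshold=0.70)
```

The redundant near-duplicate is removed because it exceeds the .70 cut-point with a more Important predictor, while the uncorrelated predictors survive.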
Then, as shown in the bottom graph in Figure 2, using the threshold cut-point of a .70
correlation, 11 variable operationalizations remained in the model. Among the 11, two sets of
operationalizations were highly similar, and in each case, the less Important of the two was dropped. In sum, the model-trimming procedure resulted in a nine-predictor model: for word structure—decoding demand and number of syllables in words; for word meaning—age of acquisition, abstractness, and word rareness; and for sentence and discourse level—intersentential complexity, phrase diversity, text density/information load, and non-compressibility.
After variable selection, a final set of three random forest regression models was trained
using only the nine variables (mtry = 3) with the teacher text-complexity assignments, the
student assignments, and the two combined together. The resulting correlations (and RMSEs) for
the teacher, student, and combined models were: .89 (8.40), .71 (10.35), and .88 (8.59),
respectively.
Phase 3: Model Validation
To validate the model, the hold-out set of 20% of books (n = 71) and 20% of the passages
for student reading (n = 19) was combined. A final random forest regression (mtry = 3) was run
with the nine selected variables as predictors and the empirical text-complexity variable from the
combined (teacher and student) data as the outcome. The model was validated with a correlation
of .85 and an RMSE of 9.68. Figure 3 shows the generally tight relationship between the nine predictors and text-complexity level. Variance explained by the model was 71.98%. Of note, the validation model error was similar to that of the combined-dataset model during training (8.72), suggesting minimal, if any, model overfit.
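The reported validation quantities can be computed as follows. For predictions of this kind, variance explained approximately equals the squared correlation (here, .85 squared is about .72, consistent with the reported 71.98%). The data below are simulated stand-ins, not the study's hold-out set.

```python
import numpy as np

def validation_metrics(predicted, observed):
    """Correlation, RMSE, and variance explained (R-squared) on a hold-out set."""
    r = float(np.corrcoef(predicted, observed)[0, 1])
    rmse = float(np.sqrt(np.mean((predicted - observed) ** 2)))
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    variance_explained = float(1 - ss_res / ss_tot)
    return r, rmse, variance_explained

rng = np.random.default_rng(4)
observed = rng.uniform(0, 100, size=90)             # hypothetical hold-out texts
predicted = observed + rng.normal(scale=10, size=90)  # hypothetical model output

r, rmse, var_exp = validation_metrics(predicted, observed)
```

Because the simulated predictions are unbiased, the squared correlation tracks the variance-explained figure closely, illustrating the relationship between the two reported statistics.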
Variable Importance Values, Descriptives (Including Text Complexity), and Inter-Correlations
Finally, after the validation phase, mean Importance values were obtained from 10 final
random forest regressions with 500 trees and mtry set at 3, using the 350 texts (Grömping, 2009).
The variable Importance values, mean, standard deviation, and range for the final nine variables
along with mean, standard deviation, and range for the text-complexity variable, are shown in
Table 2. The order of text-characteristic Importance was: intersentential complexity (the linear
edit distance operationalization) (most Important), text density/information load, phrase diversity
(the longest common string operationalization), age of acquisition, number of syllables in words,
abstractness, decoding demand, non-compressibility, and word rareness. Notably, three
discourse-level characteristics appeared near the top of the Importance order, suggesting relative
strength of discourse-level characteristics for predicting text complexity. Also included were word-structure and word-meaning text characteristics. While no variable that represented a within-sentence text characteristic alone emerged, the discourse-level variables indirectly included facets of within-sentence characteristics—because to create measures across sentences, within-sentence characteristics had to be taken into account.
The text-characteristic variable operationalization means for the word structure variables
(decoding demand and number of syllables in a word) suggested that, across the entire set of texts, word structure was moderately challenging, though the range for decoding demand was wide—
up to 7.91 (out of 9). (See Table 2 for summary statistics.) The means for the word meaning
variable operationalizations (age of acquisition, abstractness, and word rareness) again suggested
that on the whole, the abstractness of the words in the text was moderate (approximately at the
middle of the possible range of scores), but as would be expected, word rareness was minimal
and age of acquisition tended to be low—though again, for all three variables, the standard
deviations suggested a wide range of values. Means for the discourse level variable
operationalizations suggested that the text corpus involved a fair amount of repetition,
redundancy, and patterning, in that means for three of the variables ranged from .55 (for Non-Compressibility, a compression ratio that could range from 0 to 1) to .80 (Phrase Diversity: Longest Common String, which could range from 0 to 1), with intersentential complexity reflecting
such features more modestly. In all four cases, nearly the complete range of values was
represented in the corpus, suggesting a fair amount of variability on the discourse-level text
characteristics. Finally, the full range of text-complexity values was observed, with a mean of
50.10.
The correlations in Table 3 indicate moderately positive relationships of all nine variable operationalizations with text complexity, ranging from .35 to .73, with the exception of non-compressibility (.18, though significant). Next, variable operationalizations within word structure, within word meaning (see the left-most triangle in Table 3), and within discourse level (see the right-most triangle in Table 3) were, on the whole, moderately correlated with each other, though in each of the three groups there were one or two low correlations, suggesting that variable operationalizations within a linguistic level tended to capture similar text characteristics. Also, on the whole, the cross-group correlations tended to be somewhat lower than within-group correlations, suggesting to some degree that each group of variables was measuring a unique set of characteristics (see the boxes in Table 3). That is, decoding demand and number of syllables in words correlated with the three word-meaning variable operationalizations from .06 to .54, all lower than .66, the correlation of decoding demand with number of syllables in words. The top right-most box shows a similar pattern. Comparing the word-meaning within-group correlations (the left-most triangle in Table 3) with the cross-group correlations of word meaning with the discourse-level variable operationalizations (the bottom box in Table 3), again, on the whole, the within-group word-meaning correlations (.34 to .57), not including the low correlation of abstractness with word rareness (.05), tended to be similar to, or higher than, the cross-group correlations (.12 to .53), with the exception of the correlation of age of acquisition with intersentential complexity.
Exploring the Data Structure and the Text-Characteristic Interplay
Several follow-up analyses (using all 350 texts and the teacher-based empirical text-complexity levels) were done to explore the data structure, the degree of text-characteristic
variability in high versus low text-complexity levels, the interplay of text characteristics in
relation to text complexity levels (decision trees and quintiles), and the interplay of text
characteristics in relation to genre. The analyses were conducted using visualization
methodology from CARTscans (a graphical tool that displays predicted values across multidimensional subspaces [Nason, Emerson, & LeBlanc, 2004]), along with additional visualization
techniques recommended by Cook and Swayne (2008) and by Cohen, Cohen, Aiken, and West
(2003). A strong theme permeated findings—the interplay of text characteristics was an
important factor for explaining text complexity.
The general structure of text characteristics in relation to text complexity. In a
traditional approach, principal components analysis or factor analysis might be used to describe
the data structure, but those techniques assume a linear relationship among variables. We
hypothesized non-linearity and used an unsupervised, nonlinear dimension-reduction technique
—modified locally linear embedding analysis (Zhang & Wang, 2006). The technique accounts
for the intrinsic geometric properties of each neighborhood of texts that share text-characteristic
profiles. Essentially, in the analysis, the nine text-characteristic operationalizations were re-expressed in a three-dimensional space by finding local planes of best fit for the neighborhood
around each text (set at 15 neighbors [Vanderplas & Connolly, 2009]) and then stitching them
together to describe the entire 350-text space. The planes of best fit need not share the same
parameters across neighborhoods. Once the dimension-reduced text space was constructed, the
text-complexity levels were noted in colors: warmer colors represent higher text-complexity
levels, and cooler colors represent lower text-complexity levels. The result is shown in Figure 4.
The three locally linear dimensions are not in themselves interpretable. Each is associated to
varying degrees with the nine text characteristics. All 350 texts are represented as dots in the
space. The main conclusion of the visual analysis was that there was a clear thread of text-
characteristic relationships with each other and with text complexity that moved through the
space, a thread that suggested an essentially unidimensional construct in measurement terms, but
the text-characteristic relationships with text complexity were not globally linear. Instead, text-characteristic relationships interplayed differently in different local neighborhoods.
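The embedding step just described can be sketched in code. The snippet below is a minimal illustration using scikit-learn's implementation of modified locally linear embedding with the reported settings (15 neighbors, a three-dimensional target space); the data matrix is simulated, and all variable names are ours, not the study's.

```python
# Minimal sketch of the modified locally linear embedding (MLLE) step:
# the 350 x 9 matrix of standardized text-characteristic scores is simulated
# here; in the study it held the nine operationalizations for the 350 texts.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(350, 9))  # placeholder: 350 texts x 9 characteristics

# 15 neighbors and 3 output dimensions, as reported in the text.
mlle = LocallyLinearEmbedding(n_neighbors=15, n_components=3, method="modified")
embedding = mlle.fit_transform(X)
print(embedding.shape)  # (350, 3): one 3-D coordinate per text
```

Each row of `embedding` locates one text in the dimension-reduced space; coloring those points by text-complexity level would reproduce the kind of display shown in Figure 4.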
Degree of text-characteristic variability in high versus low text-complexity levels. To
examine the extent to which text-characteristic variability was different according to text-complexity level, the nine text-characteristic variables were standardized as z-scores, and texts
were split into high and low text-complexity groups using the following procedures (outlined in
Cohen, Cohen, Aiken, and West [2003] and Green and Salkind [2001]). Centers for the high and
low texts were determined at one standard deviation above and below the total text-set mean,
respectively. Next, bands for high and low texts were created at plus and minus half of a standard
deviation around the mean of the center points, respectively, so as to filter out texts close to the
mean (Cook & Swayne, 2008). Finally, the split plots in Figure 5 were generated.
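The splitting procedure above can be sketched as follows; the complexity scores are simulated and the names are illustrative, not the study's.

```python
# Sketch of the high/low text-complexity split: scores are standardized as
# z-scores, centers are set at +1 and -1 standard deviations, and
# half-standard-deviation bands around each center filter out texts near
# the mean.
import numpy as np

rng = np.random.default_rng(1)
complexity = rng.normal(size=350)  # placeholder text-complexity scores
z = (complexity - complexity.mean()) / complexity.std()

high_center, low_center, half_band = 1.0, -1.0, 0.5
high = z[np.abs(z - high_center) <= half_band]  # texts in the "high" band
low = z[np.abs(z - low_center) <= half_band]    # texts in the "low" band

print(len(high), len(low))  # counts of texts retained in each band
```

The retained `high` and `low` groups are what the split plots in Figure 5 would then compare, characteristic by characteristic.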
A main conclusion was that for most sets of relationships, there was more variability in
lower text-complexity texts than in high ones. For the two word-structure relationships with text-complexity level, the decoding-demand levels for the low-complexity texts ranged widely, while
most decoding-demand levels for high-complexity texts were tightly collected around the mean.
For two of the three word-meaning characteristic operationalizations (age of acquisition and
word rareness), the variability patterns were highly similar for low and high text-complexity
texts, but for higher complexity, the word meaning values were shifted upward by approximately
two standard deviations. On the other hand, for three of the four discourse-level variables
(intersentential complexity, phrase diversity, and text density) there was little to no overlap in the
two patterns, signaling a dramatic shift in the degree of repetition, redundancy, and patterning—
less of it (higher values) in the higher-complexity texts.
Also evident in the split plots are outlier texts. For instance, in the low text-complexity
group for age of acquisition, there were some texts that had relatively high age-of-acquisition
values, leading to the question of how a book with such high values on that text characteristic
might receive a low value on text complexity. A general pattern appeared from examination of
complete profiles of text characteristics for some randomly selected “outlier” texts. Where
extreme values were present in low text-complexity texts, generally, the high values tended to be
compensated by low values on other text characteristics. For example, a text’s relatively high
value on a word structure or word meaning characteristic was modulated and supported by a high
degree of repetition, sufficient to effect a relatively low text-complexity level.
Interplay of text characteristics: Generalized interactions or regions of interactions?
Two ways to explore the potential for text characteristics to function together in relation to text-complexity level were visualization of a single regression tree and contour plots (Nason et al.,
2004). First, we created a single regression tree (see Figure 6) using standardized z-score values
for the predictor variable operationalizations, with the tree grown to five levels of depth and
restricting nodes to a minimum of 10 texts. The goal was to visualize the degree to which text
characteristics might be conditioned on one another when predicting text complexity—not to
determine which variables interacted with one another in the classic statistical sense. While
information can be gleaned from exploring a single regression tree, caution is warranted in generalizing to early-reader texts at large because of the possibility of single-tree overfit to a dataset (Breiman, 2001a).
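As a rough sketch, the single-tree step might be implemented with scikit-learn as below; mapping "a minimum of 10 texts" per node to `min_samples_leaf` is our assumption, and the data are simulated stand-ins for the nine characteristics and the complexity outcome.

```python
# Sketch of the single regression tree: standardized predictors, depth
# limited to five levels, and (on our reading) at least 10 texts per node.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(350, 9))            # placeholder z-scored characteristics
y = X[:, 0] * 2 + rng.normal(size=350)   # placeholder text-complexity levels

tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10, random_state=0)
tree.fit(X, y)
print(tree.get_depth() <= 5)  # True: the fitted tree honors the depth limit
```

Rendering the fitted tree (e.g., with `sklearn.tree.plot_tree`) yields the kind of split-by-split diagram discussed for Figure 6, where each internal node conditions the prediction on one characteristic.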
Two main findings from examination of the decision tree were that the interplay of text
characteristics mattered for text complexity and that micro-interactions among text
characteristics were regional rather than generally applicable to the whole body of text
characteristics and text complexity. The tree depicts several localized interactions, or ways that text-complexity values may be predicted from combinations of certain text characteristics such that the impact of a text characteristic is conditioned by the value of one or more other text characteristics (two are circled in Figure 6).
As an example, the far right side of the regression tree in Figure 6 depicts a localized
asymmetrical interaction. Starting at the top of the regression tree in Figure 6, the computer
algorithm made the first split using intersentential complexity as the predictor that would result
in the least error in predicting text complexity. To the right are texts that have intersentential
complexity values higher than -.3045, that is, not much repetition, redundancy, or patterning.
Moving farther to the right to Node B (which split the high-intersentential-complexity texts into further subgroups of higher and lower intersentential complexity) and then to Node C, the 109
texts at Node C have the least amount of repetition, redundancy, or patterning of the 350 texts. At
Node C abstractness was selected as the predictor that conditioned intersentential complexity so
as to achieve the smallest error in predicting text complexity. Notice that for 11 of the 109 texts,
the ones with the lowest abstractness values, no further predictors were required to arrive at the
final text complexity value with the smallest error. However, 98 of the 109 texts that had higher
values on abstractness were further conditioned by non-compressibility and after that by age of
acquisition. That is, the effect of abstractness is different for the two branches created by
intersentential complexity.
Another interesting, subtle finding reflecting the interplay of text characteristics that can be visualized from the regression tree is that sometimes slightly different combinations of text-characteristic conditioning can result in approximately the same text-complexity level. Notice for
instance among the first four bottom-most left boxes in the figure that two sets of texts have text
complexity levels of 21.60 and 22.57, respectively. While both share similarly low intersentential
complexity, for the left-most texts (21.60), conditioning intersentential complexity by the
presence of higher word rareness values resulted in approximately the same text-complexity
value as the right-most texts (22.57) where intersentential complexity was conditioned by lower
values on non-compressibility.
A second way to explore potential interplay among variables was to visually examine
contour plots (Nason et al., 2004). Several were created for selected combinations of text
characteristics. A general finding was that there was interplay among the text characteristics in
relation to text complexity. A limitation of contour plots is that a maximum of two predictors can
be plotted. Figure 7 illustrates the interplay of age of acquisition with phrase diversity in relation
to text-complexity level. The plot was generated from a random forest regression with just the
two text-characteristic variable operationalizations and text-complexity level as the outcome,
without controlling for the other seven text characteristics and with minimum node size of five.
The main finding from the illustrative contour plot was that age of acquisition was conditioned
by phrase diversity in relation to text complexity. Regions of texts are seen in the plot. The
highest values on text complexity (red in the plot) occurred in texts that had high values on age
of acquisition and high values on phrase diversity (low amounts of repetition, redundancy, or
patterning). As well, texts with the lowest text-complexity values (dark blue) tended to have low
values for age of acquisition and phrase diversity. However, some texts (e.g., light blue in the
lower right quadrant) that had high values on age of acquisition had low text-complexity values
when age of acquisition was moderated or conditioned by low values on phrase diversity, that is,
when a fair amount of repetition, redundancy, or patterning was present. The point is, again,
there is interplay of text characteristics in relation to text-complexity level.
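The two-predictor procedure behind a plot like Figure 7 can be sketched as follows; the predictors stand in for age of acquisition and phrase diversity, the outcome is simulated, and rendering the contour itself (e.g., with matplotlib's `contourf`) is omitted.

```python
# Sketch of the contour-plot procedure: a random forest is fit with only two
# predictors and a minimum node size of five, then predictions are made over
# a grid so the fitted surface can be rendered as a contour plot.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
age, diversity = rng.normal(size=350), rng.normal(size=350)
complexity = 3 * age + 2 * diversity + rng.normal(size=350)  # toy outcome

forest = RandomForestRegressor(min_samples_leaf=5, random_state=0)
forest.fit(np.column_stack([age, diversity]), complexity)

# Predict over a 50 x 50 grid; contourf(gx, gy, surface) would then render
# the colored regions described in the text.
gx, gy = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
surface = forest.predict(np.column_stack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
print(surface.shape)  # (50, 50)
```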
Text-characteristic profile changes as text-complexity level increased. Another
visualization method to understand text-characteristic collective patterning was to examine text
characteristic profiles as text-complexity level increased (Cohen et al., 2003). The nine text
characteristics were standardized as z-scores, texts were formed into quintile groups, and a graph
was plotted using the within-group means. As shown in Figure 8, first, the lowest quintile texts
had a profile pattern that is markedly different from the other patterns. On average, the texts were
characterized by less complex word structure (low decoding demand and relatively few
syllables), relatively low-level vocabulary (younger age of acquisition, not very abstract words,
and words that were not as rare as what appeared in more complex texts), coupled with, on the
whole, highly redundant and repetitive texts (the exception is non-compressibility; recall that lower scores on the discourse-level variables mean more redundancy and patterning). Moving up
the graph, the next two quintile patterns were highly similar to one another, and the highest two
quintile profiles were nearly flat with minor exceptions. In essence, text-characteristic profiles
gradually changed as text complexity increased. Second, word structure became increasingly
complex with each rising quintile. As well, on the whole, word meanings became harder and
harder as text complexity increased. The exception was word rareness, which was similar in the
bottom two quintiles. Also, on the whole, discourse-level redundancy and repetition decreased as
text complexity increased (recall that higher discourse level averages reflected less redundancy
and repetition). Non-compressibility was a minor exception in that although texts were
consistently less compressible as text complexity increased, the changes were less dramatic than
for other discourse-level variables or for word structure and word meaning characteristics. In
short, on the whole, as text complexity increased, word structure and word meanings became
harder, and texts displayed less and less redundancy, repetition, and patterning. Again, the
interplay among the text characteristics was an important factor for text-complexity level.
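The quintile-profile computation just described can be sketched with pandas; the data and column names are illustrative, not the study's.

```python
# Sketch of the quintile-profile procedure: characteristics are standardized
# as z-scores, texts are binned into complexity quintiles, and within-quintile
# means give each quintile's nine-value profile (as graphed in Figure 8).
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(350, 9)),
                  columns=[f"char_{i}" for i in range(1, 10)])
df = (df - df.mean()) / df.std()        # z-score each characteristic
complexity = rng.normal(size=350)       # placeholder text-complexity levels

df["quintile"] = pd.qcut(complexity, q=5, labels=[1, 2, 3, 4, 5])
profiles = df.groupby("quintile", observed=True).mean()  # one profile per quintile
print(profiles.shape)  # (5, 9)
```

Plotting each row of `profiles` as a line across the nine characteristics reproduces the profile comparison described for Figure 8.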
Genre effects. Genre effects were analyzed using the same procedures as noted in the
preceding section on “Degree of text-characteristic variability in high versus low text-complexity
levels” (Cohen et al., 2003; Green & Salkind, 2011). Four groups of texts were created—
narrative and informational texts that were high text complexity and narrative and informational
texts that were low text complexity, and the text-characteristic profile differences across genre,
controlling for text-complexity level, were examined. Only texts identified as narrative or
informational were included in the analysis because hybrid or other texts were rare. Text-complexity means, standard deviations, and ranges were comparable for the narrative and
informational high text-complexity texts, and they were comparable for the two genres within
low text-complexity texts: for high text-complexity narratives (n = 64)—67.16, 4.85, 59.86 to
78.19; for high text-complexity informational (n = 24)—67.39, 4.81, 60.34 to 77.02; for low
text-complexity narratives (n = 67)— 31.25, 5.56, 22.14 to 40.50; and for low text-complexity
informational (n = 17)—32.88, 5.07, 24.11 to 40.43. Finally, the nine text characteristics were
standardized as z-scores, and using the text-characteristic within-group means, the graph in
Figure 9 was created to show the four text groups’ text-characteristic profiles.
In general, as would be expected, controlling for text-complexity level, the genres within
text-complexity level had slightly different text-characteristic profiles. For high text-complexity
narrative texts, on average, abstractness, intersentential complexity, phrase diversity, and text
density tended to have higher levels than the other text characteristics. On the other hand, for
high text-complexity informational texts, only age of acquisition, on average, tended to rise
above the other text-characteristic levels, and also, on average, non-compressibility tended to dip
below all other text characteristic levels. Notably, several text characteristics were at
approximately the same levels in the two genres. The most divergent characteristics across high-text-complexity text genres were age of acquisition (higher for informational texts) and word
rareness (also higher for informational texts).
For low text-complexity narrative texts, on average, text-characteristic levels were
similar, with the exception of non-compressibility, which was, surprisingly, much
higher than the others. For low text-complexity informational texts, on average, decoding
demand, syllables, and word rareness tended to be higher than the other informational text
characteristics. Notably, several text-characteristic levels were similar across the two low-text-complexity genres. The most divergent were decoding demand, syllables, and word rareness—all higher for informational texts than for narratives—and non-compressibility, which was higher for narratives. Again, another example of text-characteristic interplay was witnessed. When word
structure and word meanings were relatively difficult (as for informational texts compared to
narratives), more repetition and patterning at the discourse level (realized by relatively low
scores) likely modulated the impact of the difficult words to bring the overall text complexity to
a relatively low level.
Conclusions and Discussion
Conclusions
Nine text characteristics were most important for early-grades text complexity: word
structure—decoding demand and number of syllables in words; word meaning—age of
acquisition, abstractness, and word rareness; and sentence and discourse level—intersentential
complexity (the linear edit distance operationalization), phrase diversity (the longest common
string operationalization), text density/information load, and non-compressibility. The nine-characteristic model predicted text complexity very well, in fact, nearly as well as the more
complicated model with all 238 text-characteristic operationalizations. Notably, the three most
important text characteristics were at the sentence and discourse level—intersentential
complexity, text density/information load, and phrase diversity. Additionally, interplay among
text characteristics was important to explanation of text complexity. While a clear thread of the
relationship of the nine text characteristics with text complexity was evident, the relationship was
not globally linear. Instead, text-characteristic relationships interplayed differentially in local
neighborhoods of similar texts.
Discussion
To our knowledge, the present study is the first to reveal important text characteristics for
early-grades text complexity through empirical investigation. The results support the contention
that early-grades texts can be considered complex systems consisting of characteristics at
multiple linguistic levels that variously interplay to impact text complexity. Further, the nine most-important text characteristics revealed in the present study map to some of the well-researched critical features of young children’s early reading development. The early-grades
developmental phase is often characterized as “cracking the code,” which has led some educators
to believe the work of early reading is primarily about, or even all about, phonological awareness
and word-related factors. Interestingly, phonemic measures did not surface among the most
important text characteristics for text complexity. The importance of phonological awareness for
progress in early reading is indisputable. Possibly the measures in the current study did not
sufficiently reflect the domain of key phonological knowledge required of students.
As for the centrality of word structures in “cracking the code,” it was not surprising to
find that word decoding and number of syllables were among the most important characteristics for
predicting text complexity. As well, factors involved in word meanings, specifically age of
acquisition of words, abstractness, and word rareness, were important. The findings are
consistent with prior suggestions that lower text complexity might be achieved in part through
inclusion of easier and more familiar vocabulary (e.g., Hiebert & Fisher, 2007).
At the same time, aspects of the findings in the present study shed additional light on the
distinctiveness of early-grades text complexity as compared to upper-grades text complexity.
While traditional measures of within-sentence syntax (such as sentence length or various
grammatical indices) were not among the nine most important text characteristics, some of the
discourse-level metrics captured within-sentence complexity while also measuring text
characteristics beyond the sentence level. For instance, while the intersentential complexity
metric, linear edit distance, addressed the degree of word, phrase, and letter repetition across
adjacent sentences, it was also impacted by overall sentence length irrespective of patterning and
repetition. That is, linear edit distance captured both within and across-sentence characteristics.
Consequently, within-sentence features were necessarily included. Still, it is worth noting that
traditional within-sentence indicators such as sentence-level syntax or sentence length itself were
not among the critical metrics for early-grades text complexity. One possible reason is that
although within-sentence indicators tend to be highly associated with complexity for texts
beyond second grade, many early-grades texts that have long sentences tend to have long
sentences that are marked by repetition of words or phrases. The repetition of words or phrases in
early-grades texts may reduce the challenge posed by long sentences and render within-sentence
indicators, such as length, less effective for estimating early-grades text complexity.
One of the most striking findings was the emergence of discourse-level text
characteristics that primarily captured repetition, redundancy, and patterning in texts. The finding
was striking because it is often not discussed in the context of “code cracking.” Educators and
researchers tend to focus on word-level text characteristics as almost singularly critical for early
reading, and the role of how texts are structured to facilitate ease of early-reading progress is
often overlooked. Indeed, even one of the most commonly used text-leveling systems, the
Fountas and Pinnell (1996, 2012) system, does not directly include attention to repetition and redundancy, though it does address text structure and genre in general. As noted earlier, few
prior text-analysis systems for the upper grades include analysis of discourse-level characteristics
—though those systems were not intended for early-grades texts. However, at least one or two of
the discourse-level characteristics (intersentential complexity and phrase diversity) in the present
study are reminiscent of cohesion operationalizations in the Coh-Metrix (Graesser et al., 2011)
system. While some evidence exists that above second-grade level, models of text complexity
that include discourse-level indicators do not outperform those that do not include them (Nelson,
Perfetti, Liben, & Liben, 2011), our findings suggest that attention to discourse-level
characteristics at the early grades is crucial (cf. Hiebert & Pearson, 2010, who suggest that
current text-complexity systems may need adjustments for early-grades texts). Indeed, the
functions of repetition and redundancy in discourse have received increasing attention on the part
of linguists in the past few years, and repetition/redundancy is considered by some to be an
essential feature of language use (Bazzanella, 2011).
Unearthing the presence of locally embedded differential interplay of text characteristics
and witnessing examples of that interplay are novel contributions to the literature. The finding
was intriguing in that to the mature eye, early-grades texts appear to be “simple.” But
experienced readers often have long forgotten the challenges of learning to read in the early
phases, and to more expert readers, as Prince (1997) and others (e.g., Bazzanella, 2011) have
pointed out, “. . . the really interesting complexities of language work so smoothly that they
become transparent” (Prince, 1997, p. 117).
The finding of locally embedded text-characteristic interplay was also supportive of prior
linguists’ and complexity theorists’ understandings that in complex environments, subsystems (in
the present study, sub-linguistic systems) often “co-operate” to balance efficiency and
effectiveness. In the case of early-grades texts, subsystems “co-operate” to balance young
children’s ease of learning to read with the requirements for depth of processing (Bar-Yam, 1997;
Juola, 2003; Merlini Barbaresi, 2003). However, while the presence of regional interactions
among text characteristics could be witnessed, as for example, in the single decision tree and the
contour plot, explaining or describing them with simple generalizations was difficult because of
the number of characteristics involved and the variation in co-existing characteristics across
witnessed incidents of interactions.
Although local interplay was a chief characteristic of early-grades text complexity, some
general trends described features of the early-grades texts in the aggregate. One general trend
was that, on the whole, as text-complexity level increased, word structure and word meaning text
characteristics became more complicated or harder (as would be expected), while texts displayed
less and less redundancy, repetition, and patterning. That is, linguistic levels interplayed such that
text characteristics tended to coalesce in one way for less complex texts and in another way for
more complex texts.
Another general trend was for high-complexity informational texts to have somewhat
higher age-of-acquisition and word rareness measures as compared to narrative texts. On the
other hand, low-complexity informational texts tended to have somewhat higher decoding
demand, more syllables, and rarer words than narratives, but narratives were less compressible.
For both high- and low-complexity texts, interestingly, discourse-level text characteristics were
fairly similar across the two genres with informational texts having slightly lower discourse-level
values, indicating more repetition, redundancy, or patterning. The result again supports the
interplay of variables in that the presence of more difficult words was compensated by increased
scaffolding in the form of repetition or patterning. The difference should be considered with
caution, as a relatively small number of books constituted the genre analysis. Rather than
assuming the result is generalizable, it is more appropriate to consider it sufficiently provoking to
warrant further analysis in future studies.
However, taken at face value, the genre result is consistent with logical expectations. In
general, at the early-grades levels, informational texts might tend to have more difficult
vocabulary than narratives, and at the lowest text complexity levels, it would be challenging to
lower decoding demand for content-laden material. It is worth noting that when using random
forest regression with the nine-characteristic text-complexity model, random forest regression
easily accounts for any localized or general text characteristic collections that might be related to
genre.
The Promise of Random Forest Regression and Machine-Learning Research Methods
The successful use of random forest regression for modeling text complexity in early-grades texts demonstrates the potential advantage of random forest regression when
addressing a high-dimensional educational problem. In the case of early-grades text complexity,
a modeling technique such as linear regression may not satisfactorily allow for investigations
employing either the large number of variables required for text analysis or the potentially huge
number of complex text-characteristic interactions that likely permeate early-grades texts. It is
important to note, however, that we did not compare results from a theorized linear regression model with those from a random forest model, and consequently our statement here about
the possible random forest regression advantage is hypothetical. At the same time, it is difficult
to imagine how such a comparison could be tested—because there is no way to tap a priori
localized interactions among text characteristics in traditional linear regression.
As well, random forest can be a more robust model than some other traditional modeling
techniques in that it accounts for exceptional cases. To comprehensively study early-grades texts,
where many different types of text exist, it is important to include even those texts that might
traditionally be considered “outliers,” that is, texts that might have text-characteristic
configurations that fall in the long tails of early-grades text distributions. For instance, label
books do not contain connected text, but instead one word is shown beside a picture. In a
traditional analysis, such books might be considered outliers because they have text
characteristics that are quite different from a majority of texts. However, label books are
commonly used in early-grades classrooms, and any study of text complexity should take them
into consideration. As well, random forest regression automatically handles conditionality that
can occur in ensembles of text characteristics, and as such it brings the tails of distributions “into
the fold.”
Finally, random forest regression can take advantage of a weak predictor by using it only
when it is needed. In the present study, non-compressibility might be considered a weak
predictor in that it was not highly correlated with other characteristics (except for phrase
diversity) or with text complexity. However, non-compressibility tended to locate repetition,
redundancy, and patterning where the other three discourse-level characteristics did not locate it.
Such texts were rare in the present study, but on those rare occasions, there was important value
in the non-compressibility measure.
High-dimensional problems are common in educational arenas in cases where large
numbers of variables are at play and large amounts of data are generated, and random forest
regression is a statistical modeling technique that could innovate the repertoire of educational
statistical modeling. Where pressing educational problems involve large numbers of variables
and/or potentially large numbers of interactions among variables, random forest regression could
provide uniquely satisfying solutions (Baca-Garcia et al., 2007).
The machine-learning techniques used in the present study uniquely revealed early-grades text complexity. While prior text-complexity systems existed, theorization about text
complexity, especially early-grades text complexity, was limited (Mesmer et al., 2012), and
debates about construct coverage in the existing measurement systems proliferated (e.g.,
Sheehan et al., 2010). As a consequence, employing a wide array of possible operationalizations
of text characteristics, each of which might capture a nuanced sense of any text characteristic,
was important, as was the use of a logical investigative progression to narrow the most important
characteristics. That is, through machine-learning techniques, the data could “speak,” and a text-complexity model could be constructed from the data themselves (Wasserman, in press).
Further, the interactive, dynamic graphics used to explore data structure are common in
machine-learning communities, but not as common in educational research. While no statistical
significance was attached to the visualization techniques, they tended to be very useful in
understanding functional relationships among text characteristics and text complexity.
Limitations of the Study
The following limitations of the study should be considered as context for interpreting the
findings. First, although random forest provided many advantages for the study of early-grades
text complexity, the resulting functional shape of the data was interpretable only to a certain
degree. That is, the complexity of text-characteristic interactions was acknowledged, but it could
not be described in simple ways or with a parsimonious set of rules. Whether lack of a final
specified statement detailing local interactions is a failure or a limitation is debatable. For those
who embrace complexity theory, tensions between chaos and parsimony, between complexity
and simplicity are natural—they exist in the natural world, and attempts to over-specify distort
reality.
Second, text selection for study was extremely important. The population of classroom
texts should be broadly represented. Although every attempt was made to accomplish broad
representation, the texts selected for the study may set boundaries on the generalizability of the
findings, and readers should draw their own conclusions about how well the selected texts
represent that population.
A third limitation is that a traditionalist statistician working in the fields of psychology or
education might consider the process of trimming variables awkward or imprecise. Lacking
statistical estimation of variable “significance,” logical analysis was necessary. Some may
question the reliability of the logical analysis. Certainly, when such methodology is used, it is
critical that a detailed description be provided so that readers may judge whether conclusions are
warranted.
A fourth possible limitation is that because pictures could not be analyzed digitally, the
role of pictures in early-grades text complexity was not directly assessed. However, pictures
were indirectly involved in that they were present in both the teacher and student substudies used
to create the text-complexity metric.
Implications for Practice
One major practical implication of the present results is that educators should consider
discourse-level text characteristics in early-grades readers perhaps more than is currently the
case. Some researchers and teacher educators advocate that educators account for text
“organization” (e.g., Shanahan, Fisher, & Frey, 2012) or, in the case of Coh-Metrix, discourse-level
features such as cohesion (Graesser et al., 2011) when assigning texts to students. Given
that “code-cracking” is prevalent during the early grades, it is likely that in everyday classroom
instruction word-level characteristics are favored and discourse-level text characteristics are
given short shrift. Instead, attention to discourse-level features such as repetition, redundancy,
and patterning would appear to be in order.
Likewise, few teacher educators or researchers emphasize the significance of the interplay
among text characteristics for text complexity in general, even above the early grades. Although
the important text characteristics often, if not typically, make unique contributions to text
complexity, in many texts their interplay is equally, if not more, important.
Consequently, it is critical that, when selecting texts for young children, educators consider ways
in which characteristics can modulate one another’s challenges. For example, presence of
repetition, redundancy, and patterning can ease reading progress for children when texts have
somewhat challenging word structures and/or word meanings. In light of evidence that present-day core-reading programs tend to have somewhat difficult vocabulary (Foorman et al., 2004),
teachers might particularly observe degrees of repetition and patterning in core readers and
provide additional instructional support for students as needed.
The finding of more variability in lower text-complexity texts than in higher ones was
interesting in that some might anticipate the opposite: less variability in (more control over) text
characteristics for students who are just beginning to learn to read, and more variability (less
control) as students’ reading ability advances. Educators might need to consider the lowest-level
texts especially carefully when choosing texts for students’ independent reading versus for
instructional settings where teachers can provide more support.
Finally, publishers of early-grades texts should account for multiple text characteristics
when creating and/or leveling early-grades texts. Some current-day leveling systems that are
commonly used by publishers and/or classroom teachers, such as Fountas and Pinnell’s (2012)
system, do take into account text characteristics at multiple linguistic levels, but many publishers
rely solely on measurement of word frequency and sentence length. While the latter two factors
can be useful for many reasons, creation of optimal texts that ease young students’ reading
growth and use of optimal leveling systems likely requires consideration of a wider gamut of
early-grades text characteristics.
Implications for Future Research
The present findings lend credence to a complexity theory of early-grades texts. One
challenge for future research is further exploration of potential classes of early-grades texts
where, within class, selected ensembles of characteristics condition one another in similar ways.
If such classes of texts are identifiable, through professional development sessions, educators
might come to a fuller understanding of the importance of selecting texts with certain
characteristics to enhance particular cognitions as students begin to learn to read.
The results of the present work suggest that a tool, an automated analyzer, could be
created from the final nine-variable predictor model using random forest regression. Such a tool
could be useful to researchers interested in evaluating existing reading materials or in guiding the
development of new materials.
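As a rough sketch of how the core of such an analyzer could work, the snippet below fits a random forest regressor to synthetic data and reads off variable importances. The three predictor names and the data-generating formula are hypothetical illustrations, not the study's actual nine-variable model or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical predictors named after Table 1 characteristics (illustration only).
rng = np.random.default_rng(0)
n = 300
decoding_demand = rng.uniform(1, 9, n)       # 1-9 scale, as in Table 1
age_of_acquisition = rng.uniform(1, 25, n)   # 1-25 scale, as in Table 1
sentence_length = rng.uniform(3, 30, n)
X = np.column_stack([decoding_demand, age_of_acquisition, sentence_length])
# Synthetic complexity score driven mainly by the first two predictors.
y = (2.0 * decoding_demand + age_of_acquisition
     + 0.05 * sentence_length + rng.normal(0.0, 1.0, n))

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
names = ["decoding_demand", "age_of_acquisition", "sentence_length"]
for name, importance in zip(names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

With this setup, the two predictors that actually drive the synthetic score receive visibly larger importance values than the nearly irrelevant one, which is the logic behind trimming to the most important characteristics.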
Finally, the present text-complexity model of text characteristics might also be used in
intervention efforts. Texts could be theoretically configured as “best texts to facilitate young
children’s reading progress.” Then in a controlled comparison-group intervention design,
children’s reading progress could be examined when reading instruction occurs with such texts
as compared to other classes of texts that exist widely in current-day classrooms.
References
ACT. (2006). Reading between the lines: What the ACT reveals about college readiness in
reading. Iowa City, IA: Author.
Adams, M. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT
Press.
Albert, R., & Barabási, A-L. (2002). Statistical mechanics of complex networks. Reviews of
Modern Physics, 74, 47-97.
ATOS. (n.d.). http://www.renlearn.com/atos/
Aukerman, R. C. (1984). Approaches to beginning reading (2nd ed.). New York: John Wiley &
Sons.
Baca-Garcia, E., Perez-Rodriguez, M. M., Saiz-Gonzalez, D., Basurte-Villamor, I., Saiz-Ruiz, J.,
Leiva-Murillo, J. M., et al. (2007). Variables associated with familial suicide attempts in a
sample of suicide attempters. Progress in Neuro-Psychopharmacology & Biological
Psychiatry, 31, 1312-1316.
Bar-Yam, Y. (1997). Dynamics of complex systems. Reading, MA: Addison Wesley.
Bazzanella, C. (2011). Redundancy, repetition, and intensity in discourse. Language Sciences,
33, 243-254.
Biber, D. (1988). Variation across speech and writing. Cambridge, England: Cambridge
University Press.
Bond, T. G., & Fox, C. M. (2007). Fundamental measurement in the human sciences (2nd
Edition). Mahwah, NJ: Erlbaum.
Bowers, P. G., & Wolf, M. (1993). Theoretical links among naming speed, precise timing
mechanisms and orthographic skill in dyslexia. Reading and Writing: An Interdisciplinary
Journal, 5, 69-85.
Breiman, L. (2001a). Random forests. Machine Learning, 45, 5-32.
Breiman, L. (2001b). Statistical modeling: The two cultures. Statistical Science, 16, 199-231.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression
trees. New York: Chapman & Hall.
Britton, B. K., Glynn, S. M., Meyer, B. J., & Penland, M. J. (1982). Effects of text structure on
use of cognitive capacity during reading. Journal of Educational Psychology, 74, 51-61.
Burrows, M., & Wheeler, D. J. (1994). A block sorting lossless data compression algorithm
(Technical Rep. No. 124). Maynard, MA: Digital Equipment Corporation.
Carnegie Mellon University. (n.d.) CMU Pronouncing Dictionary.
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage Word Frequency Book.
New York: American Heritage.
Cohen, J., Cohen, P., Aiken, L. S., & West, S. H. (2003). Applied multiple regression/correlation
analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Cohesion (Linguistics). (n. d.) http://en.wikipedia.org/wiki/Cohesion_%28linguistics%29
Collins, M. (2002, July). Discriminative training methods for hidden Markov models: Theory
and experiments with perceptron algorithms. In Hajič, J. & Matsumoto, Y. (Eds.),
Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP) (pp. 1-8). Philadelphia: Special Interest Group on Linguistic Data and Corpus-Based Approaches to NLP (SIGDAT).
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of
Experimental Psychology, Section A: Human Experimental Psychology, 33, 497-505. The
MRC Psycholinguistic Database is available at: www.psych.rl.ac.uk. [It is a machine usable
dictionary containing 150,837 words with up to 26 linguistic and psycholinguistic attributes
for each.]
Compton, D. L., Appleton, A. G., & Hosp, M. K. (2004). Exploring the relationship between
text-leveling systems and reading accuracy and fluency in second-grade students who are
average and poor decoders. Learning Disabilities Research & Practice, 19, 176-184.
Cook, D., & Swayne, D. F. (2008). Interactive and dynamic graphics for data analysis with R
and Ggobi. New York: Springer.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by
Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.
Dolch Word List. (n.d.). http://en.wikipedia.org/wiki/Dolch_word_list
Duke, N. K. (2000). 3.6 minutes per day: The scarcity of informational texts in first grade.
Reading Research Quarterly, 35, 202-224.
Ehri, L. C., & McCormick, S. (1998). Phases of word learning: Implications for instruction with
delayed and disabled readers. Reading & Writing Quarterly: Overcoming Learning
Difficulties, 14, 135-163.
Fitzgerald, J., & Shanahan, T. (2000). Reading and writing relations and their development.
Educational Psychologist, 35, 39-50.
Foorman, B. R., Francis, D. J., Davidson, K. G., Harm, M. W., & Griffin, J. (2004). Variability in
text features in six grade 1 basal reading programs. Scientific Studies of Reading, 8, 167-197.
Fountas, I. C., & Pinnell, G. S. (1996). Guided reading: Good first teaching for all children.
Portsmouth, NH: Heinemann.
Fountas, I. C., & Pinnell, G. S. (2012). Guided reading: The romance and the reality. The
Reading Teacher, 66, 268-284.
Fry Word List. (n.d.). http://www.k12reader.com/fry-word-list-1000-high-frequency-words/
Gamson, D. A., Lu, X., & Eckert, S. A. (2013). Challenging the research base of the Common
Core State Standards: A historical reanalysis of text complexity. Educational Researcher, 42,
381-391.
Gervasi, V., & Ambriola, V. (2003). Quantitative assessment of textual complexity. In L. Merlini
Barbaresi (Ed.), Complexity in language and text (pp. 199-230). Pisa, Italy: Edizioni Plus.
Graesser, A. C., & McNamara, D. S. (2011). Computational analyses of multilevel discourse
comprehension. Topics in Cognitive Science, 3, 371-398.
Graesser, A. C., McNamara, D. S., & Kulikowich, J. M. (2011). Coh-Metrix: Providing
multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.
Green, S. B., & Salkind, N. J. (2011). Using SPSS for Windows and Macintosh: Analyzing and
Understanding Data (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Grömping, U. (2009). Variable importance assessment in regression: Linear regression versus
random forest. The American Statistician, 63, 308-319.
Gusfield, D. (1997, reprinted 1999). Algorithms on strings, trees and sequences: Computer
science and computational biology. Cambridge, England, New York, and Melbourne,
Australia: Cambridge University Press.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning (2nd ed.).
New York: Springer.
Heldsinger, S., & Humphry, S. (2010). Using the method of pairwise comparison to obtain
reliable teacher assessments. The Australian Educational Researcher, 37, 1-19.
Hiebert, E. H. (2011). Texts for beginning readers: The search for optimal scaffolds. In C.
Conrad & R. Serlin (Eds.), The SAGE Handbook for Research in Education: Pursuing Ideas
as the Keystone of Exemplary Inquiry (pp. 413-428). Thousand Oaks, CA: SAGE.
Hiebert, E. H. (2012). The Common Core’s staircase of text complexity: Getting the size of the
first step right. Reading Today, 29(3), 26-27.
Hiebert, E. H., & Fisher, C. W. (2007). The critical word factor in texts for beginning readers.
Journal of Educational Research, 101, 3-11.
Hiebert, E. H., & Pearson, P. D. (2010). An examination of current text difficulty indices with
early reading texts. (Reading Research Report 10-01). Santa Cruz, CA: TextProject, Inc.
Howes, D. H., & Solomon, R. L. (1951). Visual duration thresholds as a function of word
probability. Journal of Experimental Psychology, 92, 248-255.
Juel, C., & Roper-Schneider, D. (1985). The influence of basal readers on first grade reading.
Reading Research Quarterly, 20, 134-152.
Juola, P. (2003). Assessing linguistic complexity. In M. Miestamo, K. Sinnemäki, & F.
Karlsson (Eds.), Language complexity: Typology, contact, change (pp. 89-108). Amsterdam,
The Netherlands and Philadelphia: John Benjamins Publishing Co.
Kauffman, S. A. (1995). At home in the universe: The search for laws of self-organization and
complexity. New York and Oxford: Oxford University Press.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge, UK: Cambridge
University Press.
Klare, G. R. (1974-1975). Assessing readability. Reading Research Quarterly, 10, 62-102.
Kolen, M. M., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and
practices (2nd ed.). New York: Springer-Verlag.
Koslin, B. I., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading. New
York: College Entrance Examination Board.
Kruskal, J. B. (1999). An overview of sequence comparison. In D. Sankoff & J. B. Kruskal
(Eds.), Time warps, string edits, and macromolecules: The theory and practice of sequence
comparison (pp. 1-44). Stanford, CA: Center for the Study of Language and Information.
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for
30,000 English words. Behavior Research Methods, 44, 978-990.
Kusters, W. (2008). Complexity in linguistic theory, language learning and language change. In
M. Miestamo, K. Sinnemäki, & F. Karlsson (Eds.), Language complexity: Typology, contact,
change (pp. 3-22). Amsterdam, The Netherlands and Philadelphia: John Benjamins
Publishing Co.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The Latent Semantic
Analysis theory of acquisition, induction and representation of knowledge. Psychological
Review, 104, 211-240.
Langer, J. A., Campbell, J. R., Neuman, S. B., Mullis, I. V. S., Persky, H. R., & Donahue, P. S.
(1995). Reading assessment redesigned: Authentic texts and innovative instruments in
NAEP’s 1992 survey. Washington, DC: U.S. Department of Education, Office of
Educational Research and Improvement.
Levenshtein, V. I. (1965, translated to English 1966). Binary codes capable of correcting
deletions, insertions, and reversals. Doklady Akademii Nauk SSSR, 163, 845-848.
Link Grammar. (n.d.). http://www.link.cs.cmu.edu/link/
Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and
Development, 2, 159-165.
Malvern, D. D., Richards, B. J., Chipere, N., & Durán, P. (2009). Lexical diversity and language
development: Quantification and assessment. New York: Palgrave Macmillan.
Mandler, M. J., & Johnson, N. S. (1977). Remembrance of things parsed: Story structure and
recall. Cognitive Psychology, 9, 111-151.
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of
text and discourse with Coh-Metrix. New York: Cambridge University Press.
McNamara, D. S., & Kintsch, W. (1996). Learning from text: Effects of prior knowledge and text
coherence. Discourse Processes, 22, 247-287.
Menon, S., & Hiebert, E. H. (1999). Literature anthologies: The task for first-graders. Ann
Arbor, MI: Center for the Improvement of Early Reading Achievement.
Merlini Barbaresi, L. M. (2002). Text linguistics and literary translation. In A. Riccardi (Ed.),
Translation studies: Perspectives on an emerging discipline (pp. 120-132).
Merlini Barbaresi, L. M. (2003). Towards a theory of text complexity. In L. Merlini Barbaresi
(Ed.), Complexity in language and text (pp. 23-66). Pisa, Italy: Edizioni Plus.
Mesmer, H. A. (2006). Beginning reading materials: A national survey of primary teachers’
reported uses and beliefs. Journal of Literacy Research, 38, 389-425.
Mesmer, H. A., Cunningham, J. W., & Hiebert, E. H. (2012). Toward a theoretical model of text
complexity for the early grades: Learning from the past, anticipating the future. Reading
Research Quarterly, 47, 235-258.
MetaMetrics. (n.d.a). Text corpus. Durham, NC: MetaMetrics.
MetaMetrics. (n.d.b). Word corpus. Durham, NC: MetaMetrics.
Metsala, J. L. (1999). Young children’s phonological awareness and non-word repetition as a
function of vocabulary development. Journal of Educational Psychology, 91, 3-19.
Miestamo, M. (2006). Implicational hierarchies and grammatical complexities. In G. Sampson,
D. Gil, & P. Trudgill (Eds.), Language complexity as an evolving variable. Oxford: Oxford
University Press.
Mitchell, T. (1997). Machine learning. Columbus, Ohio: McGraw Hill.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of machine learning
(adaptive computation and machine learning series). Cambridge, MA: MIT Press.
Muter, V., Hulme, C., Snowling, M. J., Stevenson, J. (2004). Phonemes, rimes, vocabulary, and
grammatical skills as foundations of early reading development: Evidence from a
longitudinal study. Developmental Psychology, 40, 665-681.
Nason, M., Emerson, S., & LeBlanc, M. (2004). CARTscans: A tool for visualizing complex
models. Journal of Computational and Graphical Statistics, 13, 807-825.
National Governors Association (NGA) Center for Best Practices & Council of Chief State
School Officers (CCSSO). (2010). Common Core State Standards for English language
arts and literacy in history/social studies, science, and technical subjects. Washington,
DC: Authors. www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf
Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2011). Measures of text difficulty: Testing their
predictive value for grade levels and student performance. Technical Report to the
Gates Foundation. www.ccsso.org/Documents/2012/Measures%20ofText
%20Difficulty_final.2012.pdf
Nerbonne, J., & Heeringa, W. J. (2001). Computational comparison and classification of dialects.
Dialectologia et Geolinguistica, 9, 69-83.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness
values for 925 nouns. Journal of Experimental Psychology, 76(1, Pt. 2), 1-25.
Patton, M. (1990). Qualitative evaluation research methods. Beverly Hills, CA: Sage.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011).
Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Prince, E. (1997). On the functions of the left-dislocation in English discourse. In A. Kamio
(Ed.), Directions in functional linguistics (pp. 117-144). Philadelphia and Amsterdam: John
Benjamins.
Reading Maturity Metric. (n.d.). http://www.readingmaturity.com/rmm-web/#/
REAP Readability Tool. (n.d.). www.reap.cs.cmu.edu/
Rescher, N. (1998). Complexity: A philosophical overview. New Brunswick and
London: Transaction Publishers.
Rosenblatt, L. M. (1938). Literature as exploration. New York: D. Appleton-Century.
Rosenblatt, L. (2005). Making meaning with texts: Selected essays. Portsmouth, NH:
Heinemann.
Rudrum, D. (2005). From narrative representation to narrative use: Towards the limits of
definition. Narrative, 13, 195-204.
Rumelhart, D. E. (1985). Toward an interactive model of reading. In H. Singer & R. B. Ruddell
(Eds.), Theoretical models and processes of reading (pp. 722-750). Newark, DE:
International Reading Association.
Sanders, N. C., & Chinn, S. B. (2009). Phonological distance measures. Journal of Quantitative
Linguistics, 16, 96-114.
Schwanenflugel, P. J., & Akin, C. E. (1994). Developmental trends in lexical decisions for
abstract and concrete words. Reading Research Quarterly, 29, 250-264.
Schatschneider, C., Fletcher, J. M., Francis, D. J., Carlson, C. D., & Foorman, B. R. (2004).
Kindergarten prediction of reading skills: A longitudinal comparative analysis. Journal of
Educational Psychology, 96, 265-282.
Sheehan, K. M., Kostin, I., Futagi, Y., & Flor, M. (2010, December). Generating automated text
complexity classifications that are aligned with targeted text complexity standards (ETS RR-10-28). Princeton, NJ: Educational Testing Service.
Shanahan, T., Fisher, D., & Frey, N. (2012). The challenge of challenging text. Educational
Leadership, 69(6), 58-62.
Shin, J., Deno, S. L., & Espin, C. (2000). Technical adequacy of the maze task for curriculum-based measurement of reading growth. The Journal of Special Education, 34, 164-172.
Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical
Society, 106, 467-482.
Sleator, D., & Temperley, D. (1991, October). Parsing English with a Link Grammar (Carnegie
Mellon University Computer Science Technical Report CMU-CS-91-196). Pittsburgh:
Carnegie Mellon University.
Snow, C. (2002). Reading for understanding: Toward an R&D program in reading
comprehension. Santa Monica, CA: RAND Corporation.
Solso, R. L., Barbuto, P. F. Jr., & Juel, C. L. (1979). Methods & Designs: Bigram and trigram
frequencies and versatilities in the English language. Behavior Research Methods &
Instrumentation, 11, 475-484.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual
differences in the acquisition of literacy. Reading Research Quarterly, 21, 360-406.
Steen, G. (1999). Genres of discourse and definition of literature. Discourse Processes, 28, 109-120.
Stenner, A. J., Burdick, H., Sanford, E., & Burdick, D. (2006). How accurate are Lexile text
measures? Journal of Applied Measurement, 7, 307-322.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale,
application, and characteristics of classification and regression trees, bagging, and random
forests. Psychological Methods, 14, 323-348.
Vadasy, P. F., Sanders, E. A., Peyton, J. A. (2005). Relative effectiveness of reading practice or
word-level instruction in supplemental tutoring: How text matters. Journal of Learning
Disabilities, 38, 364-382.
Vanderplas, J., & Connolly, A. (2009). Reducing the dimensionality of data: Locally linear
embedding of Sloan Galaxy Spectra. The Astronomical Journal, 138, 1365-1379.
van der Sluis, F., & van den Broek, E. L. (2010). Using complexity measures in information
retrieval. In Proceedings of the third symposium on information interaction in context (pp.
18-22). New Brunswick, NJ: ACM.
Wasserman, L. (in press). Rise of the machines. In X. Lin, D. L. Banks, C. Genest, G.
Molenberghs, D. W. Scott, & J.-L. Wang (Eds.), Past, present and future of statistical science.
New York: Taylor and Francis.
Whaley, J. F. (1981). Readers’ expectations for story structures. Reading Research Quarterly, 17,
90-114.
Williamson, G. L. (2008). A text readability continuum for postsecondary readiness. Journal of
Advanced Academics, 19, 602-632.
Woollams, A. M. (2005). Imageability and ambiguity effects in speeded naming: Convergence
and divergence. Journal of Experimental Psychology: Learning, Memory, and Cognition,
31, 878-890.
Wright, B., & Stone, M. (1999). Measurement essentials (2nd ed.). Wilmington, DE: Wide Range
Incorporated.
Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure of
orthographic similarity. Psychonomic Bulletin & Review, 15, 971-979.
Zhang, Z., & Wang, J. (2006). MLLE: Modified locally linear embedding using multiple
weights. In Advances in neural information processing systems 19, Proceedings of the
twentieth annual conference on neural information processing systems, Vancouver. Trier,
Germany: DBLP (Database and Language Programming) at Universität Trier.
65
The degree to which co-occurring
phonemes exist across words.
(Levenshtein Distance is a standard
computer metric of string edit distance
which gauges the minimum number of
substitution, insertion, or deletion
operations to turn one word into another.
Measures phonemic similarity across
words for the 20 closest words.
[Levenshtein, 1965/1966; Yarkoni,
Balota, & Yap, 2008; cf. Kruskal, 1999;
Nerbonne & Heeringa, 2001; Sanders &
Chinn, 2009].)
Mean Internal Phonemic The degree to which phoneme
Phonemic Levenshtein
Distance
0 (fewer phoneme
1 (few words in
closest 20 share
phonemes) to 3
(more words in
closest 20 share
phonemes)
4/Ex. Mean with text
14/Ex. Mean Phonemic
Levenshtein Distance 20
with stop list 50 most
frequent words
Text-Characteristics by Linguistic Level, Definition, Possible Score Range for Examples of Operationalizations,
and N of Operationalizations with Examples
Possible Score
N of/Variable
Range
for
Operationalizations/
Linguistic
________Definition (Source)________
____Examples____ ______Examples_______
__Level__ _Text Characteristic_
Sounds in
Number of Phonemes in Smallest unit of sound. (The MRC
1 (fewer phonemes 14/Ex. Mean number of
Words
Words
Psycholinguistic Database provides
in words, less
phonemes for words in
phoneme values for words [Coltheart,
complex) to less
the text
1981].)
than 10 (more
phonemes in words,
more complex)
Table 1
Word:
Structure
The decoding demand of words in the
text. (Slight modification of Menon &
Hiebert’s [1999] decodability scale.)
See Phonemic Levenshtein Distance
above. Orthographic Levenshtein
Distance measures orthographic similarity
across words for the 20 close words.
(Levenshtein, 1965/1966; cf., Kruskal,
1999; Yarkoni, et al., 2008.)
Number of syllables in words. (The MRC
Psycholinguistic Database provides
syllable values for words [Coltheart,
1981].)
The degree to which letter collocations
occur given the totality of the letter
collocations in the particular text.
(Researcher computer coded; cf. Solso,
Barbuto, & Juel, 1979).
The most commonly occurring words in
primary grades texts. (Dolch Word List,
Orthographic
Levenshtein
Distance
Number of Syllables in
Words
Mean Internal
Orthographic
Predictability
Sight Words
collocations occur given the totality of the
phoneme collocations in the particular
text. (Words are converted to phonemes
using the CMU [Carnegie Mellon
University] Pronouncing Dictionary
[Carnegie Mellon University, n.d.].)
Decoding Demand
Predictability
14/Ex. Mean
22/Ex. Mean with stop list
50 most frequent words
chunk size 125
67
0 (less complex) to
100 (more
0 (fewer
orthographic
trigrams are
repeated in the text)
to 1 (more are
repeated in the text)
13/Ex. Percent of words in
a text that are on the
4/Ex. Product of internal
word values with chunk
size 125
1 (few words with
18/Ex. Types as test with
many syllables) to 8
stop list 50 most
(more words with
frequent
more syllables)
(ability at 75%)
1 (fewer words in
20 share
orthographic
patterns) to 3 (more
orthographic
patterns)
1 (less complex
word structure) to 9
(most complex
word structure)
collocations are
repeated in the text)
to 1 (more phoneme
collocations are
repeated in the text)
Early-Grades Text Complexity
Syntax:
Within
Sentence
Word:
Meaning
The inverse of the frequency with which a
word appears in running text in a corpus
of 1.39billion words from 93,000
kindergarten through university texts
normalized to equate to the frequencies in
the Carroll, Davies, & Richman
frequency 5million word list.
(MetaMetrics, n.d.b; Carroll, Davies, &
Richman, 1971.)
Word Rareness
Number of characters, words, unique
words, or phrases in a sentence.
(Researcher computer coded.)
Degree to which the text contains words
that reference general or complex
concepts such as “honesty” and cannot be
seen or imaged. (Paivio, Yuille, &
Madigan, 1968, updated by Coltheart,
1981.)
Abstractness
Sentence Length
Age at which a word’s meaning is first
known. (Kuperman, StadthagenGonzalez, & Brysbaert, 2012.)
Age of Acquisition
n.d.; Fry Word List, n. d.)
Dolch Preprimer list
68
1 (fewer characters,
words, unique
words, or phrases)
and above 1 (more
characters, words,
.10 (less rare, less
complex) to 6
(more rare, more
complex)
0 (less abstract, less
complex) to 700
(more abstract,
more complex)
6/Ex. Mean number of
letters and spaces in
sentences
14/Word rareness types as
test (ability at 90%)
20/Degree of Abstractness
types as test with stop
list 50 most frequent
words (ability at 50%)
1 to 25 in our study 13/Age of Acquisition
(lower means more
types as test with stop
of the words are
list 50 most frequent
known by younger
words (ability at 50%)
readers and a higher
score means fewer
are known by
younger readers)
complex)
Early-Grades Text Complexity
Discourse
(Across
Sentences)
Linear Word Overlap
Family 1:
Intersentential
Complexity:
Linear Edit Distance
Grammar
Degree to which unique words in a first
The degree of word, phrase, and letter
pattern repetition across adjacent
sentences. The number of single character
replacements required to turn one
sentence into the next one. (Levenshtein,
1965/1966).
Link Type, a linguistic convention that
ties a word in a sentence to another word
within the sentence. Differentiates
between long sentences with many
different syntactic relationships and long
sentences with few syntactic relationships
Word Overlap
Degree to which unique words in a first sentence are repeated in a following sentence, comparing sentence pairs sequentially. (Researcher computer coded.)
Range: 0 (no words are repeated in a following sentence) to 1 (… unique words, or phrases).
Operationalizations: 6/Ex. Mean linear word overlap with slice 125.

Edit Distance
Number of single character additions, deletions, or replacements required to turn one string (or sentence) into another. (Levenshtein, 1965/1966; Kruskal, 1999.)
Range: 0 (if all sentences are identical or there is only one sentence; lots of redundancy, less complex) to approximately 110 in our study (not much redundancy, more complex).
Operationalizations: 4/Ex. Mean linear edit distance.

Link Types
Unique syntactic relationships between words in sentences. (Link Grammar, n.d.; Sleator & Temperley, 1991; definitions of all link types can be found at http://www.link.cs.cmu.edu/link/dict/summarize-links.html.)
Range: 1 (fewer unique syntactic relationships, e.g., subject/object or noun-acting-as-adjective) to 29 (more unique syntactic relationships within sentences [a larger number can occur when the text has one or more very long sentences]).
Operationalizations: 1/Ex. Mean number of unique link types in sentences.

Family 2: Lexical/Syntactic Diversity

Type-Token Ratio
An indicator of word diversity, or the number of unique words in a text divided by the total number of words in a text. (cf. Malvern, Richards, Chipere, & Durán, 2009.)
Range: 0 (few unique words) to 1 (all words are unique).
Operationalizations: 2/Ex. Type-token ratio with chunk 125.

Cohesion Triggers
Words that indicate occurrence of cohesion in text. Five categories of cohesive devices between words in text work to hold a text together. (cf. Halliday & Hasan, 1976; researcher devised beginning with words listed at Cohesion [Linguistics], n.d.)
Range: 0 (no words on the cohesion trigger word list) to 39 in our study (many words on the cohesion trigger word list).
Operationalizations: 1/Ex. Percent of words in text that are on the cohesion trigger word list.

Family 3: Phrase Diversity

Longest Common String
Degree of word, phrase, and letter pattern repetition across multiple sentences. Captures couplets and triplets. (Gusfield, 1997, reprinted 1999.)
Range: 0 (a lot of overlap, a lot of redundancy, less complex) to 1 (not much overlap, more complex).
Operationalizations: 21/Ex. Mean Cartesian Longest Common String percentage with slice 125.

Edit Distance (Cartesian)
Number of single character additions, deletions, or replacements required to turn one string (or sentence) into another, comparing all possible sentence pairs within a slice. (Levenshtein, 1965/1966; Kruskal, 1999.)
Range: 0 (the same characters are repeated, high redundancy) to 127 in our study (very few characters are repeated, low redundancy).
Operationalizations: 8/Ex. Mean Cartesian edit distance with slice 125.

Cartesian Word Overlap
Degree to which unique words in a first sentence are repeated in a following sentence, comparing all possible pairs in a 125 slice. (Researcher computer coded.)
Range: 4 (unique words not repeated much in a following sentence) to 6 (unique words repeated more).
Operationalizations: 4/Ex. Percentage of Mean Cartesian word overlap with slice 125 for part of speech.

Family 4: Text Density

Information Load
Total information load in text. Denser texts have more information load, less redundancy, and are more complex. Also taps overlap of groups of co-occurring word repetition. (Researcher devised incorporating Latent Semantic Analysis [Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Landauer & Dumais, 1997].)
Range: 0 (low density, low information load, lots of novel co-occurring word-group repetition) to 1 (denser text, higher information load, not as much novel co-occurring word-group repetition).
Operationalizations: 12/Ex. Normalized percent reduction of information load across sentences for 10 dimensions with slice 500.

Family 5: Non-Compressibility

Compression Ratio
The degree to which information in the text can be compressed. Novel text is less compressible. (Burrows & Wheeler, 1994.)
Range: 0 (more compressible, more redundancy, less complex) to 1 (less compressible).
Operationalizations: 2/Ex. Compression ratio with chunk 125.
________________________________________________________________________________________________________
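Several of the characteristics above are standard, computable string metrics. As a minimal illustration (generic implementations under the usual definitions, not the study's own MetaMetrics tools; the zlib compressor stands in for the Burrows-Wheeler-based measure):

```python
# Illustrative implementations of three text characteristics described
# above: type-token ratio, Levenshtein edit distance, and a compression
# ratio. These are generic sketches, not the authors' research code.
import zlib


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words: 0 = few unique, 1 = all unique."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character additions, deletions, or
    replacements required to turn one string into another."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # addition
                            prev[j - 1] + (ca != cb)))   # replacement
        prev = curr
    return prev[-1]


def compression_ratio(text: str) -> float:
    """Compressed size over raw size: redundant text compresses well
    (lower ratio, less complex); novel text compresses poorly."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw) if raw else 0.0
```

For example, `edit_distance("kitten", "sitting")` returns 3, and a highly repetitive text yields a much lower compression ratio than a novel sentence of similar length.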
Table 2
Importance Values for the Nine Text-Characteristics Variables and Descriptives for Text-Characteristics and Text-Complexity

Text Complexity: M = 50.10 (18.85); range = 0.33–100.00.

Text Characteristics(1)

Word Structure
Decoding Demand (7). Operationalization: Mean with stop list 50 most frequent words. Importance = .0164 (.0017); M = 5.32 (0.97); range = 2.00–7.91.
Number of Syllables in Words (5). Operationalization: Types as test with stop list 50 most frequent words (ability at 75%). Importance = .0633 (.0038); M = 1.42 (.24); range = 0.00(2)–2.42.

Word Meaning
Age of Acquisition (4). Operationalization: Types as test with stop list 50 most frequent words (ability at 50%). Importance = .0917 (.0073); M = 3.67 (.52); range = 2.41–5.26.
Abstractness (6). Operationalization: Types as test with stop list 50 most frequent words (ability at 50%). Importance = .0557 (.0040); M = 384.35 (63.11); range = 199.80–700.00.
Word Rareness (9). Operationalization: Types as test (ability at 90%). Importance = .0064 (.0004); M = 1.29 (.29); range = 0.54–2.23.

Discourse Level
Intersentential Complexity (1). Operationalization: Mean linear edit distance. Importance = .3487 (.0125); M = 31.04 (17.37); range = 0.00–109.88.
Phrase Diversity (3). Operationalization: Mean Cartesian Longest Common String percentage with slice 125. Importance = .1782 (.0090); M = .80 (.13); range = 0.31–1.00.
Text Density: Information Load (2). Operationalization: Normalized percent reduction of information load across sentences, 10 dimensions with slice 500. Importance = .2313 (.0116); M = .76 (.10); range = 0.22–0.89.
Non-Compressibility (8). Operationalization: Compression ratio with chunk 125. Importance = .0084 (.0006); M = .55 (.11); range = 0.25–1.00.

Note. Permutation accuracy Importance values were used following Strobl and colleagues (2009). (1)Rank order on Importance value. Descriptives are for the 350 texts. (2)Zero scores occur when all the words in the text are on the stop list.
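The permutation-accuracy importance in the table note can be illustrated generically: shuffle one predictor at a time and record how much prediction error grows. Below is a toy pure-Python sketch (a hand-coded linear "model" stands in for the study's random forests, and all function names are ours):

```python
# Toy sketch of permutation importance: permute each predictor column,
# re-score the fitted model, and report the mean increase in error.
# Not the conditional random-forest implementation of Strobl et al.
import random


def mse(model, X, y):
    """Mean squared error of a prediction function over a dataset."""
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=30, seed=0):
    """For each feature, average the rise in MSE when that column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the feature's tie to the outcome
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += mse(model, Xp, y) - baseline
        importances.append(total / n_repeats)
    return importances


# Toy data: y depends only on feature 0; feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3 * x0 for x0, _ in X]
model = lambda row: 3 * row[0]  # "fitted" model that ignores feature 1
imps = permutation_importance(model, X, y)
```

Shuffling the informative feature degrades accuracy sharply, while shuffling the ignored feature changes nothing, so its importance is exactly zero.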
Table 3
Correlations among Final Nine Text Characteristics and Text Complexity
____________________________________________________________________________________________________________
Decoding Demand: N of Syllables in Words .66**; Age of Acquisition .49**; Abstractness .17**; Word Rareness .30**; Intersentential Complexity .45**; Phrase Diversity .31**; Text Density: Information Load .37**; Non-Compressibility .16**; Text Complexity .47**.
N of Syllables in Words: Age of Acquisition .54**; Abstractness .34**; Word Rareness .05; Intersentential Complexity .41**; Phrase Diversity .63**; Text Density: Information Load .57**; Non-Compressibility .34**; Text Complexity .51**.
Age of Acquisition: Abstractness .37**; Word Rareness .06; Intersentential Complexity .67**; Phrase Diversity .73**; Text Density: Information Load .53**; Non-Compressibility .19**; Text Complexity .69**.
Abstractness: Word Rareness .18**; Intersentential Complexity .73**; Phrase Diversity .08; Text Density: Information Load .57**; Non-Compressibility .35**; Text Complexity .49**.
Word Rareness: Intersentential Complexity .63**; Phrase Diversity .51**; Text Density: Information Load .52**; Non-Compressibility .13*; Text Complexity .12**.
Intersentential Complexity: Phrase Diversity .13*; Text Density: Information Load .18**; Non-Compressibility .23**; Text Complexity .53**.
Phrase Diversity: Text Density: Information Load .46**; Non-Compressibility .34**; Text Complexity .22**.
Text Density: Information Load: Non-Compressibility .37**; Text Complexity .41**.
Non-Compressibility: Text Complexity .42**.
Note. *p < .05; **p < .01.
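The entries in Table 3 are Pearson product-moment correlations. As a self-contained illustration of the statistic (the standard formula, not study code):

```python
# Pearson product-moment correlation computed from scratch; any
# statistics package (e.g., R's cor) produces the same values.
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A perfectly linear increasing relationship yields r = 1.0; a perfectly linear decreasing one yields r = -1.0.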
Figure 1
Correlation of Predicted with Empirical Text-Complexity in Relation to Least Important Variable
Deletion from Each of Three Models
Note. The top line represents correlational changes for teacher judgment, the middle line
represents correlational changes for the combined teacher and student text-complexity
assignments, and the bottom line represents correlational changes for the student text-complexity
assignments. Also, out-of-bag correlation is used.
Figure 2
Trimming Variables: Relationship between Potential Correlational Threshold Cut-Points (X-Axis) with Model Correlation (Y-Axis) (Top Figure), and the Relationship between Potential Correlational Threshold Cut-Points (X-Axis) with Number of Remaining Variables (Y-Axis) (Bottom Figure)
Note. Correlation is the correlation of the predicted with the empirical text-complexity measure.
Figure 3
Scatterplot Depicting the Final Model During Validation
Figure 4
Three-Dimensional Scatterplot Showing the Data Structure
Note. Color represents text-complexity level, with red as the highest and blue as the lowest. Each
point is a text.
Figure 5
Split Plots for Individual Text-Characteristic Variable Relationships with Low
and High Text Complexity Levels
Note. Top clusters are high text-complexity texts. Bottom clusters are low text-complexity texts.
Figure 6
Single Regression Tree
Figure 7
Contour Plot of Age of Acquisition, Phrase Diversity, and Text Complexity
Figure 8
Text-Characteristic Profiles by Text-Complexity Quintile Group.
Figure 9
Text-Characteristic Profiles according to Text-Complexity Level and Genre.
Note. The top two lines represent high text-complexity levels. The bottom two lines represent low text-complexity levels. Solid lines represent narrative texts, and dotted lines represent informational texts.