Memory & Cognition

2001, 29 (6), 850-859

Comprehension skill, inference-making ability, and their relation to knowledge

University of Sussex, Brighton, England

University of Toronto, Toronto, Ontario, Canada
and The Hospital for Sick Children, Toronto, Ontario, Canada
University of Oxford, Oxford, England

In this study we investigated the relation between young children’s comprehension skill and inference-making ability using a procedure that controlled individual differences in general knowledge (Barnes & Dennis, 1998; Barnes, Dennis, & Haefele-Kalvaitis, 1996). A multiepisode story was read to the children, and their ability to make two types of inference was assessed: coherence inferences, which were essential for adequate comprehension of the text, and elaborative inferences, which enhanced the text representation but which were not crucial to understanding. There was a strong relation between comprehension skill and inference-making ability even when knowledge was equally available to all participants. Subsidiary analyses of the source of inference failures revealed different underlying sources of difficulty for good and poor comprehenders.

The study reported in this paper was supported by Economic and Social Research Council Grant R000 23 5438 awarded to J.V.O. and P.E.B. The authors gratefully acknowledge the help of the Experimental Psychology Society, whose award to the first author, in the form of a Study Visit Grant, facilitated this work. The authors also thank The Hospital for Sick Children, Toronto, for their kind hospitality on this visit. Finally, thanks to all the staff and pupils from the Brighton and Hove schools who participated in this work. Correspondence should be addressed to K. Cain, who is now at the Department of Psychology, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, England (e-mail: ).

Copyright 2001 Psychonomic Society, Inc.

Young children’s reading comprehension problems have been attributed to deficiencies in a wide range of lower level cognitive processing abilities, such as phonological processing skill (e.g., Shankweiler, 1989), word-decoding facility (e.g., Perfetti, 1985), and vocabulary knowledge (e.g., Beck, Perfetti, & McKeown, 1982; Carroll, 1993). In this article, we focus on a group of children who demonstrate text comprehension difficulties despite proficiency in both word reading and these lower level cognitive skills (see Cain & Oakhill, in press, for a review). The comprehension difficulties of these children must, therefore, arise from impairments in higher level cognitive skills. For example, previous research has shown that children with comprehension difficulties are poor at inference making (e.g., Cain & Oakhill, 1999; Oakhill, 1982, 1984). Such problems have been interpreted within the mental models framework (e.g., Oakhill, 1996). Our findings suggest that poor comprehenders construct incomplete representations of text: They are often able to integrate information at a local level but are unable to produce a coherent integrated model of the text as a whole. Poor comprehenders’ difficulties with inference making are a likely cause of their text-level comprehension problems (Cain & Oakhill, 1999). In the present study, we explored possible sources of poor comprehenders’ difficulties with making inferences.

Inference making is regarded as a central component of skilled reading (e.g., Garnham & Oakhill, 1996; Graesser, Singer, & Trabasso, 1994; Singer, 1994; van den Broek, 1994). Although less skilled readers are capable of inferential processing, they do not generate as many inferences as more skilled readers do (e.g., Casteel, 1993; Casteel & Simpson, 1991; Long, Oppy, & Seely, 1997; Oakhill, 1982, 1984; Omanson, Warren, & Trabasso, 1978; Paris & Lindauer, 1976; Paris & Upton, 1976). Thus, it is important to establish which factors limit inference making within such populations.

An inference can be made only when the general knowledge required to make that inference is available (e.g., Ackerman, Silver, & Glickman, 1990; Casteel, 1993). Indeed, relevant background knowledge for a passage is a better predictor of fourth graders’ ability to generate inferences from and elaborate on that text than is their comprehension skill (Marr & Gormley, 1982). General knowledge differences are, therefore, a potential source of individual differences in inference generation. Using a procedure that ensured that the relevant general knowledge was equally available to all participants prior to inference making, Barnes and colleagues have demonstrated that knowledge availability is not sufficient to ensure adequate
inference making in both normally developing children (Barnes, Dennis, & Haefele-Kalvaitis, 1996) and children with the neurodevelopmental disorder of hydrocephalus (Barnes & Dennis, 1998). Given that skilled comprehenders are likely to read more than less skilled comprehenders and, thus, acquire more information from text, it is plausible that their superior inference-making ability may, in part, stem from greater general knowledge.

A primary aim of the present study was to use Barnes’s paradigm to determine the extent to which the inference-making problems experienced by children who experience text-comprehension difficulties without neurological disorder may be accounted for by “general knowledge” deficits. To explore this issue, children were first taught a novel knowledge base: a series of facts about an imaginary planet. These facts provided a background for the text that they subsequently heard. Individual facts from the knowledge base had to be retrieved and integrated with information in the text in order to generate particular inferences. Answers to the inference questions were considered only if the relevant knowledge-base information was recalled immediately after the story and questions had been completed. This procedure enabled us to investigate inference-making ability when knowledge was equally available to all participants. In addition, the learning and recall trials enabled us to determine whether reading-comprehension ability and skill at drawing inferences were related to differences in the retention of the knowledge base.

Barnes et al. (1996) found that short-term retention of a learned knowledge base was comparable across different age groups but that poor comprehenders with hydrocephalus (a developmental brain pathology) remembered fewer knowledge-base items when retested at the end of the narrative (Barnes & Dennis, 1998). It is not known whether such information is learned or represented differently by poor comprehenders who do not have neurological impairments, the population of interest in the present study. The ability to access such information and integrate it within a model of the text during comprehension may depend on the stability of the information in memory. One index of stability for a knowledge representation is the ability to remember that information over time. We therefore included a delayed memory test for the taught knowledge base, 1 week after the initial experimental session, in order to assess whether comprehension skill and inference-making ability were related to the stability and retention of the knowledge base over time.

Inference-making ability was assessed in the following way. After learning the knowledge base, children were presented with short episodes from a story. Using questions asked after each episode, we assessed their ability to make two types of inference: coherence inferences, which are necessary to establish the links between premises in the text, and elaborative inferences, which enrich the text representation. Previous authors have argued that these two inference types are conceptually distinct and serve different functions in the construction of a text representation (e.g., Garnham, 1982, 1989). In general, readers are sensitive to these different functions and make a greater number of coherence inferences than elaborative inferences (e.g., Casteel, 1993; Singer, 1994; Whitney, Ritchie, & Clark, 1991).

Previous work has demonstrated that poor comprehenders are poor at generating both types of inference, relative to their skilled peers (e.g., Cain & Oakhill, 1999; Oakhill, 1982, 1984). However, a limitation of this previous work is that the two types of inference have depended on the integration of information from different sources. Generation of a coherence inference required integration of different pieces of information from within the text, whereas generation of an elaborative inference required the reader to integrate information from the text with prior or general knowledge. In the present study, generation of both types of inference depended on the ability to recall the correct textual premise, retrieve information from outside the text (from the taught knowledge base), and integrate these two pieces of information. Thus, we were able to explore whether poor comprehenders were impaired in drawing knowledge-based inferences that served different functions in a text, even when the processing requirements for both inferences were the same. In this study, both types of inference required the integration of a text premise with the knowledge base.

A further aim of the study was to investigate sources of inference failure. Different sources of inference failure have been identified for different populations of children. Failure to recall relevant textual premises is the main source of young children’s failure to make coherence inferences (Barnes et al., 1996), but failure to integrate the text premise with the knowledge-base item (when correctly recalled) accounts for the majority of inference failures by older good and poor (garden variety) readers (Barnes & Dennis, 1996). In the present study, four reasons for inference failure were investigated: failure to retrieve the correct premise from the text, failure to recall the relevant item from the knowledge base, failure to integrate the two, and generation of an incorrect inference. These reasons for inference failure are detailed below.

Failure to recall the correct premise from the text may arise because of poor memory for the text per se. When fewer propositions from the story are recalled, a less coherent representation of the text will exist to support recall. Alternatively, the correct premise may not be recalled because it was not encoded, either fully or partially, in the first place. Failure to recall the correct knowledge-base item may occur when the item is available but is, for whatever reason, difficult to retrieve. Knowledge-base items may be less accessible because they may have been encoded less efficiently or retained less precisely. When both items (text premise and knowledge-base item) are available, an inference may not be made because the two pieces of information are not integrated. Finally, children may also fail to generate the correct inference because they make a different one (incorrect inference) or because they are utilizing a different set of criteria for textual cohesion and are not aware that an inference is necessary. Ultimately, failure to generate such inferences, for any of these reasons,
will result in a poorly integrated representation of the text, and comprehension will suffer (Oakhill, 1996).

As noted above, children with adequate word-reading and vocabulary skills but poor text comprehension experience difficulties with inference making and integration (e.g., Cain & Oakhill, 1999). However, although their difficulties have been related to both working memory and metacognitive impairments, the source of inference-making failure is not known. In the present study, we set out to establish the reasons for inference-making differences between skilled and less skilled comprehenders when knowledge was equally available to the two groups. The question of interest here is, Do less skilled comprehenders’ difficulties with inference making arise from the same underlying source as those of skilled comprehenders, or do they fail to make as many inferences as their skilled peers because of a different source of difficulty?

In summary, the present study was designed (1) to determine whether poor comprehenders have difficulties with two types of knowledge-base inferences that perform different functions in text, (2) to assess the extent to which (general) knowledge deficits affect inference generation, (3) to identify the reasons for inference failure and how they relate to comprehension skill and inference type, and (4) to determine whether less skilled comprehenders experience difficulties with inference generation from texts that they have listened to (since the presentation of the text in the present experiment was auditory).

METHOD

Participants
Two groups of children participated in this study: 7- to 8-year-old skilled comprehenders and less skilled comprehenders. It is now well established that some poor readers’ comprehension difficulties stem from poor word-reading skills (e.g., Perfetti, 1985). In this study, we were not interested in generally poor readers, but, rather, we were interested in children who had a specific comprehension deficit in the presence of age-appropriate word-reading skills. Therefore, the skilled and less skilled comprehenders were matched for their ability to read words (both in and out of context) and for chronological age but were selected to differ on a measure of text comprehension. In this way, we aimed to control for the influence of lower level decoding and vocabulary skills on text comprehension (cf. Nation & Snowling, 1998).

There were 13 children in each group, selected using two tests: the Gates–MacGinitie Primary Two Vocabulary Test (MacGinitie & MacGinitie, 1989) and the Neale Analysis of Reading Ability—Revised British Edition (Neale, 1989). As stated above, the purpose of the group selection procedure was to select two groups of children with age-appropriate word-reading skills that differed in reading-comprehension ability. We selected these groups by first administering the Gates–MacGinitie test to the entire 7- to 8-year-old population of three junior schools (n = 163). This test is group-administered and requires children to select one out of four words to go with the accompanying picture. It provides a measure of a child’s ability to read and understand single words out of context. It was used to screen out “exceptional” readers. These were children who obtained either very low or very high scores and whose reading age (calculated using the Neale Analysis) would be predicted to be either substantially below or above their chronological age. In addition, children whose first language was not English or who had known behavioral, emotional, or language difficulties were excluded from further testing. The remaining children (n = 79) were assessed individually using the Neale Analysis.

In the Neale test, children read a series of short stories aloud, and any word-reading errors are corrected. They are asked a set of comprehension questions after each story. The passages are graded in difficulty, and testing stops once a prescribed number of reading-accuracy errors has been made. The test provides separate scores for reading accuracy, based on the number of words read correctly, and reading comprehension, based on the number of comprehension questions that the child answers correctly. Performance on the Neale test was used to select and match the two groups (see Table 1 for group characteristics).

The skilled and less skilled comprehenders all obtained age-appropriate reading-accuracy scores and did not differ significantly on this measure [t(24) < 1.0]. The skilled group consisted of children whose reading-comprehension scores were at or above those predicted by their reading-accuracy ability, whereas the less skilled group consisted of children whose comprehension scores were depressed relative to their word-reading age. As the values in Table 1 demonstrate, the mean difference between reading accuracy and reading comprehension for the less skilled group was 25 months. In addition, the difference in reading-comprehension age between the skilled and less skilled comprehenders was 30 months [t(24) = 8.48, p < .001]. The two groups were also matched for chronological age, sight vocabulary (Gates–MacGinitie test), and the number of Neale stories that they had completed (all ts < 1.0). The latter measure was necessary to ensure that the difference in comprehension scores did not arise because the less skilled group had read fewer stories and, therefore, obtained lower comprehension scores simply because they had attempted fewer comprehension questions.

Materials and Procedure
All children were tested individually. The materials and procedure were modified from those used by Barnes et al. (1996) and are explained in more detail below. There were three phases to the experiment.

Table 1
Group Characteristics (Means and Standard Deviations)

              Age             Gates–          Reading         Reading         Number of
                              MacGinitie      Accuracy        Comprehension   Stories
Skill Group   M       SD      M       SD      M       SD      M       SD      M       SD
Less skilled  8,0     3       41.3    2.01    8,10    12      6,9*    4       3.9     0.86
Skilled       8,1     4       41.7    1.88    8,8     7       9,2     12      4.1     0.76
Note—For less skilled and skilled comprehenders, ns = 13. Where appropriate, ages are given in years,months, and standard deviations are given in months. The reading accuracy and comprehension scores are the age-equivalent scores provided in the Neale test, and the number of stories read refers to the stories that were completed during this assessment. *Less skilled comprehenders obtained significantly (p < .05) lower scores than skilled comprehenders.
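Group selection rests on comparing Neale age equivalents, which Table 1 reports in years,months. As a minimal illustrative sketch (the helper names are hypothetical and are not part of the Neale test materials), the Python below converts the Table 1 group means to months and computes each group's reading-accuracy minus reading-comprehension discrepancy. Because the tabled means are shown to the nearest month, differences computed from them may not match figures computed from individual children's scores exactly.

```python
# Minimal sketch: convert Neale age equivalents ("years,months") to months and
# compute the accuracy-minus-comprehension discrepancy used to characterize
# the groups. Helper names are hypothetical; inputs are Table 1 group means.


def age_to_months(age: str) -> int:
    """Convert an age equivalent written as 'years,months' (e.g., '8,10') to months."""
    years, months = (int(part) for part in age.split(","))
    return 12 * years + months


def discrepancy_months(accuracy_age: str, comprehension_age: str) -> int:
    """Reading-accuracy age minus reading-comprehension age, in months.

    Positive values mean comprehension lags behind word-reading accuracy.
    """
    return age_to_months(accuracy_age) - age_to_months(comprehension_age)


# Group means from Table 1 (Neale age equivalents).
TABLE_1_MEANS = {
    "Less skilled": {"accuracy": "8,10", "comprehension": "6,9"},
    "Skilled": {"accuracy": "8,8", "comprehension": "9,2"},
}

for group, ages in TABLE_1_MEANS.items():
    gap = discrepancy_months(ages["accuracy"], ages["comprehension"])
    print(f"{group}: accuracy - comprehension = {gap} months")
```

For the less skilled group this gives 25 months, the discrepancy reported in the text; the skilled group's value is negative (-6 months), reflecting comprehension at or above the level predicted by word reading.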

1. Children were first taught a knowledge base, which comprised 12 facts about an imaginary planet called Gan.
2. When the knowledge base had been taught to criterion (perfect recall), the children were read a six-episode story about this planet. Immediately after they had heard each episode, the children were asked four questions tapping literal and inferential information from that episode.
3. Immediately after the children had heard all six episodes, retention of the knowledge base was retested. It was also retested a week later.

The Knowledge Base
The knowledge base comprised the 12 items used by Barnes et al. (1996) that were relevant for our shortened version of the story. Each item was a piece of information about people, the environment, or a common object that was given a different property on Gan (e.g., “The ponds on Gan are filled with orange juice,” “Bears on Gan have bright blue fur,” “The turtles on Gan have ice skates attached to their feet”). The experimenter read out each item, emphasizing the novel property (e.g., orange juice, bright blue fur, ice skates). Acquisition of the knowledge base was then tested using a forced-choice picture-recognition task and a verbal recall task. These two tasks provided an indication of how easily the participants acquired the novel information and served to teach the knowledge base to criterion, as follows.

Forced-choice picture-selection task. This test was administered after the experimenter had read out the knowledge base, as described above. There were 12 trials, one for each item in the knowledge base. On each trial, the children were presented with four pictures. Their task was to choose the picture that corresponded to the state of affairs on Gan. For example, to test recall of the information that “the turtles on Gan have ice skates attached to their feet,” the child’s task was to point to the picture that represented the characteristics of the turtles on Gan. The three distractors were (1) the true state of affairs on Earth (turtles without skates), (2) a property other than the one ascribed to the object on Gan (turtles wearing roller skates), and (3) the Ganian property ascribed to another object (ducks with ice skates).

Verbal recall task. This test was administered immediately after the picture-selection task described above. There were 12 specific questions to test memory for each item in the knowledge base (e.g., “What are the turtles on Gan like?”).

In both tasks, wrong answers were corrected immediately and retested later, after the complete set of items had been presented. Thus, only items that were recalled incorrectly were presented more than once.

The Story Episodes and Questions
After the verbal recall test, the experimenter presented the story episodes. There were six episodes, selected from the original 10 episodes used by Barnes et al. (1996). The main criterion for inclusion of an episode was continuity of story line. Some vocabulary items were changed to British English terms, and minor modifications were necessary to ensure that the plot was coherent. Each episode was 142–169 words in length. There were four questions associated with each episode to assess different aspects of text comprehension. These are described below. An example of the story and the four question types is provided in Table 2.

Table 2
Example of Story Episode and Questions

Episode
The sun was going down and it was getting very cold indeed. Dack and Tane took their coats out of their bags and put them on. Their coats were made of bear’s fur. They felt much warmer. Before long the path was icy and slippery. Dack and Tane kept falling on the ice. They saw two turtles ahead of them on the path. “I wish I was a turtle,” sighed Dack. Tane slipped and fell on top of her rucksack, crushing all the strawberries that they had picked earlier. When Dack tried to help her up, he fell over too. Dack was covered in scrapes and bruises. He was like a boxer who had lost a fight. “Poor Dack,” said Tane picking herself up, “you’ll feel better tomorrow.” She helped Dack up. Then they walked very carefully along the path, holding each other by the hand.

Questions
1. What did Dack and Tane take out of their bags? (elaborative inference)
2. What did Dack wish? (coherence inference)
3. What happened when Tane fell down? (literal information)
4. What does “Dack was like a boxer who had lost a fight” mean? (novel simile)

Question types. There were four types of questions: literal content, simile comprehension, coherence inferences, and elaborative inferences.

Literal content. These questions assessed memory for information given literally in the text.

Simile comprehension. For each episode, there was a question that required an explanation of a novel simile that appeared in that episode. General knowledge had to be integrated with information given explicitly in the text in order to understand the similes.

Coherence inferences. These inferences were necessary to maintain story coherence. Although individuals may differ in their criteria for coherence, this type of inference was regarded as essential to establish cohesion, because particular clauses in the story were anomalous unless integrated with information in the knowledge base. For the example in Table 2, “‘I wish I was a turtle,’ sighed Dack” can be fully comprehended only if the information from the knowledge base that the turtles on Gan have ice skates attached to their feet is brought to bear on its interpretation.

Elaborative inferences. These inferences were not necessary to maintain textual cohesion, but they elaborated on story information. For the example in Table 2, it is not necessary to make the inference that the bear fur coats that the children put on were blue. However, the addition of such information would create a richer representation of the text.

All of the inference and literal questions remained the same as those used by Barnes et al. (1996), but alterations to the texts meant that one simile was different.

After the knowledge base had been learned to criterion, the story was read out by the experimenter, one episode at a time. The questions were asked immediately after each episode in the order in which the information occurred in that episode. Therefore, the order of question types (coherence inferences, elaborative inferences, literal, and simile) varied across the six episodes, but strict counterbalancing of the different types of question was not possible. When a question was answered incorrectly or incompletely, a nonspecific prompt (“tell me more about that”) was used to elicit a fuller answer. When an inference was not made, a direct question restating the premise information was asked (e.g., “Why did Dack wish he was a turtle?”).

Retention of the Knowledge Base
When all of the story episodes and questions had been completed, memory for the knowledge base was tested once more. The children were asked the same questions as those used in the verbal recall task described above, but no feedback or retesting occurred. Performance on this task was taken into account in the scoring of the inference comprehension questions: Only the responses to inference questions for which the knowledge-base items were remembered in this poststory test were included in the total scores used in the analysis. Thus, the participants were not penalized for incorrect responses that were dependent on knowledge-base information that they could not recall. Recall of the knowledge base was tested once more, a week later, using the same task.
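The conditionalized scoring rule in the preceding paragraph can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' scoring procedure: each inference question is assumed to depend on exactly one knowledge-base item and to earn 1 point if the inference was drawn (initially or after the prompt), and the `InferenceResponse` type, the `proportional_score` function, and the item labels are all hypothetical.

```python
# Minimal sketch of conditionalized inference scoring: a response counts only
# if the knowledge-base item it depends on was recalled in the poststory test;
# the score is correct eligible responses / number of eligible responses.
# The data layout, names, and item labels are hypothetical.
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class InferenceResponse:
    kb_item: str   # knowledge-base fact the inference depends on (hypothetical label)
    correct: bool  # True if the inference was drawn, initially or after the prompt


def proportional_score(responses: Iterable[InferenceResponse],
                       recalled_items: Set[str]) -> Optional[float]:
    """Proportion of correct inferences among questions whose knowledge-base
    item was recalled in the poststory test.

    Returns None when none of the relevant items was recalled, because the
    child then has no eligible questions.
    """
    eligible = [r for r in responses if r.kb_item in recalled_items]
    if not eligible:
        return None
    return sum(r.correct for r in eligible) / len(eligible)


# Hypothetical child: six inference questions of one type; three of the six
# relevant knowledge-base items were recalled after the story.
responses = [
    InferenceResponse("turtles", True),
    InferenceResponse("bears", True),
    InferenceResponse("ponds", False),
    InferenceResponse("item_4", False),  # item not recalled -> excluded
    InferenceResponse("item_5", False),  # item not recalled -> excluded
    InferenceResponse("item_6", False),  # item not recalled -> excluded
]
recalled = {"turtles", "bears", "ponds"}

print(round(proportional_score(responses, recalled), 3))  # 0.667
```

Here the child's raw score is 2 out of a maximum possible 3, because only three of the six relevant items were recalled. Returning None for a child with no eligible questions is a choice made in this sketch; the source does not say how such cases were handled.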

RESULTS Comprehension Questions

Scores obtained on the three question types—literal
The first set of analyses reported below assessed ac- content, similes, and inferences—were analyzed in sep-
quisition and memory for the novel information pre- arate ANOVAs. All responses were scored according to
sented in the knowledge base. The next set of analyses the guidelines in the manual (constructed on the basis of
assessed performance on the comprehension questions. the study by Barnes et al., 1996) by two raters who were
Subsidiary analyses were conducted to investigate rea- blind to the skill group of the child. Consistency between
sons for not making inferences. the raters was high. There were no disagreements on re-
sponses to literal and coherence inference questions, one
The Knowledge Base: Acquiring disagreement on an elaborative inference question, and
and Remembering Novel Information nine disagreements on responses to simile questions
Scoring. One point was awarded for each item cor- (which was fewer than 6% of all simile responses). All
rectly recalled the first time, two points for items requir- disagreements were resolved by discussion.
ing a second trial, three points for three trials, and so on. Memory for literal content. Scores were awarded as
The score obtained for each task was the sum of the follows: 3 points for a full response without prompting
learning trials required until perfect recall was achieved. (for the example in Table 2 “she squashed all the berries
Therefore, the score obtained reflects ease of learning: A in her bag”), 2 points for a full response after the prompt
score of 12 denotes perfect recall (in either the picture “tell me more about that,” 1 point for a partial response
choice or the verbal recall test); a score of 14 indicates that that was not improved when prompted (e.g., “she fell on
either one trial was corrected and retested twice or that top of her bag/she squashed all the strawberries”), and 0
two trials were corrected and retested once. points for incorrect answers and “don’t knows.” There
Learning the knowledge base. The scores are pre- were six literal content questions in total (one in each
sented in the first two columns of Table 3. The picture- episode); therefore, the maximum possible score was 18.
selection task was administered before the verbal recall Mean scores are shown in Table 4 (column 1). A t test re-
task, and, as expected, the scores on the latter task were vealed that the skilled comprehenders recalled more lit-
slightly lower, indicating that fewer items were retested. eral information than did the less skilled comprehenders
Scores on both measures suggest that, as intended, both [t(24) = 2.65, p < .015].
groups acquired the knowledge base with relative ease. Similes. The scoring system was similar to that used
However, there was some indication that the skilled com- for the literal content questions. Example responses refer
prehenders were quicker to learn the knowledge base to the simile in the sample text provided in Table 2.
than were the less skilled comprehenders [picture, t(24) = Three points were awarded for a full interpretation of the
1.96, p < .062; verbal, t(24) = 1.93, p < .067]. simile without prompting (e.g., “he was covered in
Retention of the knowledge base. The scores for the scrapes and bruises”), 2 points for a full interpretation
immediate and delayed tests of recall of the knowledge after prompting, 1 point for a partial response that was
base are given in Table 3. A two-way analysis of variance not improved when prompted (e.g., “he was hurt”) and
(ANOVA; skill group ´ test session) revealed a marginal for “mixed” responses that included some interpretation
effect of skill group [F(1,24) = 3.46, p < .08], and a sig- and some literal description (e.g., “he’s just been in a
nificant effect of test session [F(1,24) = 9.75, p < .006]. f ight, looks like he got punched”), and 0 points for
There was a significant interaction between the two vari- wholly incorrect answers and “don’t knows.” A t test be-
ables [F(1,24) = 9.75, p < .006]. The interaction arose tween the two groups’ scores did not reach significance
because the two groups obtained comparable scores in [t(24) = 1.63, p > .10].
the immediate recall test, but the skilled comprehenders Inference-making skill. The total number of infer-
demonstrated superior retention of the knowledge base ences correctly drawn when the question was first asked
over time. or after the prompt for further information (“tell me more

Table 3
The Knowledge Base: Ease of Learning and Retention Scores
(Means and Standard Deviations) for Each Skill Group
Ease of Learning Retention
Picture Test Verbal Recall Immediate 1-Week Delay
Skill Group M SD M SD M SD M SD
Less skilled 14.08 2.25 13.62 2.40 11.77 0.69 10.77* 1.48
Skilled 12.77 0.83 12.31 0.48 11.85 0.37 11.85 0.37
Note—For the picture test and the verbal recall test, 1 point was awarded for each item correctly
recalled the first time. Errors were corrected and retested, and 1 point was added for additional
trials needed. Perfect recall score was 12. For the immediate and delayed tests of recall, the max-
imum score was 12. *Less skilled comprehenders obtained significantly ( p < .05) lower scores
than skilled comprehenders .

Table 4
Literal Questions and Similes: Mean Scores and Standard Deviations for Each Skill Group
                 Literal Questions        Similes
Skill Group       M        SD          M        SD
Less skilled    10.38*    3.52       12.31     4.05
Skilled         13.31     1.84       14.92     4.11
Note. For literal questions and similes, the maximum score was 18. *Less skilled comprehenders obtained significantly (p < .05) lower scores than skilled comprehenders.

about that”) was calculated. These totals were then adjusted to take into account an individual’s retention of the knowledge base when tested immediately after the story presentation. Points were awarded on a matched basis. For example, failure to recall the specific information about the turtles on Gan resulted in the exclusion of the response to that question. There were no instances in which a child produced a correct inferential answer but did not recall the appropriate knowledge-base item at the end of the story. The scores entered into the analysis were each individual’s raw score expressed as a proportion of the total possible score for that child, dependent on their memory of the knowledge base. Thus, if a child recalled all six of the knowledge-base items on which the elaborative inferences could be drawn, their maximum possible raw score was 6; if they recalled only three of these items, their maximum possible raw score was 3.

The proportional scores were analyzed in a two-way ANOVA, with skill group (skilled, less skilled) and inference type (coherence, elaborative) as factors. The group means are reported in Table 5. There was a main effect of skill group because the skilled comprehenders made more inferences in general than the less skilled comprehenders did [F(1,24) = 12.31, p < .002]. Although more coherence inferences were drawn than were elaborative inferences, the effect of inference type was not significant [F(1,24) = 1.06, p > .10], and there was no interaction between the two factors, skill group and inference type [F(1,24) < 1.0]. Because of the small number of items (n = 6) for each inference type, effect sizes were calculated. The effect sizes were substantial: 1.3 and 0.96 for coherence and elaborative inferences, respectively.

Table 5
Inference Questions: Mean Proportional Scores and Standard Deviations for Each Skill Group for the Two Types of Inference
                        Type of Inference
                 Coherence          Elaborative
Skill Group       M       SD         M       SD
Less skilled    .359*   .245       .322*   .228
Skilled         .627    .168       .556    .255
*Less skilled comprehenders obtained significantly (p < .05) lower scores than skilled comprehenders.

Reasons for Inference Failure

There are several reasons why an individual may have failed to make an inference, even when he/she was able to recall the knowledge-base information immediately after story presentation. We were able to explore some of these possibilities in subsidiary data analyses, reported next.

Differential memory of the text and the knowledge base. It is possible that the less skilled comprehenders had poorer memory for the text per se. Indeed, in this study, they were poorer at answering literal questions than the skilled comprehenders (cf. Cain & Oakhill, 1999; Oakhill, 1982). Furthermore, Barnes et al. (1996) found that scores on the literal questions were related to performance on coherence inference questions. It is also possible that differential memory for the knowledge base affected performance, even though responses to the inference questions were conditionalized for immediate knowledge-base recall: The skilled comprehenders demonstrated better retention of the knowledge base over a 7-day period than did the less skilled comprehenders. Analysis of covariance was used to determine whether these differences in memory for either the text or the knowledge base could account for differential performance on the inference questions. As noted above, the inference scores were conditionalized for immediate recall of the knowledge base. Therefore, this analysis constitutes a very stringent test of the hypothesis that differences between good and poor comprehenders’ inference-making skill are not simply the result of knowledge differences. Both indicators of differential memory (literal scores and delayed recall of the knowledge base) were entered as covariates to control for a general trend toward differences in memory. The main effect of skill group remained significant [F(1,22) = 4.45, p < .05].

Stages in the inference-making process. In the introduction, we identified different stages of the inference-making process where difficulties might arise. Incorrect premise recall is one type of information retrieval error, which occurs when an individual fails to retrieve the relevant premise from the text. Individuals may either forget the relevant premise or retrieve the incorrect premise from the story. Another type of information retrieval error is a failure to recall the correct item from the knowledge base. This source of inference failure has already been addressed in the analysis of the proportional (adjusted) scores. Integration failures occur when children retrieve both relevant pieces of information but fail to integrate them. In addition, individuals may generate an incorrect inference in order to make sense of the text and establish coherence by, for instance, integrating the premise information with other (possibly real-world) knowledge. For example, in one episode, the children reach a high fence. They put their shoes on and “they flew across the fence, landing gently on the other side.” To fully understand this part of the story, a coherence inference must be made to integrate the textual premise with the knowledge-base information that the shoes on Gan have wings. However, when asked “What did Dack and Tane do at the fence?” one participant responded, “they jumped
over.” Such errors were classified as incorrect inferences, and this classification was agreed by both markers. The percentages of incorrect coherence inferences were 1.3% and 2.5% for skilled and less skilled comprehenders, respectively. The percentages of incorrect elaborative inferences were 2.5% and 10.3% for skilled and less skilled comprehenders, respectively.

The different error types are not logically independent, so the data were explored in the following way. First, we conducted a two-way ANOVA (skill group × inference type) on the incorrect inference responses. The analysis of these errors takes into account the number of inference failures made for each type of inference. For instance, if there were three coherence failures and all three could be attributed to generating an incorrect inference, the proportion score for “inference failure” entered into the analysis would be 1.0. If only two of the errors were attributable to incorrect inference generation, the proportion would be .667.

Very few errors could be attributed to an incorrect inference, and an analysis of these errors showed no significant effect of skill group [F(1,24) = 1.32, p > .10]. Thus, there was no evidence to suggest that less skilled comprehenders gained lower inference scores because they were generating nontarget inferences. However, a greater proportion of elaborative inference failures, relative to coherence inference failures, could be attributed to incorrect inference generation [F(1,24) = 4.70, p < .05]. Incorrect inference generation accounted for 3% of coherence inference failures for both groups. This type of error accounted for 7% of elaborative inference failures made by skilled comprehenders and 15% of less skilled comprehenders’ elaborative failures. The interaction between the two factors was not significant [F(1,24) < 1.0].

To explore the contribution of incorrect premise recall and integration failure to inference-making difficulties, we calculated the proportion of remaining errors (not including knowledge-base retrieval difficulties or incorrect inference generation) that involved a failure to recall the correct premise from the text. These scores, shown in Table 6, were entered into a two-way ANOVA, with skill group and inference type as factors. The analysis revealed a main effect of skill group [F(1,24) = 5.33, p < .035], because a greater proportion of less skilled comprehenders’ inference failures could be attributed to a failure to recall the relevant premise from the text, relative to the skilled group’s errors. There was also a main effect of inference type [F(1,24) = 7.08, p < .015], because this source of difficulty was more common for elaborative inference failures than for coherence ones. The interaction between the two factors was not significant [F(1,24) = 0.77, p > .10].

Table 6
Inference Failures: Mean Proportional Scores and Standard Deviations of Remaining Inference Failures That Could Be Attributed to Incorrect Premise Recall Once Knowledge-Base Failure and Incorrect Inference Generation Had Been Excluded, for Each Skill Group for Both Types of Inference
                        Type of Inference
                 Coherence          Elaborative
Skill Group       M       SD         M       SD
Less skilled    .368*   .345       .526    .285
Skilled         .103    .199       .415    .368
*Less skilled comprehenders obtained significantly (p < .05) higher scores than skilled comprehenders.

Recognition of the need to make an inference. Skilled comprehenders may generate more inferences than do less skilled comprehenders because they regularly monitor their comprehension and see the need to make inferences to fill in missing details. Barnes et al. (1996) found that children were able to answer direct questions that required an inference (e.g., “Why did Dack wish that he was a turtle?”) even when they had not made that inference when originally asked. We incorporated these direct questions into our comprehension questioning procedure in order to determine whether less skilled comprehenders were able to generate inferences when explicitly required to do so. The means in Table 7 are based on all of the inferences made, whether produced when the question was first asked, after a prompt (if needed), or in response to a direct question (if needed), and are expressed as proportional scores adjusted for recall of the knowledge base. Because these data are not independent of those reported earlier, they were not subjected to statistical analysis. Nevertheless, the mean scores are revealing: The skilled comprehenders were now performing at ceiling, and, more interestingly, the less skilled comprehenders’ performance improved greatly (from .359 to .705 for coherence inferences and from .322 to .751 for elaborative inferences; cf. Table 5).

Table 7
Inferences Made With Direct Questions: Mean Proportional Scores and Standard Deviations, Adjusted to Take Recall of Knowledge Base Into Account, for Each Skill Group for Both Types of Inference
                        Type of Inference
                 Coherence          Elaborative
Skill Group       M       SD         M       SD
Less skilled    .705    .282       .751    .191
Skilled         .974    .063       .962    .073

DISCUSSION

The primary aim of the present study was to investigate the relation between reading-comprehension skill and the ability to draw inferences when knowledge was available. To make an inference, information from the text and a taught knowledge base had to be recalled and integrated. When inferences were not made, we explored whether there was a common source of difficulty for good and poor comprehenders and a common source of difficulty for the two inference types, coherence and elaborative. As stated in the introduction, less skilled comprehenders may simply have poorer memory of the information necessary for inference generation. An inference may not be made because of a failure to recall the knowledge-base item or the correct premise from the text
or because of a failure to integrate the two. In addition, children may generate a different inference from the one intended, or they may not draw an inference because they are not aware that one is necessary. Inference failure, for any of these reasons, will result in a less detailed and less integrated model of the text. We summarize and discuss the results as they relate to these points, in turn.

The procedure was designed to ensure that all children could learn, with relative ease, the knowledge base from which the inferences could be drawn. There were ceiling effects in both immediate and delayed recall of the knowledge base, but these were a necessary consequence of the task requirements (to learn the knowledge base to criterion and to retain it). The results suggest that children with good comprehension skills may find it easier to acquire new knowledge and are also able to construct more stable representations of newly taught knowledge than less skilled comprehenders, even when their short-term retention does not differ markedly. However, the present study was not designed to assess the acquisition and retention of knowledge, although these are issues that warrant further investigation. The skilled comprehenders’ superior recall of the knowledge base may also have been aided by their better memory for the story. Specifically, they may have constructed a more integrated and embellished representation of the text, which may have served to strengthen their memory of the knowledge base, such that knowledge-base items may have been available as an integral part of the story rather than as a list of discrete facts.

The differences that existed in availability of the knowledge base were taken into account when inferencing skill was assessed. The less skilled comprehenders generated significantly fewer inferences than the skilled comprehenders did. The effect sizes revealed that the group differences in inference-making skill were substantial (Cohen, 1988). Thus, even when they had the requisite knowledge-base information from which to generate an inference, the less skilled comprehenders did not make these inferences as readily as their skilled peers did. Knowledge availability is therefore not a sufficient condition for inferencing, and we can rule out lack of knowledge as a primary source of poor comprehenders’ inference-making difficulties. Furthermore, analysis of covariance demonstrated that the skilled comprehenders’ superior inference-making skills were not simply due to differences in their memory for either the text or the knowledge base over time. As stated above, memory for the story may serve to strengthen memory for the knowledge base. Thus, the inclusion of delayed knowledge-base recall in our analysis provides a particularly strong test of the hypothesis that inference-making differences could be attributed to differential memory. The group difference remained even when both indicators of differential memory were entered into the analysis.

There are two qualifications to the conclusion that differential memory did not affect inference-making performance. First, the skilled comprehenders demonstrated ceiling performance on the test of long-term retention of the knowledge base. This restricted range of scores may have limited the explanatory power of the retention measures. Second, it may be that these variables are not suitable controls for knowledge stability or accessibility. They do not assess the degree of integration between the items acquired in the new knowledge base, nor do they assess the speed or efficiency of access to this new information. Barnes et al. (1996) found that easily accessible knowledge was more likely to be used in inferencing than was knowledge that took longer to retrieve. We were not able to test accessibility of knowledge in the present study, but it is plausible that poor comprehenders were restricted by the accessibility of information in the taught knowledge base to a greater degree than were good comprehenders. Measures of knowledge accessibility should be included in follow-up studies.

In this study, we found that literal memory for the text did not account for group differences in inference making. In contrast, Barnes et al. (1996) found that literal memory for the text in general was a significant predictor of coherence inference-making ability within a population of 6- to 15-year-olds. Our present finding is supported by a study conducted by Omanson et al. (1978), who demonstrated that age-related gains in general memory capacity could not wholly account for developmental improvements in inference making. The discrepancies between different studies suggest that it may be necessary to explore different aspects of literal memory in more detail in future work, such as the quality (i.e., detail) of the literal recall, as well as the quantity.

A common source of inference difficulty for the less skilled comprehenders in the present study was a failure to retrieve the relevant textual premise. The skilled comprehenders’ inference failures, by contrast, arose primarily at a different stage in the comprehension process. Often, they recalled both the relevant textual premise and the knowledge-base item but failed to integrate the two. Recall of the incorrect premise suggests that the less skilled comprehenders experienced difficulty in selecting the relevant information on which the inference should be based. Despite the finding that a higher proportion of the good comprehenders’ failures can be attributed to integration failures, relative to the less skilled group, it is certainly not the case that less skilled comprehenders do not experience integration failures. Rather, the less skilled comprehenders’ difficulties arose at an earlier stage in the inference-making process: They often failed to recall the information that had to be integrated to generate the inference.

A small proportion of inference failures in both groups could be attributed to generating the incorrect inference. This type of error was more common for elaborative inferences than for coherence ones. Generation of the wrong inference indicates that the reader (or listener) is poor at selecting the relevant information from the text and from his/her general knowledge (in this instance, the taught knowledge base). There were very few instances of incorrect inference generation and, thus, no indication that the less skilled comprehenders gained lower inference scores because they were generating nontarget inferences.

The groups did not generate a significantly greater number of coherence inferences than elaborative ones, although there was a trend in that direction, suggesting that both groups of children were sensitive to the need to maintain textual coherence by making necessary inferences. Previous studies have explored coherence and elaborative inferences and have reported a difference between these two types of inferences (e.g., Casteel & Simpson, 1991). However, none of these studies has required the integration of information from both text and a knowledge base for both types of inferences: Coherence inferences can often be generated from information provided in the text alone by, for example, integrating two propositions. This difference in processing requirements may be one reason for the smaller-than-expected difference between coherence and elaborative inferences that was found in the present study. Using the same texts, Barnes et al. (1996) found the smallest difference between the two inference types for 8- to 9-year-olds. Thus, our apparent lack of differentiation between these types for 7- to 8-year-olds may be an indication of developmental differences. It could be that, once word reading has become fairly fluent and text-comprehension skills are beginning to develop independently, children are still learning which inferences are necessary and which are merely elaborative, and they are having to adjust their standards accordingly.

Interestingly, different errors were associated with the two inference types. Incorrect inference generation and retrieval of incorrect textual premises were more common sources of elaborative inference failures than of coherence inference failures. These different patterns of error indicate that, even though there was no overall advantage for coherence over elaborative inferences, the children were less aware of which information was relevant for elaborative inference generation than for coherence inference generation.

The less skilled comprehenders were not significantly impaired in their ability to interpret novel similes. This finding is somewhat surprising, because many of the cognitive processes necessary to make inferences are likely to be involved in simile interpretation: namely, the application of (general) knowledge from outside the text to a textual premise. Although similes are a type of figurative expression, they offer clues to their nonliteral interpretation in much the same way that direct questions indicate that an inference is required. Comprehension of a different form of figurative language that requires a greater degree of inferential processing, such as unfamiliar or novel idiomatic phrases, may be more strongly related to comprehension skill; this is a hypothesis that we are currently pursuing.

To summarize, the relation between comprehension skill and inference making is now well established. In the present study, we demonstrated that children with comprehension difficulties are deficient at making inferences that require integration of textual premises with a taught knowledge base, even when those inferences are necessary for comprehension. In addition, we demonstrated that less skilled comprehenders’ difficulties with inference making are not restricted to reading situations but are apparent in tasks involving listening comprehension as well. Less skilled comprehenders’ difficulties with inference making were not wholly accounted for by memory for the text or for information outside of the text that was essential for inference generation. An analysis of errors revealed that a more likely source of inference-making difficulty for this group was an inability to select the information relevant to making the inference.

REFERENCES

Ackerman, B. P., Silver, D., & Glickman, I. (1990). Concept availability in the causal inferences of children and adults. Child Development, 61, 230-246.
Barnes, M. A., & Dennis, M. (1996). Reading comprehension deficits arise from diverse sources: Evidence from readers with and without developmental brain pathology. In C. Cornoldi & J. V. Oakhill (Eds.), Reading comprehension difficulties: Processes and interventions (pp. 251-278). Hillsdale, NJ: Erlbaum.
Barnes, M. A., & Dennis, M. (1998). Discourse after early-onset hydrocephalus: Core deficits in children of average intelligence. Brain & Language, 61, 309-334.
Barnes, M. A., Dennis, M., & Haefele-Kalvaitis, J. (1996). The effects of knowledge availability and knowledge accessibility on coherence and elaborative inferencing in children from six to fifteen years of age. Journal of Experimental Child Psychology, 61, 216-241.
Beck, I. L., Perfetti, C. A., & McKeown, M. G. (1982). Effects of long-term vocabulary instruction on lexical access and reading comprehension. Journal of Educational Psychology, 74, 506-521.
Cain, K., & Oakhill, J. V. (1999). Inference making and its relation to comprehension failure. Reading & Writing, 11, 489-503.
Cain, K., & Oakhill, J. V. (in press). Reading comprehension difficulties. In P. E. Bryant & T. Nunes (Eds.), International handbook of children’s reading. Dordrecht: Kluwer.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Casteel, M. A. (1993). Effects of inference necessity and reading goal on children’s inference generation. Developmental Psychology, 29,
Casteel, M. A., & Simpson, G. B. (1991). Textual coherence and the development of inferential generation skills. Journal of Research in Reading, 14, 116-129.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Garnham, A. (1982). Testing psychological theories about inference making. Memory & Cognition, 10, 341-349.
Garnham, A. (1989). Inference in language understanding: What, when, why and how. In R. Dietrich & C. F. Graumann (Eds.), Language processing in social context (pp. 153-172). Amsterdam: North-Holland.
Garnham, A., & Oakhill, J. V. (1996). The mental models theory of language comprehension. In B. K. Britton & A. C. Graesser (Eds.), Models of understanding text (pp. 313-339). Hillsdale, NJ: Erlbaum.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371-395.
Long, D. L., Oppy, B. J., & Seely, M. R. (1997). Individual differences in readers’ sentence- and text-level representations. Journal of Memory & Language, 36, 129-145.
MacGinitie, W. H., & MacGinitie, R. K. (1989). Gates–MacGinitie reading tests. Chicago: Riverside.
Marr, M. B., & Gormley, K. (1982). Children’s recall of familiar and unfamiliar text. Reading Research Quarterly, 18, 89-104.
Nation, K., & Snowling, M. J. (1998). Semantic processing and the development of word-recognition skills: Evidence from children with reading comprehension difficulties. Journal of Memory & Language, 39, 85-101.
Neale, M. D. (1989). The Neale analysis of reading ability: Revised British edition. Windsor: NFER-Nelson.
Oakhill, J. V. (1982). Constructive processes in skilled and less-skilled comprehenders’ memory for sentences. British Journal of Psychology, 73, 13-20.
Oakhill, J. V. (1984). Inferential and memory skills in children’s comprehension of stories. British Journal of Educational Psychology, 54, 31-39.
Oakhill, J. V. (1996). Mental models in children’s text comprehension. In J. V. Oakhill & A. Garnham (Eds.), Mental models in cognitive science: Essays in honour of Phil Johnson-Laird (pp. 77-94). Hove, U.K.: Psychology Press.
Omanson, R. C., Warren, W. M., & Trabasso, T. (1978). Goals, inferential comprehension and recall of stories by children. Discourse Processes, 1, 337-354.
Paris, S. G., & Lindauer, B. K. (1976). The role of inference in children’s comprehension and memory for sentences. Cognitive Psychology, 8, 217-227.
Paris, S. G., & Upton, L. R. (1976). Children’s memory for inferential relations in prose. Child Development, 47, 660-668.
Perfetti, C. A. (1985). Reading ability. Oxford: Oxford University Press.
Shankweiler, D. (1989). How problems of comprehension are related to difficulties in decoding. In D. Shankweiler & I. Y. Liberman (Eds.), Phonology and reading disability: Solving the reading puzzle (pp. 35-68). Ann Arbor: University of Michigan Press.
Singer, M. (1994). Discourse inference processes. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 479-515). San Diego: Academic Press.
van den Broek, P. (1994). Comprehension and memory of narrative texts: Inferences and coherence. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 539-588). San Diego: Academic Press.
Whitney, P., Ritchie, B. G., & Clark, M. B. (1991). Working-memory capacity and the use of elaborative inferences in text comprehension. Discourse Processes, 14, 133-145.

(Manuscript received June 4, 1999; revision accepted for publication May 2, 2001.)

