Tom Test

Journal of Autism and Developmental Disorders, Vol. 29, No. 1, 1999

The TOM Test: A New Instrument for Assessing Theory of Mind in Normal Children and Children with Pervasive Developmental Disorders

Peter Muris,1,4 Pim Steerneman,2 Cor Meesters,3 Harald Merckelbach,1 Robert Horselenberg,1 Tanja van den Hogen,3 and Lieke van Dongen3

This article describes a first attempt to investigate the reliability and validity of the TOM test, a
new instrument for assessing theory of mind ability in normal children and children with pervasive
developmental disorders (PDDs). In Study 1, TOM test scores of normal children (n = 70) cor-
related positively with their performance on other theory of mind tasks. Furthermore, young chil-
dren only succeeded on TOM items that tap the basic domains of theory of mind (e.g., emotion
recognition), whereas older children also passed items that measure the more mature areas of theory
of mind (e.g., understanding of humor, understanding of second-order beliefs). Taken together, the
findings of Study 1 suggest that the TOM test is a valid measure. Study 2 showed for a separate
sample of normal children (n = 12) that the TOM test possesses sufficient test-retest stability.
Study 3 demonstrated for a sample of children with PDDs (n = 10) that the interrater reliability
of the TOM test is good. Study 4 found that children with PDDs (n = 20) had significantly lower
TOM test scores than children with other psychiatric disorders (e.g., children with Attention-deficit
Hyperactivity Disorder; n = 32), a finding that underlines the discriminant validity of the TOM
test. Furthermore, Study 4 showed that intelligence as indexed by the Wechsler Intelligence Scale
for Children was positively associated with TOM test scores. Finally, in all studies, the TOM test
was found to be reliable in terms of internal consistency. Altogether, results indicate that the TOM
test is a reliable and valid instrument that can be employed to measure various aspects of theory
of mind.

KEY WORDS: Theory of mind; pervasive developmental disorders; reliability.

1 Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
2 South-Limburg Centre of Autism, c/o RIAGG-OZL, P.O. Box 165, 6400 AD Heerlen, The Netherlands.
3 Department of Experimental Abnormal Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.
4 Address all correspondence to Peter Muris, Department of Psychology, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands.

INTRODUCTION

Recently, children's understanding of their own and others' mental states has been the focus of considerable interest. Research in this area is described under the general heading "theory of mind." Premack and Woodruff (1978) were the first to use the term to refer to the child's ability to ascribe thoughts, feelings, ideas, and intentions to others and to employ this ability to anticipate the behavior of others. According to Wellman (1990), theory of mind is a prerequisite for the understanding of the social environment and for engaging in socially competent behavior (see also Astington & Jenkins, 1995).

It has been proposed that autistic children are socially impaired precisely because they lack a theory of mind (Frith, 1989). In a series of studies, Baron-Cohen,

0162-3257/99/0200-0067$16.00/0 © 1999 Plenum Publishing Corporation

Leslie, and Frith (1985, 1986) demonstrated that the ability of autistic children to attribute mental states to others is seriously impaired. These researchers found that about 80% of the autistic children were unable to correctly predict the ideas of others, whereas most mentally retarded and normal controls of lower mental age were able to do so.

Specific programs have been developed to train theory of mind skills in autistic children. For example, in a study by Ozonoff and Miller (1995), five autistic children received a training program in which they were not only taught specific interactional and conversational skills but also received explicit and systematic instruction regarding the underlying social-cognitive principles necessary to infer the mental states of others (i.e., theory of mind). Pre- and posttreatment assessment demonstrated that the trained children improved on a number of false belief tasks compared to control children who had received no treatment. Similar positive results were obtained by Swettenham (1996), Hadwin, Baron-Cohen, Howlin, and Hill (1996), Bowler, Strom, and Urquhart (1993), and Whiten, Irving, and Macintyre (1993). All these studies were successful in that autistic children who had received training were able to pass theory of mind tasks. Furthermore, in a recent study by Steerneman, Jackson, Pelzer, and Muris (1996), socially immature (but not autistic) children were given a social skills intervention program that incorporated theory of mind principles. Results showed that this type of training produced positive effects on theory of mind tests. Yet, it should be added that the treatment effects found in these studies do not always generalize to nonexperimental settings or to tasks in domains where children received no teaching (for a discussion of this issue, see Slaughter & Gopnik, 1996).

Given the availability of reasonably successful treatment programs, theory of mind assessment instruments are important for two reasons. First, such instruments can be used to identify those children who display deficits in theory of mind. Second, such instruments can be employed to evaluate the efficacy of theory of mind training programs.

The assessment of theory of mind in children has been predominantly confined to so-called "false belief" tasks. Such tasks intend to test children's comprehension of another person's wrong belief. An example is the so-called Smarties test (e.g., Hogrefe, Wimmer, & Perner, 1986). During this test, children are presented with a Smarties box and asked what it contains. Children are highly familiar with these boxes and know that they usually contain Smarties, a desirable chocolate candy. When children give an answer to this effect, they are shown that the box actually contains a pencil. Next, children are told that another child will be asked what is in the box. They are then asked the crucial question: "What do you think the other child will say?" From their answer to this question, one can infer whether children are able to make a judgment about another person's false expectation. That is, an understanding of another individual's false belief (and the presence of theory of mind) is demonstrated if children predict that another person will think that there are Smarties in the box. Conceptual difficulty with false belief attribution (and the absence of theory of mind) is revealed if children assume that another person will think that there is a pencil in the box.

Several authors have argued that theory of mind is more than just the comprehension of false belief. For example, Perner and Wimmer (1985) have described two other types of belief that play a crucial role in children's understanding of social interactions: first-order beliefs that refer to what children think about real events (e.g., "Michael thinks that Sophie is angry") and second-order beliefs that pertain to what children think about other people's thoughts (e.g., "Michael thinks that Sophie thinks that he's angry with her").

Flavell, Miller, and Miller (1993) argue that children develop a theory of mind along five successive stages. During the first stage, children adopt the concept of mind, that is, they attribute needs, emotions, and other mental states to people and use cognitive terms such as "know," "remember," and "think." During the second stage, children acknowledge that the mind has connections to the physical world. More specifically, they understand that certain stimuli lead to certain mental states, that these mental states lead to behavior, and that mental states can be inferred from stimulus-behavior links. During the third stage, children recognize that the mind is separate from and differs from the physical world. For example, they realize that a person can think about an object even though the object is not physically present. During the fourth stage, children learn that the mind can represent objects and events accurately or inaccurately. Thus, a representation can be false with respect to a real object or event (e.g., in a false belief task), behavior can be false with respect to a mental state (e.g., when a sad person smiles), and two people's perceptual views or beliefs can differ (i.e., perspective taking). During the fifth and final stage, children learn to understand that the mind actively mediates the interpretation of reality. For instance, children recognize that prior experiences affect current mental states, which in turn affect emotions and social inferences. According to Flavell et al. (1993),

Stages 1-3 can best be regarded as theory of mind precursors. These authors assume that these stages "probably emerge in quick succession, for they are very closely related concepts having to do with the differentiation of, and relations between, the mind and the external world" (p. 101). The step from Stage 3 to 4, the emergence of a "real" theory of mind, probably comes more slowly (around the age of 6); Stage 5, the "more mature" theory of mind, would emerge still later.

Taken together, theory of mind refers to the child's capacity to analyze the behavior of others by recognizing the mental states (i.e., desires and beliefs) that underlie intentional behavior. Thus, theory of mind is a complex, developmental phenomenon, which certainly involves more than just the understanding of false belief. Obviously, there is a need for assessment tools that measure the developmental progression of theory of mind in a broader age range. One promising candidate in this respect is the Theory-of-Mind test (TOM test) designed by Steerneman (1994). The TOM test contains a variety of items that can be allocated to three subscales which correspond with the three main theory of mind stages as proposed by Flavell et al. (1993): (a) precursors of theory of mind (e.g., emotion recognition), (b) first manifestations of a real theory of mind (e.g., understanding of false belief), and (c) mature aspects of theory of mind (e.g., second-order beliefs). As a practical tool, the test provides information about the extent to which a child possesses social understanding, insight and sensibility, and the extent to which he or she takes the feelings and thoughts of others into account. The present article is concerned with the reliability and validity of the TOM test.

STUDY 1

The purpose of Study 1 was twofold. First, the construct validity of the TOM test was investigated. The TOM test intends to be a developmental scale. Therefore, it was anticipated that TOM test scores correlate positively with age. That is, as children grow older, their theory of mind develops, and hence they pass more TOM test items. Furthermore, one expects that younger children predominantly succeed on TOM items that tap the basic domains of theory of mind (e.g., emotion recognition), whereas older children should increasingly pass items that measure the more mature aspects of theory of mind (e.g., understanding of false belief, understanding of humor, second-order belief). A second purpose of Study 1 was to evaluate the concurrent validity of the TOM test. More specifically, its relationship with other, more traditional, indices of theory of mind and social development was examined.

Materials and Method

Subjects and Procedure

Seventy children (46 boys and 24 girls) recruited from a regular primary school ('De Driesprong' in Geleen, the Netherlands) participated in the study. The children ranged in age from 5 to 12 years. Ten children of each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) were selected. All children were healthy, socially well-functioning, and none had learning difficulties. Thus, it can be assumed that they had normal intelligence. Children were tested at school in a private room with only the experimenter present. The assessment took place in two sessions. In one session, children underwent the TOM test. In another session, a series of alternative theory of mind or social development tasks was administered. The order of the sessions was counterbalanced within each age level group (i.e., half of the children started with the TOM test, while the other half first received the alternative battery of tests).

The New Theory of Mind Test

The TOM test comprises an interview that can be used with children between 5 and 12 years of age. The TOM test consists of vignettes, stories, and drawings about which the child has to answer a number of questions. The test lasts about 35 minutes and contains 78 items (i.e., questions). The TOM test contains three subscales: (a) precursors of theory of mind (i.e., TOM 1; 29 items; e.g., recognition of emotions, pretense), (b) first manifestations of a real theory of mind (i.e., TOM 2; 33 items; e.g., first-order belief, understanding of false belief), and (c) more advanced aspects of theory of mind (i.e., TOM 3; 16 items; e.g., second-order belief, understanding of humor). In the Appendix, examples of items of the three subscales are shown. Each TOM test item is scored as either failed (0) or passed (1). Accordingly, total TOM scores range between 0 and 78, with higher scores indicating a more mature theory of mind. TOM 1, TOM 2, and TOM 3 subscale scores vary between 0 and 29, 0 and 33, and 0 and 16, respectively.

Alternative, More Traditional, Indices of Theory of Mind and Social Development

A number of alternative indices of theory of mind and social development were employed in the current study.
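
The scoring rules given for the TOM test (78 items scored 0 or 1, allocated to subscales of 29, 33, and 16 items) can be illustrated with a short Python sketch. It is not the authors' scoring procedure: the function names are hypothetical, and the sketch assumes that item scores are stored in subscale order (TOM 1, then TOM 2, then TOM 3), which the article does not state. The second function computes the success percentage used later in the article (passed items on a subscale divided by the number of items on that subscale).

```python
from typing import Dict, Sequence

# Subscale sizes as stated in the text: TOM 1 = 29, TOM 2 = 33, TOM 3 = 16 items.
SUBSCALE_SIZES = {"TOM 1": 29, "TOM 2": 33, "TOM 3": 16}


def score_tom(items: Sequence[int]) -> Dict[str, int]:
    """Sum pass/fail item scores (0 or 1) into subscale and total scores.

    ASSUMPTION: items are ordered TOM 1 first, then TOM 2, then TOM 3.
    The article does not give the actual item order.
    """
    expected = sum(SUBSCALE_SIZES.values())  # 78 items in total
    if len(items) != expected:
        raise ValueError(f"expected {expected} item scores, got {len(items)}")
    if any(score not in (0, 1) for score in items):
        raise ValueError("each item must be scored 0 (failed) or 1 (passed)")

    scores: Dict[str, int] = {}
    start = 0
    for name, size in SUBSCALE_SIZES.items():
        scores[name] = sum(items[start:start + size])
        start += size
    scores["TOM total"] = sum(items)
    return scores


def success_percentages(scores: Dict[str, int]) -> Dict[str, float]:
    """Passed items divided by the number of items on each subscale, times 100."""
    return {
        name: 100.0 * scores[name] / size
        for name, size in SUBSCALE_SIZES.items()
    }


# Hypothetical child (not data from the article): passes all 29 TOM 1 items,
# 20 of the 33 TOM 2 items, and 4 of the 16 TOM 3 items.
example_items = [1] * 29 + [1] * 20 + [0] * 13 + [1] * 4 + [0] * 12
example_scores = score_tom(example_items)
print(example_scores)
print(success_percentages(example_scores))
```

Keeping the subscale sizes in one table makes the ranges quoted in the text (0-29, 0-33, 0-16, and 0-78) follow directly from the item counts.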

The Sally and Anne test (see Baron-Cohen et al., 1985) is a false belief task. It consists of a comic-strip story in which Sally and Anne are first introduced: Sally with a basket in front of her and Anne with a box. Next, Sally is shown placing a ball in the basket and leaving the room. Anne is then shown taking the ball from the basket and placing it in the box. Following this, Sally returns and children are asked: "Where will Sally look for her ball?" If the children point to the previous location of the ball, they pass the task because they acknowledge Sally's false belief (score = 1). If, however, they point to the ball's current location, they fail the task by not taking into account Sally's false belief (score = 0).

The Smarties test (Hogrefe et al., 1986) was used as an alternative false belief task (see Introduction). Scores on this test also vary between 0 (failed) and 1 (passed).

Two tests of emotion recognition (Spence, 1980), the "Test of perception of emotion from facial expression" and the "Test of perception of emotion from posture cues," were administered. Children were asked to identify four basic emotions (happiness, fear, anger, and sadness) on pictures showing facial expressions or bodily postures. Scores on each test range between 0 and 4.

The Social Interpretation Test (SIT; Vijftigschild, Berger, & Spaendonck, 1969) examines the child's ability to interpret social situations adequately. The test consists of a colored picture depicting a street in which a number of events take place. The child has to answer 9 questions about the picture (e.g., 'What has happened here?', 'Why is the ambulance driving in the street?'). The answers are registered and classified into 24 categories. For each category, 1 point is given. SIT scores range between 0 and 24, with higher scores reflecting greater ability to interpret social situations.

The Picture Arrangement subtest of the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) was used as a measure of social sensibility. This subtest asks children to order 12 series of 4 pictures in such a way that each series of pictures depicts a sensible story (range 0-12).

The Role Taking test (Selman & Byrne, 1974) taps role taking skills of children. The test comprises a story of a social dilemma (a young girl has to save a little cat from a high tree, although she has just promised her father not to climb trees anymore). Children are questioned about this story. From their answers to these questions, one can derive the level of role taking: egocentric role taking (i.e., the child is not able to differentiate between his/her own point of view and that of others, level 0); subjective role taking (i.e., the child recognizes his own point of view and that of others, level 1); self-reflective role taking (i.e., the child is able to adopt another person's perspective, level 2); and reciprocal role taking (i.e., the child weighs his perspective against that of others and finds a solution for the social dilemma, level 3).

The John and Mary test (Perner & Wimmer, 1985) assesses children's understanding of second-order beliefs. The test is an acted story in which two characters (John and Mary) are independently informed about an object's (an ice cream van) unexpected transfer to a new location. Hence both John and Mary know where the van is, but there is a mistake in John's second-order belief about Mary's belief: "John thinks that Mary thinks that the van is still at the old place." Children's understanding of this second-order belief was tested by asking: 'Where does John think Mary will go for the ice cream?' Scores on this test are either 0 (failed) or 1 (passed).

RESULTS AND DISCUSSION

General Results

Reliability of the TOM Test

The internal consistency of the TOM test was satisfactory, that is, Cronbach's alphas were .92 for the total TOM-scale, .84 for TOM 1, .86 for TOM 2, and .85 for TOM 3.

Age and Theory of Mind

Table I (right column) presents Pearson product-moment and point-biserial correlations between age, on the one hand, and theory of mind measures, on the other hand. As can be seen from this table, except for the Smarties test, all measures were positively and significantly associated with age. The absence of a connection between age and Smarties test performance is due to the fact that nearly all children in the present study, even the 5- to 6-year-olds, passed this test.

As expected, there was a robust correlation between TOM test and age: r(70) = .80, p < .001. Inspection of mean TOM scores per age level (see Table I) showed that theory of mind capability increased linearly as children grew older. This indicates that the TOM test has one crucial property of a developmental scale, namely, it is sensitive to maturation. With respect to this result,

Table I. Mean Scores of Children on Theory of Mind and Social Development Measures for Different Age Levels, and Pearson Product-Moment and Point-Biserial Correlations Between Age and Various Measures

                                Age (in years)
                                5-6          7-8          9-10         11-12
Measure                         M     SD     M     SD     M     SD     M     SD     r with age
TOM test                        42.5  7.4    59.3  6.9    63.9  5.2    68.1  4.8    .80^a
Emotion recognition-face        3.1   0.9    3.4   0.7    3.9   0.3    3.9   0.3    .50^a
Emotion recognition-posture     2.4   1.1    2.7   1.2    3.4   0.9    3.7   0.7    .46^a
Sally and Anne test             0.4   0.5    0.7   0.5    0.9   0.2    0.8   0.4    .48^a
Smarties test                   0.9   0.3    0.9   0.2    1.0   0.0    1.0   0.0    .25
Social Interpretation test      7.2   3.0    8.8   2.6    13.5  2.8    14.7  2.4    .74^a
WISC-R picture arrangement      3.2   3.0    8.3   2.0    9.7   1.6    9.4   1.2    .72^a
Role taking test                0.5   0.6    1.6   0.8    2.0   0.6    2.3   0.7    .73^a
John and Mary test              0.4   0.5    0.9   0.3    0.9   0.3    0.9   0.3    .44^a

^a p < .05/9 (i.e., Bonferroni correction).

two further remarks are in order. To begin with, it should be noted that the most pronounced increase in theory of mind took place between ages 6 and 7. This is in line with the findings of previous studies showing that children of that age display marked improvement in their performance on more complicated theory of mind tasks (e.g., Perner & Wimmer, 1985). Second, the TOM test also proved suitable to index differential development of theory of mind in older age groups (i.e., in 9-10- and 11-12-year-old children). Note that a number of the alternative tasks tap an aspect of theory of mind that most normal children master at a relatively early age. For example, from age 7 onwards about 90% of the children successfully pass the John and Mary test, whereas from age 8 onwards most children recognize the four basic emotions from facial expression (see Table I). This indicates that these tests are less sensitive to index differential development of theory of mind in older age groups.

Construct Validity of the TOM Test

As the TOM test intends to measure three successive developmental stages of children's theory of mind (i.e., precursors of theory of mind, first manifestations of a real theory of mind, mature theory of mind), one would expect that young children predominantly succeed on items that index the precursors of theory of mind, while at the same time they fail to pass items that measure the more mature aspects of theory of mind. For older ages, one would predict that an increasing number of children succeed on items that tap the more advanced areas of theory of mind. To examine this issue, for each age level (i.e., 5, 6, 7, 8, 9, 10, and 11/12 years) success percentages of the three TOM subscales were calculated (i.e., number of passed items on a subscale divided by the total number of items of that subscale). Figure 1 shows mean success percentages on the three TOM subscales for the various age levels. A 3 (Subscales) × 7 (Age Levels) multivariate analysis of variance performed on these data revealed a significant effect of age, F(6, 63) = 32.1, p < .001, indicating that TOM test performance improves with age. Furthermore, a significant effect of subscale, Fhot(2, 62) = 133.2, p < .001, emerged due to the fact that success percentages on TOM 1 (i.e., precursors of theory of mind) were higher than those on TOM 3 (i.e., mature theory of mind), whereas success percentages of TOM 2 (i.e., first manifestations of a real theory of mind) were in between. Finally, the interaction of subscale with age also reached significance, Fhot(12, 122) = 2.3, p < .05. As can be seen, 7-year-old children succeeded on the vast majority of TOM 1 and TOM 2 items (>80%), indicating that most of these children have passed the first two stages of theory of mind development. Note also that the mean success percentage on TOM 3 items in 5-year-old children was only 23.8%, whereas in 11- to 12-year-old children a success percentage of more than 80% is reached. Thus, as expected, children acquire advanced aspects of theory of mind at a relatively later age (i.e., after they have learned the more basic principles of theory of mind).

Concurrent Validity of the TOM Test

The relationships between TOM test and alternative indices of theory of mind were studied by means of

Fig. 1. Mean success percentages on the three TOM subscales calculated per age level

Table II. Pearson Product-Moment and Point-Biserial Correlations Between TOM Test and Alternative Theory of Mind and Social Development Measures

Variable                        TOM    1      2      3      4      5      6      7      TOM^a
1. Emotion recognition-face     .55^b  -                                                .34
2. Emotion recognition-posture  .46^b  .27    -                                         .30
3. Sally and Anne test          .50^b  .42^b  .30    -                                  .17
4. Smarties test                .37^b  .45^b  .30    .16    -                           .29
5. Social Interpretation Test   .61^b  .38^b  .48^b  .29    .10    -                    .22
6. WISC-R picture arrangement   .77^b  .45^b  .44^b  .49^b  .27    .55^b  -             .30
7. Role taking test             .75^b  .55^b  .40^b  .40^b  .27    .57^b  .63^b  -      .40
8. John and Mary test           .55^b  .44^b  .23    .45^b  .20    .29    .54^b  .54^b  .18

^a To control for age effects, Pearson and point-biserial correlations were computed for each age level and then averaged. Mean correlations thus obtained are shown in this column.
^b p < .05/36 (i.e., Bonferroni correction).
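
The age-controlled column of Table II (correlations computed per age level and then averaged) and the Bonferroni threshold in the table note can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names and the example data are hypothetical, and the sketch skips any age level in which one of the two variables has zero variance, a case the article does not address. The point-biserial correlation needs no separate routine, because it is the Pearson correlation applied to a 0/1 variable. The divisor 36 equals the number of pairs among nine measures, which is presumably its origin.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation.

    When x or y is coded 0/1, the result is the point-biserial correlation.
    Raises ValueError when either variable has zero variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: zero variance")
    return sxy / math.sqrt(sxx * syy)


def mean_within_level_r(
    levels: Sequence[Hashable], x: Sequence[float], y: Sequence[float]
) -> float:
    """Compute r separately within each age level, then average the values."""
    groups = defaultdict(list)
    for level, a, b in zip(levels, x, y):
        groups[level].append((a, b))

    correlations: List[float] = []
    for pairs in groups.values():
        xs, ys = zip(*pairs)
        try:
            correlations.append(pearson_r(xs, ys))
        except ValueError:
            # ASSUMPTION: levels where r is undefined are left out of the mean.
            continue
    if not correlations:
        raise ValueError("no age level yielded a defined correlation")
    return sum(correlations) / len(correlations)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold, e.g. .05/36 as in the table note."""
    return alpha / n_tests


# Hypothetical data (not from the article): two age levels, three children each.
ages = [5, 5, 5, 6, 6, 6]
tom_total = [40, 42, 44, 50, 52, 54]
other_task = [1, 2, 3, 2, 3, 4]
print(pearson_r(tom_total, other_task))                  # pooled correlation
print(mean_within_level_r(ages, tom_total, other_task))  # age-controlled mean
print(bonferroni_alpha(0.05, 36))
```

Averaging within-level correlations removes between-level age differences from the estimate without partialling age out statistically.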

Pearson product-moment correlations. In cases where dichotomous variables were involved, point-biserial correlations were used. As can be seen in Table II, most theory of mind indices are significantly correlated with each other.

At first sight, it seems appropriate to compute correlations between TOM test and alternative indices of theory of mind while controlling for age (i.e., partial correlations). However, by selecting 10 children of each age level, the design of Study 1 capitalized on the developmental progression of theory of mind. Thus, controlling for age would imply the elimination of an intrinsically important factor in both TOM and alternative tests (i.e., the developmental progression of theory of mind). To circumvent this problem, Pearson and point-biserial correlations between TOM test and concurrent measures were computed for each age level separately. The means of these separate correlations are presented in the right column of Table II. As can be seen, correlations attenuated considerably. Nevertheless, the TOM test was still positively associated with concurrent theory of mind indices. This result suggests that, as intended, the TOM test covers a broad range of theory of mind aspects.

STUDY 2

Study 2 intended to investigate another aspect of the reliability of the TOM test, namely, its test-retest stability. To examine this issue, 12 normal primary school children were tested twice with the TOM test, 8 weeks apart.

Method

Subjects and Procedure

Twelve children (8 boys and 4 girls) varying in age between 5 and 12 years from a regular primary school (De Pater van de Geld in Waalwijk, the Netherlands) participated in the study. All children were healthy, normal-functioning children. Children were interviewed with the TOM test twice, 8 weeks apart. Both interviews were conducted by the same experimenter in a separate room at school.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test appeared to be sufficient: Cronbach's alphas were .95 for the total score, .62 for TOM 1, .94 for TOM 2, and .77 for TOM 3.

Test-Retest Reliability

Table III shows demographic variables (age and sex) of the children as well as their total TOM test scores on both occasions. As can be seen, TOM test scores increased with age; the Pearson correlation was .88 (p < .001). Note further that most children slightly improved their score on Occasion 2. A paired t test showed that this improvement was significant, t(11) = 5.4, p < .01. Most important, test-retest reliability for the TOM test was satisfactory; intraclass correlation (ICC) coefficients were .99 (p < .001) for the total score, .80 (p < .005) for TOM 1, .98 (p < .001) for TOM 2, and .91 (p < .001) for TOM 3. These results indicate that the TOM test has sufficient test-retest stability and that the test can be used to measure children's development or improvement in theory of mind capability.

Table III. Demographic Variables of Normal Children in Study 2, and Their Total TOM Test Scores on Both Occasions

                       TOM test scores (8 weeks apart)
Child   Sex   Age      Occasion 1   Occasion 2
1       M     5        40           41
2       M     6        46           48
3       M     6        46           54
4       F     7        41           45
5       M     8        56           56
6       F     8        62           67
7       M     9        62           65
8       M     9        63           68
9       M     10       66           67
10      F     11       65           71
11      M     11       73           74
12      F     12       71           77
M                      60.5         64.4
SD                     10.7         10.4

STUDY 3

The results presented so far suggest that the TOM test can be used as a measure of the efficacy of theory of mind training programs in children with pervasive developmental disorders (PDDs). Yet, as the TOM test is based on an interview with the child, data about the interrater reliability are needed. Study 3 addressed this issue. Ten children with PDDs were tested with the TOM test. Two independent observers classified the reactions of the children to each TOM test item as either failed or passed.

Method

Subjects and Procedure

Ten children (10 boys) with PDDs were randomly selected for the purpose of the present study. Age of the children ranged between 7 and 13 years. All children were treated in one of the AUTI-groups of the Pediatric Center Overbunde, Maastricht, The Netherlands. After

Table IV. Demographic Characteristics of 10 Boys and TOM Test Scores as Obtained by Both Observers

                                                             TOM test score
Child   Age (years;months)   DSM-III-R diagnosis^a   IQ^b    Observer 1   Observer 2   Kappa^c
1       13;3                 PDDNOS                  92      75           75           1.00
2       12;9                 PDDNOS                  93      70           70           1.00
3       10;11                AD                      82      44           45           0.87
4       7;6                  AD                      86      32           33           0.98
5       8;1                  PDDNOS                  93      61           59           0.97
6       11;2                 PDDNOS                  119     71           71           1.00
7       10;8                 PDDNOS                  92      60           59           0.96
8       12;3                 PDDNOS                  97      69           68           0.90
9       6;9                  PDDNOS                  96      35           33           0.90
10      7;10                 PDDNOS                  92      40           38           0.95

^a PDDNOS = pervasive developmental disorder not otherwise specified; AD = autistic disorder.
^b As indexed by the WISC-R.
^c Interrater reliability (Cohen's kappa).
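
The kappa values in Table IV are agreement coefficients computed per child over the 78 pass/fail item scores of the two observers. A minimal Python sketch of Cohen's kappa for this situation is given below. The function name and the example ratings are hypothetical; the item-level ratings of Study 3 are not reported in the article, so the sketch does not reproduce any value from the table.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters scoring the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement),
    where chance agreement is derived from each rater's category proportions.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same, non-empty item set")

    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        # Both raters used one and the same category for every item.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for one child on 78 items (1 = passed, 0 = failed).
# The two observers disagree on exactly two items.
observer_1 = [1] * 40 + [0] * 38
observer_2 = [1] * 38 + [0] * 40
print(round(cohens_kappa(observer_1, observer_2), 2))
```

Computing kappa per child, as the article does, shows whether agreement depends on how many items a child passes, something a single pooled coefficient would hide.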

extensive psychodiagnostic and psychiatric screening, the children were assigned a diagnosis of Autistic Disorder or Pervasive Developmental Disorder Not Otherwise Specified (PDDNOS). The children fulfilled the relevant DSM-III-R criteria (American Psychiatric Association, 1987). Diagnoses were made by a specialized, multidisciplinary team of professionals of the Center of Autism South-Limburg. The main demographic characteristics of the children are shown in Table IV.

Children were tested in a silent room with two experimenters present. Five children were tested by Experimenter 1, while Experimenter 2 observed from a distance. For the other five children, Experimenter 2 administered the TOM test, while Experimenter 1 observed. Both experimenters monitored the responses and reactions of the children on-line. They were not able to observe each other's registrations.

Results and Discussion

Internal Consistency

Internal consistency of the TOM test was good; Cronbach's alphas were .98 for the total score, .95 for TOM 1, .97 for TOM 2, and .95 for TOM 3.

Interrater Reliability

Interrater reliability of the TOM test was examined by computing Cohen's kappa using scores of both observers for the 78 items of the test. Kappas were calculated for each child separately because this makes it possible to evaluate whether interrater reliability is affected by the level of theory of mind development of each child. As can be seen in the right panel of Table IV, the kappa values were high (i.e., all were .87 or higher). Furthermore, both observers produced a highly similar rank order of the children with regard to theory of mind; Spearman rank correlation was .99, p < .001.

Altogether, the results of Study 3 indicate that the interrater reliability of the TOM test is good.

STUDY 4

Study 4 examined the discriminant validity of the TOM test. Various studies have concluded that a substantial proportion of the children with PDDs exhibit deficits in theory of mind. In most of these studies, theory of mind deficits have been demonstrated by means of false belief tasks (Baron-Cohen et al., 1985; Eisenmajer & Prior, 1991; Leslie & Frith, 1988; Perner, Frith, Leslie, & Leekam, 1989; Prior, Dahlstrom, & Squires, 1990). To investigate whether the TOM test is able to detect this specific deficit in children with PDDs, Study 4 compared TOM test scores of children with autism and PDDNOS with those of children who suffered from other psychiatric disorders (i.e., Attention-deficit/Hyperactivity Disorder, Anxiety Disorder).

There is evidence to suggest that intelligence is a moderator variable in performance on theory of mind tests (for a review, see Happé, 1995). For example, Happé (1994) investigated the WISC-R scores of autistic children who either passed or failed a false belief task. Her results showed that passers had significantly higher IQ scores than failers. Most researchers in this domain

Table V. Demographic Characteristics and Mean TOM Test Scores for Children with Attention-deficit/Hyperactivity Disorder (ADHD), Children with an Anxiety Disorder (AnxD), and Children with a Pervasive Developmental Disorder (PDD)

Variable(a)  ADHD children  AnxD children  PDD children  F or χ²  p      Post hoc comparisons
             (n = 14)       (n = 18)       (n = 20)
Age          8.5 (0.9)      9.1 (1.9)      9.3 (2.4)     0.7      ns
Sex (m/f)    12/2           11/7           17/3          3.8      ns
TIQ          86.9 (7.1)     93.6 (12.7)    85.4 (12.9)   2.6      <.10   PDD<AnxD
VIQ          91.6 (12.0)    90.5 (11.9)    84.3 (16.1)   1.5      ns
PIQ          83.4 (9.1)     97.4 (14.3)    86.6 (10.9)   6.6      <.005  PDD<AnxD; ADHD<AnxD
TOM          61.1 (8.4)     58.9 (9.9)     39.1 (24.9)   9.2      <.001  PDD<AnxD; PDD<ADHD
TOM 1        23.5 (3.2)     23.1 (3.1)     16.9 (8.6)    7.2      <.005  PDD<AnxD; PDD<ADHD
TOM 2        27.5 (3.8)     26.7 (4.5)     16.8 (11.3)   10.9     <.001  PDD<AnxD; PDD<ADHD
TOM 3        9.5 (2.2)      8.5 (3.2)      4.9 (5.4)     6.4      <.005  PDD<AnxD; PDD<ADHD

(a) m = male; f = female; TOM = TOM total score; TOM 1 = precursors of theory of mind; TOM 2 = first manifestations of the 'real' theory of mind; TOM 3 = mature theory of mind. Levels of intelligence were measured with the WISC-R.
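
As a rough plausibility check on the F column of Table V, the one-way ANOVA F for the TOM total score can be recomputed from the printed group sizes, means, and dispersions. The sketch below is an illustration, not the authors' analysis: it assumes that the values in parentheses are standard deviations, and the names `groups` and `one_way_f` are hypothetical. Because the table values are rounded, the result (about 9.1) only approximates the reported 9.2.

```python
# Summary statistics for the TOM total score as printed in Table V:
# (group size, mean, dispersion). Treating the parenthesised values
# as standard deviations is an assumption of this sketch.
groups = {
    "ADHD": (14, 61.1, 8.4),
    "AnxD": (18, 58.9, 9.9),
    "PDD": (20, 39.1, 24.9),
}


def one_way_f(summary):
    """Return (F, df_between, df_within) from per-group (n, mean, sd)."""
    total_n = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * m for n, m, _ in summary.values()) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())
    # Within-groups sum of squares, recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    df_between = len(summary) - 1
    df_within = total_n - len(summary)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_b, df_w = one_way_f(groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

The same function can be applied to the TOM 1, TOM 2, and TOM 3 rows by swapping in their means and dispersions.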

assume that it is especially verbal IQ that plays a role in the performance on false belief tasks (Happé, 1995). This may be relevant for the TOM test, as this test is essentially an interview instrument. Thus, it may well be the case that children's scores on this test are critically dependent on their verbal ability (i.e., language comprehension and/or expression ability). To examine this issue, WISC-R scores of the children in Study 4 were also obtained.

Method

Subjects and Procedure

The subjects of Study 4 consisted of three groups: a group of anxiety-disordered children, a group of children with Attention-deficit/Hyperactivity Disorder (ADHD), and a group of children with pervasive developmental disorders.

From the database (1996) of the children and youth section of the Community Mental Health Center, Eastern South-Limburg in Heerlen, The Netherlands, all children suffering from ADHD (n = 14) or an anxiety disorder (AnxD, i.e., obsessive-compulsive disorder, overanxious disorder, specific phobia, posttraumatic stress disorder, and separation anxiety disorder; n = 18) were selected. Children were classified on the basis of the DSM-III-R after extensive psychodiagnostic and psychiatric screening. As part of the intake procedure, all children completed the TOM test and the revised version of the Wechsler Intelligence Scale for Children (WISC-R; Wechsler, 1974).

Twenty high-functioning children with PDDs (i.e., 8 children with Autistic Disorder and 12 children with PDDNOS) also participated in Study 4. These children were chosen randomly from the database of the Center of Autism South-Limburg (see Study 3) and then interviewed with the TOM test. WISC-R scores of the PDD children were also available. Demographic characteristics (i.e., age, sex distribution, and IQ scores) of the three groups are shown in the upper part of Table V.

Results and Discussion

Internal Consistency

As in the previous studies, the internal consistency of the TOM test was satisfactory; Cronbach's alphas of the total scale and the various TOM subscales varied between .87 and .96 for the total group, .95 and .98 for the children with PDD, and .72 and .80 for psychiatric control children.

Discriminant Validity

The lower part of Table V shows mean TOM test scores for the three groups. Analyses of variance followed up by post-hoc t tests revealed that children with PDD had significantly lower TOM test scores than children with ADHD and AnxD.

For this sample, the Pearson product-moment correlation between TOM test and age was only .24 (p < .10). Correlations between TOM test scores, on the one hand, and Total IQ, Verbal IQ, and Performance IQ, on the other hand, however, were all positive and significant; r(52)s were .58 (p < .001), .61 (p < .001), and .45 (p < .001), respectively. Thus, children with higher intelligence scores performed better on the TOM test.
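
The significance levels reported for the Study 4 correlations can be checked by converting each correlation to a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below is illustrative only: it assumes n = 52 (the 14 ADHD, 18 AnxD, and 20 PDD children combined), two-tailed tests, and critical values for df = 50 copied from a standard t table. The function names are hypothetical.

```python
import math

N = 52  # assumed sample size: 14 ADHD + 18 AnxD + 20 PDD children

# Two-tailed critical t values for df = 50, taken from a standard t table.
CRITICAL_T = {0.10: 1.676, 0.05: 2.009, 0.001: 3.496}


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def smallest_level_reached(t_value):
    """Smallest tabulated alpha whose critical value |t| exceeds, else None."""
    reached = [alpha for alpha, crit in CRITICAL_T.items() if abs(t_value) > crit]
    return min(reached) if reached else None


# Correlations between TOM test scores and each variable, as reported in the text.
reported = {"Age": 0.24, "Total IQ": 0.58, "Verbal IQ": 0.61, "Performance IQ": 0.45}
for name, r in reported.items():
    t_value = t_from_r(r, N)
    level = smallest_level_reached(t_value)
    print(f"{name}: r = {r:.2f}, t({N - 2}) = {t_value:.2f}, p < {level}")
```

Under these assumptions the age correlation clears only the .10 criterion, while the three IQ correlations exceed the .001 criterion, in line with the values reported above.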

To examine the unique contribution of the diagnosis Pervasive Developmental Disorder to TOM test performance, two additional analyses were performed. First of all, a multiple regression analysis (forward stepwise) was carried out with Diagnosis Autism, Diagnosis PDDNOS (both dummy variables), Verbal IQ, Performance IQ, and Age as the predictors, and TOM test scores as the dependent variable. Results showed that Diagnosis Autism entered on the first step (r(52) = -.69, p < .001), accounting for 47.6% of the variance in TOM test scores. Verbal IQ (partial r = .32, p < .01), Age (partial r = .24, p < .05), and Diagnosis PDDNOS (partial r = -.23, p < .05) entered on the second, third, and fourth step of the regression equation, accounting for significant proportions of the variance (10.2, 5.8, and 4.4%, respectively). Second, an additional multiple regression analysis was performed while forcing Verbal IQ, Performance IQ, and Age in the equation at Step 1. Still, both Diagnosis Autism and Diagnosis PDDNOS contributed significantly to TOM test scores: partial rs being -.45 (p < .001) and -.22 (p < .05). Thus, even when controlling for IQ level and age, diagnoses still predicted TOM test performance; the more severe children's pervasive developmental disorder, the worse they performed on the TOM test.

Altogether, the results of Study 4 support the discriminant validity of the TOM test in that children with a PDD performed worse on the test than children with other psychiatric disorders. Furthermore, the findings indicate that this difference in TOM test performance is not carried by differences in intelligence. Even when controlling for intelligence, a significant and negative association between diagnoses of autism and PDDNOS, on the one hand, and TOM test performance, on the other hand, emerged.

GENERAL DISCUSSION

Theory of mind pertains to children's capacity to analyze the behavior of others by recognizing mental states (i.e., desires and beliefs) that underlie intentional and social behavior. Clearly, then, theory of mind consists of various aspects, such as the recognition of emotions, the assessment of how others think, and the understanding of the motives underlying behavior of others. The TOM test has been constructed to measure this broad range of aspects from a developmental perspective. The test intends to tap three successive stages in the development of theory of mind: precursors of theory of mind, first manifestations of a real theory of mind, and more advanced aspects of theory of mind.

The current study was a first attempt to investigate the reliability and validity of the TOM test. The main results can be summarized as follows. To begin with, the TOM test was found to be a reliable instrument; internal consistency was good, test-retest reliability was sufficient, and interrater reliability was high. Second, TOM test scores increased with age, indicating that the test is sensitive to developmental progression. In line with this, young children only succeeded on TOM items that tap basic domains of theory of mind, whereas older children also passed items that measure the more advanced areas of theory of mind. Third, evidence was obtained that supports the concurrent validity of the TOM test. That is, TOM test scores correlated positively and significantly with the performance on several other theory of mind tasks (i.e., tests of emotion recognition, understanding of false and second-order beliefs, and role taking). Fourth and finally, children with a PDD performed worse on the test than children with other psychiatric disorders. This suggests that the TOM test possesses discriminant validity.

The TOM test can be used in three ways. First, the test can be employed to screen children for deficits in theory of mind. There is some evidence to suggest that a poorly developed theory of mind can have negative social-emotional consequences, even in normal children (Lalonde & Chandler, 1995). Consequently, an instrument that measures the maturity of children's theory of mind at different age levels is important. Second, because the TOM test is informative about the developmental phase of children's theory of mind, it enables clinicians to tailor their intervention to specific problems of each child. For example, when the TOM test indicates that a child fails even on items that measure precursors of theory of mind, it would be futile to teach this child understanding of false beliefs. Third, the TOM test can be used to evaluate the efficacy of theory of mind training programs.

Altogether, the present findings imply that the TOM test is a reliable and valid instrument that can be employed to screen the development of theory of mind in 5- to 12-year-old normal children, children with pervasive developmental disorders, and other socially immature children.

APPENDIX

Examples of TOM Test Items

Each question represents a TOM test item which is scored as either failed (0) or passed (1). The subscale to

which each item belongs is mentioned between parentheses.

Example 1
Fig. A1. Picture of Example 1.
Instruction: Take a look at this picture.
Question 1: What has happened? Can you tell something about it? (TOM 1)
Question 2: Who in this picture is afraid? (TOM 1)
Question 3: Why is this person afraid? (TOM 2)
Question 4: Who in this picture is happy? (TOM 1)
Question 5: Why is this person happy? (TOM 2)
Question 6: Who in this picture is sad? (TOM 1)
Question 7: Why is this person sad? (TOM 2)
Question 8: Who in this picture is angry? (TOM 1)
Question 9: Why is this person angry? (TOM 2)

Example 2
Instruction: I will read you a short story. Listen carefully.
Story: Pim is one year old. He's at home, playing on the ground. Mother has given him a piece of apple. Suddenly, Pim bites his lip and he starts to cry. He throws the piece of apple on the ground. Mother lifts Pim up, comforts him, and puts the piece of apple on the table. When father arrives at home, mother is on the phone. Father lifts Pim up and hugs him. Then he puts Pim back on the ground, and gives him the piece of apple which is still lying on the table. As soon as Pim sees the piece of apple, he starts to cry.
Question 1: Why is Pim crying when father gives him the piece of apple? (TOM 1)
Question 2: Does father know why Pim is crying? (TOM 2)
Question 3: Does father know that Pim has bitten his lip when he wanted to eat the apple? (TOM 2)

Example 3
Fig. A2. Picture of Example 3.
Instruction: Take a look at this picture.
Question 1: What, do you think, is happening in this picture? (TOM 1)

Story: The two boys in the foreground gossip about the other boy. Suddenly, that boy approaches them and hears what they are saying. The two boys are startled.
Question 1: How does this boy feel? (point at the boy in the background) (TOM 1)
Question 2: How does this boy feel? (point at one of the boys in the foreground) (TOM 1)

Example 4
Fig. A3. Picture of Example 4.
Instruction: Take a look at this picture.
Question 1: What has happened in this picture? (TOM 1)
Question 2: How do you feel when you hurt yourself? (TOM 1)
Question 3: Can you see from the girl's face how she really feels? (TOM 2)
Question 4: Is it possible to look happy, when you have hurt yourself? (TOM 2)

Example 5
Instruction: Take a look at this picture.
Story: This is Ben. Ben wants to play with his bricks.
Question 1: Which box will Ben open to play with his bricks? (TOM 1)
Story: Ben opens the box of bricks, and surprisingly he finds out that it is filled with washing powder! He closes the box, and opens the other smaller box. There are his bricks! He takes out some bricks, and goes playing with them in his bedroom. Then his brother Tim is entering the room. Tim also wants to play with the bricks...
Question 2: Which box will Tim open to play with his bricks? (TOM 2)
Question 3: Do you know where the bricks really are? (TOM 2)

Example 6
Instruction: I will read you a short story. Listen carefully.
Story: Father and mother are at a birthday party. They only know a few people, and think the music is too loud. "Wow," says father, "It's a pleasure to be here!"
Question 1: What does father mean? (TOM 3)
Question 2: Why does father say: "It's a pleasure to be here!" (TOM 3)

Example 7
Question: Do as if you comb your hair. (TOM 1)
Question: Do as if you brush your teeth. (TOM 1)
Question: Do as if you are feeling cold. (TOM 1)
Question: How can I see that you are feeling cold? (TOM 2)
Question: Do as if you have a nasty drink. (TOM 1)
Question: How can I see that your drink is nasty? (TOM 2)
Question: Do as if you are scared. (TOM 1)
Question: How can I see that you are scared? (TOM 2)

Example 8
Instruction: Take a look at this picture.
Story: This is John. John often dreams. Sometimes he dreams about a new bike that he likes to have.
Question 1: Is John able to touch the bike that he dreams about? (TOM 1)
Story: Sometimes John has a frightening dream. Then he dreams about shadows.

Question 2: Does John really see these shadows with his eyes? (TOM 1)
Question 3: Can somebody else see the shadows or the bike of John's dreams? (TOM 1)

Fig. A4. Pictures of Example 5.

Example 9
Instruction: I will read you a short story. Listen carefully.
Story: It is summer. Will and Mike have their holidays. They go out for a bicycle ride. Suddenly, there is a downpour and they have to shelter in a bus station. There are two men in the bus station who also shelter from the rain. One of the men remarks: "Wow, we have nice weather today!"
Question 1: What does the man mean? (TOM 3)
Question 2: Is it true what the man says? (TOM 3)
Question 3: Why does the man say: "Wow, we have nice weather today!" (TOM 3)

REFERENCES

Astington, J. W., & Jenkins, J. M. (1995). Theory-of-mind development and social understanding. Cognition and Emotion, 9, 151-165.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a 'theory of mind'? Cognition, 21, 37-46.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1986). Mechanical, behavioral and intentional understanding of picture stories in autistic children. British Journal of Developmental Psychology, 4, 113-125.
Bowler, D. M., Strom, E., & Urquhart, L. (1993). Elicitation of first-order "theory of mind" in children with autism. Paper presented at the SRCD Conference, New Orleans, LA.
Eisenmajer, R., & Prior, M. (1991). Cognitive linguistic correlates of "theory of mind" ability in autistic children. British Journal of Developmental Psychology, 9, 351-364.
Flavell, J. H., Miller, P. H., & Miller, S. (1993). Cognitive development. Englewood Cliffs, NJ: Prentice-Hall.
Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.
Hadwin, J., Baron-Cohen, S., Howlin, P., & Hill, K. (1996). Can we teach children with autism to understand emotions, belief, or pretence? Development and Psychopathology, 8, 345-365.

Happé, F. (1994). Wechsler IQ profile and theory of mind in autism: A research note. Journal of Child Psychology and Psychiatry, 35, 1461-1471.
Happé, F. (1995). The role of age and verbal ability in the theory-of-mind task performance of subjects with autism. Child Development, 66, 567-582.
Hogrefe, G. J., Wimmer, H., & Perner, J. (1986). Ignorance versus false belief: A developmental lag in attribution of epistemic states. Child Development, 57, 567-582.
Lalonde, C. E., & Chandler, M. J. (1995). False belief understanding goes to school: On the social-emotional consequences of coming early or late to a first theory of mind. Cognition and Emotion, 9, 167-185.
Leslie, A. M., & Frith, U. (1988). Autistic children's understanding of seeing, knowing and believing. British Journal of Developmental Psychology, 6, 315-324.
Ozonoff, S., & Miller, J. N. (1995). Teaching theory of mind: A new approach to social skills training for individuals with autism. Journal of Autism and Developmental Disorders, 25, 415-433.
Perner, J., Frith, U., Leslie, A. M., & Leekam, S. (1989). Exploration of the autistic child's theory of mind: Knowledge, belief and communication. Child Development, 60, 689-700.
Perner, J., & Wimmer, H. (1985). 'John thinks that Mary thinks that...': Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39, 437-471.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences, 4, 515-526.
Prior, M., Dahlstrom, B., & Squires, T. (1990). Autistic children's knowledge of thinking and feeling states in other people. Journal of Child Psychology and Psychiatry, 31, 587-601.
Selman, R. L., & Byrne, D. F. (1974). A structural-developmental analysis of levels of role taking in middle childhood. Child Development, 45, 803-806.
Slaughter, V., & Gopnik, A. (1996). Conceptual coherence in the child's theory of mind: Training children to understand belief. Child Development, 67, 2967-2988.
Spence, S. (1980). Social skills training with children and adolescents: A counselor's manual. Windsor: NFER/Nelson.
Steerneman, P. (1994). Theory-of-mind screening-schaal [Theory-of-mind screening-scale]. Leuven/Apeldoorn: Garant.
Steerneman, P., Jackson, S., Pelzer, H., & Muris, P. (1996). Children with social handicaps: An intervention program using a theory-of-mind approach. Clinical Child Psychology and Psychiatry, 1, 251-263.
Swettenham, J. (1996). Can children with autism be taught to understand false belief using computers? Journal of Child Psychology and Psychiatry, 37, 157-165.
Vijftigschild, W., Berger, H. J. C., & van Spaendonck, J. A. S. (1969). Sociale Interpretatie Test [Social Interpretation Test]. Amsterdam: Swets & Zeitlinger.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children (Rev.). New York: Psychological Corp.
Wellman, H. (1990). The child's theory of mind. Cambridge, MA: MIT Press.
Whiten, A., Irving, K., & Macintyre, K. (1993). Can three-year-olds and people with autism learn to predict the consequences of false belief? Paper presented at the British Psychological Society Developmental Section Annual Conference, Birmingham, UK.

Fig. A5. Picture of Example 8.
