Experimental Syntax: Applying Objective Methods to Sentence Judgments
Wayne Cowart
SAGE Publications
International Educational and Professional Publisher
Thousand Oaks London New Delhi
Copyright © 1997 by Sage Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by
any means, electronic or mechanical, including photocopying, recording, or by any
information storage and retrieval system, without permission in writing from the publisher.
E-mail: order@sagepub.com
1 Oliver's Yard
55 City Road
Post Box 4109
Cowart, Wayne.
Experimental syntax: Applying objective methods to sentence
judgments / author, Wayne Cowart.
p. cm.
alk. paper)
1. Grammar, Comparative and general—Syntax. 2. Linguistics—
P291.C68 1996
415—dc20 96-35620
97 98 99 00 01 02 03 10 9 8 7 6 5 4 3 2 1
Preface
of Acceptability
1.3. Evidence of Stability Within Populations
7. Sampling
7.1. Representativeness
7.2. Linguist Informants
7.3. Sample Size
7.4. Comparing Groups
Questionnaires
9.1. General Instructions and Demographic Data
Experiment
References
Acknowledgments
The project out of which this work arises has been in development for some
time and has benefited from comments and suggestions
at CLS '89, ESCOL '90, LSA '90, LSA '96, University of Massachusetts/Amherst,
Haskins Laboratories, City University of New York, and the University of New
Hampshire.
Although I have benefited from the assistance, advice, and/or
cooperation of all of these colleagues, they bear no responsibility for
such foolishness as may remain in this text. All defects in the book
are solely due to the tiny green bugs that seemed to rain down on the
manuscript whenever I worked on it in my backyard.
The experimental work reported here was supported in part by
grants from the National Science Foundation, SBR-9422688 (awarded
to the author and Dana McDaniel), and the National Institutes of
Health, R01NS22606.
Finally, and mostly, I am grateful to Judy and Cayce for repaying my
neglect with much more patience and affection than I think I'm entitled to.
Note
1. See Popper (1959/1935, pp. 44-48) for some comments on the role of
intersubjective agreement in establishing scientific observations.
http:/www.usm.maine.edu/~lin
1 Introduction: Are Judgments Stable?
a form. What changed in the behaviorist era had to do more with the
theory of what consciousness is (or isn't) than with a thoroughgoing
rejection of introspective method.
Doubts about the stability of judgment data began to appear in the linguistic
literature very soon after the emergence of generative linguistics. As early as
1961, Hill reported on a very informal study of several nonnaive informants in
which he found substantial disagreement about the status of several of the 10
sentences he examined (Hill, 1961). In the early 1970s, Carden (1970, 1973)
found evidence of dialect differences among naive informants as to how negatives
and quantifiers interact in certain circumstances. Other participants in the
subsequent discussion (Heringer, 1970; Labov, 1972; Stokes, 1974, among others)
also found evidence of substantial disagreements among informants.² Labov (1975)
called attention to differences in experimentally derived judgments reported in
these and related papers. Snow and Meijer (1977) collected judgment data from
Dutch informants on 24 sentences via both a rating and a ranking procedure.
Although they found substantial agreement among informants on sentence rankings,
there was no significant correlation between the ratings and rankings obtained
from individual informants (provided at separate sessions 1 week apart). Snow
and Meijer were primarily impressed by the evidence of instability they found.
Ross (1979) describes a study in which 30 informants, including some linguists,
were asked to rate 13 sentences in various ways. Although Ross is quite candid
about the statistical limitations of his analysis (the work being intended
essentially as a pilot study), he finds much evidence of disagreement among
informants about the status of particular sentences. He found subsets of
informants that agreed on a few sentences, but no two informants provided the
same ratings for even a third of the sentences. Ross also noted that informants
seemed to be least likely to agree on more peripheral items that show generally
low acceptability.
A very thorough and careful critique of judgment data appears in Labov (1975).
In a review of numerous published and unpublished findings, Labov found
extensive evidence of disagreements among informants. He also found many
disagreements between published assessments of sentences in the linguistics
literature and the assessments of naive informants. He concluded that variation
in judgments was widespread and disorderly enough to call into question much
work based on intuitive data.
judgment process (e.g., Carroll et al., 1981; Cowart, 1994; Gerken &
Bever, 1986; Nagata, 1987a, 1987b, 1988, 1989a, 1989b, 1989c, 1989d,
1990, 1991, 1992).³
There is a notable contrast between most psychological uses of judgments and a
number of the linguistic studies that have criticized judgments on the basis of
experimental findings. Many of the experimental linguistic studies mentioned
above tacitly assume that each encounter between an informant and a sentence
yields (at least in the great majority of cases) a definitive index of that
informant's assessment of that sentence. In statistical terms, these studies
assume that there is little or no "error variance" in sentence judgments (the
concept of error variance will be discussed in the next chapter). Surprisingly
few linguistic attempts to study judgments experimentally have given serious
consideration to the possibility that an informant's judgments might be subject
to some amount of random variation around a stable mean. Another important
feature of some of the linguistic studies mentioned above is that they
effectively assume that each sentence is a definitive representative of its
syntactic type, sometimes using only a single representative of each sentence
type tested. Such practices make no allowance for the widely recognized fact
that lexical differences between different tokens of the same syntactic type can
produce quite substantial differences in judged acceptability. Many of these
studies also ascribe no significance to order, in effect assuming that the order
in which a set of sentences is presented exerts no effect on judged
acceptability.
The contrast to standard practice in psychophysics is particularly striking.
Like much of contemporary syntactic research, psychophysics is very much
concerned with descriptions of the private subjective states of human subjects.
In the case of psychophysics, however, the private states in question are
generally responses to relatively simple external physical events such as
flashes of light, brief tones, touches of the subject's skin, and so on. Even
with these far simpler stimuli, it is commonplace to have each subject judge
dozens or even hundreds of instances of the same stimulus. This effort is
undertaken for the simple reason that subjects routinely give variable responses
to different presentations of a single stimulus. By asking each subject to judge
each stimulus many times, psychophysicists are able to establish highly reliable
descriptions of those responses that will generalize to future sessions with the
same subject and to data collected from other subjects as well. From a
linguistic point of view, this bears emphasis: Although there are many stimuli
for which each subject's responses will show a highly stable norm, it is often
difficult to detect that norm on the basis of small numbers of responses. The
nature of sentences obviously makes it inappropriate to present the same
sentence to one informant many times,⁴ but there may be an important hint in
standard psychophysical practice and experience that the normal variability of
individual sentence judgments could be much higher than linguists have generally
assumed. The next chapter will pursue this issue in more detail.
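The psychophysical logic can be illustrated with a small simulation. The sketch
below is not drawn from the experiments reported here; it assumes a hypothetical
informant whose responses scatter normally around a stable value (the names
TRUE_MEAN and NOISE_SD and their values are invented for illustration) and shows
how the mean of n responses settles toward that value as n grows, with the
expected spread of the mean shrinking in proportion to the square root of n.

```python
import math
import random
import statistics


def simulate_judgments(true_mean, noise_sd, n, rng):
    """Draw n noisy ratings scattered around one stable underlying value."""
    return [rng.gauss(true_mean, noise_sd) for _ in range(n)]


def standard_error(noise_sd, n):
    """Expected spread of the mean of n independent responses."""
    return noise_sd / math.sqrt(n)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
TRUE_MEAN = 3.0         # hypothetical stable norm for one informant and stimulus
NOISE_SD = 1.0          # hypothetical size of the random variation

for n in (1, 4, 16, 64, 256):
    ratings = simulate_judgments(TRUE_MEAN, NOISE_SD, n, rng)
    # Columns: number of responses, observed mean, expected spread of that mean.
    print(n, round(statistics.fmean(ratings), 2), round(standard_error(NOISE_SD, n), 3))
```

Under these assumptions a single response can land far from the norm, whereas
the mean of a few dozen responses is usually close to it.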
The data presented in this and the next section are intended to show that the
kinds of methods described in this book are able to detect stable patterns of
judgments in speech communities. These findings will not respond to all the
issues raised in the critical literature on judgments. They will show, however,
that one population (native speakers of American English) has stable patterns of
response with respect to some theoretically significant syntactic issues, and
that those patterns of response can be reliably measured via the methods to be
described in this book. I will treat all other questions of stability,
reliability, and validity as secondary to these issues (see Endnote 1, this
chapter). Because the rest of the book will describe in detail the methods used
here, a very brief summary will suffice as background to the experiments
discussed in this chapter.
The experiments use questionnaires whose materials were constructed
according to constraints that typically apply in on-line ex
1.3.1 Subjacency
Figure 2. Relative judged acceptability of three cases of extraction from a
picture-NP. (N = 88; conditions: Control, Indefinite, Definite, Specified Subject.)
acceptability of the Indefinite case in (lb) above. The experiment
used token sets structured as in (2).
(2) a. Control                       Why did the Duchess sell a portrait of Max?
    b. Indefinite with "of"          Who did the Duchess sell a portrait of?
    c. Indefinite with "to"          Who did the Duchess sell a portrait to?
    d. Specified Subject with "to"   Who did the Duchess sell Max's portrait to?
Figure 3. Relative judged acceptability of extractions from picture-NPs compared
with extraction from a sister to NP. (N = 41.)
Here (2a,b) replicate the Control and Indefinite conditions of the first
experiment, and (2c) and (2d) provide added relevant points of comparison. Cases
such as (2c) are maximally similar to (2b) on the surface but do not involve
extraction from within a picture-NP. Thus, if the low relative acceptability of
the Indefinite cases in the first experiment resulted from informants applying a
prescriptive rule against sentences ending in prepositions, (2c) ought to be
similarly compromised. The contrast between (2c) and (2d) allows for a
comparison to the pattern attained with the superficially similar cases in (1)
and provides a third clear case of a class of apparently acceptable sentences
against which to compare cases such as (2b).
The procedure was unchanged except that a smaller group of informants was used
(N = 41). The results of the second experiment are summarized in Figure 3.
1.3.2 "That"-Trace
Figure 4. The "that"-trace effect, an interaction between extraction site and
the presence of that. (N = 32; series: No "that", With "that"; x-axis: Subject
Extraction, Object Extraction.)
(3) No "that"
a. Subject Extraction I wonder who you think likes John.
b. Object Extraction I wonder who you think John likes.
With "that"
c. Subject Extraction I wonder who you think that likes John.
d. Object Extraction I wonder who you think that John likes.
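An interaction of the kind named in the caption to Figure 4 can be expressed as
a difference of differences over the four cells in (3). The following sketch
uses invented cell means in z-score-like units, not the values obtained in the
experiment; the dictionary keys and the function name are likewise hypothetical.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 design.

    Computes (subject - object) without "that" minus (subject - object)
    with "that". A value near zero means the two factors combine additively;
    a large value means the effect of extraction site depends on "that".
    """
    no_that_gap = cell_means[("no_that", "subject")] - cell_means[("no_that", "object")]
    with_that_gap = cell_means[("with_that", "subject")] - cell_means[("with_that", "object")]
    return no_that_gap - with_that_gap


# Invented means: only the With "that" / Subject Extraction cell is depressed.
hypothetical_means = {
    ("no_that", "subject"): 0.5,
    ("no_that", "object"): 0.75,
    ("with_that", "subject"): -1.0,
    ("with_that", "object"): 0.25,
}

print(interaction_contrast(hypothetical_means))
```

A purely additive pattern, in which "that" lowered both extraction types by the
same amount, would give a contrast of zero.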
(4) No Coordination
a. Local Antecedent Cathy's parents require that Paul support himself.
b. Remote Antecedent Paul requires that Cathy's parents support himself.
Simple Coordination
c. Local Antecedent Cathy's parents require that Paul support himself and the child.
d. Remote Antecedent Paul requires that Cathy's parents support himself and the child.
[Figure: Local Antecedent and Remote Antecedent cases across No Coordination,
Simple Coordination, and Coordination with "both"; N = 43.]
Figure 6. Identical sentences judged on two occasions. (N = 54, r² = .97;
x-axis: Mean Judged Acceptability, Session 1.)
NOTE: Each data point represents the mean of 54 judgments from Session 1 and the
mean of 54 judgments from Session 2.
Stability evident in Figure 6 is due to the fact that the sentences were
presented in the same order on both occasions. The large differences in
acceptability among these sentences also help in achieving stable results;
contrasts of theoretical interest often will be much more subtle.
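The kind of comparison summarized in Figure 6 can be sketched as follows:
average the ratings each sentence received in each session, then correlate the
two sets of sentence means. The ratings below are invented, and the helper names
are assumptions made for illustration; this is not the analysis code behind the
figure.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sentence_means(ratings_by_sentence):
    """One mean per sentence, averaging over the informants who rated it."""
    return [statistics.fmean(ratings) for ratings in ratings_by_sentence]


# Invented data: four sentences, three informants, two sessions.
session1 = [[1, 2, 1], [3, 3, 4], [4, 4, 3], [2, 1, 2]]
session2 = [[2, 1, 1], [3, 4, 4], [4, 3, 4], [1, 2, 2]]

r = pearson_r(sentence_means(session1), sentence_means(session2))
print(round(r ** 2, 2))  # proportion of variance shared by the two sessions
```

Because the correlation is computed over sentence means, it describes the
stability of the group's pattern, not the stability of any one informant.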
To estimate the stability of responses to individual sentences where the order
of presentation differs in the two sessions completed by each informant and
where, within any one session, various subsets of informants see the materials
in different orders, we isolated and analyzed a subset of the data from a
"that"-trace experiment (n = 52) in which each informant participated in two
sessions.
In this experiment, the filler sentences were identical in the two sessions,
differing only in the order in which they were presented. The filler list used
was carefully constructed. A preliminary list of
[Figure: two panels, "Low Acceptability Sentences" and "High Acceptability
Sentences".]
The overall results of this study were very similar to those reported in
Section 1.3.2 above, except that the gap between Object Extraction cases with
and without "that" was smaller, although still reliable overall. There was a
small statistically reliable difference between the results from the Alabama
cohort and the other two, which resulted from the fact that the gap between the
Object Extraction cases with and without "that" did not appear in the results
for the Alabama cohort.
1.4.1 Overview of Evidence
[Figure: two panels, each plotting the conditions NT/SE, NT/OE, WT/SE, and WT/OE.]
literature, these results also show that there are sometimes details of these
patterns that have gone unnoticed or underestimated in syntactic research. The
evidence reviewed above also demonstrates that there are practical experimental
methods by which the phenomenon of sentence acceptability can be measured and
assessed.
This chapter has stressed stability; the next will stress variability. At least
where the judgments of naive informants are concerned.
Notes
cally distort the characteristic they are meant to measure (giving, for example,
values that are consistently too high or too low in particular parts of the
range of the measurement). Reliability is also connected with validity,
questions about what it is that a particular psychological test actually
measures (e.g., to what extent IQ is a measure of a cognitive trait rather than
an inadvertent index of socioeconomic background). Schutze's Chapter 3 (1996) is
essentially a review of the literature on validity as it relates to judgments.
All of these issues are obviously relevant to work with judgments in
linguistics, but collecting reliable and scientifically useful data on patterns
of sentence acceptability in speech communities does not require the prior
resolution of any questions of these kinds. Although I will keep to the
relatively informal usages of the linguistic literature when any of these issues
come up, I will try to indicate clearly which notion of stability is relevant
wherever we need to consider matters other than stable measures of the
characteristics of linguistic communities.
2. Newmeyer (1983) argues that most of these disagreements are not about
the facts of acceptability. He suggests that in several cases judgments were constant,
but conflicts among theorists on how various sources of unacceptability should be
accounted for drove changes in the status ascribed to certain sentences.
3. This sample of papers includes those that were identified in a search of the
PsychINFO database (American Psychological Association). The search target was
specified as follows: grammatical? (In) judgment?. This was intended to select
all papers whose database entry included adjacent instances (in either order) of
the target terms or their variants.
4. Nagata (1987a, 1987b, 1988, 1989a, 1989b, 1989c, 1989d) notwithstanding, it
appears to be essential to the nature of sentences (and human responses to them)
that they have identities in ways that flashes of light do not. Linguists and
psychologists, in taking an interest in the structure of language, naturally
view sentences as categories and see particular utterances as instances of those
categories. From the standpoint of day-to-day usage, however, repeating
sentences seems to have some status akin to re-eating a meal; the food wasn't
quite right, or the conversation took a bad turn, so we hit the rewind button
and did it again on the spot. Sentences seem to be ordinarily experienced as
unique and unrepeatable events. Although we can individuate flashes of light (or
sounds, or smells, or the like) by identifying each with a time, this seems
somehow contrary to the ordinary experience of such events. From a psychological
point of view, it seems all the light that flows from a bulb is part of the same
quantity or entity; individuating particular flashes feels a bit like giving
different personal names to a person's left arm, right arm, left ear, and so on.
In short, it does serious violence to the ordinary way of relating to sentences
to ask informants to judge the same sentence many times. This does not mean, of
course, that repeated presentations in experimental settings are unthinkable,
but it does raise questions about how to relate those events to more typical
uses of sentences.
5. I am, with some trepidation, attempting here to summarize a view of these
issues that emerges in various of Chomsky's writings over many years. Readers who
wish to consider the issues in more detail should consult Schutze (1996), especially
his Chapter 2, which cites the various relevant works of Chomsky's.
6. Bever and his colleagues (Bever, 1974; Bever & Carroll, 1981; Gerken &
Bever, 1986; Katz & Bever, 1976; see also Schutze, 1996, pp. 62-70) have argued that
there are very good reasons to explain evidence of gradedness in judgments solely by
reference to performance systems while maintaining a discrete model of the grammar.
7. Formal methods have the distinct advantage that they are public and
replicable. They allow for productive ways of resolving data disputes where standard
informal methods do not. With traditional methods, disputes about data can easily
become dead-ends, regardless of the actual quality of the data in question.
Error Variance in Sentence Judgments
[Figure: distributions of ratings (scale 1 to 5) for Informants 1 through 5 and
overall, by sentence category. Average within-category range = 3.5.]
those seen in Informants 3, 4, and 5. Although the first two of these informants
show the "that"-trace effect overall, the range of variation within sentence
categories overwhelms average between-categories differences in every case.
This degree of error variance, however, did not obscure the overall finding.
Informants gave high ratings to the (6a) and (6b) sentences, slightly (but
reliably) lower ratings to the (6d) sentences, and far lower ratings to the (6c)
sentences. The overall pattern was highly reliable.²
In short, when results were averaged across the several sentences to which each
informant responded and averaged across informants, the error variance
demonstrated earlier was not sufficient to overcome or seriously obscure the
underlying evidence of the "that"-trace effect. Similar patterns of substantial
error variance accompanied by clear and consistent overall patterns have been
found for other syntactic phenomena as well.
A further indication of the magnitude of error variance in judgments comes from
another "that"-trace experiment in which 21 informants (out of a far larger
sample) responded to exactly the same questionnaire on each of two occasions a
week or more apart. This contrasts with the previous study where the informants
saw the target sentences in two different orders. These questionnaires included
only three sentences of each syntactic type. Nevertheless, the general
"that"-trace effect evident in the full experiment and earlier studies was
reliably replicated with this small subset of informants.³ On the other hand, an
informant's means for the four experimental conditions from the first session
were generally poor predictors of that informant's means from the second session
(see Figure 10). As we saw earlier, highly stable patterns may be discernible in
data from a group of informants without those same patterns being readily
detectable in the results of individual informants.
These findings indicate that there is normally more than enough error variance
in judgments of sentence acceptability given by naive informants to make the
control of this variance a proper concern of any reader of proffered general
claims about the relative acceptability of sentence types. Nevertheless, as the
data reviewed in Chapter 1 indicate, this error variance is easily controlled by
standard tools of experiment design and statistical analysis, so long as these
tools are properly applied.⁴
It is not clear how the judgment performance of experts compares with that of
typical naive informants. It may be that particular
[Figure: mean judged acceptability (z-score units) in Session 1, plotted for the
No "that" and With "that" conditions.]
Notes
1. The difference between the informant's average rating for the With "that"/
Subject Extraction cases and the average of the other three categories was taken
as the measure of the "that"-trace effect for each informant.
2. From this point on in the text, when needed, statistical results will be
reported in endnotes. Readers unfamiliar with the tests reported should consult
Appendix A.
There was a reliable interaction between the Presence or Absence of "that" and
the Extraction Site factors, F1(1, 331) = 361, p < .001. Both the main effects
were also significant, F1(1, 331) = 690, p < .001, for "that," and
F1(1, 331) = 281, p < .001, for Extraction Site. Various formal and informal
methods suggest that the effects were robust enough to be tested with samples as
small as eight informants without undue risk (= .25) of a Type II error. We'll
return to sample size issues in Chapter 9.
3. The "that" versus Extraction Site interaction was reliable,
F1(1, 20) = 11.4, p < .01, as were the main effects of "that,"
F1(1, 20) = 11.1, p < .01, and Extraction Site, F1(1, 20) = 11.0, p < .01.
4. Readers unfamiliar with the theory of measurement and reliability may find
it paradoxical that a measurement procedure, such as giving acceptability judgments
for sentences, can give reliable evidence about the relative acceptability of sentences
or sentence types for a population while providing very unstable information about
the individual informants in the sample on which those general findings are based.
There is no paradox. What is happening is simply that the analysis of the experiment
as a whole is aggregating weak information gleaned from the individual informants
to provide a strong indication of the state of affairs in the target population. If there
were a way to subject each informant to a measurement effort as intensive as that
applied to the full sample (e.g., collecting 50 responses on every target sentence), it is
likely that we could acquire information about each informant roughly as reliable as
that we are able to obtain for a population from a sample.
A corollary is that having achieved reliable measurement of a population via a
sample does not give evidence that we have achieved reliable measurement of
individuals that make up the sample. This raises some concerns because when
linguists collect experimental data, they are often tempted to examine and comment
on the results for individual informants. Such efforts ought to be reserved for cases
where the reliability of the measurement procedure (vis-à-vis individual informants)
has been established. As of now, linguists have available no experimental procedure
for gathering judgment data that meets this criterion.
3 Designing Experiments on Acceptability
                                      Total      Within Groups   Between Groups
                                      Variance   Variance        Variance
Sum of Difference Scores              0.00       0.00            0.00
Sum of Squares                        126.95     90.50           36.45
  (Sum of Squared Difference Scores)
Variance                              6.35       4.53            1.82
  (Mean of Squared Difference Scores)
NOTE: Each row in the main body of the table represents a single informant. The
informants are divided into two groups, each of which includes ten individuals
who supplied one judgment response each. The first column of difference scores
compares the individual's response to the grand mean (5.3). The second column of
difference scores compares each informant's judgment response to the mean for
that informant's group (3.90 for the first group and 6.70 for the second).
Finally, the rightmost column of difference scores shows the difference between
the mean for the informant's group and the grand mean. Thus, for the first ten
informants this difference score is always 3.90 - 5.3 = -1.40, and for the
second it is 6.70 - 5.3 = 1.40. Toward the bottom of the table, each value in
the Sum of Squares row shows the sum of the squared difference values in the
same column. The values in the variance row are obtained by dividing the values
in the Sum of Squares row by the number of individuals (20). Crucially, the sum
of the Within Groups and Between Groups Variances equals the Total Variance.
This example demonstrates the general logic of variance partitioning, but not
its actual application in statistical tests. In statistical tests, the goal is
to estimate relevant parameters for a population by way of data taken from a
sample drawn from that population. This complicates the logic and the math, but
the underlying concept remains the same.
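The partitioning described in the note can be checked numerically. The sketch
below uses two small invented groups rather than the twenty responses of the
table, and, like the table, divides each sum of squares by the number of
individuals rather than by degrees of freedom; the function name is
hypothetical.

```python
import statistics


def partition_variance(groups):
    """Split total variance into within-groups and between-groups parts.

    Each sum of squared difference scores is divided by the total number of
    individuals, following the descriptive logic of the table (not the
    degrees-of-freedom correction used in inferential tests).
    """
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    grand_mean = statistics.fmean(all_scores)
    # Each response compared with the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Each response compared with its own group's mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    # Each individual's group mean compared with the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return {"total": ss_total / n, "within": ss_within / n, "between": ss_between / n}


group_a = [2, 3, 4, 5, 6]  # invented responses, group mean 4
group_b = [5, 6, 7, 8, 9]  # invented responses, group mean 7
parts = partition_variance([group_a, group_b])
print(parts)
```

Whatever values are supplied, the within-groups and between-groups figures
returned here should sum to the total, which is the point the table makes.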
might choose to select informants from only one of the two communities, or run
two parallel experiments using informants from just one of the communities in
each of the two procedures. These latter alternatives have the advantage of
reducing the total error variance in the experiment, which improves the
experimenter's opportunity to detect real differences associated with the
intended manipulations.
One of the key challenges in designing meaningful and interesting experiments is
to ensure that any systematic variance in the experiment is either appropriately
identified and directly manipulated or effectively neutralized and submerged
within the error variance in the experiment. As we'll see, the main focus of
this effort in the design of experiments on acceptability is directed toward the
design of the materials that will represent the various experimental conditions.
In an experiment on judgments, by far the most likely source of extraneous
systematic variance lies in the opportunity to inadvertently introduce
differences between the materials in various conditions.
Apart from the sentence materials themselves, the main likely sources of
extraneous systematic variance in experiments on sentence judgments are the
ordering of those materials and differences among informants. Each of these
factors will be considered below.
                      Presence/Absence of "that"
Extraction Site       Without "that"                  With "that"
Subject Extraction    Who do you think likes John?    Who do you think that likes John?
Object Extraction     Who do you think John likes?    Who do you think that John likes?
NOTE: Each dimension of the table represents a "factor" of the design. Thus
there is a "that" factor (Presence/Absence of "that") and an Extraction Site
factor. Each factor in this design is said to have two "levels" (e.g., Subject
Extraction and Object Extraction are each levels of the Extraction Site factor).
This is said to be a 2 x 2 design because it has two levels on each of two
factors.
                          Antecedent Location
Coordination Type         Local Antecedent                  Remote Antecedent
No Coordination           Cathy's parents require that      Paul requires that Cathy's
                          Paul support himself.             parents support himself.
Simple Coordination       Cathy's parents require that      Paul requires that Cathy's
                          Paul support himself and the      parents support himself and
                          child.                            the child.
Coordination with "both"  Cathy's parents require that      Paul requires that Cathy's
                          Paul support both himself and     parents support both himself
                          the child.                        and the child.
NOTE: There are two factors, Antecedent Location and Coordination Type, with two
levels of the first of these and three levels of the second. Thus this is
described as a 2 x 3 design.
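Because a factorial design crosses every level of one factor with every level of
the others, the list of conditions can be generated mechanically. A minimal
sketch, assuming each design is represented as a mapping from factor names to
levels (the representation and the function name are illustrative only):

```python
from itertools import product


def design_cells(factors):
    """Return every condition of a fully crossed factorial design.

    `factors` maps each factor name to its list of levels; each returned
    condition is a dict assigning one level to every factor.
    """
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


that_trace_design = {
    "that": ["Without that", "With that"],
    "Extraction Site": ["Subject Extraction", "Object Extraction"],
}
reflexive_design = {
    "Antecedent Location": ["Local Antecedent", "Remote Antecedent"],
    "Coordination Type": [
        "No Coordination",
        "Simple Coordination",
        "Coordination with both",
    ],
}

for cell in design_cells(reflexive_design):
    print(cell)
```

The number of conditions is the product of the numbers of levels, so the two
designs tabulated above yield four and six conditions respectively.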
Finding Verbs
3.4.1 Overview
All of the designs discussed so far have been what are called "within subjects"
designs. That is, they are intended to detect differences across sentence types
by assessing each informant's average response in each of several sentence type
categories. By augmenting a within subjects design with a between groups factor
that distinguishes in some fashion among different types of informants, designs
of this kind can be extended to address many sorts of differences among groups.
Note
The Sentence Judgment Task
Table 4 Alternative instructions for a judgment experiment.
Intuitive Instructions
Please read each of the sentences listed below. For each sentence, we would like
you to indicate your reaction to the sentence. Mark your response sheet A, B, C, or
D. Use (A) for sentences that seem fully normal, and understandable to you. Use
(D) for sentences that seem very odd, awkward, or difficult for you to understand.
(Note: DO NOT USE "E.") If your feelings about the sentence are somewhere
between these extremes, use one of the middle responses, B or C. THERE ARE NO
"RIGHT" OR "WRONG" ANSWERS. Please base your responses solely on your
gut reaction, not on rules you may have learned about what is "proper" or
"correct" English.
Prescriptive Instructions
Please read each of the sentences listed below. For each sentence, we would like
you to indicate whether or not you think the sentence is a well-formed,
grammatical sentence of English. Suppose this sentence were included in a term
paper submitted for a 400-level English course that is taken only by English majors;
would you expect the professor to accept this sentence? Mark your response sheet
A, B, C, or D. Use (A) for sentences that seem completely grammatical and
well-formed. Use (D) for sentences that you are sure would not be regarded as
grammatical English by any appropriately trained person. (Note: DO NOT USE
"E.") If your judgment about the sentence is somewhere between these extremes,
use one of the middle responses, B or C. Use B for sentences you think probably
would be accepted but you are not completely sure. Use C for sentences you think
probably would not be accepted.
Notes
consciously aware of. Asking informants to make distinctions among syntactic,
semantic, pragmatic, and other sorts of influences is quite another matter.
2. The interaction among the instructions variable, antecedent location, and
coordination type did yield a small reliable difference, Fi(2, 82) = 3.71·, F2(2, 46) =
4.18*. Note, however, that there appears to be no difference in pattern of any theoreti
cal interest for the results obtained under the two sets of instructions.
3. Gross differences in overall acceptability will be relatively easy to obtain,
but these are of no particular theoretical interest.
5
Presenting
Sentence
Materials to
Informants
Table 5 Comparisons of alternative ways of presenting sentence materials
to informants.
Response Methods
Overview
Notes
Sampling
7.1 Representativeness
There are many familiar domains within the social and behavioral
sciences where exacting control of the sampling process is
critical to the representativeness of the sample. Polling on political
and social issues is perhaps the most familiar example. Here very
carefully drawn samples of 1,000 or more individuals are common. In
general, the sampling process is critical whenever "typical" individuals
are (or may be) unrepresentative of the population as a whole, or where
the investigator intends to estimate the proportion of infrequent types
of individuals (e.g., generally healthy 35-year-old males who show signs
of subclinical aphasia). Statistical theory also leads to a concern with
sampling because most statistical tests are based on a model that
assumes random sampling from the relevant population.
This is not to say, however, that sampling is always a critical
issue. Many samples used in biological, medical, and psychological
research are controlled more by issues of convenience and economy
than by sampling theory, and this does not generally reflect undue
laxity on the part of the researchers who use these samples (although
it can). Many biological phenomena are sufficiently consistent across
individuals within a population to make drawing importantly
unrepresentative samples far harder than drawing representative ones.
For example, the relative length of the bones of the human upper arm
and upper leg is quite consistent across individuals (though, of
course, not invariant). Almost any small, casually drawn sample
from individuals found in a school, hospital, jail, or shopping mall
will provide an estimate of this ratio that will clearly distinguish
humans from chimps or other primates. Countless other anatomical,
physiological, and even behavioral traits show similar consistency
across individuals in numerous species.
Notable consistency is also apparent in a variety of human
behavioral traits, especially in language. Small samples are accepted
for a wide variety of articulatory, acoustical, and perceptual studies
of speech because theoretically relevant patterns can be readily and
reliably detected in these small samples.
Thus the task of drawing a representative sample is not necessarily
difficult. The degree of care appropriate to a particular experiment
depends upon what is already known about the variability of
the phenomenon in question (relative to the theoretically important
contrasts to be tested) and what particular purposes the investigator
has in mind. A medical researcher looking for reliable diagnostic
signs of some disorder may need much more precise information
about variation in contemporary human facial or body proportions
than does a paleoanthropologist who is considering how to classify
a particular fossil.
The discussion of sampling thus far has assumed that the goal
of research is to characterize general properties of groups of informants.
There are also linguistic issues that call for drawing samples
from two or more groups with a view to comparing the groups. If
the differences among the groups to be compared are relatively stark,
the differences may be detected with samples of sizes similar to those
discussed above. However, some differences between groups may
be stable but quite small. Where small differences are involved,
group comparisons may require samples many times larger than
those used in the studies discussed above. The size of differences can
be estimated through pilot testing and this information can be used
in estimating needed sample size.
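As a rough illustration of how a pilot estimate can feed into the choice of sample size, the sketch below uses the standard normal-approximation formula for comparing the means of two independent groups. The function name, the default values for alpha and power, and the effect sizes in the usage note are assumptions made for the example; they are not figures from the studies discussed here.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of informants needed in each of two groups.

    effect_size is the pilot estimate of the difference between the
    group means divided by the standard deviation. Uses the normal
    approximation n = 2 * ((z_alpha/2 + z_power) / effect_size) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided criterion
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

Under these assumptions a stark difference (effect size 0.8) calls for about 25 informants per group, whereas a small one (effect size 0.2) calls for nearly 400, which is the sense in which small differences may require samples many times larger.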
Notes
Settings for
Experiments
The Organization
and Construction
of Questionnaires
We would like you to imagine that your job is to teach English to speakers of other
languages.
For each sentence listed below, we would like you to do the following. Please read the
sentence, then ask yourself if the sentence seems English-sounding or not. Suppose
one of your students were to use this sentence. If we ignore pronunciation, would the
student sound like a native speaker? Or would the sentence seem strange or unnatural
to a native speaker no matter how it was pronounced? Your task is to tell us how
English-sounding each sentence is, using a scale.
have been used and tested previously, it will be prudent both to test
the instructions directly on a few informants and to include some
check on the effectiveness of the instructions within the experiment
itself. One good way to help the informant master the task and also
check on the effectiveness of the instructions, informant by informant,
is to include practice cases such as those included on the fourth
page of the sample questionnaire ("Practice" section) in Appendix E.
• Each informant will judge no more than one sentence from each token set.
• Each informant will judge all experimental conditions and will see equal numbers
of sentences from each condition.
• Every sentence in every token set will be judged by some informant.
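The three constraints just listed can be met with a simple rotation (a Latin square). The sketch below is one way to do it, not necessarily the spreadsheet procedure used in this book; the function name is hypothetical, and the condition labels are borrowed from the "that"-trace examples purely for illustration.

```python
def counterbalance(n_token_sets, conditions):
    """Return one preliminary script per rotation of the conditions.

    Each script is a list of (token_set, condition) pairs. Token set i
    in script s receives condition (i + s) mod k, so every token set
    appears once per script and in a different condition in each script.
    """
    k = len(conditions)
    if n_token_sets % k != 0:
        raise ValueError("token sets must be a multiple of the number of conditions")
    return [
        [(ts + 1, conditions[(ts + s) % k]) for ts in range(n_token_sets)]
        for s in range(k)
    ]


# Eight token sets, four sentence types: four preliminary scripts.
scripts = counterbalance(8, ["NT/SE", "NT/OE", "WT/SE", "WT/OE"])
```

Each of the four scripts contains every token set exactly once and two sentences of each type, and across the four scripts every version of every token set is used.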
9.5.1 Counterbalancing
9.5.2 Blocking
9.5.3 Randomizing
No "for," Subject Extraction case     Who did the counselor want to hug Chantal?        Who would Nona like to pay Sarah?
No "for," Object Extraction case      Who did the counselor want Chantal to hug?        Who would Nona like Sarah to pay?
With "for," Subject Extraction case   Who did the counselor want for to hug Chantal?    Who would Nona like for to pay Sarah?
With "for," Object Extraction case    Who did the counselor want for Chantal to hug?    Who would Nona like for Sarah to pay?
NOTE: Here the components of two token sets are enumerated in the upper panel of the table. In
the lower panel, spreadsheet functions have been used to assemble the components into finished
sentences of the intended types.
NOTE: Each preliminary script constitutes a counterbalanced set of experimental sentences that would be appropriate for use in one questionnaire. Every version of
every token set is used in some preliminary script, and within each preliminary script, every sentence type is equally represented. All the sentences on each row are
members of the same token set. The labels above each block of three rows indicate the type of the sentences in that block for that column, such as NT/SE (No "that"/Subject
Extraction) or WT/OE (With "that"/Object Extraction).
Table 13 A process for integrating and ordering experimental materials
and fillers by blocks.
NOTE: "Blk ID" is a block identifier that initially assigns each item to a block. "Item ID" identifies
each item by type and version; "TS1A" is Token Set 1 in version A and "F7" is Filler Sentence 7.
See text for further discussion. Note that in the third block constructed in Step 4, there are three
experimental sentences appearing consecutively. Experimenters should control the randomization
process to prevent such occurrences.
block. I'll call the lists that emerge from Step 4 just "scripts." Note
that by repeating Steps 2-4 for the same preliminary script, we can
generate several differently ordered versions of the same script. It is
prudent to use at least two different orderings of each script; using
only a single ordering of a script can lead to unwanted ordering
effects that may obscure the main experimental findings. This process
must be applied to each of the preliminary scripts.
It is important to preserve the files generated in the course of
constructing scripts because the Item IDs in these files will later be
used to decode the data files emerging from the scanning or data
entry process.
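The blocking and randomizing steps can also be sketched in a few lines of code. What follows is an illustration under stated assumptions rather than the spreadsheet procedure of Table 13: item IDs beginning with "F" are taken to be fillers (a convention modeled on "F7"), the function names are hypothetical, and the ordering is simply reshuffled until no more than two experimental sentences appear consecutively.

```python
import random


def longest_experimental_run(script):
    """Length of the longest run of consecutive non-filler items."""
    longest = current = 0
    for item in script:
        current = 0 if item.startswith("F") else current + 1
        longest = max(longest, current)
    return longest


def order_script(blocks, max_run=2, seed=None, max_tries=1000):
    """Shuffle items within each block, then concatenate the blocks.

    blocks is a list of lists of item IDs. The shuffle is repeated
    until no more than max_run experimental items are adjacent,
    including across block boundaries.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        script = []
        for block in blocks:
            shuffled = list(block)
            rng.shuffle(shuffled)
            script.extend(shuffled)
        if longest_experimental_run(script) <= max_run:
            return script
    raise RuntimeError("no acceptable ordering found; add fillers or raise max_run")


# Hypothetical blocks: two experimental items and three fillers each.
blocks = [["TS1A", "TS2B", "F1", "F2", "F3"],
          ["TS3C", "TS4D", "F4", "F5", "F6"]]
script = order_script(blocks, seed=1)
```

Calling the function again with a different seed yields a differently ordered version of the same script, and saving each returned list preserves the Item IDs needed for decoding later.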
Coding and
Decoding the Data
10.1 Scanning
Figure 15. An expanded view of six response items from the form shown in
Figure 14.
SOURCE: Form © National Computer Systems, Inc. Reprinted courtesy of NCS.
Ideally, all the data from each informant will appear on one line
(within one "record") in the resulting data file, and corresponding
data items (e.g., the informant's response to the 25th sentence) will
fall in the same place (left to right) on each line, but this often is not
the case. Scanning any more than a handful of forms seems almost
always to result in at least a few anomalies (see Figure 16). For a
variety of reasons, scanning systems often report blank data fields
for unused fields and use long series of blanks as spacers in data files.
This can make the process of interpreting a file of scanned data more
challenging than it should be. Had the results represented by Figure
16 been ideal, the first 12 data lines (Lines 2-13) would all have been
patterned exactly like Line 2. Note that over Lines 4 through 11, there
are instances of missing data and spurious or surplus data (where
informants appear to have filled in items they were asked to leave
blank). Often scanned data look even less orderly on first appearance
because they are organized into very long lines that break across two
or three display lines in a word processor or editor.
Unfortunately, the only corrective for these sorts of anomalies is
usually to edit the data file by hand. Many anomalies are obvious (e.g.,
excess data) but some will require reference to the original forms. It is
generally easier to make the corrections with a word processor or text
editor in the data file before it is processed further. It is also helpful in
this process to use a monospace font (e.g., Courier).
In general, the goal at this stage is to have every line of the file
conform to the same format, to ensure that the nth character on every
line represents the same thing as the nth character on every other line
(e.g., the third digit of the informant's ID number or the informant's
36th judgment response), to fill in any missing data that can be
recovered via inspection of the original forms, and, above all, not to
introduce any new errors via this editing process. The data file that
results from this editing process should look something like the one
shown in Figure 17. Once the file is in this form, it is ready to be
moved into a spreadsheet for further analysis.¹
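Although the corrections themselves must be made by hand, finding the lines that need attention can be automated. The sketch below assumes a layout modeled on Figure 17 (a one-letter code, a seven-digit identifier field, and 37 single-digit responses, separated by single spaces); the pattern, the function name, and the sample lines are illustrative assumptions and would need to be adapted to the actual form.

```python
import re

# Hypothetical layout modeled on Figure 17: one capital letter, a space,
# seven digits, a space, then exactly 37 response digits.
RECORD = re.compile(r"^[A-Z] \d{7} \d{37}$")


def find_anomalies(lines):
    """Return (line_number, text) for every data line that breaks the layout.

    Blank lines and header lines such as "INSTRUCTOR A" are skipped.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip() or text.startswith("INSTRUCTOR"):
            continue
        if not RECORD.match(text):
            problems.append((number, text))
    return problems


sample = [
    "INSTRUCTOR A",
    "B 1345411 0000777777774743355532254622660777777",
    "F 0887455 19062079659392938090989129088328729",  # two responses short
    "E 9999999 0944415800000260600041721113438562488",
]
flagged = find_anomalies(sample)
```

Here only the third line is flagged. A check of this kind shows where the file departs from the intended format; deciding what the correct data are still requires the original forms.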
INSTRUCTOR A
Β 1345411 0000777777774743355532254622660777777
F 0887455 1906207965939293809098912908832872993
Β 3058775 OOlUlOOOlOlUQOOOnilOniQOlOOllOlOl
Ε 9999999 0944415800000260600041721113438562488
6 0796476 0^6103900009996019804909945781357991
& 1005967 0516007906513973084725764499796608007
Ε 1281584 10011001101 oouinioooiooiioioouiioo
Η 0908862 0792952632472627392830573824676163715
Γ 0542162 1725523674535598849956449966866496886
INSTRUCTOR B
Τ 0988587 052304492160327961971171870274347S8S2
Ε 0564000 1903338935845663935493997893999793499
F 0683862 0983108801604097409940946707610690790
& 0776478 0296103900009996009804909945781357991
Ρ 0973127 1011111011111100110110110010101011001
& 0602687 0939447938949999449997969999996799997
Β 1227470 0714008901902630198999998380244792095
Η 0742670 053722775654675848777764747S777577777
F 1107405 0719803934807199709931988908992697591
Figure 17. A cleaned data set.
Figure 18. Parsed data in Excel spreadsheet.
NOTE: The data collected by Instructor A are now distinguished by a character code in the first
column of the table. The TYPE column indicates which of eight different questionnaires each
informant used and the ID column assigns a unique ID number to each informant. The fourth
column, NS, is demographic data; 0 indicates that the informant is a native speaker of English.
The next 12 items on each row represent the first 12 sentences the informant judged. These
sentences were identical in content and order across all versions of this set of questionnaires. The
columns containing data for sentences R2 through R11 have been hidden. The remaining 24 items
on each line (S1-S24) represent both experimental and filler sentences in the linear order in which
they appeared for each informant.
so that all those who used the same ordered script appear together
in the table. Once the informants are grouped according to which
ordered script they used, codes can be added in adjacent rows to
show the type and identity of each item the informants in that block
responded to (see Figure 19).
The codes entered above the data for each group of informants
can now be used to control sort operations along the rows. By sorting
according to sentence number (or token set number), all of the rows
are aligned such that all of the responses for any given sentence (or
token set) appear in the same column of the table (see Figure 20). At
this stage, the data are ready for preliminary statistical analysis. At
this point, the data table should be preserved and further analysis
done on a copy.
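The same decoding step can be expressed outside a spreadsheet. The sketch below pairs each informant's responses with the Item IDs of the ordered script that informant used, then lays the rows out under a common set of columns. The function names, scripts, and ratings are hypothetical and serve only to illustrate the idea.

```python
def align_responses(script_order, responses):
    """Map one informant's responses from presentation order to Item IDs.

    script_order lists the Item ID shown at each position of the ordered
    script; responses holds the informant's ratings in the same order.
    """
    if len(script_order) != len(responses):
        raise ValueError("script and response record differ in length")
    return dict(zip(script_order, responses))


def to_table(rows, columns):
    """Build aligned rows; items an informant did not see become None."""
    return [[row.get(col) for col in columns] for row in rows]


# Hypothetical ordered scripts and ratings, for illustration only.
script_1 = ["TS2B", "F1", "TS1A", "F2"]
script_2 = ["F2", "TS1B", "F1", "TS2A"]
rows = [
    align_responses(script_1, [3, 9, 7, 1]),
    align_responses(script_2, [2, 6, 8, 4]),
]
columns = ["F1", "F2", "TS1A", "TS1B", "TS2A", "TS2B"]
table = to_table(rows, columns)
```

After this step every column holds responses to a single item, whichever script the informant used. Keying the dictionary by token set number instead of by full Item ID would align all versions of a token set in one column.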
Note
1. Where resources and personnel are available, scanning services can construct
simple software routines that will provide output formatted more like the file
in Figure 17. However, the scanning process (together with errors informants make)
will almost always introduce errors and anomalies that must be corrected by hand
before the data are processed further.
11
Summarizing
the Data
mately constant whether the overall quality of the list of fillers within
which the target sentences are embedded is high or low (see Cowart,
1994). One handicap associated with using percentile summaries is
that many linguists are tempted to ascribe more significance to
percentile results than is warranted. Many will expect sentences they
regard as acceptable to get a strong majority of the highest possible
responses, regardless of the content of the filler list.
Ratio scale procedures are not compatible with either simple
averages of raw scores or the percentile method described above.
Summary Table
                 Version
Token Set     a      b      c      d
S1          100%    63%    25%    86%
S2           38%    50%    25%    29%
S3           75%    50%    50%    71%
S4           63%    38%    63%    86%
S5           50%    25%    75%    71%
S6           50%    13%    38%    57%
S7           13%    50%    63%    57%
S8           25%    50%    63%    57%
S9           13%    38%    88%    86%
S10          63%    63%    75%    29%
S11          75%    63%    88%    43%
S12          75%    63%    88%    57%
Figure 23. By-materials data summarized in percentile terms and reformatted
for statistical analysis.
NOTE: The values in the table are the same as those shown in Figure 22.
informants for a given sentence or token set cannot be done via the
methods described above. The standard method in this case is a
geometric mean.⁴
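For readers who prefer code to spreadsheet formulas, the geometric mean can be computed as the exponential of the mean of the logarithms. The sketch below uses that standard definition; the function name and the four judgments are hypothetical, and the spreadsheet procedure itself is the one described in Appendix C.

```python
import math


def geometric_mean(values):
    """Geometric mean: the exponential of the mean of the logs.

    All values must be positive, as ratio-scale estimates are.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical ratio-scale judgments of one sentence by four informants.
judgments = [10, 20, 40, 80]
summary = geometric_mean(judgments)
```

For these four judgments the geometric mean is about 28.3, noticeably lower than the arithmetic mean of 37.5, because it gives equal weight to equal ratios and not to equal differences.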
Notes
1. The terminology for the two kinds of analyses mentioned here varies somewhat
in the literature. Analyses by informants are often called "by-subjects" analyses.
By-materials analyses are often called "by-sentences" analyses or "by-items" analyses.
2. Spreadsheet columns, rows, cells, and ranges of cells are referred to by way
of the marginal labels seen in Figure 21 and elsewhere. Thus "AB:AD" is a reference
to the range that includes all of columns AB, AC, and AD. The uppermost left cell in
every sheet, for example, is referred to as "cell A1." A range of cells is referenced by
way of the upper left and lower right corners of the range. Thus "S6:V9" is a reference
to the summaries of experimental data for the first four informants in Figure 21.
3. Each of the illustrations in this chapter will show only that portion of the
relevant spreadsheet necessary to reveal the structure of that particular summary
under discussion at that point. The complete spreadsheets are collected in an Excel
workbook that is available at the Web site for this book, which can be found at
HTTP://WWW.USM.MAINE.EDU/-LIN
4. See Section C.4 of Appendix C for instructions on calculating geometric
means.
12
Statistical
Issues
data and by-items data) such that an investigator can report a single
t-test or ANOVA result covering both analyses. This practice, however,
has met with far less than universal acceptance. When investigators
do both kinds of tests, they quite commonly report the informants
and sentences tests separately. Tests done on summaries by
informant are usually presented with a subscript "1" appended (e.g.,
F1) and summaries by token set have a subscript "2."
The rationale for doing both kinds of test is that, just as the
informants actually tested in an experiment are (usually) seen as
representatives of the entire population from which they are selected,
the token sets are likewise seen as representatives of all the relevantly
similar token sets one might construct in the same language (or
perhaps any language). Just as statistical tests on data for informants
test the reliability of patterns seen in those results, so tests on data
for token sets test the reliability of patterns seen in the summaries of
the token set data. Most often the two kinds of tests produce virtually
identical results, but especially where weak or marginal effects are
involved, it is worthwhile to ask whether an effect is reliable across
both informants and token sets. Many experiments incorporate one
or more factors that cannot be tested in both analyses.
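The two kinds of test start from two different summaries of the same responses. As a minimal sketch, assuming the data are held as one record per judgment, the function below averages ratings within each informant-by-condition cell (the input to tests reported with subscript "1") or within each token-set-by-condition cell (subscript "2"). The record layout, names, and ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(records, unit):
    """Average ratings per (unit, condition) cell.

    records are (informant, token_set, condition, rating) tuples.
    unit=0 summarizes by informant; unit=1 summarizes by token set.
    """
    cells = defaultdict(list)
    for record in records:
        cells[(record[unit], record[2])].append(record[3])
    return {key: mean(ratings) for key, ratings in cells.items()}


# Hypothetical data: two informants, two token sets, two conditions.
records = [
    ("inf1", "TS1", "SE", 2), ("inf1", "TS2", "OE", 8),
    ("inf2", "TS1", "OE", 6), ("inf2", "TS2", "SE", 4),
]
by_informant = summarize(records, unit=0)
by_token_set = summarize(records, unit=1)
```

The statistical test itself is then run once on each summary, treating informants as the sampled units in the first case and token sets as the sampled units in the second.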
The analogy between informants and token sets is hardly perfect,
and the analogy between the notion of a population of people
and a population of token sets is shakier still. Thus it is not obvious
that statistical tests that are meant to test generalizations across
populations are being used equally appropriately in these two cases.
Chomsky's (1986) suggestion that languages exist only as patterns of
behavior projected from the internal states of speakers can only raise
further doubts; in Chomsky's view, it appears that there is no population
from which to sample. Nevertheless, Clark's recommendation
that both kinds of test be done is often sound. There is no doubt that
experimental results on language are sometimes misinterpreted as
applying generally to all similar linguistic materials when all that has
been shown is that the result is reliable only when exactly the same
materials are used.
that need to be kept in view. Fortunately, there are also some additional
measures that can be derived from t-tests and analysis of
variance that help to ameliorate some of these limitations.
There is a sense in which significant results from t-tests or
ANOVAs can be highly reliable, but uninformative. It is entirely
possible, and often happens, that an ANOVA will detect a reliable
effect due to some manipulation in an experiment where that manipulation
has only a very tiny impact on the performance of subjects.
Manipulations that control only a very small share of the
variance in an experiment, even where they are reliable, are relatively
uninformative. Most of the variance is being affected by something,
but it isn't the factor we are manipulating.
It is also worth bearing in mind that the standard statistical
sense of "significant" has no necessary connection to questions about
theoretical importance. Differences may be simultaneously significant
(from a statistical point of view) and boring (from a theoretical
point of view). Whether a statistically significant difference matters
in some larger sense can only be determined by examining its relevance
to alternative theories that bear on the situation in which the
difference arises.
Especially with a particularly stable phenomenon such as sentence
judgments seem to be, there is a somewhat arbitrary lower limit
on the size of the effects that can be found significant.³ By increasing
sample size, progressively more and more delicate reliable differences
can be detected. For example, in our work on "that"-trace, it
is possible to detect a tiny but reliable difference in acceptability
between subject extraction and object extraction when no "that" is
present, provided that (by combining results from two large experiments)
we attain a sample size of more than 1,100. Although it seems
obvious that this finding is of no theoretical interest, it raises the
question of how experimenters are to estimate which statistically
significant effects are also significant in a more general theoretical
sense.
A statistic that responds to this need is η² (see Appendix A for
information on calculating η²). This value indicates the share of all
the variation in a data set that is accounted for by a particular factor
in an experimental design. Thus, in an analysis of a "that"-trace
experiment, we can calculate the percentage of all the variation in the
result that is due to the "that" effect, to the interaction, and so on.
Data from analyses of two of the "that"-trace experiments mentioned
earlier appear in Table 14. Here the second experiment, with roughly
ten times more informants, is obviously much more sensitive. The F
values are about ten times larger than those of the first experiment.
This suggests that if there were any smaller, weaker effects in the
same design, the second experiment would be more likely to detect
significant differences for those effects. Note, however, that the values
of η² are roughly similar for the three factors despite the 1:10
difference in F values.
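The contrast between F and η² can be seen in a toy computation. The sketch below is a deliberately simplified one-way, between-groups case, not the within-informants designs analyzed in Table 14, and it uses the textbook definition of η² as the effect sum of squares divided by the total sum of squares. The function name and the ratings are invented, and the "large" data set is just the small one repeated ten times.

```python
def eta_squared_and_f(groups):
    """One-way, between-groups eta squared and F for a list of samples.

    eta squared = SS_between / SS_total; F = MS_between / MS_within.
    """
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between / ss_total, f


# Hypothetical ratings in two conditions.
small = [[1, 2, 3, 4], [3, 4, 5, 6]]
large = [g * 10 for g in small]  # the same ratings, ten times over
eta_small, f_small = eta_squared_and_f(small)
eta_large, f_large = eta_squared_and_f(large)
```

In this example η² is the same for both data sets (about .44), while F rises from 4.8 to more than ten times that value, which is the pattern described above: F reflects sample size, whereas η² reflects the share of variation the factor accounts for.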
In fact, there were some weak effects associated with a "groups"
factor in this experiment. This was a between-informants factor in
which individuals were categorized according to which of four versions
of the questionnaire they responded to. Although there were
no significant effects associated with this factor in the first experiment,
there were two reliable interactions with this factor in the
second experiment. However, η² showed that each of these interactions
accounted for only about one half of 1% of the variation in the
experiment. Thus, although both were quite reliable, neither is of any
theoretical interest because they account for such a very small share
of the overall variation, compared with the effects listed in Table 14.
Another application of η² to the same two experiments appears
in Figure 24. Here η² has been applied to comparisons between four
particular pairs of means within the larger design. Each of these
individual tests is logically equivalent to a t-test covering only two
means. In these cases, the η² values indicate how much of the variation
in the informant means underlying the overall means is accounted
for by the contrast between the conditions, as opposed to error
variance within the conditions. Note, as before, that despite the fact
that Experiment 2 employed far more informants, the η² values are
generally similar, although the effect of having or not having "that"
in the object extraction cases was larger in the first experiment.
Conditions: Subject Extraction / "That" Absent; Object Extraction / "That" Absent; Subject Extraction / "That" Present; Object Extraction / "That" Present. Percentage values shown include <1%, <1%, 58%, and 62%.
Figure 24. Comparison of variation accounted for in four pairwise comparisons
in two "that"-trace experiments.
NOTE: The four innermost percentage figures show values for four pairwise comparisons from
Experiment 1 in Table 14. The italicized outermost values are the corresponding values for
Experiment 2.
Notes
study. Yet an experiment that compared some aspect of performance with nouns and
verbs necessarily would employ only some rather small sampling of actual nouns and
verbs in the relevant language. Clark pointed out that results obtained in this way
could not support generalizations stated in terms of linguistic categories unless
appropriate statistical tests were run on the linguistic materials themselves, as well as
on the human subjects of the experiment. In the absence of such tests, scrupulous
investigators would confine their claims of generality to people (e.g., "Subjects identify
< list of specific words > more rapidly than < list of specific words >") and avoid
statistically untested linguistic generalizations (e.g., "Subjects identify high frequency
nouns more rapidly than low frequency nouns"). This of course would be unsatisfactory
for most studies bearing on linguistic materials, so Clark went on to suggest specific
statistical methods that could be used in cases where an investigator wishes to
generalize simultaneously to a human population and to a "population" of linguistic
materials. Clark's point has been widely accepted, although the particular statistical
solutions he advocated have enjoyed less success.
3. In statistics, effect size is not to be confused with the size of the numerical
difference between two means. For a t-test, for example, effect size is a ratio between
the difference between the means and the variability around those means. Large
numerical differences can be associated with high variability and thus small effect size
and numerically small differences between two means can be associated with little
variation and correspondingly large effect size.
Appendix A:
A Reader's Guide
to Statistics
sample from that population that was actually tested in each experiment.
Thus our goal is typically to determine whether responses to
Sentence Type A indicate greater or lesser acceptability in the relevant
population than do responses to Sentence Type B. Given that experiments
will almost always yield at least some small numerical difference between
any two conditions, the basic problem of inferential statistics is
to provide investigators with some systematic guidance as to when it is
or is not appropriate to draw the inference that there is also a consistent
difference between the measured conditions in the target population.
There are two kinds of error an investigator may make in
decisions of this kind. Type I errors are those where the investigator
credits too small a difference. That is, the investigator detects a
difference between two conditions in an experiment and draws the
inference that there is a reliable difference between the two
conditions in the target population when in fact there is no such difference.
Type II errors are those where the investigator fails to credit a
difference detected in an experiment and falsely concludes that there
is no difference in the target population. Unfortunately, the
likelihoods of these two kinds of error are often linked. Insurance
against making one kind can sometimes only be purchased by
increasing the chance of making the other. Increasing sample size can
reduce the probability of both kinds of error.
The test statistics used in this book address this dilemma by
trying to estimate the likelihood of the actually obtained data, given
some baseline hypothesis about conditions in the tested population.
Thus, if an investigator is interested in the relative acceptability of
Sentence Type A and Sentence Type B, and if there is actually no
difference between these two in the target population, how likely is it that
the experiment could have produced the differences in average
measured acceptability that actually appeared in the experiment?
This estimate is derived from the variability of the obtained sample.
That is, the variability within and between individuals within the
actual sample provides an estimate of the variability within the
population as a whole. From this it is possible to calculate how likely
it is that differences in means of the obtained size could have arisen
if in fact there were no differences in these means in the population.
When, by procedures of this kind, we are able to show that the
obtained differences are very unlikely were there no real differences
in the population, we generally say that the obtained difference is
(statistically) significant, although perhaps it is more helpful to say
that we have evidence that the difference between the tested means
is reliable.
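The logic of asking how likely the obtained difference would be if there were no real difference can be sketched in a few lines. Note that the sketch below swaps in a different technique from the t-tests and F-tests used in this book: it is an exact sign-flip (randomization) test on paired data, chosen only because it needs no statistical tables. The function name and the ratings are hypothetical.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(type_a, type_b):
    """Two-sided exact sign-flip test on paired mean ratings.

    If there is no real difference, each informant's A-minus-B difference
    is as likely to be negative as positive. The function enumerates every
    pattern of signs and counts how often the mean difference is at least
    as large in magnitude as the one actually obtained.
    """
    diffs = [a - b for a, b in zip(type_a, type_b)]
    observed = abs(mean(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical mean ratings from six informants (invented for illustration).
type_a = [6, 7, 8, 7, 9, 8]
type_b = [4, 5, 7, 5, 6, 6]
print(sign_flip_p_value(type_a, type_b))
```

With these invented data every informant rates Type A higher, so only 2 of the 64 sign patterns match the obtained difference. A small value of this kind is what licenses calling a difference reliable.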
A.2.1 t-Tests
Source SS DF MS F
This appendix reports on the statistical tests that were applied to the
various experiments discussed in Chapter 1. The appendix is
organized under subheads relating to the sections of Chapter 1. Readers
unfamiliar with the statistical terms used in this appendix will find
some notes on the purpose and interpretation of these measures in
Appendix A.
Clark (1973) argues that many experiments on linguistic materials
should be tested for their generality across both subjects or informants
and linguistic materials (see Chapter 12 for further discussion). Clark's
recommendation is generally followed here. Statistical tests done on
summaries by informant have a subscript "1" appended (e.g., F1) and
summaries by token set have a subscript "2" appended. Significance
levels are indicated by asterisks: "*" for p < .05, "**" for p < .01, and "***"
for p < .001. Where appropriate, the results reported below will be
expressed in terms of standard score (z-score) units.
B.1 Subjacency
B.2 "That"-Trace
items, F(1, 62) = 140***, r² = .70, and for the Low Acceptability items,
F(1, 30) = 76.3***, r² = .72.
Notes
1. The original data set for this experiment is no longer available; no analysis
by materials was done.
2. Other criteria can also be used, such as the percentage of responses that fall
in the two highest categories or the percentage in the lowest category.
Appendix C:
Excel as a
Syntactician's
Workbench
=D4&" "&D6
[Figure: Excel worksheet with "Formulas" and "TokenSet" columns; the formula =D4&" "&D6 joins sentence components such as "Who did the counselor want" and "to hug Chantal?"]
[Figure: Excel worksheet listing token sets S1-S12 with columns NTSE, NTOE, WTSE, and WTOE, shown before and during rotation.]
Figure 29. "Rotating" a list of token sets.
[Figure: Excel worksheet "Preliminary Scripts" with columns 1-4 and rows S1-S3, each row holding the NTSE, NTOE, WTSE, and WTOE versions of one token set.]
Table 16 Key concepts related to scripts and their structure and ordering.
are, there should be a repeating series of them down the column. The
number of repetitions of this series is equal to the total number of
experimental and filler sentences that will appear in each block. The
RAND( ) function can be used to generate the column of random
numbers. Once an appropriate template is constructed for a
particular experiment, the template should be saved so that copies of it can
be generated for each new script and ordering needed for the full
experiment. Excel provides a template file type in the File/Save As
dialogue. When a worksheet is saved in this format, each time the
template file is opened, Excel creates a new workbook with a unique
name.
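For readers working outside Excel, the template just described can be approximated as follows. This is a minimal sketch under stated assumptions: the function name and the dictionary keys are hypothetical, and Python's random module stands in for RAND( ). It builds the repeating series of Block IDs, leaves the Item ID and sentence cells empty, and attaches a random number to every row.

```python
import random


def make_template(n_blocks, items_per_block, seed=None):
    """Build an empty script template as a list of row dictionaries.

    Block IDs run 1..n_blocks and that series repeats items_per_block
    times, so each block ends up with items_per_block rows.
    """
    rng = random.Random(seed)  # seed only makes the sketch reproducible
    rows = []
    for _ in range(items_per_block):
        for block_id in range(1, n_blocks + 1):
            rows.append({
                "block_id": block_id,
                "item_id": None,         # filled in later from a preliminary script
                "sentence": None,        # filled in later
                "random": rng.random(),  # plays the role of RAND( )
            })
    return rows


template = make_template(n_blocks=3, items_per_block=8, seed=1)
print(len(template), [row["block_id"] for row in template[:6]])
```

Three blocks of eight sentences give the 24-row layout of the template shown in Figure 31.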
To construct one ordering of one script, the list of experimental
sentences from one preliminary script (see Figure 30) is copied into
the upper part of the sentence column in the template and the list of
fillers (usually constant for all scripts and orderings in one
experiment) is copied into the lower part of the column. At the same time,
the Item IDs should be copied into the Item ID column. Notice that
the Item IDs in the schematic example in Figure 32 indicate that the
sentence is an experimental item (the initial "S"), show the ID number
[Figure: Excel template with columns Block ID, Item ID, Sentence, and Random Number; the Block IDs cycle 1, 2, 3 down 24 rows, each row with a random number between 0 and 1.]
Figure 31. A template for executing the process described in Table 12.
of the token set, and indicate which particular condition this sentence
represents ("NS" for No "that," Subject Extraction, "WO" for With
"that," Object Extraction, and so on). Here the IDs for filler sentences
indicate only that the sentence is a filler and give its number.
Sometimes further distinctions among filler types will also need to be
encoded.
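The Item ID scheme just described lends itself to mechanical decoding. The sketch below assumes IDs of the form "S4_WO" for experimental items and "F12" for fillers; the regular expression and the function name are hypothetical, and the pattern would need extending if further filler types were encoded.

```python
import re

# S<token set number>_<condition> for experimental items, F<number> for fillers.
# Conditions: N/W = no "that" / with "that"; S/O = subject / object extraction.
ITEM_ID = re.compile(r"^(?:S(?P<token_set>\d+)_(?P<condition>[NW][SO])|F(?P<filler>\d+))$")


def parse_item_id(item_id):
    """Split an Item ID into its kind, number, and condition."""
    match = ITEM_ID.match(item_id)
    if match is None:
        raise ValueError(f"unrecognized Item ID: {item_id!r}")
    if match.group("filler") is not None:
        return {"kind": "filler", "number": int(match.group("filler")), "condition": None}
    return {
        "kind": "experimental",
        "number": int(match.group("token_set")),
        "condition": match.group("condition"),
    }


print(parse_item_id("S4_WO"))
print(parse_item_id("F12"))
```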
Once the preliminary script and filler list are entered in a
template, only three sort operations remain to complete this ordering
of the current script. Each of these sort operations can be completed
very quickly by selecting the relevant range (starting from the
column that holds the data that will control the sort operation) and
clicking on Excel's Sort Ascending button:
[Figure: the template filled in. The first 12 rows hold experimental items S1_NS through S12_NO (three each of NS, WO, WS, and NO) with Block IDs cycling 1, 2, 3; the next 12 rows hold fillers F1 through F12; every row has a sentence and a random number.]
all the data (up to 256 characters) from each line in one cell. This
column of cells should be formatted so that the contents are
displayed in Courier or another monospace font (in Excel, use
Format/Cells/Font). Select the single column range of cells where the
copied data appear and invoke Excel's parsing function (Data/Text
to Columns). This will launch a "wizard" (a series of dialogue boxes)
that will guide the user through the process of specifying how the
input material is to be parsed. The first dialogue box asks the user to
specify whether the different fields (data items) in each row are to be
distinguished by their linear position (the Fixed Width option) or by
marker characters such as commas or tabs appearing between the
fields (the Delimited option). In Figure 17, the first two fields
(columns of data) on most lines could be distinguished either way
because spaces follow the single character Type field and the
seven-character ID number. However, there are no marker characters
between the 37 single-character response items that appear to the right
of each line (which is typical of many scanning systems). Each
character represents a single informant response. The Fixed Width
option must be used to parse data formatted like this.
The second dialogue box in the Text to Columns Wizard allows
the user to explicitly specify where the lines are to be parsed. In the
third dialogue box, the user indicates what format is to be applied to
each data item recovered by way of the parsing operation, and where
the array of cells that results from the parsing operation is to be
placed.
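The Fixed Width idea can also be shown outside Excel. The sketch below assumes a line layout inferred from the description above (a one-character Type field, a space, a seven-character ID, a space, then 37 one-character responses); the exact layout of Figure 17 may differ, and the function names and the sample line are hypothetical.

```python
def parse_fixed_width(line, widths):
    """Cut a line into consecutive fields of the given widths."""
    fields = []
    position = 0
    for width in widths:
        fields.append(line[position:position + width])
        position += width
    return fields


def parse_scanner_line(line):
    """Parse one scanner output line under the assumed layout.

    Assumed layout: Type (1), space (1), ID (7), space (1), responses (37).
    """
    type_field, _, id_field, _, responses = parse_fixed_width(line, [1, 1, 7, 1, 37])
    # Responses stay as single characters so that blanks or marks the
    # scanner could not read are preserved rather than raising an error.
    return {"type": type_field, "id": id_field, "responses": list(responses)}


# An invented line: Type "3", ID "0000012", then 37 response characters.
sample = "3 0000012 " + "4887628310" * 3 + "4887628"
record = parse_scanner_line(sample)
print(record["type"], record["id"], len(record["responses"]))
```

Because the responses have no separators, position is the only thing that distinguishes them, which is why a delimiter-based split cannot do this job.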
The parsed data should be labeled so that each column in the
data table has a label on the first row (e.g., Row 2 in Figure 18). Some
of these labels can usually be generated automatically. For example,
if "S1" is typed into a single cell, the cell can be selected and its fill
handle dragged right across the cell's row or down along the cell's
column. In either case, this will yield a series of labels in successive
cells of the form "S2," "S3," and so on. This works when the string in
the initial cell begins with an alphanumeric sequence of any length
(up to the limit of cell capacity) and ends with an integer value having
not more than 10 digits.
Once the input data (from whatever source) has been parsed,
ordered, and labeled, the worksheet containing the data should be
saved and backed up. Only copies of these data should be used in
further analyses. The parsed and labeled data table should be
preserved as is.
[Figure: Excel worksheet with a Codes range holding, for each of four ordered scripts, a row of item types (Line Type 0) and a row of item numbers (Line Type 1), above a data range (columns S1-S24) holding two rows of informant responses (Line Type 3) per script.]
Figure 33. Item IDs formatted for use in decoding data.
NOTE: The "Script ID" column associates each line with an ordered script in this experiment.
Thus the two lines in the data range (B14:AB21) with Script IDs of 2 represent data from informants
who used ordered script 2, and the two lines in the Codes range (B3:AB10) with Script IDs of 2
are the Item IDs that identify the type and number of each of the responses from informants.
Columns M through AA have been hidden.
specifying a Descending sort on this field (this will put the fillers,
marked "F," after the experimentals). The No Header Row button
should also be checked. The Sort operation is then executed. This
produces an order in which all items of the same type (i.e., all the
"WO" items, all the "F" items, and so on) appear together left to right.
In this case (because of the descending sort), all the various categories
of experimental sentences will appear first in reverse alphabetical order by
type ID, then the filler sentences. At this point, the range representing
Ordered Script 1 should be seen as divided into two parts, a section
with data from experimental sentences and a section with data on
[Figure: the same worksheet with each script's item-type row, item-number row, and two response rows interleaved, script by script (columns S1-S24).]
filler sentences. Two further sort operations, one for fillers and one
for experimentals, will achieve the intended result (as in Figure 20).
In each case, the same range is selected as in the previous sort, except
that it is limited to only the filler or experimental sentences. The sort
function is invoked and directed to do the sort using only the Item
Number row (Row 5 in Figure 34). When this operation has been
applied to fillers and experimental sentences, the items will appear
left to right in numerical order within each type. The same sequence
of sorting operations is then applied to the blocks of data for other
ordered scripts.
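The effect of these sorts on one ordered script can be sketched compactly. The code below is an approximation rather than a transcription of the Excel steps: it treats each column as a (type, item number, response) triple and uses two stable sorts, first by item number and then by type in descending order, so that fillers come last and items are in numerical order within each type. The function name is hypothetical and the values are illustrative.

```python
def decode_script(types, numbers, responses):
    """Reorder one informant's responses by item type, then item number."""
    columns = list(zip(types, numbers, responses))
    # Stable sort by item number first...
    columns.sort(key=lambda col: col[1])
    # ...then by type, descending. Python's sort stays stable with
    # reverse=True, so numerical order within each type is kept, and
    # "F" (fillers) sorts after every experimental type.
    columns.sort(key=lambda col: col[0], reverse=True)
    return columns


# Illustrative values: one row each of item types, item numbers, responses.
types = ["WO", "F", "NS", "F", "F", "F", "NS", "NO", "F"]
numbers = [10, 12, 1, 1, 9, 4, 2, 5, 5]
responses = [4, 4, 8, 8, 7, 6, 2, 8, 3]

for item_type, number, response in decode_script(types, numbers, responses):
    print(item_type, number, response)
```

Keeping the three values of a column together in one tuple plays the role of selecting the whole range before sorting in Excel: a response can never be separated from the codes that identify it.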
[Figure: Excel worksheet laying out response data for Informants 1-4, with an Item Type Identifier row (a or b) and an Item Number row (1-6) above each pair of informants, summary cells for individual informants (placeholders I1a, I1b, and so on) beside the responses, and summary cells for individual token sets (placeholders T1a-T6a and T1b-T6b) above them.]
Means Summaries
=AVERAGE(E16:G16)
Percentage Summaries
=COUNTIF(E16:G16,">=8")/COUNT(E16:G16)
replaces the placeholder I1a in Figure 35, Excel will return a result of .33, the
proportion of responses in the indicated range that are at or above 8.
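The two summaries above have direct counterparts outside Excel. In the sketch below the three responses are hypothetical stand-ins for cells E16:G16, chosen so that one of the three is at or above 8; the function name is an assumption of the example.

```python
from statistics import mean


def proportion_at_or_above(responses, threshold):
    """Share of responses at or above the threshold.

    Mirrors COUNTIF(range, ">=8")/COUNT(range), except that every element
    is counted, so non-numeric entries should be filtered out beforehand.
    """
    return sum(1 for r in responses if r >= threshold) / len(responses)


type_a = [8, 7, 7]  # hypothetical responses standing in for E16:G16

print(mean(type_a))                                  # counterpart of AVERAGE
print(round(proportion_at_or_above(type_a, 8), 2))   # counterpart of COUNTIF/COUNT
```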
Standard Scores
=AVERAGE(E16:J16)
=STDEV(E16:J16)
I will assume that these values are entered in columns to the right of
the table of responses in Figure 35 (in Columns K and L). With these
values in place, the standardized means for Type a and b sentences
can be calculated with the following formula:
=STANDARDIZE(AVERAGE(E16:G16),K16,L16)
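The same standardization can be written out explicitly. The sketch below assumes that STANDARDIZE(x, mean, sd) computes (x - mean) / sd and that STDEV is the sample standard deviation (n - 1 in the denominator), which is what Python's statistics.stdev computes. The function name and the six responses are hypothetical.

```python
from statistics import mean, stdev


def standardized_type_mean(type_responses, all_responses):
    """Mean of one sentence type in z-score units for one informant.

    The informant's overall mean and standard deviation play the role of
    the values stored in Columns K and L.
    """
    return (mean(type_responses) - mean(all_responses)) / stdev(all_responses)


# Hypothetical responses from one informant: three Type a, then three Type b.
all_responses = [1, 2, 3, 5, 6, 7]
z_a = standardized_type_mean(all_responses[:3], all_responses)
z_b = standardized_type_mean(all_responses[3:], all_responses)
print(z_a, z_b)
```

Standardizing within each informant removes differences in how individuals use the scale, so that an informant who rates everything high and one who rates everything low can still be compared.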
Geometric Means
=GEOMEAN(E16:E17)
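For comparison, the geometric mean of a set of responses is the n-th root of their product. The sketch below uses statistics.geometric_mean (available from Python 3.8) alongside the explicit definition; the two responses are hypothetical stand-ins for cells E16:E17. All values must be positive.

```python
from math import prod
from statistics import geometric_mean

responses = [4, 9]  # hypothetical responses of two informants to one item

by_library = geometric_mean(responses)
by_definition = prod(responses) ** (1 / len(responses))  # n-th root of the product

print(by_library, by_definition)
```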
Notes
back out of most sort operations that go wrong, but this is most effective if it is used
immediately after the error is made.
2. The transpose operation can be avoided if desired by applying the same logic
for constructing token sets as described earlier on a vertical rather than horizontal axis.
The column of sentence components and formulas in Figure 25 can instead be laid out
along a single row. Components for new token sets are then constructed in succeeding
rows and the formulas copied downward instead of rightward. This process will save
the transposition step later on but may not be practicable unless the investigator has
access to a relatively large monitor.
3. The random numbers are generated by the RAND( ) function, which
automatically yields a new random number in each cell where it is used each time a sort
is performed (provided Excel is set for automatic recalculation).
4. In Word, the name of the current file is invoked by way of the field code
FILENAME.
Appendix D:
Token Set Data
From a
"That"-Trace
Experiment
[Table: column headings NT and WT, each subdivided into SE and OE; the token sets themselves are not recoverable.]
Table 17 Token sets.
NOTE: The table lists the 20 token sets used in the "that"-trace experiment described in Figure 36.
Appendix E:
Sample
Questionnaire for
Scannable Line
Drawing
Science Foundation. If you would like more information about the project, we will be
[...] work and the results we've achieved so far. You may get in touch with us at the
address shown below.
None of the information collected here will be associated with your name in any way.
DO NOT WRITE YOUR NAME ON THIS QUESTIONNAIRE OR THE ANSWER SHEET.
Portland, ME 04103
Dana McDaniel
Subject ID
Preliminaries
You should have received a #2 pencil and a green (General Purpose - NCS) Answer Sheet
with this questionnaire. Please let the experimenter know if you are missing either of
these items.
The following pages in this questionnaire will teach you how to record your responses in
this experiment. It is extremely important that you read the instructions carefully and do
all tasks specified. ALL OF THE STEPS OF THIS PROCEDURE ARE IMPORTANT. SKIP
NOTHING!
If at any point you are unclear about anything you are asked to do, please feel free to ask
the experimenter for clarification.
Home Town
All other responses you make in this procedure will be recorded on the green Answer Sheet. However, your
response to this item is to be recorded here on the questionnaire.
Question: Where did you begin grade school (first grade)? (Please write your responses to this
question in the blanks provided below.)
CITY/TOWN
STATE OR COUNTRY
Other Demographic Information
Please take out your green Answer Sheet now and use it to record the next several responses. Please use
your #2 pencil for this and make heavy, clear marks, as shown below. Make no stray marks on the Answer
Sheet.
[Illustration: rows of bubbles labeled A-J showing a correctly filled bubble, captioned "Make marks like this", beside a second example row.]
Please check to see that the red number stamped in the upper left hand corner of your Answer Sheet agrees
with the Subject ID stamped in the box on the first page of this questionnaire.
In the block marked "SEX" (just left of the heavy green line in the middle of your Answer Sheet) mark the
appropriate category.
There is another block labeled "GRADE or EDUC" just below the box marked "SEX". Please indicate here how
many years of formal schooling you have completed. In the U.S., completing high school counts as 12, and
completing a bachelor's degree counts as 16.
In the block marked "BIRTH DATE" (lower left), fill in the appropriate circle for the month of your birth,
then write in the day and year of your birth and fill in the appropriate circles below for the day and year.
Each of the columns A-H in the block labeled "IDENTIFICATION NUMBER" will be used for a different
question. The questions are listed below. Please write in your response in the empty box at the top of each
column, then fill in the appropriate bubble below. Remember: This information is not associated with your
name in this study.
Column A: Are you a native speaker of American English? (0) yes (1) no
Column B: Are you a native speaker of British English? (0) yes (1) no
Column C: Are you a native speaker of another variety of English (Jamaican, Indian,
Australian, etc.)? (0) yes (1) no
Column D: Among the parents/caregivers of the family you grew up with, what was the highest
level of education completed by any of those individuals?
(0) high school or less
(1) college
(2) master's degree
(3) doctorate or law or medical degree
(4) don't know
Column E: Do you ever choose to write with your left hand? (Ignore cases where you are forced
by circumstances to use your left hand, as when there is temporary injury, a heavy
package in your right hand, etc.)
(0) never (1) sometimes (2) always
Column F: Are any members of your immediate family (biological parents and siblings only)
left handed, so far as you know?
(0) confident some are left-handed
(1) not sure, some could be left-handed
(2) confident all are right-handed
Column G: What is your academic major?
(0) English (1) a foreign language
(2) Linguistics (3) Other
Column H: Did you and your family move from one city or town to another while you were in
grade school (Years 1-4)? (0) yes (1) no
If you answered Yes to Question H, please answer Question I as well.
Column I: If you moved, how long did you continue school in the city/town where you
began first grade?
(0) less than one year
(1) one to two years
(2) three to four years
(3) more than four years
The Response Procedure
[Illustration: two sample Answer Sheet items, A and B, each a row of bubbles labeled A-J with one bubble filled in.]
Practice
As a warm-up for the main experiment, we'd like you to use the procedure we've just introduced to describe
the circles below. The scores you assign should reflect the relative size of each circle COMPARED TO THE
FIRST ONE. Use the Answer Sheet for your responses. Find the place for Item #1 on Side 1 of the
Answer Sheet. Now look at Circle #1 below and decide what score (remember - think of it as a line length)
you want to use to represent the size of Circle #1. Fill in the appropriate bubble in Item #1 on the Answer
Sheet. For Circle #2, select a score (line length) that shows how much larger #2 is, compared to #1.
Continue on and enter scores for each of the remaining eight circles using Items 3 through 10 on the Answer
Sheet.
[Illustration: ten practice circles of varying size.]
IMPORTANT: [...] the Practice items above.
We need some information from you about your evaluation of some sentences we've listed below.
We would like you to imagine that your job is to teach English to speakers of other languages.
For each sentence listed below, we would like you to do the following. Please read the sentence,
then ask yourself if the sentence seems English-sounding or not. Suppose one of your students
were to use this sentence. If we ignore pronunciation, would the student sound like a native
speaker? Or would the [...] strange or unnatural to a native [...] it was pronounced? Your task
is to tell us how English-sounding each sentence is using a scale.
Let the first sentence be your reference. Assign a [...] sentence so that the score shows how
much better or worse that sentence is compared to the first sentence. The better a sentence
seems, the higher the score you should use. Once you've given a response for a sentence, please
don't go back and reconsider.
We recognize that the scale is crude; it will not [...] differences between sentences.
Just do the best you can.
You don't need to worry about grammar rules you may have studied in school. Tell us what seems
most natural to you, whether or not that is "proper" by rules you may have learned. We are only
interested in your sense of what would be appropriate in ordinary relaxed conversation.
Also, please don't struggle over individual sentences. Make the best quick judgment you can for
each sentence and go on to the next.
There are 92 sentences on the questionnaire, numbered from #11 to #102. Please be sure to do
them all.
Please record your responses on the Answer Sheet starting with Item #11.
The Sentences
11) This is a painful movie to watch.
12) Israel's $13,000 annual per the capita income paru Western Europe.
13) We had doubted that people would drive up and wait at the border on week nights.
14) It's also quite possible to build your own shelves bookcase, especially if have got a
knowledgeable friend neighbor who's willing the help.
15) Studying a the subject, we realize that Rob depended on people they being able think like.
16) Giro Sport Design Inc., based near Santa Cruz, California, are making bicycle helmets bicycle
pumps in Mexico and hopes to sell them there soon.
17) 58 percent of the businesses supported an import tax to make Japanese products less
competitive.
18) His brother believe they have are on the brink of a breakthrough to the really big time.
19) A most extraordinary party took place on the third floor second floor.
20) Perhaps it took an adventurer, enigmatic and reckless, without a plan, heedless of risk, a con
man, to do what he did.
21) There has did no plot, no characters to identify at, no hope.
22) Women has been assigned to non-combat ships and served temporary duty aboard combat
ships in the past.
23) Who the presented are with harsh, sardonic?
24) Who claimed that the resulting conflict between Serbia quickly escalated into World War I?
25) His higher the number, the faster the modem and the less it costs to use online services that
charge by the hour.
26) "It's just a big problem as see it," Freeman said.
27) Who have questioned many of the basic principles of Lang's theories?
28) Who did the article proclaim that was attacking the criminal?
29) Who had you dreamed that John would marry?
30) Who do you think John likes?
31) Who, when Lisa contracted cancer found herself terrible in pain, found he was not of all
sure his of theory?
32) External jacks plug of but are next at the back cover plate.
33) During the second, the Black Pottery if Longshan phase, agriculture became more.
34) The also expand significantly northwestward into, besieging Vienna.
35) Who was the nurse imagining would find her?
36) Who must also are able to trust him, and to believes he will there for you?
37) His ability to share her view is a small triumph, but one few people can claim.
38) Ironically, reform paved the way for a more radical political transformation.
39) At the beginning of the year the month, Lambert wanted only to make money, but at the
end he wanted only to save his business.
40) Who does he expect will visit the neighbors?
41) Who for all at their resiliency, however, the seem stuck in a of rat in Stanleyville?
42) Who had the lawyer heard that would be fitting for the defendant?
43) Who did the actor pretend the soldier had killed?
44) What hett going?
45) Are there people students like Suzanne in that class?
46) Many have long shared a "misconception that everyone in Mexico is poor," Grualva said.
47) Destruction are his only.
48) Who does the teacher believe that the boy hurt?
49) What it take to do the and work is a computer, a modem, and a the phone line.
Baird, J., & Noma, E. (1978). Fundamentals of scaling and psychophysics.
Bard, E. G., Robertson, D., & Sorace, A. (1996). Magnitude estimation of linguistic
acceptability. Language, 72(1), 32-68.
Berndt, R. S., Salasoo, A., Mitchum, C. C., & Blumstein, S. (1988). The role of intonation
cues in aphasic patients' performance of the grammaticality judgment task. Brain
& Language, 34(1), 65-97.
Bever, T. G. (1974). The ascent of the specious; or, There's a lot we don't know about
mirrors. In D. Cohen (Ed.), Explaining linguistic phenomena (pp. 173-200).
Washington, DC: Hemisphere.
Bever, T., & Carroll, J. M. (1981). On some continuous properties in language. In T.
Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp.
225-233). Amsterdam: North-Holland.
Bley-Vroman, R. W., Felix, S. W., & Ioup, G. L. (1988). The accessibility of universal
grammar in adult language learning. Second Language Research, 4(1), 1-32.
Bloomfield, L. (1930). Linguistics as a science. Studies in Philology, 27, 533-557.
Blumstein, S. E., Milberg, W. P., Dworetzky, B., Rosen, A., & Gershberg, F. (1991).
Syntactic priming effects in aphasia: An investigation of local syntactic
dependencies. Brain and Language, 40, 393-421.
Bock, J. K. (1987). Coordinating words and syntax in speech plans. In A. Ellis (Ed.),
Progress in the psychology of language (pp. 337-390). London: Lawrence Erlbaum.
Bock, J. K. (1990). Structure in language: Creating form in talk. American Psychologist,
45, 1221-1236.
Boring, E. G. (1953). A history of introspection. Psychological Bulletin, 50(3), 169-189.
Bransford, J. D., & Franks, J. J. (1971). The abstraction of linguistic ideas. Cognitive
Psychology, 3, 331-350.
Carden, G. (1970). A note on conflicting idiolects. Linguistic Inquiry, 1(3), 281-290.
Carden, G. (1973). Dialect variation in abstract syntax. In R. W. Shuy (Ed.), Some new
directions in linguistics (pp. 1-34). Washington, DC: Georgetown University Press.
Carroll, J. M., Bever, T. G., & Pollack, C. R. (1981). The non-uniqueness of linguistic
intuitions. Language, 57, 368-383.
Chomsky, N. (1957). Syntactic structures. The Hague, the Netherlands: Mouton.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT Press.
Chomsky, N. (1973). Conditions on transformations. In S. Anderson & P. Kiparsky
(Eds.), A festschrift for Morris Halle (pp. 232-286). New York: Holt, Rinehart &
Winston.
Subject Index
Acceptability:
By-subjects summaries. See
absolute, 9
By-informants summaries
continuous data and, 1 8 , 4 4 , 7 0
relative, 22-26
Categorical variables, 44
threshold of, 72
data summaries for, 112-114,115
judgments
statistical tests and, 120-121
133-135,136
See e/so Means
sigiuficance and, 1 2 3 , 1 2 5
Cognitive resources, and judgments,
19-22
performance, 7
96,143-153
Benchmark sentences, 9 2 , 1 1 7
for presentation modes, 64
stability, 19-22,139
spreadsheet, 107-110,112-116
Blocking, 94,98-102
spreadsheet. Excel, 90, %, 109,
in Excel, 146-152
117(n3), 141-161
121-122
word processor, 1 0 6 , 1 1 0 , 1 5 2 - 1 5 3
summaries
Confidentiality, 87
114-116,121-122
Continuity, of acceptability data, 18,44,70
181
182 EXPERIMENTAL SYNTAX
experiments
for summarizing data, 158-161
Coordination experiments:
Web sites for, 117(n3), 142-143
Counterbalancing:
counterbalancing and, 1 3 , 9 3 - 9 4 , 9 5 ,
rules of, 9 3 - 9 4 , 9 5
factorial, 4 4 , 4 7 - 5 1 , 5 2
89,90,91,168-171
Data:
meaningful results achieved by, 40
158-161
scaling issues in, 68-77
Debriefing, of informants, 87
settings and, 75-76,85-88
Defirute NP experiments:
verbs in, 50-51
Demographic data, 9 0 , 1 1 2 , 1 6 8 , 1 6 9
Questionnaires; Sampling
design
35-36,59-60
Discrete data, 18
also Judgments, factors affecting
forms
Filler sentences, 1 3 , 2 4 , 5 1 - 5 2
Errors:
questiormaire construction and,
frequency of, 40
92-93, 97-102
modeling of, 40
Excel software, 90, 96, 109, 141-142
school, 56, 57
Grammaticality, 5, 7, 8
Order, sentence, 6, 13, 15-23
144, 149
blocking and, 94
Random variation, 6, 40. See also Error
scripts and, 96, 148
Ratio data, 69
Ordinal data, 68, 72, 120
Ratio scales, 69, 90
Ordinal scales, 68-69, 70
data summaries for, 114, 115
Pearson's r, 135-136
Recruitment, of informants, 86-87, 88
Picture-NPs, 15-18
as statistical significance, 13, 132-133
Pilot testing, 84
error variance and, 32
39, 79, 131-136
informativeness and, 125
Procedures:
See also Reliability
counterbalancing and, 13, 98
Representativeness:
Proportions:
of sentences, 47
Psychophysics, 6
Response forms, scannable, 103-109
p values, 133-136
r statistic, 135-136
statistic, 136
Questionnaires:
distribution of, 87
grammaticality and, 68
master, 102, 152
Sampling, 39, 40
Scale inversion, 92
Scaling:
147-149, 152
112-114, 115, 120-121
115
Binding Theory and, 19-22, 139
57
number of responses and, 6-7
Scripts, 96, 98-102
on coordination, 19-22, 139
in Excel, 146-150, 155-157
on subjacency, 15-18, 137-138
Sentences:
theory vs. observation of, 11-12
benchmark, 92, 117
within sentence types, 22-26
filler, 13, 24, 51-52
Standard error of the mean, 131
construction, 92-93, 97-102
in Excel, 160-161
139-140
theoretical importance and, 122-125
46, 51-52
Statistical tests, 13, 132-136
13, 15-23
for category scale data, 120-121
representativeness of, 47
set data, 121-122
47-51
136
15-22
as control of error, 120
Sorting, 107-108
sample size and, 83
in Excel, 150-152, 155-157
Subject extraction:
stability, 15-18
factorial designs and, 48
Spreadsheets:
sample size and, 82-83
Excel, 90, 96, 109, 117(n3), 141-161
Subjectivity, ix-x
117(n3)
in Excel, 158-161
86-87
Types I and II errors, 132
mode, 64(table)
Unacceptability. See Acceptability
138-139
controlling of, 40-53
Threshold, of acceptability, 72
extraneous systematic, 45-51
Timed tasks, 88
partitioning of, 41-44
Token sets:
within group, 40, 43, 45
98
Verbs, in experiment design, 50-51
139-140
Warm-up. See Response training
in Excel, 143-152
Within group variance, 40, 43, 45
number of, 92
Within subject experiment design, 53
92-96,97-102
Word processors:
randomization in, 94, 96, 147-149, 152
for coding/decoding data, 106, 110,
summaries
Word software, 152-153
Total variance, 41, 43
World Wide Web site, 117(n3), 142-143
Training:
Written presentation mode, 63,
of linguists, 59-60,126
t-tests, 84(n2), 122, 133
z-scores, 14, 114, 130-131