Rost Mcmurray 2009 Developmental Science
Abstract
Infants in the early stages of word learning have difficulty learning lexical neighbors (i.e. word pairs that differ by a single
phoneme), despite their ability to discriminate the same contrast in a purely auditory task. While prior work has focused on
top-down explanations for this failure (e.g. task demands, lexical competition), none has examined if bottom-up acoustic-
phonetic factors play a role. We hypothesized that lexical neighbor learning could be improved by incorporating greater acoustic
variability in the words being learned, as this may buttress still-developing phonetic categories, and help infants identify the
relevant contrastive dimension. Infants were exposed to pictures accompanied by labels spoken by either a single or multiple
speakers. At test, infants in the single-speaker condition failed to recognize the difference between the two words, while infants
who heard multiple speakers discriminated between them.
Address for correspondence: Gwyneth C. Rost, 309 WJHSC, Department of Communication Sciences and Disorders, University of Iowa, Iowa City,
IA 52242, USA; e-mail: gwyneth-rost@uiowa.edu
© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and
350 Main Street, Malden, MA 02148, USA.
Gwyneth C. Rost and Bob McMurray
newly learned words are misproduced (i.e. ball for doll; vaby for baby; fope for vope) (Ballem & Plunkett, 2005; Fennell & Werker, 2003; Swingley & Aslin, 2002). Thus, the failure to acquire minimal pairs in the switch task would not appear to derive from a lack of ability to discriminate the two phonemes or from phonological ability in general.
This has largely led the field to assume that top-down factors are responsible for the failure to learn minimal pairs. Two such hypotheses have been proposed. First, word learning may impose attentional or cognitive demands that prevent the child from accessing all relevant abilities and knowledge (e.g. Werker et al., 1998). Second, it has been proposed that competition with known words (or between pairs of words as they are acquired) can account for such failure (e.g. Swingley & Aslin, 2007).
Werker and colleagues (Werker et al., 1998; Werker & Fennell, 2004) have suggested that the demands of word learning are too great to permit infants to engage their full range of phonetic skills. That is, learning a word requires an array of cognitive and perceptual abilities, including attention, segmentation, memory, inductive thinking, and the detection of referential intent; phoneme discrimination is only one of many components of word learning. This large set of requisite processes taxes infants’ limited cognitive and attentive resources, preventing them from taking full advantage of their phonetic abilities. As these abilities develop, and more general cognitive capacities expand, lexical neighbors cause less difficulty for word learners (Werker et al., 2002).
For children in switch-task experiments, the task itself may add additional demands above those of a discrimination or looking-preference task. The switch task requires that the child encode two acoustic forms, two visual forms, and form connections between those representations. This must be robust enough to withstand incorrect labeling: the switch task does not allow children the luxury of having both choices before them and choosing one (as a looking-preference task would). Rather, they must remember the correct naming, and determine that the visual referent does not match the one they had previously seen. In fact, in a less demanding task, 14-month-olds learn lexical neighbors (Ballem & Plunkett, 2005; Mani & Plunkett, 2008). Thus, by 14 months, the relevant phonological and cognitive processes may have developed sufficiently to perform discrimination and misperception tasks but may not be sufficient for more difficult tasks.
Alternatively, it has been suggested that lexical inhibition and competition processes (e.g. Luce & Pisoni, 1998; McClelland & Elman, 1986) might be responsible for children’s failure to learn lexical neighbors. For example, when children hear a non-word (e.g. tog) that is similar to a known word (dog), the known word is partially active. Over the course of processing, inhibition causes the known word to suppress activation for competitors, making it difficult to represent the non-word. Similarly, when learning two similar words (e.g. bih and dih), these partially learned words compete with each other. Ultimately, each inhibits the other, making it difficult to form unambiguous lexical representations.
In support of the hypothesis that lexical competition hinders infants’ learning of lexical neighbors, Swingley and Aslin (2007) demonstrated that Dutch-speaking children have difficulty learning [xont], a neighbor of hond (‘dog’) and [dal], a neighbor of bal (‘ball’) but not [biS] or [bεmp], which are neighborless in the children’s lexicons. Complementing this, Thiessen (2007) showed that these competition processes are not always harmful: some constellations of words can overcome lexical competition. After learning dawgoo and tawgoo, 15-month-olds learning daw did not dishabituate to taw. However, when the presence of a near neighbor offered different acoustic information (daw, dawbow, and tawgoo), infants recognized that taw was a mispronunciation of daw. In this case, the ability to contrast daw and taw was enhanced only by words that shared sounds with the targets but were different from one another. Finally, Werker et al. (2002) report that the size of the lexicon in 17-month-olds is correlated with their ability to succeed at the switch task, although whether the lexicon itself drives the switch-task results or both are mediated by some other cognitive factors is unknown.
Both approaches can account for the available data, but neither is complete. It is not clear why phonetic discrimination (one of the earliest skills acquired, and fundamental to word learning) would not be preserved in the face of capacity limits. Similarly, it is not clear why early phonological representations would not provide enough discriminatory power to overcome lexical competition. While top-down factors contribute a piece of the explanation, co-developing perceptual and phonological abilities may help fill in the gaps (as hypothesized by PRIMIR; Werker & Curtin, 2005).
To date, there have been few investigations of the role of bottom-up factors in the acquisition of lexical neighbors. One exception to this is Nazzi (2005), who demonstrated that children have more difficulty learning neighbors that differ by vowel than by initial or medial consonant. However, participants were 20 months old, an age at which infants succeed at consonantal pairs in the switch task (Werker et al., 2002), and Nazzi’s task was substantially different from standard switch designs. Thus, while revealing new dimensions of difficulty (vowels), it cannot explain 14-month-olds’ failure to learn minimal pairs differing by consonants. It does nonetheless support the idea that perceptual or phonological processes relevant for word learning are still developing at 20 months.
If this is the case, it is possible that the phonetic representations available to 14-month-olds are sufficient for relatively simple tasks tapping discrimination (e.g. Werker & Tees, 1984), or misproduction (e.g. Swingley & Aslin, 2002), or more supportive word-learning tasks (e.g. Ballem & Plunkett, 2005). However, these same phonological representations may not be sufficient to overcome lexical competition or task demands.
Bottom-up input in early word learning
In considering the contrast between phonological discrimination and the acquisition of lexical neighbors, it is important to consider the nature of phonetic categories. If phonetic categories were represented as boundaries, studies demonstrating discrimination of phonemes would provide firm evidence that such categories were well developed by 14 months and should support lexical contrast. However, work on adult speech perception suggests that phonological categories are represented as either graded prototypes (e.g. Miller, 1997, 2001; see also Kuhl, 1991; McMurray, Aslin, Tanenhaus, Spivey & Subik, 2008) or clusters of exemplars (Goldinger, 1998), not as boundary-defined categories. Likewise, infants’ representations show similar properties of attunement to prototype structure in both consonants and vowels (Kuhl, 1991; Miller & Eimas, 1996; McMurray & Aslin, 2005). Thus, categories are described by their prototypical (or most frequent) values and by the range of acceptable variation. This representation is well suited to the task of identifying positive exemplars of a category, a process that would be sufficient for simple discrimination in habituation-type tasks. However, identifying negative exemplars is much more difficult. Because category membership falls off in a graded manner as a stimulus departs from the prototype, there is no clear line over which a given token is clearly not a category member, yet the ability to make this judgment is essential for performance in the switch task and in learning tasks like Swingley and Aslin’s (2007) in which infants must notice that a non-word (e.g. tog) is not a member of a known word category (dog). Thus, the phonetic representations available to listeners may make word learning more difficult than discrimination.
The fact that such a representation does not easily support minimal-pair learning is consistent with the task demands framework of Werker and colleagues (Werker et al., 1998; Werker & Curtin, 2005). This is particularly true given the aforementioned evidence that when the task is structured in a way that maximizes the effectiveness of such categories, sensitivity to phonemic contrast is found (Ballem & Plunkett, 2005; Mani & Plunkett, 2008; see also Mani & Plunkett, 2007; Swingley & Aslin, 2002). However, this approach differs in a few key ways. First, the task demands do not come from external capacity limits, nor do they force infants to ignore phonetic information. Rather, the limitation arises out of the nature of early phonetic representations, and needs of word learning. Second, it suggests that in addition to manipulating top-down factors (e.g. task) to understand the failure to learn lexical neighbors, we may also need to examine the structure, use and acquisition of phonetic categories themselves.
Use and acquisition of phonetic categories
While phonetic categories defined by prototypes and variation may not be optimal for learning lexical neighbors, this type of representation may be functionally useful for understanding language, as actual language use requires flexible boundaries. Mispronunciation is normal; infant-directed speech contains production parameters that are more variable than, and different from, adult-directed speech (e.g. Englund, 2005); and social or dialectic factors can create systematic pronunciation variability. It may therefore be adaptive for infants to map mispronunciations like vaby to targets like baby in the absence of evidence that vaby is being used to refer to a new word – especially if infants cannot be confident in their developing phoneme categories. The benefits of such tolerance to mispronunciation have been demonstrated in adult listeners (McMurray, Tanenhaus & Aslin, 2009). Moreover, 14- and 15-month-olds are willing to identify mismatching words (e.g. vaby/baby) as available known referents even though they are sensitive to the mismatch (Mani & Plunkett, 2007; Swingley & Aslin, 2002).
Thus, while the demands of word recognition require infants to restrict word forms to a small, usable representation, infants are also under pressure to be flexible in what counts as a word. Achieving categories that can support this type of flexibility may take considerably longer than one year. In fact, it is known that phonological development continues well into childhood (Edwards, Beckman & Munson, 2004; Munson, Swenson & Mathei, 2005).
Moreover, the nature of phonological representations seen in childhood would seem to support exactly this sort of flexibility. For example, Slawinsky and Fitzgerald (1998) demonstrated that the perceptual boundaries for approximants (/w/ vs. /r/) sharpen considerably between 5 and 9 years of age. Importantly, the shallow slope seen in 5-year-olds would permit more ambiguity in what counts as a category member. Likewise, children are still learning to use multiple cues and context for single contrasts well into childhood (Hicks & Ohde, 2005; Mayo & Turk, 2004; Morrongiello, Robson, Best & Clifton, 1984; Nittrouer, 2002; Ohde & Haley, 1997). Finally, the dramatic vocabulary growth of later childhood may have a significant effect on the development of phoneme perception, as there is evidence that perceptual processes are altered by the structure of the lexicon (McClelland, Mirman & Holt, 2006; Magnuson, McMurray, Tanenhaus & Aslin, 2003; Newman, Sawusch & Luce, 2005). Thus, acquiring a flexible phonemic representation based on prototypes or exemplars may not be complete by 14 months.
As a result of this, an understanding of the mechanisms by which such categories emerge may yield a way to augment them and drive word learning during this period. More importantly for the present purposes, it provides a test of the importance of phonological factors in predicting performance in word-learning tasks.
One important account of the development of speech categories is the distributional learning hypothesis (Maye, Werker & Gerken, 2002; see also McMurray, Aslin & Toscano, in press; Maye, Weiss & Aslin, 2008). Under this view, speech cues in the environment form statistical
clusters (e.g. Lisker & Abramson, 1964) and simple statistical learning mechanisms extract these clusters from the underlying categories. The listener develops phoneme categories by calculating the mean and allowable variance for any given phoneme. If the variance of a phoneme category (i.e. the range of acceptable tokens) is not well estimated, the listener might be unsure if two tokens are members of a single, wide, phoneme category, or are representative of two different phonemes.
Alternatively, exemplar views of speech perception (e.g. Goldinger, 1998) posit that phonological categories emerge out of large sets of accumulated exemplars of individual words. As under statistical accounts, the range of exemplars that have been recorded is crucial: only by gathering a sufficiently broad sample can accurate categories emerge (see Pierrehumbert, 2003, for a critical review). Thus, under both views of phonological categorization, a single exemplar of each word – or even a small set of exemplars – may be inadequate to create a sufficiently robust category for the word.
Multiple exemplars
Current versions of the switch task use either a single instantiation of the auditory token or a small range of very similar ones (i.e. spoken by the same speaker in the same context). Thus, infants are placed in exactly the situation in which both approaches predict difficulty in forming functional categories. Importantly, if infants come to the lab with undeveloped or partially developed categories, this might present an obstacle to acquiring sufficient information during habituation. Even if phonological categories are largely developed, a lack of variability may hinder the maintenance of these categories: a single exemplar repeated over and over would warp the representativeness of a set of exemplars, and could have biasing effects on the phoneme being estimated. Importantly, under both statistical and exemplar models, variability would help infants augment or maintain their still-developing phonological categories in a way that assists with making lexical contrast, and would allow 14-month-olds to learn lexical neighbors.
Studies of visual category learning in infancy are consistent with this. These studies have shown that infants trained on a single exemplar or a low variability set will discriminate individual exemplars (i.e. they do not assume a category), while infants trained on multiple exemplars will discriminate tokens that don’t belong to the trained category (Younger & Cohen, 1986). Additionally, the amount of variability or similarity in the training set affects which items infants will assign to the category (Oakes, Coppage & Dingel, 1997; Quinn, Eimas & Rosenkrantz, 1993).
Multiple-exemplar tasks may also facilitate the acquisition of auditory word-form categories. Indeed, Singh (2008) demonstrates that variation in affect can help 7.5-month-olds segment words from running speech. In adults, training new speech categories via multi-talker input has been shown to improve both discrimination and generalization of novel phoneme categories (Lively, Logan & Pisoni, 1993; McClelland, Fiez & McCandliss, 2002). Multi-talker training is therefore a theoretically and ecologically valid way to support auditory encoding of lexical categories.
Multiple-exemplar training also increases the top-down task demands on the child in at least two overlapping ways. First, during habituation and testing, it requires them to normalize quickly for different speakers, pitches, and speaking rate, something that imposes a significant delay on normal word recognition (Mullennix, Pisoni & Martin, 1989; Ryalls & Pisoni, 1997). Second, learning a word in this situation requires them to maintain more items in memory, and they must be stored in more detail. In addition, a manipulation of acoustic variability does not directly manipulate lexical structure or competition (assuming that the variable tokens are not phonetically ambiguous). Thus, success at a multiple-exemplar version of the switch task would be difficult to account for with explanations based on top-down factors such as capacity limits or lexical mechanisms.
Though multiple-exemplar habituation can control for top-down factors, it does not rule out their broader role in learning. In fact, the hypothesis that bottom-up factors can augment learning in the moment is compatible with both attentional-demands and lexical-competition hypotheses for failure in previous instantiations of this experiment. As we have argued, 14-month-olds’ failure to learn minimal pairs lies at the nexus of perceptual/phonological factors and the top-down demands of word learning situations like the switch task.
The role of acoustic/phonetic variability was tested in two switch-task experiments. The first used single-exemplar habituation and test similar to those reported by Werker and colleagues. The second employed multiple-exemplar habituation and test, in which 54 auditory exemplars from 18 speakers were used to train and test the words.
Experiment 1
Experiment 1 used the switch task developed by Werker et al. (1998; Stager & Werker, 1997), with only four minor changes to the design. First, we added a completely novel object to the end of the test sequence. Because the expected result was a null effect, dishabituation to this object would provide evidence that infants were learning something during habituation. Second, single-color real objects were used for the novel objects, rather than fabricated multi-color objects. Third, we used photographs of the object, rather than moving film, because Fennell and Werker (2003) reported that still photos yielded the same results. Fourth, the words buke (phonetically, /buk/) and pook (/puk/) were used rather than Stager and Werker’s (1997) original bih and dih. Because bih and dih violate the phonology of English (/i/ is not permissible in word-final position), this change was expected to create a more natural, easier learning situation, and it has been shown that infants also fail to learn lexical neighbors with more word-like referents (Pater et al., 2004).1
Figure 1 Infants were habituated to a pink koosh and a yellow scoop, named either /buk/ or /puk/. At test, they were given one of the two objects called the correct name (a same trial) or the incorrect name (a switch trial). After both types of trials, they were also tested on a completely novel object.
Method
Participants
Thirty-three monolingual English-learning 14-month-olds (between 13;9 and 14;29) participated in Experiment 1. The infants were recruited via mail and phone from a database of county birth records, and were considered eligible if they were normally developing and without history of ear infection according to parental report. Data from 17 were excluded because the infants did not complete the experiment due to fussiness (12), because they did not habituate to the training set (three), or because they were learning a language other than English (two). Sixteen children, nine boys and seven girls, formed the resulting experimental set.
Stimuli
Experimental stimuli consisted of three digital photographs and two sound files. The photographs were of single-color novel objects (Figure 1) photographed against a black background. Two of the pictures were designated as habituation stimuli, the third was presented only at test as a novel control stimulus. Pilot testing of the photographs had revealed that none of the three objects was inherently more preferable than the others to infants of a similar age. The sound files contained one exemplar each of /buk/ and /puk/ recorded by an adult female native-English speaker who said the words in an infant-directed register. Each word was copied and spliced to itself so that the resulting sound file contained seven presentations of the same exemplar at 2-second intervals. The picture and sound presentations were synchronized so that each picture appeared for 14 seconds, while the word was presented seven times.
Apparatus
The testing booth was a curtained-off portion of a laboratory room. A comfortable chair faced a flat-screen monitor with stereo sound speakers mounted on either side of the monitor. A small infrared camera was situated below the screen, which allowed the experimenter, who could neither see nor hear the stimulus, to code looking time online. Interexperimenter reliability was above 90%. The HABIT computer program (Cohen, Atkinson & Chaput, 2004) controlled item presentation and data capture.
Procedure
After informed consent was obtained, infants and parents were shown the testing room and apparatus where infants were seated on the caregiver’s lap and testing began. Both the experimenter and the caregiver listened to music over headphones at a level loud enough to mask the auditory stimuli.
Each habituation and testing trial lasted for 14 seconds and consisted of a single still picture paired with seven
1 While there was some concern that buke may be too similar to the English word, book, we wanted to restrict the set to Stop-Vowel-Stop sequences that contrast in voicing, and the English lexicon contains very few such pairs for which both are non-words.
Method
Participants
Recruitment and inclusion criteria were the same as in Experiment 1. Twenty-two 14-month-olds (13;05–15;0) participated in Experiment 2. Data from six were excluded because the children failed to habituate (three), were unable to complete the experiment due to fussiness (two), or had a history of recurrent ear infections (one). The remaining 16 children, ten boys and six girls, formed the experimental group.
Stimuli
Eighteen adult native-English speakers recorded a series of /buk/s and /puk/s in an infant-directed register. Three tokens of /buk/ and /puk/ were spliced from each of the recordings, and the final set of prepared exemplars was then normalized for amplitude.2 This resulted in a set of 54 exemplars of each word. This set included three tokens of /buk/ and /puk/ from the speaker used in Experiment 1 (one of the three tokens was the token used in Experiment 1).
In both the training and test phases, each 14-second trial contained seven different exemplars of the word from seven different speakers at 2-second intervals, and each child was trained and tested on an individually randomized set of tokens. These seven exemplars were pseudo-randomly selected in advance. This was constrained such that no voice or exemplar was repeated during any given presentation and such that one exemplar of each speaker saying each word was held out for test. Each child therefore heard 36 different exemplars of each word (in all 18 voices) over the course of the habituation, and seven previously unheard exemplars of each word during the testing phase (though the speakers were familiar).
Apparatus and procedure
The experimental set-up and procedures were identical to Experiment 1.
Results and discussion
Data were collected and analyzed in the same manner as in Experiment 1, using a mixed-design ANOVA in which test condition (same, switch, and novel) was the within-subject factor, and test word (/buk/ or /puk/) was the between-subjects factor. Results are shown in Figure 3. There was a main effect of test trial (F(2, 28) = 7.8, p = .002). Again, there was no main effect of test word (F(1, 14) = 1.35, p = .26). There was a marginal interaction between testing condition and looking response time (F(2, 28) = 3.19, p = .06), driven entirely by responses to the novel object. Thus, learning was not affected by infants’ preference for either word or prior experience with book.
Figure 3 Looking times in Experiment 2. Error bars are Standard Error of the Mean.
Planned comparisons on the effect of test condition showed a different pattern from Experiment 1. Infants looked at the switch trial (M = 6.96 sec, SD = 3.54) significantly longer than the same (M = 4.95 sec, SD = 3.01) trial (F(1, 14) = 7.1, p = .018), and this did not interact with test word (F < 1). Additionally, infants looked at the novel object trial (M = 8.69 sec, SD = 4.04) significantly longer than the same and switch trials (F(1, 14) = 8.2, p = .013).
Infants in the multiple-exemplar switch task succeeded at learning two phonologically similar words well enough to notice misnaming at test. They noticed the misnaming both in switch trials, where a familiar word was used with an incorrect (but familiar) object, and in novel trials, where a familiar word was used for an unfamiliar and perceptually dissimilar object. Thus, the multiple-exemplar task presented children with sufficient information to succeed at the switch task. It is therefore possible that switch-task failure can be attributed in part to acoustic/phonetic processes sensitive to the structure of the input.
Because it is likely that children take longer to habituate to a set of varied exemplars than to a set of similar ones, one immediate question was whether children in Experiment 2 simply got more experience with the word–object pairings than children in Experiment 1 due to longer habituation times. This was not the case, however, as there was no significant difference between infants in the two experiments in number of habituation trials (Experiment 1 = 18.4 habituation trials, Experiment 2 = 18.3; T(15) = .20, p = .93), nor in total looking time (Experiment 1 = 160.6 sec, Experiment 2 = 152.6; T(15) = .46, p = .66). Thus, it was not the amount of exposure, but the type of exposure, that helped children succeed at the switch task.
2 The complete set of stimuli is available on the Developmental Science online archive.
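The token-selection constraints described for the Experiment 2 stimuli can be illustrated with a minimal sketch. It assumes 18 speakers with three tokens of a word each, holds out one token per speaker for test, and draws seven tokens from seven different speakers for each trial. The function names, the (speaker, token) indexing, the seed, and the use of Python's random module are illustrative assumptions; this is not the authors' stimulus-preparation procedure, only one way the stated constraints could be satisfied.

```python
import random

N_SPEAKERS = 18          # speakers who recorded each word
TOKENS_PER_SPEAKER = 3   # tokens of each word kept per speaker
TOKENS_PER_TRIAL = 7     # exemplars heard in one 14-second trial


def split_tokens(rng):
    """Hold out one token per speaker for test; the rest are for habituation."""
    habituation, test = [], []
    for speaker in range(N_SPEAKERS):
        tokens = [(speaker, index) for index in range(TOKENS_PER_SPEAKER)]
        rng.shuffle(tokens)
        test.append(tokens[0])          # one held-out token per speaker
        habituation.extend(tokens[1:])  # remaining two tokens per speaker
    return habituation, test


def build_trial(pool, rng):
    """Draw seven tokens from seven different speakers, in random order."""
    speakers = rng.sample(sorted({speaker for speaker, _ in pool}), TOKENS_PER_TRIAL)
    return [rng.choice([tok for tok in pool if tok[0] == s]) for s in speakers]


rng = random.Random(2009)  # hypothetical per-child seed (assumption)
habituation_pool, test_pool = split_tokens(rng)
habituation_trial = build_trial(habituation_pool, rng)
test_trial = build_trial(test_pool, rng)
```

Under these assumptions each word yields 36 habituation tokens and 18 held-out tokens, which agrees with the counts reported in the Stimuli section, and no voice or token repeats within a trial.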
A second question was whether the differences seen across the two experiments arose because of acoustic differences in the stimuli. Acoustically, pairs like /buk/ and /puk/ differ primarily in voice onset time (VOT), the time difference between the release of the lips and the onset of laryngeal vibration. If the difference between the VOTs of the two targets were greater in Experiment 2 than in Experiment 1, the learning effects could have arisen from the fact that the two words were, on average, more acoustically discriminable in this experiment. Again, this was not the case. The /buk/ used in Experiment 1 had a VOT of 9 msec, and the average VOT of /buk/ tokens in Experiment 2 was 11 msec (SD = 3). The /puk/ from Experiment 1 had a VOT of 79 msec, while the tokens in Experiment 2 had average VOT of 80 msec (SD = 26).3 The lack of difference in VOT between exemplars of the two experiments suggests that it was precisely the variability in the multiple-exemplar experiment that helped children succeed at this task.
General discussion
The variability in the multiple-exemplar version of the switch task improved infants’ performance by providing them with a richly defined category for the words. Though prior phonetic experience undoubtedly contributed to their performance, the failure in Experiment 1 suggests that it was insufficient for the task. However, Experiment 2 provided the infants with a set of input that allowed them either to estimate the variability of the categories or to maintain or augment their existing estimates. Given this, the ability to make lexically relevant phonological contrast emerged in the moment, allowing them to succeed at the task.
However, this is not to argue that top-down factors are irrelevant. Given evidence for online integration of lexical information and phonetic percepts (McClelland et al., 2006; Magnuson et al., 2003), phonetic boundaries, while not represented explicitly by the system, are best seen as an emergent property of bottom-up, graded phonetic categories, and top-down influence from the lexicon. Facing those difficulties imposed by their sparse lexicons and/or other more general cognitive limitations, 14-month-olds in the switch task may have to rely heavily on bottom-up perceptual processes to determine that a given auditory token is not a member of a lexical category. The results of the first experiment (as in prior switch-task studies) would seem to imply that these categories were insufficient to overcome these difficulties. However, brief exposure to multiple exemplars allowed infants to buttress their representation of the input to make a contrast between /buk/ and /puk/. Whether this represents a long-term strengthening of these phonetic categories or a short-term learning phenomenon is unclear. Moreover, it is not clear whether the variability during habituation or at test was more relevant. However, it importantly reveals that the phonological categories that underpin early word learning may not be fully formed, and that variability may play a critical role in the learning mechanisms that augment them.
There are at least two kinds of relevant variability – and hence two kinds of learning mechanisms – that may be important. First, variability along specifically phonetic dimensions (in this case, VOT) may have allowed the infants to define the phonetic or lexical categories that contrasted the words. This would require learning mechanisms of the sort demonstrated by Maye et al. (2002; see also Maye et al., 2008). This approach posits that infants track the frequencies of specific phonetic cues (e.g. VOT) and extract categories from the natural clusters (perhaps interacting with other mechanisms like competition: McMurray, Aslin & Toscano, in press). Accordingly, the variation within the relevant acoustic category found across the multiple exemplars in Experiment 2 is what is crucial for defining the category in this task. In fact, measurements of VOT reveal considerable variation along this dimension for both the voiced (M = 11 msec, SD = 3, range = 5–20) and voiceless (M = 80 msec, SD = 26, range = 39–141) categories. Thus, this exposure offered by Experiment 2 may fit the bimodal distribution of VOTs required by this sort of learning mechanism.
Second, it is possible that variability in non-phonetic information helped infants extract the relatively invariant phonetic dimensions. That is, infants at 14 months may still be unsure about what dimensions are relevant for the task, and variability in irrelevant aspects of the stimuli improve performance by focusing attention on those aspects of the input that are comparatively stable. Such a mechanism would be analogous to learning processes posited by Gómez (2002). She demonstrated that when learning sequences of syllables in which nonadjacent syllables were predictable, the variability in the irrelevant, intervening stimuli was a crucial determinant of learning: when the set size of possible intervening elements was large, infants learned the non-adjacent dependencies, while a small set-size led to failure. Thus, the relatively stable elements of a stimulus set become increasingly easier to extract as variability increases (see also Yu & Smith, 2007, for an example in word learning). In fact, measurements of pitch and the first four formants (measurements of vowel quality) of Experiment 2 stimuli were all highly variable (see Table 1). Most importantly, none of these cues differed significantly between /buk/ and /puk/, suggesting that they would not be available to directly contrast the words. Nonetheless, the immense amount of irrelevant variation present would provide the
3 Eighty msec seems like a relatively long VOT for a /p/ given Lisker and Abramson’s (1964) classic study (which found a mean VOT for voiceless stops in English of 53 msec). However, Allen and Miller (1999) found that in slow rates of speech, voiceless labials averaged 64 ms, and stops as a whole averaged 78 ms. Moreover, the small literature in infant-directed speech (e.g. Englund, 2005), along with data in preparation from our lab, indicates that VOTs are significantly increased in infant-directed speech. Thus, this is not an unexpected VOT.
Table 1 Measurements of pitch and formant frequency at incompletely developed phonetic categories, statistically
mid-vowel for tokens in Experiment 2. Pitch and all formant impoverished input, and the unique demands of the switch
values were measured at the vowel centroid. Measurements task. When the bottom-up input is manipulated in a way
for /buk/ and /puk/ were compared with a paired T-test with that is sensitive to the mechanisms used to extract phonetic
53 degrees of freedom categories, infants can learn lexical neighbors.
Significantly
contrasts /buk/
/buk/ /puk/ and /puk/? Acknowledgements
Measurement M SD M SD p
We would like to thank Brandon Abbs, Tracy Ball, Allison
VOT (msec) 11 3 80 26 <.01 Bean, Katie Bresson, Angelo LaRocca, John Lipinski,
Pitch (Hz) 255 92 257 100 >.1
F1 (Hz) 379 74 389 51 >.1 Dan McEchron, Cheyenne Munson, Amanda Murphy,
F2 (Hz) 1310 234 1277 199 >.1 Amanda Nematbakshk, Brooke Overgard, Sammy Perone,
F3 (Hz) 2676 304 2705 273 >.1 Molly Robinson, Scott Spilger, Joe Toscano, Beth Walker,
F4 (Hz) 3753 411 3780 426 >.1
and Jed White for recording /buk/ and /puk/s for us. We
are also indebted to Kristine Kovack-Lesh and Sammy
Perone for assistance with HABIT, and particularly thank
Sammy for suggesting that we include the novel object
necessary fodder for the sort of learning mechanism trial. We thank Janet Werker and Chris Fennell for helpful
that uses non-criterial variation to extract the invariant comments during the development of this project; Karla
elements from a noisy signal. McGregor for her comments on an early draft; and two
Both sources of variability – the criterial voice onset anonymous reviewers for their thoughtful and careful
time, and the non-criterial indexical and prosodic comments. Finally, we thank the many families who have
information – were available in the stimulus set of so generously donated their time to our efforts.
Experiment 2. The current data offer no insight into
which of these two coexisting sources of variability
provided infants with the necessary information to learn
two lexical neighbors, or if infants harnessed both. It References
will be important for future work to control each type of
Allen, J.S., & Miller, J.L. (1999). Effects of syllable-initial voicing
variability more precisely (e.g. hold VOT constant, and
and speaking rate on the temporal characteristics of mono-
vary only speaker, or use a single speaker and vary VOT)
syllabic words. Journal of the Acoustical Society of America,
to determine which type of variability predicts word 106, 2031–2039.
learning. Ballem, K.D., & Plunkett, K. (2005). Phonological specificity
These results do not argue that variability is a good in children at 1;2. Journal of Child Language, 32, 159–173.
thing in general. It is more important that the input Best, C.T., McRoberts, G.W., Lafleur, R., & Silver-Isenstadt, J.
contain the appropriate statistical structure for a given (1995). Divergent developmental patterns for infants’
learning mechanism, and that variability along some perception of two nonnative consonant contrasts. Infant
dimensions (criterial or non-criterial) will typically be Behavior and Development, 18 (3), 339–350.
part of this. This may point toward an explanation for Best, C.T., McRoberts, G.W., & Sithole, N.M. (1988). Examination
Nazzi’s (2005) demonstration of difficulty learning minimal of perceptual reorganization for nonnative speech contrasts:
Zulu click discrimination by English speaking adults and
pairs that differ by vowel. Vowels in general have more
infants. Journal of Experimental Psychology: Human Perception
overlapping distributions along criterial dimensions, but
and Performance, 14 (3), 345–360.
also (unlike consonants) these cues are not necessarily Casasola, M., & Cohen, L.B. (2000). Infants’ acquisition of
independent of non-criterial cues like pitch, duration, linguistic labels with causal actions. Developmental Psychology,
and timbre. Children’s behavior in tasks like these will 36, 155–168.
thus be an emergent property of the statistics of prior Cohen, L.B., Atkinson, D.J., & Chaput, H.H. (2004). Habit X:
history, the statistics of immediate learning, the structure A new program for obtaining and organizing data in infant
of the lexicon (which may support some contrasts over perception and cognition studies (Version 1.0). Austin, TX:
others) and the demands of the task. University of Texas.
Nonetheless, this work makes it clear that children do Dale, P., & Fenson, L. (1996). Lexical development norms for
not fail at the switch task because they have a limited young children. Behavior Research Methods, Instruments, &
Computers, 28, 125–127.
capacity for acoustic detail. Capacity limits or task demands
Edwards, J., Beckman, M.E., & Munson, B. (2004). The
do not provide a compelling explanation for these results
interaction between vocabulary size and phonotactic
because (a) we did not manipulate task-specific aspects, and probability effects on children’s production accuracy and
(b) our purely acoustic/phonetic stimulus manipulations fluency in nonword repetition. Journal of Speech, Language
actually add necessary processes to the task (speaker and Hearing Research, 47 (2), 421–436.
normalization, memory). Rather, these results suggest Englund, K.T. (2005). Voice onset time in infant directed speech
that the failure may arise from an interaction between over the first six months. First Language, 25 (2), 219–234.
Fennell, C.T., & Werker, J.F. (2003). Early word learners’ ability to access phonetic detail in well-known words. Language and Speech, 46 (2–3), 245–264.
Goldinger, S.D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105 (2), 251–279.
Gómez, R.L. (2002). Variability and detection of invariant structure. Psychological Science, 13 (5), 431–436.
Hicks, C.B., & Ohde, R.N. (2005). Developmental role of static, dynamic, and contextual cues in speech perception. Journal of Speech, Language and Hearing Research, 48 (4), 960–974.
Jusczyk, P.W., Hohne, E.A., & Baumann, A. (1999). Infants’ sensitivity to allophonic cues for word segmentation. Perception and Psychophysics, 61, 1465–1476.
Jusczyk, P.W., & Aslin, R.N. (1995). Infants’ detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29 (1), 1–23.
Kuhl, P.K. (1991). Human adults and human infants show a ‘perceptual magnet effect’ for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50 (2), 93–107.
Lisker, L., & Abramson, A.S. (1964). A cross-language study of voicing in initial stops. Word, 20, 384–422.
Lively, S.E., Logan, J.S., & Pisoni, D.B. (1993). Training Japanese listeners to identify English /r/ and /l/ II: the role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94 (3), 1242–1255.
Luce, P.A., & Pisoni, D.B. (1998). Recognizing spoken words: the neighborhood activation model. Ear & Hearing, 19 (1), 1–36.
McClelland, J.L., & Elman, J.L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18 (1), 1–86.
McClelland, J.L., Fiez, J.A., & McCandliss, B.D. (2002). Teaching the /r/–/l/ discrimination to Japanese adults: behavioral and neural aspects. Physiology and Behavior, 77 (4–5), 657–662.
McClelland, J.L., Mirman, D., & Holt, L.L. (2006). Are there interactive processes in speech perception? Trends in Cognitive Sciences, 10 (8), 363–369.
McMurray, B., & Aslin, R.N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition, 95 (2), B15–B26.
McMurray, B., Aslin, R., Tanenhaus, M., Spivey, M., & Subik, D. (2008). Gradient sensitivity to within-category variation in speech: implications for categorical perception. Journal of Experimental Psychology: Human Perception and Performance, 34 (6), 1609–1631.
McMurray, B., Aslin, R.N., & Toscano, J. (in press). Statistical learning of phonetic categories: computational insights and limitations. Developmental Science.
McMurray, B., Tanenhaus, M.K., & Aslin, R.N. (2002). Gradient effects of within-category phonetic variation on lexical access. Cognition, 86 (2), B33–B42.
McMurray, B., Tanenhaus, M., & Aslin, R. (2009). Within-category VOT affects recovery from “lexical” garden paths: Evidence against phoneme-level inhibition. Journal of Memory and Language, 60 (1), 65–91.
Magnuson, J.S., McMurray, B., Tanenhaus, M.K., & Aslin, R.N. (2003). Lexical effects on compensation for coarticulation: the ghost of Christmash past. Cognitive Science, 27 (2), 285–298.
Mani, N., & Plunkett, K. (2007). Phonological specificity of vowels and consonants in early lexical representations. Journal of Memory and Language, 57 (2), 252–272.
Mani, N., & Plunkett, K. (2008). Fourteen-month-olds pay attention to vowels in novel words. Developmental Science, 11 (1), 53–59.
Mattys, S., Jusczyk, P., Luce, P.A., & Morgan, J. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465–494.
Maye, J., Werker, J.F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82 (3), B101–B111.
Maye, J., Weiss, D.J., & Aslin, R.N. (2008). Statistical phonetic learning in infants: facilitation and feature generalization. Developmental Science, 11 (1), 122–134.
Mayo, C., & Turk, A. (2004). Adult–child differences in acoustic cue weighting are influenced by segmental context: children are not always perceptually biased toward transitions. The Journal of the Acoustical Society of America, 115 (6), 3184–3194.
Miller, J.L. (1997). Internal structure of phonetic categories. Language and Cognitive Processes, 12, 865–869.
Miller, J.L. (2001). Mapping from acoustic signal to phonetic category: internal structure, context effects and speeded categorization. Language and Cognitive Processes, 16, 683–690.
Miller, J.L., & Eimas, P.D. (1996). Internal structure of voicing categories in early infancy. Perception and Psychophysics, 58 (8), 1157–1167.
Mills, D.L., Prat, C., Zangl, R., Stager, C.L., Neville, H.J., & Werker, J.F. (2004). Language experience and the organization of brain activity to phonetically similar words: ERP evidence from 14- and 20-month-olds. Journal of Cognitive Neuroscience, 16 (8), 1452–1464.
Morrongiello, B.A., Robson, R.C., Best, C.T., & Clifton, R.K. (1984). Trading relations in the perception of speech by 5-year-old children. Journal of Experimental Child Psychology, 37 (2), 231–250.
Mullennix, J.W., Pisoni, D.B., & Martin, C.S. (1989). Some effects of talker variability on spoken word recognition. Journal of the Acoustical Society of America, 85 (1), 365–378.
Munson, B., Swenson, C.L., & Manthei, S.L. (2005). Lexical and phonological organization in children: evidence from repetition tasks. Journal of Speech, Language and Hearing Research, 48, 108–124.
Nazzi, T. (2005). Use of phonetic specificity during the acquisition of new words: differences between consonants and vowels. Cognition, 80 (1), B11–B20.
Newman, R.S., Sawusch, J.R., & Luce, P.A. (2005). Do postonset segments define a lexical neighborhood? Memory and Cognition, 33 (6), 941–960.
Nittrouer, S. (2002). Learning to perceive speech: how fricative perception changes, and how it stays the same. The Journal of the Acoustical Society of America, 112 (2), 711–719.
Oakes, L.M., Coppage, D.J., & Dingel, A. (1997). By land or by sea: the role of perceptual similarity in infants’ categorization of animals. Developmental Psychology, 33 (3), 396–407.
Ohde, R.N., & Haley, K.L. (1997). Stop-consonant and vowel perception in 3- and 4-year-old children. The Journal of the Acoustical Society of America, 102 (6), 3711–3722.
Pater, J., Stager, C., & Werker, J.F. (2004). The perceptual acquisition of phonological contrasts. Language, 80 (3), 384–402.
Pegg, J.E., & Werker, J.F. (1997). Adult and infant perception of two English phonemes. Journal of the Acoustical Society of America, 102 (6), 3742–3753.
Pierrehumbert, J.B. (2003). Phonetic diversity, statistical learning, and the acquisition of phonology. Language and Speech, 43 (2–3), 115–154.
Quinn, P.C., Eimas, P.D., & Rosenkrantz, S.L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, 22 (4), 463–475.
Ryalls, B.O., & Pisoni, D.B. (1997). The effect of talker variability on word recognition in preschool children. Developmental Psychology, 33 (3), 441–452.
Singh, L. (2008). Influences of high and low variability on infant word recognition. Cognition, 106 (2), 833–870.
Slawinski, E.B., & Fitzgerald, L.K. (1998). Perceptual development of the categorization of the /r–w/ contrast in normal children. Journal of Phonetics, 26, 27–43.
Stager, C.L., & Werker, J.F. (1997). Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388 (6640), 381–382.
Swingley, D., & Aslin, R.N. (2002). Lexical neighborhoods and the word-form representations of 14-month-olds. Psychological Science, 13 (5), 480–484.
Swingley, D., & Aslin, R.N. (2007). Lexical competition in young children’s word learning. Cognitive Psychology, 54 (2), 99–132.
Thiessen, E.D. (2007). The effect of distributional information on children’s use of phonemic contrasts. Journal of Memory and Language, 56 (1), 16–34.
Werker, J.F., Cohen, L.B., Lloyd, V.L., Casasola, M., & Stager, C.L. (1998). Acquisition of word–object associations by 14-month-old infants. Developmental Psychology, 34 (6), 1289–1309.
Werker, J.F., & Curtin, S. (2005). PRIMIR: a developmental framework of infant speech processing. Language Learning and Development, 1 (2), 197–234.
Werker, J.F., & Fennell, C.T. (2004). Listening to sounds versus listening to words: early steps in word learning. In D.G. Hall & S.R. Waxman (Eds.), Weaving a lexicon (pp. 79–109). Cambridge, MA: MIT Press.
Werker, J.F., Fennell, C.T., Corcoran, K.M., & Stager, C.L. (2002). Infants’ ability to learn phonetically similar words: effects of age and vocabulary size. Infancy, 3 (1), 1–30.
Werker, J.F., & Tees, R.C. (1984). Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49–63.
Younger, B.A., & Cohen, L.B. (1986). Developmental change in infants’ perception of correlations among attributes. Child Development, 57 (3), 803–815.
Yu, C., & Smith, L.B. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psychological Science, 18 (5), 414–420.

Received: 12 July 2007
Accepted: 30 March 2008