Understanding the role of prosody in encoding linguistic meaning and in shaping phonetic form requires the analysis of prosodically annotated speech drawn from a wide variety of speech materials. Yet obtaining accurate and reliable prosodic annotations for even small datasets is challenging due to the time and expertise required. We discuss several factors that make prosodic annotation difficult and impact its reliability, all of which relate to variability: in the patterning of prosodic elements (features and structures) as they relate to the linguistic and discourse context, in the acoustic cues for those prosodic elements, and in the parameter values of the cues. We propose two novel methods for prosodic transcription that capture variability as a source of information relevant to the linguistic analysis of prosody. The first is Rapid Prosody Transcription (RPT), which can be performed by non-experts using a simple set of unary labels to mark prominence and boundaries based on immediate auditory impression. Inter-transcriber variability is used to calculate continuous-valued prosody 'scores' that are assigned to each word and represent the perceptual salience of its prosodic features or structure. RPT can be used to model the relative influence of top-down factors and acoustic cues in prosody perception, and to model prosodic variation across many dimensions, including language variety, speech style, or speaker's affect. The second proposed method is the identification of individual cues to the contrastive prosodic elements of an utterance. Cue specification provides a link between the contrastive symbolic categories of prosodic structures and the continuous-valued parameters in the acoustic signal, and offers a framework for investigating how factors related to the grammatical and situational context influence the phonetic form of spoken words and phrases.
While cue specification as a transcription tool has not yet been explored as RPT has, it has the potential to provide a level of detail that will be useful in modelling systematic context-governed variation in the implementation of prosodic categories, with applications in automatic speech synthesis and recognition, as well as modelling human speech production and perception. We discuss how RPT and cue specification, particularly when combined, can improve the efficiency and reliability of prosodic transcription and how they can be integrated with expert phonological transcription.
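As a rough illustration of how inter-transcriber variability can be turned into a continuous-valued score per word, the sketch below takes the proportion of transcribers who marked each word. The abstract does not give the exact formula, so the proportion, the function name `prosody_scores`, and the toy data are assumptions made for illustration only.

```python
def prosody_scores(marks):
    """Return one score per word: the share of transcribers who marked it.

    marks: list of per-transcriber lists, each holding 0/1 for every word
    (1 = the transcriber marked the word, e.g. as prominent).
    Assumption: the score is a simple proportion across transcribers.
    """
    if not marks:
        raise ValueError("need at least one transcriber")
    n_transcribers = len(marks)
    n_words = len(marks[0])
    if any(len(m) != n_words for m in marks):
        raise ValueError("all transcribers must label the same words")
    return [sum(m[i] for m in marks) / n_transcribers for i in range(n_words)]


# Hypothetical data: four transcribers, four words.
marks = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
print(prosody_scores(marks))  # [0.75, 0.0, 0.25, 1.0]
```

A score of 1.0 means every transcriber marked the word and 0.0 means none did; intermediate values reflect disagreement, which is the variability the method treats as information.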
An algorithm for detecting sudden jumps in measured F0, which are likely to be inaccurate measures, is introduced. The method computes sample-to-sample differences in F0 and, based on a user-defined threshold, determines whether a difference is larger than naturally produced F0 velocities, thus flagging it as an error. Various parameter settings are evaluated on a corpus of 30 American English speakers producing different intonational patterns, for which F0 tracking errors were manually checked. The paper concludes by recommending settings for the algorithm and ways in which it can be used to facilitate analyses of F0 in speech research.
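The general idea of thresholding sample-to-sample F0 differences can be sketched as follows. This is not the published algorithm: the function name `flag_f0_jumps`, the use of Hz per second as the velocity unit, the handling of unvoiced frames as `None`, and the threshold value in the example are all assumptions chosen for illustration.

```python
def flag_f0_jumps(f0, times, max_velocity):
    """Return indices of samples reached by an implausibly fast F0 change.

    f0: F0 values in Hz, with None for unvoiced frames (assumption).
    times: sample times in seconds, strictly increasing.
    max_velocity: user-defined threshold in Hz per second (assumption).
    """
    if len(f0) != len(times):
        raise ValueError("f0 and times must have the same length")
    flagged = []
    for i in range(1, len(f0)):
        if f0[i] is None or f0[i - 1] is None:
            continue  # no difference can be computed across unvoiced frames
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        velocity = abs(f0[i] - f0[i - 1]) / dt
        if velocity > max_velocity:
            flagged.append(i)
    return flagged


# Hypothetical track with an octave-halving error at index 2.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
f0 = [200.0, 202.0, 101.0, 203.0, 204.0]
print(flag_f0_jumps(f0, times, max_velocity=1000.0))  # [2, 3]
```

Both the jump into the erroneous sample and the jump back out are flagged, so a downstream step would still need to decide which samples to discard.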
Proceedings of International Conferences of Experimental Linguistics
We report six experiments on learnability of four non-adjacent phonotactic constraints which differ in their attested frequency and phonetic conditioning factors: liquid harmony, liquid disharmony, backness harmony, and backness disharmony. Our results suggest that such phonotactic constraints can be implicitly learned from brief experience and that learnability of a phonological grammar may be independent of its attested frequency and phonetic basis.
This paper examines the usefulness of including prosodic and phonetic context information in the phoneme model of a speech recognizer. This is done by creating a series of prosodic and phonetic models and then comparing the mutual information between the observations and each possible context variable. Prosodic variables show improvement less often than phone context variables; however, prosodic variables generally show a larger increase in mutual information. A recognizer with allophones defined using the maximum mutual information prosodic and phonetic variables outperforms a recognizer with allophones defined exclusively using phonetic variables.
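To make the comparison of context variables concrete, the sketch below estimates mutual information between two discrete sequences from their empirical counts. It assumes discretized observations and categorical context labels; the paper's actual observation model is not described in the abstract, and the function name and toy labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples.

    Uses plug-in (empirical frequency) estimates of the joint and
    marginal distributions; no smoothing or bias correction is applied.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with counts to limit rounding error
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


# Hypothetical data: an observation class paired with two context variables.
obs = ["a", "a", "b", "b"]
stress = ["s", "s", "u", "u"]      # fully predictive of obs
position = ["i", "f", "i", "f"]    # independent of obs
print(mutual_information(obs, stress))    # 1.0
print(mutual_information(obs, position))  # 0.0
```

Under this kind of comparison, the context variable with the larger estimate would be the more informative candidate for defining allophones.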
Speakers produced errors on vowels less often than on consonants, and on nuclei less often than on codas. Rates of errors on CV and VC pairs were above chance. Errors on Vs most often occur with errors on at least one contiguous C, but not vice versa. The Articulatory Phonology model of syllable structure, with the additional feature of sequential activation, would best predict these observed asymmetries.
Listeners can rapidly integrate intonational information to infer a speaker’s intended meaning. But not all components of an intonation contour contribute to meaning equally well. Prenuclear pitch accents, tonal events preceding the nuclear pitch accent in an utterance, have been described as not reliably mapping onto discourse meaning. We use mouse tracking to investigate whether German listeners can use prenuclear pitch accents to predict upcoming referential information in the utterance. Our results are compatible with the assumption that listeners ignore prenuclear accents when predicting speakers’ intentions. All materials, data, and scripts can be retrieved here: https://osf.io/xf8be/.
We investigate the prominence of English words with stress reversal (e.g. èlevátion 2-1 → élevàtion 1-2). We ask what motivates the occurrence of the “early high” (1-2) pattern outside of stress clash contexts, and consider the hypothesis that it marks prominence non-locally. Experiment 1 tests the effect of prominence pattern on memory. Given its markedness and location at phrasal onset, we hypothesize that early high pitch broadly facilitates recall for sentence information. This hypothesis is not confirmed, suggesting that the effect of pitch accent on memory may be restricted to the accented word. In Experiment 2 listeners perform a prominence-rating task on the same patterns. Results show that early high is prominence-lending, but with weaker prominence than the lexical (2-1) stress pattern. The combined findings suggest a hybrid function for early high in marking the beginning of a discourse-level prosodic unit, and in lending prominence to the early high-accented word.
Information structure is said to play an important role in determining phrasal prominence and the assignment of nuclear pitch accents in English. Early accounts claim that discourse-new or focused words receive a prominence-lending high/rising pitch accent, while given words are unaccented, with reduced prominence. Empirical findings are varied, but paint a more complex picture of the prosodic encoding of information structure. The present study investigated the phonological and phonetic encoding of information status and contrastive focus in nuclear position in American English, from speech read under neutral and lively affect. Given information was associated with decreased phonological and phonetic prominence, contrastive information with enhanced prominence, while new information corresponded to increased phonological, but not phonetic prominence, as assessed in pitch accent type, duration, intensity, and voice quality. The findings indicate a probabilistic relationship between ...
Phonological accounts of speech perception postulate that listeners map variable instances of speech to categorical features and remember only those categories. Other research maintains that listeners perceive and remember subcategorical phonetic detail. Our study probes memory to investigate the reality of categorical encoding for prosody: when listeners hear a pitch accent, what do they remember? Two types of prosodic variation are tested: phonological variation (presence vs. absence of a pitch accent), and variation in phonetic cues to pitch accent (F0 peak, word duration). We report results from six experiments that test memory for pitch accent vs. cues. Our results suggest that listeners encode both categorical distinctions and phonetic detail in memory, but categorical distinctions are more reliably retrieved than cues in later tests of episodic memory. They also show that listeners may vary in the degree to which they remember prosodic detail.
Perceived prominence in Russian and Hindi, free word order languages, can be communicated prosodically and structurally, via word order. Paired production and perception experiments with native speakers show that discourse-prominent constituents are marked acoustically, via a perceptible increase in vowel intensity and f0, and structurally, via a change in word order and placing a word into a designated position in a sentence or clause.
1. Phonetic considerations in phonological research
Phonology is concerned with characterizing the sound patterns of language, typically presented in terms of a system of contrastive sound elements (e.g., syllables, segments, features) and the distribution of those sounds in the makeup of phonological words and phrases. This focus on the sound system and the characteristic patterns that arise when sounds combine to form words is what distinguishes the study of phonology from the study of phonetics, as these two fields are traditionally construed. Yet the phonologist’s perspective on sound systems is typically rooted in knowledge about the phonetic properties of the sound elements that make up a language, and reflects direct observation of phonetic form, through the production and perception of spoken words and phrases.
Examining articulatory compensation has been important in understanding how the speech production system is organized, and how it relates to the acoustic and ultimately phonological levels. This paper offers a method that detects articulatory compensation in the acoustic signal, which is based on linear regression modeling of co-variation patterns between acoustic cues. We demonstrate the method on selected acoustic cues for spontaneously produced American English stop consonants. Compensatory patterns of cue variation were observed for voiced stops in some cue pairs, while uniform patterns of cue variation were found for stops as a function of place of articulation or position in the word. Overall, the results suggest that this method can be useful for observing articulatory strategies indirectly from acoustic data and testing hypotheses about the conditions under which articulatory compensation is most likely.
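A minimal sketch of regressing one acoustic cue on another is given below. It fits an ordinary least-squares line with the standard closed-form solution. Reading a negative slope as a possible trade-off between cues is an assumption made here for illustration; the abstract does not state how compensatory and uniform patterns are distinguished, and the cue values are invented.

```python
def ols_slope(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept). x and y are equal-length numeric sequences.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements of two cues across four tokens.
cue_a = [1.0, 2.0, 3.0, 4.0]
cue_b = [8.0, 6.0, 4.0, 2.0]
slope, intercept = ols_slope(cue_a, cue_b)
print(slope, intercept)  # -2.0 10.0
# Assumption: a negative slope means one cue weakens as the other strengthens.
```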
Cooperative and competitive game dialogs are comparatively examined with respect to temporal, basic text-based, and dialog act characteristics. The condition-specific speaker strategies are reflected, among other things, in distinct dialog act probability distributions, which are discussed in the context of the Gricean Cooperative Principle and of Relevance Theory. Based on the extracted features, we trained Bayes classifiers and support vector machines to predict the dialog condition, which yielded accuracies from 90 to 100%. Taken together, the simplicity of the condition classification task and its probabilistic expressiveness for dialog acts suggest a two-stage classification of condition and dialog acts.
Production and perception experiments with native speakers of Russian, a free word order language, show that prosody and change in word order are used to mark discourse-prominent constituents. Concurrent application of these cues to prominence is possible, as evident from distinctively higher f0 and intensity maxima, and duration values associated with ex-situ words, as well as their higher visibility in discourse. Distinctive acoustic-prosodic realization of ex-situ words may cue their relatively high informational load and discourse prominence, as well as (redundantly) signal that the word is left- or right-dislocated.
7th International Conference on Speech Prosody 2014, 2014
The perception of prosodic structure (phrasal prominences and boundaries) may depend in part on acoustic cues in the speech signal and in part on utterance meaning as related to syntactic structure and discourse context. In this study we ask if listeners are able to differentially weigh acoustic and meaning-based cues to prosody. We test naive subjects’ transcription of prominences and boundaries in spontaneous American English under three different conditions, all of which involve listening to audio recordings and marking prominences and boundaries on a transcript. The three conditions differ in the instructions given to transcribers. In one condition, subjects were instructed to transcribe prominence and boundaries based on meaning criteria, in a second condition they were told to transcribe based on criteria of acoustic salience, and a third condition had less specific instructions, without explicit reference to either meaning-based or acoustic cues. Our results show that subjects perform differently when focusing on meaning than when focusing on acoustics, especially for prominence marking, where partially different sets of words are selected as prominent under the two tasks. Boundary marking is more similar under the two instructions, with acoustic criteria resulting in more listeners marking a given word as pre-boundary, but with boundaries marked largely on the same words in both tasks. With non-specific instructions, performance was similar to that obtained under acoustic-based instructions. We report on agreement rates within and across conditions. This study has implications for models of prosody perception and the methodology of prosodic transcription.
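One common way to quantify agreement between two transcribers on binary word-level marks is Cohen's kappa, sketched below. The abstract does not say which agreement statistic was used, so the choice of kappa, the function name, and the example marks are assumptions for illustration.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary (0/1) labels over the same words.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    p_a = sum(a) / n  # share of words rater A marked
    p_b = sum(b) / n  # share of words rater B marked
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical prominence marks from two transcribers over eight words.
rater_1 = [1, 0, 1, 0, 1, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.467
```

With more than two transcribers per condition, a multi-rater statistic would be needed instead of pairwise kappa.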
7th International Conference on Speech Prosody 2014, 2014
This paper investigates how prosodic elements such as prominences and prosodic boundaries in Hindi are perceived. We approach this using data from three sources: (i) native speakers of Hindi without any linguistic expertise, (ii) a linguistically trained expert in Hindi prosody, and (iii) classifiers trained on English for automatic prominence and boundary detection. We use speech from a corpus of Hindi narrative speech for our experiments. Our results indicate that non-expert transcribers do not have a consistent notion of prosodic prominences. However, they show considerable agreement regarding the placement of prosodic boundaries. Also, relative to the non-expert transcribers, there is higher agreement between the expert transcriber and the automatically derived labels for prominence (and prosodic boundaries); this suggests the possibility of using classifiers for the automatic prediction of these prosodic events in Hindi.
Understanding the structure of intonational variation is a longstanding issue in prosodic research. A given utterance can be realized with countless intonational contours, and while variation in prosodic meaning is also large, listeners nevertheless converge on relatively consistent form-function mappings. While this suggests the existence of abstract intonational representations, it has been unclear how exactly these are defined. The present study examines the validity of a well-defined set of phonological representations for the generation of intonation in the nuclear region of an intonational phrase in American English: namely, the combination of binary pitch accents (H*/L*), phrase accents (H-/L-), and boundary tones (H%/L%) proposed in Pierrehumbert (1980). In an exploratory study, we examined whether speakers maintained the eight-way distinction among intonational contours posited to exist in this representational system. We created eight synthesized contours according to Pierrehumbert (1980) and examined whether listeners generalized these contours to novel productions. Speakers largely distinguished rising from nonrising contours in production, but few other distinctions were maintained. While this does not rule out the existence of additional contours in production, these findings do suggest that the representation of rising and non-rising contours may be privileged and more readily accessible in the intonational grammar.
This study tests the influence of acoustic cues and non-acoustic contextual factors on listeners' perception of prominence in three languages whose prominence systems differ in the phonological patterning of prominence and in the association of prominence with information structure: English, French and Spanish. Native speakers of each language performed an auditory rating task to mark prominent words in samples of conversational speech under two instructions: with prominence defined in terms of acoustic or meaning-related criteria. Logistic regression models tested the role of task instruction, acoustic cues and non-acoustic contextual factors in predicting binary prominence ratings of individual listeners. In all three languages we find similar effects of prosodic phrase structure and acoustic cues (F0, intensity, phone-rate) on prominence ratings, and differences in the effect of word frequency and instruction. In English, where phrasal prominence is used to convey meaning related to information structure, acoustic and meaning criteria converge on very similar prominence ratings. In French and Spanish, where prominence plays a lesser role in signaling information structure, phrasal prominence is perceived more narrowly on structural and acoustic grounds. Prominence ratings from untrained listeners correspond with ToBI pitch accent labels for each language. Distinctions in ToBI pitch accent status (nuclear, prenuclear, unaccented) are reflected in empirical and model-predicted prominence ratings. In addition, words with a ToBI pitch accent type that is typically associated with contrastive focus are more likely to be rated as prominent in Spanish and English, but no such effect is found for French. These findings are discussed in relation to probabilistic models of prominence production and perception.
Understanding the role of prosody in encoding linguistic meaning and in shaping phonetic form req... more Understanding the role of prosody in encoding linguistic meaning and in shaping phonetic form requires the analysis of prosodically annotated speech drawn from a wide variety of speech materials. Yet obtaining accurate and reliable prosodic annotations for even small datasets is challenging due to the time and expertise required. We discuss several factors that make prosodic annotation difficult and impact its reliability, all of which relate to variability: in the patterning of prosodic elements (features and structures) as they relate to the linguistic and discourse context, in the acoustic cues for those prosodic elements, and in the parameter values of the cues. We propose two novel methods for prosodic transcription that capture variability as a source of information relevant to the linguistic analysis of prosody. The first is Rapid Prosody Transcription (RPT), which can be performed by non-experts using a simple set of unary labels to mark prominence and boundaries based on immediate auditory impression. Inter-transcriber variability is used to calculate continuous-valued prosody 'scores' that are assigned to each word and represent the perceptual salience of its prosodic features or structure. RPT can be used to model the relative influence of top-down factors and acoustic cues in prosody perception, and to model prosodic variation across many dimensions, including language variety, speech style, or speaker's affect. The second proposed method is the identification of individual cues to the contrastive prosodic elements of an utterance. Cue specification provides a link between the contrastive symbolic categories of prosodic structures and the continuous-valued parameters in the acoustic signal, and offers a framework for investigating how factors related to the grammatical and situational context influence the phonetic form of spoken words and phrases. 
While cue specification as a transcription tool has not yet been explored as RPT has, it has the potential to provide a level of detail that will be useful in modelling systematic context-governed variation in the implementation of prosodic categories, with applications in automatic speech synthesis and recognition, as well as modelling human speech production and perception. We discuss how RPT and cue specification, particularly when combined, can improve the efficiency and reliability of prosodic transcription and how they can be integrated with expert phonological transcription.
An algorithm for detecting sudden jumps in measured F0, which are likely to be inaccurate measure... more An algorithm for detecting sudden jumps in measured F0, which are likely to be inaccurate measures, is introduced. The method computes sample-to-sample differences in F0 and, based on a user-defined threshold, determines whether a difference is larger than naturally produced F0 velocities, thus, flagging it as an error. Various parameter settings are evaluated on a corpus of 30 American English speakers producing different intonational patterns, for which F0 tracking errors were manually checked. The paper concludes in recommending settings for the algorithm and ways in which it can be used to facilitate analyses of F0 in speech research.
Proceedings of International Conferences of Experimental Linguistics
We report six experiments on learnability of four non-adjacent phonotactic constraints which diff... more We report six experiments on learnability of four non-adjacent phonotactic constraints which differ in their attested frequency and phonetic conditioning factors; liquid harmony, liquid disharmony, backness harmony, and backness disharmony. Our results suggest that such phonotactic constraints can be implicitly learned from brief experience and that learnability of a phonological grammar may be independent of its attested frequency and phonetic basis.
This paper examines the usefulness of including prosodic and phonetic context information in the ... more This paper examines the usefulness of including prosodic and phonetic context information in the phoneme model of a speech recognizer. This is done by creating a series of prosodic and phonetic models and then comparing the mutual information between the observations and each possible context variable. Prosodic variables show improvement less often than phone context variables, however, prosodic variables generally show a larger increase in mutual information. A recognizer with allophones defined using the maximum mutual information prosodic and phonetic variables outperforms a recognizer with allophones defined exclusively using phonetic variables.
s Speakers produced errors on vowels less often than on consonants, and on nuclei less often tha... more s Speakers produced errors on vowels less often than on consonants, and on nuclei less often than on codas. s Rates of errors on CV and VC pairs were above chance s Errors on Vs most often occur with errors on at least one contiguous C, but not vice versa. s The Articulatory Phonology model of syllable structure, with the additional feature of sequential activation, would best predict these observed asymmetries.
Listeners can rapidly integrate intonational information to infer a speaker’s intended meaning. B... more Listeners can rapidly integrate intonational information to infer a speaker’s intended meaning. But not all components of an intonation contour contribute to meaning equally well. Prenuclear pitch accents, tonal events preceding the nuclear pitch accent in an utterance, have been described as not reliably mapping onto discourse meaning. We use mouse tracking to investigate whether German listeners can use prenuclear pitch accents to predict upcoming referential information in the utterance. Our results are compatible with the assumption that listeners ignore prenuclear accents when predicting speakers’ intentions. All materials, data, and scripts can be retrieved here: https://osf.io/xf8be/.
We investigate the prominence of English words with stress reversal (e.g. èlevátion 2-1 → élevàti... more We investigate the prominence of English words with stress reversal (e.g. èlevátion 2-1 → élevàtion 1-2). We ask what motivates the occurrence of the “early high” (1-2) pattern outside of stress clash contexts, and consider the hypothesis that it marks prominence non-locally. Experiment 1 tests the effect of prominence pattern on memory. Given its markedness and location at phrasal onset, we hypothesize that early high pitch broadly facilitates recall for sentence information. This hypothesis is not confirmed, suggesting that the effect of pitch accent on memory may be restricted to the accented word. In Experiment 2 listeners perform a prominence-rating task on the same patterns. Results show that early high is prominence-lending, but with weaker prominence than the lexical (2-1) stress pattern. The combined findings suggest a hybrid function for early high in marking the beginning of a discourse-level prosodic unit, and in lending prominence to the early high-accented word.
Information structure is said to play an important role in determining phrasal prominence and the... more Information structure is said to play an important role in determining phrasal prominence and the assignment of nuclear pitch accents in English. Early accounts claim that discourse-new or focused words receive a prominence-lending high/rising pitch accent, while given words are unaccented, with reduced prominence. Empirical findings are varied, but paint a more complex picture of the prosodic encoding of information structure. The present study investigated the phonological and phonetic encoding of information status and contrastive focus in nuclear position in American English, from speech read under neutral and lively affect. Given information was associated with decreased phonological and phonetic prominence, contrastive information with enhanced prominence, while new information corresponded to increased phonological, but not phonetic prominence, as assessed in pitch accent type, duration, intensity, and voice quality. The findings indicate a probabilistic relationship between ...
Phonological accounts of speech perception postulate that listeners map variable instances of spe... more Phonological accounts of speech perception postulate that listeners map variable instances of speech to categorical features and remember only those categories. Other research maintains that listeners perceive and remember subcategorical phonetic detail. Our study probes memory to investigate the reality of categorical encoding for prosody—when listeners hear a pitch accent, what do they remember? Two types of prosodic variation are tested: phonological variation (presence vs. absence of a pitch accent), and variation in phonetic cues to pitch accent (F0 peak, word duration). We report results from six experiments that test memory for pitch accent vs. cues. Our results suggest that listeners encode both categorical distinctions and phonetic detail in memory, but categorical distinctions are more reliably retrieved than cues in later tests of episodic memory. They also show that listeners may vary in the degree to which they remember prosodic detail.
Perceived prominence in Russian and Hindi, free word order languages, can be communicated prosodi... more Perceived prominence in Russian and Hindi, free word order languages, can be communicated prosodically and structurally, via word order. Paired production and perception experiments with native speakers show that discourse-prominent constituents are marked acoustically, via a perceptible increase in vowel intensity and f0, and structurally, via a change in word order and placing a word into a designated position in a sentence or clause.
1. Phonetic considerations in phonological research Phonology is concerned with characterizing th... more 1. Phonetic considerations in phonological research Phonology is concerned with characterizing the sound patterns of language, typically presented in terms of a system of contrastive sound elements (e.g., syllables, segments, features) and the distribution of those sounds in the makeup of phonological words and phrases. This focus on the sound system and the characteristic patterns that arise when sounds combine to form words is what distinguishes the study of phonology from the study of phonetics, as these two fields are traditionally construed. Yet the phonologist’s perspective on sound systems is typically rooted in knowledge about the phonetic properties of the sound elements that make up a language, and reflects direct observation of phonetic form, through the production and perception of spoken words and phrases.
Examining articulatory compensation has been important in understanding how the speech production... more Examining articulatory compensation has been important in understanding how the speech production system is organized, and how it relates to the acoustic and ultimately phonological levels. This paper offers a method that detects articulatory compensation in the acoustic signal, which is based on linear regression modeling of co-variation patterns between acoustic cues. We demonstrate the method on selected acoustic cues for spontaneously produced American English stop consonants. Compensatory patterns of cue variation were observed for voiced stops in some cue pairs, while uniform patterns of cue variation were found for stops as a function of place of articulation or position in the word. Overall, the results suggest that this method can be useful for observing articulatory strategies indirectly from acoustic data and testing hypotheses about the conditions under which articulatory compensation is most likely.
Cooperative and competitive game dialogs are comparatively examined with respect to temporal, bas... more Cooperative and competitive game dialogs are comparatively examined with respect to temporal, basic text-based, and dialog act characteristics. The condition-specific speaker strategies are amongst others well reflected in distinct dialog act probability distributions, which are discussed in the context of the Gricean Cooperative Principle and of Relevance Theory. Based on the extracted features, we trained Bayes classifiers and support vector machines to predict the dialog condition, that yielded accuracies from 90 to 100%. Taken together the simplicity of the condition classification task and its probabilistic expressiveness for dialog acts suggests a two-stage classification of condition and dialog acts.
Production and perception experiments with native speakers of Russian, a free word order language, show that prosody and change in word order are used to mark discourse-prominent constituents. Concurrent application of these cues to prominence is possible, as evident from distinctively higher f0 and intensity maxima, and duration values associated with ex-situ words, as well as their higher visibility in discourse. Distinctive acoustic-prosodic realization of ex-situ words may cue their relatively high informational load and discourse prominence, as well as (redundantly) signal that the word is left- or right-dislocated.
7th International Conference on Speech Prosody 2014, 2014
The perception of prosodic structure (phrasal prominences and boundaries) may depend in part on acoustic cues in the speech signal and in part on utterance meaning as related to syntactic structure and discourse context. In this study we ask if listeners are able to differentially weigh acoustic and meaning-based cues to prosody. We test naive subjects’ transcription of prominences and boundaries in spontaneous American English under three different conditions, all of which involve listening to audio recordings and marking prominences and boundaries on a transcript. The three conditions differ in the instructions given to transcribers. In one condition, subjects were instructed to transcribe prominence and boundaries based on meaning criteria, in a second condition they were told to transcribe based on criteria of acoustic salience, and a third condition had less specific instructions, without explicit reference to either meaning-based or acoustic cues. Our results show that subjects perform differently when focusing on meaning than when focusing on acoustics, especially for prominence marking, where partially different sets of words are selected as prominent under the two tasks. Boundary marking is more similar under the two instructions, with acoustic criteria resulting in more listeners marking a given word as pre-boundary, but with boundaries marked largely on the same words in both tasks. With non-specific instructions, performance was similar to that obtained under acoustic-based instructions. We report on agreement rates within and across conditions. This study has implications for models of prosody perception and the methodology of prosodic transcription.
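To make the idea of per-word listener counts and agreement rates concrete, the sketch below computes, from binary marks by several transcribers, the proportion of transcribers who marked each word and a chance-corrected agreement value (Fleiss' kappa). The abstract does not say which agreement statistic was used, so the choice of Fleiss' kappa, the function names, and the toy data are assumptions for illustration only.

```python
def word_scores(marks):
    """Proportion of transcribers who marked each word.

    `marks` is a list of rows, one per transcriber; each row holds a 0/1
    mark per word (1 = marked as prominent or as pre-boundary).
    """
    n_raters = len(marks)
    n_words = len(marks[0])
    if any(len(row) != n_words for row in marks):
        raise ValueError("every transcriber must label the same words")
    return [sum(row[w] for row in marks) / n_raters for w in range(n_words)]


def fleiss_kappa_binary(marks):
    """Fleiss' kappa for binary marks, with every rater labeling every word."""
    n = len(marks)        # number of raters
    if n < 2:
        raise ValueError("need at least two transcribers")
    scores = word_scores(marks)
    n_items = len(scores)
    # Per-word counts of "marked" (round guards against float error in score * n).
    marked = [round(s * n) for s in scores]
    # Observed agreement: mean proportion of agreeing rater pairs per word.
    p_obs = sum(
        (m * m + (n - m) * (n - m) - n) / (n * (n - 1)) for m in marked
    ) / n_items
    # Expected agreement from the overall rate of each category.
    p_marked = sum(marked) / (n_items * n)
    p_exp = p_marked ** 2 + (1 - p_marked) ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when all marks are identical")
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data: 3 hypothetical transcribers, 4 words.
marks = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]
scores = word_scores(marks)
kappa = fleiss_kappa_binary(marks)
```

Comparing such values within one instruction condition and across conditions would give one possible way to quantify how similarly two groups of listeners mark the same words.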
7th International Conference on Speech Prosody 2014, 2014
This paper investigates how prosodic elements such as prominences and prosodic boundaries in Hindi are perceived. We approach this using data from three sources: (i) native speakers of Hindi without any linguistic expertise, (ii) a linguistically trained expert in Hindi prosody, and finally, (iii) classifiers trained on English for automatic prominence and boundary detection. We use speech from a corpus of Hindi narrative speech for our experiments. Our results indicate that non-expert transcribers do not have a consistent notion of prosodic prominences. However, they show considerable agreement regarding the placement of prosodic boundaries. Also, relative to the non-expert transcribers, there is higher agreement between the expert transcriber and the automatically derived labels for prominence (and prosodic boundaries); this suggests the possibility of using classifiers for the automatic prediction of these prosodic events in Hindi.
Understanding the structure of intonational variation is a longstanding issue in prosodic research. A given utterance can be realized with countless intonational contours, and while variation in prosodic meaning is also large, listeners nevertheless converge on relatively consistent form-function mappings. While this suggests the existence of abstract intonational representations, it has been unclear how exactly these are defined. The present study examines the validity of a well-defined set of phonological representations for the generation of intonation in the nuclear region of an intonational phrase in American English: namely, the combination of binary pitch accents (H*/L*), phrase accents (H-/L-), and boundary tones (H%/L%) proposed in Pierrehumbert (1980). In an exploratory study, we examined whether speakers maintained the eight-way distinction among intonational contours posited to exist in this representational system. We created eight synthesized contours according to Pierrehumbert (1980) and examined whether listeners generalized these contours to novel productions. Speakers largely distinguished rising from non-rising contours in production, but few other distinctions were maintained. While this does not rule out the existence of additional contours in production, these findings do suggest that the representation of rising and non-rising contours may be privileged and more readily accessible in the intonational grammar.
This study tests the influence of acoustic cues and non-acoustic contextual factors on listeners' perception of prominence in three languages whose prominence systems differ in the phonological patterning of prominence and in the association of prominence with information structure: English, French and Spanish. Native speakers of each language performed an auditory rating task to mark prominent words in samples of conversational speech under two instructions: with prominence defined in terms of acoustic or meaning-related criteria. Logistic regression models tested the role of task instruction, acoustic cues and non-acoustic contextual factors in predicting binary prominence ratings of individual listeners. In all three languages we find similar effects of prosodic phrase structure and acoustic cues (F0, intensity, phone-rate) on prominence ratings, and differences in the effect of word frequency and instruction. In English, where phrasal prominence is used to convey meaning related to information structure, acoustic and meaning criteria converge on very similar prominence ratings. In French and Spanish, where prominence plays a lesser role in signaling information structure, phrasal prominence is perceived more narrowly on structural and acoustic grounds. Prominence ratings from untrained listeners correspond with ToBI pitch accent labels for each language. Distinctions in ToBI pitch accent status (nuclear, prenuclear, unaccented) are reflected in empirical and model-predicted prominence ratings. In addition, words with a ToBI pitch accent type that is typically associated with contrastive focus are more likely to be rated as prominent in Spanish and English, but no such effect is found for French. These findings are discussed in relation to probabilistic models of prominence production and perception.
Papers by Jennifer Cole