Prosodic Correlates of Sentences in Signed Languages: A Literature Review and Suggestions for New Types of Studies

Ellen Ormel and Onno Crasborn

Sign Language Studies, Vol. 12, No. 2, Winter 2012. DOI: 10.1353/sls.2011.0019

Ellen Ormel is a postdoc researcher at the Centre for Language Studies, Department of Linguistics at Radboud University. Onno Crasborn is a senior researcher, also at the Centre for Language Studies, Department of Linguistics at Radboud University. Production of this article was made possible by EU FP7-ICT-2007-3 grant no. 231424, "SignSpeak"; ERC Starting Research grant no. 210373, "On the Other Hand"; and NWO VIDI grant no. 276-70-012, "On the Other Hand."

The identification of sentences within a larger stream of signed discourse is not a trivial undertaking. Using established linguistic criteria, one can categorize units such as predicates and arguments or distinguish main clauses; however, semantic and syntactic analyses are ongoing points of issue and debate in signed language research (e.g., Liddell 1980, 2003; Johnston 1991; Engberg-Pedersen 1993; Neidle et al. 2000; Sandler and Lillo-Martin 2006). As with spoken languages, how the lexical and syntactic materials of a signed language actually appear to the viewer (i.e., the phonetic form of the language) is mediated by a phonological level of organization. Phonological forms from the syllable upward have commonly been called "prosody" for sign languages, just as the rhythmic and melodic properties of spoken languages have been (Sandler 1999a, 1999b; Nespor and Sandler 1999). While many authors argue that the overall design of the grammar of signed languages shows numerous similarities to spoken language organization (e.g., Meier 2002; Sandler and Lillo-Martin 2006; Sandler 2010), the phonetic substance of signed languages is very different from that of spoken languages (e.g., Crasborn 2001; Brentari and Crossley 2002; Sandler forthcoming). The phonetic correlates of rhythm and intonation in signed languages consist of nonmanual activities and modifications of manual phonological material. As for the syntactic level of organization, it is not self-evident how linguists can identify clause- or sentence-like units in signed languages on the basis of phonetic properties of the prosodic form. The same is likely to hold for sign-language users. As for spoken languages (Pierrehumbert 1980), phonetics does not always neatly reflect phonology (Crasborn 2001; Johnson and Liddell 2010, 2011).
At the same time, it is well known from psycholinguistic studies on spoken language that prosody does help in the perception and recognition of spoken language (Gerken 1996; Cutler, Dahan, and van Donselaar 1997; Frazier, Carlson, and Clifton 2006). It is likely that this is also the case for signed-language processing: Sign prosody helps to parse long streams of visible events into morphosyntactic and discourse units. In a number of empirical studies, both syntactic and prosodic perspectives on the sentence have been articulated (Nicodemus 2006, 2009; Crasborn 2007; Fenlon et al. 2007; Hansen and Heßmann 2007; Herrmann 2009; Hochgesang 2009; Jantunen 2007). These all (implicitly or explicitly) subscribe to the conception of prosody as being related to, but not a direct expression of, syntactic structure (cf. Sandler 2010). Thus, phonetic events (whether on the face or on the hands) indirectly point to phonological structure, just as F0 (fundamental frequency) and duration in spoken language are reflections of tones and prosodic groupings in speech (Pierrehumbert 1980; Selkirk 1984; Ladd 1996; Gussenhoven 2004). Phonological domains such as the utterance and the intonational phrase relate to syntactic structure, but the actual phrasing depends in part on performance factors like speaking rate (Selkirk 1984; Nespor and Vogel 1986). Thus, the same syntactic string of words can be articulated in different ways, with different phrasing, showing more intonational phrases when it is articulated very slowly than when it is realized at high speed. The relationship between syntax and phonetics in signed languages may reflect a similar relationship to that suggested for spoken languages (Shattuck-Hufnagel and Turk 1996). One of the possible models Shattuck-Hufnagel and Turk conceive is presented in figure 1. An alternative they suggest is a joint phonological component for prosody and segmental phenomena.

Figure 1. The place of prosody in the grammar (from Shattuck-Hufnagel and Turk 1996).

While there is little consensus on the presence of a unit "segment" in the phonology of signs, there surely is a subsyllabic level of organization that contains the phonological representation of lexical items (consisting of one- vs. two-handedness, selected fingers and their configuration, orientation, a place of articulation, and movement properties) (Sandler 1989; Brentari 1998; Crasborn 2001; van der Kooij 2002; van der Kooij and Crasborn 2008).

In this article we do not contribute new evidence to either prosodic or grammatical perspectives on the sentence, but we present a literature review of prosodic evidence for large prosodic domains that can be equated with a syntactic unit "clause" or "sentence." This article was produced in the context of a European project on automatic sign-language recognition and translation (SignSpeak, http://www.signspeak.eu), in which (after recognition of lexical items) the stream of signs in a video recording has to be segmented in some way by software in order to enable its translation into the written form of a spoken language. This translation process is tied to sentences in the output language. For that purpose, it is important to ask whether sentence units are in some way visible in video recordings of signed discourse. It is not crucial per se whether the units in question are actually sentences or sentence parts (clauses).
For that reason, in this article we do not address the topic of syntactic structure (nor, as a consequence, do we ask how specific phonological domains are generated from the syntax). In addition, we use the terms "sentence" and "clause" fairly loosely, aiming to refer to a high syntactic domain that can be translated as a full written sentence. On the basis of the various ideas and views in the literature, we describe two types of possible studies that can contribute to our understanding of prosodic cues to sentence boundaries. First, we suggest refinements to the perception studies referred to earlier. Second, we suggest that new techniques from visual signal processing can be exploited to detect salient events in a video recording of signing. Finally, we briefly discuss how knowledge of the prosody of signed languages can be employed for language technology, such as automatic translation from signed to spoken language.

Literature Review

For a sentence to be understood by an addressee as a cohesive proposition, one must utilize information from several linguistic cues, including the syntactic, semantic, and discourse structure of the utterance. All utterances and their components necessarily have a specific duration, amplitude, and fundamental frequency, and these are affected by the prosodic structure. When normal speech is recognized, prosodically determined variation is being processed (Cutler, Dahan, and Van Donselaar 1997). The aim of many studies of prosody is to understand this process of recognition. In contrast to prosody in spoken languages, prosody in signed languages has been studied to only a very limited extent, and empirical studies have focused on only a very limited number of languages. First we describe some studies on prosody in relation to sentences in spoken languages.

Prosody and Sentences in Spoken Languages

In spoken languages, prosody is a widely discussed topic. Within the research on prosody, boundary features have been the most studied topic (Cutler, Dahan, and Van Donselaar 1997). In a tutorial on prosody, Shattuck-Hufnagel and Turk (1996) explained the characteristics of spoken utterances that written equivalents lack in terms of patterns of prosodic structure: intonation, timing, and variations in segmental implementation. The organization of a spoken utterance is not isomorphic to its morphosyntactic structure. Therefore, prosody is of great importance for (auditory) sentence processing by human perceivers: We cannot hear all of the syntactic structure of a sentence directly but are dependent on partly indirect cues such as those of prosody. The same is likely to be true for sign-language processing if one assumes the same general design of the grammar as in figure 1.

Prosody has been defined in a number of ways. One definition focuses on the phonological organization of parts into higher-level constituents and the hierarchies of relative prominence within these constituents (Selkirk 1984; Nespor and Vogel 1986). Constituents include intonational phrases, prosodic phrases, and prosodic words. Another definition of prosody refers to the acoustic parameters that presumably signal constituent boundaries and prominence: F0, duration, amplitude, and segment quality or reduction (Laver 1994).
A third group of definitions combines the phonological aspect of prosody at the higher-level organization with the phonetic effects of that organization (e.g., F0, duration, segment quality/reduction). Shattuck-Hufnagel and Turk (1996) use the following working hypothesis: "Prosody is both (1) the higher-level structures that best account for these patterns and (2) acoustic patterns of F0, duration, amplitude, spectral tilt, and segmental reduction, and their articulatory correlates, that can be best accounted for by reference to higher-level structures" (196). In an extensive literature review on prosody in the comprehension of spoken language, Cutler, Dahan, and Van Donselaar (1997) similarly stated that the term "prosody" is used in different ways: "From at one extreme those who maintain an abstract definition not necessarily coupled to any statement about realization ('the structure that organizes sounds'), to those who use the term to refer to the realization itself, that is, effectively use it as a synonym for suprasegmental features ('e.g., pitch, tempo, loudness, pause') at the other extreme" (142).

One example of work concerning the psychological reality of prosodic constituents was carried out by Bögels et al. (2009), who suggested that prosodic information can be sufficient for perceivers to determine the syntactic analysis of a spoken sentence. In other words, prosody not only provides additional support for the interpretation of spoken sentences but also directs the syntactic analysis. Cutler, Dahan, and Van Donselaar (1997) explained that the presence of prosodic information cueing a boundary can influence listeners' syntactic analyses. However, they suggested that the evidence for this influence is not yet robust. Several studies have shown that the listener does not always exploit the available prosodic cues in spoken languages (see Cutler, Dahan, and Van Donselaar 1997 for discussion). Furthermore, work on prosody in relation to sentence boundaries by Carlson, Clifton, and Frazier (2001) suggests that the interpretation of a prosodic sentence boundary is related to the existence and relative size of other prosodic boundaries in the sentence. This suggestion contrasts with the idea that boundaries have a constant form independent of context. The different findings on the role of prosody in determining syntactic analyses, as well as the findings on the exploitation and interpretation of prosodic cues, indicate that conclusions about the specific role of prosody in spoken-language processing are still refutable.

Shattuck-Hufnagel and Turk (1996) have provided four helpful hints to support further research on auditory sentence processing that may form a starting point for looking at signed utterances: "1. Since prosody can't be predicted from text, specify the prosody of stimulus utterances as it was really produced; 2. Since prosody can't be read off the signal alone, inform acoustic measurements by perceptual transcription of the prosodic structure of target utterances; 3. Consider interpretation of results in terms of prosodic as well as morpho-syntactic structure; 4. Define those terms" (241).

Prosody in Signed Languages

Although knowledge of spoken-language prosody has advanced in the past decade, work on prosody in signed languages is still in its infancy, and many questions need further investigation.
The study of sentence boundaries in relation to prosody in sign language is one of the areas that have been studied to a rather limited extent. The same four hints by Shattuck-Hufnagel and Turk (1996) on spoken-sentence processing appear useful for looking at signed-sentence processing. We consider especially the first and second points in our suggestions for future approaches to studies of prosody in signed languages. The nature of prosody in sign language is discussed next, and we follow that with a review of a number of studies on prosody at sentence boundaries in signed languages. Subsequently, we describe several recent developments in the automatic detection of sentence boundaries in spoken and signed language.

Nespor and Sandler (1999) have defined the basic research theme of whether prosodic patterns are exclusively present in spoken languages or also occur in signed languages (although earlier studies have addressed similar issues; e.g., Wilbur 1994). The latter finding would form an argument for prosody's being a universal property of human language irrespective of modality. The finding that there is nonisomorphism between syntactic and prosodic constituents indicates that the requirement for rhythmic structure forms an independent property of phonological organization (see also Sandler and Lillo-Martin 2006). Several studies on prosodic constituents have since been conducted in signed languages. Nespor and Sandler (1999) have presented evidence that Israeli Sign Language (ISL) sentences do indeed have separate prosodic constituents such as phonological phrases (PP) and, at a higher level in the prosodic hierarchy, intonational phrases (IP) (see also Sandler and Lillo-Martin 2006).

In 1991, Allen, Wilbur, and Schick studied the rhythmic structuring of sign language, in their case American Sign Language (ASL), by looking at three groups: ASL-fluent deaf signers, ASL-fluent hearing adults, and nonsigning hearing adults. The participants were asked to tap a metal stick in time to the rhythm of signed narratives. Participants in each of the three groups tapped in a rhythmical way. In their report, Allen, Wilbur, and Schick (1991) explained that nonnative signers are often detected by the lack of rhythm in their signing. Some differences were found between the rhythmic identification patterns of the hearing nonsigners and the deaf signers, which showed that the rhythm of ASL is fully observed only when participants have a good grasp of sign language.

Temporal aspects of signs that might function as prosodic elements were analyzed for Swiss-German Sign Language by Boyes Braem (1999), in an attempt to determine what makes the signing of native signers seemingly more rhythmic than the signing of late learners. For that purpose, three early signers were compared to three late signers. Two kinds of rhythmic patterns (typically referring to beat or stress) were found in the study, which may underlie the observation of signing as rhythmic. The first pattern was the temporal balancing of syntactic phrases: the phenomenon in which two subsequent phrases of approximately the same duration are produced. Here, no differences were found between early and late learners of sign language. The second pattern was a regular side-to-side movement of the torso, which appears to phonetically mark larger parts of certain types of discourse. Early learners showed this pattern more than late learners.
Prosodic cues can occur on the manual articulators (hands and arms) and the nonmanual articulators (face, head, and body). Nonmanuals typically also add semantic information to the manual signs. Nonmanual markers include head position, body position, eyebrow and forehead actions, eye gaze, nose position, and mouth, tongue, and cheek actions. As Wilbur (2000, 237) states, "Nonmanual markers are integral components of the ASL intonation system, performing many of the same functions in the signed modality that pitch performs in the spoken modality." In a number of studies, nonmanuals were investigated to gain insight into prosodic boundaries. In general, nonmanuals cue either the start or the end of phrases (boundary markers) or their duration (domain markers). Wilbur also found that the significant degree of layering in American Sign Language is a consequence of the modality. Layering serves prosodic purposes, the most apparent of which is the simultaneous production of nonmanual markings with manual signs.

As we emphasize several times in this article, all of these phonetic events that are related to lexical semantics, syntax, information structure, or pragmatics also have a phonological form (see also Sandler 2010 for further discussion). The fact that they are clearly categorized and described as "X" expressing "Y" implies that the phonological analysis on the basis of the phonetic events has already taken place. However, this derivation from variable phonetic events to categorical phonological values ("brow raise") is typically not discussed in great detail in the literature. The phonetics of speech prosody is extremely complex (cf. the work of Pierrehumbert 1980 and related studies on phonetic implementation since her work), and we expect an equally complex relation to exist between phonology and phonetics in signed languages (Crasborn 2001).

Several nonmanual cues can serve as boundary markers. Eye blinks are one of the most frequently mentioned boundary markers. As early as 1978, Baker and Padden first pointed out the importance of eye blinks in sign-language research, in particular (what have later been referred to as) inhibited periodic blinks, which often seem to occur at the end of intonational phrases (see also Wilbur 1994). As opposed to boundary markers, several nonmanual articulators can function as domain markers: eyebrow position, eye gaze, head tilt, negative headshake, body leans, and body shift. These domain markers often begin at the start of a constituent and end when the constituent ends. As they are articulated independently of the manual string of signs and each articulator can maintain a specific position, these positions can be kept constant for a certain time span. It is this phonetic affordance of the visual modality that makes the presence of domain markers specifically prominent and their roles quite diverse. Although tone in spoken language can also be raised or lowered for a certain domain, the many factors influencing F0 in speech (including a natural tendency to gradually lower one's voice over a sequence of syllables) make it quite hard to mark domains for a perceiver in a constant way. For some nonmanual articulators in signed languages, it is slowly becoming apparent that there, too, are more influences on their phonetic appearance than was initially thought (see, for example, De Vos, van der Kooij, and Crasborn 2009 on eyebrows in Sign Language of the Netherlands).
Part of the work on prosody in spoken and signed languages has focused on segmentation and sentence boundaries. In the following section we summarize a series of studies that bear specifically on the identification of clause and sentence boundaries in signed languages.

Prosody at Sentence Boundaries in Signed Languages

In the following overview and the subsequent thoughts on future directions, we attempt to achieve more insight into prosodic processing in signed languages, in particular in relation to the segmentation of signed discourse into clause or sentence units and the different prosodic boundary cues that relate to sentence boundaries. We start with a review of the literature on eye blinks, which are frequently discussed, and then move on to studies that look at various cues in combination. The studies have varying empirical bases in terms of the number of signers studied and the nature and amount of data analyzed. Appendix 1 gives a summary of the data that form the basis of the studies we refer to here. This overview makes it clear that most production studies are not based on an experimental paradigm but typically analyze a small collection of sentences or short narratives from one or a few signers.

Studies of Eye Blinks. Wilbur (1994) studied eye blinking in signers of ASL. This study showed that their eye blinks were sensitive to syntactic structure. These findings provide insight into how intonational information appears in a signed language, information that is carried by pitch in spoken languages. Wilbur also examined a range of other nonmanual markers and distinguished between nonmanual markers carried on the lower part of the face and those on the upper face and head/body. The first were analyzed as performing a lexical semantic function, whereas the latter perform grammatical and prosodic functions. The nonmanuals that marked phrasal boundaries were blinks, head nods, change of head/body position, and pauses. Wilbur distinguished three basic types of eye blinks (following the physiological literature, Stern and Dunham 1990): (1) reflexive blinks (not studied), (2) involuntary or periodic blinks, and (3) voluntary blinks. The results showed that signers typically blink at the end of intonational phrases (the right edge of an ungoverned maximal projection); these are the involuntary blinks. Moreover, voluntary blinks are longer in duration and have greater amplitude than involuntary blinks. Voluntary blinks occur on lexical signs. Wilbur identified four different functions for blinks at boundaries: the marking of (a) syntactic phrases, (b) prosodic phrases (intonational phrases), (c) discourse units, or (d) narrative units.

Sze (2008) studied the relationship between blinks, syntactic boundaries, and intonational phrasing in Hong Kong Sign Language (HKSL). Wilbur's findings on voluntary blinks (marking emphasis, assertion, or stress) and involuntary blinks (marking intonational boundaries) in ASL were evaluated empirically in HKSL. Wilbur's classification was insufficient to explain all of the data in the study on HKSL. Blinks in HKSL also bore a high correlation with head movement and gaze changes. Moreover, they could co-occur with the syntactic boundaries of constituents smaller than a clause. Wilbur distinguished between lexical blinks and boundary blinks.
Lexical blinks are voluntary and occur simultaneously with the lexical item, whereas boundary blinks are involuntary and occur at intonational phrase boundaries. The differences between the studies by Wilbur and by Sze may have resulted from differences in the exact measurement of the duration of a blink, differences in the exact measurement of the duration of a sign, and differences in determining whether a blink is produced voluntarily or involuntarily. Whereas in Wilbur's study 90 percent of the boundary blinks fell right on the intonational phrase boundaries, this was true for only 57 percent in HKSL. Second, many of the blinks (30 percent) in the HKSL data co-occurred with the last sign in a sentence, which seems to function as a boundary marker itself; according to Wilbur's description, however, these would have to be interpreted as lexical blinks. In addition, Sze states that some blinks are accompanied by another blink at the same sign. To account for these differences from Wilbur's findings, Sze proposes a new classification of blinks: (1) physiologically induced; (2) boundary sensitive; (3) related to head movement/gaze change and unrelated to syntactic boundaries; (4) voluntary/lexically related; (5) associated with hesitations or false starts. Types 1 and 5 do not have a linguistic function in the sense of being related to a specific lexical or grammatical form, although they can potentially have a function in perception in being related to performance phenomena such as false starts or hesitations. In addition to the new classification system, Sze suggested that changes in head movement may, in fact, serve better than blinks as clues to intonational phrase boundaries, given that blinks often co-occur with head movements. Moreover, it is rather uncertain whether the addressee makes use of blinks in perception, whereas head movements are larger and more easily observable, which makes them more likely to indicate intonational phrase boundaries.

In a study of prosodic domains in ASL, Brentari and Crossley (2002) used an extreme interpretation of Wilbur's findings by adopting inhibited, involuntary eye blinks in their methodology as the basic dividing marker between intonational phrases. On the basis of that division, further manual and nonmanual markers of phonological phrases were sought. It appears that they did not take into account the 10 percent of IP boundaries in Wilbur's study on ASL (and 43 percent of IP boundaries in Sze's study on HKSL) that were not accompanied by an involuntary boundary blink. It seems to us that the conclusion that involuntary blinks can be seen as reliable boundary markers may be slightly premature.

Crasborn, van der Kooij, and Emmerik (2004) performed a small study similar to those by Wilbur (1994) and Sze (2008) on the prosodic role of eye blinks. Frequencies and locations of blinks were established in different types of data (monologues versus question-answer pairs). The results indicate that a distinction seems to be present between shorter and longer blinks. Longer blinks seemed to resemble other "eye aperture" functions such as wide eyes and squinted eyes. Wide eyes express surprise, disgust, and emphasis, whereas squinted eyes express fear and shared information. Closed eyes appear to express disgust and counterassertion in Sign Language of the Netherlands (NGT) and may be related to reference.
Sze (2008) emphasizes that the frequency and duration of involuntary blinks, and possibly also the frequency and duration of other types of blinks, are likely to be influenced by the physical context, including factors such as humidity, the amount of dust in the environment, and temperature. This makes it hard to compare results across studies, and also findings from different recording sessions. In sum, there does not appear to be a strict mapping of one articulator to one function in the eye-aperture parameter. One open question based on the literature concerns the distinction between a periodic and a voluntary blink; it is unlikely that this can be determined purely on durational criteria. The results in the various studies also showed that blinks occurred at many locations in sentences, and it is an open question whether they all relate to linguistic elements. Another open question is whether sign-language perceivers actually can and do perceive blinks, including the brief involuntary blinks, in normal interaction. To date there has been no targeted study of any sign language on the role of eye blinks in the perception of prosodic structure.

Finally, we would like to emphasize that what we can actually see in a video recording is not a phonological shape but one specific phonetic instance of more general phonological categories. The phonetic implementation of a phonological form in sign is not likely to be much less complex than what we know from intonation research on languages like English or Dutch (see Crasborn 2001 on the phonetic implementation of manual phonological categories in sign languages; see Ladd 1996 and Gussenhoven 2004 for overviews of phonetic implementation in prosody). In relation to eye blinks, this implies that the duration and timing of the "lexical," emphatic blinks may vary up to a point where their phonetic characteristics overlap with those of prototypical involuntary blinks. The phonetic forms will have to be related to factors like signing speed in order to find any clear distinction between the two. For this, experimental phonetic studies will be indispensable.

In addition to phonological forms that are linked to a semantic or syntactic feature, there may also turn out to be prosodic phonological features that do not have any semantic or grammatical content but function only to signal prosodic boundaries. If one conceives of inhibited periodic eye blinks as similar to boundary tones in spoken languages, for example, then these still relate to the grammar in marking instances of the prosodic domain "intonational phrase," which is derived from the syntactic structure of the sentence. What makes them different is that their occurrence is induced by the physiology of the eye; only their timing appears to be related to prosodic structure. Thus, they would not form part of any phonological specification but would be generated purely by the phonetics. This makes it all the more likely that we will find large variation in the presence (vs. absence) of blinks, depending on various performance factors. The comparison with breathing comes in here as a comparable physiological condition related to speech events that may affect the appearance of some of the phonetic phenomena that we can observe and would otherwise be inclined to ascribe to linguistic structure.
Studies of Multiple Sentence Boundary Cues. Nespor and Sandler (1999) found that four markers almost always occur at phonological phrase boundaries in ISL: (1) reduplication (reiteration of the sign); (2) hold (freezing the signing hand or hands in their shape and position at the end of the last sign); (3) pause (relaxing the hands briefly); and (4) separate facial articulation. Hold, reduplication, and pause may all belong to the same phenomenon of lengthening, given that these markers are almost never used simultaneously (see, e.g., Byrd, Krivokapic, and Lee 2006 on lengthening in spoken languages). Moreover, the occurrence of an increased number of repetitions depends on the lexical sign's having a specific distinctive feature (repeated movement). Each of these specific prosodic cues may be perceptually prominent and suffice for the recognition of a domain. However, the modality favors a layering of information, for example in facial articulations such as eyebrows, eyelids, and mouth, which may be simultaneously present. The issue of the simultaneous occurrence of prosodic cues requires further examination in future studies, as one would predict that facial articulation can co-occur with the manual cues.

According to Sandler and Lillo-Martin (2006), even more obvious prosodic cues occur at the boundaries of intonational phrases than at those of phonological phrases. Intonational phrases are at a higher prosodic level than phonological phrases. In spoken languages, intonational phrases are produced in one breath and begin with a new breath of air. A correspondence exists between breathing in spoken language and eye blinks in sign language in that both are imposed by our physiology and regularly occur irrespective of whether we are uttering linguistic units. It would therefore not come as a surprise if eye blinks indeed indicate intonational boundaries in signed languages. In addition to blinks, at least two other characteristics noticeably indicate IP boundaries: changes in head position and major changes in facial expression. During an IP, head position seems to remain constant up to the boundary, where it clearly changes, providing a rhythmic cue for those phrases (Sandler and Lillo-Martin 2006).

Fenlon et al. (2007) studied deaf native signers to determine whether they agreed on the locations of boundaries in narratives and what cues are used when parsing the narratives. In addition, hearing nonsigners were asked to mark sentence boundaries in order to compare the visual cues used by deaf native signers and hearing nonsigners. Six native signers of British Sign Language (BSL) and six nonsigners were asked to mark sentence boundaries in two narratives, one in BSL and one in Swedish Sign Language (SSL). Narratives were segmented in real time, using the ELAN annotation tool (http://www.lat-mpi.eu/tools/elan/). Before assessing participants' responses, all intonational phrase (IP) boundaries in both the BSL and the SSL narrative were identified using a cue-based approach. Similar to the approach taken by Brentari and Crossley (2002), the occurrence of a blink between signs was used to indicate possible IP boundaries, and these boundaries were further verified by the presence of other cues, such as pauses and head nods. Following identification, a strict 1.5-second window was applied to all IP boundaries in both signed narratives, within which responses associated with that boundary could occur.
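To make this response-window procedure concrete, the following minimal Python sketch counts, for each annotated IP boundary, the real-time button presses that fall within a 1.5-second window after it. The function name, the toy data, and the policy of crediting each response to at most one boundary are our own illustration, not taken from Fenlon et al. (2007).

```python
# Sketch of boundary-window scoring for real-time segmentation responses.
# All names, data, and the matching policy are illustrative assumptions.

def count_responses_per_boundary(boundaries, responses, window=1.5):
    """For each annotated IP boundary time (in seconds), count how many
    participant responses fall within `window` seconds after it.
    Each response is credited to at most one boundary."""
    counts = {b: 0 for b in boundaries}
    unused = sorted(responses)
    for b in sorted(boundaries):
        remaining = []
        for t in unused:
            if b <= t <= b + window:
                counts[b] += 1
            else:
                remaining.append(t)
        unused = remaining
    return counts

# Example: three annotated boundaries, five presses from one group.
boundaries = [2.0, 6.4, 11.8]
responses = [2.9, 3.1, 7.0, 9.0, 12.4]
print(count_responses_per_boundary(boundaries, responses))
# {2.0: 2, 6.4: 1, 11.8: 1} -> the press at 9.0 matches no boundary
```

A boundary that collects responses from at least four participants would then count as a "strong" boundary in the sense used below.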
Results indicated that the majority of responses from both groups in both narratives fell at IP boundary points (rather than at only a phonological phrase boundary or a syntactic boundary). The results showed that the number of cues at each boundary varies from two to eight (occurring simultaneously), showing prosodic layering in the sense of Wilbur's (2000) work. Several cues were frequently present at boundaries, such as head rotation, head nods, blinks, and eyebrow movements. Blinks were one of the cues that very frequently co-occurred with a boundary. However, many of those blinks were detected by only a limited number of signers (low boundary agreement), which is perhaps related to the limited perceptual salience of eye blinks. Some other cues, such as pauses, drop hands, and holds, seemed to occur mainly at strong IP boundaries (i.e., boundaries agreed upon by at least four native signers or hearing nonsigners). Because the cues could occur simultaneously and sequentially, it was difficult to detect those that the participants actually used to segment the narratives in the real-time online segmentation. No relationship was found between the number of cues present at an IP boundary and the identified boundaries. Further, participants seemed to be sensitive to the same cues in a signed narrative regardless of whether they knew the language. However, segmentation was more consistent for the deaf signers than for the hearing nonsigners.

The main conclusions were that phonetic cues of prosody form reliable indicators of sentence boundaries, while cues from grammar were argued not to be essential for segmentation tasks, as shown by the fast process in real time, which does not seem to allow thorough processing of all grammatical cues for segmentation purposes. Some IP boundaries are perceptually stronger than others and can even be identified by those who do not know the language. For the cues at those boundaries (pauses, drop hands, holds), language experience seems to play only a minor role. It is therefore likely that these IP boundaries coincide with even higher boundaries, such as those of the prosodic domain "utterance," or an even greater discourse break. As far as the theory of prosodic phonology is concerned, all IP boundaries are equal, even though the phonetic cues of a specific IP boundary might be more prominent than those of another, depending on contextual factors, for example. Alternatively, some presumed IP boundary cues may actually be cues for utterance boundaries.

We find this type of study very valuable. It is questionable, however, whether at normal-speed viewing signers really set aside their semantic and syntactic processing in determining boundaries and rely on prosodic cues alone. Further studies are needed to verify these findings and confirm them for other languages and materials, as well as to further analyze any differences in boundary detection between those who are native users of a language and those who are not. We come back to this issue later in a suggestion for additional studies. Moreover, more research is needed to discover exactly how boundaries of identical and different layers differ (e.g., PP vs. IP vs. utterance), whether in duration, intensity, or type of cue marking.
Hansen and Heßmann (2007) state that none of the formal markers, such as blinks, change of gaze, lengthening, and transitions, are conclusive in detecting sentence boundaries in German Sign Language (DGS) and that they cannot be interpreted independently of meaning. Nevertheless, they argue that three cues are useful indicators in determining sentence boundaries in DGS. Rather than using a cue-based approach (blinks and some additional cues) to indicate intonational phrase boundaries, a short sample of DGS was segmented by means of a functional analysis called TPAC (topic, predication, adjunct, and conjunct). The TPAC analysis supports the identification of the boundaries of nuclear sentences. The results of this functional analysis were largely in line with, and to a degree refined by, results based on intuitive judgments of sentences. As a next step, the occurrence of specific manual signs, interactively prominent gestures, head nods, eye blinks, gaze direction, pauses, and transitions was compared to the segments based on the TPAC analyses of propositional content, to determine whether prosodic cues of sentence boundaries occur consistently or exclusively at relevant boundaries, as shown by the TPAC analysis. The results showed that none of the prosodic cues consistently function as a boundary marker on their own. Hansen and Heßmann (2007) state: "As we will argue, signers recognize sentences by identifying propositional content in the course of a sense-making process that is informed but not determined by such form elements" (146). Nevertheless, temporal adjuncts such as PAST and NOW seemed to point to an antecedent boundary of some sort. The palm-up gesture discussed in this study clearly requires attention in future studies on sentence boundaries (see also Van der Kooij, Crasborn, and Ros 2006). Hansen and Heßmann show that 60 percent of the palm-up gestures appeared at a sentence boundary; the remaining 40 percent reflected multiple additional functions of the palm-up gesture, which altogether would appear to make it an inconsistent cue for sentence boundaries by itself. Similarly, head nods with a "concluding force" (see also Bahan and Supalla 1995) may indicate sentence boundaries and occurred to a minor extent in their data. This kind of head nod is interactively prominent: the content of the previous part is affirmed before the signer continues. However, as for the palm-up sign, Hansen and Heßmann suggest that it would be most peculiar to find head nods marking the boundaries of every sentence in ordinary discourse. Furthermore, eye blinks were found to co-occur rather often with sentence boundaries.

A large-scale perception study of prosodic cues at sentence boundaries was performed by Nicodemus (2009), in which native deaf participants viewed the production of ASL interpreters. Fifty native deaf signers identified sentence boundaries in a video of an interpreted lecture in real time.
Twenty-one prosodic markers at the identified sentence boundaries were scored and grouped into one of four articulatory categories: (1) hands: held handshape, hand clasp, fingers wiggling, hands drop, signing space; (2) head and neck: head position tilt (front and back; left and right), head position turn (left and right), head movement (nod; shake; side to side), and neck tension; (3) eyes, nose, and mouth: eyebrows, eye gaze, eye aperture, nose, and cheeks; (4) body: breath, body lean, body movement, and shoulder actions. The most frequent markers at the boundaries identified by the native signers were examined in each of these four categories. The results showed that in the "hands" category the most frequent marker was the hand clasp, followed by the held handshape. In the "head and neck" category the most frequent marker was the head tilt, followed by the head turn. In the "eyes, nose, and mouth" category the most frequent marker was increased eye aperture, followed by eyebrow movement. Finally, in the "body" category the most frequent marker was the body lean, followed by shoulder action. The cues involving larger articulators (such as hand clasps and body leans) were the most frequent at boundary points. Markers of ongoing movements were used less frequently or in co-occurrence with a held marker. In addition to frequency, Nicodemus studied the duration of the prosodic cues, the number of markers at each identified boundary, and the timing of the markers in relation to a target cue, the hand clasp. The longest duration was found for the body lean and the shortest for eye aperture, which is what one would expect given the difference in mass between the articulators: the eyelid(s) versus the whole upper body. The maximum number of co-occurring cues at one sentence boundary was seven. Nevertheless, for 31 percent of the cues, a sequential timing pattern (occurring completely before or after the target cue) was found. The specific combinations of cues occurring at the boundary points were not analyzed in detail, although the overall simultaneous use of (smaller) articulators was established for most of the cues (see also Nicodemus 2006).

The term "sentence" was used in the written instruction to explain the segmentation task. Which boundaries really refer to sentences in signed languages remains an open question. In contrast to the cue-based approach of Fenlon et al. (2007) and the TPAC analyses of Hansen and Heßmann (2007), Nicodemus identified the boundaries based only on native signers' judgments, given that there is no easy morphosyntactic or semantic answer as to what sentences in signed languages should contain to form a coherent whole. As we indicated at the end of the first section, an approach such as that taken by Nicodemus is not a problem if the perspective is that of automatic detection of prosodic boundaries; however, it remains unclear what the "sentences" look like. One further question based on Nicodemus's work is whether we perceive the larger cues better simply because they are perceptually more prominent, or whether they would also prove more frequent if we used video analysis to examine the occurrence of cues at boundary points. Related is the question of whether the most frequent cues are also the most reliable ones for segmentation.
In addition, it might be the case that interpreters produce the various cues differently from deaf signers. Finally, the specific instructions given to participants in a segmentation task may play a role in their performance.

Hochgesang (2009) similarly studied sentence identification in ASL. Twenty-one deaf native and early sign-language users from Gallaudet University looked at three clips of narratives. One of the topics Hochgesang examined was the type of task instructions. The participants were divided into three groups (seven participants in each group), and each group received different instructions: seven people were asked to identify sentences, seven were asked to identify where the periods should be, and seven were asked to identify where the narrative could be divided. The first time they saw a video, they were instructed only to watch without doing anything. On the second viewing, they segmented the data by reporting the time code of the video where they perceived an end. The participants were told that they could change their answers if they wished. The results showed that the type of question asked to identify the boundaries of sentence-like units does not have much effect. Hochgesang also states that the exact type of unit that was segmented is not quite clear: transcription of sign-language videos can be done at the level of the intonation unit, utterance, idea unit, clause, sentence, intonational phrase, and possibly other levels as well. Similar to Nicodemus, Hochgesang did not examine the content of the chunks that the deaf participants identified.

Herrmann (2009) performed an extensive empirical study on prosody in German Sign Language (DGS), based on eight native DGS signers who were recorded for two hours each. Two hundred forty short dialogs and contexts, plus twenty-four picture stories, were analyzed. Multiple prosodic cues were analyzed, relating to rhythm, prominence, or intonation. For rhythm, the following cues were analyzed: pauses, holds/frozen signs, lengthening, eye blinks, signing rate, head nods, reduplication, and gestures. For prominence, the following cues were analyzed: head movement, eyebrow movement, eye aperture, tense signing, and lengthening and enlarging of signs. For intonation, eyebrow movement, eye aperture, eye gaze, frown, facial expression, mouth gestures, and head movement were studied. Some cues spread across multiple syllables and function as domain markers. Domain markers that change at phrase boundaries include facial movement, head movement, and body movement (Herrmann 2009). Edge markers (e.g., eye blinks, head nods, pauses, repetition of signs, holds, and final lengthening) are observed at prosodic phrase boundaries. Around a third of the blinks did not have a prosodic function, according to Herrmann's analysis. At 78.8 percent of the intonational phrase boundaries a blink was observed, and at 94.7 percent either a blink or another cue was observed. As discussed for the previous studies, a complex interplay appears to exist between prosodic markers, as opposed to a one-to-one form-function relationship.

Equative sentences formed the subject of an investigation of Finnish Sign Language (FinSL) by Jantunen (2007). Equative sentences are nominal structures that are often used for identification, such as introducing, defining, and naming.
In these equative sentences, Jantunen also studied nonmanual behaviors, including prosodic features such as eye blinks, eye gaze, eyebrow movements, and movements of body and head position. Jantunen found that the nonmanual behaviors in the different types of equative sentences showed substantially uniform occurrence of features. In general, alterations of head and body posture seemed to mark phrase or sentence boundaries (cf. the findings on ASL, HKSL, and ISL reported earlier). Moreover, blinks occurred often, though not always, at sentence boundaries and were also observed at locations other than boundaries.

Brentari (2007) finds that native signers and nonsigners sometimes differ in their segmenting strategies. In a segmentation study, signers and nonsigners were asked to mark the edges of intonational phrases in ASL passages that contained pairs of identical sequences of two different signs, either with an IP break between the signs or with no break. The sign pairs were produced as part of longer sequences of signs by ASL instructors in infant-directed signing (IDS); their interlocutor was a hearing sixteen-month-old toddler who was familiar with baby sign. The use of IDS might have resulted in exaggerated prosody. The cues in the stimuli in the two types of sign pairs (either with or without IP breaks between the signs) were eye blinks (70 percent between clauses vs. 0 percent within clauses), duration (mean 1100 msec between clauses vs. 730 msec within clauses), hold (mean 400 msec between clauses vs. 66 msec within clauses), pause (mean 780 msec between clauses vs. 90 msec within clauses), and drop hands (70 percent between clauses vs. 0 percent within clauses). Brentari shows that native signers were more accurate than nonsigners at detecting the IP boundaries. In a more recent study, Brentari et al. (2011) show the extent to which sensitivity to visual cues in IPs can be attributed to the visual modality (in signers and nonsigners) and to language experience (in signers only).

In their work on identifying clauses in Auslan (Australian Sign Language), Johnston and Schembri (2006) found that signals such as pauses, blinks, changes in eye gaze, changes in brow position, changes in head position, and so forth do not always systematically occur at boundaries. This suggests that any potential boundary cues are not completely grammaticalized, and most of these cues have a pragmatic function instead. As a result, seeing sentences in sign would present a challenge for linguistic analysis (Johnston and Schembri 2006).

Summary: Combinations of Phonetic Cues to Prosodic Boundaries. This presentation of a rather diverse set of studies has made clear that we have no evidence of a dominant role for one cue or a specific combination of cues in the signaling of prosodic boundaries. Multiple cues of both a durational and a punctual nature appear to be present in various signed languages, including ASL, HKSL, DGS, FinSL, NGT, and BSL. Some authors point to the complex relation between syntax and phonetic form, and many of them propose that a phonological level of organization exists as well. Although most authors would agree that this phonological level includes a hierarchical set of prosodic domains mediating between the two (cf. the speech model of Shattuck-Hufnagel and Turk [1996], presented in figure 1; cf.
also the seminal work of Nespor and Sandler [1999]), very few (if any) authors explicitly discuss the fact that this should also hold for the actual phonetic events that are related to syntactic or semantic features, such as eyebrow positions or head movements. As we indicated earlier, the comparison with the overall linguistic organization of spoken languages makes this rather likely, yet little research has been done in this area. Very few observational or experimental studies of phonetic variation in the form of intonational features in signed languages have been carried out. In the next two subsections we briefly discuss how machine processing of speech and sign attempts to automatically recognize prosodic boundaries.

Automatic Detection of Sentence Boundaries in Spoken Languages

Sentence boundary locations can be extracted from audio with reasonable reliability. Gotoh and Renals (2000) have described an approach whereby sentence boundary information was extracted statistically from text and audio resources in broadcast speech transcripts. Pause durations based on speech-recognizer output were used to establish boundaries, in addition to the conventional language model component, which can identify sentence markers to some extent. The combination of the pause duration model and the language model appears to provide accurate identification of boundaries. As with text, finding the location of sentence boundaries is important for the understanding of spoken language; in text, punctuation is structurally provided, whereas boundaries are not explicitly indicated in spoken language. Similar to Gotoh and Renals (2000), Stolcke et al. (1998) found that combining models (in their case a combination of prosodic and language model sources, modeled by decision trees and n-grams) led to better results than either model alone (see also, e.g., Shriberg et al. 2000). For their study, Stolcke et al. (1998) examined three aspects of prosody: duration (of pauses, final vowels, and final rhymes, normalized both for segment durations and speaker statistics), pitch (F0 patterns preceding the boundary, across the boundary, and pitch range relative to the speaker's baseline), and energy (signal-to-noise ratio). These machine-processing strategies indicate that, for spoken languages, no single cue will ever be reliable enough to segment the stream of language production.
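As a concrete illustration of this line of work, the following minimal sketch trains a decision tree on prosodic features of the kind Stolcke et al. (1998) describe. The feature names and toy values are invented here for illustration, and the n-gram language model component of their system is omitted; this is a schematic sketch, not a reconstruction of their actual models.

```python
# Sketch of prosody-based boundary classification in the spirit of
# Stolcke et al. (1998): a decision tree over duration, pitch, and
# energy features at each word juncture. All values are invented.
from sklearn.tree import DecisionTreeClassifier

# One row per word-boundary candidate:
# [pause_sec, final_vowel_norm_dur, f0_range_rel_baseline, energy_snr_db]
X = [
    [0.62, 1.8, 0.9, 12.0],   # long pause, lengthened vowel -> boundary
    [0.05, 1.0, 0.2, 18.0],   # fluent juncture              -> no boundary
    [0.48, 1.5, 0.7, 11.0],
    [0.02, 0.9, 0.1, 20.0],
    [0.30, 1.4, 0.6, 13.0],
    [0.08, 1.1, 0.3, 17.0],
]
y = [1, 0, 1, 0, 1, 0]        # 1 = sentence boundary after this word

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[0.55, 1.7, 0.8, 12.5]]))  # -> [1], likely a boundary
```

In the published systems, the posterior of a classifier like this is interpolated with an n-gram language model score, which is what produced the reported gains over either source alone.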
Automatic Detection of Sentence Boundaries in Signed Languages

Sentence boundary detection in spoken languages is relatively new compared to the processing of textual data; sentence boundary detection in signed languages is more recent still, and only a very limited amount of work has been done thus far. Nevertheless, in signed languages it is as important to be able to detect boundaries as it is in spoken languages. In ongoing work by Koskela et al. (2008), computer vision techniques for the recognition and analysis of gestures and facial expressions from video are applied to sign-language processing of FinSL. Existing video feature-extraction techniques provide a basis for the analysis of sign-language videos. Koskela et al. (2008) further apply an existing face-detection algorithm to locate the eyes, mouth, and nose. The relation between motion and sign-language sentence boundaries is currently under investigation; results have not yet been published.

Related to the work by Koskela et al. (2008), Jantunen et al. (2010) describe a technical method of visualizing prosodic data in signed languages. Data on prosody in sign language were similarly represented graphically and analyzed semiautomatically from digital video materials. Similar techniques are extensively used in spoken-language research in software such as Praat (http://www.praat.org), which analyzes speech recordings and presents them graphically. In the past, several attempts were made to perform linguistic analyses of motion and other parameters when the signed videos were produced in predetermined laboratory settings using complex motion-tracking equipment. As recently described by Piater, Hoyoux, and Du (2010), the automatic recognition of natural signing is demanding due to the presence of multiple articulators, each of which has to be identified and processed (e.g., fingers, lips, facial expressions, body position), as well as to technical limitations such as restricted spatial and temporal resolution and unreliable depth cues. Similar challenges were mentioned by Jantunen et al. (2010) and by Crasborn et al. (2006). Piater, Hoyoux, and Du (2010) used videos to analyze two body areas: the first part of the video analysis extracted detailed facial expressions such as mouth and eye aperture and eyebrow raising, while the second part focused on hand tracking.

Any type of numeric data can be displayed along with video recordings in recent versions of ELAN, the multimodal annotation tool. Crasborn et al. (2006) described the development of this facility in ELAN, presenting the tool in the context of the collection of kinematic recordings of hand and finger movements. One of the major advantages of using such data rather than the output of video processing is the high spatial and temporal resolution that many motion-tracking systems can obtain (the time resolution often ranges between 100 and 500 Hz, compared to 25 Hz for PAL video). Analyses based on raw position data allow us to calculate velocity, acceleration, and jerk, parameters that some argue can tell us something about stress (Wilbur 1990, 1999; Wilbur and Zelaznik 1997) and that may also turn out to enhance our knowledge of other aspects of prosody (see the sketch below). Disadvantages of kinematic equipment, however, include the unnatural signing environment and the impossibility of analyzing the growing number of video corpora of signed languages. On the other hand, with kinematic data the two-dimensional limitation inherent in video recordings does not apply, given that movement in all three spatial dimensions is recorded with equal accuracy.
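To illustrate how velocity, acceleration, and jerk can be derived from raw position data, consider the following minimal sketch. The sampling rate and the toy trajectory are invented for illustration, and simple finite differences stand in for whatever smoothing and filtering a real kinematic analysis would apply.

```python
# Sketch: deriving velocity, acceleration, and jerk from tracked hand
# positions, as one would with motion-capture or video-tracking output.
import numpy as np

fs = 200.0                      # assumed motion-capture sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)     # one second of data
# Toy 3D hand trajectory (meters): a smooth forward-and-back movement.
pos = np.stack([0.3 * np.sin(2 * np.pi * t),
                0.1 * t,
                np.zeros_like(t)], axis=1)

vel = np.gradient(pos, 1 / fs, axis=0)     # first derivative (m/s)
acc = np.gradient(vel, 1 / fs, axis=0)     # second derivative (m/s^2)
jerk = np.gradient(acc, 1 / fs, axis=0)    # third derivative (m/s^3)

speed = np.linalg.norm(vel, axis=1)
print(f"peak speed: {speed.max():.2f} m/s")
```

Peaks and discontinuities in such derivative signals are the kind of events that could then be compared against annotated prosodic boundaries in ELAN.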
Suggestions for Two Types of Empirical Studies

In this section we describe two research directions that can contribute to our understanding of prosodic cues of sentence boundaries, one involving experimental work on human perception and one involving the automatic tracking of phonetic cues. First, we suggest new tests of human segmentation of signed sentences. Second, we describe the use of new tracking techniques from visual signal processing to detect salient events in a video recording of sign language.

Study 1: New Tests of Human Segmentation of Signed Sentences

We suggest several new tests of human segmentation of signed sentences, which include video manipulations of various kinds. The overall idea behind the tests is to prevent the semantic processing of the signing in the video by human subjects, whether signers or nonsigners, as semantic processing can interfere with pure phonetic parsing. Similar methods have been used in studies of spoken languages, whereby the spoken stream is manipulated in such a way that the speech becomes harder to understand, yet subjects can still process prosodic features (Van Bezooijen and Boves 1986; Mettouchi et al. 2007). Typically, this involves the application of low-pass filters, which mask the segmental content while maintaining duration and melodic properties. A parallel signal manipulation for sign-language videos would involve a strong decrease in visual quality. This could be created, for example, by blurring the visual scene or by lowering the spatial resolution in such a way that it becomes harder to understand the signing; a sketch of such a manipulation follows below. In particular, the manual lexical content should become as incomprehensible as possible, as this is where most of the information load is located. Furthermore, if the aim were to examine the specific contribution of prosodic features of the head and face, separately from the contribution of the body and the manual prosodic features, only the faces of the signers could be shown in an additional video manipulation, without showing the remainder of the body. Conversely, if the aim were to examine the specific contribution of prosodic features of the body and the hands, only the signer's body could be shown, without the head. Both manipulations are expected to strongly disrupt the processing of semantic information in the discourse when subjects are asked to make segmentation judgments. Both are also problematic, however, given the many signs that have the head or face as the location for manual articulation. A technically simple way to circumvent this might be to have signers wear clothing and gloves of the same color, or perhaps skin-colored clothing, so as to partly obscure hand and/or finger movements. Alternatively, prosodic cues can be made to stand out in a (manipulated) video in order to make them more noticeable. For example, the eye contours can be highlighted with cosmetics to make short blinks more easily detectable, or body and head contours can be highlighted (e.g., changing color when the head or body moves in a certain way).
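By way of illustration, the sketch below degrades a recording along the lines just described, using the OpenCV library: each frame is heavily downsampled and then blurred, so that handshape detail is lost while coarse head and body movement remains visible. The file names and parameter values are hypothetical; usable settings would have to be piloted with viewers.

```python
# Minimal sketch of the proposed video manipulation: lower spatial resolution
# and blur a sign-language recording to mask lexical content while preserving
# gross prosodic movement. File names and parameters are illustrative only.
import cv2

cap = cv2.VideoCapture("signing.mp4")          # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("signing_degraded.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Downsample heavily, then upsample back: this removes fine detail such
    # as handshape while preserving coarse body and head movement.
    small = cv2.resize(frame, (w // 8, h // 8), interpolation=cv2.INTER_AREA)
    coarse = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    # An additional Gaussian blur softens the remaining edges.
    blurred = cv2.GaussianBlur(coarse, (31, 31), 0)
    out.write(blurred)

cap.release()
out.release()
```

The downsampling factor plays the role that the cutoff frequency plays in the low-pass filtering of speech: it determines how much segmental (lexical) information survives the manipulation.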
Quite a different method of hindering the semantic processing of the (sign) stream would be to have subjects process an unfamiliar (sign) language (cf. Fenlon et al. 2007 for signed languages; Mettouchi et al. 2007 for spoken languages). In contrast to the processing of an unfamiliar spoken language, the processing of an unfamiliar sign language may be affected by the presence of similarities between the familiar and unfamiliar signed languages, especially for those signs that are semantically motivated (showing some level of transparency between the meaning and the form of the sign). Fenlon et al. report that the British Sign Language signers in their study were able to partly understand stories in (unfamiliar) Swedish Sign Language. Signers of the familiar and unfamiliar sign language reported similar prosodic boundaries, possibly related to their partial understanding of the stories. This finding of partial understanding of signed stories emphasizes the additional benefit of using video manipulation to hinder the understanding of the content. In addition to signers' processing of an unfamiliar sign language, Fenlon et al. also involved hearing nonsigners, who may have relied on cues they knew from gestures in face-to-face communication. Another problem in having users of a sign language process the prosody of an unfamiliar sign language is that the prosodic systems and the phonetic cues that are used may differ between the two languages. While the earlier literature review mainly shows overlap between languages in the types of prosodic cues that are used, the precise timing and quality of the nonmanual articulation may still vary between languages in a more subtle way than has been studied so far. In fact, given what we know about spoken-language prosody, it is highly likely that linguistic variation between signed languages is also located in the phonetic implementation of phonological features (e.g., Gussenhoven 2004). For that reason, we suggest that preventing access to semantic processing by video manipulation may be preferable to the use of subjects who do not master the sign language in question, whether foreign sign-language users or nonsigners. More generally, independent of the precise method used to elicit human segmentation judgments, post-hoc analyses of prosodic cues at high-likelihood segmentation regions would be useful in examining the co-occurrence of various combinations of cues (the end and start of domain markers in combination with boundary markers), as past empirical studies have shown that none of the individual prosodic cues provides sufficient predictive power for the occurrence of a sentence boundary. Several authors have emphasized the presence of co-occurring cues at boundaries; however, specific combinations of prosodic cues have not been suggested thus far. In summary, video manipulations of the perceiver's own language and the general suggestion for post-hoc analyses of co-occurring cues may provide new insights into the use of prosodic cues at sentence boundaries.

Study 2: Experimental Tests of Human vs. Machine Segmentation

In sign-language research on prosody, video analysis and the use of movement-capture techniques could also be applied to extract useful information on prosodic cues. Video analysis techniques, such as those described by Piater et al. (2010) and Jantunen et al. (2010), are very promising with respect to the analysis of prosodic cues in nonrestricted, natural, continuous sign language. The simultaneous access to numeric data and video recordings that is possible with tools like ELAN makes such analyses relatively accessible to a wide audience (Crasborn et al. 2006).
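Such simultaneous access to annotations and numeric data also makes it straightforward to prototype the post-hoc co-occurrence analysis suggested in Study 1. The toy sketch below assumes time-stamped cue annotations (e.g., exported from ELAN) and boundary judgments, and counts how often combinations of cues occur within a window around each judged boundary; the cue names, time values, and window size are invented for illustration.

```python
# Toy sketch of a post-hoc cue co-occurrence analysis: given boundary
# judgments and annotated prosodic cues, count how often each combination of
# cues is present around a judged boundary. Data values are invented.
from itertools import combinations
from collections import Counter

# Hypothetical annotations: cue name -> list of (start, end) times in seconds.
cues = {
    "blink":    [(1.95, 2.10), (4.30, 4.42)],
    "head_nod": [(2.00, 2.30)],
    "hold":     [(1.90, 2.05), (6.10, 6.30)],
}
boundaries = [2.0, 4.35, 6.2]   # boundary times judged by signers
WINDOW = 0.2                    # +/- 200 ms around a judged boundary

def active_cues(time):
    """Names of cues whose spans overlap the window around a boundary."""
    return sorted(name for name, spans in cues.items()
                  if any(s - WINDOW <= time <= e + WINDOW for s, e in spans))

combo_counts = Counter()
for b in boundaries:
    present = active_cues(b)
    for r in range(2, len(present) + 1):
        for combo in combinations(present, r):
            combo_counts[combo] += 1

print(combo_counts.most_common())
```

On a large corpus, the resulting counts would have to be compared against the base rates of the same cue combinations away from boundaries before any combination could be called predictive.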
Moreover, kinematic recordings of hand and finger movements are very promising, given their high temporal and spatial resolution, even though the recording setting will be somewhat less natural. One of the main advantages of video analysis is that large corpora can be processed this way. This in turn means that statistical patterns in the co-occurrence of different types of cues can be calculated. These techniques would allow refinements of the recent human perception studies in which both signers' and nonsigners' intuitions were analyzed with regard to boundaries in sign-language discourse. It may prove helpful to use data from video analysis and motion-tracking techniques to examine the actual co-occurrence of the specific combinations of prosodic cues that appear to be most predictive of sentence boundaries. We emphasize, however, that the discovery of statistical patterns in large datasets does not in itself constitute a linguistic analysis of the structures in question. Such patterns should be considered tools for linguistic analyses, just as quantitative analyses are used in spoken-language phonetic research. It is the linguistic model that should generate the hypotheses to test. For automatic recognition and translation of signed languages, however, there are more direct advantages of computer processing of phonetic cues. To illustrate the possibilities for facial and head feature detection, figure 2 shows an example of feature extraction of the face, developed by Piater et al. (2010; see also Dreuw et al. 2010) using video analysis. For each of the four video images, three drawings of the fitted model are presented (from top to bottom: a full model, a meshed shape, and a plotted shape). In addition, three vertical lines are present at the left of the image, reflecting the measurement of several nonmanual cues. From left to right, the following features are presented: left eye aperture, mouth aperture, and right eye aperture. The three axes on the face represent information about the orientation of the face, with the origin at the tip of the nose.

Figure 2. Visual representation of automatically detected properties of nonmanual visual cues. Images courtesy of Justus Piater and Thomas Hoyoux, University of Innsbruck, Austria.

Currently, video analyses can detect certain prosodic signals more easily than others. Some of the restricting factors are exceptionally large or small movements, the visual similarity between the skin of the hands and of the face when manual signs occur in front of or near the face, and the general quality of the video materials. Theoretically, these specific restrictions do not apply to kinematic measures when using data gloves. However, although hands can be tracked relatively well with data gloves or small markers attached to the body, some features of the face, such as eye blinks, would require different techniques. Regardless of these challenges, video analyses, kinematic recordings, and intuitive judgments can be mutually informative and can greatly increase our knowledge of sentence boundaries. Video analyses and kinematic recordings can provide exceedingly detailed data on each of the possible cues.
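To give one concrete example of turning such extracted measurements into prosodic events, the sketch below converts a per-frame eye-aperture signal (such as the aperture traces in figure 2) into discrete blink events by looking for brief dips below a threshold. The threshold, the maximum blink duration, and the synthetic signal are assumptions for illustration, not parameters taken from the systems discussed above.

```python
# Minimal sketch: detect blinks as brief dips in an automatically extracted
# eye-aperture signal. Threshold and duration limits are illustrative only.
import numpy as np

fps = 25                      # PAL video frame rate
aperture = np.ones(100)       # eye aperture per frame, normalized 0..1
aperture[40:44] = 0.1         # a synthetic 4-frame blink

def detect_blinks(signal, fps, threshold=0.3, max_dur=0.5):
    """Return (start_s, end_s) spans where the aperture dips below the
    threshold for at most max_dur seconds; longer closures are not blinks."""
    closed = signal < threshold
    blinks, start = [], None
    for i, c in enumerate(closed):
        if c and start is None:
            start = i
        elif not c and start is not None:
            if (i - start) / fps <= max_dur:
                blinks.append((start / fps, i / fps))
            start = None
    return blinks

print(detect_blinks(aperture, fps))   # [(1.6, 1.76)]
```

Note that at 25 frames per second a short blink spans only a handful of frames, which illustrates why blink detection from standard video is fragile and why dedicated techniques were suggested above.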
Intuitive judgments, however, can provide information about the actual presence of a prosodic boundary in sign language. Without native informants to confirm the feature extraction data during the initial phase of analysis of (combinations of) features derived from video analyses, it would be impossible to determine whether the data in fact point toward boundaries. Beyond the current technical restrictions in video analysis, the equivalent of the "acoustic" measurements based on automatic measures should therefore be combined with the perceptual transcription of the prosodic boundaries. In an ideal situation, the perceptual measures and transcriptions should be elicited from (native or near-native) signers who can provide intuitive judgments on their own language. As we already noted, additional phonological analyses would subsequently be necessary to provide more information on the actual domains that are identified. Equally, the boundaries identified intuitively by signers can be analyzed much more closely if the identified boundaries are provided not only with manual annotations of (co-)occurring cues but also with information based on the feature extraction data that cannot be determined by the naked eye, such as temporal properties (detailed duration, velocity, and acceleration).

Conclusion

The literature on phonetic correlates of sentence boundaries in signed languages that we have reviewed in this article is highly diverse and of variable quality in terms of data collection and general methodology. Experimental approaches to prosody such as are common in the speech prosody field are slowly starting to appear. Many nonmanual articulators that are typically not active in the lexicon (e.g., eyebrows, head) can have multiple functions in the prosody of the languages that have been studied so far, making it hard to find a one-to-one mapping of form and function. In effect, this is not unlike the complexity of pitch in spoken languages. We have suggested two different ways in which the impact of various cues on clause and sentence segmentation can be studied, both focusing on the perception of signs. How can knowledge of the prosody of signed languages be employed for language technology, such as the automatic translation from signed to spoken language? As long as we have not identified reliable predictors (specific combinations of prosodic cues) of sentence boundaries and a reliable method of capturing them, at least three challenges will remain. First, a time-consuming manual procedure will be required to annotate each sentence boundary, based on native signers' intuitive judgments. Second, automatic translation will remain relatively difficult, as the only predictors of sentence boundaries (or, put differently, of the cohesion of a series of signs in a larger syntactic unit) are statistical patterns in the semantic cohesion of the lexical items that have been recognized by the system. Finally, as long as sentence-boundary annotations need to be added manually, interpersonal variation in the annotation process is inevitable, and some prosodic cues cannot be observed in sufficient detail. This implies that increased knowledge of prosody in signed languages, and of prosody related to sentence boundaries in particular, can lead to better predictions of the presence of boundaries. This in turn will help to improve automatic translation from sign to speech.
Once these predictive cues can be captured automatically and reliably by means of video analysis (possibly combined with the detection of lexical items that often occur at sentence boundaries), sentence units can be generated automatically. For linguistic analyses, this can facilitate the examination of much larger datasets than would be possible with phonetic transcription. Moreover, the captured chunks can in theory be translated automatically. The development of sentence translation software is of course further dependent on a sufficient number of signs in the sentence being correctly recognized by machine recognition algorithms. Many challenges are still to be overcome here, given our restricted understanding of signed languages and the relatively short research tradition in this area (Dreuw et al. 2010; Ormel et al. 2010).

References

Allen, G. D., R. B. Wilbur, and B. B. Schick. 1991. Aspects of Rhythm in American Sign Language. Sign Language Studies 72: 297–320.

Bahan, B. N., and S. J. Supalla. 1995. Line Segmentation and Narrative Structure: A Study in Eyegaze Behavior in American Sign Language. In K. Emmorey and J. Reilly, eds., Language, Gesture, and Space, 171–91. Hillsdale, N.J.: Erlbaum.

Baker, C., and C. Padden. 1978. Focusing on the Nonmanual Component of ASL. In P. Siple, ed., Understanding Language through Sign Language Research, 25–27. New York: Academic Press.

Bögels, S., H. Schriefers, W. Vonk, D. J. Chwilla, and R. Kerkhofs. 2009. The Interplay between Prosody and Syntax in Sentence Processing: The Case of Subject- and Object-Control Verbs. Journal of Cognitive Neuroscience 22(5): 1036–53.

Boyes Braem, P. 1999. Rhythmic Temporal Patterns in the Signing of Deaf Early and Late Learners of Swiss German Sign Language. Language and Speech 42(2–3): 177–208.

Brentari, D. 1998. A Prosodic Model of Sign Language Phonology. Cambridge, Mass.: MIT Press.

———. 2007. Perception of Sign Language Prosody by Signers and Nonsigners. Paper presented at the Workshop on Visual Prosody in Language Communication, Max Planck Institute for Psycholinguistics, Nijmegen, May 10–11.

———, and L. Crossley. 2002. Prosody of the Hands and Face: Evidence from American Sign Language. Sign Language and Linguistics 5(2): 105–30.

———, C. González, A. Seidl, and R. Wilbur. 2011. Sensitivity to Visual Prosodic Cues in Signers and Nonsigners. Language and Speech 54(1): 49–72.

Byrd, D., J. Krivokapic, and S. Lee. 2006. How Far, How Long: On the Temporal Scope of Prosodic Boundary Effects. Journal of the Acoustical Society of America 120(3): 1589–99.

Carlson, K., C. Clifton, and L. Frazier. 2001. Prosodic Boundaries in Adjunct Attachment. Journal of Memory and Language 45: 58–81.

Crasborn, O. 2001. Phonetic Implementation of Phonological Categories in Sign Language of the Netherlands. Utrecht: LOT.

———. 2007. How to Recognize a Sentence When You See One. Sign Language and Linguistics 10: 103–11.

———, E. van der Kooij, and W. Emmerik. 2004. Eye Blinks and Prosodic Structure in NGT. Talk presented at the European Science Foundation (ESF) Exploratory Workshop, Modality Effects on the Theory of Grammar: A Crosslinguistic View from Signed Languages of Europe, Barcelona, Spain, November 14–18.

———, H. Sloetjes, E. Auer, and P. Wittenburg. 2006. Combining Video and Numeric Data in the Analysis of Sign Languages within the ELAN Annotation Software.
In C. Vettori, ed., Proceedings of the 2nd Workshop on the Representation and Processing of Sign Languages: Lexicographic Matters and Didactic Scenarios, 82–87. Paris: ELRA.

Cutler, A., D. Dahan, and W. van Donselaar. 1997. Prosody in the Comprehension of Spoken Language: A Literature Review. Language and Speech 40(2): 141–202.

De Vos, C., E. van der Kooij, and O. Crasborn. 2009. Mixed Signals: Combining Linguistic and Affective Functions of Eyebrows in Questions in Sign Language of the Netherlands. Language and Speech 52(2–3): 315–39.

Dreuw, P., J. Forster, Y. Gweth, D. Stein, H. Ney, G. Martinez, J. Verges Llahi, et al. 2010. SignSpeak: Scientific Understanding and Vision-Based Technological Development for Continuous Sign Language Recognition and Translation. In P. Dreuw, E. Efthimiou, T. Hanke, T. Johnston, G. Martinez Ruiz, and A. Schembri, eds., Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, 65–72. Paris: ELRA.

Engberg-Pedersen, E. 1993. Space in Danish Sign Language: The Semantics and Morphosyntax of the Use of Space in a Visual Language. Hamburg: Signum.

Fenlon, J., T. Denmark, R. Campbell, and B. Woll. 2007. Seeing Sentence Boundaries. Sign Language and Linguistics 10(2): 177–200.

Frazier, L., K. Carlson, and C. Clifton. 2006. Prosodic Phrasing Is Central to Language Comprehension. Trends in Cognitive Sciences 10(6): 244–49.

Gerken, L. 1996. Prosody's Role in Language Acquisition and Adult Parsing. Journal of Psycholinguistic Research 25(2): 345–56.

Gotoh, Y., and S. Renals. 2000. Sentence Boundary Detection in Broadcast Speech Transcripts. In Proceedings of the International Speech Communication Association (ISCA) Workshop: Automatic Speech Recognition: Challenges for the New Millennium (ASR-2000), 228–32. Paris, September 18–20.

Gussenhoven, C. 2004. The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.

Hansen, M., and J. Heßmann. 2007. Matching Propositional Content and Formal Markers: Sentence Boundaries in a DGS Text. Sign Language and Linguistics 10(2): 145–75.

Herrmann, A. 2009. Prosody in German Sign Language. Paper presented at the Workshop on Prosody and Meaning, September 17–18, Frankfurt am Main. http://prosodia.upf.edu/activitats/prosodyandmeaning/home/presentations.php.

Hochgesang, J. A. 2009. Is There a "Sentence" in ASL? Insight on Segmenting Signed Language Data. Talk presented at the Sign Language Corpora: Linguistic Issues Workshop, July 24, London.

Jantunen, T. J. 2007. The Equative Sentence in Finnish Sign Language. Sign Language and Linguistics 10(2): 113–43.

———, M. Koskela, J. T. Laaksonen, and P. I. Raino. 2010. Towards the Automatic Visualization and Analysis of Signed Language Prosody: Method and Linguistic Issues. Paper to appear in the proceedings of the 5th International Conference on Speech Prosody, May 11–14, Chicago.

Johnson, R. E., and S. K. Liddell. 2010. Towards a Phonetic Representation of Signs: Sequentiality and Contrast. Sign Language Studies 11(2): 241–74.

———. 2011. A Segmental Framework for Representing Signs Phonetically. Sign Language Studies 11(3): 408–64.

Johnston, T. 1991. Spatial Syntax and Spatial Semantics in the Inflections of Signs for the Marking of Person and Location in AUSLAN. International Journal of Sign Linguistics 2(1): 29–62.

———, and A. Schembri. 2006. Identifying Clauses in Signed Languages: Applying a Functional Approach.
Paper presented at the DGfS (Deutsche Gesellschaft für Sprachwissenschaft) 2006 workshop, How to Recognize a Sentence When You See One: Methodological and Linguistic Issues in the Creation of Sign Language Corpora, February 23–24, Bielefeld, Germany.

Koskela, M., J. Laaksonen, T. Jantunen, R. Takkinen, P. Raino, and A. Raike. 2008. Content-Based Video Analysis and Access for Finnish Sign Language: A Multidisciplinary Research Project. In O. Crasborn, T. Hanke, E. Efthimiou, I. Zwitserlood, and E. Thoutenhoofd, eds., Proceedings of the 3rd Workshop on the Representation and Processing of Sign Languages: Construction and Exploitation of Sign Language Corpora, 101–4. Paris: ELRA.

Ladd, D. R. 1996. Intonational Phonology. Cambridge: Cambridge University Press.

Laver, J. 1994. Principles of Phonetics. Cambridge: Cambridge University Press.

Liddell, S. 1980. American Sign Language Syntax. The Hague: Mouton.

———. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge: Cambridge University Press.

Meier, R. 2002. Why Different, Why the Same? Explaining Effects and Non-Effects of Modality upon Linguistic Structure in Sign and Speech. In R. Meier, K. Cormier, and D. G. Quinto-Pozos, eds., Modality and Structure in Signed and Spoken Languages, 1–26. Cambridge: Cambridge University Press.

Mettouchi, A., A. Lacheret-Dujour, V. Silber-Varod, and S. Izre'el. 2007. Only Prosody? Perception of Speech Segmentation in Kabyle and Hebrew. Nouveaux cahiers de linguistique française 28: 207–18.

Neidle, C., J. Kegl, D. MacLaughlin, B. Bahan, and R. G. Lee. 2000. The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. Cambridge, Mass.: MIT Press.

Nespor, M., and I. Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.

Nespor, M., and W. Sandler. 1999. Prosody in Israeli Sign Language. Language and Speech 42(2–3): 143–76.

Nicodemus, B. 2006. Prosody and Utterance Boundaries in ASL Interpretation. Paper presented at the DGfS (Deutsche Gesellschaft für Sprachwissenschaft) 2006 workshop, How to Recognize a Sentence When You See One: Methodological and Linguistic Issues in the Creation of Sign Language Corpora, February 23–24, Bielefeld, Germany.

———. 2009. Prosodic Markers and Utterance Boundaries in American Sign Language Interpretation. Washington, D.C.: Gallaudet University Press.

Ormel, E., O. Crasborn, E. van der Kooij, L. van Dijken, E. Y. Nauta, J. Forster, and D. Stein. 2010. Glossing a Multi-Purpose Sign Language Corpus. In P. Dreuw, E. Efthimiou, T. Hanke, T. Johnston, G. Martinez Ruiz, and A. Schembri, eds., Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, 186–91. Paris: ELRA.

Piater, J., T. Hoyoux, and W. Du. 2010. Video Analysis for Continuous Sign Language Recognition. In P. Dreuw, E. Efthimiou, T. Hanke, T. Johnston, G. Martinez Ruiz, and A. Schembri, eds., Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, 192–95. Paris: ELRA.

Pierrehumbert, J. 1980. The Phonetics and Phonology of English Intonation. PhD diss., MIT, Cambridge, Mass.

Sandler, W. 1989. Phonological Representation of the Sign: Linearity and Nonlinearity in American Sign Language. Dordrecht: Foris.

———. 1999a. The Medium and the Message: The Prosodic Interpretation of Linguistic Content in Israeli Sign Language. Sign Language and Linguistics 2(2): 187–215.
———. 1999b. Prosody in Two Natural Language Modalities. Language and Speech 42(2–3): 127–42.

———. 2010. Prosody and Syntax in Sign Languages. Transactions of the Philological Society 108(3): 298–328.

———. Forthcoming. The Uniformity and Diversity of Language: Evidence from Sign Language. Lingua.

———, and D. Lillo-Martin. 2006. Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.

Selkirk, E. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: MIT Press.

Shattuck-Hufnagel, S., and A. Turk. 1996. A Prosody Tutorial for Investigators of Auditory Sentence Processing. Journal of Psycholinguistic Research 25(2): 193–247.

Shriberg, E., A. Stolcke, D. Hakkani-Tur, and G. Tur. 2000. Prosody-Based Automatic Segmentation of Speech into Sentences and Topics. Speech Communication 32(1–2) (Special Issue on Accessing Information in Spoken Audio): 127–54.

Stern, J., and D. Dunham. 1990. The Ocular System. In J. Cacioppo and L. Tassinary, eds., Principles of Psychophysiology: Physical, Social, and Inferential Elements, 513–53. Cambridge: Cambridge University Press.

Stolcke, A., E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani, M. Plauche, G. Tur, and Y. Lu. 1998. Automatic Detection of Sentence Boundaries and Disfluencies Based on Recognized Words. In R. H. Mannell and J. Robert-Ribes, eds., Proceedings of the International Conference on Spoken Language Processing, vol. 5, 2247–50. Sydney: Australian Speech Science and Technology Association.

Sze, F. 2008. Blinks and Intonational Phrasing in Hong Kong Sign Language. In J. Quer, ed., Signs of the Time: Selected Papers from TISLR 2004, 83–107. Hamburg: Signum.

Van Bezooijen, R., and L. Boves. 1986. The Effects of Low-Pass Filtering and Random Splicing on the Perception of Speech. Journal of Psycholinguistic Research 15(5): 403–17.

Van der Kooij, E. 2002. Phonological Categories in Sign Language of the Netherlands: The Role of Phonetic Implementation and Iconicity. PhD diss., Landelijke Onderzoeksschool Taalwetenschap (LOT), Utrecht.

———, and O. Crasborn. 2008. Syllables and the Word Prosodic System in Sign Language of the Netherlands. Lingua 118: 1307–27.

———, and W. Emmerik. 2006. Explaining Prosodic Body Leans in Sign Language of the Netherlands: Pragmatics Required. Journal of Pragmatics 38(10): 1598–1614.

———, O. Crasborn, and J. Ros. 2006. Manual Prosodic Cues: Palm-Up and Pointing Signs. Poster presented at the Ninth Conference on Theoretical Issues in Sign Language Research, December 6–9, Florianópolis, Brazil.

Wilbur, R. B. 1990. An Experimental Investigation of Stressed Sign Production. International Journal of Sign Linguistics 1(1): 41–59.

———. 1994. Eye Blinks and ASL Phrase Structure. Sign Language Studies 84: 221–40.

———. 1999. Stress in ASL: Empirical Evidence and Linguistic Issues. Language and Speech 42(2–3): 229–50.

———. 2000. Phonological and Prosodic Layering of Nonmanuals in American Sign Language. In H. Lane and K. Emmorey, eds., The Signs of Language Revisited: Festschrift for Ursula Bellugi and Edward Klima, 213–41. Hillsdale, N.J.: Erlbaum.

———, and H. N. Zelaznik. 1997. Kinematic Correlates of Stress and Position in ASL. Paper presented at the Annual Meeting of the Linguistic Society of America, January 3, Chicago.

Appendix 1. Overview of the empirical basis of the studies discussed in the section on prosody at sentence boundaries, as reported in the publications.
Baker and Padden (1978). Language: ASL. Type of data: (1) free conversations; (2) elicited sentences. Number of signers: (1) 4; (2) 4. Amount of material analyzed: (1) 2 dyads; (2) 40 sentences. Subjects in perception studies: NA.

Bahan and Supalla (1995). Language: ASL. Type of data: narrative. Number of signers: 1. Amount of material analyzed: 30 minutes. Subjects in perception studies: NA.

Brentari (2007). Language: ASL. Type of data: short passages containing pairs of signs. Number of signers: not specified. Amount of material analyzed: not specified. Subjects in perception studies: X signers; X nonsigners (babies and adults).

Brentari and Crossley (2002). Language: ASL. Type of data: (1) lecture to a mixed deaf/hearing audience; (2) explanatory narratives. Number of signers: (1) 1; (2) 1. Amount of material analyzed: (1) 60 minutes; (2) not specified. Subjects in perception studies: NA.

Brentari et al. (2011). Language: ASL. Type of data: short passages containing pairs of signs. Number of signers: 2. Amount of material analyzed: 10 clips, including 2 target sign pairs, repeated 3 times. Subjects in perception studies: NA.

Crasborn, van der Kooij, and Emmerik (2004). Language: NGT. Type of data: unpublished monologues and question-answer pairs. Number of signers: 1. Amount of material analyzed: 60 minutes. Subjects in perception studies: NA.

Fenlon et al. (2007). Languages: BSL, SSL. Type of data: four narratives from the ECHO corpus. Number of signers: 4. Amount of material analyzed: two stories per language, around 1.5 minutes each. Subjects in perception studies: 6 signers; 6 nonsigners.

Hansen and Heßmann (2007). Language: DGS. Type of data: extract from a published narration. Number of signers: 1. Amount of material analyzed: 33 seconds. Subjects in perception studies: NA.

Herrmann (2009). Language: DGS. Type of data: dialogs/contexts and picture stories. Number of signers: 8. Amount of material analyzed: 240 short dialogs and contexts and 24 picture stories. Subjects in perception studies: NA.

Hochgesang (2009). Language: ASL. Type of data: short narratives. Number of signers: not specified. Amount of material analyzed: 3 clips. Subjects in perception studies: 21 signers.

Jantunen (2007). Language: FinSL. Type of data: (1) example sentences from an online dictionary; (2) intuitions elicited in unstructured interviews. Number of signers: not specified. Amount of material analyzed: 30 sentences. Subjects in perception studies: NA.

Johnston and Schembri (2006). Language: Auslan. Type of data: elicited by picture materials. Number of signers: 4. Amount of material analyzed: not specified. Subjects in perception studies: NA.

Nespor and Sandler (1999). Language: ISL. Type of data: elicited sentences. Number of signers: 3. Amount of material analyzed: 30 sentences. Subjects in perception studies: NA.

Nicodemus (2006, 2009). Language: ASL. Type of data: interpreted lecture. Number of signers: 5. Amount of material analyzed: 22 minutes. Subjects in perception studies: 50 deaf signers (observing only 1 interpreter each).

Sze (2004). Language: HKSL. Type of data: (1) monologues in response to 3 questions; (2) conversations. Number of signers: 2. Amount of material analyzed: (1) 14 minutes; (2) 4 minutes. Subjects in perception studies: NA.

Van der Kooij, Crasborn, and Ros (2006). Language: NGT. Type of data: elicited question-answer pairs. Number of signers: 8. Amount of material analyzed: 59 sentences per signer. Subjects in perception studies: NA.

Wilbur (1994). Language: ASL. Type of data: (1) three published narratives; (2) one elicited narrative at different speeds. Number of signers: (1) 3; (2) 4. Amount of material analyzed: (1) not specified; (2) not specified. Subjects in perception studies: NA.

Wilbur and Zelaznik (1997). Language: ASL. Type of data: target signs in carrier phrases; use of infrared diodes. Number of signers: 13. Amount of material analyzed: not specified. Subjects in perception studies: NA.