Prosodic Markers and Utterance Boundaries in American Sign Language Interpretation

1996). Although it is clear that languages differ as to the nature and number of phrasal units that they use, some kind of phrasing has been identified in virtually every language that has been examined from this perspective. The principal acoustic dimensions identified as marking phrasal structure are fundamental frequency (f0), duration, intensity, and segmental spectral properties.[1]

Psycholinguistic research on this issue has suggested that the prosodic structure of an utterance helps the listener perceive, organize, and comprehend spoken language. The methods used to address this question have included measuring response times during a language task, evaluating judgments about well-formed and ill-formed prosody, and other measures of language processing by the listener (McWhorter, 2003).

As early as 1961, Epstein found that a string of nonsense syllables is recalled better when presented in acceptable sentence structure than without, but only if spoken with the prosodic cues typical of the syntactic construction. In a similar study, spoken strings of words with grammatical constructions were more easily replicated than ungrammatical strings, but only if spoken with sentence prosody (Martin, 1968). Later, results from a related experiment suggested that listeners could recognize previously heard sentences, even nonsense utterances, more accurately if the same prosody was used in both the first and second presentations (Speer, Crowder, & Thomas, 1993). Several other studies have supported the finding that acoustic phrase marking tends to occur at major syntactic boundaries (Brown & Miron, 1971; Cooper & Paccia-Cooper, 1980; Goldman-Eisler, 1972; Klatt, 1976).

Speakers select a specific set of linguistic features in order to communicate an underlying message (Gumperz, 1982). For example, pausing while speaking is a strategy that enables the listener to break the discourse structure of a message into chunks and to interpret its meaning. In fact, a consistent finding in prosody research is the presence of longer pauses at more important boundaries in discourse (Holmes, 1988; Mushin, Stirling, Fletcher, & Wales, 2003; Noordman & Vonk, 1999; Ouden, Wijk, & Swerts, 2000). Longer pauses have also been found at the conclusion of larger discourse segments (Grosz & Hirschberg, 1992; Hirschberg & Nakatani, 1996). Results such as these suggest that phrasal structure is used by speakers to organize the message being communicated and by perceivers to process the input (Cutler et al., 1997).

Studies have shown that speakers and listeners do not rely solely upon syntax to determine boundaries in discourse; rather, a range of prosodic cues provides information about their location. In one experiment, Passonneau and Litman (1996) asked subjects to identify points in an informal, spoken, monologic narrative where they perceived a discourse boundary; that is, where the speaker finished one communicative task and began a new one. The subjects demonstrated a significant pattern of agreement on the location of discourse segment boundaries. Examination of the structure of the narrative showed that segmentation, coherence, and linguistic devices (including prosody) were all factors that cue the location of boundaries.

The specific prosodic cues that mark boundary locations have been identified in a number of studies. For example, perceptible differences were found in sentence-final lengthening, pause duration, and voice quality at the boundaries between sentences, regardless of whether they were produced at the end of a paragraph (Lehiste, 1975, 1979). In English, these cues tend to be very localized: sentence-final lengthening affects primarily the coda of the syllable immediately preceding the boundary; however, at a major discourse boundary, some lengthening also occurs in the syllable immediately following the boundary (Fon, 2002; Wightman, Shattuck-Hufnagel, Ostendorf, & Price, 1992). Pauses and pitch have been found to be highly informative features in the detection of both sentence and topic boundaries (Shriberg, Stolcke, Hakkani-Tur, & Tur, 2000). In a recent examination of Swedish and American listening groups, individuals were able to identify the location of boundaries even in a language that they did not know (Carlson, Hirschberg, & Swerts, 2005). These findings support the claim that syntax alone does not fully predict the way that spoken utterances are organized. For this reason, prosody is a significant issue for the examination of auditory sentence processing.

Based on these findings, it is accepted “that prosody plays an important role in a listener’s ability to interpret the speaker’s intent” (Wightman et al., 1992, p. 1707); however, there are still questions about how cues in the acoustic signal actually mark the boundaries. Studies have shown that prosodic phrase boundaries are marked by a variety of acoustic cues that include intonation, pausing, and duration (Shattuck-Hufnagel & Turk, 1996). There is no consensus, however, on the relative importance of these cues and how each is used to signal boundaries. Moreover, only in a few languages has there been much investigation of precisely what boundaries are actually signaled. The study of prosody at phrasal boundaries is expected to grow due to recent commercial demands for the information. One interest in the interplay between prosody and discourse-level organization is driven by the desire to improve synthesized texts for human-machine communication (Hirschberg, 2002; Smith, 2004).

