Publication


Featured research published by Marc Swerts.


Speech Communication | 2004

Prosodic and other cues to speech recognition failures

Julia Hirschberg; Diane J. Litman; Marc Swerts

In spoken dialogue systems, it is important for the system to know how likely a speech recognition hypothesis is to be correct, so it can reject misrecognized user turns, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have identified prosodic features which predict more accurately when a recognition hypothesis contains errors than the acoustic confidence scores traditionally used in automatic speech recognition in spoken dialogue systems. We describe statistical comparisons of features of correctly and incorrectly recognized turns in the TOOT train information corpus and the W99 conference registration corpus, which reveal significant prosodic differences between the two sets of turns. We then present machine learning results showing that the use of prosodic features, alone and in combination with other automatically available features, can predict more accurately whether or not a user turn was correctly recognized, when compared to the use of acoustic confidence scores alone.


International Journal of Speech Technology | 2001

Error Detection in Spoken Human-Machine Interaction

Emiel Krahmer; Marc Swerts; Mariët Theune; Mieke F. Weegels

Given the state of the art of current language and speech technology, errors are unavoidable in present-day spoken dialogue systems. Therefore, one of the main concerns in dialogue design is how to decide whether or not the system has understood the user correctly. In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. The goals of the present study are therefore twofold: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions and how informative these signals are, and (ii) to explore the possibilities of spotting errors automatically and on-line. To reach these goals, we first perform a descriptive analysis, followed by experiments with memory-based machine learning techniques. It appears that people systematically use negative/marked cues when there are communication problems. The experiments using memory-based machine learning techniques suggest that it may be possible to spot errors automatically and on-line with high accuracy, in particular when focussing on combinations of cues. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.


Speech Communication | 2002

The dual of denial: two uses of disconfirmations in dialogue and their prosodic correlates

Emiel Krahmer; Marc Swerts; Mariët Theune; Mieke F. Weegels

In human–human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. These signals may either be positive (‘go on’) or negative (‘go back’), where it is usually found that the latter are comparatively marked to make sure that the dialogue partner is made aware of a communication problem. This article focuses on the users’ signaling of information status in human–machine interactions, and in particular looks at the role prosody may play in this respect. Using a corpus of interactions with two Dutch spoken dialogue systems, prosodic correlates of users’ disconfirmations were investigated. In this corpus, disconfirmations can have two uses: they may serve as a positive signal in one context and as a negative signal in another. With the data obtained from the corpus an acoustic analysis and a perception experiment have been carried out. The acoustic analysis shows that the difference in signaling function is reflected in the distribution of the various types of disconfirmations as well as in different prosodic variables (pause, duration, intonation contour and pitch range). The perception experiment revealed that subjects are very good at classifying disconfirmations as positive or negative signals (without context), which strongly suggests that the acoustic features have communicative relevance. The implications of these results for human–machine communication are discussed.

Keywords: Spoken dialogue systems; Prosody; Error detection; Information grounding; Perception


From brows to trust | 2004

More about brows

Emiel Krahmer; Marc Swerts

In a seminal paper, Ekman (1979) remarks that brows can play an accentuation role (e.g., to signal focus). However, the literature about eyebrows is inconclusive about their exact role and as a consequence there is no agreement among developers of embodied conversational agents about their precise timing and placement. In addition, it is unclear whether eyebrow movements perform the same role in different languages. In this chapter, an analysis-by-synthesis technique is used to find out what the role of eyebrow movements is for the perception of focus and to see whether this role is the same across different languages. Three experiments are performed, both for Dutch and Italian, investigating where subjects prefer eyebrow movements, whether brows influence the perceived prominence of words and whether they are used in a functional way when subjects interpret utterances. The results for Dutch and Italian are indeed different, but it is argued that these differences can be reduced to prosodic differences between the two languages. The advantages and potential limitations of studies via analysis-by-synthesis are discussed, and an approach to compensate for the limitations is offered.


Speech Communication | 2002

Informational and dialogue-coordinating functions of prosodic features of Japanese echoic responses

Atsushi Shimojima; Yasuhiro Katagiri; Hanae Koiso; Marc Swerts

Echoic responses, which reuse portions of the texts uttered in the preceding turns, abound in dialogues, although semantically they contribute little new information. In this paper, we attempt to identify the informational and dialogue-coordinating functions of Japanese echoic responses while focusing on their prosodic and temporal features. Toward this goal, we conducted an observational study based on a corpus of spoken dialogues, as well as three complementary experiments, where particular prosodic/temporal features of echoic responses were studied in a controlled and focused manner. In combination, the two lines of analyses provide evidence that (1) echoic responses with different timings, intonations, pitches, and speeds signal different degrees in which the speakers have integrated the repeated information into their prior knowledge, and (2) owing to this informational function, the prosodic/temporal features of an echoic response also have the dialogue-coordinating function of directing the listener in how to handle the information just repeated.


Journal of Phonetics | 2006

Perceiving word prosodic contrasts as a function of sentence prosody in two Dutch Limburgian dialects

Rachel Fournier; Jo Verhoeven; Marc Swerts; Carlos Gussenhoven

This paper investigates the perception of word prosodic contrasts as a function of focus and position in the intonational phrase in two Dutch Limburgian dialects, Roermond and Weert. While their word prosodic contrasts share a historical source, the two dialects differ in that Weert realizes the prosodic contrast by duration, while Roermond uses f0. In addition, the Roermond dialect, but not the Weert dialect, appears to neutralize the prosodic distinction outside the focus constituent in phrase-internal syllables. The stimulus materials were naturally elicited word pairs in which the prosodic contrast marks a difference in grammatical number. In two perception experiments, listeners decided in a forced-choice task whether the words represented a singular or a plural form. Listeners with a Roermond Dutch background recognized the members of the opposition in focused contexts and phrase-final contexts, but failed to do so in phrase-internal, nonfocused contexts. By contrast, listeners whose native language was Weert Dutch perceived the grammatical number distinction in all contexts with comparable measures of success. Second, the presentation of stimuli consisting of words excised from their sentences significantly impaired the recognition of grammatical number in the Roermond group, but not in the Weert group. These results suggest that the perception of the tonal contrast, but not that of the duration contrast, depends on the intonational context. The fact that in the Roermond dialect lexical and intonational tones are integrated in the same phonological grammar thus turns out to have significant consequences for the functionality of the word prosodic contrast which can be shown to be absent when this phonological contrast is encoded differently.


Computational Linguistics in the Netherlands | 2002

Multi-feature error detection in spoken dialogue systems

Piroska Lendvai; Antal van den Bosch; Emiel Krahmer; Marc Swerts

The present paper evaluates the role selected features and feature combinations play for error detection in spoken dialogue systems. We investigate the relevance of various, readily available features extracted from a corpus of dialogues with a train timetable information system, using RIPPER, a rule-inducing machine learning algorithm. The learning task consists of the identification of communication problems arising in either the previous turn or the current turn of the dialogue. Previous experiments with our corpus have shown that combining dialogue history and word-graph features is beneficial for detecting errors (in particular in the previous turn). Other researchers have reported that combining prosodic and ASR characteristics is helpful (primarily in the current turn). In this paper, we investigate the usefulness of large-scale combinations of these features for the above two tasks. We show that we are unable to reproduce the benefits of prosodic features for learning problematic situations, even though the overall prosodic trends in our corpus are similar to those earlier reported on. Moreover, the best results are obtained using just minimal combinations of two sources of information.


Computing Meaning, Volume 3 (ed. Harry Bunt) | 2008

Meaning, Intonation And Negation

Marc Swerts; Emiel Krahmer

This paper describes a methodology for the study of meaning and intonation, focusing both on what speakers can do (using production experiments) and on what hearers can do (using perception experiments). We show that such an experimental paradigm may yield interesting results from a semantical point of view by discussing the role intonation can play in the interpretation of negation phrases in natural language. We present empirical evidence for the existence of a set of prosodic differences between two kinds of negation, descriptive and metalinguistic. This distinction has been the subject of considerable debate in presupposition theory and also plays an important role in discussions about the division of labor between semantics and pragmatics. In general, we argue that intonation gives rise to ‘soft constraints’, and point out that an optimality theoretic framework may be suitable to model the relation between intonation and meaning. We outline some problems and prospects for an optimality theoretic account of meaning and intonation.

Keywords: Intonation, negation, production and perception, optimality


Computational Linguistics in the Netherlands | 2001

Automatic detection of problematic turns in human-machine interactions

Antal van den Bosch; Emiel Krahmer; Marc Swerts

This paper addresses the issue of on-line detection of communication problems in spoken dialogue systems. In particular, the usefulness is investigated of the sequence of system question types and the word graphs corresponding to the respective user utterances. By applying both rule-induction and memory-based learning techniques to data obtained with a Dutch train time-table information system, the current paper demonstrates that the aforementioned features indeed lead to a method for problem detection that performs significantly above baseline. The results are interesting from a dialogue perspective since they employ features that are present in the majority of spoken dialogue systems and can be obtained with little or no computational overhead. The results are also interesting from a machine learning perspective, since they show that the rule-based method performs significantly better than the memory-based method, because the former is better capable of representing interactions between features.


Archive | 1999

Prosodic cues to recognition errors

Julia Hirschberg; Diane J. Litman; Marc Swerts

Collaboration


Dive into Marc Swerts's collaborations.

Top Co-Authors

Carlos Gussenhoven

Radboud University Nijmegen

Jacques M. B. Terken

Eindhoven University of Technology

Mieke F. Weegels

Eindhoven University of Technology

Rachel Fournier

Radboud University Nijmegen

Wieger Wesselink

Eindhoven University of Technology
