Jackson Liscombe
Columbia University
Publications
Featured research published by Jackson Liscombe.
conference of the international speech communication association | 2005
Jackson Liscombe; Giuseppe Riccardi; Dilek Hakkani-Tür
Most research that explores the emotional state of users of spoken dialog systems does not fully utilize the contextual nature that the dialog structure provides. This paper reports results of machine learning experiments designed to automatically classify the emotional state of user turns using a corpus of 5,690 dialogs collected with the “How May I Help You?℠” spoken dialog system. We show that augmenting standard lexical and prosodic features with contextual features that exploit the structure of spoken dialog and track user state increases classification accuracy by 2.6%.
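The sketch below illustrates the general idea of augmenting per-turn lexical and prosodic features with dialog-context features before classification. It is a minimal illustration, not the authors' implementation; the feature names, dialog states, and toy data are assumptions made for the example.

```python
# Minimal sketch: mix lexical, prosodic, and contextual features for each user
# turn and train a simple classifier. Feature names and data are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

turns = [
    {"word=refund": 1, "f0_mean": 210.0, "prev_emotion": "neutral",  "dialog_state": "billing"},
    {"word=thanks": 1, "f0_mean": 180.0, "prev_emotion": "neutral",  "dialog_state": "closing"},
    {"word=agent":  1, "f0_mean": 260.0, "prev_emotion": "negative", "dialog_state": "billing"},
]
labels = ["negative", "non-negative", "negative"]

vec = DictVectorizer()                    # one-hot encodes the string-valued features
X = vec.fit_transform(turns)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# classify a new turn, including context from the previous turn
new_turn = {"word=refund": 1, "f0_mean": 240.0,
            "prev_emotion": "negative", "dialog_state": "billing"}
print(clf.predict(vec.transform([new_turn])))
```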
conference of the international speech communication association | 2003
Jackson Liscombe; Jennifer J. Venditti; Julia Hirschberg
This paper presents results from a study examining emotional speech using acoustic features and their use in automatic machine learning classification. In addition, we propose a classification scheme for the labeling of emotions on continuous scales. Our findings support those of previous research and indicate possible future directions for using spectral tilt and pitch contour to distinguish emotions along the valence dimension.

Speech is a rich source of information, not only about what a speaker says, but also about the speaker's attitude toward the listener and toward the topic under discussion, as well as the speaker's own current state of mind. Until recently, most research on spoken language systems has focused on propositional content: what words is the speaker producing? There is now considerable interest in going beyond mere words to discover the semantic content of utterances. However, we believe it is important to go beyond semantic content as well, in order to fully interpret what human listeners infer from listening to other humans. In this paper we present results from recent and ongoing experiments in the study of emotional speech, designed to elicit subjective judgments of tokens of emotional speech and to identify acoustic and prosodic correlates of such speech based on these classifications. We discuss previous research, present results from correlation and machine learning experiments, and conclude with the implications of this study.
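As a rough illustration of two of the acoustic measures mentioned above, the sketch below computes a pitch-contour slope and a spectral-tilt estimate from synthetic data. It is only a sketch under stated assumptions; the signal, band limits, and fitting choices are invented, not taken from the study.

```python
# Illustrative only: pitch contour slope and spectral tilt on synthetic data.
import numpy as np

# --- pitch contour slope: linear fit of F0 (Hz) against time (s) ---
t = np.linspace(0.0, 1.0, 100)             # 1 s of voiced speech, 10 ms frames
f0 = 220.0 - 40.0 * t                      # a falling toy contour
pitch_slope = np.polyfit(t, f0, 1)[0]      # Hz per second
print(f"pitch slope: {pitch_slope:.1f} Hz/s")

# --- spectral tilt: slope of the log-magnitude spectrum over frequency ---
sr = 16000
freqs = np.fft.rfftfreq(1024, d=1.0 / sr)
spectrum = np.exp(-freqs / 2000.0)         # toy spectrum decaying with frequency
log_mag = 20.0 * np.log10(spectrum + 1e-12)
band = (freqs > 50) & (freqs < 4000)       # fit over a speech-relevant band
tilt = np.polyfit(freqs[band], log_mag[band], 1)[0]   # dB per Hz
print(f"spectral tilt: {tilt * 1000:.2f} dB/kHz")
```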
conference of the international speech communication association | 2005
Jackson Liscombe; Julia Hirschberg; Jennifer J. Venditti
What role does affect play in spoken tutorial systems, and is it automatically detectable? We investigated the classification of student certainness in a corpus collected for ITSPOKE, a speech-enabled Intelligent Tutoring System (ITS). Our study suggests that tutors respond to indications of student uncertainty differently than to student certainty. Results of machine learning experiments indicate that acoustic-prosodic features can distinguish student certainness from other student states. A combination of acoustic-prosodic features extracted at two levels of intonational analysis, breath groups and turns, achieves 76.42% classification accuracy, a 15.8% relative improvement over baseline performance. Our results suggest that student certainness can be automatically detected and utilized to create better spoken dialog ITSs.
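A rough sketch of combining features from the two levels of analysis follows. It is not the ITSPOKE implementation; the per-breath-group features, the aggregation (mean and max), and all values are assumptions for illustration.

```python
# Sketch: aggregate breath-group features to the turn level, append turn-level
# features, and classify student certainness. All values are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

def turn_vector(breath_groups, turn_feats):
    """Aggregate per-breath-group features (mean and max here) and append
    turn-level features into one vector per student turn."""
    bg = np.array(breath_groups)                        # rows: breath groups
    return np.concatenate([bg.mean(axis=0), bg.max(axis=0), turn_feats])

# toy data: each breath group -> [f0_mean, f0_range, rms_energy]
X = np.array([
    turn_vector([[190, 40, 0.30], [185, 35, 0.28]], [2.1, 0.0]),   # certain
    turn_vector([[210, 90, 0.18], [205, 80, 0.15]], [4.7, 1.0]),   # uncertain
    turn_vector([[188, 38, 0.29]],                  [1.9, 0.0]),   # certain
])
y = ["certain", "uncertain", "certain"]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([turn_vector([[208, 85, 0.16]], [4.5, 1.0])]))
```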
text speech and dialogue | 2009
Roberto Pieraccini; David Suendermann; Krishna Dayanidhi; Jackson Liscombe
In this paper we discuss the recent evolution of spoken dialog systems in commercial deployments. Although still based on a simple finite state machine design paradigm, dialog systems have now reached a much higher level of complexity. The availability of massive amounts of data during deployment has led to the development of continuous optimization strategies, pushing the design and development of spoken dialog applications from an art toward a science. At the same time, new methods for evaluating the subjective caller experience have become available. Finally, we describe the inevitable evolution of spoken dialog applications from speech-only to multimodal interaction.
international conference on acoustics, speech, and signal processing | 2009
David Suendermann; Keelan Evanini; Jackson Liscombe; Phillip Hunter; Krishna Dayanidhi; Roberto Pieraccini
Statistical Spoken Language Understanding grammars (SSLUs) are often used only at the top recognition contexts of modern large-scale spoken dialog systems. We propose to use SSLUs at every recognition context in a dialog system, effectively replacing conventional, manually written grammars. Furthermore, we present a methodology of continuous improvement in which data are collected at every recognition context over an entire dialog system. These data are then used to automatically generate updated context-specific SSLUs at regular intervals and, in so doing, continually improve system performance over time. We have found that SSLUs significantly and consistently outperform even the most carefully designed rule-based grammars in a wide range of contexts in a corpus of over two million utterances collected for a complex call-routing and troubleshooting dialog system.
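The sketch below conveys the general idea of training one statistical utterance classifier per recognition context from utterances logged at that context, and refreshing it as new data arrive. It is a hedged illustration, not the deployed system; the context names, semantic classes, and utterances are invented.

```python
# Sketch: one bag-of-words "SSLU" per recognition context, retrained from
# logged, transcribed utterances at regular intervals. Data are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

logged_data = {
    "modem_troubleshooting": (
        ["the light is blinking red", "no lights at all", "it keeps rebooting"],
        ["red_light", "no_power", "rebooting"],
    ),
    "top_level_routing": (
        ["I want to pay my bill", "my internet is down", "cancel my service"],
        ["billing", "tech_support", "cancel"],
    ),
}

sslus = {}
for context, (utterances, semantic_labels) in logged_data.items():
    sslus[context] = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    sslus[context].fit(utterances, semantic_labels)

print(sslus["top_level_routing"].predict(["I need to pay a bill"]))
```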
spoken language technology workshop | 2008
Keelan Evanini; Phillip Hunter; Jackson Liscombe; David Suendermann; Krishna Dayanidhi; Roberto Pieraccini
In this paper we introduce a subjective metric for evaluating the performance of spoken dialog systems: caller experience (CE). CE is a useful metric for tracking the overall performance of a system in deployment, as well as for isolating individual problematic calls in which the system underperforms. The proposed CE metric differs from most performance evaluation metrics proposed in the past in that it is a) a subjective, qualitative rating of the call, and b) provided by expert, external listeners, not the callers themselves. The results of an experiment in which a set of human experts listened to the same calls three times are presented. The fact that these results show a high level of agreement among different listeners, despite the subjective nature of the task, demonstrates the validity of using CE as a standard metric. Finally, an automated rating system using objective measures is shown to perform at the same high level as the human experts. This is an important advance, since it provides a way to reduce the human labor costs associated with producing reliable CE ratings.
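A minimal sketch of what an automated rater could look like follows, assuming it maps objective per-call measures onto the expert CE scale with a simple regressor. The feature set, the scale, and the training values are assumptions for illustration, not the measures used in the paper.

```python
# Sketch: predict a caller-experience rating from objective call measures.
# Features and ratings below are invented for illustration.
import numpy as np
from sklearn.linear_model import Ridge

# per-call features: [num_no_matches, num_operator_requests, call_seconds, task_completed]
calls = np.array([
    [0, 0, 120, 1],
    [3, 1, 400, 0],
    [1, 0, 180, 1],
    [5, 2, 520, 0],
])
expert_ce = np.array([5.0, 2.0, 4.0, 1.0])     # hypothetical expert ratings

rater = Ridge(alpha=1.0).fit(calls, expert_ce)
new_call = np.array([[2, 0, 250, 1]])
print(f"predicted CE: {float(rater.predict(new_call)[0]):.1f}")
```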
conference of the international speech communication association | 2006
Jackson Liscombe; Jennifer J. Venditti; Julia Hirschberg
Current speech-enabled Intelligent Tutoring Systems do not model student question behavior the way human tutors do, despite evidence indicating the importance of doing so. Our study examined a corpus of spoken tutorial dialogues collected for development of ITSpoke, an Intelligent Tutoring Spoken Dialogue System. We extracted prosodic, lexical, syntactic, and student- and task-dependent information from student turns. Results of 5-fold cross-validation machine learning experiments using AdaBoosted C4.5 decision trees show prediction of student question-bearing turns at a rate of 79.7%. The most useful features were prosodic, especially the pitch slope of the last 200 milliseconds of the student turn. Student pre-test score was the most-used feature. Findings indicate that using turn-based units is acceptable for incorporating question detection capability into practical Intelligent Tutoring Systems.
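The sketch below shows the general shape of such a setup, using scikit-learn's AdaBoost over CART decision trees as a stand-in for the AdaBoosted C4.5 trees used in the study. The feature values are invented; only the feature names (final-200 ms pitch slope, pre-test score) echo the abstract.

```python
# Sketch: AdaBoosted decision trees for detecting question-bearing student
# turns, evaluated with cross-validation. Data are illustrative only.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# per-turn features: [final_200ms_pitch_slope, pretest_score, turn_duration_s, num_words]
X = np.array([
    [ 35.0, 0.6, 2.1,  7],   # rising pitch, question-bearing
    [-20.0, 0.8, 3.5, 12],   # falling pitch, not a question
    [ 42.0, 0.4, 1.8,  5],
    [-15.0, 0.7, 4.0, 14],
    [ 30.0, 0.5, 2.5,  8],
    [-25.0, 0.9, 3.2, 11],
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = question-bearing turn

clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=2), n_estimators=50)
scores = cross_val_score(clf, X, y, cv=3)   # the paper used 5 folds; 3 fit this toy set
print(scores.mean())
```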
annual meeting of the special interest group on discourse and dialogue | 2009
Alexander Schmitt; Tobias Heinroth; Jackson Liscombe
Most studies on speech-based emotion recognition rely on prosodic and acoustic features and employ artificial, acted corpora, so their results cannot be generalized to telephone-based speech applications. In contrast, we present an approach based on utterances from 1,911 calls to a deployed telephone-based speech application, incorporating additional dialogue, NLU, and ASR features into the emotion recognition process. Depending on the task, non-acoustic features add 2.3% in classification accuracy compared to using acoustic features alone.
Archive | 2010
David Suendermann; Jackson Liscombe; Roberto Pieraccini; Keelan Evanini
Satisfying callers’ goals and expectations is the primary objective of every customer care contact center. However, quantifying how successfully interactive voice response (IVR) systems satisfy callers’ goals and expectations has historically proven to be a most difficult task. Such difficulties in assessing automated customer care contact centers can be traced to two assumptions made by most stakeholders in the call center industry: (1) that performance can be effectively measured by deriving statistics from call logs, and (2) that the overall performance of an IVR can be expressed by a single numeric value.
annual meeting of the special interest group on discourse and dialogue | 2009
David Suendermann; Jackson Liscombe; Krishna Dayanidhi; Roberto Pieraccini
We present a set of metrics describing classification performance for individual contexts of a spoken dialog system as well as for the entire system. We show how these metrics can be used to train and tune system components and how they are related to Caller Experience, a subjective measure describing how well a caller was treated by the dialog system.
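As a hedged sketch of the general idea, the snippet below computes a classification metric per recognition context and combines the per-context figures, weighted by call volume, into a single system-level number. The metric (plain accuracy), the context names, and the counts are illustrative assumptions, not the specific metrics defined in the paper.

```python
# Sketch: per-context classification metric plus a volume-weighted system-level
# aggregate. Context names and counts are invented for illustration.
def context_accuracy(n_correct: int, n_total: int) -> float:
    return n_correct / n_total if n_total else 0.0

# per-context counts: (correctly classified utterances, total utterances)
contexts = {
    "top_level_routing":     (8500, 10000),
    "modem_troubleshooting": (1800, 2000),
    "yes_no_confirmation":   (4900, 5000),
}

per_context = {name: context_accuracy(c, n) for name, (c, n) in contexts.items()}
total_utts = sum(n for _, n in contexts.values())
system_level = sum(acc * contexts[name][1] / total_utts for name, acc in per_context.items())

print(per_context)
print(f"system-level (volume-weighted) accuracy: {system_level:.3f}")
```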