Bernhard Suhm
BBN Technologies
Publications
Featured research published by Bernhard Suhm.
Human-Computer Interaction | 2000
Sharon Oviatt; Phil Cohen; Lizhong Wu; John Vergo; Lisbeth Duncan; Bernhard Suhm; Josh Bers; Thomas G. Holzman; Terry Winograd; James A. Landay; Jim A. Larson; David L. Ferro
The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
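To make the early/late fusion distinction concrete, here is a minimal sketch of late (semantic-level) fusion; it is our illustration, not code from any of the surveyed systems, and all names and scores in it are hypothetical. Each recognizer emits an n-best list of scored interpretations, and compatible pairs are combined and re-ranked by the product of their scores.

```python
# Minimal late-fusion sketch: combine n-best hypotheses from a speech
# recognizer and a gesture recognizer, keeping semantically compatible
# pairs and ranking them by combined posterior score.

def late_fusion(speech_nbest, gesture_nbest, compatible):
    """Rank joint (speech, gesture) interpretations by combined score."""
    joint = [((s, g), ps * pg)
             for s, ps in speech_nbest
             for g, pg in gesture_nbest
             if compatible(s, g)]
    return sorted(joint, key=lambda item: item[1], reverse=True)

# Toy "put that there" example with invented scores. A real system would
# unify typed semantic frames instead of this stand-in predicate.
speech = [("move OBJECT to LOCATION", 0.7), ("remove OBJECT", 0.3)]
gesture = [("point at map cell B4", 0.8), ("circle region A1", 0.2)]
print(late_fusion(speech, gesture, compatible=lambda s, g: True)[0])
```

Early fusion would instead combine the modalities at the feature or recognition level, before either recognizer commits to an interpretation.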
ACM Transactions on Computer-Human Interaction | 2001
Bernhard Suhm; Brad A. Myers; Alex Waibel
Although commercial dictation systems and speech-enabled telephone voice user interfaces have become readily available, speech recognition errors remain a serious problem in the design and implementation of speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors. This article presents multimodal error correction methods that allow the user to correct recognition errors efficiently without keyboard input. Correction accuracy is maximized by novel recognition algorithms that use context information for recognizing correction input. Multimodal error correction is evaluated in the context of a prototype multimodal dictation system. The study shows that unimodal repair is less accurate than multimodal error correction. On a dictation task, multimodal correction is faster than unimodal correction by respeaking. The study also provides empirical evidence that system-initiated error correction (based on confidence measures) may not expedite error correction. Furthermore, the study suggests that recognition accuracy determines user choice between modalities: while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study, the article introduces a performance model of (recognition-based) multimodal interaction that predicts input speed including time needed for error correction. Applied to interactive error correction, the model predicts the impact of improvements in recognition technology on correction speeds, and the influence of recognition accuracy and correction method on the productivity of dictation systems. This model is a first step toward formalizing multimodal interaction.
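One plausible concrete form of such a throughput model, written here as our own illustration rather than the article's exact formulation: with raw input speed v (words per unit time), recognition accuracy a, and average time t_corr to correct one misrecognized word, the time to enter n words and the resulting effective input speed are

```latex
T(n) = \frac{n}{v} + n\,(1-a)\,t_{\mathrm{corr}},
\qquad
v_{\mathrm{eff}} = \frac{n}{T(n)} = \frac{v}{1 + v\,(1-a)\,t_{\mathrm{corr}}}
```

so improvements in either recognition accuracy a or correction speed raise effective throughput, which is the kind of prediction the model is used to make.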
Human Factors in Computing Systems | 1999
Bernhard Suhm; Brad A. Myers; Alex Waibel
Our research addresses the problem of error correction in speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors (so-called multimodal error correction). We present a user study that compares, on a dictation task, multimodal error correction with conventional interactive correction, such as speaking again, choosing from a list, and keyboard input. Results show that multimodal correction is faster than conventional correction without keyboard input, but slower than correction by typing for users with good typing skills. Furthermore, while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study we developed a performance model of multimodal interaction that predicts input speed including time needed for error correction. We apply the model to estimate the impact of recognition technology improvements on correction speeds and the influence of recognition accuracy and correction method on the productivity of dictation systems. Our model is a first step towards formalizing multimodal (recognition-based) interaction.
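As a numerical illustration of how such a performance model extrapolates, the sketch below applies a simple throughput formula; the formula and all numbers are our own assumptions, not figures from the paper.

```python
# Hedged sketch of a recognition-based input throughput model: effective
# dictation speed falls as accuracy drops and per-error correction slows.

def effective_wpm(raw_wpm, accuracy, corr_seconds_per_error):
    """Effective words per minute once correction time is included."""
    seconds_per_word = 60.0 / raw_wpm
    expected_corr = (1.0 - accuracy) * corr_seconds_per_error
    return 60.0 / (seconds_per_word + expected_corr)

# Hypothetical numbers: dictation at 100 wpm raw speed, 10 s per correction.
for acc in (0.85, 0.90, 0.95, 0.98):
    print(acc, round(effective_wpm(100, acc, corr_seconds_per_error=10), 1))
```

Under these assumed numbers, raising accuracy from 0.85 to 0.98 roughly triples effective speed, which illustrates why recognition improvements dominate dictation productivity.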
International Conference on Acoustics, Speech, and Signal Processing | 1994
Monika Woszczyna; Naomi Aoki-Waibel; Finn Dag Buø; Noah Coccaro; Keiko Horiguchi; Thomas Kemp; Alon Lavie; Arthur E. McNair; Thomas Polzin; Ivica Rogina; Carolyn Penstein Rosé; Tanja Schultz; Bernhard Suhm; Masaru Tomita; Alex Waibel
We present first results from our efforts toward translation of spontaneously spoken speech. Improvements include increasing coverage, robustness, generality, and speed of JANUS, the speech-to-speech translation system of Carnegie Mellon and Karlsruhe University. The recognition and machine translation engines have been upgraded to deal with requirements introduced by spontaneous human-to-human dialogs. To allow for development and evaluation of our system on adequate data, a large database of spontaneous scheduling dialogs is being gathered for English, German, and Spanish.
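At its coarsest, the architecture described here chains a recognizer into a translation engine. The following stub is purely illustrative of that pipeline shape; every component is a stand-in, not JANUS code.

```python
# Illustrative speech-to-speech pipeline in the spirit of the architecture
# above: recognition output feeds the machine translation engine.

def recognize(audio):
    # Stand-in for the speech recognizer.
    return "when can we meet next week"

def translate(text, target):
    # Stand-in for the MT engine; a real system parses and generates.
    lexicon = {"when can we meet next week":
               {"de": "wann koennen wir uns naechste Woche treffen"}}
    return lexicon[text][target]

def speech_to_speech(audio, target="de"):
    return translate(recognize(audio), target)

print(speech_to_speech(b"...", target="de"))
```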
International Conference on Spoken Language Processing | 1996
Bernhard Suhm; Brad A. Myers; Alex Waibel
The authors present a multimodal approach to interactive recovery from speech recognition errors for the design of speech user interfaces. They propose a framework to compare various error recovery methods, arguing that a rational user will prefer interaction methods that provide an optimal trade-off between accuracy, speed, and naturalness. They describe a prototypical implementation of multimodal interactive error recovery and present results from a preliminary evaluation on form-filling and speech-to-speech translation tasks.
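The "rational user" framing can be read as a weighted utility over the three criteria. The sketch below is one way to operationalize it; the methods, scores, and weights are invented for illustration and are not taken from the paper.

```python
# Illustrative utility model: score each correction method on accuracy,
# speed, and naturalness (each scaled to [0, 1]) and pick the best.

METHODS = {
    "respeak":        (0.55, 0.70, 1.00),
    "spell verbally": (0.80, 0.40, 0.60),
    "handwrite":      (0.85, 0.50, 0.70),
    "pick from list": (0.95, 0.90, 0.50),
}

def best_method(weights=(0.5, 0.3, 0.2)):
    wa, ws, wn = weights  # weights on accuracy, speed, naturalness
    score = lambda m: wa * m[0] + ws * m[1] + wn * m[2]
    return max(METHODS, key=lambda name: score(METHODS[name]))

print(best_method())  # -> "pick from list" under these assumed weights
```

Shifting the weights toward naturalness makes respeaking win, mirroring the observed tension between what users initially prefer and what actually works.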
Human Factors in Computing Systems | 2002
Bernhard Suhm; Josh Bers; Daniel McCarthy; Barbara Freeman; David J. Getty; Katherine Godfrey; Pat Peterson
This paper presents a field study that compares natural language call routing with standard touch-tone menus. Call routing is the task of getting callers to the right place in the call center, which could be the appropriate live agent or automated service. Natural language call routing lets callers describe the reason for their call in their own words, instead of presenting them with a list of menu options to select from using the telephone touch-tone keypad. The field study was conducted in a call center of a large telecommunication service provider. Results show that with natural language call routing, more callers respond to the main routing prompt, more callers are routed to a specific destination (instead of defaulting to a general operator who may have to transfer them), and more callers are routed to the correct agent. Our survey data show that callers overwhelmingly prefer natural language call routing over standard touch-tone menus. Furthermore, natural language call routing can deliver significant cost savings to call centers.
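At its core, natural language call routing is a text classification problem: map the caller's utterance to a destination, with a fallback to a human operator. The sketch below shows that shape with a naive keyword scorer; deployed routers use trained statistical classifiers, and all destinations and keywords here are invented.

```python
# Toy natural language call router: score an utterance against keyword
# sets per destination; fall back to the operator when nothing matches.

DESTINATIONS = {
    "billing":      {"bill", "charge", "payment", "invoice"},
    "tech_support": {"broken", "error", "internet", "modem"},
    "sales":        {"upgrade", "plan", "new", "order"},
}

def route(utterance, default="operator"):
    words = set(utterance.lower().split())
    scores = {dest: len(words & kws) for dest, kws in DESTINATIONS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(route("I have a question about a charge on my bill"))  # -> billing
print(route("good morning"))                                  # -> operator
```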
International Journal of Speech Technology | 2002
Bernhard Suhm; Pat Peterson
Usability of many call center IVRs (Interactive Voice Response systems) is dismal. Callers dislike touch-tone IVRs and seek agent assistance at the first opportunity. However, because of high agent costs, call center managers continue to seek automation with IVRs. The challenge for call centers is providing user-friendly, yet cost-efficient, customer service. This article describes a comprehensive methodology for usability re-engineering of telephone voice user interfaces based on detailed call center assessment and call flow redesign. At the core of our methodology is a data-driven IVR assessment, in which we analyze end-to-end recordings of thousands of calls to evaluate IVR cost effectiveness and usability. Because agent time is the major cost driver in call center operations, we quantify cost-effectiveness in terms of agent time saved by automation in the IVR. We identify usability problems by carefully inspecting user-path diagrams, a visual representation of the sequence of events of thousands of calls as they flow through the IVR. Such an IVR assessment leads directly into call-flow redesign. Assessment insights lead to specific suggestions on how to improve a call-flow design. In addition, the assessment enables us to estimate the cost savings of a new design, thus providing the necessary business justification. We illustrate our IVR usability and re-engineering methodology with examples from large commercial call centers, demonstrating how the staged process maximizes the payback for the call center while minimizing risk.
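The two pillars of the assessment, agent time saved by automation and user-path analysis, both reduce to simple computations over per-call event logs. The sketch below illustrates that with a hypothetical log format and made-up numbers; it is our reading of the methodology, not BBN tooling.

```python
# Illustrative IVR assessment over per-call event logs: estimate agent
# time saved by automation, and tally the IVR state transitions that a
# user-path diagram would visualize (edge thickness ~ call volume).

from collections import Counter

calls = [
    {"path": ["greeting", "main_menu", "balance", "hangup"], "agent_secs": 0},
    {"path": ["greeting", "main_menu", "billing", "agent"],  "agent_secs": 240},
    {"path": ["greeting", "main_menu", "agent"],             "agent_secs": 300},
]

AVG_AGENT_HANDLE_SECS = 270  # assumed average handle time for this call type
automated = [c for c in calls if c["agent_secs"] == 0]
print("agent seconds saved:", len(automated) * AVG_AGENT_HANDLE_SECS)

edges = Counter((a, b) for c in calls for a, b in zip(c["path"], c["path"][1:]))
for (a, b), n in edges.most_common():
    print(f"{a} -> {b}: {n}")
```

Aggregated over thousands of calls, high-volume edges into "agent" expose the points where the IVR fails to automate, which is exactly what the user-path diagrams surface.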
arXiv: Computation and Language | 1996
John D. Lafferty; Bernhard Suhm
The maximum entropy method has recently been successfully introduced to a variety of natural language applications. In each of these applications, however, the power of the maximum entropy method is achieved at the cost of a considerable increase in computational requirements. In this paper we present a technique, closely related to the classical cluster expansion from statistical mechanics, for reducing the computational demands necessary to calculate conditional maximum entropy language models.
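For context, a conditional maximum entropy language model has the standard exponential form

```latex
p_\Lambda(w \mid h) = \frac{1}{Z_\Lambda(h)}
  \exp\Big(\sum_i \lambda_i f_i(h, w)\Big),
\qquad
Z_\Lambda(h) = \sum_{w' \in V} \exp\Big(\sum_i \lambda_i f_i(h, w')\Big)
```

where h is the history, the f_i are feature functions with weights λ_i, and V is the vocabulary. Evaluating or training the model requires computing the normalizer Z_Λ(h) by summing over the entire vocabulary for each history, and it is this cost that the cluster-expansion technique targets.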
Archive | 2008
Bernhard Suhm
While speech offers unique advantages and opportunities as an interface modality, the known limitations of speech recognition technology and cognitive limitations of spoken interaction amplify the importance of usability in the development of speech applications. The competitive business environment, on the other hand, requires sound business justification for any investment in speech technology and proof of its usability and effectiveness. This chapter presents design principles and usability engineering methods that empower practitioners to optimize both usability and ROI of telephone speech applications, frequently also referred to as telephone Voice User Interfaces (VUIs) or Interactive Voice Response (IVR) systems. The first section discusses limitations of speech user interfaces and their repercussions on design. From a survey of research and industry know-how, a short list of guidelines for IVR design is derived. Examples illustrate how to apply these guidelines during the design phase of a telephone speech application. The second section presents a data-driven methodology for optimizing usability and effectiveness of IVRs. The methodology is grounded in the analysis of live, end-to-end calls, the ultimate field data for telephone speech applications. We will describe how to capture end-to-end call data from deployed systems and how to mine this data to measure usability and identify problems. Leveraging end-to-end call data empowers practitioners to build solid business cases, optimize ROI, and justify the cost of IVR usability engineering. Case studies from the consulting practice at BBN Technologies illustrate how these methods were applied in some of the largest US deployments of automated telephone applications.
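The business-case arithmetic behind such ROI arguments is straightforward to sketch. The numbers below are hypothetical placeholders, not figures from the chapter.

```python
# Back-of-the-envelope business case: translate an IVR automation gain
# into monthly agent-cost savings. All inputs are assumed values.

calls_per_month = 500_000
automation_gain = 0.05          # extra fraction of calls fully automated
agent_minutes_per_call = 4.5    # average agent handle time avoided
cost_per_agent_minute = 0.85    # loaded agent cost, dollars

monthly_savings = (calls_per_month * automation_gain
                   * agent_minutes_per_call * cost_per_agent_minute)
print(f"estimated monthly savings: ${monthly_savings:,.0f}")
```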
Human Factors in Computing Systems | 2001
Bernhard Suhm; Barbara Freeman; David J. Getty
This paper presents a study on touch-tone menu design. In particular, we investigated whether short or long menus route callers more efficiently to the destination that can handle their call. A short menu offers a small number of broad selections, while a long menu offers a larger number of more specific choices. Results obtained from thousands of live calls to a commercial customer service center show that callers route themselves more effectively using the long menu. In addition, long menus reduce the number of menu layers required in complex voice interfaces, mitigating one of the most severe usability problems of existing touch-tone interfaces: navigating through multiple layers of menus.
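The depth saving follows from simple arithmetic (our illustration, not a formula from the paper): routing to one of N destinations through menus offering b options each requires

```latex
L = \left\lceil \log_b N \right\rceil
```

menu layers. For N = 64 destinations, four-option menus need L = 3 layers while eight-option menus need only L = 2, so longer menus trade a longer prompt at each step for fewer navigation steps overall.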