Sharon L. Oviatt
Oregon Health & Science University
Publications
Featured research published by Sharon L. Oviatt.
acm multimedia | 1997
Philip R. Cohen; Michael Johnston; David McGee; Sharon L. Oviatt; Jay Pittman; Ira Smith; Liang Chen; Josh Clow
QuickSet: Multimodal Interaction for Distributed Applications. Center for Human Computer Communication, Oregon Graduate Institute of Science and Technology.
human factors in computing systems | 1999
Sharon L. Oviatt
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.
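A minimal sketch of the mutual disambiguation idea this abstract describes: each recognizer returns an n-best list, and fusing only semantically compatible speech/gesture pairs can promote a correct interpretation that the speech recognizer alone mis-ranked. The n-best lists, scores, and compatibility rules below are hypothetical, not data from the study.

```python
# Toy mutual disambiguation (MD): fuse speech and gesture n-best lists by
# scoring every semantically compatible pair, then re-rank the joint results.

def fuse(speech_nbest, gesture_nbest, compatible):
    """Score each compatible (speech, gesture) pair and rank the pairs."""
    joint = []
    for s_label, s_score in speech_nbest:
        for g_label, g_score in gesture_nbest:
            if compatible(s_label, g_label):
                joint.append(((s_label, g_label), s_score * g_score))
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

# Hypothetical n-best lists: the speech recognizer ranks the wrong word first.
speech = [("ditch", 0.55), ("dock", 0.45)]
gesture = [("point", 0.70), ("line", 0.30)]

# Hypothetical domain constraints: a ditch is drawn as a line, a dock as a point.
rules = {("ditch", "line"), ("dock", "point")}

ranked = fuse(speech, gesture, lambda s, g: (s, g) in rules)
print(ranked[0])  # (('dock', 'point'), ~0.315): gesture pulls the correct command to the top
```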
Communications of The ACM | 2000
Sharon L. Oviatt
My focus here is recognition errors as a problem for spoken-language systems, especially when processing diverse speaker styles or speech produced in noisy field settings. However, when speech is combined with another input mode within a multimodal architecture, recent research has shown that two modes can function better than one alone. I also outline when and why multimodal systems display error-handling advantages. Recent studies on mobile speech and accented speakers have found that:
Human Machine Interaction | 2009
Bruno Dumas; Denis Lalanne; Sharon L. Oviatt
The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This opens a number of associated issues covered by this chapter, such as fusion of heterogeneous data types, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, frameworks, etc. This chapter does not intend to cover exhaustively all the issues related to multimodal interface creation, and some hot topics, such as error handling, have been left aside. The chapter starts with the features and advantages associated with multimodal interaction, with a focus on particular findings and guidelines, as well as the cognitive foundations underlying multimodal interaction. The chapter then focuses on the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission issues. Modeling of multimodal interaction, as well as tools allowing rapid creation of multimodal interfaces, are then presented. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland, and also summarizes the major future challenges in the field.
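Since the chapter's architectural details are not reproduced here, the following is only an illustrative sketch of one common ingredient of time-sensitive fusion engines: pairing speech and gesture events whose time stamps fall within a fixed integration window. The event format and window size are assumptions made for the example.

```python
# Minimal time-window late fusion: pair each speech event with any gesture
# events that overlap it or occur within `window` seconds of it.

from dataclasses import dataclass

@dataclass
class Event:
    modality: str     # "speech" or "gesture"
    content: str      # recognized phrase or symbol
    t_start: float    # seconds
    t_end: float

def fuse_by_window(speech_events, gesture_events, window=2.0):
    """Return (speech event, temporally related gesture events or None) pairs."""
    fused = []
    for s in speech_events:
        partners = [g for g in gesture_events
                    if g.t_start <= s.t_end + window and g.t_end >= s.t_start - window]
        fused.append((s, partners or None))
    return fused

speech = [Event("speech", "move this here", 1.0, 2.2)]
gesture = [Event("gesture", "point@(120,45)", 0.6, 0.9),
           Event("gesture", "point@(300,80)", 2.5, 2.7)]
print(fuse_by_window(speech, gesture))  # both pointing events fall inside the window
```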
Communications of The ACM | 2004
Leah Reeves; Jennifer Lai; James A. Larson; Sharon L. Oviatt; T. S. Balaji; Stéphanie Buisine; Penny Collings; Philip R. Cohen; Ben J. Kraal; Jean-Claude Martin; Michael F. McTear; Thiru Vilwamalai Raman; Kay M. Stanney; Hui Su; Qian Ying Wang
Guidelines for multimodal user interface design.
IEEE Transactions on Multimedia | 1999
Lizhong Wu; Sharon L. Oviatt; Philip R. Cohen
We present a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of parallel input signals involved in the multimodal system. We first identify the primary factors that influence multimodal recognition performance by evaluating the multimodal recognition probabilities. We then develop two techniques, an estimate approach and a learning approach, which are designed to optimize accurate recognition during the multimodal integration process. We evaluate these methods using Quickset, a speech/gesture multimodal system, and report evaluation results based on an empirical corpus collected with Quickset. From an architectural perspective, the integration technique presented offers enhanced robustness. It also is premised on more realistic assumptions than previous multimodal systems using semantic fusion. From a methodological standpoint, the evaluation techniques that we describe provide a valuable tool for evaluating multimodal systems.
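The paper's own estimators are not reproduced in this abstract, so the snippet below only illustrates the general setting it describes: combining the posterior probabilities of parallel recognizers into a joint score for each candidate interpretation. The log-linear combination rule, weights, and posteriors are assumptions made for the example.

```python
# Weighted log-linear combination of per-modality posteriors for one joint
# interpretation: exp(sum_i w_i * log p_i), a common late-fusion scoring rule.

import math

def joint_posterior(posteriors, weights):
    return math.exp(sum(w * math.log(p) for p, w in zip(posteriors, weights)))

# Two candidate joint interpretations with (speech, gesture) posteriors.
candidates = {
    "create ditch along line": (0.55, 0.30),
    "create dock at point":    (0.45, 0.70),
}
weights = (0.4, 0.6)  # hypothetical weights, e.g. tuned on a held-out corpus

best = max(candidates, key=lambda c: joint_posterior(candidates[c], weights))
print(best)  # "create dock at point"
```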
IEEE Transactions on Speech and Audio Processing | 1995
Ron Cole; L. Hirschman; L. Atlas; M. Beckman; Alan W. Biermann; M. Bush; Mark A. Clements; L. Cohen; Oscar N. Garcia; B. Hanson; Hynek Hermansky; S. Levinson; Kathleen R. McKeown; Nelson Morgan; David G. Novick; Mari Ostendorf; Sharon L. Oviatt; Patti Price; Harvey F. Silverman; J. Spitz; Alex Waibel; Cliff Weinstein; Stephen A. Zahorian; Victor W. Zue
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support, and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.
Computer Speech & Language | 1995
Sharon L. Oviatt
This research characterizes the spontaneous spoken disfluencies typical of human–computer interaction, and presents a predictive model accounting for their occurrence. Data were collected during three empirical studies in which people spoke or wrote to a highly interactive simulated system as they completed service transactions. The studies involved within-subject factorial designs in which the input modality and presentation format were varied. Spoken disfluency rates during human–computer interaction were documented to be substantially lower than rates typically observed during comparable human–human speech. Two separate factors, both associated with increased planning demands, were statistically related to higher disfluency rates: (1) length of utterance; and (2) lack of structure in the presentation format. Regression techniques demonstrated that a linear model based simply on utterance length accounted for over 77% of the variability in spoken disfluencies. Therefore, design methods capable of guiding users' speech into briefer sentences have the potential to eliminate the majority of spoken disfluencies. In this research, for example, a structured presentation format successfully eliminated 60–70% of all disfluent speech. The long-term goal of this research is to provide empirical guidance for the design of robust spoken language technology.
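As a concrete illustration of the kind of single-predictor model described above (disfluency rate regressed on utterance length), here is a small ordinary least-squares fit; the data points are hypothetical, and only the reported variance figure (over 77%) comes from the abstract.

```python
# Ordinary least squares for y = a + b*x, plus R^2, with no external libraries.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical (utterance length in words, disfluencies per 100 words) pairs.
lengths = [4, 6, 8, 10, 12, 16, 18]
rates = [0.3, 0.5, 0.8, 1.0, 1.3, 1.8, 2.1]
a, b, r2 = fit_line(lengths, rates)
print(f"rate = {a:.2f} + {b:.2f} * length  (R^2 = {r2:.2f})")
```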
international conference on spoken language processing | 1996
Sharon L. Oviatt; Robert VanGent
Recent research indicates clear performance advantages and a strong user preference for interacting multimodally with computers. However, in the problematic area of error resolution, the possible advantages of multimodal interface design remain poorly understood. In the present research, a semi-automatic simulation method with a novel error generation capability was used to collect within-subject data before and after recognition errors, and at different spiral depths in terms of the number of repetitions required to resolve an error. Results indicated that users adopt strategies of switching input modalities and lexical expressions when resolving errors, and that they use these strategies in a linguistically contrastive manner to distinguish a repetition from the original failed input. Implications of these findings are discussed for the development of user-centered predictive models of linguistic adaptation during human-computer error resolution, and for the development of improved error handling in advanced recognition-based interfaces.
user interface software and technology | 2000
Sharon L. Oviatt
One major goal of multimodal system design is to support more robust performance than can be achieved with a unimodal recognition technology, such as a spoken language system. In recent years, the multimodal literatures on speech and pen input and speech and lip movements have begun developing relevant performance criteria and demonstrating a reliability advantage for multimodal architectures. In the present studies, over 2,600 utterances processed by a multimodal pen/voice system were collected during both mobile and stationary use. A new data collection infrastructure was developed, including instrumentation worn by the user while roaming, a researcher field station, and a multimodal data logger and analysis tool tailored for mobile research. Although speech recognition as a stand-alone failed more often during mobile system use, the results confirmed that a more stable multimodal architecture decreased this error rate by 19-35%. Furthermore, these findings were replicated across different types of microphone technology. In large part this performance gain was due to significant levels of mutual disambiguation in the multimodal architecture, with higher levels occurring in the noisy mobile environment. Implications of these findings are discussed for expanding computing to support more challenging usage contexts in a robust manner.
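The 19-35% figure above is a relative error-rate reduction; the sketch below shows how such a figure is computed, using hypothetical per-condition error rates chosen only so the arithmetic lands inside the reported range.

```python
# Relative reduction in errors achieved by the multimodal architecture,
# compared with stand-alone speech recognition, per usage condition.

def relative_reduction(unimodal_err, multimodal_err):
    return 100.0 * (unimodal_err - multimodal_err) / unimodal_err

# Hypothetical error rates (proportion of failed commands): (speech-only, multimodal).
conditions = {"stationary": (0.130, 0.105), "mobile": (0.200, 0.130)}
for name, (uni, multi) in conditions.items():
    print(f"{name}: {relative_reduction(uni, multi):.0f}% fewer errors multimodally")
```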