Victor W. Zue
Massachusetts Institute of Technology
Publications
Featured research published by Victor W. Zue.
IEEE Transactions on Speech and Audio Processing | 2000
Victor W. Zue; Stephanie Seneff; James R. Glass; Joseph Polifroni; Christine Pao; Timothy J. Hazen; I. Lee Hetherington
In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. In the two years since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totaling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.
Proceedings of the IEEE | 2000
Victor W. Zue; James R. Glass
The past decade has witnessed the emergence of a new breed of human-computer interfaces that combines several human language technologies to enable humans to converse with computers using spoken dialogue for information access, creation and processing. In this paper, we introduce the nature of these conversational interfaces and describe the underlying human language technologies on which they are based. After summarizing some of the recent progress in this area around the world, we discuss development issues faced by researchers creating these kinds of systems and present some of the ongoing and unmet research challenges in this field.
Speech Communication | 1995
James R. Glass; Giovanni Flammia; David Goodine; Michael S. Phillips; Joseph Polifroni; Shinsuke Sakai; Stephanie Seneff; Victor W. Zue
This paper describes our recent work in developing multilingual spoken language systems that support human-computer interactions. Our approach is based on the premise that a common semantic representation can be extracted from the input for all languages, at least within the context of restricted domains. In our design of such systems, language-dependent information is separated from the system kernel as much as possible, and encoded in external data structures. The internal system manager, discourse and dialogue component, and database are all maintained in a language-transparent form. Our description will focus on the development of the multilingual MIT Voyager spoken language system, which can engage in verbal dialogues with users about a geographical region within Cambridge, MA, USA. The system can provide information about distances, travel times or directions between objects located within this area (e.g., restaurants, hotels, banks, libraries), as well as information such as the addresses, telephone numbers or location of the objects themselves. Voyager has been fully ported to Japanese and Italian, and we are in the process of porting to French and German as well. Evaluations for the English, Japanese and Italian systems are reported. Other related multilingual research activities are also briefly mentioned.
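The design principle above, a language-neutral kernel with language-dependent data moved into external structures, can be sketched minimally as follows. This is an illustrative toy, not Voyager's actual data structures; the frame fields and template names are invented.

```python
# Hypothetical sketch of a language-transparent kernel: dialogue logic
# operates only on a semantic frame; all surface strings live in
# external per-language templates (names invented for illustration).

SEMANTIC_FRAME = {"action": "distance", "from": "MIT", "to": "Harvard"}

# Language-dependent data kept outside the kernel, one table per language.
TEMPLATES = {
    "en": "The distance from {src} to {dst} is {km} km.",
    "it": "La distanza da {src} a {dst} e' di {km} km.",
}

def answer(frame, km, lang):
    # Kernel logic is identical for every language; only the
    # template lookup differs.
    return TEMPLATES[lang].format(src=frame["from"], dst=frame["to"], km=km)

print(answer(SEMANTIC_FRAME, 3, "en"))
```

Porting to a new language then amounts to adding one entry to the external table, leaving the kernel untouched.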
Speech Communication | 2000
Kenney Ng; Victor W. Zue
This paper explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. In this study, we explore the space of possible subword units to determine the complexity of the subword units needed for SDR; describe the development and application of a phonetic recognition system to extract subword units from the speech signal; examine the behavior and sensitivity of the subword units to speech recognition errors; measure the effect of speech recognition performance on retrieval performance; and investigate a number of robust indexing and retrieval methods in an effort to improve retrieval performance in the presence of speech recognition errors. We find that with the appropriate subword units, it is possible to achieve performance comparable to that of text-based word units if the underlying phonetic units are recognized correctly. In the presence of speech recognition errors, retrieval performance degrades to 60% of the clean reference level. This performance can be improved by 23% (to 74% of the clean reference) with the use of the robust methods.
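The core idea of subword-unit indexing can be illustrated with a small sketch: represent each spoken document and each query as a bag of overlapping phone n-grams, and rank documents by n-gram overlap. This is an illustrative toy under that assumption, not the paper's actual indexing or scoring method.

```python
# Illustrative sketch of subword-unit spoken document retrieval:
# documents and queries become bags of phone trigrams, and documents
# are ranked by shared-trigram count with the query.

from collections import Counter

def ngrams(phones, n=3):
    # Overlapping phone n-grams, e.g. [w, eh, dh, er] -> (w,eh,dh), (eh,dh,er)
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(docs, n=3):
    # One bag of n-grams per document.
    return {doc_id: Counter(ngrams(p, n)) for doc_id, p in docs.items()}

def retrieve(query_phones, index, n=3):
    # Score = number of n-grams shared with the query (counted with multiplicity).
    q = Counter(ngrams(query_phones, n))
    scores = {d: sum((q & bag).values()) for d, bag in index.items()}
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "weather": ["w", "eh", "dh", "er", "f", "ao", "r", "k", "ae", "s", "t"],
    "traffic": ["t", "r", "ae", "f", "ih", "k", "r", "ih", "p", "ao", "r", "t"],
}
index = build_index(docs)
print(retrieve(["w", "eh", "dh", "er"], index))  # "weather" ranked first
```

Because matching happens at the phone level, a query term never seen by a word recognizer can still match, which is the motivation for subword units in the first place.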
International Conference on Spoken Language Processing | 1996
Helen M. Meng; Senis Busayapongchai; James R. Glass; David Goddeau; I. Lee Hetherington; Edward Hurley; Christine Pao; Joseph Polifroni; Stephanie Seneff; Victor W. Zue
WHEELS is a conversational system which provides access to a database of electronic automobile classified advertisements. It leverages off the existing spoken language technologies from our GALAXY system, and enables users to search through a database of 5,000 automobile classifieds. The current end-to-end system can respond to spoken or typed inputs, and produces a short list of entries meeting the constraints specified by the user. The system operates in mixed-initiative mode, asking for specific information but not requiring compliance. The output information is conveyed to the user with visual tables and synthesized speech. This system incorporates a new type of category bigram, created with the innovative use of the natural language component. Future plans to extend the system include operating in a displayless mode, and porting the system to Spanish.
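A category bigram of the kind mentioned above can be sketched as a class-based bigram: words are first mapped to categories (e.g., MAKE, MODEL) and bigram statistics are estimated over the category sequence, so sparse word pairs share counts. The word-to-category table and the training procedure below are invented for illustration and are not the WHEELS system's actual model.

```python
# Illustrative class-based (category) bigram: estimate P(category | previous
# category) from category sequences rather than raw word sequences.

from collections import defaultdict

# Hypothetical word-to-category table for an automobile-classifieds domain.
WORD2CAT = {
    "toyota": "MAKE", "honda": "MAKE",
    "camry": "MODEL", "civic": "MODEL",
    "under": "PRICE_OP", "5000": "NUMBER",
}

def train_category_bigram(sentences):
    counts = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        cats = ["<s>"] + [WORD2CAT.get(w, "OTHER") for w in words] + ["</s>"]
        for prev, cur in zip(cats, cats[1:]):
            counts[prev][cur] += 1
    # Maximum-likelihood conditional probabilities P(cur | prev).
    return {prev: {cur: n / sum(following.values())
                   for cur, n in following.items()}
            for prev, following in counts.items()}

probs = train_category_bigram([
    ["toyota", "camry", "under", "5000"],
    ["honda", "civic", "under", "5000"],
])
print(probs["MAKE"]["MODEL"])  # in this tiny corpus, MODEL always follows MAKE
```

The payoff is generalization: having seen "toyota camry", the model also assigns probability to "honda civic", since both are MAKE followed by MODEL.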
IEEE Transactions on Speech and Audio Processing | 1995
Ron Cole; L. Hirschman; L. Atlas; M. Beckman; Alan W. Biermann; M. Bush; Mark A. Clements; L. Cohen; Oscar N. Garcia; B. Hanson; Hynek Hermansky; S. Levinson; Kathleen R. McKeown; Nelson Morgan; David G. Novick; Mari Ostendorf; Sharon L. Oviatt; Patti Price; Harvey F. Silverman; J. Spitz; Alex Waibel; Cliff Weinstein; Stephen A. Zahorian; Victor W. Zue
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support, and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.
IEEE Intelligent Systems | 1994
Victor W. Zue
MIT's Voyager system is an attempt to explore issues related to a fully interactive spoken-language system and natural language understanding. The system helps users get from one location to another within a specific geographical area, and can provide information about certain objects in the area. The current version of Voyager focuses on the city of Cambridge, Massachusetts, between MIT and Harvard University. Voyager's domain knowledge (or back-end) is an enhanced version of an existing direction assistance program (J.R. Davis and T.F. Trobaugh, 1987). The map database includes the locations of various classes of objects (streets, buildings, rivers) and their properties (address, phone number, etc.). To retrieve information, the Summit speech recognition system converts the user's speech signal into a set of word hypotheses, the Tina natural language system interacts with Summit to obtain a word string and a linguistic interpretation of the utterance, and an interface between the two subsystems converts Tina's semantic representation into the appropriate function calls to the back-end. Voyager then responds with a map, highlighting the objects of interest, plus a textual and spoken answer. The current implementation has a vocabulary of about 350 words and can deal with various types of queries, such as the location of objects, simple properties of objects, how to get from one place to another, and the distance and travel time between objects.
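The recognizer-to-parser-to-back-end flow described above can be sketched as a three-stage pipeline. Every component here is a trivial stand-in (not Summit or Tina), and the back-end data and frame fields are invented for illustration.

```python
# Illustrative three-stage pipeline in the spirit of the description:
# recognize -> parse into a semantic frame -> map the frame to a
# back-end call. All components are toy stand-ins.

def recognize(signal):
    # Stand-in recognizer: pretend the signal decodes directly to words.
    return signal.lower().split()

def parse(words):
    # Stand-in parser: detect a "where is X" query and build a frame.
    if words[:2] == ["where", "is"]:
        return {"action": "locate", "object": " ".join(words[2:])}
    raise ValueError("unparsed utterance")

# Hypothetical back-end: a tiny map database of object locations.
BACKEND = {"mit": "77 Massachusetts Avenue, Cambridge"}

def execute(frame):
    # Convert the semantic frame into the appropriate back-end lookup.
    if frame["action"] == "locate":
        return BACKEND[frame["object"]]

frame = parse(recognize("WHERE IS MIT"))
print(execute(frame))  # 77 Massachusetts Avenue, Cambridge
```

The point of the intermediate frame is that the back-end never sees words at all, only the structured interpretation, which is what lets the recognizer and the domain knowledge evolve independently.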
Speech Communication | 1994
Victor W. Zue; Stephanie Seneff; Joseph Polifroni; Michael S. Phillips; Christine Pao; David Goodine; David Goddeau; James R. Glass
This paper describes PEGASUS, a spoken dialogue interface for on-line air travel planning that we have recently developed. PEGASUS leverages off our spoken language technology development in the ATIS domain, and enables users to book flights using the American Airlines EAASY SABRE system. The input query is transformed by the speech understanding system to a frame representation that captures its meaning. The tasks of the System Manager include transforming the semantic representation into an EAASY SABRE command, transmitting it to the application back-end, formatting and interpreting the resulting information, and managing the dialogue. Preliminary evaluation results suggest that users can learn to make productive use of PEGASUS for travel planning, although much work remains to be done.
International Conference on Acoustics, Speech, and Signal Processing | 1990
Victor W. Zue; James R. Glass; David Goodine; Hong Leung; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff
Early experience with the development of the MIT VOYAGER spoken language system is described, and its current performance is documented. The three components of VOYAGER, the speech recognition component, the natural language component, and the application back-end, are described.
Journal of the Acoustical Society of America | 1979
Victor W. Zue; Martha Laferriere
This paper describes the acoustic characteristics of medial /t,d/ in American English as a function of phonetic environment. The data consisted of some 3000 word tokens, each embedded in a carrier phrase, recorded on two separate occasions by six subjects, three male and three female. Quantitative results were obtained on the acoustic characteristics of each stop for all phonetic environments, as well as the difference between /t/ and /d/ for a given phonetic environment. The interaction between certain phonological rules such as flapping and glottalization and low‐level phonetic recoding rules such as vowel nasalization and nasal deletion was investigated. Based on the statistics derived from our corpus, probabilities of occurrence were derived for all phonetic realizations. In addition, interspeaker variability was examined and a significant difference was found in the application of phonetic rules between male and female speakers.