
Publications


Featured research published by James R. Glass.


IEEE Transactions on Speech and Audio Processing | 2000

JUPITER: a telephone-based conversational interface for weather information

Victor W. Zue; Stephanie Seneff; James R. Glass; Joseph Polifroni; Christine Pao; Timothy J. Hazen; I. Lee Hetherington

In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. Over a two-year period since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totaling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.
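
A minimal, self-contained sketch of the kind of turn handling the abstract describes: recognition-confidence gating for utterance rejection, a semantic frame from language understanding, and templated generation. Everything here, from the toy parser to the in-memory forecast table and the 0.5 threshold, is invented for illustration and is not the MIT system's actual API:

```python
# Toy JUPITER-style dialogue turn; all names and data are hypothetical.
FORECASTS = {"boston": "sunny, high of 70", "seattle": "rain, high of 55"}

def parse_to_frame(utterance: str) -> dict:
    """Toy 'robust understanding': extract a city keyword into a semantic frame."""
    for city in FORECASTS:
        if city in utterance.lower():
            return {"topic": "weather", "city": city}
    return {}

def handle_turn(utterance: str, confidence: float) -> str:
    """One dialogue turn: reject, understand, retrieve, generate."""
    if confidence < 0.5:                       # utterance rejection
        return "Sorry, I didn't catch that. Could you repeat?"
    frame = parse_to_frame(utterance)          # language understanding
    if "city" not in frame:
        return "What city would you like the forecast for?"
    return f"In {frame['city'].title()}: {FORECASTS[frame['city']]}."

print(handle_turn("what is the weather in Boston today", confidence=0.9))
```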


Proceedings of the IEEE | 2000

Conversational interfaces: advances and challenges

Victor W. Zue; James R. Glass

The past decade has witnessed the emergence of a new breed of human-computer interfaces that combines several human language technologies to enable humans to converse with computers using spoken dialogue for information access, creation and processing. In this paper, we introduce the nature of these conversational interfaces and describe the underlying human language technologies on which they are based. After summarizing some of the recent progress in this area around the world, we discuss development issues faced by researchers creating these kinds of systems and present some of the ongoing and unmet research challenges in this field.


Computer Speech & Language | 2003

A probabilistic framework for segment-based speech recognition

James R. Glass

Most current speech recognizers use an observation space based on a temporal sequence of measurements extracted from fixed-length "frames" (e.g., Mel-cepstra). Given a hypothetical word or sub-word sequence, the acoustic likelihood computation always involves all observation frames, though the mapping between individual frames and internal recognizer states will depend on the hypothesized segmentation. There is another type of recognizer whose observation space is better represented as a network, or graph, where each arc in the graph corresponds to a hypothesized variable-length segment that is represented by a fixed-dimensional "feature". In such feature-based recognizers, each hypothesized segmentation will correspond to a segment sequence, or path, through the overall segment graph that is associated with a subset of all possible feature vectors in the total observation space. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based search.
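
A toy illustration of decoding over such a segment graph. Each arc carries a fixed-dimensional "feature" (one-dimensional here for brevity), and each segment score is normalized by a broad background model so that competing paths, which cover different subsets of the observation space, remain comparable (one published realization of this idea is an "anti-phone" model). The graph, unit models, and numbers below are all fabricated:

```python
import math

def log_gauss(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

UNITS = {"aa": (1.0, 0.5), "iy": (3.0, 0.5)}   # toy unit models (mean, variance)
ANTI = (2.0, 4.0)                              # broad background ("anti") model

# Segment graph over boundary times 0..3; arcs are (start, end, feature_value).
ARCS = [(0, 1, 0.9), (1, 3, 3.2), (0, 2, 1.1), (2, 3, 2.8)]

def best_path(n_nodes=4):
    """Dynamic program: best normalized score and unit labels to each node."""
    best = {0: (0.0, [])}
    for node in range(1, n_nodes):
        candidates = []
        for s, e, x in ARCS:
            if e == node and s in best:
                for unit, (mu, var) in UNITS.items():
                    # Normalized segment score: unit likelihood over background.
                    norm = log_gauss(x, mu, var) - log_gauss(x, *ANTI)
                    candidates.append((best[s][0] + norm, best[s][1] + [unit]))
        if candidates:
            best[node] = max(candidates)
    return best[n_nodes - 1]

print(best_path())   # highest-scoring segmentation and unit sequence
```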


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Unsupervised Pattern Discovery in Speech

Alex Park; James R. Glass

We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.
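
As a simplified sketch of the core idea, plain DTW distortion between feature sequences already separates matching from non-matching speech; the paper's segmental variant additionally constrains alignments to discover matching fragments within longer utterances. The toy feature sequences below are fabricated:

```python
import numpy as np

def dtw_distortion(a: np.ndarray, b: np.ndarray) -> float:
    """Average per-step Euclidean distortion along the best DTW path."""
    n, m = len(a), len(b)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # frame distances
    acc = np.full((n, m), np.inf)
    acc[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            prev = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = d[i, j] + prev
    return acc[-1, -1] / (n + m)

rng = np.random.default_rng(0)
word = rng.normal(size=(20, 13))                  # a repeated "word" pattern
utt1 = np.vstack([rng.normal(size=(10, 13)), word])
utt2 = np.vstack([word + 0.1 * rng.normal(size=word.shape),
                  rng.normal(size=(15, 13))])
print(dtw_distortion(utt1[10:], utt2[:20]))       # low: same underlying pattern
print(dtw_distortion(utt1[:10], utt2[20:30]))     # higher: unrelated speech
```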


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Robust Speaker Recognition in Noisy Conditions

Ji Ming; Timothy J. Hazen; James R. Glass; Douglas A. Reynolds

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a "coarse" compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
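
A toy sketch of the missing-feature half of the approach: score each speaker model using only the feature dimensions flagged as reliable, so noise outside the training conditions is ignored rather than mismatched. Single diagonal Gaussians stand in for multicondition-trained models here, and the mask and data are fabricated:

```python
import numpy as np

def masked_loglik(x, mu, var, reliable):
    """Diagonal-Gaussian log-likelihood using only reliable dimensions."""
    x, mu, var = x[reliable], mu[reliable], var[reliable]
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

rng = np.random.default_rng(1)
# Hypothetical speaker models: (mean, variance) per feature dimension.
speakers = {name: (rng.normal(size=8), np.ones(8)) for name in ("alice", "bob")}

x = speakers["alice"][0].copy()        # a feature vector truly from "alice"
x[5:] += 10.0                          # dims 5-7 corrupted by unknown noise
reliable = np.arange(8) < 5            # missing-feature mask: trust dims 0-4

scores = {n: masked_loglik(x, mu, var, reliable)
          for n, (mu, var) in speakers.items()}
print(max(scores, key=scores.get))     # "alice": corrupted dims were ignored
```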


Speech Communication | 1995

Multilingual spoken-language understanding in the MIT Voyager system

James R. Glass; Giovanni Flammia; David Goodine; Michael S. Phillips; Joseph Polifroni; Shinsuke Sakai; Stephanie Seneff; Victor W. Zue

This paper describes our recent work in developing multilingual spoken language systems that support human-computer interactions. Our approach is based on the premise that a common semantic representation can be extracted from the input for all languages, at least within the context of restricted domains. In our design of such systems, language dependent information is separated from the system kernel as much as possible, and encoded in external data structures. The internal system manager, discourse and dialogue component, and database are all maintained in a language transparent form. Our description will focus on the development of the multilingual MIT Voyager spoken language system, which can engage in verbal dialogues with users about a geographical region within Cambridge, MA in the USA. The system can provide information about distances, travel times or directions between objects located within this area (e.g., restaurants, hotels, banks, libraries), as well as information such as the addresses, telephone numbers or location of the objects themselves. Voyager has been fully ported to Japanese and Italian, and we are in the process of porting to French and German as well. Evaluations for the English, Japanese and Italian systems are reported. Other related multilingual research activities are also briefly mentioned.
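
The design premise reduces to keeping a language-independent semantic frame in the kernel and pushing surface realization out into external, per-language tables. A minimal sketch, with invented frame fields and non-authoritative example translations:

```python
# Language-independent semantic frame (the "kernel" representation).
frame = {"action": "give_directions", "origin": "MIT",
         "destination": "Harvard Square"}

# Language-dependent data kept outside the kernel; templates are invented.
TEMPLATES = {
    "en": "To get from {origin} to {destination}, head up Massachusetts Avenue.",
    "ja": "{origin}から{destination}へは、マサチューセッツ通りを進んでください。",
    "it": "Per andare da {origin} a {destination}, segua Massachusetts Avenue.",
}

def generate(frame: dict, lang: str) -> str:
    """Generation = pick a template by language, fill it from the frame."""
    return TEMPLATES[lang].format(**frame)

print(generate(frame, "en"))
print(generate(frame, "ja"))
```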


Speech Communication | 1994

PEGASUS: a spoken dialogue interface for on-line air travel planning

Victor W. Zue; Stephanie Seneff; Joseph Polifroni; Michael S. Phillips; Christine Pao; David Goodine; David Goddeau; James R. Glass

This paper describes PEGASUS, a spoken dialogue interface for on-line air travel planning that we have recently developed. PEGASUS leverages off our spoken language technology development in the ATIS domain, and enables users to book flights using the American Airlines EAASY SABRE system. The input query is transformed by the speech understanding system to a frame representation that captures its meaning. The tasks of the System Manager include transforming the semantic representation into an EAASY SABRE command, transmitting it to the application backend, formatting and interpreting the resulting information, and managing the dialogue. Preliminary evaluation results suggest that users can learn to make productive use of PEGASUS for travel planning, although much work remains to be done.
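
A sketch of the System Manager step that translates a semantic frame into a backend command. The command syntax below is entirely invented (the real EAASY SABRE command language is not reproduced here); only the frame-to-command translation pattern is the point:

```python
def frame_to_command(frame: dict) -> str:
    """Map a flight-query frame to a hypothetical reservation-system command."""
    return "FLT {source} {destination} {date}".format(**frame)

frame = {"source": "BOS", "destination": "SFO", "date": "15NOV"}
print(frame_to_command(frame))   # -> "FLT BOS SFO 15NOV"
```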


International Conference on Acoustics, Speech, and Signal Processing | 1990

The VOYAGER speech understanding system: preliminary development and evaluation

Victor W. Zue; James R. Glass; David Goodine; Hong Leung; Michael S. Phillips; Joseph Polifroni; Stephanie Seneff

Early experience with the development of the MIT VOYAGER spoken language system is described, and its current performance is documented. The three components of VOYAGER, the speech recognition component, the natural language component, and the application back-end, are described.


North American Chapter of the Association for Computational Linguistics | 2004

Analysis and processing of lecture audio data: preliminary investigations

James R. Glass; Timothy J. Hazen; Lee Hetherington; Chao Wang

In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and the vocabulary usage patterns across three courses. Finally, we examine language model perplexities trained from written and spoken materials, and describe an initial recognition experiment on one course.
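
For reference, the perplexity compared in these experiments is the exponentiated average negative log-probability a language model assigns to held-out words; the toy per-word probabilities below are fabricated:

```python
import math

def perplexity(probs):
    """probs: model probability of each held-out token, in sequence order."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# A model assigning these probabilities to a 5-word test utterance:
print(perplexity([0.1, 0.05, 0.2, 0.1, 0.02]))   # ~= 13.8
```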


Human Language Technology | 1989

The MIT SUMMIT Speech Recognition system: a progress report

Victor W. Zue; James R. Glass; Michael S. Phillips; Stephanie Seneff

Recently, we initiated a project to develop a phonetically-based spoken language understanding system called SUMMIT. In contrast to many of the past efforts that make use of heuristic rules whose development requires intense knowledge engineering, our approach attempts to express the speech knowledge within a formal framework using well-defined mathematical tools. In our system, features and decision strategies are discovered and trained automatically, using a large body of speech data. This paper describes the system, and documents its current performance.

Collaboration


Dive into James R. Glass's collaborations.

Top Co-Authors

All co-authors below are at the Massachusetts Institute of Technology.

Victor W. Zue
Stephanie Seneff
Joseph Polifroni
Michael S. Phillips
Timothy J. Hazen
David Goodine
Yu Zhang
Wei-Ning Hsu
Scott Cyphers
Hong C. Leung