Publications


Featured research published by Timothy J. Hazen.


IEEE Transactions on Speech and Audio Processing | 2000

JUPITER: a telephone-based conversational interface for weather information

Victor W. Zue; Stephanie Seneff; James R. Glass; Joseph Polifroni; Christine Pao; Timothy J. Hazen; I. Lee Hetherington

In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. Over the two-year period since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totaling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Robust Speaker Recognition in Noisy Conditions

Ji Ming; Timothy J. Hazen; James R. Glass; Douglas A. Reynolds

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a "coarse" compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
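
As an illustration of the multicondition training idea, the sketch below expands a clean corpus into several simulated noise conditions by mixing in noise at a few signal-to-noise ratios. This is a minimal sketch, not the paper's pipeline; the function names and the choice of SNR levels are illustrative.

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Mix noise into clean speech at a target signal-to-noise ratio (dB)."""
        # Tile or trim the noise so it covers the whole utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)[:len(speech)]
        # Scale the noise so the speech-to-noise power ratio hits the target.
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
        return speech + scale * noise

    def make_multicondition_corpus(clean_utts, noise_clips, snrs_db=(20, 10, 5)):
        """Expand a clean corpus into multiple simulated noise conditions."""
        corpus = list(clean_utts)  # keep the clean condition as well
        for utt in clean_utts:
            for noise in noise_clips:
                for snr in snrs_db:
                    corpus.append(mix_at_snr(utt, noise, snr))
        return corpus

Models trained on such a corpus supply the "coarse" compensation; the missing-feature step then discounts spectro-temporal regions whose noise falls outside these training conditions.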


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error

E. McDermott; Timothy J. Hazen; J. Le Roux; Atsushi Nakamura; Shigeru Katagiri

The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary (up to 100,000 words) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT JUPITER telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues.
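
For concreteness, here is a minimal sketch of the smoothed MCE loss for a single training token, in its standard textbook form: a misclassification measure comparing the correct-class score against a soft maximum over competitors, passed through a sigmoid. The smoothing parameters eta and gamma are illustrative defaults, not the paper's settings; the gradient of this quantity is what drives the HMM parameter updates.

    import numpy as np

    def mce_loss(scores, correct_idx, eta=1.0, gamma=1.0):
        """Smoothed classification-error loss for one training token.

        scores: discriminant scores g_k (e.g., HMM log-likelihoods) for the
        correct string and its competitors; correct_idx marks the reference.
        """
        g_correct = scores[correct_idx]
        competitors = np.delete(scores, correct_idx)
        # Soft maximum over competitors (log-sum-exp), approaching the
        # single best wrong answer as gamma grows.
        g_best_wrong = np.log(np.mean(np.exp(gamma * competitors))) / gamma
        # Misclassification measure: positive when a competitor wins.
        d = g_best_wrong - g_correct
        # The sigmoid turns d into a smooth, differentiable 0/1 error count.
        return 1.0 / (1.0 + np.exp(-eta * d))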


Computer Speech & Language | 2002

Recognition confidence scoring and its use in speech understanding systems

Timothy J. Hazen; Stephanie Seneff; Joseph Polifroni

In this paper we present an approach to recognition confidence scoring and a set of techniques for integrating confidence scores into the understanding and dialogue components of a speech understanding system. The recognition component uses a multi-tiered approach where confidence scores are computed at the phonetic, word, and utterance levels. The scores are produced by extracting confidence features from the computation of the recognition hypotheses and processing these features using an accept/reject classifier for word and utterance hypotheses. The scores generated by the confidence classifier can then be passed on to the language understanding and dialogue modeling components of the system. In these components the confidence scores can be combined with linguistic scores and pragmatic constraints before the system makes a final decision about the appropriate action to be taken. To evaluate the system, experiments were conducted using the JUPITER weather information system. An evaluation of the confidence classifier at the word level shows that the system detects 66% of the recognizer's errors with a false detection rate on correctly recognized words of only 5%. An evaluation was also performed at the understanding level using key-value pair concept error rate as the evaluation metric. When confidence scores were integrated into the understanding component of the system, a relative reduction of 35% in concept error rate was achieved.
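
A minimal sketch of the accept/reject stage, with logistic regression standing in for the paper's classifier; the feature set, the synthetic training data, and the 0.95 threshold are all hypothetical.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: one row of confidence features per word
    # hypothesis (e.g., normalized acoustic score, language-model score,
    # N-best purity, duration), labeled 1 if the word was recognized correctly.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 4))
    y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0).astype(int)

    clf = LogisticRegression().fit(X_train, y_train)

    def word_confidence(features):
        """Posterior probability that a hypothesized word is correct."""
        return clf.predict_proba(np.atleast_2d(features))[0, 1]

    # Downstream, understanding and dialogue components can combine this
    # score with linguistic evidence, e.g., reject below a threshold:
    accept = word_confidence([0.8, -0.1, 0.6, 0.2]) > 0.95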


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Query-by-example spoken term detection using phonetic posteriorgram templates

Timothy J. Hazen; Wade Shen; Christopher M. White

This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
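
The core matching step can be sketched as dynamic time warping over frame-level distances between posterior vectors, a common choice being the negative log inner product. This toy version aligns whole sequences end to end; the paper's modified DTW additionally lets a match begin and end anywhere within the test utterance.

    import numpy as np

    def frame_distance(p, q, eps=1e-10):
        """Distance between two phonetic posterior vectors."""
        return -np.log(np.dot(p, q) + eps)

    def dtw_cost(query, test):
        """Align a query posteriorgram (Nq x D) to a test posteriorgram
        (Nt x D); returns the length-normalized alignment cost."""
        nq, nt = len(query), len(test)
        D = np.full((nq + 1, nt + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, nq + 1):
            for j in range(1, nt + 1):
                c = frame_distance(query[i - 1], test[j - 1])
                D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[nq, nt] / (nq + nt)

Lower normalized cost indicates a better match; ranking test regions by this cost yields the detection hypotheses.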


IEEE Signal Processing Magazine | 2008

Retrieval and browsing of spoken content

Ciprian Chelba; Timothy J. Hazen; Murat Saraclar

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard drive. Speech search has received less attention, perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor to large-scale access to spoken content. In this article, we strive to discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by ASR systems. We focus on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.
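
One common way to cope with errorful ASR output, sketched below under simplifying assumptions, is to index expected word counts: each hypothesized word contributes its recognition confidence rather than a hard count, so likely misrecognitions are down-weighted instead of being indexed as truth. This is a generic illustration, not the specific indexing scheme of the article.

    from collections import defaultdict

    def build_index(docs):
        """docs: doc_id -> list of (word, confidence) pairs from ASR output.
        Summing confidences yields the expected count of each word."""
        index = defaultdict(lambda: defaultdict(float))
        for doc_id, hyps in docs.items():
            for word, conf in hyps:
                index[word][doc_id] += conf
        return index

    def search(index, query_terms):
        """Rank spoken documents by total expected count of the query terms."""
        scores = defaultdict(float)
        for term in query_terms:
            for doc_id, expected in index.get(term, {}).items():
                scores[doc_id] += expected
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)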


North American Chapter of the Association for Computational Linguistics | 2004

Analysis and processing of lecture audio data: preliminary investigations

James R. Glass; Timothy J. Hazen; Lee Hetherington; Chao Wang

In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and the vocabulary usage patterns across three courses. Finally, we examine language model perplexities trained from written and spoken materials, and describe an initial recognition experiment on one course.
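
Perplexity, the figure of merit used in the language-model comparison, is the exponentiated average negative log probability the model assigns to the spoken words. A minimal computation:

    import math

    def perplexity(log_probs):
        """log_probs: natural-log probability of each word given its history.
        PPL = exp(-(1/N) * sum_i log p(w_i | history_i))."""
        return math.exp(-sum(log_probs) / len(log_probs))

    # A model that gives every word probability 0.01 has perplexity 100:
    assert round(perplexity([math.log(0.01)] * 50)) == 100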


International Conference on Acoustics, Speech, and Signal Processing | 2000

Word and phone level acoustic confidence scoring

Simo O. Kamppari; Timothy J. Hazen

This paper presents a word-level confidence scoring technique based on a combination of multiple features extracted from the output of a phonetic classifier. The goal of this research was to develop a robust confidence measure based strictly on acoustic information. This research focused on methods for augmenting standard log likelihood ratio techniques with additional information to improve the robustness of the acoustic confidence scores for word recognition tasks. The most successful approach utilized a Fisher linear discriminant projection to reduce a set of acoustic features, extracted from phone-level classification results, to a single-dimension confidence score. The experiments in this paper were implemented within the JUPITER weather information system. The paper presents results indicating that the technique achieved significant improvements over standard log likelihood ratio techniques for confidence scoring.
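
A minimal sketch of the Fisher projection step: given confidence-feature vectors for correctly and incorrectly recognized words, the standard discriminant direction is w = Sw^-1 (mu_correct - mu_incorrect), and projecting a word's features onto w yields the scalar confidence score. The function names are illustrative.

    import numpy as np

    def fisher_direction(X_correct, X_incorrect):
        """Fisher linear discriminant over two classes of feature vectors."""
        mu_c = X_correct.mean(axis=0)
        mu_i = X_incorrect.mean(axis=0)
        # Within-class scatter: sum of the two class covariance matrices.
        Sw = np.cov(X_correct, rowvar=False) + np.cov(X_incorrect, rowvar=False)
        return np.linalg.solve(Sw, mu_c - mu_i)

    def confidence_score(features, w):
        """Project a word's acoustic features to a one-dimensional score."""
        return np.asarray(features) @ w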


International Conference on Acoustics, Speech, and Signal Processing | 1999

Real-time telephone-based speech recognition in the Jupiter domain

James R. Glass; Timothy J. Hazen; I.L. Hetherington

This paper describes our experiences with developing a real-time telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer vocabulary, pronunciations, language and acoustic models for this system, the new weighted finite-state transducer-based lexical access component, and report on the current performance of the recognizer under several different conditions. We also analyze recognition latency to verify that the system performs in real-time.


International Conference on Multimodal Interfaces | 2004

A segment-based audio-visual speech recognizer: data collection, development, and initial experiments

Timothy J. Hazen; Kate Saenko; Chia-Hao La; James R. Glass

This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. To support this research, we have collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of a total of 4 hours of read speech collected from 223 different speakers. This new corpus was used to evaluate our new AVSR system, which incorporates a novel audio-visual integration scheme using segment-constrained hidden Markov models (HMMs). Preliminary experiments have demonstrated improvements in phonetic recognition performance when incorporating visual information into the speech recognition process.
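
The paper's integration scheme is segment-based, but the underlying multistream idea can be sketched as a weighted combination of per-segment audio and visual log-likelihoods; the stream weight below is an illustrative value, not one from the paper.

    import numpy as np

    def fuse_segment_scores(audio_loglik, visual_loglik, audio_weight=0.7):
        """Weighted multistream combination of log-likelihoods:
        log p = w * log p_audio + (1 - w) * log p_visual."""
        return (audio_weight * np.asarray(audio_loglik)
                + (1.0 - audio_weight) * np.asarray(visual_loglik))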

Collaboration


Dive into Timothy J. Hazen's collaborations.

Top Co-Authors

James R. Glass | Massachusetts Institute of Technology
Alex Park | Massachusetts Institute of Technology
Stephanie Seneff | Massachusetts Institute of Technology
Joseph Polifroni | Massachusetts Institute of Technology
Victor W. Zue | Massachusetts Institute of Technology
Ji Ming | Queen's University Belfast
I. Lee Hetherington | Massachusetts Institute of Technology
Abeer Alwan | University of California