Fred Richardson
Massachusetts Institute of Technology
Publications
Featured research published by Fred Richardson.
IEEE Signal Processing Letters | 2015
Fred Richardson; Douglas A. Reynolds; Najim Dehak
The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains for separate SR and LR tasks using DNNs for direct classification or for feature extraction. In this work we present the application of a single DNN for both SR and LR using the 2013 Domain Adaptation Challenge speaker recognition (DAC13) and the NIST 2011 language recognition evaluation (LRE11) benchmarks. Using a single DNN trained for ASR on Switchboard data, we demonstrate large gains in performance on both benchmarks: a 55% reduction in EER for the DAC13 out-of-domain condition and a 48% reduction in Cavg on the LRE11 30 s test condition. It is also shown that further gains are possible using score or feature fusion, leading to the possibility of a single i-vector extractor producing state-of-the-art SR and LR performance.
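The shared-DNN idea above can be sketched as replacing GMM component posteriors with DNN senone posteriors when accumulating the sufficient statistics that feed a single i-vector extractor. The snippet below is a minimal illustration on random stand-in data; the function name and dimensions are assumptions, not the paper's implementation.

```python
import numpy as np

def baum_welch_stats(posteriors, features):
    """Accumulate Baum-Welch statistics for an i-vector extractor.

    posteriors: (T, C) frame-level senone posteriors from an ASR DNN
                (these replace GMM component posteriors).
    features:   (T, D) acoustic feature frames.
    Returns zeroth-order (C,) and first-order (C, D) statistics.
    """
    n = posteriors.sum(axis=0)       # soft frame counts per senone
    f = posteriors.T @ features      # posterior-weighted feature sums
    return n, f

# Toy data: 100 frames, 8 senones, 20-dimensional features.
rng = np.random.default_rng(0)
post = rng.random((100, 8))
post /= post.sum(axis=1, keepdims=True)  # normalize rows to valid posteriors
feats = rng.standard_normal((100, 20))
n, f = baum_welch_stats(post, feats)
```

Because the statistics have the same form regardless of whether posteriors come from a GMM or a DNN, the same i-vector extractor can serve both the SR and LR tasks downstream.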
international conference on acoustics, speech, and signal processing | 2010
Pedro A. Torres-Carrasquillo; Elliot Singer; Terry P. Gleason; Alan McCree; Douglas A. Reynolds; Fred Richardson; Douglas E. Sturim
This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations, along with a discussion of the language-pair task. On the 30 second 23-language closed-set detection task, the system achieved a 1.64% average error rate.
international conference on acoustics, speech, and signal processing | 2008
Fred Richardson; William M. Campbell
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the classification. One problem with these approaches is that the number of N-grams grows exponentially as the order N is increased. This is especially problematic for an SVM classifier as each utterance is represented as a distinct N-gram vector. In this paper we propose a novel approach for modeling higher order N-grams using an SVM via an alternating filter-wrapper feature selection method. We demonstrate the effectiveness of this technique on the NIST 2007 language recognition task.
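The N-gram explosion described above is easy to quantify: over a token inventory of size |V| there are |V|^N possible N-grams, and each utterance's SVM feature vector has one dimension per N-gram. A toy sketch with a hypothetical 3-phone inventory and utterance (not evaluation data):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the N-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

phones = ["a", "b", "c"]                    # toy 3-phone inventory
utterance = ["a", "b", "a", "c", "b", "a"]  # toy phone-token sequence

bigrams = ngram_counts(utterance, 2)

# Dimensionality of the full N-gram space grows as |V|**N for N = 1..4:
dims = [len(phones) ** n for n in range(1, 5)]  # [3, 9, 27, 81]
```

With a realistic phone inventory of a few dozen symbols, the 4-gram space already runs into the millions of dimensions, which is what motivates the paper's filter-wrapper feature selection.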
international conference on acoustics, speech, and signal processing | 2007
William M. Campbell; Fred Richardson; Douglas A. Reynolds
Language recognition is typically performed with methods that exploit phonotactics - a phone recognition language modeling (PRLM) system. A PRLM system converts speech to a lattice of phones and then scores a language model. A standard extension to this scheme is to use multiple parallel phone recognizers (PPRLM). In this paper, we modify this approach in two distinct ways. First, we replace the phone tokenizer by a powerful speech-to-text system. Second, we use a discriminative support vector machine for language modeling. Our goals are twofold. First, we explore the ability of a single speech-to-text system to distinguish multiple languages. Second, we fuse the new system with an SVM PRLM system to see if it complements current approaches. Experiments on the 2005 NIST language recognition corpus show the new word system accomplishes these goals and has significant potential for language recognition.
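As a rough sketch of the PRLM scoring step described above: once speech is tokenized, each language's phonotactic model scores the token sequence and the highest-scoring language wins. The bigram log-probabilities and backoff value below are invented toy numbers, not trained models.

```python
def score(tokens, bigram_logprobs, backoff=-6.0):
    """Sum bigram log-probabilities over a token sequence,
    falling back to a flat backoff score for unseen bigrams."""
    return sum(bigram_logprobs.get(bg, backoff)
               for bg in zip(tokens, tokens[1:]))

# Toy per-language phonotactic bigram models (made-up values).
models = {
    "english": {("th", "ah"): -0.5, ("ah", "b"): -1.0},
    "spanish": {("th", "ah"): -3.0, ("ah", "b"): -2.5},
}

tokens = ["th", "ah", "b"]  # toy tokenizer output
best = max(models, key=lambda lang: score(tokens, models[lang]))
```

The paper's two modifications slot into this picture at either end: a speech-to-text system replaces the phone tokenizer, and a discriminative SVM replaces the generative N-gram scoring.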
ieee automatic speech recognition and understanding workshop | 2007
Timothy J. Hazen; Fred Richardson; Anna Margolis
In this paper, we investigate the problem of topic identification from audio documents using features extracted from speech recognition lattices. We are particularly interested in the difficult case where the training material is minimally annotated with only topic labels. Under this scenario, the lexical knowledge that is useful for topic identification may not be available, and automatic methods for extracting linguistic knowledge useful for distinguishing between topics must be relied upon. Towards this goal we investigate the problem of topic identification on conversational telephone speech from the Fisher corpus under a variety of increasingly difficult constraints. We contrast the performance of systems that have knowledge of the lexical units present in the audio data, against systems that rely entirely on phonetic processing.
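One common way to use lattice-derived features for topic identification, sketched below with toy numbers, is to score posterior-weighted (expected) word counts against per-topic unigram models. This is an illustrative multinomial naive Bayes stand-in, not the paper's exact system.

```python
import numpy as np

def topic_scores(expected_counts, topic_log_unigrams):
    """Score each topic by the log-likelihood of soft word counts.

    expected_counts:    (V,) posterior-weighted counts from a lattice.
    topic_log_unigrams: (K, V) per-topic log word probabilities.
    """
    return topic_log_unigrams @ expected_counts

# Toy vocabulary and two toy topics (probabilities are made up).
vocab = ["game", "score", "bank", "loan"]
log_p = np.log(np.array([[0.4, 0.4, 0.1, 0.1],    # "sports" topic
                         [0.1, 0.1, 0.4, 0.4]]))  # "finance" topic

counts = np.array([2.3, 1.1, 0.2, 0.1])  # soft counts from a lattice
best_topic = int(np.argmax(topic_scores(counts, log_p)))
```

Using expected counts from the lattice, rather than counts from the 1-best hypothesis, keeps recognition uncertainty in the feature representation, which matters most in the phonetic-only condition the paper studies.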
Odyssey 2016 | 2016
Pedro A. Torres-Carrasquillo; Najim Dehak; Elizabeth Godoy; Douglas A. Reynolds; Fred Richardson; Stephen Shum; Elliot Singer; Douglas E. Sturim
In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks, respectively.
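Score-level fusion of the kind used to combine multiple core classifiers can be sketched as a weighted linear combination of per-system scores. The weights and bias below are made-up values; a real backend would calibrate them on development data.

```python
def fuse(scores_per_system, weights, bias=0.0):
    """Linear score fusion: weighted sum of subsystem scores plus a bias."""
    return bias + sum(w * s for w, s in zip(weights, scores_per_system))

# Toy scores from three hypothetical subsystems for one trial,
# with illustrative (uncalibrated) fusion weights.
subsystem_scores = [1.2, -0.3, 0.8]
fused = fuse(subsystem_scores, weights=[0.5, 0.2, 0.3])
```

In practice the weights are trained (e.g. by logistic regression) so that the fused score is well calibrated for the evaluation's cost metric.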
international conference on acoustics, speech, and signal processing | 1998
George Zavaliagkos; John W. McDonough; David R. Miller; Amro El-Jaroudi; Jayadev Billa; Fred Richardson; Kristine W. Ma; Man-Hung Siu; Herbert Gish
This paper presents the 1997 BBN Byblos large vocabulary speech recognition (LVCSR) system. We give an outline of the algorithms and procedures used to train the system, describe the recognizer configuration, and present the major technological innovations that led to the performance improvements. The major testbed for our results is the Switchboard corpus, where current word error rates vary from 27% to 34% depending on the test set. In addition, we present results on the CallHome Spanish and Arabic tests, where we demonstrate that technology developed on English corpora is very much portable to other problems and languages.
international conference on acoustics, speech, and signal processing | 2011
Douglas E. Sturim; William M. Campbell; Najim Dehak; Zahi N. Karam; Alan McCree; Douglas A. Reynolds; Fred Richardson; Pedro A. Torres-Carrasquillo; Stephen Shum
Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of recent 2010 NIST-SRE Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST-SRE with two main goals—language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent computationally-scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.
Odyssey 2016 | 2016
Fred Richardson; Brian E. Nemsick; Douglas A. Reynolds
Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing telephone data to build systems with high accuracy while maintaining good performance on existing telephone tasks. In this paper we compare and combine approaches to compensate model parameters and features for this purpose. For model adaptation we explore MAP adaptation of hyper-parameters, and for feature compensation we examine the use of denoising DNNs. On a multi-room, multi-microphone speaker recognition experiment we show a reduction of 61% in EER with a combination of these approaches while slightly improving performance on telephone data.
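The feature-compensation setup can be illustrated with a toy stand-in: learn a mapping from microphone-degraded features back to telephone-like features. The paper uses a denoising DNN; the least-squares linear map below is only a minimal sketch of the same noisy-to-clean regression setup, on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "clean" (telephone-like) features and a degraded version:
# a mild linear channel distortion plus additive noise.
clean = rng.standard_normal((500, 10))
channel = 0.1 * rng.standard_normal((10, 10))
noisy = clean + clean @ channel + 0.05 * rng.standard_normal((500, 10))

# Fit a linear noisy->clean map by least squares (stand-in for the
# denoising DNN, which would learn a nonlinear version of this mapping).
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
denoised = noisy @ W

err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
```

The compensated features can then be fed to an unchanged telephone-trained system, which is the appeal of feature-side compensation over retraining.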
international conference on acoustics, speech, and signal processing | 2011
Reva Schwartz; Joseph P. Campbell; Wade Shen; Douglas E. Sturim; William M. Campbell; Fred Richardson; Robert B. Dunn; Robert Granville
The United States Secret Service (USSS) teamed with MIT Lincoln Laboratory (MIT/LL) in the US National Institute of Standards and Technology's 2010 Speaker Recognition Evaluation of Human Assisted Speaker Recognition (HASR). We describe our qualitative and automatic speaker comparison processes and our fusion of these processes, which are adapted from USSS casework. The USSS-MIT/LL 2010 HASR results are presented. We also present post-evaluation results. The results are encouraging within the resolving power of the evaluation, which was limited to enable reasonable levels of human effort. Future ideas and efforts are discussed, including new features and capitalizing on naïve listeners.