Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alan McCree is active.

Publication


Featured research published by Alan McCree.


International Conference on Acoustics, Speech, and Signal Processing | 2010

The MITLL NIST LRE 2009 language recognition system

Pedro A. Torres-Carrasquillo; Elliot Singer; Terry P. Gleason; Alan McCree; Douglas A. Reynolds; Fred Richardson; Douglas E. Sturim

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2009 Language Recognition Evaluation (LRE). This system consists of a fusion of three core recognizers, two based on spectral similarity and one based on tokenization. The 2009 LRE differed from previous ones in that test data included narrowband segments from worldwide Voice of America broadcasts as well as conventional recorded conversational telephone speech. Results are presented for the 23-language closed-set and open-set detection tasks at the 30, 10, and 3 second durations, along with a discussion of the language-pair task. On the 30 second 23-language closed-set detection task, the system achieved a 1.64% average error rate.
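
In systems of this kind, fusing the core recognizers typically amounts to a calibrated linear combination of per-recognizer scores. A minimal sketch of linear score fusion, assuming logistic-regression-trained weights (all numbers and names below are illustrative, not the submitted system):

    import numpy as np

    def linear_fusion(scores, weights, offset):
        # scores:  (n_recognizers, n_trials) per-recognizer scores
        # weights: (n_recognizers,) fusion weights, normally trained by
        #          logistic regression on held-out data
        # offset:  calibration offset
        return weights @ scores + offset

    # Toy example: three core recognizers, four trials.
    scores = np.array([[1.2, -0.3, 0.8, -1.5],
                       [0.9, -0.1, 1.1, -0.7],
                       [0.4, -0.6, 0.2, -1.0]])
    fused = linear_fusion(scores, np.array([0.5, 0.3, 0.2]), offset=0.1)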


International Conference on Acoustics, Speech, and Signal Processing | 2011

The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition

Douglas E. Sturim; William M. Campbell; Najim Dehak; Zahi N. Karam; Alan McCree; Douglas A. Reynolds; Fred Richardson; Pedro A. Torres-Carrasquillo; Stephen Shum

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of the recent 2010 NIST Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission to the tasks of the 2010 NIST SRE, with two main goals: language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent, computationally scalable system. For robustness, we implemented systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.
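
The inner product-based modeling referenced above scores a trial with an inner product between fixed-length utterance representations; cosine scoring is the common special case. A toy sketch (the function name and normalization are my illustration, not the MIT submission):

    import numpy as np

    def cosine_score(enroll, test):
        # Inner product between length-normalized utterance vectors
        # (e.g. supervectors or factor-analysis representations).
        return np.dot(enroll, test) / (np.linalg.norm(enroll) * np.linalg.norm(test))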


International Conference on Acoustics, Speech, and Signal Processing | 2006

A Scalable Phonetic Vocoder Framework Using Joint Predictive Vector Quantization of MELP Parameters

Alan McCree

We present the framework for a scalable phonetic vocoder (SPV) capable of operating at bit rates from 300 to 1100 bps. The underlying system uses an HMM-based phonetic speech recognizer to estimate the parameters for MELP speech synthesis. We extend this baseline technique in three ways. First, we introduce the concept of predictive time evolution to generate a smoother path for the synthesizer parameters, and show that it improves speech quality. Then, since the output speech from the phonetic vocoder is still limited by such low bit rates, we propose a scalable system in which the accuracy of the MELP parameters is increased by vector quantizing the error signal between the true and phonetic-estimated MELP parameters. Finally, we apply an extremely flexible technique for exploiting correlations in these parameters over time, which we call joint predictive vector quantization (JPVQ). We show that significant quality improvement can be attained by adding as few as 400 bps to the baseline phonetic vocoder using JPVQ. The resulting SPV system provides a flexible platform for adjusting the phonetic vocoder bit rate and speech quality.
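
The scalable stage quantizes the error between the true and phonetic-estimated MELP parameters, with prediction across frames. A much-simplified sketch of predictive vector quantization of that error signal (codebook, predictor, and shapes are illustrative; the published JPVQ is more elaborate):

    import numpy as np

    def jpvq_encode(targets, estimates, codebook, predictor):
        # targets, estimates: (T, D) true and phonetic-estimated MELP
        #                     parameter vectors per frame
        # codebook:           (K, D) residual codebook (illustrative)
        # predictor:          (D, D) linear prediction of the current
        #                     error from the previous reconstructed error
        T, D = targets.shape
        recon_err = np.zeros(D)
        indices = []
        for t in range(T):
            pred = predictor @ recon_err
            residual = (targets[t] - estimates[t]) - pred
            k = int(np.argmin(np.sum((codebook - residual) ** 2, axis=1)))
            indices.append(k)
            recon_err = pred + codebook[k]   # decoder-side reconstruction
        return indices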


International Conference on Acoustics, Speech, and Signal Processing | 2012

The linear prediction inverse modulation transfer function (LP-IMTF) filter for spectral enhancement, with applications to speaker recognition

Bengt J. Borgström; Alan McCree

We propose a method for spectral enhancement of reverberant speech based on inverting the modulation transfer function (MTF). Using all-pole models of modulation spectra allows the linear prediction inverse MTF (LP-IMTF) filter to exhibit a smooth frequency response, and allows it to be implemented as a low-order IIR filter in the modulation envelope domain. The proposed filter adapts to current acoustic conditions without relying on explicit information regarding reverberation time. Additionally, the LP-IMTF framework allows for estimation of useful side information, such as local signal-to-reverberation ratios and band-specific reverberation times. As example applications, the LP-IMTF system is applied to enhancement and speaker recognition of reverberant speech, and significant performance improvements are achieved.
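
As a rough illustration of working in the modulation envelope domain, the sketch below fits a low-order all-pole model to a subband envelope with the autocorrelation method and applies the corresponding prediction-error filter. This is a crude stand-in for the published LP-IMTF filter, not a reproduction of it:

    import numpy as np
    from scipy.signal import lfilter

    def lpc_coeffs(x, order):
        # Autocorrelation method + Levinson-Durbin recursion; returns
        # the prediction-error filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
        r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
        a = np.zeros(order + 1)
        a[0], err = 1.0, r[0]
        for i in range(1, order + 1):
            k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
            prev = a[1:i].copy()
            a[1:i] = prev + k * prev[::-1]
            a[i] = k
            err *= 1.0 - k * k
        return a

    # Whiten a toy subband envelope with the low-order inverse filter.
    rng = np.random.default_rng(0)
    env = np.convolve(np.abs(rng.standard_normal(500)),
                      np.ones(20) / 20, mode='same')
    enhanced = lfilter(lpc_coeffs(env - env.mean(), 3), [1.0], env)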


International Conference on Acoustics, Speech, and Signal Processing | 2008

Multisensor very low bit rate speech coding using segment quantization

Alan McCree; Kevin Brady; Thomas F. Quatieri

We present two approaches to noise robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously-presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. One coder uses a 600 bps scalable phonetic vocoder, with a phonetic speech recognizer followed by joint predictive vector quantization of the error in wideband MELP parameters. The second coder operates at 300 bps with fixed 80 ms segments, using novel variable-rate multistage matrix quantization techniques. Formal test results show that both coders achieve equivalent intelligibility to the 2.4 kbps NATO standard MELPe coder in harsh acoustic noise environments, at much lower bit rates, with only modest quality loss.
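
A toy sketch of the multistage matrix quantization idea, where each stage quantizes the residual left by the previous one (codebook shapes and sizes are illustrative; the paper's variable-rate machinery is not reproduced):

    import numpy as np

    def multistage_matrix_quantize(segment, codebooks):
        # segment:   (frames, dims) float block of parameter frames,
        #            e.g. a fixed 80 ms segment
        # codebooks: list of (K, frames, dims) stage codebooks
        residual = segment.copy()
        indices = []
        for cb in codebooks:
            k = int(np.argmin(np.sum((cb - residual) ** 2, axis=(1, 2))))
            indices.append(k)
            residual -= cb[k]    # the next stage quantizes this residual
        return indices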


Conference of the International Speech Communication Association | 2016

Stacked Long-Term TDNN for Spoken Language Recognition.

Daniel Garcia-Romero; Alan McCree

This paper introduces a stacked architecture that uses a time delay neural network (TDNN) to model long-term patterns for spoken language identification. The first component of the architecture is a feed-forward neural network with a bottleneck layer that is trained to classify context-dependent phone states (senones). The second component is a TDNN that takes the output of the bottleneck, concatenated over a long time span, and produces a posterior probability over the set of languages. The use of a TDNN architecture provides an efficient model to capture discriminative patterns over a wide temporal context. Experimental results are presented using the audio data from the language i-vector challenge (IVC) recently organized by NIST. The proposed system outperforms a state-of-the-art shifted delta cepstra i-vector system and provides complementary information to fuse with the new generation of bottleneck-based i-vector systems that model short-term dependencies.
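
A minimal numpy sketch of the stacked idea: splice bottleneck features over a wide temporal context, apply one TDNN (time-delay) layer, pool over time, and produce a language posterior. Layer sizes, context offsets, and random weights are placeholders, not the trained system:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def tdnn_layer(feats, weights, context):
        # Splice frames at the given context offsets, then apply an
        # affine transform; weights: (len(context) * D, H).
        T, _ = feats.shape
        rows = []
        for t in range(T):
            idx = [min(max(t + c, 0), T - 1) for c in context]  # clamp at edges
            rows.append(feats[idx].reshape(-1))
        return np.maximum(np.stack(rows) @ weights, 0.0)        # ReLU

    # Toy stacked pass: random features stand in for the output of the
    # senone-trained bottleneck network.
    rng = np.random.default_rng(0)
    bnf = rng.standard_normal((100, 40))              # 100 frames, 40-d bottlenecks
    h = tdnn_layer(bnf, rng.standard_normal((200, 64)), [-8, -4, 0, 4, 8])
    posterior = softmax(h.mean(axis=0) @ rng.standard_normal((64, 10)))  # 10 languages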


Conference of the International Speech Communication Association | 2016

Priors for Speaker Counting and Diarization with AHC.

Gregory Sell; Alan McCree; Daniel Garcia-Romero

Estimating the number of speakers in an audio segment is a necessary step in the process of speaker diarization, but current diarization algorithms do not explicitly define a prior probability on this estimation. This work proposes a process for including priors in speaker diarization with agglomerative hierarchical clustering (AHC). It is also shown that the exclusion of a prior with AHC is itself implicitly a prior, which is found to be geometric growth in the number of speakers. By using more sensible priors, we are able to demonstrate significantly improved robustness to calibration error for speaker counting and speaker diarization.
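
A toy sketch of AHC in which each merge decision is adjusted by the log-ratio of a speaker-count prior; with a uniform prior this reduces to plain thresholding at zero, which, as the paper observes, is itself an implicit (geometric) prior. All names and the linkage choice here are illustrative:

    import numpy as np

    def ahc_with_count_prior(sim, log_prior):
        # sim:       (N, N) symmetric, calibrated similarity scores
        # log_prior: function k -> log P(k speakers)
        clusters = [[i] for i in range(len(sim))]
        while len(clusters) > 1:
            best, pair = -np.inf, None
            for i in range(len(clusters)):           # average linkage
                for j in range(i + 1, len(clusters)):
                    s = np.mean([sim[a, b] for a in clusters[i]
                                           for b in clusters[j]])
                    if s > best:
                        best, pair = s, (i, j)
            k = len(clusters)
            # Merge only if the score plus the prior ratio favors
            # k-1 speakers over k.
            if best + log_prior(k - 1) - log_prior(k) <= 0:
                break
            i, j = pair
            clusters[i] += clusters.pop(j)
        return clusters

    geometric = lambda k: k * np.log(0.5)            # illustrative prior
    sim = np.array([[0.0, 2.0, -1.0],
                    [2.0, 0.0, -1.5],
                    [-1.0, -1.5, 0.0]])
    print(ahc_with_count_prior(sim, geometric))      # -> [[0, 1], [2]]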


Odyssey 2016 | 2016

Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15.

Alan McCree; Gregory Sell; Daniel Garcia-Romero

This paper presents the JHU HLTCOE submission to the NIST 2015 Language Recognition Evaluation, including critical and novel algorithmic components, use of limited and augmented training data, and additional post-evaluation analysis and improvements. All of our systems used i-vectors based on Deep Neural Networks (DNNs) with discriminatively-trained Gaussian classifiers, and linear fusion was performed with duration-dependent scaling. A key innovation was the use of three different kinds of i-vectors: acoustic, phonotactic, and joint. In addition, data augmentation was used to overcome the limited training data of this evaluation. Post-evaluation analysis shows the benefits of these design decisions as well as further potential improvements.
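
A minimal sketch of linear fusion with duration-dependent scaling of the fused score; the scaling function and weights below are placeholders, not the submitted calibration:

    import numpy as np

    def fuse_duration_scaled(scores, weights, duration, scale):
        # scores, weights: (n_systems,) for one trial; the fused score
        # is scaled by a function of the test-segment duration.
        return scale(duration) * float(weights @ scores)

    # Illustrative choice: shrink scores for short segments.
    fused = fuse_duration_scaled(np.array([1.4, 0.9, 0.3]),
                                 np.array([0.5, 0.3, 0.2]),
                                 duration=10.0,
                                 scale=lambda d: d / (d + 5.0))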


Archive | 2008

Low-Bit-Rate Speech Coding

Alan McCree

Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; therefore, low-rate coders rely instead on parametric models to represent only the most perceptually relevant aspects of speech. While there are a number of different approaches for this modeling, all can be related to the basic linear model of speech production, where an excitation signal drives a vocal-tract filter.
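
A minimal illustration of that linear source-filter model: an excitation signal (an impulse train for voiced speech, noise for unvoiced) drives an all-pole vocal-tract filter. The filter coefficients below are arbitrary:

    import numpy as np
    from scipy.signal import lfilter

    # An excitation signal drives an all-pole vocal-tract filter 1/A(z).
    n = 2000                            # 250 ms at 8 kHz
    excitation = np.zeros(n)
    excitation[::80] = 1.0              # ~100 Hz impulse train (voiced source)
    # For unvoiced speech the source would be white noise instead:
    #   excitation = np.random.randn(n)
    a = [1.0, -1.3, 0.9]                # arbitrary stable A(z) coefficients
    speech = lfilter([1.0], a, excitation)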


International Conference on Acoustics, Speech, and Signal Processing | 2007

Multisensor Dynamic Waveform Fusion

Alan McCree; Kevin Brady; Thomas F. Quatieri

Speech communication is significantly more difficult in severe acoustic background noise environments, especially when low-rate speech coders are used. Non-acoustic sensors, such as radar sensors, vibrometers, and bone-conduction microphones, offer significant potential in these situations. We extend previous work on fixed waveform fusion from multiple sensors to an optimal dynamic waveform fusion algorithm that minimizes both additive noise and signal distortion in the estimated speech signal. We show that a minimum mean squared error (MMSE) waveform matching criterion results in a generalized multichannel Wiener filter, and that this filter will simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. Formal intelligibility and quality testing demonstrate significant improvement from this approach.
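
For one time-frequency bin, the MMSE criterion gives the classic multichannel Wiener solution w = R_yy^{-1} r_ys. A minimal sketch, assuming the covariance statistics have already been estimated:

    import numpy as np

    def mwf_weights(R_yy, r_ys):
        # Solve R_yy w = r_ys for the per-frequency filter weights,
        # where R_yy is the (M, M) sensor covariance and r_ys the
        # (M,) sensor/desired-speech cross-covariance.
        return np.linalg.solve(R_yy, r_ys)

    def apply_mwf(w, y):
        # y: (M,) sensor spectra at one time-frequency bin; output w^H y.
        return np.vdot(w, y)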

Collaboration


Dive into Alan McCree's collaborations.

Top Co-Authors

Douglas A. Reynolds, Massachusetts Institute of Technology
Douglas E. Sturim, Massachusetts Institute of Technology
Pedro A. Torres-Carrasquillo, Massachusetts Institute of Technology
Bengt J. Borgström, Massachusetts Institute of Technology
Elliot Singer, Massachusetts Institute of Technology
Fred Richardson, Massachusetts Institute of Technology
Gregory Sell, Johns Hopkins University
Thomas F. Quatieri, Massachusetts Institute of Technology
William M. Campbell, Massachusetts Institute of Technology