
Publications


Featured research published by Robert B. Dunn.


Digital Signal Processing | 2000

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds; Thomas F. Quatieri; Robert B. Dunn

Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10 (2000), 19–41. In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
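As a rough illustration of the GMM-UBM recipe described in this abstract, the sketch below fits a universal background model, MAP-adapts only the mixture means toward a speaker's enrollment features, and scores test frames with an average log-likelihood ratio. It is a minimal sketch, not the MIT-LL system: scikit-learn's GaussianMixture stands in for the UBM, the relevance factor of 16 is a conventional assumption, and feature extraction (e.g., MFCCs) is assumed to happen elsewhere.

```python
# Minimal GMM-UBM likelihood-ratio sketch (illustrative, not the MIT-LL system).
# Features are assumed to be MFCC-like row vectors; sklearn's GaussianMixture
# stands in for the UBM, and only the mixture means are MAP-adapted.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64, seed=0):
    """Fit a diagonal-covariance UBM on pooled background speech features."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed)
    ubm.fit(background_feats)
    return ubm

def map_adapt_means(ubm, speaker_feats, relevance=16.0):
    """Derive a speaker model from the UBM by MAP-adapting the mixture means."""
    post = ubm.predict_proba(speaker_feats)          # (T, M) responsibilities
    n_k = post.sum(axis=0)                           # soft counts per mixture
    e_k = post.T @ speaker_feats / np.maximum(n_k[:, None], 1e-10)
    alpha = n_k / (n_k + relevance)                  # data-dependent adaptation weight
    return alpha[:, None] * e_k + (1 - alpha)[:, None] * ubm.means_

def verification_score(ubm, speaker_means, test_feats):
    """Average log-likelihood ratio of speaker model vs. UBM over test frames."""
    spk = GaussianMixture(n_components=ubm.n_components, covariance_type="diag")
    spk.weights_, spk.covariances_ = ubm.weights_, ubm.covariances_
    spk.means_ = speaker_means                       # weights/covariances stay shared
    spk.precisions_cholesky_ = ubm.precisions_cholesky_
    return np.mean(spk.score_samples(test_feats) - ubm.score_samples(test_feats))
```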


international conference on acoustics, speech, and signal processing | 2002

Speaker verification using text-constrained Gaussian Mixture Models

Douglas E. Sturim; Douglas A. Reynolds; Robert B. Dunn; Thomas F. Quatieri

In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by an LVCSR system on conversational speech, allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NIST extended data task show this approach can be used to produce an equal error rate of less than 1%.
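The text-constrained idea above amounts to scoring only the frames that fall inside selected word segments. The sketch below assumes word-level segment boundaries from an external recognizer and a purely hypothetical keyword set; the selected frames would then be scored with a GMM-UBM detector such as the verification_score sketch earlier.

```python
# Sketch of text-constrained scoring: keep only frames that fall inside
# word segments from a chosen word set, then score them as usual with GMM-UBM.
# `segments` is assumed to be a list of (word, start_frame, end_frame) tuples
# produced by an LVCSR system; the keyword choice here is illustrative.
import numpy as np

KEYWORDS = {"yeah", "okay", "right", "uh-huh"}   # hypothetical constrained word set

def select_constrained_frames(feats, segments, keywords=KEYWORDS):
    """Return the subset of feature frames covered by the keyword segments."""
    mask = np.zeros(len(feats), dtype=bool)
    for word, start, end in segments:
        if word.lower() in keywords:
            mask[start:end] = True
    return feats[mask]
```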


Digital Signal Processing | 2000

Approaches to Speaker Detection and Tracking in Conversational Speech

Robert B. Dunn; Douglas A. Reynolds; Thomas F. Quatieri

Dunn, Robert B., Reynolds, Douglas A., and Quatieri, Thomas F., Approaches to Speaker Detection and Tracking in Conversational Speech, Digital Signal Processing 10 (2000), 93–112. Two approaches to detecting and tracking speakers in multispeaker audio are described. Both approaches use an adapted Gaussian mixture model, universal background model (GMM-UBM) speaker detection system as the core speaker recognition engine. In one approach, the individual log-likelihood ratio scores, which are produced on a frame-by-frame basis by the GMM-UBM system, are used first to partition the speech file into speaker-homogeneous regions and then to create scores for these regions. We refer to this approach as internal segmentation. Another approach uses an external segmentation algorithm, based on blind clustering, to partition the speech file into speaker-homogeneous regions. The adapted GMM-UBM system then scores each of these regions as in the single-speaker recognition case. We show that the external segmentation system outperforms the internal segmentation system for both detection and tracking. In addition, we show how different components of the detection and tracking algorithms contribute to the overall system performance.
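A minimal sketch of the internal-segmentation idea follows: per-frame log-likelihood ratios from a GMM-UBM detector are smoothed, contiguous target-attributed frames are merged into regions, and each region receives one score. The smoothing window and thresholding rule are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative "internal segmentation" sketch: smooth per-frame log-likelihood
# ratios, merge contiguous frames into regions, and give each region one score.
import numpy as np

def segment_and_score(frame_llrs, smooth=101, threshold=0.0):
    """frame_llrs: 1-D array of per-frame LLRs from a GMM-UBM detector."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(frame_llrs, kernel, mode="same")
    active = smoothed > threshold                     # frames attributed to the target
    # Find contiguous runs of target-attributed frames and score each region.
    edges = np.flatnonzero(np.diff(active.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(active)]))
    regions = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if active[lo]:
            regions.append((lo, hi, float(frame_llrs[lo:hi].mean())))
    return regions                                    # (start, end, region score)
```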


international conference on acoustics, speech, and signal processing | 2002

Speech enhancement based on auditory spectral change

Thomas F. Quatieri; Robert B. Dunn

In this paper, an adaptive approach to the enhancement of speech signals is developed based on auditory spectral change. The algorithm is motivated by the sensitivity of aural biologic systems to signal dynamics, by evidence that noise is aurally masked by rapid changes in a signal, and by analogies to these two aural phenomena in biologic visual processing. Emphasis is on preserving nonstationarity, i.e., speech transient and time-varying components such as plosive bursts, formant transitions, and vowel onsets, while suppressing additive noise. The essence of the enhancement technique is a Wiener filter that uses a desired signal spectrum whose estimation adapts to the stationarity of the measured signal. The degree of stationarity is derived from a signal change measurement, based on an auditory spectrum that accentuates change in spectral bands. The adaptive filter is applied in an unconventional overlap-add analysis/synthesis framework, using a very short 4-ms analysis window and a 1-ms frame interval. In informal listening the reconstructions are judged to be “crisp,” corresponding to good temporal resolution of transient and rapidly moving speech events.
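The sketch below shows the general shape of such a short-window overlap-add Wiener enhancer: a 4-ms analysis window, a 1-ms hop, and a signal-spectrum estimate whose smoothing relaxes when the spectrum is changing quickly. The spectral-change measure and adaptation law here are simplified stand-ins, and noise_psd is assumed to be estimated elsewhere; this is not the paper's auditory-spectrum formulation.

```python
# Rough sketch of a short-window overlap-add Wiener enhancer in the spirit of
# the approach above. The change measure and adaptation law are simplified.
import numpy as np

def enhance(x, fs, noise_psd, win_ms=4.0, hop_ms=1.0):
    n_win = int(fs * win_ms / 1000)
    n_hop = int(fs * hop_ms / 1000)
    win = np.hanning(n_win)
    y = np.zeros(len(x) + n_win)
    norm = np.zeros(len(x) + n_win)
    prev_mag = None
    sig_est = np.zeros(n_win // 2 + 1)
    for start in range(0, len(x) - n_win, n_hop):
        frame = x[start:start + n_win] * win
        spec = np.fft.rfft(frame)
        mag2 = np.abs(spec) ** 2
        # Spectral change: large change -> trust the instantaneous estimate more.
        if prev_mag is None:
            change = 1.0
        else:
            change = np.clip(np.mean(np.abs(mag2 - prev_mag)) /
                             (np.mean(prev_mag) + 1e-10), 0.0, 1.0)
        prev_mag = mag2
        alpha = 1.0 - 0.9 * (1.0 - change)              # adaptive smoothing constant
        sig_est = alpha * np.maximum(mag2 - noise_psd, 0.0) + (1 - alpha) * sig_est
        gain = sig_est / (sig_est + noise_psd + 1e-10)  # Wiener gain
        y[start:start + n_win] += np.fft.irfft(gain * spec, n_win) * win
        norm[start:start + n_win] += win ** 2
    return y[:len(x)] / np.maximum(norm[:len(x)], 1e-10)
```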


international conference on acoustics, speech, and signal processing | 1993

Detection of transient signals using the energy operator

Robert B. Dunn; Thomas F. Quatieri; J. F. Kaiser

A function of the Teager-Kaiser energy operator is introduced as a method for detecting transient signals in the presence of amplitude-modulated and frequency-modulated tonal interference. This function has excellent time resolution and is robust in the presence of white noise. The output of the detection function is also independent of the interference-to-transient ratio when that ratio is large. It is demonstrated that the detection function can be applied to interference signals with multiple amplitude-modulated and frequency-modulated tonal components.
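For reference, the discrete Teager-Kaiser energy operator is psi[x](n) = x(n)^2 - x(n-1) x(n+1). The sketch below computes it together with a simple smoothed, normalized statistic for flagging transients; the exact detection function proposed in the paper is not reproduced here, so the normalization is only an illustrative choice.

```python
# Discrete Teager-Kaiser energy operator plus a simple normalized transient flag.
import numpy as np

def teager_kaiser(x):
    """psi[x](n) = x(n)^2 - x(n-1)*x(n+1), zero-padded at the edges."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def transient_detector(x, smooth=32):
    """Flag samples where the operator output jumps well above its local level."""
    psi = np.abs(teager_kaiser(x))
    background = np.convolve(psi, np.ones(smooth) / smooth, mode="same") + 1e-12
    return psi / background        # large values suggest a transient
```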


asilomar conference on signals, systems and computers | 2001

Speaker recognition from coded speech and the effects of score normalization

Robert B. Dunn; Thomas F. Quatieri; Douglas A. Reynolds; Joseph P. Campbell

We investigate the effect of speech coding on automatic speaker recognition when training and testing conditions are matched and mismatched. Experiments used standard speech coding algorithms (GSM, G.729, G.723, MELP) and a speaker recognition system based on Gaussian mixture models adapted from a universal background model. There is little loss in recognition performance for toll-quality speech coders and slightly more loss when lower-quality speech coders are used. Speaker recognition from coded speech using handset-dependent score normalization and test score normalization is also examined. Both types of score normalization significantly improve performance and can eliminate the performance loss that occurs when there is a mismatch between training and testing conditions.
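Test score normalization (T-norm) standardizes a raw verification score by the mean and standard deviation of the same test utterance scored against a cohort of impostor models; handset-dependent normalization applies the analogous idea with per-handset impostor statistics. A minimal sketch of both, with hypothetical function names, follows.

```python
# Minimal score-normalization sketch. t_norm standardizes against cohort scores
# computed on the same test utterance; h_norm uses impostor statistics gathered
# per handset type for the target model. Function names are illustrative.
import numpy as np

def t_norm(raw_score, cohort_scores):
    """cohort_scores: the test utterance scored against a set of impostor models."""
    cohort = np.asarray(cohort_scores, dtype=float)
    return (raw_score - cohort.mean()) / (cohort.std() + 1e-10)

def h_norm(raw_score, handset_mean, handset_std):
    """handset_mean/std: impostor score statistics for the detected handset type."""
    return (raw_score - handset_mean) / (handset_std + 1e-10)
```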


international conference on acoustics, speech, and signal processing | 2007

Robust Speaker Recognition with Cross-Channel Data: MIT-LL Results on the 2006 NIST SRE Auxiliary Microphone Task

Douglas E. Sturim; William M. Campbell; Douglas A. Reynolds; Robert B. Dunn; Thomas F. Quatieri

One particularly difficult challenge for cross-channel speaker verification is the auxiliary microphone task introduced in the 2005 and 2006 NIST Speaker Recognition Evaluations, where training uses telephone speech and verification uses speech from multiple auxiliary microphones. This paper presents two approaches to compensate for the effects of auxiliary microphones on the speech signal. The first compensation method mitigates session effects through Latent Factor Analysis (LFA) and Nuisance Attribute Projection (NAP). The second approach operates directly on the recorded signal with noise reduction techniques. Results are presented that show a reduction in the performance gap between telephone and auxiliary microphone data.
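Nuisance attribute projection, in its basic form, removes the component of a GMM supervector that lies in a learned session/channel subspace. The sketch below shows only that projection; how the nuisance subspace U is estimated, and the latent factor analysis variant, are outside its scope.

```python
# Basic NAP projection: strip the part of a supervector lying in the
# nuisance (channel/session) subspace spanned by the columns of U.
import numpy as np

def nap_project(supervector, U):
    """U: (D, k) matrix with orthonormal columns spanning the nuisance subspace."""
    return supervector - U @ (U.T @ supervector)
```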


workshop on applications of signal processing to audio and acoustics | 2007

Sinewave Analysis/Synthesis Based on the Fan-Chirp Transform

Robert B. Dunn; Thomas F. Quatieri

There have been numerous recent strides at making sinewave analysis consistent with time-varying sinewave models [1][2][3]. This is particularly important in high-frequency speech regions where harmonic frequency modulation (FM) can be significant. One notable approach is through the Fan Chirp transform that provides a set of FM-sinewave basis functions consistent with harmonic FM [3]. In this paper, we develop a complete sinewave analysis/synthesis system using the Fan Chirp transform. With this system we are able to obtain more accurate sinewave frequencies and phases, thus creating more accurate frequency tracks, in contrast to a system derived from the short-time Fourier transform, particularly for high-frequency regions of large-bandwidth analysis. With synthesis, we show an improvement in segmental signal-to-noise ratio with respect to waveform matching with the largest gains during rapid pitch dynamics.
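A brute-force sketch of the fan-chirp idea is shown below: a windowed frame is projected onto linearly frequency-modulated basis functions over a grid of chirp rates, rather than the fixed-frequency exponentials of the short-time Fourier transform. The warping and amplitude terms follow the common fan-chirp formulation, but this is an illustrative, inefficient implementation, not the paper's analysis/synthesis system.

```python
# Direct (slow) discrete sketch of a fan-chirp style analysis for one frame:
# correlate with chirped exponentials exp(-j*2*pi*f*(1 + 0.5*alpha*t)*t)
# over a grid of chirp rates alpha and frequencies f.
import numpy as np

def fan_chirp_frame(frame, fs, freqs, alphas):
    """Return |X(f, alpha)| for one windowed frame, for each (alpha, f) pair."""
    n = len(frame)
    t = (np.arange(n) - n // 2) / fs            # time axis centered on the frame
    out = np.zeros((len(alphas), len(freqs)))
    for i, a in enumerate(alphas):
        warp = (1.0 + 0.5 * a * t) * t          # warped time phi_alpha(t)
        scale = np.sqrt(np.abs(1.0 + a * t))    # amplitude term from the warping
        basis = np.exp(-2j * np.pi * np.outer(freqs, warp))
        out[i] = np.abs(basis @ (frame * scale))
    return out
```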


IEEE Transactions on Speech and Audio Processing | 1995

A subband approach to time-scale expansion of complex acoustic signals

Thomas F. Quatieri; Robert B. Dunn; Thomas E. Hanna

A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled temporal envelope. The phase representation is derived from locations of events that occur within filter bank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The method is applied to synthetic and actual complex acoustic signals consisting of closely spaced rapidly damped sine waves. Time-frequency resolution limitations are discussed.


international conference on acoustics, speech, and signal processing | 2000

An embedded sinusoidal transform codec with measured phases and sampling rate scalability

Gerard Aguilar; Juin-Hwey Chen; Robert B. Dunn; Robert J. McAulay; Xiaoqin Sun; Wei Wang; Robert W. Zopf

This paper describes an embedded sinusoidal transform codec that is scalable not only in bit-rate, but also in sampling rate. In a representative implementation, the system produces an embedded bit-stream at 3.2 and 6.4 kbit/s for telephone-bandwidth speech, and scales up to 9.6 kbit/s for 16 kHz sampled wideband speech. The 3.2 kbit/s codec is a sinusoidal transform codec with synthetic phases. The 6.4 kbit/s codec adds resolution to the spectral envelope and transmits measured phases of the eight lowest harmonics. The 9.6 kbit/s codec adds information in the 4 to 8 kHz band to provide higher quality wideband speech.

Collaboration


Dive into Robert B. Dunn's collaborations.

Top Co-Authors

Thomas F. Quatieri (Massachusetts Institute of Technology)
Douglas A. Reynolds (Massachusetts Institute of Technology)
Joseph P. Campbell (Massachusetts Institute of Technology)
Elliot Singer (Massachusetts Institute of Technology)
Robert J. McAulay (Massachusetts Institute of Technology)
Douglas E. Sturim (Massachusetts Institute of Technology)
Thomas E. Hanna (Massachusetts Institute of Technology)
William M. Campbell (Massachusetts Institute of Technology)
Nicolas Malyska (Massachusetts Institute of Technology)
Alan McCree (Massachusetts Institute of Technology)