Publication


Featured research published by Thomas F. Quatieri.


Digital Signal Processing | 2000

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds; Thomas F. Quatieri; Robert B. Dunn

Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10 (2000), 19–41. In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
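The likelihood ratio test and the adaptation of a speaker model from the UBM can be illustrated with a minimal sketch. It is not the Lincoln Laboratory system: it assumes one-dimensional features, adapts only the component means, shares weights and variances between the two models, and uses a hypothetical relevance factor of 16. All function names are invented for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def component_logs(x, weights, means, variances):
    """Log of weight * density for every mixture component."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one sample under a GMM (log-sum-exp for stability)."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Mean-only Bayesian (MAP) adaptation of the UBM to enrollment data.
    The relevance factor is an assumed value, not taken from the abstract."""
    counts = [0.0] * len(weights)
    sums = [0.0] * len(weights)
    for x in data:
        logs = component_logs(x, weights, means, variances)
        top = max(logs)
        post = [math.exp(v - top) for v in logs]
        total = sum(post)
        for i, p in enumerate(post):
            counts[i] += p / total          # soft count for component i
            sums[i] += (p / total) * x      # first-order statistic
    adapted = []
    for i, prior_mean in enumerate(means):
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = sums[i] / counts[i] if counts[i] > 0.0 else prior_mean
        adapted.append(alpha * data_mean + (1.0 - alpha) * prior_mean)
    return adapted

def llr_score(test, weights, speaker_means, ubm_means, variances):
    """Average log-likelihood ratio: speaker model versus UBM."""
    diffs = [gmm_log_likelihood(x, weights, speaker_means, variances)
             - gmm_log_likelihood(x, weights, ubm_means, variances)
             for x in test]
    return sum(diffs) / len(diffs)

# Toy UBM with two components, and enrollment data clustered near 3.0.
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
enrollment = [2.5, 3.0, 3.5] * 20
speaker_means = map_adapt_means(enrollment, ubm_weights, ubm_means, ubm_vars)

target_score = llr_score([2.8, 3.2, 3.1], ubm_weights, speaker_means, ubm_means, ubm_vars)
impostor_score = llr_score([1.0, 0.5, 1.5], ubm_weights, speaker_means, ubm_means, ubm_vars)
```

A verification decision would then compare the score against a threshold. In this toy setup the target trial scores above zero and the impostor trial below it.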


IEEE Transactions on Signal Processing | 1993

Energy separation in signal modulations with application to speech analysis

Petros Maragos; James F. Kaiser; Thomas F. Quatieri

An efficient solution to the fundamental problem of estimating the time-varying amplitude envelope and instantaneous frequency of a real-valued signal that has both an AM and FM structure is provided. Nonlinear combinations of instantaneous signal outputs from the energy operator are used to separate its output energy product into its AM and FM components. The theoretical analysis is done first for continuous-time signals. Then several efficient algorithms are developed and compared for estimating the amplitude envelope and instantaneous frequency of discrete-time AM-FM signals. These energy separation algorithms are used to search for modulations in speech resonances, which are modeled using AM-FM signals to account for time-varying amplitude envelopes and instantaneous frequencies. The experimental results provide evidence that bandpass-filtered speech signals around speech formants contain amplitude and frequency modulations within a pitch period.
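As an illustration of energy separation, the sketch below implements the variant commonly known as DESA-2, which applies the discrete energy operator to the signal and to its symmetric difference. The abstract does not say which of the compared algorithms this corresponds to, so the choice of variant and all function names are assumptions.

```python
import math

def psi(x, n):
    """Discrete energy operator at index n: x[n]^2 - x[n-1] * x[n+1]."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]

def desa2(x):
    """Estimate instantaneous frequency (rad/sample) and amplitude envelope.
    Estimates are returned for indices 2 .. len(x) - 3 of the input."""
    size = len(x)
    # Symmetric difference y[n] = x[n+1] - x[n-1]; the end points are unused.
    y = [0.0] * size
    for n in range(1, size - 1):
        y[n] = x[n + 1] - x[n - 1]
    freqs, amps = [], []
    for n in range(2, size - 2):
        px, py = psi(x, n), psi(y, n)
        if px <= 0.0 or py <= 0.0:
            # The estimate is undefined here (for example in silence or noise).
            freqs.append(float("nan"))
            amps.append(float("nan"))
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps

# Sanity check on a constant-amplitude, constant-frequency tone.
true_amp, true_freq = 1.5, 0.4
tone = [true_amp * math.cos(true_freq * n + 0.2) for n in range(64)]
freqs, amps = desa2(tone)
```

For a pure tone the estimates recover the amplitude and frequency up to rounding error. For real AM-FM signals they are approximations, and this variant only resolves frequencies below a quarter of the sampling rate.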


IEEE Transactions on Signal Processing | 1993

On amplitude and frequency demodulation using energy operators

Petros Maragos; James F. Kaiser; Thomas F. Quatieri

It is shown that the nonlinear energy-tracking signal operator $\Psi(x) = (dx/dt)^2 - x\,(d^2x/dt^2)$ and its discrete-time counterpart can estimate the AM and FM modulating signals. Specifically, $\Psi$ can approximately estimate the amplitude envelope of AM signals and the instantaneous frequency of FM signals. Bounds are derived for the approximation errors, which are negligible under general realistic conditions. These results, coupled with the simplicity of $\Psi$, establish the usefulness of the energy operator for AM and FM signal demodulation. These ideas are then extended to a more general class of signals that are sine waves with a time-varying amplitude and frequency and thus contain both an AM and an FM component; for such signals it is shown that $\Psi$ can approximately track the product of their amplitude envelope and their instantaneous frequency. The theoretical analysis is done for both continuous- and discrete-time signals.
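A minimal sketch of the discrete-time operator, assuming its usual three-sample form x[n]^2 - x[n-1] x[n+1]; the function name is illustrative. For a sampled cosine A cos(Ωn + φ) this form returns A^2 sin^2(Ω) at every sample, which is roughly (AΩ)^2 for small Ω and shows how the operator tracks the amplitude-frequency product.

```python
import math

def teager_kaiser(x):
    """Discrete energy operator for the interior samples of x."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n + 0.7) for n in range(50)]
energy = teager_kaiser(tone)

# Closed-form value for a pure cosine: A^2 * sin^2(omega).
expected = amplitude ** 2 * math.sin(omega) ** 2
```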


IEEE Transactions on Signal Processing | 1993

AM-FM energy detection and separation in noise using multiband energy operators

Alan C. Bovik; Petros Maragos; Thomas F. Quatieri

This paper develops a multiband or wavelet approach for capturing the AM-FM components of modulated signals immersed in noise. The technique utilizes the recently-popularized nonlinear energy operator $\Psi(s) = \dot{s}^2 - s\ddot{s}$ to isolate the AM-FM energy, and an energy separation algorithm (ESA) to extract the instantaneous amplitudes and frequencies. It is demonstrated that the performance of the energy operator/ESA approach is vastly improved if the signal is first filtered through a bank of bandpass filters, and at each instant analyzed (via $\Psi$ and the ESA) using the dominant local channel response. Moreover, it is found that uniform (worst-case) performance across the frequency spectrum is attained by using a constant-Q, or multiscale wavelet-like filter bank. The elementary stochastic properties of $\Psi$ and of the ESA are developed first. The performance of $\Psi$ and the ESA when applied to bandpass filtered versions of an AM-FM signal-plus-noise combination is then analyzed. The predicted performance is greatly improved by filtering, if the local signal frequencies occur in-band. These observations motivate the multiband energy operator and ESA approach, ensuring the in-band analysis of local AM-FM energy. In particular, the multi-bands must have the constant-Q or wavelet scaling property to ensure uniform performance across bands. The theoretical predictions and the simulation results indicate that improved practical strategies are feasible for tracking and identifying AM-FM components in signals possessing pattern coherencies manifested as local concentrations of frequencies.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Pitch estimation and voicing detection based on a sinusoidal speech model

Robert J. McAulay; Thomas F. Quatieri

A technique for estimating the pitch of a speech waveform is developed. It fits a harmonic set of sine waves to the input data using a mean-squared-error (MSE) criterion. By exploiting a sinusoidal model for the input speech waveform, a pitch estimation criterion is derived that is inherently unambiguous, uses pitch-adaptive resolution, uses small-signal suppression to provide enhanced discrimination, and uses amplitude compression to eliminate the effects of pitch-formant interaction. The normalized minimum mean squared error proves to be a powerful discriminant for estimating the likelihood that a given frame of speech is voiced.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1981

Convergence of iterative nonexpansive signal reconstruction algorithms

Victor T. Tom; Thomas F. Quatieri; Monson H. Hayes; James H. McClellan

Iterative algorithms for signal reconstruction from partial time- and frequency-domain knowledge have proven useful in a number of application areas. In this paper, a general convergence proof, applicable to a general class of such iterative reconstruction algorithms, is presented. The proof relies on the concept of a nonexpansive mapping in both the time and frequency domains. Two examples studied in detail are time-limited extrapolation (equivalently, band-limited extrapolation) and phase-only signal reconstruction. The proof of convergence for the phase-only iteration is a new result obtained by this method of proof. The generality of the approach allows the incorporation of nonlinear constraints such as time- (or space-) domain positivity or minimum and maximum value constraints. Finally, the underrelaxed form of these iterations is also shown to converge even when the solution is not guaranteed to be unique.
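The alternating structure of such iterations can be sketched for band-limited extrapolation. The example below is a Papoulis-Gerchberg-style iteration on a short discrete signal, not the paper's general algorithm or proof. The signal length, band limit, number of known samples, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def band_limit(x, max_bin):
    """Frequency-domain constraint: keep only DFT bins with |k| <= max_bin."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if (k <= max_bin or k >= n - max_bin) else 0
            for k in range(n)]
    return [v.real for v in idft(kept)]

def extrapolate(observed, num_known, max_bin, iterations):
    """Alternate between the band limit and the known time samples."""
    x = list(observed)
    for _ in range(iterations):
        x = band_limit(x, max_bin)
        x[:num_known] = observed[:num_known]  # time-domain constraint
    return x

def l2_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# A band-limited test signal of which only the first 24 of 32 samples are known.
N, M, K = 32, 24, 1
truth = [1.0 + math.cos(2 * math.pi * t / N) + 0.5 * math.sin(2 * math.pi * t / N)
         for t in range(N)]
observed = [truth[t] if t < M else 0.0 for t in range(N)]

estimate = extrapolate(observed, M, K, 50)
initial_error = l2_error(observed, truth)
final_error = l2_error(estimate, truth)
```

Each step applies a nonexpansive map, so the distance to the true signal never grows. In this small configuration it shrinks quickly because most samples are known and the band is narrow.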


International Conference on Acoustics, Speech, and Signal Processing | 1992

On separating amplitude from frequency modulations using energy operators

Petros Maragos; J.F. Kaiser; Thomas F. Quatieri

To estimate the amplitude envelope and instantaneous frequency of an AM-FM signal, the authors developed a novel approach that uses nonlinear combinations of instantaneous signal outputs from an energy-tracking operator to separate its output energy product into its amplitude modulation and frequency modulation components. This energy separation algorithm is then applied to search for modulations in speech resonances, which the authors model using AM-FM signals. The theoretical and experimental results demonstrate that the energy separation algorithm, due to its low computational complexity and instantaneously adapting nature, is very useful in detecting modulation patterns in speech and other time-varying signals.


International Conference on Acoustics, Speech, and Signal Processing | 1991

Speech nonlinearities, modulations, and energy operators

Petros Maragos; Thomas F. Quatieri; J.F. Kaiser

An AM-FM model for representing modulations in speech resonances is investigated. Specifically, an FM model is proposed for the time-varying formants whose amplitude varies as the envelope of an AM signal. To detect the modulations, the energy operator $\Psi(x)$ and its discrete counterpart are applied. It is found that $\Psi$ can approximately track the envelope of AM signals, the instantaneous frequency of FM signals, and the product of these two functions in the general case of AM-FM signals. Several experiments on the application of this AM-FM modeling to speech signals, bandpass filtered via Gabor filters, are reported.


Speech Communication | 2015

A review of depression and suicide risk assessment using speech analysis

Nicholas Cummins; Stefan Scherer; Jarek Krajewski; Sebastian Schnieder; Julien Epps; Thomas F. Quatieri

Review of current diagnostic and assessment methods for depression and suicidality. Review the characteristics of active depressed and suicidal speech databases. Discuss the effects of depression and suicidality on common speech characteristics. Review of studies that use speech to classify or predict depression or suicidality. Discuss future challenges in finding speech-based markers of either condition.

This paper is the first review into the automatic analysis of speech for use as an objective predictor of depression and suicidality. Both conditions are major public health concerns; depression has long been recognised as a prominent cause of disability and burden worldwide, whilst suicide is a misunderstood and complex cause of death that strongly impacts the quality of life and mental health of the families and communities left behind. Despite this prevalence, the diagnosis of depression and assessment of suicide risk, due to their complex clinical characterisations, are difficult tasks, nominally achieved by the categorical assessment of a set of specific symptoms. However, many of the key symptoms of either condition, such as altered mood and motivation, are not physical in nature; therefore assigning a categorical score to them introduces a range of subjective biases to the diagnostic procedure. Due to these difficulties, research into finding a set of biological, physiological and behavioural markers to aid clinical assessment is gaining in popularity. This review starts by building the case for speech to be considered a key objective marker for both conditions: reviewing current diagnostic and assessment methods for depression and suicidality, including key non-speech biological, physiological and behavioural markers, and highlighting the expected cognitive and physiological changes associated with both conditions which affect speech production. We then review the key characteristics (size, associated clinical scores and collection paradigm) of active depressed and suicidal speech databases. The main focus of this paper is on how common paralinguistic speech characteristics are affected by depression and suicidality and the application of this information in classification and prediction systems. The paper concludes with an in-depth discussion on the key challenges (improving generalisability through greater research collaboration and increased standardisation of data collection, and mitigating unwanted sources of variability) that will shape the future research directions of this rapidly growing field of speech processing research.


International Conference on Acoustics, Speech, and Signal Processing | 1995

The effects of telephone transmission degradations on speaker recognition performance

Douglas A. Reynolds; Marc A. Zissman; Thomas F. Quatieri; Gerald C. O'Leary; Beth A. Carlson

The two largest factors affecting automatic speaker identification performance are the size of the population and the degradations introduced by noisy communication channels (e.g., telephone transmission). To examine these two factors experimentally, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. These are believed to be the first speaker identification experiments on the complete 630-speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% are achieved on the TIMIT and NTIMIT databases, respectively. This paper also presents experiments which examine and attempt to quantify the performance loss associated with various telephone degradations by systematically degrading the TIMIT speech in a manner consistent with measured NTIMIT degradations and measuring the performance loss at each step. It is found that the standard degradations of filtering and additive noise do not account for all of the performance gap between the TIMIT and NTIMIT data. Measurements of nonlinear microphone distortions, which may explain the additional performance loss, are also described.

Collaboration


Dive into Thomas F. Quatieri's collaborations.

Top Co-Authors

Robert J. McAulay
Massachusetts Institute of Technology

Robert B. Dunn
Massachusetts Institute of Technology

Douglas A. Reynolds
Massachusetts Institute of Technology

Nicolas Malyska
Massachusetts Institute of Technology

Brian S. Helfer
Massachusetts Institute of Technology

Petros Maragos
National Technical University of Athens

Adam C. Lammert
University of Southern California