Publication


Featured research published by Robert J. Vogt.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition

Iain A. McCowan; David Dean; Mitchell McLaren; Robert J. Vogt; Sridha Sridharan

For several reasons, the Fourier phase domain is less favored than the magnitude domain in signal processing and modeling of speech. To correctly analyze the phase, several factors must be considered and compensated for, including the effect of the step size, windowing function and other processing parameters. Building on a review of these factors, this paper investigates a spectral representation based on the Instantaneous Frequency Deviation, but in which the step size between processing frames is used in calculating phase changes, rather than the traditional single sample interval. Reflecting these longer intervals, the term delta-phase spectrum is used to distinguish this from instantaneous derivatives. Experiments show that mel-frequency cepstral coefficient features derived from the delta-phase spectrum (termed Mel-Frequency delta-phase features) can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks. Further, it is shown that the fusion of the magnitude and phase representations yields performance benefits over either in isolation.
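
As a rough illustration of the idea described in this abstract, the sketch below measures the phase change of each frequency bin across frames that are `hop` samples apart, then subtracts the phase advance expected at that bin's centre frequency. The function names, the Hann window and the exact form of the compensation are assumptions made for illustration; they are not taken from the paper.

```python
import cmath
import math


def dft_phase(frame):
    """Return the phase of DFT bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    phases = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        phases.append(cmath.phase(acc))
    return phases


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def delta_phase(signal, frame_len, hop):
    """Phase deviation per bin between frames separated by `hop` samples.

    Assumption: the deviation is the measured phase change minus the
    advance expected for the bin centre frequency, wrapped to [-pi, pi).
    """
    # Periodic Hann window (an illustrative choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]
    frames = [
        [signal[start + t] * window[t] for t in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    phases = [dft_phase(frame) for frame in frames]
    rows = []
    for prev, cur in zip(phases, phases[1:]):
        row = []
        for k, (p0, p1) in enumerate(zip(prev, cur)):
            expected = 2 * math.pi * k * hop / frame_len
            row.append(wrap(p1 - p0 - expected))
        rows.append(row)
    return rows
```

For a sinusoid that sits exactly on a bin centre, the deviation at that bin comes out close to zero in this sketch, because the measured phase advance matches the expected one.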


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Making Confident Speaker Verification Decisions With Minimal Speech

Robert J. Vogt; Sridha Sridharan; Michael Mason

This paper proposes an approach to estimating confidence measures on the verification score produced by a Gaussian mixture model (GMM)-based automatic speaker verification system, with application to drastically reducing the amount of data typically required to produce a confident verification decision. The confidence measures are based on estimating the distribution of the observed frame scores. The confidence estimation procedure is also extended to produce robust results with very limited and highly correlated frame scores as well as in the presence of score normalization. The proposed Early Verification Decision method utilizes the developed confidence measures in a sequential hypothesis testing framework, demonstrating that as little as 2-10 s of speech on average was able to produce verification results approaching those obtained using an average of over 100 s of speech on the 2005 NIST SRE protocol.
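
The sequential decision idea can be sketched as follows: frame scores are consumed one at a time, and a decision is made as soon as an interval around the running mean score lies entirely on one side of the threshold. This is a deliberately naive version that treats frame scores as independent and uses a fixed normal quantile `z`; the abstract notes that real frame scores are highly correlated and that the paper's estimator accounts for this, which this sketch does not. The function name and all parameters are hypothetical.

```python
import math


def early_decision(frame_scores, threshold, z=2.58, min_frames=10):
    """Return (decision, frames_used).

    decision is 'accept', 'reject' or 'undecided'. Assumes independent
    frame scores, which is a simplification made for illustration.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for score in frame_scores:
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)
        if n < max(min_frames, 2):
            continue  # too few frames for a variance estimate
        std_err = math.sqrt(m2 / (n - 1) / n)
        if mean - z * std_err > threshold:
            return "accept", n
        if mean + z * std_err < threshold:
            return "reject", n
    return "undecided", n
```

With consistently high scores the loop stops at `min_frames`; with scores that straddle the threshold it runs to the end of the input and reports no decision.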


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Data-Driven Background Dataset Selection for SVM-Based Speaker Verification

Mitchell McLaren; Robert J. Vogt; Brendan Baker; Sridha Sridharan

The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analyzed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mismatched to the evaluation conditions.


IEEE Transactions on Information Forensics and Security | 2010

A Comparison of Session Variability Compensation Approaches for Speaker Verification

Mitchell McLaren; Robert J. Vogt; Brendan Baker; Sridha Sridharan

This paper compares two of the leading techniques for session variability compensation in the context of support vector machine (SVM) speaker verification using Gaussian mixture model (GMM) mean supervectors: joint factor analysis (JFA) modeling and nuisance attribute projection (NAP). Motivation for this comparison comes from the distinctly different domains in which these techniques are employed: the probabilistic GMM domain versus the discriminative SVM kernel. A theoretical analysis is given comparing the JFA and NAP approaches to variability compensation. The role of speaker factors in the factor analysis model is also contrasted against the scatter difference NAP objective of retaining speaker information in the SVM kernel space. These methods for retaining speaker variation are found to provide improved verification performance over the removal of channel effects alone. Overall, experimental results on the NIST 2006 and 2008 SRE corpora demonstrate the effectiveness of both JFA and NAP techniques for reducing the effects of variability. However, the overheads associated with the implementation of JFA may make NAP a more attractive technique due to its simple yet effective approach to variability compensation.
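
To make the NAP side of this comparison concrete, the sketch below applies the commonly used projection that removes a supervector's components along a set of orthonormal nuisance directions. How those directions are estimated is not covered here, and the function name and toy vectors are hypothetical.

```python
def nap_project(vector, nuisance_basis):
    """Remove the components of `vector` along each nuisance direction.

    Assumption: the rows of `nuisance_basis` are orthonormal, so the
    result equals vector - U^T (U vector) for the basis matrix U.
    """
    result = list(vector)
    for direction in nuisance_basis:
        coeff = sum(v * d for v, d in zip(vector, direction))
        result = [r - coeff * d for r, d in zip(result, direction)]
    return result


# Toy example: one nuisance direction in a 3-dimensional space.
supervector = [3.0, 4.0, 5.0]
basis = [[0.6, 0.8, 0.0]]
compensated = nap_project(supervector, basis)
```

After projection, the compensated vector has no component left along the nuisance direction, which is part of the simplicity that the abstract contrasts with the overheads of JFA.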


International Conference on Signal Processing and Communication Systems | 2008

Speech Endpoint Detection Using Gradient Based Edge Detection Techniques

Houman Ghaemmaghami; Robert J. Vogt; Sridha Sridharan; Michael Mason

This paper proposes a novel method for speech endpoint detection. The developed method utilises gradient based edge detection algorithms, used in image processing, to detect boundaries of continuous speech in noisy conditions. It is simple and has low computational complexity. The accuracy of the proposed method was evaluated and compared to the ITU-T G.729 Annex-B voice activity detection (VAD) algorithm. To do this, the two algorithms were tested using a synthetically produced noisy-speech database, consisting of noisy-speech signals of various lengths and signal-to-noise ratios (SNRs). The results indicated that the developed method outperforms the G.729-B VAD algorithm at various SNRs.
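
A minimal one-dimensional analogue of gradient-based edge detection is sketched below: the log energy of consecutive frames is treated like a row of image intensities, and the largest rise and the largest fall in that contour are taken as the start and end of speech. The forward-difference gradient, the frame length and the function names are assumptions for illustration; the abstract does not specify which edge detection operators the paper uses.

```python
import math


def frame_log_energy(signal, frame_len):
    """Log energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energy = sum(x * x for x in signal[start:start + frame_len])
        energies.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return energies


def detect_endpoints(signal, frame_len=4):
    """Return (start_sample, end_sample) of the dominant energy burst."""
    energy = frame_log_energy(signal, frame_len)
    if len(energy) < 2:
        raise ValueError("signal too short for gradient estimation")
    # Forward-difference gradient of the energy contour.
    grad = [energy[i + 1] - energy[i] for i in range(len(energy) - 1)]
    rise = max(range(len(grad)), key=lambda i: grad[i])
    fall = min(range(len(grad)), key=lambda i: grad[i])
    # The edge lies between frame i and frame i + 1.
    return (rise + 1) * frame_len, (fall + 1) * frame_len
```

A practical detector would smooth the contour before differentiating so that noise does not produce spurious edges; that step is omitted here to keep the sketch short.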


Computer Speech & Language | 2013

Eigenvoice modelling for cross likelihood ratio based speaker clustering: A Bayesian approach

David Wang; Robert J. Vogt; Sridha Sridharan

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modelling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modelling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modelling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall diarization error rate (DER) compared to the baseline system.
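
The cross likelihood ratio itself can be illustrated with a toy one-dimensional example. In the sketch below, each segment is modelled by a Gaussian whose mean is the segment mean and whose variance is shared with a background model, and the CLR is the sum of the two average log-likelihood ratios of each segment under the other segment's model relative to the background. This uses plain maximum-likelihood means rather than the eigenvoice and Bayesian estimates proposed in the paper, and all names and values are hypothetical.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cross_likelihood_ratio(seg_a, seg_b, ubm_mean=0.0, ubm_var=1.0):
    """CLR between two segments using mean-only Gaussian models.

    Assumption: each segment model shares the background variance and
    differs from the background model only in its mean.
    """
    mean_a = sum(seg_a) / len(seg_a)
    mean_b = sum(seg_b) / len(seg_b)

    def avg_llr(segment, model_mean):
        total = sum(
            gauss_logpdf(x, model_mean, ubm_var)
            - gauss_logpdf(x, ubm_mean, ubm_var)
            for x in segment
        )
        return total / len(segment)

    return avg_llr(seg_a, mean_b) + avg_llr(seg_b, mean_a)


same_1 = [0.9, 1.0, 1.1, 1.0]
same_2 = [0.95, 1.05, 1.0, 1.0]
other = [-1.1, -0.9, -1.0, -1.0]
```

In this toy setting, `cross_likelihood_ratio(same_1, same_2)` is positive while `cross_likelihood_ratio(same_1, other)` is negative, so a clustering stage would merge the first pair and keep the second pair apart.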


Faculty of Built Environment and Engineering; Information Security Institute | 2007

A Phonetic Search Approach to the 2006 NIST Spoken Term Detection Evaluation

Roy Wallace; Robert J. Vogt; Sridha Sridharan


Faculty of Built Environment and Engineering; Information Security Institute | 2005

Modelling Session Variability in Text-Independent Speaker Verification

Robert J. Vogt; Brendan Baker; Sridha Sridharan


Faculty of Built Environment and Engineering; Information Security Institute | 2008

Factor analysis subspace estimation for speaker verification with short utterances

Robert J. Vogt; Brendan Baker; Sridha Sridharan


Faculty of Built Environment and Engineering; Information Security Institute | 2010

The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms

David Dean; Sridha Sridharan; Robert J. Vogt; Michael Mason

Collaboration


Dive into Robert J. Vogt's collaborations.

Top Co-Authors

Sridha Sridharan

Queensland University of Technology

Brendan Baker

Queensland University of Technology

Mitchell McLaren

Queensland University of Technology

Michael Mason

Queensland University of Technology

David Dean

Queensland University of Technology

David Wang

Queensland University of Technology

Subramanian Sridharan

Queensland University of Technology

Houman Ghaemmaghami

Queensland University of Technology

Ahilan Kanagasundaram

Queensland University of Technology
