Publication


Featured research published by Robbie Vogt.


Computer Speech & Language | 2008

Explicit modelling of session variability for speaker verification

Robbie Vogt; Sridha Sridharan

This article describes a general and powerful approach to modelling mismatch in speaker recognition by including an explicit session term in the Gaussian mixture speaker modelling framework. Under this approach, the Gaussian mixture model (GMM) that best represents the observations of a particular recording is the combination of the true speaker model with an additional session-dependent offset constrained to lie in a low-dimensional subspace representing session variability. A novel and efficient model training procedure is proposed in this work to perform the simultaneous optimisation of the speaker model and session variables required for speaker training. Using a similar iterative approach to the Gauss–Seidel method for solving linear systems, this procedure greatly reduces the memory and computational resources required by a direct solution. Extensive experimentation demonstrates that the explicit session modelling provides up to a 68% reduction in detection cost over a standard GMM-based system and significant improvements over a system utilising feature mapping, and is shown to be effective on the corpora of recent National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations, exhibiting different session mismatch conditions.
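The abstract compares its training procedure to the Gauss–Seidel method for solving linear systems. As a point of reference only, here is a minimal sketch of the generic Gauss–Seidel iteration itself. It is not the paper's training algorithm, and the function name, matrix, and tolerance are illustrative assumptions. The relevant property is that each unknown is updated in turn using the most recent values of the others, so the full system never has to be inverted at once.

```python
import numpy as np

def gauss_seidel(A, b, iterations=100, tol=1e-10):
    """Solve A x = b with Gauss-Seidel sweeps.

    Convergence is guaranteed for, e.g., strictly diagonally dominant A.
    Generic textbook method, shown for illustration only.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(iterations):
        x_old = x.copy()
        for i in range(len(b)):
            # Use entries already updated in this sweep (x[:i]) together
            # with entries from the previous sweep (x[i+1:]).
            residual = b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]
            x[i] = residual / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

# Hypothetical, strictly diagonally dominant example system.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```

In the setting the abstract describes, the analogous idea would be to alternate between updating the speaker model and the session variables rather than solving for both jointly.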


International Conference on Acoustics, Speech, and Signal Processing | 2006

Experiments in Session Variability Modelling for Speaker Verification

Robbie Vogt; Sridha Sridharan

This paper presents an approach to modelling session variability for GMM-based text-independent speaker verification that incorporates a constrained session variability component in both the training and testing procedures. The proposed technique reduces the data labelling requirements and removes the discrete categorisation needed by previous techniques, while providing superior performance. Experiments on Mixer conversational telephony data show improvements of as much as 46% in equal error rate over a baseline system. The algorithm used for the enrollment procedure is described in detail. Results are also presented investigating the response of the technique to short test utterances and varying session subspace dimension.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Weighted LDA techniques for i-vector based speaker verification

Ahilan Kanagasundaram; David Dean; Robbie Vogt; Mitchell McLaren; Sridha Sridharan; Michael Mason

This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purposes of improving i-vector speaker verification in the presence of high inter-session variability. By taking advantage of the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to provide an improvement in speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA) which, similarly, shows an improvement in speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable as replacement techniques to improve the performance of LDA and SNLDA-based i-vector speaker verification.
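A rough sketch of what a weighted pairwise between-class scatter matrix can look like is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name, the 1/N normalisation, and the example weighting function are all hypothetical. Each pair of speaker means contributes an outer product scaled by a weight that depends on the distance between the two means, so that closely spaced (easily confused) speakers can be emphasised.

```python
import numpy as np

def weighted_between_class_scatter(X, labels, weight_fn):
    """Weighted pairwise between-class scatter (illustrative sketch).

    X         : (N, D) array of vectors, e.g. development i-vectors.
    labels    : length-N array of speaker labels.
    weight_fn : maps the Euclidean distance between two class means to a
                non-negative weight (a hypothetical choice, see below).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    counts = np.array([(labels == c).sum() for c in classes])
    dim = X.shape[1]
    scatter = np.zeros((dim, dim))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            w = weight_fn(np.linalg.norm(diff))
            scatter += w * counts[i] * counts[j] * np.outer(diff, diff)
    return scatter / len(X)

# Hypothetical data: three "speakers" with unequal numbers of sessions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
labels = np.array([0] * 5 + [1] * 10 + [2] * 15)

# Assumed weighting that emphasises close speaker pairs.
S_weighted = weighted_between_class_scatter(X, labels, lambda d: d ** -2.0)

# With a constant weight of 1 the pairwise form reduces to the standard
# (unweighted) between-class scatter used by conventional LDA.
S_uniform = weighted_between_class_scatter(X, labels, lambda d: 1.0)
```

The uniform-weight case is a useful sanity check, since it should match the usual LDA between-class scatter computed around the global mean.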


International Conference on Acoustics, Speech, and Signal Processing | 2009

Spoken term detection using fast phonetic decoding

Roy Wallace; Robbie Vogt; Sridha Sridharan

While spoken term detection (STD) systems based on word indices provide good accuracy, there are several practical applications where it is infeasible or too costly to employ an LVCSR engine. An STD system is presented, which is designed to incorporate a fast phonetic decoding front-end and be robust to decoding errors whilst still allowing for rapid search speeds. This goal is achieved through monophone open-loop decoding coupled with fast hierarchical phone lattice search. Results demonstrate that an STD system that is designed with the constraint of a fast and simple phonetic decoding front-end requires a compromise to be made between search speed and search accuracy.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Optimising Figure of Merit for phonetic spoken term detection

Roy Wallace; Robbie Vogt; Brendan Baker; Sridha Sridharan

This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection. The FOM is a popular measure of STD accuracy, making it an ideal candidate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phone classifier to produce enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a nonlinear gradient descent algorithm. Substantial FOM improvements of 11% relative are achieved on held-out evaluation data, demonstrating the generalisability of the approach.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Improved SVM speaker verification through data-driven background dataset collection

Mitchell McLaren; Brendan Baker; Robbie Vogt; Sridha Sridharan

The problem of background dataset selection in SVM-based speaker verification is addressed through the proposal of a new data-driven selection technique. Based on support vector selection, the proposed approach introduces a method to individually assess the suitability of each candidate impostor example for use in the background dataset. The technique can then produce a refined background dataset by selecting only the most informative impostor examples. Improvements of 13% in min. DCF and 10% in EER were found on the SRE 2006 development corpus when using the proposed method over the best heuristically chosen set. The technique was also shown to generalise to the unseen NIST 2008 SRE corpus.


International Conference on Signal Processing and Communication Systems | 2008

Automatic audio segmentation using the Generalized Likelihood Ratio

David Wang; Robbie Vogt; Michael Mason; Sridha Sridharan

This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identities, background noise, music, and environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper uses the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian information criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings associated with such an approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) evaluation dataset, where a miss rate of 19.47% and a false alarm rate of 16.94% are achieved at the optimal threshold.
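The idea of a GLR computed between two adjacent sliding windows can be sketched as follows. This is a minimal illustration, assuming each window is modelled by a single full-covariance Gaussian fitted by maximum likelihood; the paper's actual models, preprocessing, and thresholding are not given in the abstract, and all function names here are hypothetical. The distance is large when two separate Gaussians explain the windows much better than one shared Gaussian does, which suggests a change point at the boundary.

```python
import numpy as np

def gaussian_log_likelihood(X):
    """Log-likelihood of X under its own maximum-likelihood Gaussian."""
    n, d = X.shape
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # ML estimate
    _, logdet = np.linalg.slogdet(cov)
    # At the ML estimate the quadratic term sums to n * d.
    return -0.5 * n * (d * np.log(2.0 * np.pi) + logdet + d)

def glr_distance(X, Y):
    """GLR distance between two windows: separate models vs. one shared model."""
    Z = np.vstack([X, Y])
    return (gaussian_log_likelihood(X) + gaussian_log_likelihood(Y)
            - gaussian_log_likelihood(Z))

def glr_curve(features, window):
    """GLR distance at each candidate boundary t, using frames
    [t - window, t) and [t, t + window)."""
    boundaries = range(window, len(features) - window + 1)
    return np.array([glr_distance(features[t - window:t],
                                  features[t:t + window])
                     for t in boundaries])

# Synthetic example: the source changes at frame 200.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),
                      rng.normal(3.0, 1.0, size=(200, 4))])
window = 50
curve = glr_curve(features, window)
peak_frame = int(np.argmax(curve)) + window  # map curve index back to frame
```

A segmenter built on this would typically pick local maxima of the curve that exceed a threshold; the choice of threshold trades off misses against false alarms.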


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Discriminative Optimization of the Figure of Merit for Phonetic Spoken Term Detection

Roy Wallace; Brendan Baker; Robbie Vogt; Sridha Sridharan

This paper proposes to improve spoken term detection (STD) accuracy by optimizing the figure of merit (FOM). In this paper, the index takes the form of a phonetic posterior-feature matrix. Accuracy is improved by formulating STD as a discriminative training problem and directly optimizing the FOM, through its use as an objective function to train a transformation of the index. The outcome of indexing is then a matrix of enhanced posterior-features that are directly tailored for the STD task. The technique is shown to improve the FOM by up to 13% on held-out data. Additional analysis explores the effect of the technique on phone recognition accuracy, examines the actual values of the learned transform, and demonstrates that using an extended training data set results in further improvement in the FOM.


International Conference on Biometrics | 2007

SVM speaker verification using session variability modelling and GMM supervectors

Mitchell McLaren; Robbie Vogt; Sridha Sridharan

This paper demonstrates that modelling session variability during GMM training can improve the performance of a GMM supervector SVM speaker verification system. Recently, a method of modelling session variability in GMM-UBM systems has led to significant improvements when the training and testing conditions are subject to session effects. In this work, session variability modelling is applied during the extraction of GMM supervectors prior to SVM speaker model training and classification. Experiments performed on the NIST 2005 corpus show major improvements over the baseline GMM supervector SVM system.


International Conference on Biometrics | 2009

Scatter Difference NAP for SVM Speaker Recognition

Brendan Baker; Robbie Vogt; Mitchell McLaren; Sridha Sridharan

This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrates that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the EER operating point.

Collaboration


Dive into Robbie Vogt's collaborations.

Top Co-Authors

Sridha Sridharan, Queensland University of Technology

Brendan Baker, Queensland University of Technology

Mitchell McLaren, Queensland University of Technology

David Dean, Queensland University of Technology

Michael Mason, Queensland University of Technology

Roy Wallace, Queensland University of Technology

Ahilan Kanagasundaram, Queensland University of Technology

David Wang, Queensland University of Technology

Houman Ghaemmaghami, Queensland University of Technology