
Publication


Featured research published by Brendan Baker.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Optimising Figure of Merit for phonetic spoken term detection

Roy Wallace; Robbie Vogt; Brendan Baker; Sridha Sridharan

This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection (STD). The FOM is a popular measure of STD accuracy, making it an ideal candidate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phone classifier into enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a nonlinear gradient descent algorithm. Substantial relative FOM improvements of 11% are achieved on held-out evaluation data, demonstrating the generalisability of the approach.
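To make the FOM objective concrete, here is a minimal sketch of how a FOM-style score can be computed for a single search term. It assumes a simplified definition: the detection rate is averaged over the operating points reached after 1, 2, and so on up to 10 false alarms per hour of searched speech. The function name `figure_of_merit` and the `(score, is_hit)` input layout are hypothetical, and the papers' exact FOM computation (for example, any interpolation between operating points) may differ.

```python
from typing import List, Tuple


def figure_of_merit(detections: List[Tuple[float, bool]],
                    num_true: int,
                    hours: float,
                    max_fa_per_hour: int = 10) -> float:
    """Approximate FOM for one term (hypothetical, simplified definition).

    detections: (score, is_hit) pairs returned by the STD system for the term.
    num_true:   number of true occurrences of the term in the searched audio.
    hours:      duration of the searched audio in hours.
    """
    if num_true <= 0:
        raise ValueError("num_true must be positive")
    # Rank putative occurrences from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    # Total false alarms allowed: max_fa_per_hour for each hour of audio.
    max_fa = max(1, int(round(max_fa_per_hour * hours)))
    # hits_before_fa[k] = number of hits ranked above the (k+1)-th false alarm.
    hits_before_fa = []
    hits = 0
    for _, is_hit in ranked:
        if is_hit:
            hits += 1
        else:
            hits_before_fa.append(hits)
            if len(hits_before_fa) == max_fa:
                break
    # If the system produced fewer false alarms than allowed, the remaining
    # operating points credit every hit that was found.
    while len(hits_before_fa) < max_fa:
        hits_before_fa.append(hits)
    # Average detection rate over the false-alarm operating points.
    return sum(h / num_true for h in hits_before_fa) / max_fa
```

For example, with 0.2 hours of audio only two false alarms are allowed, so a ranked list of hit, false alarm, hit, false alarm, hit against four true occurrences yields (1/4 + 2/4) / 2 = 0.375. Because the measure depends on the ranking of scores rather than their absolute values, it is not directly differentiable, which is why a smoothed surrogate is typically needed before it can be optimised by gradient descent.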


International Conference on Acoustics, Speech, and Signal Processing | 2009

Improved SVM speaker verification through data-driven background dataset collection

Mitchell McLaren; Brendan Baker; Robbie Vogt; Sridha Sridharan

The problem of background dataset selection in SVM-based speaker verification is addressed through the proposal of a new data-driven selection technique. Based on support vector selection, the proposed approach individually assesses the suitability of each candidate impostor example for use in the background dataset. The technique can then produce a refined background dataset by selecting only the most informative impostor examples. Improvements of 13% in minimum detection cost function (DCF) and 10% in equal error rate (EER) were found on the NIST 2006 SRE development corpus when using the proposed method over the best heuristically chosen set. The technique was also shown to generalise to the unseen NIST 2008 SRE corpus.
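One way to picture the selection step is to rank each candidate impostor by how often it becomes a support vector when development client SVMs are trained against the full candidate set, and then keep the top-ranked examples. The sketch below assumes the support vector IDs of each client model are already available from whatever SVM trainer is in use; the function `refine_background` and its input layout are hypothetical, and the ranking statistic is a simplified illustration rather than the authors' exact procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List


def refine_background(candidates: List[str],
                      support_vectors_per_client: Dict[str, Iterable[str]],
                      size: int) -> List[str]:
    """Keep the `size` candidates most often used as support vectors (hypothetical helper).

    candidates:                 IDs of all candidate impostor examples.
    support_vectors_per_client: for each development client SVM, the IDs of the
                                candidate impostors that became support vectors.
    """
    # Support vector frequency: number of client models in which each
    # candidate impostor appears as a support vector.
    freq = Counter({c: 0 for c in candidates})
    for sv_ids in support_vectors_per_client.values():
        for imp in set(sv_ids):  # count each impostor at most once per client
            if imp in freq:
                freq[imp] += 1
    # Rank by descending frequency, breaking ties by original candidate order.
    order = {c: i for i, c in enumerate(candidates)}
    ranked = sorted(candidates, key=lambda c: (-freq[c], order[c]))
    return ranked[:size]
```

Usage: with support vectors `{"clientA": ["imp2", "imp3"], "clientB": ["imp2"], "clientC": ["imp3", "imp2", "imp4"]}` over candidates `imp1` to `imp4`, requesting two examples keeps `imp2` (three clients) and `imp3` (two clients).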


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Data-Driven Background Dataset Selection for SVM-Based Speaker Verification

Mitchell McLaren; Robert J. Vogt; Brendan Baker; Sridha Sridharan

The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background before selecting the highest-ranking examples as a refined background dataset. The characteristics of the refined dataset were then analyzed to identify the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to contain large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more dispersed representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mismatched to the evaluation conditions.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Discriminative Optimization of the Figure of Merit for Phonetic Spoken Term Detection

Roy Wallace; Brendan Baker; Robbie Vogt; Sridha Sridharan

This paper proposes to improve spoken term detection (STD) accuracy by optimizing the figure of merit (FOM). In this paper, the index takes the form of a phonetic posterior-feature matrix. Accuracy is improved by formulating STD as a discriminative training problem and directly optimizing the FOM, through its use as an objective function to train a transformation of the index. The outcome of indexing is then a matrix of enhanced posterior-features that are directly tailored for the STD task. The technique is shown to improve the FOM by up to 13% on held-out data. Additional analysis explores the effect of the technique on phone recognition accuracy, examines the actual values of the learned transform, and demonstrates that using an extended training data set results in further improvement in the FOM.
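The index transformation can be pictured as a linear map applied to every frame of the phonetic posterior-feature matrix. This sketch assumes the index is stored as a list of per-frame log-posterior vectors and that the learned transform is a square weight matrix `W`; both names are hypothetical, any bias term is omitted, and training `W` to optimise the FOM is not shown. An identity `W` leaves the index unchanged, which makes it one plausible starting point for gradient-based training.

```python
from typing import List

Matrix = List[List[float]]


def transform_index(features: Matrix, W: Matrix) -> Matrix:
    """Apply a linear transform to each frame: y_t = W x_t (hypothetical helper).

    features: T frames, each a vector of N phone log-posteriors.
    W:        N x N weight matrix standing in for the learned transform.
    """
    n = len(W)
    if any(len(row) != n for row in W):
        raise ValueError("W must be square")
    enhanced = []
    for x in features:
        if len(x) != n:
            raise ValueError("frame dimension does not match W")
        # Each output feature is a weighted combination of all input features.
        enhanced.append([sum(W[i][j] * x[j] for j in range(n)) for i in range(n)])
    return enhanced


def identity(n: int) -> Matrix:
    """Identity transform, which leaves the index unchanged."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

Because every output feature can draw on all phone posteriors in the same frame, the transform can, for example, boost a phone's score using evidence from acoustically confusable phones.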


International Conference on Biometrics | 2009

Scatter Difference NAP for SVM Speaker Recognition

Brendan Baker; Robbie Vogt; Mitchell McLaren; Sridha Sridharan

This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability when estimating the nuisance subspace by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrates that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the equal error rate (EER) operating point.
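A minimal NumPy sketch of the idea follows. It assumes the SD-NAP nuisance subspace is spanned by the leading eigenvectors of the within-speaker scatter minus a weighted between-speaker scatter, Sw - alpha * Sb, so that alpha = 0 reduces to standard NAP; supervectors are then compensated with the projection P = I - U U^T. The function names, the weight `alpha`, and the unnormalised scatter definitions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative SD-NAP sketch. Assumption: nuisance directions are the leading
# eigenvectors of Sw - alpha * Sb; names and normalisation are hypothetical.


def scatter_matrices(X: np.ndarray, labels: np.ndarray):
    """Return within-speaker (Sw) and between-speaker (Sb) scatter of the rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for spk in np.unique(labels):
        Xs = X[labels == spk]          # all sessions of one speaker
        mu_s = Xs.mean(axis=0)
        centred = Xs - mu_s            # session (nuisance) variation
        Sw += centred.T @ centred
        diff = (mu_s - mu)[:, None]    # speaker offset from the global mean
        Sb += len(Xs) * (diff @ diff.T)
    return Sw, Sb


def sd_nap_projection(X: np.ndarray, labels: np.ndarray, rank: int,
                      alpha: float = 0.0) -> np.ndarray:
    """P = I - U U^T, with U the top-`rank` eigenvectors of Sw - alpha * Sb.

    alpha = 0 gives standard NAP; alpha > 0 penalises directions with large
    between-speaker scatter so they are less likely to be removed.
    """
    Sw, Sb = scatter_matrices(X, labels)
    _, evecs = np.linalg.eigh(Sw - alpha * Sb)  # eigenvalues in ascending order
    U = evecs[:, ::-1][:, :rank]                 # leading eigenvectors
    return np.eye(X.shape[1]) - U @ U.T
```

Compensated supervectors are `X @ P` (P is symmetric). On toy data whose within-speaker spread happens to be largest along the axis that separates speakers, `alpha = 0` removes that speaker axis, while a small positive `alpha` removes the other axis instead, which is the failure mode the scatter difference term is meant to avoid.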


IEEE Transactions on Information Forensics and Security | 2010

A Comparison of Session Variability Compensation Approaches for Speaker Verification

Mitchell McLaren; Robert J. Vogt; Brendan Baker; Sridha Sridharan

This paper compares two of the leading techniques for session variability compensation in the context of support vector machine (SVM) speaker verification using Gaussian mixture model (GMM) mean supervectors: joint factor analysis (JFA) modeling and nuisance attribute projection (NAP). Motivation for this comparison comes from the distinctly different domains in which these techniques are employed: the probabilistic GMM domain versus the discriminative SVM kernel. A theoretical analysis is given comparing the JFA and NAP approaches to variability compensation. The role of speaker factors in the factor analysis model is also contrasted with the scatter difference NAP objective of retaining speaker information in the SVM kernel space. These methods for retaining speaker variation are found to provide improved verification performance over the removal of channel effects alone. Overall, experimental results on the NIST 2006 and 2008 SRE corpora demonstrate the effectiveness of both JFA and NAP techniques for reducing the effects of variability. However, the overheads associated with implementing JFA may make NAP a more attractive technique, given its simple yet effective approach to variability compensation.


Proceedings of the Third Workshop on Searching Spontaneous Conversational Speech | 2009

The effect of language models on phonetic decoding for spoken term detection

Roy Wallace; Brendan Baker; Robbie Vogt; Sridha Sridharan

Spoken term detection (STD) commonly involves performing word- or sub-word-level speech recognition and indexing the result. This work challenges the assumption that improved speech recognition accuracy implies better indexing for STD. Using an index derived from phone lattices, this paper examines the effect of language model selection on the relationship between phone recognition accuracy and STD accuracy. Results suggest that language models usually improve phone recognition accuracy, but their inclusion does not always translate to improved STD accuracy. The findings indicate that using phone recognition accuracy to measure the quality of an STD index can be problematic, and highlight the need for an alternative measure that is more closely aligned with the goals of the specific detection task.
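Phone recognition accuracy, the quantity this study contrasts with STD accuracy, is conventionally computed from an alignment of the recognised and reference phone strings as (N - S - D - I) / N, where N is the number of reference phones and S, D and I count substitutions, deletions and insertions. The minimal sketch below assumes unit edit costs; dedicated scoring tools may use different alignment weights, so their figures can differ slightly. The function name `phone_accuracy` is hypothetical.

```python
from typing import Sequence


def phone_accuracy(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Accuracy = (N - S - D - I) / N from a unit-cost Levenshtein alignment."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must be non-empty")
    # prev[j]: edit distance between the reference phones processed so far
    # and the first j hypothesis phones (row 0 = empty reference).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            match_or_sub = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1       # reference phone missing from output
            insertion = cur[j - 1] + 1   # extra phone in output
            cur[j] = min(match_or_sub, deletion, insertion)
        prev = cur
    errors = prev[m]  # minimal S + D + I
    return (n - errors) / n
```

For the reference `sil k ae t sil` and hypothesis `sil k ah t t sil`, the alignment has one substitution and one insertion, giving (5 - 2) / 5 = 0.6. Unlike phone correctness, this measure penalises insertions and can even become negative.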


International Conference on Biometrics | 2009

Data-Driven Impostor Selection for T-Norm Score Normalisation and the Background Dataset in SVM-Based Speaker Verification

Mitchell McLaren; Robbie Vogt; Brendan Baker; Sridha Sridharan

A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in minimum DCF and 9% in EER over the full set of impostor examples on the NIST 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.
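T-norm itself normalises a verification score using the scores the same test utterance obtains against a cohort of impostor models: s' = (s - mu_I) / sigma_I, where mu_I and sigma_I are the mean and standard deviation of the cohort scores. Refining the T-norm dataset, as proposed here, amounts to changing which impostor models form that cohort. The minimal sketch below assumes the cohort scores are already computed; the function name `tnorm` is hypothetical, and using the sample (rather than population) standard deviation is an arbitrary choice.

```python
import statistics
from typing import Sequence


def tnorm(score: float, cohort_scores: Sequence[float]) -> float:
    """Test-normalise a verification score against impostor-cohort scores.

    cohort_scores: scores of the same test utterance against each T-norm
                   (impostor) model; at least two are needed.
    """
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero spread")
    return (score - mu) / sigma
```

For example, a raw score of 3.0 against cohort scores of 0.0, 1.0 and 2.0 (mean 1.0, sample standard deviation 1.0) normalises to 2.0.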


Computer Speech & Language | 2006

A syllable-scale framework for language identification

Terrence Martin; Brendan Baker; Eddie Wong; Sridha Sridharan

Whilst several examples of segment-based approaches to language identification (LID) have been published, they have typically been conducted using only a small number of languages or varying feature sets, making it difficult to determine how segment length influences the accuracy of LID systems. In this study, phone-triplets are used as crude approximations of a syllable-length sub-word segmental unit. The proposed pseudo-syllabic framework is subsequently used for both qualitative and quantitative examination of the contributions made by acoustic, phonotactic and prosodic information sources, and trialled in accordance with the NIST 1996 LID protocol. First, a series of experimental comparisons is conducted to examine the utility of using segmental units for modelling short-term acoustic features. These include comparisons between language-specific Gaussian mixture models (GMMs), language-specific GMMs for each segmental unit, and finally language-specific hidden Markov models (HMMs) for each segment, undertaken in an attempt to better model the temporal evolution of acoustic features. In a second tier of experiments, the contribution of both broad and fine class phonotactic information, when considered over an extended time frame, is contrasted with an implementation of the currently popular parallel phone recognition language modelling (PPRLM) technique. Results indicate that this information can be used to complement existing PPRLM systems to obtain improved performance. The pseudo-syllabic framework is also used to model prosodic dynamics and is compared with an implemented version of a recently published system, achieving comparable levels of performance.
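To make the pseudo-syllabic unit concrete, the sketch below splits a decoded phone sequence into overlapping phone triplets and scores it against per-language triplet frequency models with add-one smoothing. This is a simplified, hypothetical phonotactic illustration: the function names, the smoothing, and the use of overlapping triplets are assumptions, and it does not reproduce the acoustic, phonotactic or prosodic models evaluated in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, ...]


def phone_triplets(phones: Sequence[str]) -> List[Triplet]:
    """Overlapping phone triplets as crude syllable-length segments."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def train_model(utterances: Sequence[Sequence[str]]) -> Counter:
    """Count triplet occurrences in one language's training phone strings."""
    counts = Counter()
    for phones in utterances:
        counts.update(phone_triplets(phones))
    return counts


def identify(phones: Sequence[str], models: Dict[str, Counter]) -> str:
    """Return the language whose add-one-smoothed triplet model scores highest."""
    if not models:
        raise ValueError("no language models given")
    test = phone_triplets(phones)
    # Shared vocabulary so every language is smoothed over the same triplet set.
    vocab = set(test)
    for counts in models.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in models.items():
        total = sum(counts.values()) + len(vocab)
        score = sum(math.log((counts[t] + 1) / total) for t in test)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```

Usage: after training `lang_a` on `k a t a k` and `lang_b` on `s o n o s`, the test string `k a t a` shares both of its triplets with `lang_a` and is identified as `lang_a`.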


International Conference on Acoustics, Speech, and Signal Processing | 2010

Exploiting multiple feature sets in data-driven impostor dataset selection for speaker verification

Mitchell McLaren; Brendan Baker; Robbie Vogt; Sridha Sridharan

This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using SVM feature sets other than the GMM supervector features for which it was originally designed. The performance improvements obtained in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends the originally proposed technique by exploiting support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from different feature spaces to provide added robustness.
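A coefficient-based suitability metric can be sketched by replacing a simple support vector count with an accumulated coefficient: each candidate's suitability is the sum, over development client models, of the magnitude of its support vector coefficient (zero when it is not a support vector). The input layout and the function name `rank_by_sv_coefficients` are hypothetical, and summing magnitudes is one plausible choice of statistic rather than necessarily the one used in the paper.

```python
from collections import defaultdict
from typing import Dict, List


def rank_by_sv_coefficients(candidates: List[str],
                            coefficients_per_client: Dict[str, Dict[str, float]],
                            size: int) -> List[str]:
    """Keep the `size` candidates with the largest summed |coefficient| (hypothetical).

    coefficients_per_client: for each development client SVM, a mapping from
    impostor ID to its support vector coefficient (absent if not a support vector).
    """
    suitability = defaultdict(float)
    for coeffs in coefficients_per_client.values():
        for imp, alpha in coeffs.items():
            # Magnitude only: impostor coefficients are often stored with a
            # negative sign (label times dual coefficient).
            suitability[imp] += abs(alpha)
    # Rank by descending suitability, breaking ties by original candidate order.
    order = {c: i for i, c in enumerate(candidates)}
    ranked = sorted(candidates, key=lambda c: (-suitability[c], order[c]))
    return ranked[:size]
```

Compared with counting how often an impostor is a support vector, weighting by coefficient magnitude favours examples that constrain the decision boundaries more strongly, at the cost of depending on the scale of each client model's coefficients.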

Collaboration


Brendan Baker's collaborators.

Top Co-Authors

Sridha Sridharan, Queensland University of Technology
Robert J. Vogt, Queensland University of Technology
Mitchell McLaren, Queensland University of Technology
Robbie Vogt, Queensland University of Technology
Michael Mason, Queensland University of Technology
Roy Wallace, Queensland University of Technology
Eddie Wong, Queensland University of Technology
Houman Ghaemmaghami, Queensland University of Technology
Subramanian Sridharan, Queensland University of Technology
Terrence Martin, Queensland University of Technology