Publication


Featured research published by Peder A. Olsen.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models

John R. Hershey; Peder A. Olsen

The Kullback Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately, the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude, we evaluate the performance of each one through numerical experiments.
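One of the baselines such comparisons are measured against is plain Monte Carlo estimation of D(f‖g) by sampling from f. A minimal sketch for one-dimensional GMMs; the component parameters and sample counts below are illustrative, not taken from the paper:

```python
import math
import random

def gmm_pdf(x, weights, means, stds):
    # Density of a 1-D Gaussian mixture at x.
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, stds))

def gmm_sample(weights, means, stds, rng):
    # Draw one sample: pick a component by weight, then sample its Gaussian.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])

def kl_monte_carlo(f, g, n=50_000, seed=0):
    # D(f || g) ~= (1/n) * sum_i log(f(x_i) / g(x_i)), with x_i ~ f.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(*f, rng)
        total += math.log(gmm_pdf(x, *f) / gmm_pdf(x, *g))
    return total / n

f = ([0.3, 0.7], [-1.0, 2.0], [1.0, 0.5])   # (weights, means, stds) -- toy values
g = ([0.5, 0.5], [0.0, 2.5], [1.0, 1.0])
print(kl_monte_carlo(f, g))
```

The estimator is unbiased but its cost grows with the sample count, which is exactly why the paper studies cheap deterministic approximations such as the variational bound.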


Computer Speech & Language | 2010

Super-human multi-talker speech recognition: A graphical modeling approach

John R. Hershey; Steven J. Rennie; Peder A. Olsen; Trausti Kristjansson

We present a system that can separate and recognize the simultaneous speech of two people recorded in a single channel. Applied to the monaural speech separation and recognition challenge, the system outperformed all other participants, including human listeners, with an overall recognition error rate of 21.6%, compared to the human error rate of 22.3%. The system consists of a speaker recognizer, a model-based speech separation module, and a speech recognizer. For the separation models we explored a range of speech models that incorporate different levels of constraints on temporal dynamics to help infer the source speech signals. The system achieves its best performance when the model of temporal dynamics closely captures the grammatical constraints of the task. For inference, we compare a 2-D Viterbi algorithm and two loopy belief-propagation algorithms. We show how belief propagation reduces the complexity of temporal inference from exponential to linear in the number of sources and the size of the language model. The best belief-propagation method results in nearly the same recognition error rate as exact inference.


Computer Speech & Language | 2002

Theory and practice of acoustic confusability

Harry Printz; Peder A. Olsen

In this paper we define two alternatives to the familiar perplexity statistic (hereafter lexical perplexity), which is widely applied both as a figure of merit and as an objective function for training language models. These alternatives, respectively acoustic perplexity and the synthetic acoustic word error rate, fuse information from both the language model and the acoustic model. We show how to compute these statistics by effectively synthesizing a large acoustic corpus, demonstrate their superiority (on a modest collection of models and test sets) to lexical perplexity as predictors of language model performance, and investigate their use as objective functions for training language models. We develop an efficient algorithm for training such models, and present results from a simple speech recognition experiment, in which we achieved a small reduction in word error rate by interpolating a language model trained by synthetic acoustic word error rate with a unigram model.


IEEE Transactions on Speech and Audio Processing | 2005

Subspace constrained Gaussian mixture models for speech recognition

Scott Axelrod; Vaibhava Goel; Ramesh A. Gopinath; Peder A. Olsen; Karthik Visweswariah

A standard approach to automatic speech recognition uses hidden Markov models whose state dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, maximum likelihood linear transformation, or extended maximum likelihood linear transformation), as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper, we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small vocabulary, resource constrained, grammar-based tasks, as well as large vocabulary, unconstrained-resource tasks, to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.


Speech Communication | 2002

Automatic transcription of Broadcast News

Scott Saobing Chen; Ellen Eide; Mark J. F. Gales; Ramesh A. Gopinath; Dimitri Kanevsky; Peder A. Olsen

This paper describes the IBM approach to Broadcast News (BN) transcription. Typical problems in the BN transcription task are segmentation, clustering, acoustic modeling, language modeling and acoustic model adaptation. This paper presents new algorithms for each of these focus problems. Some key ideas include the Bayesian information criterion (BIC) (for segmentation, clustering and acoustic modeling) and speaker/cluster adapted training (SAT/CAT).
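The BIC idea for segmentation can be illustrated with a toy change-point detector: model a stretch of features with one Gaussian versus two, and accept a split when the likelihood gain beats the parameter-count penalty. The 1-D Gaussian model, penalty weight, and data below are illustrative assumptions, not features of the IBM system:

```python
import math

def gauss_loglik(xs):
    # Maximized log-likelihood of a single 1-D Gaussian fit to xs.
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n or 1e-12
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def bic_delta(xs, t, lam=1.0):
    # BIC score for a change point at index t: two Gaussians vs one,
    # penalized by the extra parameters (mean + variance = 2).
    n = len(xs)
    return (gauss_loglik(xs[:t]) + gauss_loglik(xs[t:]) - gauss_loglik(xs)
            - lam * 0.5 * 2 * math.log(n))

# Toy feature stream with an abrupt jump after the fourth frame.
xs = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
best_t = max(range(2, len(xs) - 1), key=lambda t: bic_delta(xs, t))
print(best_t)  # -> 4
```

A positive `bic_delta` at the best split is the acceptance criterion; sliding this test over the stream yields a simple segmenter.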


International Conference on Acoustics, Speech, and Signal Processing | 2002

Modeling inverse covariance matrices by basis expansion

Peder A. Olsen; Ramesh A. Gopinath

This paper proposes a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., Σ_j⁻¹ = P_j = ∑_{k=1}^D λ_k^j a_k a_kᵀ, with λ_k^j ∈ ℝ and a_k ∈ ℝ^d. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set {a_k a_kᵀ}_{k=1}^D and the expansion coefficients {λ_k^j}. This model, called the extended maximum likelihood linear transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from D = d to D = d(d+1)/2 one gradually moves from a maximum likelihood linear transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model and 30% over a standard MLLT model.
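The expansion itself is just a weighted sum of rank-1 outer products. A toy sketch of assembling one Gaussian's precision matrix; the basis directions, coefficients, and dimension d = 2 are made-up values for illustration:

```python
def precision_from_basis(lams, basis):
    # P_j = sum_k lambda_k^j * a_k a_k^T  (rank-1 basis expansion).
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, a in zip(lams, basis):
        for r in range(d):
            for c in range(d):
                P[r][c] += lam * a[r] * a[c]
    return P

# d = 2, so D = d(d+1)/2 = 3 basis elements already give full-covariance capacity.
basis = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
lams = [2.0, 3.0, 0.5]   # expansion coefficients for one Gaussian j
P = precision_from_basis(lams, basis)
print(P)  # -> [[2.5, 0.5], [0.5, 3.5]], symmetric by construction
```

With only D = d elements restricted to shared directions this reduces to the MLLT case; growing D toward d(d+1)/2 interpolates toward a full precision matrix, which is the flexibility the abstract describes.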


international symposium on circuits and systems | 2000

Maximum entropy and maximum likelihood criteria for feature selection from multivariate data

Sankar Basu; Charles A. Micchelli; Peder A. Olsen

We discuss several numerical methods for optimum feature selection for multivariate data based on maximum entropy and maximum likelihood criteria. Our point of view is to consider observed data x¹, x², …, x^N in ℝ^d to be samples from some unknown pdf P. We project this data onto d directions, subsequently estimate the pdf of the univariate data, then find the maximum entropy (or likelihood) of all multivariate pdfs in ℝ^d with marginals in these directions prescribed by the estimated univariate pdfs, and finally maximize the entropy (or likelihood) further over the choice of these directions. This strategy for optimal feature selection depends on the method used to estimate the univariate pdfs.
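The first two steps, projecting onto a direction and estimating the entropy of the resulting univariate data, can be sketched with a histogram plug-in estimator. This is a 2-D toy with a single direction parameterized by an angle; the bin count, data, and grid search are illustrative assumptions, not the paper's estimators:

```python
import math
import random

def project(data, theta):
    # Project 2-D points onto the unit direction (cos theta, sin theta).
    c, s = math.cos(theta), math.sin(theta)
    return [c * x + s * y for x, y in data]

def hist_entropy(vals, bins=20):
    # Plug-in differential entropy (nats) of a histogram density estimate.
    lo, hi = min(vals), max(vals)
    w = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in vals:
        counts[min(int((v - lo) / w), bins - 1)] += 1
    n = len(vals)
    return -sum(c / n * math.log(c / (n * w)) for c in counts if c)

# Anisotropic toy data: std 1 along x, std 3 along y.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 3)) for _ in range(2000)]

# Grid search over theta in [0, pi) for the maximum-entropy direction.
best = max((hist_entropy(project(data, t / 100)), t / 100) for t in range(315))
print(best)
```

For this data the entropy-maximizing direction lands near θ = π/2, the high-variance axis, which matches the intuition that marginal entropy tracks spread.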


International Conference on Acoustics, Speech, and Signal Processing | 1999

Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news

Scott Saobing Chen; Ellen Eide; Mark J. F. Gales; Ramesh A. Gopinath; Dimitri Kanevsky; Peder A. Olsen

We describe extensions and improvements to IBM's system for automatic transcription of broadcast news. The speech recognizer uses a total of 160 hours of acoustic training data, 80 hours more than for the system described in Chen et al. (1998). In addition to improvements obtained in 1997 we made a number of changes and algorithmic enhancements. Among these were changing the acoustic vocabulary, reducing the number of phonemes, insertion of short pauses, mixture models consisting of non-Gaussian components, pronunciation networks, factor analysis (FACILT) and Bayesian information criteria (BIC) applied to choosing the number of components in a Gaussian mixture model. The models were combined in a single system using NIST's script voting machine known as ROVER (Fiscus, 1997).


International Conference on Acoustics, Speech, and Signal Processing | 2006

Dynamic Noise Adaptation

Steven J. Rennie; Trausti T. Kristjansson; Peder A. Olsen; Ramesh A. Gopinath

We consider the problem of robust speech recognition in the car environment. We present a new dynamic noise adaptation algorithm, called DNA, for the robust front-end compensation of evolving semi-stationary noise as typically encountered in the car setting. A large dataset of in-car noise was collected for the evaluation of the new algorithm. This dataset was combined with the Aurora II framework to produce a new, publicly available framework, called DNA + AURORA II, for the evaluation of adaptive noise compensation algorithms. We show that DNA consistently outperforms several existing, related state-of-the-art front-end denoising techniques.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Efficient model-based speech separation and denoising using non-negative subspace analysis

Steven J. Rennie; John R. Hershey; Peder A. Olsen

We present a new probabilistic architecture for analyzing composite non-negative data, called Non-negative Subspace Analysis (NSA). The NSA model provides a framework for understanding the relationships between sparse subspace and mixture model based approaches, and encompasses a range of models, including Sparse Non-negative Matrix Factorization (SNMF) [1] and mixture-model based analysis as special cases. We present a convenient instantiation of the NSA model, and an efficient variational approximate learning and inference algorithm that combines the advantages of SNMF and mixture model-based approaches. Preliminary recognition results on the Pascal Speech Separation Challenge 2006 test set [2], based on NSA separation results, are presented. The results fall short of those achieved by Algonquin [3], a state-of-the-art mixture-model based method, but considering that NSA runs an order of magnitude faster, the results are impressive. NSA outperforms SNMF in terms of word error rate (WER) on the task by a significant margin of over 9% absolute.
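For context, the SNMF special case mentioned above can be sketched with the standard multiplicative updates for V ≈ WH with an L1 penalty on the activations H. This is the textbook formulation, not necessarily the paper's exact instantiation, and the toy matrix, rank, and penalty weight are illustrative assumptions:

```python
import random

def matmul(A, B):
    # Plain-Python matrix product (rows of A against columns of B).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def snmf(V, rank, lam=0.01, iters=500, seed=0):
    # Sparse NMF via multiplicative updates: V ~ W @ H, entries kept non-negative.
    # The L1 penalty lam on H enters as an extra term in the H-update denominator.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    eps = 1e-9
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + lam + eps)
              for j in range(m)] for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(rank)] for i in range(n)]
    return W, H

V = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [2.0, 0.0, 4.0]]   # toy non-negative "spectrogram"
W, H = snmf(V, rank=2)
R = matmul(W, H)
```

Because the updates are multiplicative, non-negativity is preserved automatically, which is the property NSA generalizes with its subspace-plus-mixture structure.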
