Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Pierre L. Dognin is active.

Publication


Featured research published by Pierre L. Dognin.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages

Xiaodong Cui; Jian Xue; Xin Chen; Peder A. Olsen; Pierre L. Dognin; Upendra V. Chaudhari; John R. Hershey; Bowen Zhou

This paper proposes an acoustic modeling approach based on bootstrap and restructuring to deal with data sparsity for low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) in the context of speed, memory and response latency requirements for real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from the bootstrapped training data are aggregated for reliable sequence prediction. The aggregation leads to an HMM with superior prediction capability at the cost of a substantially larger size. For practical usage, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims at reducing the aggregated HMM to a desirable model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms have been investigated in the full covariance model space before the conversion to the diagonal covariance model space in the last stage of the restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari have shown that acoustic models obtained by the proposed approach can yield superior performance over the conventional training procedure with almost the same run-time memory consumption and decoding speed.
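The bootstrap-and-aggregate idea can be illustrated at a toy scale. The sketch below uses hypothetical 1-D data and single-Gaussian "models" rather than the paper's HMM pipeline: it resamples sparse training data with replacement, fits one model per bootstrap replicate, and aggregates the replicates into a uniform mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(x):
    # Fit a 1-D Gaussian (mean, variance) by maximum likelihood.
    return x.mean(), x.var() + 1e-6

def log_likelihood(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Sparse training data for one model (hypothetical numbers).
data = rng.normal(loc=1.0, scale=2.0, size=30)

# Bootstrap: resample with replacement, fit one model per replicate.
n_boot = 20
models = []
for _ in range(n_boot):
    sample = rng.choice(data, size=data.size, replace=True)
    models.append(fit_gaussian(sample))

# Aggregate: average the per-model likelihoods, i.e. a mixture with
# uniform weights. This is the step that grows the model size, which
# the paper's restructuring stage then reduces again.
def aggregated_log_likelihood(x):
    lls = np.array([log_likelihood(x, m, v) for m, v in models])
    return np.logaddexp.reduce(lls, axis=0) - np.log(n_boot)

print(aggregated_log_likelihood(np.array([0.0, 1.0, 2.0])))
```

The restructuring stage described in the abstract would then cluster and merge the aggregated components back down to a practical size.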


International Conference on Acoustics, Speech, and Signal Processing | 2009

Refactoring acoustic models using variational density approximation

Pierre L. Dognin; John R. Hershey; Vaibhava Goel; Peder A. Olsen

In model-based pattern recognition it is often useful to change the structure, or refactor, a model. For example, we may wish to find a Gaussian mixture model (GMM) with fewer components that best approximates a reference model. One application for this arises in speech recognition, where a variety of model size requirements exists for different platforms. Since the target size may not be known a priori, one strategy is to train a complex model and subsequently derive models of lower complexity. We present methods for reducing model size without training data, following two strategies: GMM-approximation and Gaussian clustering based on divergences. A variational expectation-maximization algorithm is derived that unifies these two approaches. The resulting algorithms reduce the model size by 50% with less than 4% increase in error rate relative to the same-sized model trained on data. In fact, for up to 35% reduction in size, the algorithms can improve accuracy relative to baseline.
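As a rough illustration of divergence-based Gaussian clustering, the sketch below greedily merges the pair of 1-D components with the smallest symmetric KL divergence, using moment matching for the merge. The parameters are hypothetical, and the paper's variational EM formulation is more general than this greedy loop.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    # KL(N(m1,v1) || N(m2,v2)) for 1-D Gaussians.
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def merge(w1, m1, v1, w2, m2, v2):
    # Moment-matched merge of two weighted Gaussian components.
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v

# Reference GMM: (weight, mean, variance) triples (hypothetical numbers).
comps = [(0.3, 0.0, 1.0), (0.3, 0.2, 1.1), (0.4, 5.0, 0.5)]

# Greedy clustering: repeatedly merge the pair with smallest symmetric KL.
target_size = 2
while len(comps) > target_size:
    best = None
    for i in range(len(comps)):
        for j in range(i + 1, len(comps)):
            _, mi, vi = comps[i]
            _, mj, vj = comps[j]
            d = kl_gauss(mi, vi, mj, vj) + kl_gauss(mj, vj, mi, vi)
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    merged = merge(*comps[i], *comps[j])
    comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]

print(comps)  # the two heavily overlapping components get merged first
```

No training data is used anywhere above, which mirrors the abstract's point that the reduction can run as a post-processing step on the model alone.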


International Conference on Acoustics, Speech, and Signal Processing | 2012

Factorial Hidden Restricted Boltzmann Machines for noise robust speech recognition

Steven J. Rennie; Petr Fousek; Pierre L. Dognin

We present the Factorial Hidden Restricted Boltzmann Machine (FHRBM) for robust speech recognition. Speech and noise are modeled as independent RBMs, and the interaction between them is explicitly modeled to capture how speech and noise combine to generate observed noisy speech features. In contrast with RBMs, where the bottom layer of random variables is observed, inference in the FHRBM is intractable, scaling exponentially with the number of hidden units. We introduce variational algorithms for efficient approximate inference that scale linearly with the number of hidden units. Compared to traditional factorial models of noisy speech, which are based on GMMs, the FHRBM has the advantage that the representations of both speech and noise are highly distributed, allowing the model to learn a parts-based representation of noisy speech data that can generalize better to previously unseen noise compositions. Preliminary results suggest that the approach is promising.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Robust speech recognition using dynamic noise adaptation

Steven J. Rennie; Pierre L. Dognin; Petr Fousek

Dynamic noise adaptation (DNA) [1, 2] is a model-based technique for improving automatic speech recognition (ASR) performance in noise. DNA has shown promise on artificially mixed data such as the Aurora II and DNA+Aurora II tasks [1], significantly outperforming well-known techniques like the ETSI AFE and fMLLR [2], but has never been tried on real data. In this paper, we present new results generated by commercial-grade ASR systems trained on large amounts of data. We show that DNA improves upon the performance of the spectral subtraction (SS) and stochastic fMLLR algorithms of our embedded recognizers, particularly in unseen noise conditions, and describe how DNA has been evolved to become suitable for deployment in low-latency ASR systems. DNA improves our best embedded system, which utilizes SS, fMLLR, and fMPE [3], by over 22% relative at SNRs below 6 dB, reducing the word error rate in these adverse conditions from 4.24% to 3.29%.
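Model-based noise compensation techniques in this family typically assume that speech and noise add in the power-spectral domain, so their log-power features combine as a log-add. The snippet below shows only that generic interaction function, with hypothetical feature values; it is not DNA's full inference procedure.

```python
import numpy as np

def noisy_log_power(x, n):
    # If |Y|^2 ~= |X|^2 + |N|^2 in the power domain, then log-power
    # features combine as a numerically stable log-add.
    return np.logaddexp(x, n)

x = np.array([1.0, 3.0, 5.0])   # clean speech log-power (hypothetical)
n = np.array([2.0, 2.0, 2.0])   # noise log-power (hypothetical)
print(noisy_log_power(x, n))    # dominated by whichever source is louder
```

The observed feature is always at least as large as the louder of the two sources, which is what lets a model-based method infer the hidden speech when the noise estimate is good.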


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Combining stochastic average gradient and Hessian-free optimization for sequence training of deep neural networks

Pierre L. Dognin; Vaibhava Goel

Minimum phone error (MPE) training of deep neural networks (DNNs) is an effective technique for reducing the word error rate of automatic speech recognition tasks. This training is often carried out using a Hessian-free (HF) quasi-Newton approach, although other methods such as stochastic gradient descent have also been applied successfully. In this paper we present a novel stochastic approach to HF sequence training inspired by the recently proposed stochastic average gradient (SAG) method. SAG reuses gradient information from past updates, and consequently simulates the presence of more training data than is really observed for each model update. We extend SAG by dynamically weighting the contribution of previous gradients, and by combining it with a stochastic HF optimization. We term the resulting procedure DSAG-HF. Experimental results for training DNNs on 1500 h of audio data show that, compared to baseline HF training, DSAG-HF leads to a better held-out MPE loss after each model parameter update, and converges to an overall better loss value. Furthermore, since each update in DSAG-HF takes place over a smaller amount of data, this procedure converges in about half the time of baseline HF sequence training.
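Plain SAG (without the paper's dynamic weighting or the HF combination) can be sketched on a toy least-squares problem: a per-example gradient memory is kept, and each update swaps in one example's fresh gradient and steps along the running average of all stored gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize (1/n) * sum_i 0.5 * (a_i . w - b_i)^2
# (hypothetical data; any smooth finite sum works for SAG).
A = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true

def grad_i(w, i):
    # Gradient of the i-th example's loss.
    return A[i] * (A[i] @ w - b[i])

n = A.shape[0]
w = np.zeros(3)
mem = np.zeros((n, 3))   # last-seen gradient for each example
g_avg = np.zeros(3)      # running average of the stored gradients
lr = 0.005

for step in range(20000):
    i = rng.integers(n)
    g_new = grad_i(w, i)
    g_avg += (g_new - mem[i]) / n   # replace example i's contribution
    mem[i] = g_new
    w -= lr * g_avg                 # step uses ALL remembered gradients

print(w)  # approaches w_true
```

Reusing the stale gradients is what "simulates the presence of more training data" per update: each step is informed by every example, even though only one gradient was recomputed.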


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Matched-condition robust Dynamic Noise Adaptation

Steven J. Rennie; Pierre L. Dognin; Petr Fousek

In this paper we describe how Dynamic Noise Adaptation (DNA), a model-based noise robustness algorithm for previously unseen noise conditions, can be made robust to matched data without the need for any system re-training. The approach is to do online model selection and averaging between two DNA models of noise: one that is tracking the evolving state of the background noise, and one clamped to the null mismatch hypothesis. The approach, which we call DNA with (matched) condition detection (DNA-CD), improves the performance of a commercial-grade speech recognizer that utilizes feature-space Maximum Mutual Information (fMMI), boosted MMI (bMMI), and feature-space Maximum Likelihood Linear Regression (fMLLR) compensation by 15% relative at signal-to-noise ratios (SNRs) below 10 dB, and by over 8% relative overall.
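The selection/averaging step can be caricatured as posterior weighting of the two hypotheses by their accumulated evidence. The log-likelihood numbers below are hypothetical, and this is only the weighting arithmetic, not the paper's implementation.

```python
import numpy as np

def posterior_weights(loglik_track, loglik_null):
    # Normalized posterior over two models given accumulated log likelihoods,
    # computed with a max-shift for numerical stability.
    m = max(loglik_track, loglik_null)
    p_track = np.exp(loglik_track - m)
    p_null = np.exp(loglik_null - m)
    z = p_track + p_null
    return p_track / z, p_null / z

# Accumulated frame log likelihoods under each noise model (hypothetical).
w_track, w_null = posterior_weights(-120.0, -118.0)
print(w_track, w_null)  # the matched (null-mismatch) model dominates

# A final estimate would average the two models' outputs with these weights.
```

When the data really are matched, the clamped model wins the weighting and the tracker's compensation is effectively switched off, which is how matched-condition robustness is obtained without retraining.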


International Conference on Acoustics, Speech, and Signal Processing | 2015

Annealed dropout trained maxout networks for improved LVCSR

Steven J. Rennie; Pierre L. Dognin; Xiaodong Cui; Vaibhava Goel

A significant barrier to progress in automatic speech recognition (ASR) capability is the empirical reality that techniques rarely “scale”: the yield of many apparently fruitful techniques rapidly diminishes to zero as the training criterion or decoder is strengthened, or the size of the training set is increased. Recently we showed that annealed dropout, a regularization procedure which gradually reduces the percentage of neurons that are randomly zeroed out during DNN training, leads to substantial word error rate reductions in the case of small to moderate training data amounts and acoustic models trained with the cross-entropy (CE) criterion [1]. In this paper we show that deep Maxout networks trained using annealed dropout can substantially improve the quality of commercial-grade LVCSR systems even when the acoustic model is trained with a sequence-level training criterion and on large amounts of data.
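Annealed dropout itself reduces to a schedule on the dropout probability. Below is a minimal sketch assuming a linear anneal from an initial rate `p0` down to zero; the exact schedule and constants are illustrative, not prescribed by the paper.

```python
import numpy as np

def annealed_dropout_rate(epoch, n_epochs, p0=0.5):
    # Linearly anneal the dropout probability from p0 down to 0
    # over the course of training.
    return max(0.0, p0 * (1.0 - epoch / n_epochs))

def apply_dropout(activations, p, rng):
    # Zero units with probability p; inverted scaling keeps the
    # expected activation unchanged.
    if p <= 0.0:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones(10000)
for epoch in [0, 5, 10]:
    p = annealed_dropout_rate(epoch, n_epochs=10)
    out = apply_dropout(h, p, rng)
    print(epoch, p, out.mean())  # mean stays near 1.0 at every epoch
```

By the final epochs the network trains with no dropout at all, so the regularization pressure is strongest early and fades as the model converges.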


International Conference on Acoustics, Speech, and Signal Processing | 2013

Direct product based deep belief networks for automatic speech recognition

Petr Fousek; Steven J. Rennie; Pierre L. Dognin; Vaibhava Goel

In this paper, we present new methods for parameterizing the connections of neural networks using sums of direct products. We show that low-rank parameterizations of weight matrices are a subset of this set, and explore the theoretical and practical benefits of representing weight matrices using sums of Kronecker products. ASR results on a 50-hour subset of the English Broadcast News corpus indicate that the approach is promising. In particular, we show that a factorial network with more than 150 times fewer parameters in its bottom layer than its standard unconstrained counterpart suffers minimal WER degradation, and that by using sums of Kronecker products, we can close the gap in WER performance while maintaining very significant parameter savings. In addition, direct product DBNs consistently outperform standard DBNs with the same number of parameters. These results have important implications for research on deep belief networks (DBNs). They imply that we should be able to train neural networks with thousands of neurons and minimal restrictions much more rapidly than is currently possible, and that by using sums of direct products, it will be possible to train neural networks with literally millions of neurons tractably, an exciting prospect.
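The parameter savings are easy to see in a toy sketch: a 64×64 weight matrix built as a sum of Kronecker products of 8×8 factors (hypothetical sizes, not the paper's layer dimensions). The factored form also admits a matrix-vector product that never materializes the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameterize a 64x64 weight matrix as W = sum_k A_k (x) B_k,
# a sum of Kronecker products of 8x8 factors.
n_terms, f = 3, 8
A = rng.normal(size=(n_terms, f, f))
B = rng.normal(size=(n_terms, f, f))

W = sum(np.kron(A[k], B[k]) for k in range(n_terms))

full_params = W.size               # 4096 entries in the dense matrix
factored_params = A.size + B.size  # 384 entries in the factors
print(full_params, factored_params)

# The product W @ x can use the factors directly: with row-major vec,
# (A (x) B) x = vec(A X B^T) where X is x reshaped to (f, f).
x = rng.normal(size=f * f)
X = x.reshape(f, f)
y_fact = sum((A[k] @ X @ B[k].T).reshape(-1) for k in range(n_terms))
y_full = W @ x
print(np.allclose(y_fact, y_full))  # True
```

The factored product costs O(n_terms · f³) instead of O(f⁴), which is the practical payoff alongside the smaller parameter count.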


International Conference on Acoustics, Speech, and Signal Processing | 2009

A fast, accurate approximation to log likelihood of Gaussian mixture models

Pierre L. Dognin; Vaibhava Goel; John R. Hershey; Peder A. Olsen

It has been a common practice in speech recognition and elsewhere to approximate the log likelihood of a Gaussian mixture model (GMM) with the maximum component log likelihood. While often a computational necessity, the max approximation comes at the price of inferior modeling when the Gaussian components significantly overlap. This paper shows how the approximation error can be reduced by changing the component priors. In our experiments the loss in word error rate due to the max approximation, albeit small, is reduced by 50–100% at no cost in computational efficiency. Furthermore, we expect acoustic models to become larger over time, increasing component overlap and word error rate loss, which makes reducing the approximation error more relevant. The techniques considered do not use the original data and can easily be applied as a post-processing step to any GMM.
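The gap between the exact GMM log likelihood (a log-sum-exp over components) and the max-component approximation can be seen in a small sketch with overlapping 1-D components. The parameters are hypothetical, and the paper's prior-adjustment correction is not shown; only the approximation error it targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 1-D GMM with strongly overlapping components (hypothetical).
weights = np.array([0.5, 0.5])
means = np.array([0.0, 1.0])
vars_ = np.array([1.0, 1.0])

def component_log_likelihoods(x):
    # log w_k + log N(x; mu_k, var_k), shape (K, len(x)).
    return (np.log(weights)[:, None]
            - 0.5 * np.log(2 * np.pi * vars_)[:, None]
            - 0.5 * (x[None, :] - means[:, None]) ** 2 / vars_[:, None])

x = rng.normal(size=1000)
lls = component_log_likelihoods(x)

exact = np.logaddexp.reduce(lls, axis=0)   # true GMM log likelihood
approx = lls.max(axis=0)                   # max-component approximation

# The max approximation always underestimates the likelihood, and
# the gap is largest where components overlap.
gap = exact - approx
print(gap.mean(), gap.min())
```

Since log-sum-exp is always at least the max, the gap is non-negative everywhere; adjusting the component priors, as the abstract describes, shrinks this gap without touching the Gaussians themselves.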


International Conference on Acoustics, Speech, and Signal Processing | 2015

Evaluating Deep Scattering Spectra with deep neural networks on large scale spontaneous speech task

Petr Fousek; Pierre L. Dognin; Vaibhava Goel

Deep Scattering Network features, introduced for image processing, have recently proved useful in speech recognition as an alternative to log-mel features for Deep Neural Network (DNN) acoustic models. Scattering features use wavelet decomposition, directly producing log-frequency spectrograms which are robust to local time warping and provide additional information within higher-order coefficients. This paper extends previous work by showing how scattering features perform on a state-of-the-art spontaneous speech recognition task utilizing a DNN acoustic model. We revisit feature normalization and compression topics in an extensive study, putting emphasis on comparing models of the same size. We observe that scattering features outperform the baseline log-mel features in all conditions, with additional gains from multi-resolution processing.

Collaboration


Dive into Pierre L. Dognin's collaborations.
