Magne Hallstein Johnsen
Norwegian University of Science and Technology
Publications
Featured research published by Magne Hallstein Johnsen.
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Øystein Birkenes; Tomoko Matsui; Kunio Tanabe; Sabato Marco Siniscalchi; Tor Andre Myrvoll; Magne Hallstein Johnsen
Hidden Markov models (HMMs) are powerful generative models for sequential data that have been used in automatic speech recognition for more than two decades. Despite their popularity, HMMs make inaccurate assumptions about speech signals, thereby limiting the achievable performance of the conventional speech recognizer. Penalized logistic regression (PLR) is a well-founded discriminative classifier with long roots in the history of statistics. Its classification performance is often compared with that of the popular support vector machine (SVM). However, for speech classification, only limited success with PLR has been reported, partially due to the difficulty with sequential data. In this paper, we present an elegant way of incorporating HMMs in the PLR framework. This leads to a powerful discriminative classifier that naturally handles sequential data. In this approach, speech classification is done using affine combinations of HMM log-likelihoods. We believe that such combinations of HMMs lead to a more accurate classifier than the conventional HMM-based classifier. Unlike similar approaches, we jointly estimate the HMM parameters and the PLR parameters using a single training criterion. The extension to continuous speech recognition is done via rescoring of N-best lists or lattices.
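The core idea of the abstract above, scoring classes with affine combinations of HMM log-likelihoods instead of a single per-class log-likelihood, can be sketched as follows. The weight matrix, biases, and log-likelihood values are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

def plr_scores(loglik, A, b):
    """Class posteriors from affine combinations of HMM log-likelihoods.

    loglik : (M,) log-likelihoods from M HMMs for one utterance
    A      : (C, M) combination weights, one row per class
    b      : (C,) class biases
    """
    z = A @ loglik + b                 # affine combination per class
    z -= z.max()                       # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum()    # posterior class probabilities
    return p

# Toy example with 3 HMMs and 3 classes: identity weights and zero
# biases reduce to the conventional arg-max log-likelihood classifier.
loglik = np.array([-12.0, -10.5, -15.2])
p = plr_scores(loglik, np.eye(3), np.zeros(3))
print(p.argmax())  # -> 1, the class with the highest combined score
```

In the paper's approach, `A` and `b` are learned jointly with the HMM parameters under the penalized logistic regression criterion, rather than fixed as here.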
international conference on acoustics, speech, and signal processing | 1994
Finn Tore Johansen; Magne Hallstein Johnsen
This paper deals with speaker-independent continuous speech recognition. Our approach is based on continuous density hidden Markov models with a non-linear input feature transformation performed by a multilayer perceptron. We discuss various optimisation criteria and provide results on a TIMIT phoneme recognition task, using single frame (mutual information or relative entropy) MMI embedded in Viterbi training, and a global MMI criterion. As expected, global MMI is found superior to the frame-based criterion for continuous recognition. We further observe that optimal sentence decoding is essential to achieve maximum recognition rate for models trained by global MMI. Finally, we find that the simple MLP input transformation, with five frames of context information, can increase the recognition rate significantly compared to just using delta parameters.
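The global MMI criterion referred to above is, per utterance, the log posterior of the reference transcription against a set of competing hypotheses. A minimal numerical sketch, with invented log joint scores standing in for `log p(X|W) + log P(W)`:

```python
import numpy as np

def global_mmi(log_joint_ref, log_joint_all):
    """Global MMI objective for one utterance.

    log_joint_ref : log p(X | W_ref) + log P(W_ref) for the reference
    log_joint_all : log joints over competing hypotheses, including
                    the reference itself
    Returns the log posterior of the reference transcription.
    """
    # log-sum-exp over all hypotheses gives the denominator term
    m = log_joint_all.max()
    log_denom = m + np.log(np.exp(log_joint_all - m).sum())
    return log_joint_ref - log_denom

hyps = np.array([-100.0, -102.3, -105.0])   # invented log joints
print(global_mmi(hyps[0], hyps))            # close to 0: reference dominates
```

Maximizing this quantity over the training set (rather than a per-frame posterior) is what distinguishes the global criterion from the frame-based MMI variant in the abstract.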
international conference on acoustics, speech, and signal processing | 2007
Svein Gunnar Pettersen; Magne Hallstein Johnsen; Christian Wellekens
Many feature enhancement methods make use of probabilistic models of speech and noise in order to improve the performance of speech recognizers in the presence of background noise. The traditional approach for training such models is maximum likelihood estimation. This paper investigates the novel application of variational Bayesian learning for front-end models under the Algonquin denoising framework. Compared to maximum likelihood training, variational Bayesian learning is shown to offer both increased robustness with respect to the choice of model complexity and increased recognition performance.
international workshop on machine learning for signal processing | 2012
Magne Hallstein Johnsen; Alfonso M. Canterla
This paper presents methods and results for joint optimization of the feature extraction and the model parameters of a detector. We further define a discriminative training criterion called Minimum Detection Error (MDE). The criterion can optimize the F-score or any other detection performance metric. The methods are used to design detectors of subwords in continuous speech, i.e. to spot phones and articulatory features. For each subword detector the MFCC filterbank matrix and the Gaussian means in the HMM models are jointly optimized. For experiments on TIMIT, the optimized detectors clearly outperform the baseline detectors and also our previous MCE-based detectors. The results indicate that the same performance metric should be used for training and testing, and that accuracy outperforms F-score with respect to relative improvement. Further, the optimized filterbanks usually reflect typical acoustic properties of the corresponding detection classes.
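The F-score named as one target of the MDE criterion is a standard function of detection counts. A small self-contained sketch, with illustrative counts rather than results from the paper:

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-measure from detection counts; beta=1 gives the F1 score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# illustrative counts for one phone detector
print(f_score(tp=80, fp=10, fn=20))  # -> 0.8421... (16/19)
```

MDE makes a smoothed version of such a metric differentiable so that the filterbank matrix and HMM means can be optimized by gradient methods.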
ieee automatic speech recognition and understanding workshop | 2011
Alfonso M. Canterla; Magne Hallstein Johnsen
This paper presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful within areas like detection-based ASR, pronunciation training, phonetic analysis, word spotting, etc. We propose a new discriminative training criterion for subword unit detectors that is based on the Minimum Phone Error framework. The criterion can optimize the F-score or any other detection performance metric. The method is applied to the optimization of HMMs and MFCC filterbanks in phone detectors. The resulting filterbanks differ from each other and reflect acoustic properties of the corresponding detection classes. In experiments on TIMIT, the best optimized detectors achieved a relative accuracy improvement of 31.3% over the baseline and 18.2% over our previous MCE-based method.
nordic signal processing symposium | 2006
Ingunn Amdal; Magne Hallstein Johnsen; Torbjørn Svendsen
Accurate labeling and segmentation of the unit inventory database is of vital importance to the quality of unit selection text-to-speech synthesis. Misalignments and mismatches between the predicted and pronounced unit sequences require manual correction to achieve natural sounding synthesis. In this paper we have used a log likelihood ratio based utterance verification to automatically detect annotation errors in a Norwegian two-speaker synthesis database. Each sentence is assigned a confidence score, and those falling below a threshold can be discarded or manually inspected and corrected. Using equal reject number as a criterion, the transcription sentence error rate was reduced from 9.8% to 2.7%. Insertions are the largest error category, and 95.6% of these were detected. A closer inspection of false rejections was performed to assess (and improve) the phoneme prediction system.
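The verification step described above amounts to thresholding a log likelihood ratio per sentence. A minimal sketch, assuming per-frame normalization of the ratio; the scores, frame count, and threshold below are made up for illustration:

```python
def verify(log_lik_forced, log_lik_free, n_frames, threshold):
    """Utterance verification by a normalized log likelihood ratio.

    log_lik_forced : log-likelihood of the forced alignment to the
                     predicted transcription
    log_lik_free   : log-likelihood of an unconstrained (free) phone
                     recognition pass on the same audio
    Sentences whose per-frame ratio falls below the threshold are
    flagged for inspection or discarded.
    """
    llr = (log_lik_forced - log_lik_free) / n_frames
    return llr >= threshold

# A sentence whose forced alignment scores much worse than the free
# recognition pass is flagged as a likely annotation error.
print(verify(-5200.0, -5000.0, 400, threshold=-0.3))  # -> False (reject)
```

In the paper the threshold is chosen by the equal reject number criterion; here it is simply a fixed illustrative value.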
international conference on acoustics, speech, and signal processing | 2001
Narada D. Warakagoda; Magne Hallstein Johnsen
The work presented is centered around a speech production model called the chained dynamical system model (CDSM), which is motivated by the fundamental limitations of the mainstream ASR approaches. The CDSM is essentially a smoothly time-varying, continuous-state nonlinear dynamical system consisting of two sub-dynamical systems coupled as a chain, so that one system controls the parameters of the next. The speech recognition problem is posed as inverting the CDSM, for which we propose a solution based on the theory of embedding. The resulting architecture, which we call the inverted CDSM (ICDSM), is evaluated in a set of experiments involving a speaker-independent, continuous speech recognition task on the TIMIT database. Results of these experiments, which can be compared with the corresponding results in the literature, confirm the feasibility and advantages of the approach.
Journal of the Acoustical Society of America | 2010
Øystein Birkenes; Tomoko Matsui; Kunio Tanabe; Magne Hallstein Johnsen
Penalized logistic regression (PLR) is a well‐founded discriminative classifier with long roots in the history of statistics. Speech classification with PLR is possible with an appropriate choice of map from the space of feature vector sequences into the Euclidean space. In this talk, one such map is presented, namely, the one that maps into vectors consisting of log‐likelihoods computed from a set of hidden Markov models (HMMs). The use of this map in PLR leads to a powerful discriminative classifier that naturally handles the sequential data arising in speech classification. In the training phase, the HMM parameters and the regression parameters are jointly estimated by maximizing a penalized likelihood. The proposed approach is shown to be a generalization of conditional maximum likelihood (CML) and maximum mutual information (MMI) estimation for speech classification, leading to more flexible decision boundaries and higher classification accuracy. The posterior probabilities resulting from classificat...
conference of the international speech communication association | 2000
Magne Hallstein Johnsen; Torbjørn Svendsen; Tore Amble; Trym Holter; Erik Harborg
conference of the international speech communication association | 2000
Trym Holter; Erik Harborg; Magne Hallstein Johnsen; Torbjørn Svendsen