Klaus Reinhard
University of Cambridge
Publication
Featured research published by Klaus Reinhard.
Speech Communication | 1999
Klaus Reinhard; Mahesan Niranjan
This paper describes an attempt to capture segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. Approaches such as recurrent neural networks hold out the hope, but not a convincing demonstration, that such transitional information can be captured. The method presented here starts from a very different position: it explicitly captures the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved. This was achieved by introducing a temporal constraint into the well-known technique of Principal Component Analysis (PCA). On this subspace, the trajectory was modelled parametrically, and a distance metric was computed to classify diphones. The Principal Curves method of Hastie and Stuetzle and the Generative Topographic Mapping (GTM) of Bishop, Svensen and Williams were also used to describe the temporal evolution in terms of latent variables. On the difficult /bee/, /dee/, /gee/ problem, discriminatory information was retained with a small number of parameters. Experimental results are presented on the ISOLET and TIMIT databases.
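The abstract does not spell out the time-constrained PCA itself, so the sketch below uses ordinary PCA as a stand-in to illustrate the general idea of projecting a sequence of spectral frames onto a low-dimensional subspace and classifying the resulting trajectory. Everything here is an assumption for illustration: the helper names (`fit_pca`, `project`, `resample`, `make_template`, `classify`) are hypothetical, the temporal constraint is omitted, and nearest-template matching with a mean squared distance after linear resampling is one simple choice of "distance metric", not necessarily the one used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_pca(frames: Sequence[Vector], k: int, iters: int = 500) -> Tuple[Vector, List[Vector]]:
    """Pooled mean and k leading principal directions (plain PCA, power iteration + deflation)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    centred = [[f[d] - mean[d] for d in range(dim)] for f in frames]
    cov = [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    directions: List[Vector] = []
    for _ in range(k):
        v = [1.0 / math.sqrt(dim)] * dim
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged direction.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim))
        # Deflate so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
        directions.append(v)
    return mean, directions


def project(utterance: Sequence[Vector], mean: Vector, directions: List[Vector]) -> List[Vector]:
    """Map each spectral frame to subspace coordinates; the result is one trajectory."""
    return [[sum((f[d] - mean[d]) * v[d] for d in range(len(f))) for v in directions]
            for f in utterance]


def resample(traj: List[Vector], m: int) -> List[Vector]:
    """Linearly interpolate a trajectory to m points (m >= 2) so lengths can be compared."""
    n = len(traj)
    if n == 1:
        return [list(traj[0]) for _ in range(m)]
    out = []
    for step in range(m):
        t = step * (n - 1) / (m - 1)
        i = min(int(t), n - 2)
        a = t - i
        out.append([(1 - a) * p + a * q for p, q in zip(traj[i], traj[i + 1])])
    return out


def make_template(utterance, mean, directions, m: int = 10) -> List[Vector]:
    return resample(project(utterance, mean, directions), m)


def trajectory_distance(a: List[Vector], b: List[Vector]) -> float:
    """Mean squared Euclidean distance between corresponding trajectory points."""
    return sum(sum((p - q) ** 2 for p, q in zip(x, y)) for x, y in zip(a, b)) / len(a)


def classify(utterance, templates: Dict[str, List[Vector]], mean, directions, m: int = 10) -> str:
    """Nearest-template decision in the projected subspace."""
    traj = make_template(utterance, mean, directions, m)
    return min(templates, key=lambda label: trajectory_distance(traj, templates[label]))
```

Because the subspace is fitted once on pooled training frames, every utterance is projected with the same mean and directions, so trajectories of different lengths remain comparable after resampling.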
international conference on acoustics, speech, and signal processing | 2006
Koichi Yamamoto; Firas Jabloun; Klaus Reinhard; Akinori Kawamura
Accurate endpoint detection is important for improving speech recognition performance. This paper proposes a novel endpoint detection method that combines energy-based and likelihood-ratio-based voice activity detection (VAD) criteria, where the likelihood ratio is calculated with speech/non-speech Gaussian mixture models (GMMs). The method also applies discriminative feature extraction (DFE) to improve speech/non-speech classification; DFE is used to train the parameters required for calculating the likelihood ratio. Experimental results show that the proposed endpointer outperforms an energy-based endpointer in start-of-speech (SOS) and end-of-speech (EOS) detection. Owing to the improved endpointer, automatic speech recognition (ASR) performance also improved.
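A minimal sketch of combining an energy criterion with a GMM likelihood-ratio criterion per frame, assuming a scalar per-frame feature, hand-set 1-D GMM parameters, and an AND rule for combining the two tests. The abstract does not state how the criteria are combined, what features are used, or how the thresholds are chosen, and DFE training is not shown; all function names and thresholds below are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# One mixture component: (weight, mean, variance) over a scalar per-frame feature.
Component = Tuple[float, float, float]


def gmm_log_likelihood(x: float, gmm: Sequence[Component]) -> float:
    """log sum_k w_k N(x; mu_k, var_k), using log-sum-exp for numerical stability."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for w, mu, var in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def frame_energy_db(samples: Sequence[float]) -> float:
    """Mean-square frame energy in dB (a small floor avoids log(0))."""
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples) + 1e-12)


def is_speech(samples, feature, speech_gmm, noise_gmm,
              energy_thr_db: float = -30.0, llr_thr: float = 0.0) -> bool:
    """Assumed AND rule: the frame must pass both the energy test and the likelihood-ratio test."""
    llr = gmm_log_likelihood(feature, speech_gmm) - gmm_log_likelihood(feature, noise_gmm)
    return frame_energy_db(samples) > energy_thr_db and llr > llr_thr


def detect_endpoints(decisions: Sequence[bool]) -> Optional[Tuple[int, int]]:
    """Start-of-speech and end-of-speech frame indices, or None if no speech frame was found."""
    speech = [i for i, d in enumerate(decisions) if d]
    if not speech:
        return None
    return speech[0], speech[-1]
```

For example, with a speech GMM centred around 5 to 7 and a noise GMM centred at 0, a loud frame whose feature looks like noise is still rejected, which is the kind of error an energy-only endpointer cannot avoid.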
international conference on acoustics, speech, and signal processing | 1998
Klaus Reinhard; Mahesan Niranjan
We report on an attempt to capture segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. Approaches such as recurrent neural networks hold out the hope, but not a convincing demonstration, that such transitional information can be captured. We start from a very different position: explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved (time-constrained principal component analysis). On this subspace, we model the trajectory parametrically and compute a distance metric to classify diphones. Much of the discriminant information is still retained in this subspace, as illustrated on the isolated transitions /bee/, /dee/ and /gee/.
international conference on acoustics, speech, and signal processing | 1999
Klaus Reinhard; Mahesan Niranjan
We report an extension of our work on capturing speech transitions embedded in diphones using trajectory models. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. We improved our method of explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. We also present an optimal subspace selection that finds the most discriminant plane for classification. Results on the E-set from the TIMIT database suggest that discriminant information is preserved in the subspace.
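The abstract does not give the criterion behind "the most discriminant plane". As an assumption, the sketch below scores every plane spanned by a pair of candidate axes (for example, principal components) with a Fisher-style ratio, trace of between-class scatter over trace of within-class scatter, and keeps the best pair by exhaustive search. The function names `fisher_ratio` and `best_plane` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def fisher_ratio(points_by_class: Dict[str, List[Tuple[float, float]]]) -> float:
    """trace(S_B) / trace(S_W) for 2-D points grouped by class label."""
    all_points = [p for pts in points_by_class.values() for p in pts]
    gx = sum(p[0] for p in all_points) / len(all_points)
    gy = sum(p[1] for p in all_points) / len(all_points)
    between = within = 0.0
    for pts in points_by_class.values():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        # Class-size-weighted spread of class means around the global mean.
        between += len(pts) * ((mx - gx) ** 2 + (my - gy) ** 2)
        # Spread of points around their own class mean.
        within += sum((x - mx) ** 2 + (y - my) ** 2 for x, y in pts)
    return between / within if within > 0 else float("inf")


def best_plane(features_by_class: Dict[str, List[Sequence[float]]]) -> Tuple[int, int]:
    """Exhaustively score every plane spanned by two candidate axes and return the best pair."""
    dim = len(next(iter(features_by_class.values()))[0])

    def score(axes: Tuple[int, int]) -> float:
        i, j = axes
        return fisher_ratio({label: [(v[i], v[j]) for v in vectors]
                             for label, vectors in features_by_class.items()})

    return max(combinations(range(dim), 2), key=score)
```

Exhaustive search is only practical for a modest number of candidate axes, since the number of planes grows quadratically with the dimension.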
international conference on acoustics, speech, and signal processing | 2000
Klaus Reinhard; Mahesan Niranjan
Considering the perceptual importance of phonetic transitions as minimal contextual variant units, this paper explicitly models the inter-phone dynamics covered by diphones. Subspace projections based on a time-constrained PCA (TC-PCA) are developed that focus on the temporal evolution. They reveal characteristic trajectories in a low-dimensional spectral representation, which facilitates robust parameter estimation while simultaneously optimising the discriminant information. A matched-filter design is applied within a multiple-hypothesis rescoring scheme, which allows operation in a very low-dimensional parameter space. Within this multiple-hypothesis paradigm, experiments on the TIMIT database show that explicitly modelling the inter-phone dynamics covered by diphones provides complementary information, resulting in improved phone error rates.
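A minimal sketch of the rescoring idea, assuming each hypothesis in an N-best list carries a baseline score (for instance, a recognizer log-likelihood), that the observed trajectory has already been time-aligned to equal-length 1-D class templates, and that the two scores are combined by a weighted sum. The "matched filter" here is evaluated at a single zero-lag alignment and normalised by both energies; the paper's actual filter design, score combination, and weight are not given in the abstract. The labels `b-iy` and `d-iy` and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def matched_filter_score(observed: Sequence[float], template: Sequence[float]) -> float:
    """Correlate the observed trajectory with a template at zero lag,
    normalised by both energies so the score lies in [-1, 1]."""
    dot = sum(o * t for o, t in zip(observed, template))
    energy = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(t * t for t in template))
    return dot / energy if energy > 0 else 0.0


def rescore(nbest: List[Tuple[str, float]], observed: Sequence[float],
            templates: Dict[str, Sequence[float]], weight: float) -> List[Tuple[str, float]]:
    """Add a weighted matched-filter score to each baseline score and re-rank, best first."""
    rescored = [(label, score + weight * matched_filter_score(observed, templates[label]))
                for label, score in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With `weight=0.0` the original N-best order is preserved, so the weight directly controls how much the trajectory model is allowed to overturn the baseline ranking.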
Archive | 1998
Klaus Reinhard; Mahesan Niranjan
Speech Communication | 2002
Klaus Reinhard; Mahesan Niranjan
international conference on artificial neural networks | 1997
Klaus Reinhard; Mahesan Niranjan
conference of the international speech communication association | 2015
Caroline Kaufhold; Vadim Gamidov; Andreas Kießling; Klaus Reinhard; Elmar Nöth
Archive | 2004
Jochen Junkawitsch; Raymond Brückner; Klaus Reinhard; Stefan Dobler