Klaus Reinhard | Researchain

Publication

Featured research published by Klaus Reinhard.

Speech Communication | 1999

Parametric subspace modeling of speech transitions

Klaus Reinhard; Mahesan Niranjan

This paper describes an attempt at capturing segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional information could be captured. The method presented here starts from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved. This was approached by introducing a temporal constraint into the well-known technique of Principal Component Analysis (PCA). On this subspace, parametric modelling of the trajectory was attempted, and a distance metric was computed to perform classification of diphones. A description of the temporal evolution in terms of latent variables was obtained using the Principal Curves method of Hastie and Stuetzle and the Generative Topographic Mapping (GTM) technique of Bishop, Svensén and Williams. On the difficult problem of /bee/, /dee/, /gee/ it was possible to retain discriminatory information with a small number of parameters. Experimental illustrations present results on the ISOLET and TIMIT databases.

International Conference on Acoustics, Speech, and Signal Processing | 2006

Robust Endpoint Detection for Speech Recognition Based on Discriminative Feature Extraction

Koichi Yamamoto; Firas Jabloun; Klaus Reinhard; Akinori Kawamura

Accurate endpoint detection is important for improving speech recognition capability. This paper proposes a novel endpoint detection method which combines energy-based and likelihood ratio-based voice activity detection (VAD) criteria, where the likelihood ratio is calculated with speech/non-speech Gaussian mixture models (GMMs). Moreover, the proposed method introduces the discriminative feature extraction (DFE) technique in order to improve the speech/non-speech classification. The DFE is used in the training of the parameters required for calculating the likelihood ratio. Experimental results have shown that the proposed endpointer achieves good performance compared to an energy-based endpointer in terms of start-of-speech (SOS) and end-of-speech (EOS) detection. Due to the improvement of the endpointer, the performance of automatic speech recognition (ASR) has also been improved.
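As a rough illustration of the idea in this abstract, the sketch below combines a frame-energy test with a speech/non-speech GMM log-likelihood ratio test. It is not the authors' method: the function names, the diagonal-covariance GMMs, the fixed thresholds, and the simple "both criteria must fire" combination rule are all assumptions made for illustration, and the DFE training step is not shown.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Per-row log-likelihood of x under a diagonal-covariance GMM.

    x: (n, d); weights: (k,); means, variances: (k, d).
    """
    diff = x[:, None, :] - means[None, :, :]              # (n, k, d)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)             # (n, k)
    # log-sum-exp over mixture components for numerical stability
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.sum(np.exp(log_weighted - m),
                              axis=1, keepdims=True)))[:, 0]

def detect_speech(features, log_energy, speech_gmm, noise_gmm,
                  energy_thresh, llr_thresh):
    """Flag a frame as speech when both criteria agree (assumed rule)."""
    llr = (gmm_log_likelihood(features, *speech_gmm)
           - gmm_log_likelihood(features, *noise_gmm))
    return (log_energy > energy_thresh) & (llr > llr_thresh)

def endpoints(is_speech):
    """Return (start, end) frame indices of the speech region, or None."""
    idx = np.flatnonzero(is_speech)
    if idx.size == 0:
        return None
    return int(idx[0]), int(idx[-1])

# Toy data (hypothetical): 1-D features, noise near 0 and speech near 5.
features = np.array([[0.0], [0.0], [5.0], [5.0], [5.0], [0.0]])
log_energy = np.array([-5.0, -5.0, 2.0, 2.0, 2.0, -5.0])
speech_gmm = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
noise_gmm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))

mask = detect_speech(features, log_energy, speech_gmm, noise_gmm,
                     energy_thresh=0.0, llr_thresh=0.0)
print(endpoints(mask))
```

A practical endpointer would typically also smooth the per-frame decisions over time before picking the start and end frames; that step is omitted here for brevity.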

International Conference on Acoustics, Speech, and Signal Processing | 1998

Parametric subspace modelling of speech transitions

Klaus Reinhard; Mahesan Niranjan

We report on an attempt to capture segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional information could be captured. We start from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved (time-constrained principal component analysis). On this subspace, we attempt a parametric modelling of the trajectory, and compute a distance metric to perform classification of diphones. Much of the discriminant information is still retained in this subspace. This is illustrated on the isolated transitions /bee/, /dee/ and /gee/.

International Conference on Acoustics, Speech, and Signal Processing | 1999

Diphone multi-trajectory subspace models

Klaus Reinhard; Mahesan Niranjan

We report on an extension of our work on capturing speech transitions embedded in diphones using trajectory models. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. We improved our methodology of explicitly capturing the trajectory of short-time spectral parameter vectors by introducing multi-trajectory concepts in a probabilistic framework. An optimal subspace selection is presented which finds the most discriminant plane for classification. Using the E-set from the TIMIT database, results suggest that discriminant information is preserved in the subspace.

International Conference on Acoustics, Speech, and Signal Processing | 2000

Matched filter design for diphone subspace models

Klaus Reinhard; Mahesan Niranjan

Considering the perceptual importance of phonetic transitions as minimal contextual variant units, this paper addresses the problem by explicitly modelling the inter-phone dynamics covered in diphones. Subspace projections based on a time-constrained PCA (TC-PCA) are developed which focus on the temporal evolution. They reveal characteristic trajectories present in a low-dimensional spectral representation, facilitating robust parameter estimation while simultaneously optimising the discriminant information. A matched filter design is applied to a multiple-hypotheses rescoring scheme, which enables operation in a very low-dimensional parameter space. Using this multiple-hypotheses paradigm, the complementary information gained by explicitly modelling inter-phone dynamics covered in diphones is demonstrated on the TIMIT database, resulting in improved phone error rates.

Archive | 1998