Petr Fousek
Idiap Research Institute
Publications
Featured research published by Petr Fousek.
international conference on acoustics, speech, and signal processing | 2008
Frantisek Grezl; Petr Fousek
This work continues the development of the recently proposed bottle-neck features for ASR. A five-layer MLP used in bottle-neck feature extraction makes it possible to obtain an arbitrary feature size without dimensionality reduction by transforms, independently of the MLP training targets. The MLP topology - the number and sizes of layers, suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector - is studied with respect to the best ASR result. The optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural net input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
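A minimal sketch of the bottle-neck idea (in PyTorch, with illustrative layer sizes and target counts that are assumptions, not the paper's actual configuration): a five-layer MLP is trained to predict phoneme or phoneme-state targets, and the activations of the narrow middle layer are taken as the features, so the feature dimensionality is set by the bottleneck width rather than by a separate dimensionality-reducing transform.

```python
import torch
import torch.nn as nn

class BottleneckMLP(nn.Module):
    def __init__(self, input_dim=351, hidden_dim=1500, bottleneck_dim=39, num_targets=135):
        super().__init__()
        # layer sizes and number of targets are illustrative, not the paper's values
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bottleneck_dim),      # bottleneck: feature size is fixed here
        )
        self.classifier = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, num_targets),         # phoneme or phoneme-state targets
        )

    def forward(self, x):
        z = self.encoder(x)           # bottleneck activations serve as ASR features
        return self.classifier(z), z

model = BottleneckMLP()
frames = torch.randn(8, 351)          # a batch of stacked spectro-temporal input frames
logits, features = model(frames)
print(features.shape)                  # torch.Size([8, 39]) -> feature vectors for the ASR system
```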
text, speech and dialogue | 2008
Petr Fousek; Lori Lamel; Jean-Luc Gauvain
Multi-Layer Perceptron (MLP) features have recently been attracting growing interest for automatic speech recognition due to their complementarity with cepstral features. In this paper the use of MLP features is evaluated in a large-vocabulary continuous speech recognition task, exploring different types of MLP features and their combination. Cepstral features and three types of Bottle-Neck MLP features were first evaluated without and with unsupervised model adaptation, using models with the same number of parameters. When used with MLLR adaptation on a broadcast news Arabic transcription task, Bottle-Neck MLP features perform as well as, or even slightly better than, a standard 39-dimensional PLP-based front-end. This paper also explores different combination schemes (feature concatenation, cross-adaptation, and hypothesis combination). Extending the feature vector by combining various feature sets led to a 9% relative word error rate reduction over the PLP baseline. Significant gains are also reported with both ROVER hypothesis combination and cross-model adaptation. Feature concatenation appears to be the most efficient combination method, providing the best gain at the lowest decoding cost.
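A minimal sketch of the feature-concatenation scheme the paper finds most efficient (dimensions are illustrative assumptions): cepstral and bottleneck features computed for the same frames are simply stacked into one longer vector per frame before acoustic modelling.

```python
import numpy as np

num_frames = 100
plp = np.random.randn(num_frames, 39)         # 39-dimensional PLP front-end (illustrative)
bottleneck = np.random.randn(num_frames, 39)  # bottleneck MLP features for the same frames

# frame-synchronous concatenation yields one extended feature vector per frame
combined = np.concatenate([plp, bottleneck], axis=1)
print(combined.shape)  # (100, 78)
```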
international conference on acoustics, speech, and signal processing | 2006
Petr Fousek; Hynek Hermansky
The paper presents an alternative approach to automatic recognition of speech in which each targeted word is classified by a separate binary classifier against all other sounds, with no time alignment. To build a recognizer for N words, N parallel binary classifiers are applied. The system first estimates uniformly sampled posterior probabilities of phoneme classes; in a second step, a rather long sliding time window is applied to the phoneme posterior estimates and its content is classified by an artificial neural network to yield the posterior probability of the keyword. On a small-vocabulary ASR task, the system does not yet reach the performance of the state-of-the-art system, but its conceptual simplicity, the ease of adding new target words, and its inherent resistance to out-of-vocabulary sounds may prove a significant advantage in many applications.
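A minimal sketch of the per-keyword detection idea (the window length, dimensions, and the linear stand-in classifier are assumptions for illustration): frame-level phoneme posteriors are buffered, a long sliding window over them is flattened, and one binary classifier per target word scores each window position, with no time alignment.

```python
import numpy as np

def sliding_windows(posteriors, win=101):
    """Yield a flattened window of phoneme posteriors centred on each frame."""
    num_frames, _ = posteriors.shape
    half = win // 2
    padded = np.pad(posteriors, ((half, half), (0, 0)), mode="edge")
    for t in range(num_frames):
        yield padded[t:t + win].reshape(-1)

class KeywordDetectors:
    """One binary classifier per keyword; a random linear model stands in here."""
    def __init__(self, input_dim, keywords):
        self.weights = {w: np.random.randn(input_dim) * 0.01 for w in keywords}

    def score(self, window):
        # posterior probability of each keyword at this window position
        return {w: 1.0 / (1.0 + np.exp(-window @ v)) for w, v in self.weights.items()}

posteriors = np.random.dirichlet(np.ones(29), size=500)    # 500 frames, 29 phoneme classes
detectors = KeywordDetectors(input_dim=101 * 29, keywords=["yes", "no", "stop"])
scores_over_time = [detectors.score(w) for w in sliding_windows(posteriors)]
print(len(scores_over_time), scores_over_time[0])            # decision thresholding omitted
```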
text, speech and dialogue | 2005
Hynek Hermansky; Petr Fousek; Mikko Lehtonen
A natural audio-visual interface between a human user and a machine requires understanding of the user's audio-visual commands. This does not necessarily require full speech and image recognition. It does require, just as interaction with any working animal does, that the machine be capable of reacting to certain particular sounds and/or gestures while ignoring the rest. Towards this end, we are working on sound identification and classification approaches that would ignore most of the acoustic input and react only to a particular sound (keyword).
conference of the international speech communication association | 2005
Hynek Hermansky; Petr Fousek
conference of the international speech communication association | 2008
Petr Fousek; Lori Lamel; Jean-Luc Gauvain
conference of the international speech communication association | 2009
Julien Despres; Petr Fousek; Jean-Luc Gauvain; Yvan Josse; Lori Lamel; Abdelkhalek Messaoudi
international conference on spoken language processing | 2004
Petr Fousek; Petr Svojanovsky; Frantisek Grezl; Hynek Hermansky
conference of the international speech communication association | 2006
Hynek Bořil; Petr Fousek
conference of the international speech communication association | 2014
Tara N. Sainath; Vijayaditya Peddinti; Brian Kingsbury; Petr Fousek; Bhuvana Ramabhadran; David Nahamoo