Publication


Featured research published by Petr Fousek.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Optimizing Bottle-Neck Features for LVCSR

Frantisek Grezl; Petr Fousek

This work continues the development of the recently proposed bottle-neck features for ASR. A five-layer MLP used for bottle-neck feature extraction makes it possible to obtain features of arbitrary size without dimensionality-reducing transforms, independently of the MLP training targets. The MLP topology (number and sizes of layers), suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector are studied with the goal of obtaining the best ASR result. The optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural-network input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
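The abstract describes taking features from the narrow middle layer of a five-layer MLP, so that the feature size is set by the bottleneck width rather than by the training targets, and then appending delta features. The sketch below illustrates that data flow in plain Python under several assumptions: the weights are random rather than trained, the layer sizes are hypothetical, the bottleneck's linear (pre-sigmoid) outputs are used as the features, and deltas follow the common regression formula with a two-frame context on each side. It is not the authors' implementation.

```python
import math
import random

def linear(x, weights, bias):
    """Affine layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def build_bottleneck_mlp(sizes, seed=0):
    """sizes = [input, hidden1, bottleneck, hidden2, output]: five layers, four weight matrices.

    Weights are small random values; training (e.g. on phoneme or phoneme-state
    targets) is omitted from this sketch.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def bottleneck_features(mlp, x):
    """Forward one frame up to the bottleneck layer; the output layer is never used."""
    hidden = sigmoid(linear(x, *mlp[0]))   # first hidden layer
    return linear(hidden, *mlp[1])         # linear bottleneck outputs (assumed choice)

def deltas(frames, n=2):
    """Regression deltas d_t = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2), edges clamped."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        acc = [0.0] * len(frames[t])
        for k in range(1, n + 1):
            later, earlier = frames[min(t + k, T - 1)], frames[max(t - k, 0)]
            acc = [a + k * (p - q) for a, p, q in zip(acc, later, earlier)]
        out.append([a / denom for a in acc])
    return out

# Hypothetical sizes: 39-dim input, 30-dim bottleneck, 40 output targets.
mlp = build_bottleneck_mlp([39, 100, 30, 100, 40])
frames = [[0.01 * t] * 39 for t in range(5)]
bn = [bottleneck_features(mlp, f) for f in frames]
features = [b + d for b, d in zip(bn, deltas(bn))]
print(len(features[0]))  # 60: 30 bottleneck values + 30 deltas
```

Changing the size of the output layer (for example, moving from phoneme to phoneme-state targets) leaves the feature dimension at 30, which is the independence from training targets that the abstract points to.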


Text, Speech and Dialogue | 2008

On the Use of MLP Features for Broadcast News Transcription

Petr Fousek; Lori Lamel; Jean-Luc Gauvain

Multi-Layer Perceptron (MLP) features have recently been attracting growing interest for automatic speech recognition because they are complementary to cepstral features. In this paper, MLP features are evaluated on a large vocabulary continuous speech recognition task, exploring different types of MLP features and their combination. Cepstral features and three types of Bottle-Neck MLP features were first evaluated with and without unsupervised model adaptation, using models with the same number of parameters. When used with MLLR adaptation on an Arabic broadcast news transcription task, Bottle-Neck MLP features perform as well as, or even slightly better than, a standard 39-dimensional PLP-based front-end. This paper also explores different combination schemes (feature concatenation, cross-adaptation, and hypothesis combination). Extending the feature vector by combining various feature sets led to a 9% relative reduction in word error rate over the PLP baseline. Significant gains are also reported with both ROVER hypothesis combination and cross-model adaptation. Feature concatenation appears to be the most efficient combination method, providing the best gain at the lowest decoding cost.
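Feature concatenation, which the abstract finds to be the most efficient combination scheme, amounts to stacking the per-frame vectors of time-synchronous feature streams, and the 9% figure is a relative rather than an absolute change in WER. A minimal sketch of both, assuming streams that share one frame rate; the function names, dimensions and WER values are hypothetical and are not results from the paper:

```python
def concatenate_features(*streams):
    """Frame-synchronous concatenation of several feature streams.

    Each stream is a list of per-frame vectors; all streams must share the frame rate
    and therefore the number of frames.
    """
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    return [[value for vec in frame_vectors for value in vec] for frame_vectors in zip(*streams)]

def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction, e.g. a drop from 20.0% to 18.2% WER is 0.09 (9% relative)."""
    return (baseline_wer - new_wer) / baseline_wer

plp = [[0.0] * 39 for _ in range(100)]   # hypothetical 39-dim PLP frames
bn = [[1.0] * 30 for _ in range(100)]    # hypothetical 30-dim bottle-neck frames
combined = concatenate_features(plp, bn)
print(len(combined), len(combined[0]))   # 100 69
print(round(relative_wer_reduction(20.0, 18.2), 3))  # 0.09
```

Concatenation needs only one decoding pass over the longer vector, whereas ROVER and cross-adaptation require several systems to be decoded, which is consistent with the abstract's point about decoding cost.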


International Conference on Acoustics, Speech, and Signal Processing | 2006

Towards ASR Based on Hierarchical Posterior-Based Keyword Recognition

Petr Fousek; Hynek Hermansky

The paper presents an alternative approach to automatic speech recognition in which each target word is classified by a separate binary classifier against all other sounds. No time alignment is performed. To build a recognizer for N words, N binary classifiers are applied in parallel. The system first estimates uniformly sampled posterior probabilities of phoneme classes. In a second step, a rather long sliding time window is applied to the phoneme posterior estimates, and its content is classified by an artificial neural network to yield the posterior probability of the keyword. On a small-vocabulary ASR task, the system does not yet reach the performance of a state-of-the-art system, but its conceptual simplicity, the ease of adding new target words, and its inherent resistance to out-of-vocabulary sounds may prove to be significant advantages in many applications.
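The two-stage structure in the abstract can be sketched as follows: frame-level phoneme posteriors are stacked over a long sliding window, and each keyword has its own binary classifier that maps that window to a keyword posterior. This is an illustrative sketch only: a single logistic unit replaces the paper's neural network, the window length, phoneme set and weights are hypothetical, and edge frames are handled by clamping, which is an assumption.

```python
import math

def keyword_posteriors(phoneme_posteriors, classifiers, window):
    """Second stage of a hierarchical keyword spotter (sketch).

    phoneme_posteriors: per-frame phoneme posterior vectors from the first stage.
    classifiers: dict keyword -> (weights, bias), one binary classifier per keyword;
        a single logistic unit stands in for the paper's neural network.
    window: odd number of frames in the sliding context window.
    Returns dict keyword -> per-frame posterior of that keyword.
    """
    T = len(phoneme_posteriors)
    half = window // 2
    scores = {kw: [] for kw in classifiers}
    for t in range(T):
        # Stack the posteriors of frames t-half .. t+half, clamping at the edges.
        context = []
        for k in range(t - half, t + half + 1):
            context.extend(phoneme_posteriors[min(max(k, 0), T - 1)])
        # Each keyword's classifier sees the same context and scores it independently.
        for kw, (weights, bias) in classifiers.items():
            z = sum(w * x for w, x in zip(weights, context)) + bias
            scores[kw].append(1.0 / (1.0 + math.exp(-z)))
    return scores

# Hypothetical setup: 3 phoneme classes, 5-frame window -> 15 inputs per classifier.
posts = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]] * 4
detectors = {
    "yes": ([1.0, -1.0, 0.0] * 5, 0.0),
    "no": ([0.0] * 15, 0.0),
}
out = keyword_posteriors(posts, detectors, window=5)
print(out["no"][0])  # 0.5: zero weights and bias give a neutral posterior
```

Adding a new target word means adding one entry to `classifiers`; the other detectors are untouched, which mirrors the ease of adding words that the abstract describes.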


Text, Speech and Dialogue | 2005

The role of speech in multimodal human-computer interaction: towards reliable rejection of non-keyword input

Hynek Hermansky; Petr Fousek; Mikko Lehtonen

A natural audio-visual interface between a human user and a machine requires understanding the user's audio-visual commands. This does not necessarily require full speech and image recognition. It does require, just as interaction with any working animal does, that the machine be capable of reacting to certain particular sounds and/or gestures while ignoring the rest. To this end, we are working on sound identification and classification approaches that ignore most of the acoustic input and react only to a particular sound (keyword).


Conference of the International Speech Communication Association | 2005

Multi-resolution RASTA filtering for TANDEM-based ASR

Hynek Hermansky; Petr Fousek


Conference of the International Speech Communication Association | 2008

Transcribing broadcast data using MLP features

Petr Fousek; Lori Lamel; Jean-Luc Gauvain


Conference of the International Speech Communication Association | 2009

Modeling Northern and Southern Varieties of Dutch for STT

Julien Despres; Petr Fousek; Jean-Luc Gauvain; Yvan Josse; Lori Lamel; Abdelkhalek Messaoudi


International Conference on Spoken Language Processing | 2004

New Nonsense Syllables Database -- Analyses and Preliminary ASR Experiments

Petr Fousek; Petr Svojanovsky; Frantisek Grezl; Hynek Hermansky


Conference of the International Speech Communication Association | 2006

Data-Driven Design of Front-End Filter Bank for Lombard Speech Recognition

Hynek Bořil; Petr Fousek


Conference of the International Speech Communication Association | 2014

Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks

Tara N. Sainath; Vijayaditya Peddinti; Brian Kingsbury; Petr Fousek; Bhuvana Ramabhadran; David Nahamoo

Collaboration


Top Co-Authors

Jean-Luc Gauvain

Centre national de la recherche scientifique

Lori Lamel

Centre national de la recherche scientifique

Frantisek Grezl

Brno University of Technology
