
Publication


Featured research published by Frantisek Grezl.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Optimizing bottle-neck features for LVCSR

Frantisek Grezl; Petr Fousek

This work continues the development of the recently proposed bottle-neck features for ASR. A five-layer MLP used in bottleneck feature extraction allows an arbitrary feature size to be obtained without dimensionality reduction by transforms, independently of the MLP training targets. The MLP topology - the number and sizes of layers, suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector - is studied with respect to the best ASR result. The optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural net input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
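The five-layer bottleneck topology described in the abstract can be sketched as a plain feed-forward pass that stops at the narrow middle layer. This is a minimal NumPy illustration; the layer sizes, the sigmoid nonlinearity, and the random weights are assumptions for the sketch, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed topology: input -> large hidden -> small bottleneck ->
# large hidden -> output targets (sizes are illustrative only).
sizes = [351, 1500, 80, 1500, 135]
weights = [rng.standard_normal((a, b)) * 0.01
           for a, b in zip(sizes[:-1], sizes[1:])]

def bottleneck_features(x, bn_layer=2):
    """Forward pass that stops at the bottleneck layer and returns its
    activations, which serve as the feature vector for the recognizer."""
    h = x
    for i, w in enumerate(weights):
        h = sigmoid(h @ w)
        if i + 1 == bn_layer:       # stop once the bottleneck is reached
            return h
    return h

x = rng.standard_normal(351)        # one stacked-frame input vector
feat = bottleneck_features(x)
assert feat.shape == (80,)
```

The point of the narrow layer is that the feature dimensionality (80 here) is fixed by the topology itself, so no separate dimensionality-reducing transform is needed regardless of how many training targets the output layer has.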


Spoken Language Technology Workshop | 2012

The language-independent bottleneck features

Karel Vesely; Martin Karafiát; Frantisek Grezl; Milos Janda; Ekaterina Egorova

In this paper we present a novel language-independent bottleneck (BN) feature extraction framework. In our experiments we have used a multilingual artificial neural network (ANN), where each language is modelled by a separate output layer, while all the hidden layers jointly model the variability of all the source languages. The key idea is that the entire ANN is trained on all the languages simultaneously, so the BN features are not biased towards any of the languages. For exactly this reason, the final BN features are considered language-independent. In experiments with the GlobalPhone database, we show that multilingual BN features consistently outperform monolingual BN features. Cross-lingual generalization is also evaluated, where we train on 5 source languages and test on 3 other languages. The results show that the ANN can produce very good BN features even for unseen languages, in some cases even better than if the ANN had been trained on the target language only.
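The shared-hidden-layers, per-language-output-layer structure described above can be sketched as follows. The sizes, the three language codes, and their target counts are invented for illustration; only the wiring (one shared stack, one softmax head per language) reflects the abstract:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Shared hidden stack jointly modelling all source languages; the last
# shared layer is the bottleneck (sizes and languages are illustrative).
shared = [rng.standard_normal((351, 1000)) * 0.01,
          rng.standard_normal((1000, 80)) * 0.01]
# One softmax output layer per language.
heads = {lang: rng.standard_normal((80, n)) * 0.01
         for lang, n in [("cs", 126), ("de", 135), ("es", 120)]}

def forward(x, lang):
    h = x
    for w in shared:
        h = sigmoid(h @ w)
    bn = h                          # language-independent BN features
    return bn, softmax(bn @ heads[lang])

x = rng.standard_normal(351)
bn, post = forward(x, "de")
assert bn.shape == (80,) and post.shape == (135,)
```

Because the bottleneck activations are computed before any language-specific head is applied, the same 80-dimensional feature extractor can be reused for a language that has no output layer at all, which is the cross-lingual scenario the paper evaluates.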


International Conference on Acoustics, Speech, and Signal Processing | 2006

Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons

Andreas Stolcke; Frantisek Grezl; Mei-Yuh Hwang; Xin Lei; Nelson Morgan; Dimitra Vergyri

Recent results with phone-posterior acoustic features estimated by multilayer perceptrons (MLPs) have shown that such features can effectively improve the accuracy of state-of-the-art large vocabulary speech recognition systems. MLP features are trained discriminatively to perform phone classification and are therefore, like acoustic models, tuned to a particular language and application domain. In this paper we investigate how portable such features are across domains and languages. We show that even without retraining, English-trained MLP features can provide a significant boost to recognition accuracy in new domains within the same language, as well as in entirely different languages such as Mandarin and Arabic. We also show the effectiveness of feature-level adaptation in porting MLP features to new domains.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Transcribing Meetings With the AMIDA Systems

Thomas Hain; Lukas Burget; John Dines; Philip N. Garner; Frantisek Grezl; Asmaa El Hannani; Marijn Huijbregts; Martin Karafiát; Mike Lincoln; Vincent Wan

In this paper, we give an overview of the AMIDA systems for transcription of conference and lecture room meetings. The systems were developed for participation in the Rich Transcription evaluations conducted by the National Institute of Standards and Technology in the years 2007 and 2009, and can process close-talking and far-field microphone recordings. The paper first discusses fundamental properties of meeting data with special focus on the AMI/AMIDA corpora. This is followed by a description and analysis of improved processing and modeling, with focus on techniques specifically addressing meeting transcription issues such as multi-room recordings or domain variability. In 2007 and 2009, two different system-building strategies were followed. While in 2007 we used our traditional-style system design based on cross adaptation, the 2009 systems were constructed semi-automatically, supported by improved decoders and a new method for system representation. Overall, these changes gave a 6%-13% relative reduction in word error rate compared to our 2007 results, while at the same time requiring less training material and reducing the real-time factor by a factor of five. The meeting transcription systems are available at www.webasr.org.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Score normalization and system combination for improved keyword spotting

Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Le Zhang; Shivesh Ranjan; Tim Ng; Roger Hsiao; Guruprasad Saikumar; Ivan Bulyko; Long Nguyen; John Makhoul; Frantisek Grezl; Mirko Hannemann; Martin Karafiát; Igor Szöke; Karel Vesely; Lori Lamel; Viet-Bac Le

We present two techniques that are shown to yield improved keyword spotting (KWS) performance under the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and correspond more closely to the probability of being correct than raw posteriors; and (ii) system combination, where the detections of multiple systems are merged together and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both score normalization and system combination yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), in five different languages. A variant of these methods resulted in the highest performance in the official surprise language evaluation of the IARPA-funded Babel project in April 2013.
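The two ideas can be sketched in a few lines. Note the hedging: per-keyword sum-to-one normalization is one common variant of KWS score normalization, not necessarily the exact formula of this paper, and the detection records and keywords below are invented for illustration:

```python
from collections import defaultdict

def normalize_scores(detections):
    """Per-keyword sum-to-one normalization: raw scores of each keyword
    are rescaled so they sum to 1, making scores of rare and frequent
    keywords commensurate (a common variant; the paper's exact formula
    may differ)."""
    totals = defaultdict(float)
    for kw, time, score in detections:
        totals[kw] += score
    return [(kw, time, score / totals[kw]) for kw, time, score in detections]

def combine_systems(det_a, det_b, w=0.5):
    """Merge detections of two systems; hits with the same keyword and
    time get a weighted interpolation of their scores. The weight w
    would be tuned on held-out data, e.g. to maximize MTWV."""
    merged = {}
    for kw, t, s in det_a:
        merged[(kw, t)] = w * s
    for kw, t, s in det_b:
        merged[(kw, t)] = merged.get((kw, t), 0.0) + (1 - w) * s
    return [(kw, t, s) for (kw, t), s in merged.items()]

dets = [("hello", 1.2, 0.9), ("hello", 7.5, 0.3), ("world", 4.0, 0.5)]
norm = normalize_scores(dets)
assert abs(sum(s for kw, _, s in norm if kw == "hello") - 1.0) < 1e-9
```

After normalization, a single global detection threshold behaves sensibly across keywords with very different raw posterior ranges, which is what makes the ATWV/MTWV measures improve.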


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Convolutive Bottleneck Network features for LVCSR

Karel Vesely; Martin Karafiát; Frantisek Grezl

In this paper, we focus on improvements to the bottleneck ANN in a Tandem LVCSR system. First, the influence of the training set size and the ANN size is evaluated. Second, a very positive effect of a linear bottleneck is shown. Finally, a Convolutive Bottleneck Network is proposed as an extension of the current state-of-the-art Universal Context Network. The proposed training method leads to a 5.5% relative reduction in WER compared to the Universal Context ANN baseline. The relative improvement compared to the 5-layer single-bottleneck network is 17.7%. The ctstrain07 dataset, composed of more than 2000 hours of English conversational telephone speech, was used for the experiments. The TNet toolkit with a CUDA GPGPU implementation was used for fast training.


International Conference on Machine Learning | 2005

Further progress in meeting recognition: the ICSI-SRI spring 2005 speech-to-text evaluation system

Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Frantisek Grezl; Adam Janin; Arindam Mandal; Barbara Peskin; Chuck Wooters; Jing Zheng

We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features better delay-and-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons. In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both MDM and IHM conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Study of probabilistic and Bottle-Neck features in multilingual environment

Frantisek Grezl; Martin Karafiát; Milos Janda

This study focuses on the performance of probabilistic and bottle-neck features on a different language than the one they were trained for. It is shown that such porting is possible and that the features remain competitive with PLP features. Further, several combination techniques are evaluated. The performance of the combined features is close to that of the best performing system. Finally, bigger NNs were trained on large amounts of data from a different domain. The resulting features outperformed the previously trained systems, and combination with them further improved system performance.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Adaptation of multilingual stacked bottle-neck neural network structure for new language

Frantisek Grezl; Martin Karafiát; Karel Vesely

Neural network based features have become an inseparable part of state-of-the-art LVCSR systems. In order to perform well, the network has to be trained on a large amount of in-domain data. With the increasing emphasis on fast development of ASR systems on limited resources, there is an effort to alleviate the need for in-domain data. To evaluate the effectiveness of other resources, we trained the Stacked Bottle-Neck neural network structure on multilingual data, investigating several training strategies while treating the target language as the unseen one. Further, the systems were adapted to the target language by re-training. Finally, we evaluated the effect of adapting individual NNs in the Stacked Bottle-Neck structure to find the optimal adaptation strategy. We show that adaptation can significantly improve system performance over both the multilingual network and a network trained only on target data. The experiments were performed on Babel Year 1 data.
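The adaptation-by-re-training idea, i.e. starting from the multilingual network's weights and continuing gradient training on target-language data only, can be sketched for a single sigmoid layer with a squared-error criterion. All sizes, weights, and the pseudo-targets below are invented stand-ins; only the "initialize from existing weights, then keep training on new data" pattern reflects the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Multilingual" starting weights (random stand-ins here) and a batch
# of target-language frames with pseudo-targets.
W = rng.standard_normal((40, 30)) * 0.1
X = rng.standard_normal((128, 40))
T = sigmoid(X @ (W + 0.05 * rng.standard_normal(W.shape)))

def mse(W):
    return float(np.mean((sigmoid(X @ W) - T) ** 2))

before = mse(W)
for _ in range(100):                              # re-training steps
    Y = sigmoid(X @ W)
    G = X.T @ ((Y - T) * Y * (1 - Y)) / len(X)    # backprop through sigmoid
    W -= 1.0 * G
after = mse(W)
assert after < before        # adaptation reduced error on target data
```

The same pattern applies per network in the stacked structure; the paper's question is which of the stacked NNs are worth re-training this way.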


Text, Speech and Dialogue | 2010

Parallel training of neural networks for speech recognition

Karel Veselý; Lukas Burget; Frantisek Grezl

Feed-forward multi-layer neural networks are of significant importance in speech recognition. A new parallel training tool, TNet, was designed and optimized for multiprocessor computers. The training acceleration rates are reported on a phoneme-state classification task.
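The core of data-parallel NN training on a multiprocessor machine is that each worker computes a gradient on its shard of the minibatch and the results are averaged into one update. This is a minimal single-process sketch of that idea (the model, a single logistic unit, and all data are invented; TNet itself is a far more elaborate multi-threaded tool):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad(w, X, y):
    """Gradient of the cross-entropy loss for a single logistic unit."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# Data-parallel step: split the minibatch into equal shards, compute a
# partial gradient per "worker", then average the partial gradients.
w = np.zeros(10)
X, y = rng.standard_normal((256, 10)), rng.integers(0, 2, 256)
shards = np.array_split(np.arange(256), 4)
g = np.mean([grad(w, X[idx], y[idx]) for idx in shards], axis=0)

# With equal shard sizes this equals the single-worker gradient, so
# parallelism changes the speed of training, not its result.
g_serial = grad(w, X, y)
assert np.allclose(g, g_serial)
```

In a real tool the four shard gradients would be computed by separate threads or processes, with the averaging as the synchronization point.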

Collaboration


Frantisek Grezl's top co-authors, all affiliated with Brno University of Technology:

Martin Karafiát
Lukas Burget
Jan Cernocký
Igor Szöke
Pavel Matejka
Karel Veselý
Ondrej Glembek
Petr Schwarz