Jan Cernocky
Brno University of Technology
Publications
Featured research published by Jan Cernocky.
International Conference on Acoustics, Speech, and Signal Processing | 2011
Tomas Mikolov; Stefan Kombrink; Lukas Burget; Jan Cernocky; Sanjeev Khudanpur
We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than a 15-fold speedup in both the training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. Finally, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
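The bulk of the speedup in this line of work comes from factorizing the output layer with word classes: instead of a softmax over the full vocabulary, the word probability is computed as P(w | h) = P(class(w) | h) · P(w | class(w), h), which cuts the per-word cost from O(V) to roughly O(C + V/C). A minimal sketch of that idea follows; the shapes, the random word-to-class map, and all names are illustrative stand-ins, not the paper's actual configuration.

import numpy as np

V, C, H = 10000, 100, 200                       # vocab size, classes, hidden size
rng = np.random.default_rng(0)

W_class = rng.standard_normal((C, H)) * 0.01    # hidden state -> class logits
W_word = rng.standard_normal((V, H)) * 0.01     # hidden state -> word logits
word2class = rng.integers(0, C, size=V)         # fixed word-to-class assignment
class_members = [np.flatnonzero(word2class == c) for c in range(C)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_prob(h, w):
    """P(w | h) at cost O(C + |class(w)|) instead of O(V)."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]            # which class comes next
    members = class_members[c]
    p_in_class = softmax(W_word[members] @ h)    # softmax only inside that class
    return p_class * p_in_class[np.flatnonzero(members == w)[0]]

h = rng.standard_normal(H)                       # stand-in for the RNN hidden state
print(word_prob(h, w=42))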
IEEE Automatic Speech Recognition and Understanding Workshop | 2011
Tomas Mikolov; Anoop Deoras; Daniel Povey; Lukas Burget; Jan Cernocky
We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model, which leads to a significant reduction in computational complexity. We achieved around a 10% relative reduction in word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
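A rough sketch of what a hash-based maximum entropy component can look like: every n-gram feature is hashed into one fixed-size weight array, so memory stays bounded no matter how many distinct n-grams occur in training. The table size, hashing scheme, and names below are illustrative assumptions, not the paper's implementation.

import numpy as np

TABLE_SIZE = 1 << 20                  # illustrative fixed table size
weights = np.zeros(TABLE_SIZE)

def feature_index(context, word):
    # hash the (context, word) feature into the shared weight table
    return hash((context, word)) % TABLE_SIZE

def maxent_logit(history, word, order=4):
    # sum the weights of the unigram .. order-gram features that fire
    # for this (history, word) pair
    logit = weights[feature_index((), word)]
    for n in range(1, order):
        if len(history) >= n:
            logit += weights[feature_index(tuple(history[-n:]), word)]
    return logit

print(maxent_logit(("the", "cat"), "sat"))   # 0.0 before any training

In training, such logits would be added to the network's output activations before the softmax, and the hashed weights updated jointly with the network by stochastic gradient descent.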
International Conference on Acoustics, Speech, and Signal Processing | 2006
Petr Schwarz; Pavel Matejka; Jan Cernocky
This paper deals with phoneme recognition based on neural networks (NN). First, several approaches to improving the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on temporal patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate tandem NN architectures. The results of the final system, reported on the standard TIMIT database, compare favorably to the best published results.
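A data-flow sketch of the split temporal context idea: a long block of spectral features is split into left and right halves, each half is classified by its own network, and a merger network combines the two posterior streams for the center frame. The tiny random-weight "networks" below exist purely to show the shapes and wiring, not trained models; the window and band sizes are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def mlp(in_dim, out_dim):
    # random-weight stand-in for a trained classifier
    W = rng.standard_normal((out_dim, in_dim)) * 0.1
    return lambda x: np.tanh(W @ x)

context = rng.standard_normal((31, 23))   # 31 frames x 23 critical bands
left = context[:16].ravel()               # left half, includes center frame
right = context[15:].ravel()              # right half, includes center frame

n_phones = 39
left_net = mlp(left.size, n_phones)
right_net = mlp(right.size, n_phones)
merger_net = mlp(2 * n_phones, n_phones)

posteriors = merger_net(np.concatenate([left_net(left), right_net(right)]))
print(posteriors.shape)                   # phone scores for the center frame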
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Lukas Burget; Pavel Matejka; Petr Schwarz; Ondrej Glembek; Jan Cernocky
In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's error rate. This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues. The key part of the paper is, however, the post-evaluation analysis, undermining a common myth that "the more boxes in the scheme, the better the system." All results are presented on NIST Speaker Recognition Evaluation (SRE) 2005 and 2006 data.
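Since the abstract's focus is eigenchannel adaptation, here is a heavily hedged sketch of its core computation: the utterance's GMM mean supervector is modeled as the speaker supervector plus a low-rank channel offset, m = m_speaker + U·x, and the channel factor x is point-estimated from posterior-weighted sufficient statistics. The closed form below assumes a standard normal prior on x and unit UBM covariances; all names and shapes are illustrative.

import numpy as np

def estimate_channel_factor(U, n, f_centered):
    """
    U          : (CF, R) eigenchannel matrix (C comps x F dims, R factors)
    n          : (CF,)  zero-order stats expanded per supervector dim
    f_centered : (CF,)  first-order stats minus n * speaker means
    Returns the MAP point estimate of the channel factor x.
    """
    A = U.T @ (n[:, None] * U) + np.eye(U.shape[1])   # posterior precision of x
    b = U.T @ f_centered
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
CF, R = 12, 3                                         # toy dimensions
x = estimate_channel_factor(rng.standard_normal((CF, R)),
                            np.abs(rng.standard_normal(CF)),
                            rng.standard_normal(CF))
print(x)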
International Conference on Acoustics, Speech, and Signal Processing | 2011
Pavel Matejka; Ondrej Glembek; Fabio Castaldo; Md. Jahangir Alam; Oldrich Plchot; Patrick Kenny; Lukas Burget; Jan Cernocky
In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance and advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and a heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate dimensionality reduction of the i-vectors before they enter the PLDA-HT modeling. The results are very competitive: on the NIST 2010 SRE task, the results of a single full-covariance LDA-PLDA-HT system approach those of a complex fused system.
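The baseline scoring mentioned in the abstract is simply the cosine similarity between the enrollment and test i-vectors; a minimal sketch follows. PLDA and PLDA-HT replace this with a probabilistic model of within- and between-speaker variability.

import numpy as np

def cosine_score(w_enroll, w_test):
    # higher score = more likely the same speaker
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

rng = np.random.default_rng(0)
print(cosine_score(rng.standard_normal(400), rng.standard_normal(400)))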
International Conference on Acoustics, Speech, and Signal Processing | 2006
Lukas Burget; Pavel Matejka; Jan Cernocky
This paper presents a comparison of maximum likelihood (ML) and discriminative maximum mutual information (MMI) training for acoustic modeling in language identification (LID). Both approaches are compared on state-of-the-art shifted delta-cepstra features, and the results are reported on data from the NIST 2003 evaluations. A clear advantage of MMI over ML training is shown. Further improvements of acoustic LID are discussed: heteroscedastic linear discriminant analysis (HLDA) for feature de-correlation and dimensionality reduction, and ergodic hidden Markov models (EHMM) for better modeling of dynamics in the acoustic space. The final error rate compares favorably to other results published on NIST 2003 data.
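For reference, a sketch of the shifted delta cepstra (SDC) features mentioned here, using the common N-d-P-k parametrization (e.g. 7-1-3-7): k delta vectors, each computed over a span of ±d frames and spaced P frames apart, are stacked onto every frame. The exact front-end parameters of the paper's system are an assumption here.

import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """cepstra: (T, N) matrix; returns (T, N*k) stacked shifted deltas."""
    T, N = cepstra.shape
    pad = np.pad(cepstra, ((d, d + (k - 1) * P), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        off = i * P
        # delta at time t + off: c(t+off+d) - c(t+off-d)
        delta = (pad[d + off + d: d + off + d + T] -
                 pad[d + off - d: d + off - d + T])
        blocks.append(delta)
    return np.hstack(blocks)

print(sdc(np.random.randn(100, 7)).shape)   # (100, 49) for 7-1-3-7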
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006
Pavel Matejka; Lukas Burget; Petr Schwarz; Jan Cernocky
This paper presents the language identification (LID) system developed in the Speech@FIT group at Brno University of Technology (BUT) for the NIST 2005 Language Recognition Evaluation. The system consists of two parts: phonotactic and acoustic. The phonotactic system is based on hybrid phoneme recognizers trained on the SpeechDat-E database. Phoneme lattices are used to train and test phonotactic language models, and further improvement is obtained by using anti-models. The acoustic system is based on GMM modeling trained under the maximum mutual information framework. We describe both parts and provide a discussion of performance on the LRE 2005 recognition task.
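A minimal sketch of the phonotactic principle: phone n-gram statistics from the recognizer output are scored against per-language n-gram models, and the best log-likelihood wins. Real systems, including this one, use phone lattices and expected n-gram counts rather than the 1-best strings used here, and the unseen-trigram floor is an arbitrary illustration.

from collections import Counter
import math

def trigram_counts(phones):
    return Counter(zip(phones, phones[1:], phones[2:]))

def score(phones, lang_model):
    """lang_model: dict mapping phone trigrams to log-probabilities."""
    floor = math.log(1e-6)                 # illustrative unseen-trigram floor
    return sum(n * lang_model.get(g, floor)
               for g, n in trigram_counts(phones).items())

def identify(phones, lang_models):
    return max(lang_models, key=lambda L: score(phones, lang_models[L]))

lang_models = {
    "en": {("s", "t", "r"): math.log(0.02)},
    "cs": {("s", "t", "r"): math.log(0.001)},
}
print(identify(list("str"), lang_models))  # -> "en"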
International Conference on Acoustics, Speech, and Signal Processing | 2008
Lukas Burget; Petr Schwarz; Pavel Matejka; Mirko Hannemann; Ariya Rastrow; Christopher M. White; Sanjeev Khudanpur; Hynek Hermansky; Jan Cernocky
This paper addresses the detection of out-of-vocabulary (OOV) segments in the output of a large vocabulary continuous speech recognition (LVCSR) system. First, standard confidence measures from frame-based word and phone posteriors are investigated. A substantial improvement is obtained when posteriors from two systems, a strongly constrained one (LVCSR) and a weakly constrained one (a phone posterior estimator), are combined. We show that this approach is also suitable for the detection of general recognition errors. All results are presented on a WSJ task with a reduced recognition vocabulary.
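One plausible way to combine the two posterior streams (an assumption for illustration, not the paper's exact measure): compute a per-frame disagreement between the strongly constrained LVCSR phone posteriors and the weakly constrained phone-recognizer posteriors, and flag frames with high disagreement as candidate OOV or error regions.

import numpy as np

def oov_frame_scores(lvcsr_post, phone_post, eps=1e-10):
    """Both inputs: (T, n_phones) per-frame phone posteriors."""
    # KL-style disagreement per frame; large values = the systems disagree
    p, q = lvcsr_post + eps, phone_post + eps
    return np.sum(p * (np.log(p) - np.log(q)), axis=1)

def flag_oov(lvcsr_post, phone_post, threshold=1.0):
    # threshold is illustrative and would be tuned on development data
    return oov_frame_scores(lvcsr_post, phone_post) > threshold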
Spoken Language Technology Workshop | 2008
Igor Szöke; Lukas Burget; Jan Cernocky; Michal Fapso
This paper deals with a comparison of sub-word based methods for the spoken term detection (STD) task and phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. The maximal length and pruning of multigrams were investigated first; then, two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
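To make the multigram idea concrete, here is a sketch of decoding with a trained multigram inventory: the phone string is segmented into variable-length units (up to a maximal length) so that the summed unit log-probabilities are maximized, via simple dynamic programming. EM re-estimation of the unit probabilities and the paper's constrained training variants are omitted, and the unknown-unit floor is illustrative.

import math

def segment(phones, unit_logprob, max_len=4, unk=-20.0):
    """unit_logprob: dict mapping phone tuples to log-probabilities."""
    T = len(phones)
    best = [0.0] + [-math.inf] * T      # best score for each prefix length
    back = [0] * (T + 1)
    for t in range(1, T + 1):
        for n in range(1, min(max_len, t) + 1):
            unit = tuple(phones[t - n:t])
            s = best[t - n] + unit_logprob.get(unit, unk)
            if s > best[t]:
                best[t], back[t] = s, t - n
    units, t = [], T                    # recover the best segmentation
    while t > 0:
        units.append(tuple(phones[back[t]:t]))
        t = back[t]
    return units[::-1]

lp = {("h", "e"): math.log(0.1), ("l", "l", "o"): math.log(0.05)}
print(segment(list("hello"), lp))       # -> [('h', 'e'), ('l', 'l', 'o')]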
IEEE Automatic Speech Recognition and Understanding Workshop | 2011
Martin Karafiát; Lukas Burget; Pavel Matejka; Ondrej Glembek; Jan Cernocky
We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in speaker recognition, is used to extract information about the speaker or acoustic environment from a speech segment; an iVector is a low-dimensional, fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information from the iVectors and to compensate the speech features. The approach was tested on standard CTS data and found to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation, we reached an additional 0.8% absolute WER improvement.
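A hedged sketch of how an RDLT can consume an iVector: each frame is soft-assigned to regions by a GMM, every region applies its own linear transform to the frame feature concatenated with the utterance iVector, and the outputs are summed with the region posteriors as weights. Discriminative MPE training of the transforms is not shown, and all shapes and names are illustrative.

import numpy as np

rng = np.random.default_rng(2)
R, D, IVEC = 8, 39, 100                     # regions, feature dim, iVector dim
transforms = rng.standard_normal((R, D, D + IVEC)) * 0.01   # untrained stand-ins

def rdlt_compensate(x, ivec, region_post):
    """x: (D,) frame, ivec: (IVEC,), region_post: (R,) GMM region posteriors."""
    z = np.concatenate([x, ivec])
    # posterior-weighted sum of region-specific linear transforms
    return sum(p * (M @ z) for p, M in zip(region_post, transforms))

x = rng.standard_normal(D)
ivec = rng.standard_normal(IVEC)
post = np.full(R, 1.0 / R)                  # toy uniform region posteriors
print(rdlt_compensate(x, ivec, post).shape) # (39,) compensated feature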