
Publication


Featured research published by Petr Schwarz.


Computer Speech & Language | 2011

The subspace Gaussian mixture model – A structured model for speech recognition

Daniel Povey; Lukas Burget; Mohit Agarwal; Pinar Akyazi; Feng Kai; Arnab Ghoshal; Ondřej Glembek; Nagendra Goel; Martin Karafiát; Ariya Rastrow; Richard C. Rose; Petr Schwarz; Samuel Thomas

We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.
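
The core structure described above can be caricatured in a few lines of numpy: each state stores only a low-dimensional vector, and a globally shared set of projection matrices expands it into full GMM means. All dimensions and matrices below are hypothetical random stand-ins for trained parameters, a sketch rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 40   # feature dimension (hypothetical)
D = 50   # state-vector dimension, as in the paper's "say, 50" example
I = 4    # Gaussians shared by every state (tiny, for illustration)
J = 3    # number of HMM states

# Globally shared mapping: one projection matrix M_i per shared Gaussian index i.
M = rng.standard_normal((I, S, D))

# Each state j is described only by a low-dimensional vector v_j.
v = rng.standard_normal((J, D))

def state_means(j):
    """Expand state j's vector into its I Gaussian means: mu_ji = M_i @ v_j."""
    return np.stack([M[i] @ v[j] for i in range(I)])

means = state_means(0)
print(means.shape)  # (4, 40): full GMM means recovered from one 50-dim vector
```

The point of the structure is compactness: per-state storage is D numbers instead of I * S means, while the expensive mapping M is shared globally.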


International Conference on Acoustics, Speech, and Signal Processing | 2006

Hierarchical Structures of Neural Networks for Phoneme Recognition

Petr Schwarz; Pavel Matejka; Jan Cernocky

This paper deals with phoneme recognition based on neural networks (NNs). First, several approaches to improving the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on temporal patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate tandem NN architectures. The results of the final system, reported on the standard TIMIT database, compare favorably to the best published results.
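
The split temporal context idea can be sketched roughly as follows: a long temporal trajectory is split into left and right halves, each processed by its own network, and a merger network combines them into phoneme posteriors. The weights here are random placeholders for trained networks, and every size is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A 31-frame trajectory of one band's energy (hypothetical context length).
context = rng.standard_normal(31)
left, right = context[:16], context[16:]   # split around the centre frame

# Stand-ins for the two half-context networks and the merger network;
# real systems learn these weights, here they are random for illustration.
W_left  = rng.standard_normal((8, 16))
W_right = rng.standard_normal((8, 15))
W_merge = rng.standard_normal((5, 16))     # 5 "phoneme classes"

h = np.concatenate([sigmoid(W_left @ left), sigmoid(W_right @ right)])
scores = W_merge @ h
posterior = np.exp(scores) / np.exp(scores).sum()   # softmax over classes
```

Splitting the context lets each half-network model a shorter, more trainable span while the merger still sees evidence from the whole window.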


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System

Lukáš Burget; Pavel Matejka; Petr Schwarz; Ondřej Glembek; Jan Cernocky

In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's error rate. This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues. The key part of the paper is, however, the post-evaluation analysis, undermining a common myth that "the more boxes in the scheme, the better the system." All results are presented on NIST Speaker Recognition Evaluation (SRE) 2005 and 2006 data.
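
The gist of eigenchannel adaptation is that channel variability is confined to a low-rank subspace: a test utterance shifts the model supervector by U x for a small channel factor x. The sketch below estimates x by least squares from an observed mean shift; the dimensions, the eigenchannel matrix, and the estimation shortcut are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

d = 12   # supervector dimension (tiny; real systems use tens of thousands)
r = 2    # number of eigenchannels

m = rng.standard_normal(d)        # speaker-model mean supervector
U = rng.standard_normal((d, r))   # eigenchannel matrix (learned offline in practice)

# Suppose the test utterance's statistics suggest this observed mean shift:
observed_shift = U @ np.array([0.7, -1.3]) + 0.01 * rng.standard_normal(d)

# Point estimate of the channel factor x: least-squares projection onto U.
x, *_ = np.linalg.lstsq(U, observed_shift, rcond=None)

m_adapted = m + U @ x   # shift the model into the channel of the test utterance
```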


International Conference on Acoustics, Speech, and Signal Processing | 2010

Subspace Gaussian Mixture Models for speech recognition

Daniel Povey; Lukáš Burget; Mohit Agarwal; Pinar Akyazi; Kai Feng; Arnab Ghoshal; Ondřej Glembek; Nagendra Goel; Martin Karafiát; Ariya Rastrow; Richard C. Rose; Petr Schwarz; Samuel Thomas

We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.
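
A rough sketch of how mixture weights can vary in a subspace: each state's weights come from a softmax of globally shared projection vectors applied to that state's low-dimensional vector, so no state stores weights directly. All sizes are toy values and the parameters are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)

D, I, J = 6, 4, 3                 # subspace dim, shared Gaussians, states (toy)
w = rng.standard_normal((I, D))   # globally shared weight-projection vectors w_i
v = rng.standard_normal((J, D))   # per-state subspace vectors v_j

def state_weights(j):
    """Mixture weights of state j: softmax over i of w_i . v_j."""
    logits = w @ v[j]
    e = np.exp(logits - logits.max())   # max-shift for numerical stability
    return e / e.sum()
```

Because only v_j is state-specific, adding a state costs D parameters rather than a full set of means and weights, which is what makes the representation compact with little training data.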


Text, Speech and Dialogue | 2004

Towards Lower Error Rates in Phoneme Recognition

Petr Schwarz; Pavel Matějka; Jan Cernocký

We investigate techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database. The baseline phoneme recognizer is based on TempoRAl Patterns (TRAP). This recognizer is simplified to shorten processing times and reduce computational requirements. More states per phoneme and bi-gram language models are incorporated into the system and evaluated. The question of an insufficient amount of training data is discussed and the system is improved accordingly. All modifications lead to a faster system with about a 23.6% relative improvement over the baseline in phoneme error rate.


2006 IEEE Odyssey – The Speaker and Language Recognition Workshop | 2006

Brno University of Technology System for NIST 2005 Language Recognition Evaluation

Pavel Matejka; Lukas Burget; Petr Schwarz; Jan Cernocky

This paper presents the language identification (LID) system developed by the Speech@FIT group at Brno University of Technology (BUT) for the NIST 2005 Language Recognition Evaluation. The system consists of two parts: phonotactic and acoustic. The phonotactic system is based on hybrid phoneme recognizers trained on the SpeechDat-E database. Phoneme lattices are used to train and test phonotactic language models. Further improvement is obtained by using anti-models. The acoustic system is based on GMM modeling trained under the maximum mutual information framework. We describe both parts and provide a discussion of performance on the LRE 2005 recognition task.
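
The phonotactic half of such a system can be caricatured in a few lines: train a smoothed phone-bigram model per language, then score a test phone string under each and pick the best. The training strings below are invented toy data, not SpeechDat-E, and real systems score lattices rather than single strings.

```python
import math
from collections import defaultdict

def train_bigram(phone_strings, alpha=0.5):
    """Toy phonotactic bigram LM with add-alpha smoothing; returns a scorer."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for s in phone_strings:
        phones = ["<s>"] + s.split()
        vocab.update(phones)
        for a, b in zip(phones, phones[1:]):
            counts[a][b] += 1

    def logprob(s):
        lp = 0.0
        phones = ["<s>"] + s.split()
        for a, b in zip(phones, phones[1:]):
            total = sum(counts[a].values())
            lp += math.log((counts[a][b] + alpha) / (total + alpha * len(vocab)))
        return lp

    return logprob

models = {
    "cs": train_bigram(["d o b r i: d e n", "a h o j"]),   # invented phone strings
    "en": train_bigram(["h e l o u", "g u d e j"]),
}

def identify(phone_string):
    """Pick the language whose phonotactic model scores the string highest."""
    return max(models, key=lambda lang: models[lang](phone_string))
```

The discriminative signal is purely phonotactic: which phone sequences are probable in each language, independent of what the words mean.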


International Conference on Acoustics, Speech, and Signal Processing | 2008

Combination of strongly and weakly constrained recognizers for reliable detection of OOVs

Lukas Burget; Petr Schwarz; Pavel Matejka; Mirko Hannemann; Ariya Rastrow; Christopher M. White; Sanjeev Khudanpur; Hynek Hermansky; Jan Cernocky

This paper addresses the detection of OOV segments in the output of a large vocabulary continuous speech recognition (LVCSR) system. First, standard confidence measures from frame-based word and phone posteriors are investigated. A substantial improvement is obtained when posteriors from two systems, a strongly constrained one (LVCSR) and a weakly constrained one (a phone posterior estimator), are combined. We show that this approach is also suitable for the detection of general recognition errors. All results are presented on the WSJ task with a reduced recognition vocabulary.
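
A toy sketch of the combination idea: frames where the weakly constrained phone estimator stays confident while the LVCSR's best-word posterior collapses are suspicious, because the strong system is being forced to explain audio with in-vocabulary words. All numbers and the simple difference rule are invented for illustration.

```python
# Invented per-frame posteriors: the LVCSR word posterior drops in the middle
# of the segment, while the phone estimator remains confident there.
lvcsr_word_post = [0.90, 0.85, 0.20, 0.15, 0.90]
phone_post      = [0.88, 0.90, 0.80, 0.75, 0.92]

def oov_flags(strong, weak, threshold=0.5):
    """Flag frames where the weak recognizer is much more confident than the
    strong one; the mismatch suggests a word outside the LVCSR vocabulary."""
    return [w - s > threshold for s, w in zip(strong, weak)]
```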


International Conference on Biometrics | 2013

The 2013 speaker recognition evaluation in mobile environment

Elie Khoury; B. Vesnicer; Javier Franco-Pedroso; Ricardo Paranhos Velloso Violato; Z. Boulkenafet; L. M. Mazaira Fernandez; Mireia Diez; J. Kosmala; Houssemeddine Khemiri; T. Cipr; Rahim Saeidi; Manuel Günther; J. Zganec-Gros; R. Zazo Candil; Flávio Olmos Simões; M. Bengherabi; A. Alvarez Marquina; Mikel Penagarikano; Alberto Abad; M. Boulayemen; Petr Schwarz; D.A. van Leeuwen; J. Gonzalez-Dominguez; M. Uliani Neto; E. Boutellaa; P. Gómez Vilda; Amparo Varona; Dijana Petrovska-Delacrétaz; Pavel Matejka; Joaquin Gonzalez-Rodriguez

This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling, and are the fusion of several sub-systems. Nevertheless, the good old UBM-GMM based systems are still competitive. The results also show that the use of additional data for training as well as gender-dependent features can be helpful.
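
For reference, the equal error rate metric reported above can be computed from raw trial scores with a simple threshold sweep: the EER is the operating point where the false-accept and false-reject rates meet. The scores below are invented, and real evaluations interpolate between thresholds rather than taking the closest crossing.

```python
def equal_error_rate(target_scores, impostor_scores):
    """EER: error rate at the threshold where false accepts equal false rejects."""
    best_gap, eer = None, None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)      # false rejects
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)  # false accepts
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

targets   = [2.1, 1.7, 0.9, 2.5, 1.2]    # invented true-speaker trial scores
impostors = [-1.0, 0.3, 1.0, -0.4, 0.1]  # invented impostor trial scores
```

HTER, also reported in the paper, is the same two error rates averaged at a threshold fixed on a development set rather than chosen post hoc.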


Text, Speech and Dialogue | 2005

Phoneme based acoustics keyword spotting in informal continuous speech

Igor Szöke; Petr Schwarz; Pavel Matějka; Lukas Burget; Martin Karafiát; Jan Cernocký

This paper describes several approaches to acoustic keyword spotting (KWS), based on Gaussian mixture model (GMM) hidden Markov models (HMMs) and on phoneme posterior probabilities from FeatureNet. Context-independent and context-dependent phoneme models are used in the GMM/HMM system. The systems were trained and evaluated on informal continuous speech. We used different complexities of the KWS recognition network and different types of phoneme models. We study the impact of these parameters on accuracy and computational complexity, and conclude that phoneme posteriors outperform the conventional GMM/HMM system.
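
A minimal sketch of posterior-based acoustic KWS: align the keyword's phone string to per-frame phone posteriors with a monotonic dynamic program and use the best path's average log-posterior as the detection score. The posteriors, phone set, and scoring rule below are invented illustrations, not the paper's system.

```python
import math

# Invented per-frame phone posteriors from a FeatureNet-style estimator.
frames = [
    {"s": 0.7, "i": 0.2, "x": 0.1},
    {"s": 0.5, "i": 0.4, "x": 0.1},
    {"s": 0.1, "i": 0.8, "x": 0.1},
    {"s": 0.1, "i": 0.6, "x": 0.3},
]

def keyword_score(frames, phones):
    """Average log-posterior of the best monotonic alignment of phones to frames."""
    n, m = len(frames), len(phones)
    NEG = float("-inf")
    dp = [[NEG] * m for _ in range(n)]
    dp[0][0] = math.log(frames[0][phones[0]])
    for t in range(1, n):
        for j in range(m):
            stay = dp[t - 1][j]                          # remain in phone j
            advance = dp[t - 1][j - 1] if j > 0 else NEG  # move to next phone
            dp[t][j] = max(stay, advance) + math.log(frames[t][phones[j]])
    return dp[n - 1][m - 1] / n

score_si = keyword_score(frames, ["s", "i"])   # matches the posterior pattern
score_ix = keyword_score(frames, ["i", "x"])   # mismatched keyword
```

Thresholding such scores gives a detector whose operating point trades misses against false alarms, which is the accuracy/complexity trade-off the paper studies.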


International Conference on Acoustics, Speech, and Signal Processing | 2010

Approaches to automatic lexicon learning with limited training examples

Nagendra Goel; Samuel Thomas; Mohit Agarwal; Pinar Akyazi; Lukas Burget; Kai Feng; Arnab Ghoshal; Ondřej Glembek; Martin Karafiát; Daniel Povey; Ariya Rastrow; Richard C. Rose; Petr Schwarz

Preparation of a lexicon for speech recognition systems can be a significant effort in languages where the written form is not exactly phonetic. On the other hand, in languages where the written form is quite phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is available for bootstrapping. We discover that for a phonetic language such as Spanish, it is possible to do this better than what is possible from generic rules or hand-crafted pronunciations. For a more complex language such as English, we find that it is still possible, but with some loss of accuracy.

Collaboration


Dive into Petr Schwarz's collaborations.

Top Co-Authors

Lukas Burget, Brno University of Technology
Pavel Matejka, Brno University of Technology
Martin Karafiát, Brno University of Technology
Jan Cernocký, Brno University of Technology
Igor Szöke, Brno University of Technology
Michal Fapso, Brno University of Technology
Ondrej Glembek, Brno University of Technology
Daniel Povey, Johns Hopkins University