Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Oldrich Plchot is active.

Publication


Featured research published by Oldrich Plchot.


international conference on acoustics, speech, and signal processing | 2011

Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification

Lukas Burget; Oldrich Plchot; Sandro Cumani; Ondrej Glembek; Pavel Matejka; Niko Brümmer

Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have proven to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however, the parameters of this function are estimated based on a discriminative training criterion. We propose an objective function that directly addresses the speaker verification task: discrimination between same-speaker and different-speaker trials. Compared with a baseline which uses a generatively trained PLDA model, discriminative training provides up to 40% relative improvement on the NIST SRE 2010 evaluation task.
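The functional form mentioned in the abstract can be sketched as a symmetric quadratic function of the two trial i-vectors. The exact parameterization below (matrices `Lam` and `Gam`, linear term `c`, bias `k`) is an illustrative assumption, not the paper's actual code:

```python
import numpy as np

def pair_score(x1, x2, Lam, Gam, c, k):
    """Symmetric quadratic score for an i-vector trial (x1, x2).

    Lam captures cross-vector interactions, Gam within-vector terms,
    c is a linear term and k a bias -- the kind of functional form a
    PLDA model induces. In discriminative training, these parameters
    would be fit to separate same- from different-speaker trials.
    """
    return float(x1 @ Lam @ x2 + x2 @ Lam @ x1
                 + x1 @ Gam @ x1 + x2 @ Gam @ x2
                 + c @ (x1 + x2) + k)
```

Note that the score is symmetric in the two i-vectors by construction, matching the symmetry of the trial.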


international conference on acoustics, speech, and signal processing | 2011

Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification

Pavel Matejka; Ondrej Glembek; Fabio Castaldo; Md. Jahangir Alam; Oldrich Plchot; Patrick Kenny; Lukas Burget; Jan Cernocky

In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance and advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and a heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate dimensionality reduction of i-vectors before entering the PLDA-HT modeling. The results are very competitive: on the NIST 2010 SRE task, the results of a single full-covariance LDA-PLDA-HT system approach those of a complex fused system.
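The simple cosine-distance scoring mentioned above is just the cosine similarity between two i-vectors, which can be written in a few lines:

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors.

    Higher scores indicate the two utterances are more likely to come
    from the same speaker; a decision threshold is applied downstream.
    """
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))
```

In practice the i-vectors are usually channel-compensated (e.g. by LDA and length normalization) before this scoring step.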


international conference on acoustics, speech, and signal processing | 2014

Automatic language identification using deep neural networks

Ignacio Lopez-Moreno; Javier Gonzalez-Dominguez; Oldrich Plchot; David Martinez; Joaquin Gonzalez-Rodriguez; Pedro J. Moreno

This work studies the use of deep neural networks (DNNs) to address automatic language identification (LID). Motivated by their recent success in acoustic modelling, we adapt DNNs to the problem of identifying the language of a given spoken utterance from short-term acoustic features. The proposed approach is compared to state-of-the-art i-vector based acoustic systems on two different datasets: the Google 5M LID corpus and NIST LRE 2009. Results show how LID can largely benefit from using DNNs, especially when a large amount of training data is available. We found relative improvements of up to 70% in Cavg over the baseline system.
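A common way to turn frame-level DNN outputs into an utterance-level language decision is to average the per-frame language posteriors. The sketch below stands in for the full DNN with a single softmax layer; the parameters `W` and `b` are purely illustrative:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def utterance_language_posterior(frames, W, b):
    """Score an utterance by averaging per-frame language posteriors.

    frames: (T, D) short-term acoustic features.
    W, b:   a single softmax layer standing in for a full DNN
            (toy stand-in parameters, not the paper's model).
    """
    frame_post = softmax(frames @ W + b)   # (T, n_languages)
    return frame_post.mean(axis=0)         # utterance-level posterior
```

The predicted language is then the argmax of the averaged posterior.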


international conference on acoustics, speech, and signal processing | 2014

Domain adaptation via within-class covariance correction in I-vector based speaker recognition systems

Ondrej Glembek; Jeff Z. Ma; Pavel Matejka; Bing Zhang; Oldrich Plchot; Lukas Burget; Spyros Matsoukas

In this paper we propose Within-Class Covariance Correction (WCC), a technique for Linear Discriminant Analysis (LDA) in speaker recognition that performs an unsupervised adaptation of LDA to an unseen data domain, and/or compensates for speaker-population differences among different portions of the LDA training dataset. The paper follows on the study of source-normalization and inter-database variability compensation techniques which deal with multimodal distribution of i-vectors. On the DARPA RATS (Robust Automatic Transcription of Speech) task, we show that, with two hours of unsupervised data, we improve the Equal-Error Rate (EER) by 17.5% and 36% relative on the unmatched and semi-matched conditions, respectively. On the Domain Adaptation Challenge we show up to 70% relative EER reduction, and we propose a data clustering procedure to identify the directions of the domain-based variability in the adaptation data.
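One plausible reading of the idea above can be sketched as follows: the within-class covariance estimated on the training domain is corrected with the covariance of cluster means from the (unsupervised) adaptation data before solving the usual LDA eigenproblem. The scaling `alpha` and the clustering inputs are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def wcc_lda_directions(Sb, Sw, adapt_vectors, labels, alpha=1.0, n_dirs=2):
    """Sketch of Within-Class Covariance Correction before LDA.

    Sb, Sw:        between- and within-class covariance from training data.
    adapt_vectors: i-vectors from the new domain, clustered into groups
                   given by `labels` (e.g. by unsupervised clustering).
    """
    # covariance of the adaptation-cluster means captures the
    # domain-based variability we want LDA to suppress
    means = np.stack([adapt_vectors[labels == c].mean(axis=0)
                      for c in np.unique(labels)])
    Swc = Sw + alpha * np.cov(means.T)
    # standard LDA: leading eigenvectors of Swc^{-1} Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Swc, Sb))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_dirs]].real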


international conference on acoustics, speech, and signal processing | 2016

Analysis of DNN approaches to speaker identification

Pavel Matejka; Ondrej Glembek; Ondrej Novotny; Oldrich Plchot; Frantisek Grezl; Lukas Burget; Jan Cernocky

This work studies the use of Deep Neural Network (DNN) Bottleneck (BN) features together with the traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the sufficient-statistics extraction by using separate GMM models for frame alignment and for statistics normalization, and we analyze the use of BN and MFCC features (and their concatenation) in the two stages. We also show the effect of using full-covariance GMM models, and, as a contrast, we compare the result to the recent DNN-alignment approach. On the NIST SRE2010, telephone condition, we show 60% relative gain over the traditional MFCC baseline for EER (and similar for the NIST DCF metrics), resulting in 0.94% EER.
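The decoupling described above can be sketched like this: frame posteriors (the alignment) come from a diagonal-covariance GMM run on one feature stream, while the zero- and first-order statistics are accumulated from another stream. All parameters below are toy stand-ins:

```python
import numpy as np

def sufficient_stats(align_feats, stat_feats, means, covs, weights):
    """Decoupled sufficient-statistics extraction.

    align_feats: (T, Da) features used for frame alignment (e.g. BN).
    stat_feats:  (T, Ds) features the statistics are collected from
                 (e.g. MFCCs); the two streams may differ.
    means, covs, weights: diagonal-covariance GMM parameters.
    """
    # per-frame, per-component log-likelihoods of the alignment GMM
    ll = np.stack([
        -0.5 * (((align_feats - m) ** 2 / v).sum(axis=1)
                + np.log(2 * np.pi * v).sum()) + np.log(w)
        for m, v, w in zip(means, covs, weights)], axis=1)
    # frame posteriors (responsibilities)
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    N = post.sum(axis=0)        # zero-order statistics per component
    F = post.T @ stat_feats     # first-order statistics per component
    return N, F
```

These statistics are exactly what an i-vector extractor consumes downstream.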


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Pairwise Discriminative Speaker Verification in the I-Vector Space

Sandro Cumani; Niko Brümmer; Lukas Burget; Pietro Laface; Oldrich Plchot; Vasileios Vasilakakis

This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is an alternative to the usual discriminative setup that discriminates between a speaker and all the other speakers. We use a discriminative classifier based on a Support Vector Machine (SVM) that is trained to estimate the parameters of a symmetric quadratic function approximating a log-likelihood ratio score, without explicit modeling of the i-vector distributions as in the generative Probabilistic Linear Discriminant Analysis (PLDA) models. Training these models is feasible because it is not necessary to expand the i-vector pairs, which would be prohibitively expensive, or even impossible, for medium-sized training sets. The results of experiments performed on the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation are competitive with the ones obtained by generative models, in terms of normalized Detection Cost Function and Equal Error Rate. Moreover, we show that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing.
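The point about not expanding i-vector pairs can be illustrated directly: a quadratic score over a pair is an inner product in a D×D-dimensional expanded space, but it can be evaluated without ever materializing that expansion. The dimensionality `D` and matrix `W` below are illustrative:

```python
import numpy as np

D = 400  # i-vector dimensionality of the order used in this line of work
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=D), rng.normal(size=D)
W = rng.normal(size=(D, D))

# Explicit pair expansion: the matrix x1 x2^T + x2 x1^T has D*D entries
# per pair -- storing this for every training pair is what becomes
# prohibitively expensive.
phi = np.outer(x1, x2) + np.outer(x2, x1)
score_expanded = (W * phi).sum()

# Equivalent score computed directly from the two D-dimensional vectors,
# without materializing the expansion:
score_direct = x1 @ W @ x2 + x2 @ W @ x1
```

Both expressions give the same number, which is why training remains feasible even when the expanded space cannot be stored.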


international conference on acoustics, speech, and signal processing | 2013

Developing a speaker identification system for the DARPA RATS project

Oldrich Plchot; Spyros Matsoukas; Pavel Matejka; Najim Dehak; Jeff Z. Ma; Sandro Cumani; Ondrej Glembek; Hynek Hermansky; Sri Harish Reddy Mallidi; Nima Mesgarani; Richard M. Schwartz; Mehdi Soufifar; Zheng-Hua Tan; Samuel Thomas; Bing Zhang; Xinhui Zhou

This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel ID.
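As a minimal illustration of the unsupervised VAD idea mentioned in point (a), the toy sketch below marks a frame as speech when its log energy exceeds a threshold relative to the loudest frame. It is a stand-in for, not a reproduction of, the VAD variants compared in the paper:

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_db=-35.0):
    """Minimal unsupervised energy-based VAD sketch.

    Splits the signal into frames of `frame_len` samples and flags a
    frame as speech when its log energy is within `threshold_db` of
    the loudest frame (illustrative threshold, no training needed).
    """
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10((frames ** 2).mean(axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db
```

Real systems on degraded channels add smoothing and noise-floor tracking on top of a detector like this.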


IEEE Transactions on Audio, Speech, and Language Processing | 2014

On the use of i-vector posterior distributions in probabilistic linear discriminant analysis

Sandro Cumani; Oldrich Plchot; Pietro Laface

The i-vector extraction process is affected by several factors such as the noise level, the acoustic content of the observed features, the channel mismatch between the training conditions and the test data, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance. This paper presents a new PLDA model that, unlike the standard one, exploits the intrinsic i-vector uncertainty. Since the recognition accuracy is known to decrease for short speech segments, and their length is one of the main factors affecting the i-vector covariance, we designed a set of experiments aimed at comparing the standard and the new PLDA models on short speech cuts of variable duration, randomly extracted from the conversations included in the NIST SRE 2010 extended dataset, both from interviews and telephone conversations. Our results on NIST SRE 2010 evaluation data show that in different conditions the new model outperforms the standard PLDA by more than 10% relative when tested on short segments with duration mismatches, and is able to keep the accuracy of the standard model for sufficiently long speech segments. This technique has also been successfully tested in the NIST SRE 2012 evaluation.
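The effect of exploiting i-vector uncertainty can be illustrated with a toy diagonal two-covariance model: each i-vector's posterior covariance is added to the within-speaker covariance, so scores from short, uncertain segments are softened. This is a simplified illustration of the principle, not the paper's full PLDA derivation:

```python
import numpy as np

def gauss_logpdf(x, mu, var):
    # log N(x; mu, diag(var)) for diagonal covariances
    return float(-0.5 * (np.log(2 * np.pi * var)
                         + (x - mu) ** 2 / var).sum())

def llr_with_uncertainty(x1, x2, var1, var2, between_var, within_var):
    """Toy diagonal two-covariance LLR with per-vector uncertainty.

    var1, var2 play the role of the i-vector posterior covariances;
    larger values (short segments) flatten both hypotheses' densities
    and pull the score toward zero.
    """
    # same speaker:      x1 - x2 ~ N(0, 2*within + var1 + var2)
    num = gauss_logpdf(x1 - x2, 0.0, 2 * within_var + var1 + var2)
    # different speaker: x1 - x2 ~ N(0, 2*(between + within) + var1 + var2)
    den = gauss_logpdf(x1 - x2, 0.0,
                       2 * (between_var + within_var) + var1 + var2)
    return num - den
```

Identical i-vectors yield a positive log-likelihood ratio, distant ones a negative one, with the margin shrinking as the posterior variances grow.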


international conference on acoustics, speech, and signal processing | 2016

Audio enhancing with DNN autoencoder for speaker recognition

Oldrich Plchot; Lukas Burget; Hagai Aronowitz; Pavel Matejka

In this paper we present a design of a DNN-based autoencoder for speech enhancement and its use in speaker recognition systems for distant microphones and noisy data. We started with augmenting the Fisher database with artificially noised and reverberated data and trained the autoencoder to map noisy and reverberated speech to its clean version. We use the autoencoder as a preprocessing step in the later stage of modelling in state-of-the-art text-dependent and text-independent speaker recognition systems. We report relative improvements up to 50% for the text-dependent system and up to 48% for the text-independent one. With the text-independent system, we present a more detailed analysis on various conditions of NIST SRE 2010 and PRISM, suggesting that the proposed preprocessing is a promising and efficient way to build a robust speaker recognition system for distant-microphone and noisy data.
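The noisy-to-clean training objective described above can be sketched with a tiny single-hidden-layer network trained by gradient descent on mean-squared error. The real system uses a deep autoencoder on augmented Fisher data; everything below (layer sizes, learning rate, tanh nonlinearity) is an illustrative stand-in:

```python
import numpy as np

def train_denoiser(noisy, clean, hidden=32, lr=0.01, epochs=200, seed=0):
    """Train a tiny denoising network mapping noisy frames to clean ones.

    noisy, clean: (T, D) paired feature frames (parallel clean/noisy data).
    Returns a function applying the trained mapping to new frames.
    """
    rng = np.random.default_rng(seed)
    D = noisy.shape[1]
    W1 = rng.normal(scale=0.1, size=(D, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, D)); b2 = np.zeros(D)
    for _ in range(epochs):
        h = np.tanh(noisy @ W1 + b1)          # forward pass
        out = h @ W2 + b2
        g = 2 * (out - clean) / len(noisy)    # dMSE/dout
        gW2 = h.T @ g; gb2 = g.sum(0)         # backprop through layer 2
        gh = g @ W2.T * (1 - h ** 2)          # backprop through tanh
        gW1 = noisy.T @ gh; gb1 = gh.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2
```

The trained mapping is then used purely as a feature preprocessing step in front of an otherwise unchanged recognition pipeline.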


Odyssey 2016 | 2016

Analysis and optimization of bottleneck features for speaker recognition

Alicia Lozano-Diez; Anna Silnova; Pavel Matejka; Ondrej Glembek; Oldrich Plchot; Jan Pesán; Lukas Burget; Joaquin Gonzalez-Rodriguez

Recently, Deep Neural Network (DNN) based bottleneck features proved to be very effective in i-vector based speaker recognition. However, the bottleneck feature extraction is usually fully optimized for the speech recognition task rather than for speaker recognition. In this paper, we explore whether DNNs suboptimal for speech recognition can provide better bottleneck features for speaker recognition. We experiment with different features optimized for speech or speaker recognition as input to the DNN. We also experiment with under-trained DNNs, where the training was interrupted before the full convergence of the speech recognition objective. Moreover, we analyze the effect of normalizing the features at the input and/or at the output of the bottleneck feature extraction to see how it affects the final speaker recognition system performance. We evaluated the systems on the SRE'10, condition 5, female task. Results show that the best configuration of the DNN in terms of phone accuracy does not necessarily imply better performance of the final speaker recognition system. Finally, we compare the performance of bottleneck features and the standard MFCC features in an i-vector/PLDA speaker recognition system. The best bottleneck features yield up to 37% relative improvement in terms of EER.
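Bottleneck feature extraction, as used throughout the abstracts above, amounts to running frames through a trained network and taking the activations of a narrow hidden layer instead of the softmax output. The layer shapes and `tanh` nonlinearity below are illustrative:

```python
import numpy as np

def bottleneck_features(frames, layers, bn_index):
    """Return activations of the bottleneck layer for each frame.

    frames:   (T, D) input features.
    layers:   list of (W, b) pairs defining the feed-forward network.
    bn_index: index of the (narrow) bottleneck layer whose activations
              serve as features for the downstream i-vector system.
    """
    h = frames
    for i, (W, b) in enumerate(layers):
        h = np.tanh(h @ W + b)
        if i == bn_index:
            return h  # stop here: later layers are only used in training
    return h
```

Whether the network is fully converged on its speech-recognition objective is exactly the variable the paper investigates.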

Collaboration


Dive into Oldrich Plchot's collaboration.

Top Co-Authors

Lukas Burget, Brno University of Technology
Pavel Matejka, Brno University of Technology
Ondrej Glembek, Brno University of Technology
Jan Cernocký, Brno University of Technology
Mehdi Soufifar, Brno University of Technology
Frantisek Grezl, Brno University of Technology
Ondrej Novotny, Brno University of Technology